Hi,
this fix seemed to work with version 5.3. Since we upgraded to 5.4 we have stopped receiving emails, and I am not sure whether the destination email address is blacklisted or something else is going wrong. Even after restarting the alertengine process, or the entire NAV VM, we no longer get any messages in /var/log/nav/alertengine.log.
The last entries in the rotated alertengine log say something like:

2022-06-23 09:01:50,572 [WARNING] [nav.alertengine.alertaddress.send] Not sending alert 15496 xxx.yyy@domain.com as handler Email is blacklisted: [Errno 110] Connection timed out
2022-06-23 09:01:50,575 [INFO] [nav.alertengine.check_alerts] 0 new alert(s), sent 0 alert(s), 268 queued alert(s), 0 alert(s) deleted, 268 failed send(s), 0 ignored
2022-06-23 09:01:50,576 [WARNING] [nav.alertengine.check_alerts] Send 268 alerts failed.
2022-06-23 09:02:07,613 [WARNING] [nav.alertengine] SIGTERM received: Shutting down
The postgres alertqmsg table contains 19101 messages; I am not sure whether this is supposed to be empty once the emails have been sent:

nav=# select count(*) from alertqmsg;
 count
-------
 19101
(1 row)
While the alertq table contains 9559 rows:

nav=# select count(*) from alertq;
 count
-------
  9559
(1 row)
What would be the best way forward here? Is there any way to clean up old messages that have not been sent and get the alertengine working again?
Thanks to anyone who could help me 😊
Andrea
-----Original Message-----
From: nav-users-request@uninett.no <nav-users-request@uninett.no> On Behalf Of Morten Brekkevold
Sent: Thursday, March 10, 2022 10:20 AM
To: Andrea Verni <Andrea.Verni@u-blox.com>
Cc: nav-users@uninett.no
Subject: Re: nav 5.3 - alertengine not sending alert / blacklisted and connection timeout (error 110)
On Tue, 8 Mar 2022 07:43:27 +0000 Andrea Verni <Andrea.Verni@u-blox.com> wrote:
I'm running NAV 5.3 and have an issue with alertengine and email notifications. Emails seem to remain stuck in the queue until I force a reboot of the VM.
Rebooting sounds pretty drastic and unnecessary for anything that doesn't involve a kernel upgrade; how about just restarting the NAV process that has issues? `nav restart alertengine`
The error messages I see in /var/log/nav/alertengine.log are:

2022-03-08 08:35:10,707 [WARNING] [nav.alertengine.alertaddress.send] Not sending alert 5824 ****** as handler Email is blacklisted: [Errno 110] Connection timed out
2022-03-08 08:35:10,716 [WARNING] [nav.alertengine.alertaddress.send] Not sending alert 5825 to ****** as handler Email is blacklisted: [Errno 110] Connection timed out
2022-03-08 08:35:10,720 [INFO] [nav.alertengine.check_alerts] 0 new alert(s), sent 0 alert(s), 910 queued alert(s), 0 alert(s) deleted, 910 failed send(s), 0 ignored
However, when running tcpdump I do not see any outbound or inbound connections to the configured email server.
That's because the email handler has been blacklisted by alertengine.
A handler that keeps producing errors will be "blacklisted" internally by the alertengine process. This means that alertengine will stop using it for the rest of the process runtime.
In this case, the email handler has likely been timing out while trying to talk to your SMTP server, eventually causing it to be blacklisted by alertengine - which stops any further attempt at sending e-mail for the lifetime of that process.
A simple restart of the process will kick things back into gear.
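Since the underlying error is an SMTP connection timeout, it can also be worth verifying that the mail server is reachable independently of NAV before (or after) restarting. A minimal sketch in Python, using only the standard library; the host name is a placeholder for whatever SMTP server NAV is configured to use:

```python
import smtplib
import socket

def smtp_reachable(host, port=25, timeout=10):
    """Attempt a bare SMTP handshake; return (ok, detail)."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as conn:
            # NOOP gets a 250 reply from any responsive SMTP server
            code, banner = conn.noop()
            return code == 250, banner.decode(errors="replace")
    except (socket.timeout, OSError) as exc:
        # An unreachable server or a dropping firewall shows up here,
        # e.g. "[Errno 110] Connection timed out" on Linux
        return False, str(exc)
```

Usage would be something like `print(smtp_reachable("smtp.example.com"))`; if this times out from the NAV host, the problem is network/firewall-side rather than in alertengine itself.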
--
Sincerely,
Morten Brekkevold
Sikt – Norwegian Agency for Shared Services in Education and Research