Hi,
I'm running NAV 5.3 and have an issue with alertengine and email notifications. Emails seem to remain stuck in the queue until I force a reboot of the VM.
The error messages I see in /var/log/nav/alertengine.log are:
2022-03-08 08:35:10,707 [WARNING] [nav.alertengine.alertaddress.send] Not sending alert 5824 to ****** as handler Email is blacklisted: [Errno 110] Connection timed out
2022-03-08 08:35:10,716 [WARNING] [nav.alertengine.alertaddress.send] Not sending alert 5825 to ****** as handler Email is blacklisted: [Errno 110] Connection timed out
2022-03-08 08:35:10,720 [INFO] [nav.alertengine.check_alerts] 0 new alert(s), sent 0 alert(s), 910 queued alert(s), 0 alert(s) deleted, 910 failed send(s), 0 ignored
However, when running tcpdump I do not see any outbound or inbound connections to the configured email server.
After the reboot, messages seem to go through (and I do see outbound connections to the configured mail server):
2022-03-08 08:39:38,277 [INFO] [nav.alertengine.alertaddress.send] alert 5367 sent by Email to ****** due to immediately subscription 2
2022-03-08 08:39:38,500 [INFO] [nav.alertengine.alertaddress.send] alert 5320 sent by Email to ****** due to immediately subscription 2
2022-03-08 08:41:34,963 [INFO] [nav.alertengine.check_alerts] 0 new alert(s), sent 910 alert(s), 0 queued alert(s), 0 alert(s) deleted, 0 failed send(s), 0 ignored
I'm running Debian 10 and do not have iptables or fail2ban on it.
Any help?
Thanks!
Andrea
On Tue, 8 Mar 2022 07:43:27 +0000 Andrea Verni <Andrea.Verni@u-blox.com> wrote:
> I'm running NAV 5.3 and have an issue with alertengine and email notifications. Emails seem to remain stuck in the queue until I force a reboot of the VM.
Rebooting sounds pretty drastic and unnecessary for anything that doesn't involve a kernel upgrade; how about just restarting the NAV process that has issues? `nav restart alertengine`
> The error messages I see in /var/log/nav/alertengine.log are:
> 2022-03-08 08:35:10,707 [WARNING] [nav.alertengine.alertaddress.send] Not sending alert 5824 to ****** as handler Email is blacklisted: [Errno 110] Connection timed out
> 2022-03-08 08:35:10,716 [WARNING] [nav.alertengine.alertaddress.send] Not sending alert 5825 to ****** as handler Email is blacklisted: [Errno 110] Connection timed out
> 2022-03-08 08:35:10,720 [INFO] [nav.alertengine.check_alerts] 0 new alert(s), sent 0 alert(s), 910 queued alert(s), 0 alert(s) deleted, 910 failed send(s), 0 ignored
> However, when running tcpdump I do not see any outbound or inbound connections to the configured email server.
That's because the email handler has been blacklisted by alertengine.
A handler that keeps producing errors will be "blacklisted" internally by the alertengine process. This means that alertengine will stop using it for the rest of the process runtime.
In this case, the email handler has likely been timing out while trying to talk to your SMTP server, eventually causing it to be blacklisted by alertengine - which stops any further attempt at sending e-mail for the lifetime of that process.
A simple restart of the process will kick things back into gear.
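If it helps to picture the mechanism, the pattern is roughly the following (a simplified Python sketch of the general idea only; the names and structure here are invented for illustration and are not NAV's actual code):

    import logging

    log = logging.getLogger("example.alertengine")

    # Lives only in this process' memory, so after a restart it is empty again.
    _blacklisted = {}  # handler name -> the error that caused the blacklisting

    def send_alert(handler_name, send_func, alert_id, address):
        """Try to send one alert, skipping handlers that have already failed."""
        if handler_name in _blacklisted:
            log.warning("Not sending alert %s to %s as handler %s is blacklisted: %s",
                        alert_id, address, handler_name, _blacklisted[handler_name])
            return False
        try:
            send_func(alert_id, address)
            return True
        except OSError as error:  # e.g. [Errno 110] Connection timed out
            log.warning("Blacklisting handler %s after error: %s", handler_name, error)
            _blacklisted[handler_name] = error
            return False

The point is simply that the blacklist only exists in the process' memory, which is why restarting the process clears it.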
Hi,
this fix seemed to work with version 5.3, but since we upgraded to 5.4 we have stopped receiving emails, and I'm not sure whether the destination email address is blacklisted or something else is going wrong. Even after restarting the alertengine process or the entire NAV VM, we do not get any messages in /var/log/nav/alertengine.log anymore.
The last entries in the rotated alertengine log say something like:
2022-06-23 09:01:50,572 [WARNING] [nav.alertengine.alertaddress.send] Not sending alert 15496 to xxx.yyy@domain.com as handler Email is blacklisted: [Errno 110] Connection timed out
2022-06-23 09:01:50,575 [INFO] [nav.alertengine.check_alerts] 0 new alert(s), sent 0 alert(s), 268 queued alert(s), 0 alert(s) deleted, 268 failed send(s), 0 ignored
2022-06-23 09:01:50,576 [WARNING] [nav.alertengine.check_alerts] Send 268 alerts failed.
2022-06-23 09:02:07,613 [WARNING] [nav.alertengine] SIGTERM received: Shutting down
The Postgres alertqmsg table contains 19101 messages - I'm not sure whether this is supposed to be empty once emails are sent or not:
nav=# select count(*) from alertqmsg;
 count
-------
 19101
(1 row)
While the alertq table contains 9550 rows:
nav=# select count(*) from alertq;
 count
-------
 9559
(1 row)
What would be the best way forward here? Is there any way to clean up old messages which have not been sent and get alertengine working again?
Thanks to anyone who could help me 😊
Andrea
On Tue, 2 Aug 2022 07:25:04 +0000 Andrea Verni <Andrea.Verni@u-blox.com> wrote:
> this fix seemed to work with version 5.3, but since we upgraded to 5.4 we have stopped receiving emails, and I'm not sure whether the destination email address is blacklisted or something else is going wrong. Even after restarting the alertengine process or the entire NAV VM, we do not get any messages in /var/log/nav/alertengine.log anymore.
That sounds pretty strange. Are you able to verify that the alertengine process is actually running on your system?
What if you stop the background process and run `alertengine.py --test`, is there any output to the terminal?
> The last entries in the rotated alertengine log say something like:
> 2022-06-23 09:01:50,572 [WARNING] [nav.alertengine.alertaddress.send] Not sending alert 15496 to xxx.yyy@domain.com as handler Email is blacklisted: [Errno 110] Connection timed out
Well, before it stopped logging, it seems your problem was still that you had connection problems with your SMTP server, and if you don't detect this issue, you may end up with a large amount of unsent alerts from NAV.
Have you configured NAV with an external mail relay server? If you are having repeated issues with talking to this SMTP server, I would suggest that you consider installing an MTA on your NAV server, and configuring NAV to use localhost as its SMTP server. The local MTA should at least queue outgoing mail and retry when the external server isn't responding.
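If you want to rule out basic reachability first, a quick check from the NAV host could look like this (a minimal sketch; the host and port below are placeholders for whatever relay you have configured, not values taken from NAV):

    import smtplib

    SMTP_HOST = "mail.example.com"  # placeholder: your configured SMTP relay
    SMTP_PORT = 25

    try:
        with smtplib.SMTP(SMTP_HOST, SMTP_PORT, timeout=10) as smtp:
            code, banner = smtp.noop()
            print("Connected; NOOP returned", code, banner)
    except OSError as error:  # [Errno 110] Connection timed out would show up here
        print("SMTP connection failed:", error)

If that also times out, the problem is at the network or firewall level rather than in NAV; if it connects fine, a local MTA is still a useful buffer against intermittent trouble with the relay.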
> The Postgres alertqmsg table contains 19101 messages - I'm not sure whether this is supposed to be empty once emails are sent or not:
It should be empty when there are no more notifications to send out (whether the notifications are via e-mail, SMS, Jabber, Slack etc. is irrelevant).
> nav=# select count(*) from alertqmsg;
>  count
> -------
>  19101
> (1 row)
> While the alertq table contains 9550 rows:
That is a pretty hefty number of unprocessed alerts. However, this doesn't necessarily mean that there are 9550 e-mails waiting in a queue. Each of these 9550 alerts must still match some user's alert profile to result in an actual e-mail or other notification. (Of course, if you are subscribed to *everything*, you're in for a ride.)
> What would be the best way forward here? Is there any way to clean up old messages which have not been sent and get alertengine working again?
There is a way, but it's not necessarily user friendly.
You could simply issue the SQL statement `DELETE FROM alertq;`. This would normally completely empty the queue, no questions asked.
(I thought I remembered there was an easy command line switch for alertengine to truncate the queue, but it seems I mixed it up with the SMS daemon, which has such a thing specifically for the SMS message queue)
> Thanks to anyone who could help me 😊
Sorry I couldn't help you sooner, I've been away on vacation for most of August.