I can't get NAV to send alerts when IP devices are down, neither e-mail nor SMS alerts. I have added G01: All alerts to the profile, and I am allowed to receive SMS in my account. Gammu is also installed. What am I doing wrong? I also tried adding my own filter group with different alert filters, but nothing worked.
Regards, Lene Maria Myhre Wireless Trondheim
Hello Lene,
As you seem to have the correct AlertProfiles profile set up, I'm going to skip that for now.
Okay, I'm going to start in the "obvious" direction to make sure things are working right. Are you sure that Gammu is actually capable of sending SMS messages? First of all, "gammu searchphone" should show you something similar to:
Connection "at19200" on device "/dev/ttyS0"
Manufacturer : Siemens
Model : unknown (MC35i)
You might also have to authenticate your SIM card with the appropriate PIN commands. To make sure you're actually able to send messages, try "echo This is a test message |gammu -sendsms TEXT XXXXXXXX" from your UNIX command shell, where XXXXXXXX is your own mobile number. If this turns out to be the issue, I would assume that you're having a problem with exim (the mail daemon backend) as well. Check the exim4 logs and see whether your MTA is actually able to send your e-mails off. I don't think exim4 is set up to send "remote" e-mails by default.
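As a rough aid for that exim check, here is a minimal Python sketch that counts exim's delivery markers in its main log. It assumes exim's default log line layout ("date time message-id flag address ...", i.e. no extra log_selector fields such as +pid) and exim's standard flags ("<=" arrival, "=>" and "->" delivery, "==" deferral, "**" failure); the function name summarize_exim_log is hypothetical:

```python
from collections import Counter

# Exim main-log flags: the field right after the message ID.
FLAGS = {
    "<=": "received",
    "=>": "delivered",
    "->": "delivered",   # additional address in the same delivery
    "==": "deferred",
    "**": "failed",
}

def summarize_exim_log(lines):
    """Count exim log lines per flag and collect deferral/failure lines.

    Assumes the default layout 'YYYY-MM-DD HH:MM:SS <message-id> <flag> ...'.
    """
    counts = Counter()
    problems = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # too short to carry a flag
        flag = fields[3]
        if flag in FLAGS:
            counts[FLAGS[flag]] += 1
            if flag in ("==", "**"):
                problems.append(line.rstrip("\n"))
    return counts, problems
```

On a Debian-style system you would feed it the open log file, e.g. `with open("/var/log/exim4/mainlog") as f: counts, problems = summarize_exim_log(f)` (path assumed). Many "==" or "**" lines for remote addresses would point at the MTA configuration rather than at NAV.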
If this is not the issue either, I'd suggest editing your nav.conf to enable debug logging on all daemons (see the examples), restarting NAV, and then tailing eventengine.log and alertdaemon.log for possible errors. Also make sure your <daemon name>.err.log files do not contain too much clutter, as this is generally a bad sign (any clutter at all is a bad sign, actually). I'd suggest you mail/reply to the list with the results, and I'm sure someone can help you further :)
Hope this helps :)
------------------------------------------------------
Vidar Stokkenes
Networking Consultant
Dept. of Networking and Telecommunications
HN IKT - Tromsø
Tel: 76 16 61 87 / 77 66 99 55
Cell: 95 87 99 42
e-mail: vidar.stokkenes@hn-ikt.no
-----Original message-----
From: lene.myhre@item.ntnu.no [mailto:lene.myhre@item.ntnu.no]
Sent: 3 November 2008 15:54
To: nav-users@uninett.no
Subject: Alert profiles
On Mon, 3 Nov 2008 15:54:04 +0100 (CET) lene.myhre@item.ntnu.no wrote:
I can't get NAV to send alerts when IP devices are down, neither e-mail nor SMS alerts. I have added G01: All alerts to the profile, and I am allowed to receive SMS in my account. Gammu is also installed. What am I doing wrong? I also tried adding my own filter group with different alert filters, but nothing worked.
Is your G01 subscription in an active time period of your active profile? Are all your NAV backend processes running (nav status)?
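To make the "active time period" question concrete, here is a simplified Python model of how a subscription could be resolved: the profile must be active, the current time period is the one valid for today's day type that started most recently, and only that period's subscriptions count. The classes, the "all"/"weekdays"/"weekends" labels, and the rule that nothing applies before the day's first period are assumptions for illustration, not NAV's actual code:

```python
from dataclasses import dataclass, field
from datetime import datetime, time

@dataclass
class TimePeriod:
    start: time
    valid_during: str  # "all", "weekdays" or "weekends" (hypothetical labels)
    subscriptions: list = field(default_factory=list)  # [(filter_group, medium)]

def applies_to_day(period, when):
    weekend = when.weekday() >= 5  # Saturday=5, Sunday=6
    if period.valid_during == "all":
        return True
    return weekend if period.valid_during == "weekends" else not weekend

def active_period(periods, when):
    """Return the period valid today that started most recently, or None."""
    candidates = [p for p in periods
                  if applies_to_day(p, when) and p.start <= when.time()]
    return max(candidates, key=lambda p: p.start, default=None)

def media_for(periods, filter_group, when, profile_active=True):
    """Media an alert matching filter_group would go out by, in this model."""
    if not profile_active:
        return []
    period = active_period(periods, when)
    if period is None:
        return []
    return [medium for group, medium in period.subscriptions
            if group == filter_group]
```

In this model, a G01 subscription that exists only in a weekday period simply yields nothing on a Saturday, which is the kind of gap the question above is probing for.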
When an IP device goes down, three things happen in NAV:
1. pping receives no ping response from the device, and posts a boxState start event on NAV's event queue.
2. eventEngine picks up the boxState event from the queue, and tries to figure out what to do with it. Typically, it will wait for one minute to see if pping can re-establish contact with the device (this is to prevent alert spamming when the network or device is flapping). After that minute has passed, it will post a boxDownWarning alert to NAV's alert queue. After three more minutes have passed without any contact with the device, eventEngine will post a boxDown alert to NAV's alert queue, and the device's down status is registered permanently in NAV's history.
3. AlertEngine receives the boxDownWarning on the alert queue, interprets the individual users' alert profiles, and decides who will receive the alert and by which medium (e-mail, SMS). Later, it receives the boxDown alert and does the same.
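The timing in step 2 can be modelled in a few lines of Python. This is a simplified sketch with a hypothetical function name; it only encodes the one-minute and three-minute delays described above, not everything eventEngine actually does:

```python
from typing import List, Optional, Tuple

WARNING_DELAY = 60     # seconds before boxDownWarning is posted
DOWN_DELAY = 3 * 60    # further seconds before boxDown is posted

def alerts_for_outage(down_at: float,
                      up_at: Optional[float] = None) -> List[Tuple[float, str]]:
    """Alerts posted for a boxState start at down_at, optionally ending at up_at.

    An alert is posted only if the device is still down when its deadline
    arrives; a device that recovers at or before a deadline suppresses it.
    """
    warning_at = down_at + WARNING_DELAY
    down_alert_at = warning_at + DOWN_DELAY
    alerts = []
    for at, kind in ((warning_at, "boxDownWarning"), (down_alert_at, "boxDown")):
        if up_at is None or up_at > at:
            alerts.append((at, kind))
    return alerts
```

For example, a device that never comes back yields a boxDownWarning after 60 seconds and a boxDown after 240, while a 30-second blip yields no alerts at all, which is the anti-flapping behaviour described in step 2.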
If you are having problems getting alerts through, you should monitor the log files of these three processes. First pping.log, to confirm that it cannot ping the device. Then eventEngine.log, to confirm that eventEngine receives the event and dispatches an alert. Then finally alertengine.log, to confirm that the AlertEngine receives the alert and correctly finds that you should receive a copy of it via email or SMS.
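A small Python helper can follow the same order and report the first log in which the device never shows up. The needle (hostname or IP address) and the assumption that each daemon logs it at your current debug level are illustrative only; the function name is hypothetical:

```python
from typing import Dict, Iterable, Optional

# Order in which a boxState event and its alerts pass through NAV's daemons.
STAGES = ("pping.log", "eventEngine.log", "alertengine.log")

def first_silent_stage(logs: Dict[str, Iterable[str]], needle: str) -> Optional[str]:
    """Return the first log (in pipeline order) that never mentions needle.

    logs maps a log file name to its lines; a missing entry counts as empty.
    Returns None if every stage mentions the needle.
    """
    for stage in STAGES:
        if not any(needle in line for line in logs.get(stage, ())):
            return stage
    return None
```

You would fill the dictionary from the files themselves, e.g. by reading /var/log/nav/pping.log and its siblings; eventEngine.log lives in /var/log/nav according to this thread, and the other two paths are assumed to sit alongside it.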
I used the time periods already defined in NAV; I didn't make my own. I have added G01 to all of them, and these time periods are in my active profile.
Where do I see if all my NAV backend processes are running?
Looking at pping.log, it says that 3 hosts are currently down. This is not the case: only 1 IP device is currently down, and the NAV GUI shows this correctly. I'm not sure which other two NAV thinks are down.
Regarding eventEngine, there is a folder with three log files:
One is called eventEngine-stderr.log.2008-11-03-1309.log and the output is:
Device not found, trying DB update: 52
Device not found, trying DB update: 54
Device not found, trying DB update: 170
Another one is called eventEngine.stderr.log and contains:
Device not found, trying DB update: 171
The last log file is called eventEngine.stdout.log and is empty.
alertengine.log contains:
PID: /var/lib/nav/run/alertengine.pid
Fri Oct 31 14:03:42 2008 alertEngine Log-3-printlog: Level not defined: Engine shutdownConstruct
Got signal :TERM:! nice shutdown.
PID: /var/lib/nav/run/alertengine.pid
PID: /var/lib/nav/run/alertengine.pid
So if you can see any errors here, please get back to me and tell me how to correct them.
Regards, Lene Maria
On Wed, 05 Nov 2008 16:14:19 +0100 Lene Maria Myhre lene.myhre@item.ntnu.no wrote:
Where do I see if all my NAV backend processes are running?
The shell command "nav status" will tell you that.
Looking at pping.log, it says that 3 hosts are currently down. This is not the case: only 1 IP device is currently down, and the NAV GUI shows this correctly. I'm not sure which other two NAV thinks are down.
Try restarting pping. This will cause it to initialize its list of up/down hosts from the database.
Regarding eventEngine, there is a folder with three log files:
One is called eventEngine-stderr.log.2008-11-03-1309.log and the output is:
Those are the wrong files to look at. The file log/eventEngine/eventEngine-stderr.log is only (well, mostly) written to when there are unhandled exceptions in eventEngine. This is mostly useful in the event of a total daemon crash. The files with dates in their names are from old runs of eventEngine.
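To tell the two kinds of file apart, a minimal Python sketch can sort file names by the run-timestamp pattern seen in this thread (".2008-11-03-1309"). The regular expression is inferred from that single example and is an assumption, as is the function name:

```python
import re
from typing import Iterable, List, Tuple

# Matches a run timestamp such as ".2008-11-03-1309" inside a file name.
RUN_STAMP = re.compile(r"\.\d{4}-\d{2}-\d{2}-\d{4}")

def split_current_and_old(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split log file names into (current, old-run) lists, each sorted."""
    current, old = [], []
    for name in names:
        (old if RUN_STAMP.search(name) else current).append(name)
    return sorted(current), sorted(old)
```

Typical use would be `split_current_and_old(os.listdir("/var/log/nav/eventEngine"))`, where the directory path is assumed from the log/eventEngine/ location mentioned above.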
Your eventEngine log level is probably too high for anything to have been logged in /var/log/nav/eventEngine.log. I would suggest that you edit /etc/nav/nav.conf and set "DEBUG_LEVEL = 6". Then restart eventEngine and tail the contents of /var/log/nav/eventEngine.log.
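If you prefer to script the nav.conf change, a minimal Python sketch could look like this. It assumes nav.conf uses simple "KEY = value" lines with "#" comments, which matches the "DEBUG_LEVEL = 6" syntax above but is otherwise an assumption; the function name is hypothetical, and you still need to restart eventEngine afterwards:

```python
import re

# An active (uncommented) DEBUG_LEVEL assignment at the start of a line.
DEBUG_LINE = re.compile(r"^(\s*DEBUG_LEVEL\s*=\s*)\S+", re.MULTILINE)

def set_debug_level(conf_text: str, level: int) -> str:
    """Return conf_text with every active DEBUG_LEVEL set to level.

    Commented-out lines are left untouched; if no active assignment
    exists, one is appended at the end of the text.
    """
    new_text, count = DEBUG_LINE.subn(lambda m: f"{m.group(1)}{level}", conf_text)
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"DEBUG_LEVEL = {level}\n"
    return new_text
```

You would read /etc/nav/nav.conf, pass its text through `set_debug_level(text, 6)`, and write the result back (keeping a backup copy first is a good idea).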