Peder Magne Sefland wrote:
To test the service monitor, I stopped sshd and NAV gave me the alert. But it still shows the service as down after I restarted sshd.
[...]
[2006-03-29 10:30:50] abstractChecker.py:updateRrd:182 [Error] rrd update failed for XXX-XX.hivolda.no:ssh [illegal attempt to update using time 1143621050 when last update time is 1143621059 (minimum one second step)]
Running /usr/local/nav/bin/nav status shows that everything is up and running.
Sometimes it is also comforting to check which NAV-related processes are actually running (the start/stop system still needs improvement in some areas). Try running "ps w -u navcron"; I'd like to see what it says.
Maybe there has been an NTP update, and NAV does not know how to handle the time-sync problem.
I don't think NTP would normally adjust the system time in steps as large as the 9 seconds seen in that log entry, but I guess that depends on how you configure your NTP daemon. Anyway, the fact that the RRD file was not updated should not interfere with the actual dispatch of the serviceUp event.
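The timestamps in the log entry show what rrdtool objected to: the attempted update (1143621050) is 9 seconds earlier than the last update already stored in the RRD file (1143621059), and rrdtool refuses any update that is not at least one second later than the previous one. Below is a minimal Python sketch of that check. The helper names can_update and utc are hypothetical and are not taken from NAV or rrdtool. The sketch also converts both timestamps to UTC.

```python
from datetime import datetime, timezone

# Timestamps copied from the logged rrd update error.
ATTEMPTED = 1143621050
LAST_UPDATE = 1143621059


def utc(ts: int) -> str:
    """Render a Unix timestamp as a readable UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


def can_update(attempted: int, last_update: int) -> bool:
    """Hypothetical helper mirroring rrdtool's rule: a new update must be
    at least one second later than the last one stored in the RRD file."""
    return attempted - last_update >= 1


if __name__ == "__main__":
    print(f"attempted:   {utc(ATTEMPTED)}")
    print(f"last update: {utc(LAST_UPDATE)}")
    print(f"gap: {LAST_UPDATE - ATTEMPTED} s, update allowed: {can_update(ATTEMPTED, LAST_UPDATE)}")
```

Run as a script, it reports 2006-03-29 08:30:50 UTC and 08:30:59 UTC, a gap of 9 seconds, and that the update is not allowed. 08:30:50 UTC matches the 10:30:50 log timestamp if the server runs on Norwegian summer time (UTC+2), which is an assumption here.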
Are there more of these RRD errors in your log file? Does the log indicate when the SSH service was detected as down, and whether it was detected as up again?
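Counting these RRD errors by hand gets tedious if there are many. The following minimal Python sketch lists every "rrd update failed" line and how many seconds each attempted update lagged behind the last stored one. The regular expression assumes that every such line follows the exact format of the single entry quoted above. The function name rrd_errors is hypothetical, and so is the script usage. Pass the path to your own servicemon log, since the actual log location is not given here.

```python
import re
import sys

# Assumes the line format of the quoted abstractChecker.py updateRrd error.
RRD_ERROR = re.compile(
    r"^\[(?P<logtime>[^\]]+)\] .*rrd update failed for (?P<target>\S+) "
    r"\[illegal attempt to update using time (?P<attempted>\d+) "
    r"when last update time is (?P<last>\d+)"
)


def rrd_errors(lines):
    """Yield (log time, host:service, seconds behind) for each matching line."""
    for line in lines:
        m = RRD_ERROR.search(line)
        if m:
            behind = int(m.group("last")) - int(m.group("attempted"))
            yield m.group("logtime"), m.group("target"), behind


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python rrd_errors.py /path/to/servicemon.log
    with open(sys.argv[1]) as log:
        for logtime, target, behind in rrd_errors(log):
            print(f"{logtime}  {target}  {behind} s behind")
```

If the lag is roughly constant across many lines, the cause is more likely a persistent clock offset than a single NTP step.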
But how can I correct this problem, or is the problem somewhere else?
Does restarting the servicemon process help?