Hello. I've got a couple of questions regarding alerts in NAV (3.0.0).
1. Is the alertq table supposed to just increase and increase?

manage=> select count(*) from alertq;
 count
-------
 22726
(1 row)
I could be wrong (I probably am), but if it's not supposed to increase, my guess is this is why:

=> /usr/local/nav/lib/perl/NAV/AlertEngine/Alert.pm, line 233

sub delete() #deletes alert from db
{
    my $this=shift;
    if(!$this->{queued}) {
        $this->{log}->printlog("Alert","delete",$Log::debugging,
                               "deleted alertqid=$this->{id}");
        #$this->{dbh}->do("delete from alertq where alertqid=$this->{id}");
    }
}
----------------------------------------------
2. Some alerts are missing the sysname. They show netbox.deviceid instead of netbox.sysname, which is very confusing for the IT staff at our faculties who use NAV to receive alerts about equipment at their location.
For example:

manage=> select * from alerthistmsg where alerthistid = '16292';
 alerthistid | state | msgtype | language | msg
-------------+-------+---------+----------+----------------------------------------------
 16292       | x     | sms     | no       | HW Ver endret: 1009 V01 A0
 16292       | x     | sms     | en       | HW Ver changed: 1009 V01 A0
 16292       | x     | email   | en       | Subject: Hardware version changed for (1009)
                                            This is an automatically generated message from NAV:
                                            Hardware version changed to: A0 from: V01
 16292       | x     | email   | no       | Subject: Hardware versjon endret for (1009)
                                            Dette er en automatisk generert melding fra NAV:
                                            Hardware versjon endret til: A0 fra: V01
(4 rows)
-----------------------------------------------------------------
3. DNS mismatch alerts are "wrong". For example:

manage=> select * from alerthistmsg where alerthistid = 204;
 204 | x | email | en | Subject: DNS Mismatch (xxyyzz.uio.no)
                        This is an automatically generated message from NAV:
                        xxyyzz.uio.no does not match xxyyzz.uio.no
 204 | x | email | no | Subject: DNS ikke i samsvar (xxyyzz.uio.no)
                        Dette er en automatisk generert melding fra NAV:
                        xxyyzz.uio.no matcher ikke xxyyzz.uio.no
(2 rows)
It should say "'hostname from switch' does not match DNSname", I guess.
------------------------------------------------------------------
4. Why do alerts like the example in 3 keep coming? We get them from a lot of devices every 6 hours. These are more or less only Cisco Catalyst devices, if that's of importance.
By the way, I've made a Perl daemon that checks that all of NAV's internal and external processes are running, and reports by mail and/or SMS if something is wrong. Version two will try to restart dead processes as well. If anybody wants it, just drop me an email.
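Heavily simplified, the core of the check is just something along these lines (the process names below are examples only):

use strict;
use warnings;

# Example process names only
my @processes = qw(alertengine eventengine pping getDeviceData);
my @dead;

for my $proc (@processes) {
    # pgrep exits non-zero when no matching process is found
    system("pgrep -f $proc > /dev/null");
    push @dead, $proc if $? != 0;
}

if (@dead) {
    # In the daemon this is reported by mail and/or SMS instead of printed
    print "NAV processes not running: @dead\n";
}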
-Asbjørn-
From kreide at gmail.com Thu Apr 20 02:49:13 2006
From: kreide at gmail.com (Kristian Eide)
Date: Thu Apr 20 10:49:20 2006
Subject: [Nav-users] Bug-report
In-Reply-To: Pine.SOL.4.63-L.0604200956050.24757@saruman.uio.no
References: Pine.SOL.4.63-L.0604200956050.24757@saruman.uio.no
Message-ID: b4c110fd0604200149s17d27715n735e061540864ff6@mail.gmail.com
- Is the alertq table supposed to just increase and increase?
manage=> select count(*) from alertq;
#$this->{dbh}->do("delete from alertq where alertqid=$this->{id}");
I think this was commented out mainly for debugging. It is probably safe to enable the above delete command.
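In other words, assuming nothing else in the sub needs to change, delete() in Alert.pm would simply read:

sub delete() #deletes alert from db
{
    my $this=shift;
    if(!$this->{queued}) {
        $this->{log}->printlog("Alert","delete",$Log::debugging,
                               "deleted alertqid=$this->{id}");
        # Actually remove the handled alert, so alertq stops growing
        $this->{dbh}->do("delete from alertq where alertqid=$this->{id}");
    }
}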
- Some alerts are missing the sysname. They show netbox.deviceid
You can change the text in alertmsg.conf.
- Why do alerts like the example in 3 keep coming? We get them from
a lot of devices every 6 hours. These are more or less only Cisco Catalyst devices, if that's of importance.
I believe this is due to a string comparison being case-sensitive when it should not be, and that has been fixed in the svn tree (and will thus be part of the next version of NAV if you do not want to build it yourself).
By the way, I've made a Perl daemon that checks that all of NAV's internal and external processes are running, and reports by mail and/or SMS if something is wrong. Version two will try to restart dead processes as well. If anybody wants it, just drop me an email.
This seems like a very welcome addition to NAV. Morten, can we get it in for 3.1?
-- Kristian
Asbjørn Prøis wrote:
- Is the alertq table supposed to just increase and increase?
No.
$this->{log}->printlog("Alert","delete",$Log::debugging,
                       "deleted alertqid=$this->{id}");
#$this->{dbh}->do("delete from alertq where alertqid=$this->{id}");
Yes, this is a good guess. I think Arne commented out this line for debugging, and no one remembered to remove the hash sign before the release of 3.0.0. This was fixed on March 24th, and will be part of the 3.0.1 release.
- Some alerts are missing the sysname. They show
netbox.deviceid instead of netbox.sysname, which is very confusing for the IT staff at our faculties who use NAV to receive alerts about equipment at their location.
As Kristian suggested, change alertmsg.conf. Please share your changes with us; I don't think anyone is interested in device numbers in their alerts ;-)
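Until you have changed alertmsg.conf, you can at least translate the device number in an old alert back to a sysname with a quick lookup against the manage database. A rough sketch (the connect user and password are placeholders for your own setup, and it assumes the number in the alert is netbox.deviceid, as you describe):

#!/usr/bin/perl
# Look up the sysname behind a deviceid found in an old alert message
use strict;
use DBI;

my $deviceid = shift @ARGV or die "Usage: $0 <deviceid>\n";

# Adjust user/password to match your installation
my $dbh = DBI->connect("dbi:Pg:dbname=manage", "nav", "password",
                       { RaiseError => 1 });

my ($sysname) = $dbh->selectrow_array(
    "SELECT sysname FROM netbox WHERE deviceid = ?", undef, $deviceid);

print defined $sysname ? "$sysname\n" : "No netbox with deviceid $deviceid\n";

$dbh->disconnect;

Running it with the 1009 from your example should give you the sysname of the box in question.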
- DNS mismatch alerts are "wrong". For example:
xxyyzz.uio.no does not match xxyyzz.uio.no
It should say "'hostname from switch' does not match DNSname", I guess.
This is known, and has been fixed. It was a two-part problem:

1. DNS and sysname comparisons were case-sensitive, leading to erroneous dnsMismatch events in some cases where people use mixed case to name their devices.

2. The error message referenced the DNS name of the device twice, making the alert very confusing ("X does not match X").
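The comparison part of the fix boils down to lowercasing both names before comparing. This is not the exact code from svn, just the idea:

# Compare the sysname reported by the device and the DNS name case-insensitively
my $sysname = "Xxyyzz.uio.no";   # as reported by the switch
my $dnsname = "xxyyzz.uio.no";   # as resolved from DNS

if (lc($sysname) ne lc($dnsname)) {
    # Only a real difference should give a dnsMismatch event, and the
    # message should show both names instead of the same one twice
    printf "DNS mismatch: %s (sysname) does not match %s (DNS)\n",
           $sysname, $dnsname;
}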
- Why do alerts like the example in 3 keep coming? We get them from
a lot of devices every 6 hours. These are more or less only Cisco Catalyst devices, if that's of importance.
One of two reasons, I guess. One is what Kristian suggested: that you may have registered the device sysname and dnsname with different letter casing.
The other might be that the reported sysnames actually do not match the DNS names of the devices. This happened a lot at NTNU when all devices were moved from the ntnu.no to the nettel.ntnu.no domain; the device configurations still kept the old sysname.ntnu.no names. Whether they fixed this or just suppressed the dnsMismatch messages, I don't know.
By the way, I've made a Perl daemon that checks that all of NAV's internal and external processes are running, and reports by mail and/or SMS if something is wrong. Version two will try to restart dead processes as well. If anybody wants it, just drop me an email.
It sounds interesting, but I have two suggestions/questions:
Does it recognise that a NAV service may have been stopped or disabled on purpose by the administrator? Restarting such a service could have adverse effects...
If the daemon were implemented in Python, it could benefit from being able to use NAV's startstop API in the nav.startstop Python module directly. The startstop API is also up for some improvements for either the 3.1 or 3.2 release. One idea is to have a NAV service status page in the web interface, but this would require the Apache server to be able to act as the navcron user.