Some of you have noticed that getDeviceData runs out of Java heap memory and then gets out of sync with the PostgreSQL database driver, which results in strange updates to the database. The issue is described in this bug report: http://sourceforge.net/tracker/index.php?func=detail&aid=1675508&group_id=10...

For the technically inclined, here's a complete summary:

I've traced the problem back to the Java SNMP package provided by drexel.edu. This package is used by the getDeviceData and getBoksMacs NAV processes to perform SNMP queries. The problem is an infinite loop, in which the library sends the same GETNEXT request over and over, growing a list of SNMP responses until the heap is full.

In a getDeviceData context, this problem can occur during OID testing of a device (this is when gDD attempts to figure out which SNMP OIDs a new device supports). When the OID tester queries an OID using GETNEXT requests, it uses the Java SNMP package's method for snmpwalking a MIB tree (retrieveMIBTable). If the OID is outside the MIB view of an SNMPv2c device, the device will respond with an SNMPv2 endOfMibView exception. Unfortunately, the retrieveMIBTable method of drexel's Java SNMP package treats this exception as just another value and adds it to the list of responses. Since the exception's object identifier is the same as the one used in the GETNEXT request, the method keeps issuing GETNEXT requests for the same OID until it no longer receives any response, or until it runs out of memory, whichever comes first.

I've written two different patches to fix the problem in the Java SNMP package, and I have submitted them to the upstream authors. The simpler patch is available at the aforementioned bug report page, for those who want to apply it themselves.

I'm also planning to write a workaround for NAV. It is quite unnecessary for the NAV OID tester to walk entire SNMP tables just to test OID compatibility.
Issuing a single GETNEXT request should be enough to determine whether the OID is supported or not. By doing this, the OID tester can avoid using the retrieveMIBTable method of the drexel library entirely. -- mvh Morten Brekkevold UNINETT
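Both the failure mode and the single-GETNEXT workaround described above can be sketched in Java. This is a minimal illustration under stated assumptions, not the drexel.edu API and not the actual patch: VarBind, GetNextAgent, OidWalk and FakeAgent are hypothetical names, and a real client would detect endOfMibView from the decoded response PDU rather than from a boolean flag. walk() stops on the exception, on an OID that fails to advance, and on leaving the requested subtree; isSupported() issues exactly one GETNEXT and never walks a table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical varbind type (NOT the drexel.edu API). */
final class VarBind {
    final int[] oid;
    final String value;         // null when the agent returned an exception
    final boolean endOfMibView; // SNMPv2 exception marker, not a real value
    VarBind(int[] oid, String value, boolean endOfMibView) {
        this.oid = oid; this.value = value; this.endOfMibView = endOfMibView;
    }
}

/** Hypothetical transport: one GETNEXT request per call; null means no response. */
interface GetNextAgent {
    VarBind getNext(int[] oid);
}

final class OidWalk {
    static int[] parse(String dotted) {
        return Arrays.stream(dotted.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** Lexicographic OID ordering, the order GETNEXT follows. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            if (a[i] != b[i]) return Integer.compare(a[i], b[i]);
        }
        return Integer.compare(a.length, b.length);
    }

    /** True if oid lies strictly below root in the OID tree. */
    static boolean isBelow(int[] oid, int[] root) {
        return oid.length > root.length
                && Arrays.equals(Arrays.copyOf(oid, root.length), root);
    }

    /** Table walk with the guards whose absence lets the loop run forever. */
    static List<VarBind> walk(GetNextAgent agent, int[] root) {
        List<VarBind> rows = new ArrayList<>();
        int[] current = root;
        while (true) {
            VarBind vb = agent.getNext(current);
            if (vb == null || vb.endOfMibView) break; // no answer or exception: stop, do not store it
            if (compare(vb.oid, current) <= 0) break; // OID did not advance: same request would repeat
            if (!isBelow(vb.oid, root)) break;        // walked out of the requested subtree
            rows.add(vb);
            current = vb.oid;
        }
        return rows;
    }

    /** Workaround idea: one GETNEXT decides whether the OID is supported. */
    static boolean isSupported(GetNextAgent agent, int[] oid) {
        VarBind vb = agent.getNext(oid);
        return vb != null && !vb.endOfMibView && isBelow(vb.oid, oid);
    }
}

/** In-memory stand-in for a device, so the sketch runs without a network. */
final class FakeAgent implements GetNextAgent {
    private final TreeMap<int[], String> mib = new TreeMap<>(OidWalk::compare);

    FakeAgent put(String oid, String value) {
        mib.put(OidWalk.parse(oid), value);
        return this;
    }

    @Override
    public VarBind getNext(int[] oid) {
        int[] next = mib.higherKey(oid);
        // Past the last object: echo the request OID flagged as endOfMibView.
        return next == null ? new VarBind(oid, null, true) : new VarBind(next, mib.get(next), false);
    }
}
```

With FakeAgent, walking 1.3.6.1.2.1.2.2.1.2 returns only the rows stored under that subtree, and an agent that keeps echoing the request OID (what the unpatched loop kept receiving) produces an empty result instead of an endless loop.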
From gwc2004 at gmail.com Sat Mar 10 07:49:28 2007 From: gwc2004 at gmail.com (Greg Cooper) Date: Sat Mar 10 14:49:35 2007 Subject: [Nav-users] getDeviceData Java errors Message-ID: <4ddd24660703100549m38e8bf53l858b508ce07b4069@mail.gmail.com>
Hi all. I have a stock installation of NAV 3.2.1 on a CentOS 4.4 box. I am seeing this in the getDeviceData-stderr.log. Any idea what it means?

java.lang.NullPointerException
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:53)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:273)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:259)
    at no.ntnu.nav.Database.Database.update(Database.java:1004)
    at OidTester.doTest(OidTester.java:308)
    at OidTester.oidTest(OidTester.java:43)
    at QueryNetbox.run(QueryNetbox.java:722)
java.lang.NullPointerException
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:274)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:259)
    at no.ntnu.nav.Database.Database.update(Database.java:1004)
    at no.ntnu.nav.Database.Database.update(Database.java:989)
    at no.ntnu.nav.getDeviceData.dataplugins.Swport.SwportHandler.handleData(SwportHandler.java:305)
    at DataContainersImpl.callDataHandlers(DataContainersImpl.java:69)
    at QueryNetbox.run(QueryNetbox.java:806)

Thanks! Greg (Lok*)
From gwc2004 at gmail.com Sun Mar 11 10:24:00 2007 From: gwc2004 at gmail.com (Greg Cooper) Date: Sun Mar 11 16:24:04 2007 Subject: [Nav-users] Cricket Question Message-ID: <4ddd24660703110824p1c9406b0q5cf48ae10657dd53@mail.gmail.com>
Hello, Does anyone know what happens when "collect-subtree normal" is launched every 5 minutes by cron, but takes 10 minutes to complete? I see entries in the ~cricket/logs/normal.* logs suggesting that Cricket is getting the data from each device and port. However, I am getting empty graphs when I look at the statistics. I am collecting on about 9700 ports. Is that an unreasonable number of ports to collect from? Is there anything I should do, after following the RHEL/CentOS installation guide, to configure Cricket to handle that many ports? I read that collect-subtrees can collect from devices in parallel. How do you configure that? Would that bring my collect-subtrees runtime down so that it completes within the 5-minute cron window? Thanks! Greg (lok*)
From john.m.bredal at ntnu.no Mon Mar 12 07:56:51 2007 From: john.m.bredal at ntnu.no (John-Magne Bredal) Date: Mon Mar 12 07:53:12 2007 Subject: [Nav-users] Cricket Question In-Reply-To: <4ddd24660703110824p1c9406b0q5cf48ae10657dd53@mail.gmail.com> References: <4ddd24660703110824p1c9406b0q5cf48ae10657dd53@mail.gmail.com> Message-ID: <45F4F9B3.7060504@ntnu.no>
Greg Cooper wrote:
Hello,
Does anyone know what happens when "collect-subtree normal" is launched every 5 minutes by cron, but takes 10 minutes to complete?
I see entries in the ~cricket/logs/normal.* logs suggesting that Cricket is getting the data from each device and port.
However, I am getting empty graphs when I look at the statistics. I am collecting on about 9700 ports. Is that an unreasonable number of ports to collect from? Is there anything I should do, after following the RHEL/CentOS installation guide, to configure Cricket to handle that many ports?
9700 ports is a considerable number. Cricket is rather CPU- and I/O-demanding, so it depends a bit on your hardware. If you have dual CPUs, you may get better performance by configuring Cricket to use several processes. This is done by editing the subtree-sets file (located in the cricket directory). This file defines the "sets" of directories you collect from; one set equals one process. Try adding one (or more) sets, preferably so that each set contains about the same number of ports. Also remember to update the $NAVinstall/etc/cron.d/cricket file by adding a cron job for each new set (the collect-subtrees job), and restart Cricket with "nav restart cricket".

NB: This will increase the CPU and I/O load on your server considerably. Cricket is not the most efficient program, and bigger installations commonly use one or more dedicated servers for Cricket. At NTNU we are going to test this, as we currently only gather data from a small subset of our network because of the load on the server.

There was a discussion regarding this issue on the cricket-users mailing list some months ago (first post on 21 November), titled "Huge Cricket installations". The forum seems to be down (http://sourceforge.net/mailarchive/forum.php?forum=cricket-users), but if it comes back up you can read that thread for more thorough information.
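Spreading subtrees over sets so that each set holds about the same number of ports is a small balancing problem. A minimal Java sketch, assuming you already know (or can count) the ports under each subtree directory: it uses a greedy "largest subtree into the currently lightest set" heuristic, chosen here for illustration, not something Cricket or NAV provides. SubtreeBalancer and its members are hypothetical names, and the resulting grouping still has to be copied into the subtree-sets file by hand, with one collect-subtrees cron job per set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: spread Cricket subtrees over N sets of roughly equal port count. */
final class SubtreeBalancer {
    /** One set corresponds to one collect-subtrees process. */
    static final class SubtreeSet {
        final List<String> subtrees = new ArrayList<>();
        int ports;
    }

    static List<SubtreeSet> balance(Map<String, Integer> portsPerSubtree, int numSets) {
        if (numSets < 1) {
            throw new IllegalArgumentException("numSets must be at least 1");
        }
        List<SubtreeSet> sets = new ArrayList<>();
        for (int i = 0; i < numSets; i++) {
            sets.add(new SubtreeSet());
        }
        // Place the largest subtrees first so the small ones can even out the totals.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(portsPerSubtree.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        for (Map.Entry<String, Integer> e : entries) {
            // Put each subtree into the set that currently has the fewest ports.
            SubtreeSet lightest = sets.get(0);
            for (SubtreeSet s : sets) {
                if (s.ports < lightest.ports) {
                    lightest = s;
                }
            }
            lightest.subtrees.add(e.getKey());
            lightest.ports += e.getValue();
        }
        return sets;
    }
}
```

Greedy placement does not guarantee a perfect split, but it keeps the gap between the heaviest and lightest set no larger than the biggest single subtree, so very large subtrees may need to be split into smaller directories before balancing helps.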
I read that collect-subtrees can collect from devices in parallel. How do you configure that? Would that bring my collect-subtrees runtime down so that it completes within the 5-minute cron window?
Thanks!
Greg (lok*) _______________________________________________ nav-users mailing list nav-users@itea.ntnu.no http://mailman.itea.ntnu.no/mailman/listinfo/nav-users
yours -- John Magne Bredal NTNU, ITEA - Nettgruppa 91897366 / (735)90250
From silvije.milisic at carnet.hr Mon Mar 12 09:39:46 2007 From: silvije.milisic at carnet.hr (Silvije) Date: Mon Mar 12 09:39:59 2007 Subject: [Nav-users] NAV 3.2.1 from snapshot on ubuntu server problem Message-ID: <45F511D2.7080401@carnet.hr>
Hi all! I have a few questions. Has anyone tried to install NAV on Ubuntu Server? I have, but with no luck so far. I get this error when I open https://server:8888/:

Mod_python error: "PythonHeaderParserHandler nav.web"

Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/mod_python/apache.py", line 287, in HandlerDispatch
    log=debug)
  File "/usr/lib/python2.4/site-packages/mod_python/apache.py", line 461, in import_module
    f, p, d = imp.find_module(parts[i], path)
ImportError: No module named nav

I am using apache2 and have configured the NAV web interface as a virtual host on port 8888. Is this OK to do or not? I think I have all the dependencies right. PYTHONPATH is set to /usr/local/nav/lib/python... Thanks for any answer, Regards, Silvije
From kjartan.malde at uis.no Mon Mar 12 12:16:14 2007 From: kjartan.malde at uis.no (kjartan.malde@uis.no) Date: Mon Mar 12 12:16:18 2007 Subject: [Nav-users] Errormessage from arnold.pl Message-ID: <OF2987456F.94B67E9E-ONC125729C.003CBF90-C125729C.003DE911@uis.no>
Does anyone know what is wrong when running Arnold from the CLI? I get the following error:

Illegal division by zero at /usr/local/nav/bin/arnold.pl line 162.

when using the command: /usr/local/nav/bin/start_arnold.pl -i 1 -f <file with ip-list>

Best regards, -- Kjartan Malde Univ. of Stavanger
From john.m.bredal at ntnu.no Mon Mar 12 12:29:16 2007 From: john.m.bredal at ntnu.no (John-Magne Bredal) Date: Mon Mar 12 12:25:37 2007 Subject: [Nav-users] Errormessage from arnold.pl In-Reply-To: <OF2987456F.94B67E9E-ONC125729C.003CBF90-C125729C.003DE911@uis.no> References: <OF2987456F.94B67E9E-ONC125729C.003CBF90-C125729C.003DE911@uis.no> Message-ID: <45F5398C.5090602@ntnu.no>
kjartan.malde@uis.no wrote:
Does anyone know what is wrong when running Arnold from the CLI? I get the following error:
Illegal division by zero at /usr/local/nav/bin/arnold.pl line 162.
when using the command: /usr/local/nav/bin/start_arnold.pl -i 1 -f <file with ip-list>
I would like to see the contents of start_arnold.log in $NAVinstall/var/log/arnold, and also the output from arnold.log at the same timestamp. yours -- John Magne Bredal NTNU, ITEA - Nettgruppa 91897366 / (735)90250
From kjartan.malde at uis.no Mon Mar 12 12:39:54 2007 From: kjartan.malde at uis.no (kjartan.malde@uis.no) Date: Mon Mar 12 12:39:58 2007 Subject: [Nav-users] Errormessage from arnold.pl In-Reply-To: <45F5398C.5090602@ntnu.no> Message-ID: <OFFDADB940.1EBB2C97-ONC125729C.003F9B9A-C125729C.0040140F@uis.no>
John-Magne Bredal <john.m.bredal@ntnu.no> 12.03.2007 12:29 To kjartan.malde@uis.no cc nav-users@itea.ntnu.no Subject Re: [Nav-users] Errormessage from arnold.pl kjartan.malde@uis.no wrote:
Does anyone know what is wrong when running Arnold from the CLI? I get the following error:
Illegal division by zero at /usr/local/nav/bin/arnold.pl line 162.
when using the command: /usr/local/nav/bin/start_arnold.pl -i 1 -f <file with ip-list>
I would like to know the output of the start_arnold.log in $NAVinstall/var/log/arnold. And also the output from arnold.log at the same timestamp.

It seems a base path for Arnold is configured somewhere, and my paths are appended to that default. /home/malde/testmail is the path to the mail file given in the blocktypes.

start_arnold.log:
========== NEW LOGENTRY 070312-121758 ==========
Got option /home/malde/arnoldip
/usr/local/nav/bin/arnold.pl -xdisable -m/home/malde/testmail -r1 -e5 -ucron -f/home/malde/arnoldip
========== NEW LOGENTRY 070312-122922 ==========
Got option /home/malde/arnoldip
/usr/local/nav/bin/arnold.pl -xdisable -m/home/malde/testmail -r1 -e5 -ucron -f/home/malde/arnoldip

arnold.log:
========== NEW LOGENTRY 070312-121758 ==========
Connected successfully to block.
WARNING: Could not find /usr/local/nav/etc/arnold/mailtemplates//home/malde/testmail, no mail will be sent.
Using incremental increase in blockdays (default 5).
Setting filename = /usr/local/nav/var/arnold//home/malde/arnoldip.
========== NEW LOGENTRY 070312-122923 ==========
Connected successfully to block.
WARNING: Could not find /usr/local/nav/etc/arnold/mailtemplates//home/malde/testmail, no mail will be sent.
Using incremental increase in blockdays (default 5).
Setting filename = /usr/local/nav/var/arnold//home/malde/arnoldip.

-- Kjartan
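The arnold.log lines show Arnold's default directories and the absolute paths from the command line joined with a plain "/", hence the double slash in /usr/local/nav/var/arnold//home/malde/arnoldip. The Java sketch that follows is not Arnold's code (arnold.pl is Perl); it only contrasts that kind of string concatenation with java.nio's Path.resolve, which returns an absolute argument unchanged and resolves only relative paths against the base directory. ArnoldPaths and its methods are hypothetical names, and the sketch assumes Unix-style paths. It says nothing about whether this is what triggers the division by zero.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helpers contrasting two ways of combining a default directory with a user path. */
final class ArnoldPaths {
    /** Plain string concatenation: an absolute userPath ends up nested under baseDir. */
    static String naiveJoin(String baseDir, String userPath) {
        return baseDir + "/" + userPath;
    }

    /** Path.resolve returns userPath unchanged if it is absolute, else resolves it under baseDir. */
    static String resolve(String baseDir, String userPath) {
        Path base = Paths.get(baseDir);
        return base.resolve(userPath).normalize().toString();
    }
}
```

With baseDir set to /usr/local/nav/var/arnold, naiveJoin reproduces the path seen in the log, while resolve keeps /home/malde/arnoldip as given and turns a bare arnoldip into /usr/local/nav/var/arnold/arnoldip.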