Some of you have noticed problems with getDeviceData running out of Java heap
memory, and then getting out of sync with the PostgreSQL database driver - it
ends up doing strange updates to the database.
The issue is described in this bug report:
http://sourceforge.net/tracker/index.php?func=detail&aid=1675508&gro...
For the technically inclined, here's a complete summary:
I've traced the problem back to the Java SNMP package provided by drexel.edu.
This package is used by the getDeviceData and getBoksMacs NAV processes to
perform SNMP queries. The problem is an infinite loop in which the library
sends the same GETNEXT request over and over, growing a list of SNMP
responses until the heap is full.
In a getDeviceData context, this problem can occur during OID testing of a
device (this is when gDD attempts to figure out which SNMP OIDs a new device
supports). When the OID tester attempts to query an OID using GETNEXT
requests, it uses the Java SNMP package's method for snmpwalking a MIB tree
(retrieveMIBTable). If the OID is outside the MIB view of an SNMPv2c device,
the device will respond with an SNMPv2 endOfMibView exception.
Unfortunately, the retrieveMIBTable method of drexel's Java SNMP package treats
this exception as just another value and adds it to the list of responses.
Since the exception's object identifier is the same as the one used in the
GETNEXT request, the method will continue to issue GETNEXT requests for the
same OID until it no longer receives any response, or until it runs out of
memory, whichever comes first.
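The loop described above, and one way to guard against it, can be sketched roughly as follows. This is a small Python simulation, not the drexel library's code and not either of the submitted patches: FakeAgent, END_OF_MIB_VIEW, walk and the max_requests cap are all hypothetical names made up for illustration, and OIDs are modelled as tuples of integers.

```python
# Stand-in for the SNMPv2 endOfMibView exception value (hypothetical name).
END_OF_MIB_VIEW = object()


class FakeAgent:
    """Hypothetical in-memory SNMP agent holding (oid, value) pairs."""

    def __init__(self, table):
        self.table = sorted(table)
        self.requests = 0  # counts GETNEXT requests received

    def get_next(self, oid):
        self.requests += 1
        for candidate, value in self.table:
            if candidate > oid:  # tuples compare in lexicographic OID order
                return candidate, value
        # Nothing left in the view: echo the requested OID back together
        # with endOfMibView, as described in the text above.
        return oid, END_OF_MIB_VIEW


def walk(agent, base_oid, max_requests=1000):
    """Walk the subtree under base_oid and stop on endOfMibView."""
    results = []
    current = base_oid
    for _ in range(max_requests):  # hard cap as a last line of defence
        oid, value = agent.get_next(current)
        if value is END_OF_MIB_VIEW:
            break  # an exception marker, not just another value
        if oid[:len(base_oid)] != base_oid:
            break  # walked out of the requested subtree
        if oid <= current:
            break  # agent made no progress; avoid re-asking forever
        results.append((oid, value))
        current = oid
    return results
```

Without the END_OF_MIB_VIEW check, a walk of an OID outside the view would get the same OID back on every request and keep appending to results, which is the behaviour reported above.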
I've written two different patches to fix the problem in the Java SNMP
Package, and I have submitted these patches to the upstream authors. The
simplest patch is available at the aforementioned bug report page, for those
who want to apply it themselves.
I'm also planning to write a workaround for NAV. It is quite unnecessary for
the NAV OID tester to walk entire SNMP tables just to test OID compatibility.
Issuing a single GETNEXT request should be enough to determine whether the
OID is supported or not. By doing this, the OID tester can avoid using the
retrieveMIBTable method of the drexel library entirely.
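A minimal sketch of the single-GETNEXT idea, in Python for brevity. It is not the NAV OID tester: oid_supported, make_fake_get_next and END_OF_MIB_VIEW are hypothetical names, the get_next callable stands in for whatever SNMP client is actually used, and OIDs are modelled as tuples of integers.

```python
# Stand-in for the SNMPv2 endOfMibView exception value (hypothetical name).
END_OF_MIB_VIEW = object()


def oid_supported(get_next, test_oid):
    """Issue one GETNEXT and report whether the answer lies under test_oid."""
    oid, value = get_next(test_oid)
    if value is END_OF_MIB_VIEW:
        return False  # nothing at or after this OID in the device's view
    # Supported only if the returned OID is inside the tested subtree.
    return oid[:len(test_oid)] == test_oid


def make_fake_get_next(table):
    """Build a hypothetical GETNEXT function over an in-memory OID table."""
    rows = sorted(table.items())

    def get_next(oid):
        for candidate, value in rows:
            if candidate > oid:  # lexicographically next OID
                return candidate, value
        return oid, END_OF_MIB_VIEW

    return get_next


if __name__ == "__main__":
    get_next = make_fake_get_next({
        (1, 3, 6, 1, 2, 1, 1, 1, 0): "example description",
        (1, 3, 6, 1, 2, 1, 2, 1, 0): 4,
    })
    print(oid_supported(get_next, (1, 3, 6, 1, 2, 1, 1, 1)))
    print(oid_supported(get_next, (1, 3, 6, 1, 4, 1)))
```

Because only one request is sent per OID, the test cannot loop, whatever the device answers.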
--
Best regards,
Morten Brekkevold
UNINETT
>From gwc2004 at gmail.com Sat Mar 10 07:49:28 2007
From: gwc2004 at gmail.com (Greg Cooper)
Date: Sat Mar 10 14:49:35 2007
Subject: [Nav-users] getDeviceData Java errors
Message-ID: <4ddd24660703100549m38e8bf53l858b508ce07b4069@mail.gmail.com>
Hi all.
I have a stock installation of NAV 3.2.1 on a CentOS 4.4 box. I am seeing
this in the getDeviceData-stderr.log. Any idea what it means?
java.lang.NullPointerException
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:53)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:273)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:259)
    at no.ntnu.nav.Database.Database.update(Database.java:1004)
    at OidTester.doTest(OidTester.java:308)
    at OidTester.oidTest(OidTester.java:43)
    at QueryNetbox.run(QueryNetbox.java:722)
java.lang.NullPointerException
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:274)
    at org.postgresql.jdbc1.AbstractJdbc1Statement.executeUpdate(AbstractJdbc1Statement.java:259)
    at no.ntnu.nav.Database.Database.update(Database.java:1004)
    at no.ntnu.nav.Database.Database.update(Database.java:989)
    at no.ntnu.nav.getDeviceData.dataplugins.Swport.SwportHandler.handleData(SwportHandler.java:305)
    at DataContainersImpl.callDataHandlers(DataContainersImpl.java:69)
    at QueryNetbox.run(QueryNetbox.java:806)
Thanks!
Greg (Lok*)
>From gwc2004 at gmail.com Sun Mar 11 10:24:00 2007
From: gwc2004 at gmail.com (Greg Cooper)
Date: Sun Mar 11 16:24:04 2007
Subject: [Nav-users] Cricket Question
Message-ID: <4ddd24660703110824p1c9406b0q5cf48ae10657dd53@mail.gmail.com>
Hello,
Does anyone know what happens when "collect-subtree normal" is launched
every 5 minutes by cron, but it takes 10 minutes to complete running?
I see entries in the ~cricket/logs/normal.* logs that seem like cricket is
getting the data from each device and port.
I am having problems with empty graphs when I look at the statistics,
though. I am collecting on about 9700 ports, though. Is that a stupid
amount of ports to collect info on? Is there anything I should do after
following the RHEL/CentOS installation guide to configure cricket to handle
that many ports?
I read that collect-subtrees can collect from devices in parallel. How do
you configure that? Would that bring down my collect-subtrees runtimes so
it would complete in the 5minute cron window?
Thanks!
Greg (lok*)
>From john.m.bredal at ntnu.no Mon Mar 12 07:56:51 2007
From: john.m.bredal at ntnu.no (John-Magne Bredal)
Date: Mon Mar 12 07:53:12 2007
Subject: [Nav-users] Cricket Question
In-Reply-To: <4ddd24660703110824p1c9406b0q5cf48ae10657dd53@mail.gmail.com>
References: <4ddd24660703110824p1c9406b0q5cf48ae10657dd53@mail.gmail.com>
Message-ID: <45F4F9B3.7060504@ntnu.no>
Greg Cooper wrote:
> Hello,
>
> Does anyone know what happens when "collect-subtree normal" is launched
> every 5 minutes by cron, but it takes 10 minutes to complete running?
>
> I see entries in the ~cricket/logs/normal.* logs that seem like cricket is
> getting the data from each device and port.
>
> I am having problems with empty graphs when I look at the statistics,
> though. I am collecting on about 9700 ports, though. Is that a stupid
> amount of ports to collect info on? Is there anything I should do after
> following the RHEL/CentOS installation guide to configure cricket to handle
> that many ports?
9700 ports is a considerable number. Cricket is rather CPU and I/O
intensive, so it depends a bit on your hardware. If you have dual CPUs
you may get an increase in performance by configuring Cricket to use
several processes.
This is done by editing the subtree-sets file (located in the
cricket-directory). This file defines the "sets" of directories you
collect from. One set equals one process. Try to add one (or more) sets,
preferably so that the number of ports in each set is about the same.
Also remember to update the $NAVinstall/etc/cron.d/cricket file and add
a cron-job for the new sets (the collect-subtrees job), and restart
cricket with "nav restart cricket". NB: This will increase the CPU and
I/O load on your server considerably.
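As a rough illustration of keeping the number of ports in each set about the same, here is a small Python sketch of a greedy split. It is only an assumption about how one might plan the sets: balance_sets is a hypothetical helper, the directory names and port counts are invented, and it does not read or write the subtree-sets file.

```python
import heapq


def balance_sets(port_counts, num_sets):
    """Greedy split: largest directory first, always into the lightest set.

    port_counts maps a directory name to its number of ports.
    Returns (sets, totals): the directories in each set and the port
    total of each set.
    """
    if num_sets < 1:
        raise ValueError("num_sets must be at least 1")
    heap = [(0, index) for index in range(num_sets)]  # (ports so far, set index)
    sets = [[] for _ in range(num_sets)]
    # Sort by descending port count, then by name for a stable result.
    ordered = sorted(port_counts.items(), key=lambda item: (-item[1], item[0]))
    for name, ports in ordered:
        total, index = heapq.heappop(heap)  # set with the fewest ports so far
        sets[index].append(name)
        heapq.heappush(heap, (total + ports, index))
    totals = [0] * num_sets
    for total, index in heap:
        totals[index] = total
    return sets, totals


if __name__ == "__main__":
    # Invented example figures adding up to 9700 ports.
    example = {"routers": 700, "switches-a": 3000,
               "switches-b": 3000, "switches-c": 3000}
    sets, totals = balance_sets(example, 2)
    for members, total in zip(sets, totals):
        print(total, members)
```

The greedy approach does not guarantee a perfect split, but it is usually close enough to decide which directories go into which set.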
Cricket is not the most efficient program, and for bigger installations
it is common to use a dedicated server (or more) for Cricket. At NTNU we
are going to test this as we are currently only gathering data from a
small subset of our network because of the load on the server.
There was a discussion on the cricket-users mailing list some months ago
(first post on 21 Nov.) regarding this issue. It had the title "Huge
Cricket installations". It seems that the forum is down
(http://sourceforge.net/mailarchive/forum.php?forum=cricket-users), but
if it comes back up you can read that thread for more thorough information.
> I read that collect-subtrees can collect from devices in parallel. How do
> you configure that? Would that bring down my collect-subtrees runtimes so
> it would complete in the 5minute cron window?
>
> Thanks!
>
> Greg (lok*)
yours
--
John Magne Bredal
NTNU, ITEA - Nettgruppa
91897366 / (735)90250
>From silvije.milisic at carnet.hr Mon Mar 12 09:39:46 2007
From: silvije.milisic at carnet.hr (Silvije)
Date: Mon Mar 12 09:39:59 2007
Subject: [Nav-users] NAV 3.2.1 from snapshot on ubuntu server problem
Message-ID: <45F511D2.7080401@carnet.hr>
Hi all!
I have a few questions:
Has anyone tried to install NAV on Ubuntu server?
I have, but with no luck so far. I get this error when I open
https://server:8888/:
Mod_python error: "PythonHeaderParserHandler nav.web"
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/mod_python/apache.py", line 287, in HandlerDispatch
    log=debug)
  File "/usr/lib/python2.4/site-packages/mod_python/apache.py", line 461, in import_module
    f, p, d = imp.find_module(parts[i], path)
ImportError: No module named nav
I am using apache2 and have configured the NAV web interface as a virtual
host on port 8888. Is this ok to do or not?
I think I got all dependencies right. PYTHONPATH is set to
/usr/local/nav/lib/python...
Thanks for any answer,
Regards,
Silvije
>From kjartan.malde at uis.no Mon Mar 12 12:16:14 2007
From: kjartan.malde at uis.no (kjartan.malde@uis.no)
Date: Mon Mar 12 12:16:18 2007
Subject: [Nav-users] Errormessage from arnold.pl
Message-ID: <OF2987456F.94B67E9E-ONC125729C.003CBF90-C125729C.003DE911@uis.no>
Does anyone know what is wrong when running Arnold from the CLI? I get the
following error:
Illegal division by zero at /usr/local/nav/bin/arnold.pl line 162.
when using the command: /usr/local/nav/bin/start_arnold.pl -i 1 -f <file with
ip-list>
Best regards,
--
Kjartan Malde
Univ. of Stavanger
>From john.m.bredal at ntnu.no Mon Mar 12 12:29:16 2007
From: john.m.bredal at ntnu.no (John-Magne Bredal)
Date: Mon Mar 12 12:25:37 2007
Subject: [Nav-users] Errormessage from arnold.pl
In-Reply-To: <OF2987456F.94B67E9E-ONC125729C.003CBF90-C125729C.003DE911@uis.no>
References: <OF2987456F.94B67E9E-ONC125729C.003CBF90-C125729C.003DE911@uis.no>
Message-ID: <45F5398C.5090602@ntnu.no>
kjartan.malde@uis.no wrote:
> Does anyone know what is wrong when running Arnold from cli. I get
> following error:
>
> Illegal division by zero at /usr/local/nav/bin/arnold.pl line 162.
>
> when using command: /usr/local/nav/bin/start_arnold.pl -i 1 -f <file with
> ip-list>
>
I would like to know the output of the start_arnold.log in
$NAVinstall/var/log/arnold. And also the output from arnold.log at the
same timestamp.
yours
--
John Magne Bredal
NTNU, ITEA - Nettgruppa
91897366 / (735)90250
>From kjartan.malde at uis.no Mon Mar 12 12:39:54 2007
From: kjartan.malde at uis.no (kjartan.malde@uis.no)
Date: Mon Mar 12 12:39:58 2007
Subject: [Nav-users] Errormessage from arnold.pl
In-Reply-To: <45F5398C.5090602@ntnu.no>
Message-ID: <OFFDADB940.1EBB2C97-ONC125729C.003F9B9A-C125729C.0040140F@uis.no>
On 12.03.2007 12:29, John-Magne Bredal (john.m.bredal@ntnu.no) wrote:
To: kjartan.malde@uis.no
cc: nav-users@itea.ntnu.no
Subject: Re: [Nav-users] Errormessage from arnold.pl
kjartan.malde@uis.no wrote:
> Does anyone know what is wrong when running Arnold from cli. I get
> following error:
>
> Illegal division by zero at /usr/local/nav/bin/arnold.pl line 162.
>
> when using command: /usr/local/nav/bin/start_arnold.pl -i 1 -f <file with
> ip-list>
>
I would like to know the output of the start_arnold.log in
$NAVinstall/var/log/arnold. And also the output from arnold.log at the
same timestamp.
It seems a default path for Arnold is set somewhere, and my paths are
appended to that default. /home/malde/testmail is the path to the mail file
given in the blocktypes.
start_arnold.log:
========== NEW LOGENTRY 070312-121758 ==========
Got option /home/malde/arnoldip
/usr/local/nav/bin/arnold.pl -xdisable -m/home/malde/testmail -r1 -e5 -ucron -f/home/malde/arnoldip
========== NEW LOGENTRY 070312-122922 ==========
Got option /home/malde/arnoldip
/usr/local/nav/bin/arnold.pl -xdisable -m/home/malde/testmail -r1 -e5 -ucron -f/home/malde/arnoldip
arnold.log:
========== NEW LOGENTRY 070312-121758 ==========
Connected successfully to block.
WARNING: Could not find /usr/local/nav/etc/arnold/mailtemplates//home/malde/testmail, no mail will be sent.
Using incremental increase in blockdays (default 5).
Setting filename = /usr/local/nav/var/arnold//home/malde/arnoldip.
========== NEW LOGENTRY 070312-122923 ==========
Connected successfully to block.
WARNING: Could not find /usr/local/nav/etc/arnold/mailtemplates//home/malde/testmail, no mail will be sent.
Using incremental increase in blockdays (default 5).
Setting filename = /usr/local/nav/var/arnold//home/malde/arnoldip.
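The doubled slashes in the log lines above look like an absolute path being glued onto a default directory. The following Python sketch only illustrates that effect; it is not arnold.pl's code (which is Perl), and resolve_naive and resolve are hypothetical helpers.

```python
import posixpath


def resolve_naive(base_dir, name):
    """Always prepend the default directory, as the log output suggests."""
    return base_dir + "/" + name


def resolve(base_dir, name):
    """Use the default directory only for relative names.

    posixpath.join discards base_dir when name is an absolute path.
    """
    return posixpath.join(base_dir, name)


if __name__ == "__main__":
    base = "/usr/local/nav/var/arnold"
    print(resolve_naive(base, "/home/malde/arnoldip"))
    print(resolve(base, "/home/malde/arnoldip"))
    print(resolve(base, "arnoldip"))
```

If the script does behave like resolve_naive, giving the file names relative to the default directories rather than as absolute paths might be worth trying; that is an assumption based on the log output only.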
--
Kjartan