Hi,
I have a problem with /usr/lib/nav/powersupplywatch.py:
I get the following error when running this script as user navcron:
[2014-03-24 18:11:51,243] [DEBUG] Allocate SNMP-handle for cat-XXX-vl-742.pz.p.unibas.ch
[2014-03-24 18:11:51,244] [DEBUG] Polling cat-XXX-vl-742.pz.p.unibas.ch: WS-C2960G-8TC-L - Power Supply 0
[2014-03-24 18:11:51,244] [DEBUG] Stored state = Unknown; polled state = Unknown
[2014-03-24 18:11:51,247] [DEBUG] Allocate SNMP-handle for cat-XXX-vl-742.pz.p.unibas.ch
snmp_open: Unknown host (10.XX.XX.186:161)
[2014-03-24 18:11:51,247] [ERROR] snmp_open: Unknown host (10.XX.XX.186:161)
Traceback (most recent call last):
  File "/usr/lib/nav/powersupplywatch.py", line 390, in <module>
  File "/usr/lib/nav/powersupplywatch.py", line 174, in main
  File "/usr/lib/nav/powersupplywatch.py", line 244, in check_psus_and_fans
  File "/usr/lib/nav/powersupplywatch.py", line 275, in get_snmp_handle
  File "/usr/lib/pymodules/python2.7/nav/Snmp/pynetsnmp.py", line 66, in __init__
  File "/usr/lib/pymodules/python2.7/pynetsnmp/netsnmp.py", line 421, in open
pynetsnmp.netsnmp.SnmpError: snmp_open
NAV can actually poll the device with SNMP. When I delete the device, the next device in the list fails instead. It is always device no. 1020 (see below):
root@urz-nav-pet:~# sudo -u navcron /usr/lib/nav/powersupplywatch.py -v -d 2>&1 | grep Alloc | wc -l
1020
We run Debian 7.4 with:
root@urz-nav-pet:~# dpkg -l | grep nav
ii  nav  2+3.15.6-2  all  Network Administration Visualized
Any ideas?
On Mon, 24 Mar 2014 18:04:19 +0000 Mischa Diehm <mischa.diehm@unibas.ch> wrote:
I have a problem with /usr/lib/nav/powersupplywatch.py:
I get the following error when running this script as user navcron:
[snip]
pynetsnmp.netsnmp.SnmpError: snmp_open
NAV can actually poll the device with SNMP. When I delete the device, the next device in the list fails instead. It is always device no. 1020 (see below):
root@urz-nav-pet:~# sudo -u navcron /usr/lib/nav/powersupplywatch.py -v -d 2>&1 | grep Alloc | wc -l
1020
any ideas?
I have an inkling, yes. This is one of the reasons I asked the original author to implement it as an ipdevpoll plugin rather than as a separate program; he wrote a separate program anyway.
I would venture to guess the powersupplywatch program never closes its SNMP sockets when it's finished with them, meaning it is leaking file handles all over the place. A process will typically be limited to 1024 simultaneous file handles (and this internal limit is even hardcoded into NET-SNMP, so increasing the number of handles a process is allowed would not help in this case).
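A minimal sketch of the difference between leaking a handle per device and releasing it after each poll. It assumes plain UDP sockets as a stand-in for the SNMP handles that get_snmp_handle() allocates; poll_leaky, poll_fixed and the poll callback are hypothetical names, not NAV or pynetsnmp APIs. An actual fix would close the NAV/pynetsnmp session object instead.

```python
import socket
from contextlib import closing


def poll_leaky(hosts):
    """Suspected bug pattern: one socket per host, never closed."""
    open_socks = []
    for host in hosts:
        # Hypothetical stand-in for an SNMP handle; each one holds a descriptor.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        open_socks.append(sock)  # descriptor stays allocated after the poll
    return open_socks


def poll_fixed(hosts, poll):
    """Fixed pattern: at most one socket is open at any time.

    `poll` is a hypothetical callback (sock, host) -> result.
    """
    results = {}
    for host in hosts:
        # closing() calls sock.close() when the block exits, even on error,
        # so the descriptor is returned before the next host is polled.
        with closing(socket.socket(socket.AF_INET, socket.SOCK_DGRAM)) as sock:
            results[host] = poll(sock, host)
    return results
```

With the leaky version, the number of open descriptors grows with the number of devices until the per-process limit is hit; with the fixed version it stays constant regardless of how many devices are polled.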
If you would file a bug report at [1], I will see if the code can't be fixed.
[1] https://bugs.launchpad.net/nav/+filebug