I'm trying out the NAV virtual appliance (4.1.0), talking to 25 or so Cisco switches. It was working fine, but after a few weeks one of the switches is showing a problem: "SNMP agent down". The SNMP agent is not down. The switch is fine and its configuration hasn't changed. I can talk SNMP to it fine using the same credentials from other machines, and the NAV VM can talk SNMP fine to all of the other switches. All of them have the same SNMP credentials, and most are the same model of Cisco switch. (I was going to test from the NAV VM or watch the network traffic, but the appliance doesn't come with the snmp tools or tcpdump, and it is isolated from the Internet, so I can't easily install them.)
Restarting the switch made no difference.
I've had a rummage in the logs without seeing anything obvious. Any clues where to look to find why NAV is unable to talk SNMP to this one switch suddenly, after working fine for several weeks?
Steve.
On Thu, 9 Oct 2014 16:59:16 +0000 Steve Kersley steve.kersley@keble.ox.ac.uk wrote:
> (I was going to test from the NAV VM or watch the network traffic but the appliance doesn't come with the snmp tools or tcpdump, and is isolated from the Internet so not easily able to install them)
That's too bad, because that's exactly what I would have suggested you try.
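Where the snmp tools are missing but a Python 3 interpreter is available (an assumption; it may not be present on the appliance), a rough stand-in for a single SNMP GET can be sketched with the standard library alone. The sketch below hand-encodes an SNMPv2c GetRequest for sysUpTime.0 (OID 1.3.6.1.2.1.1.3.0) and reports how long the reply took, which also gives a feel for whether a timeout is plausible. The helper names (`tlv`, `ber_int`, `ber_oid`, `build_get_request`, `probe`) and the 2-second default timeout are illustrative choices, not part of NAV, and the reply is not decoded.

```python
import socket
import time


def ber_len(n):
    """Encode a BER length (short form below 128, long form otherwise)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag, payload):
    """Wrap a payload in a BER tag-length-value triple."""
    return bytes([tag]) + ber_len(len(payload)) + payload


def ber_int(value):
    """Encode a non-negative INTEGER, leaving room for the sign bit."""
    return tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


def ber_oid(dotted):
    """Encode a dotted OID string such as '1.3.6.1.2.1.1.3.0'."""
    parts = [int(p) for p in dotted.strip(".").split(".")]
    body = bytearray([parts[0] * 40 + parts[1]])
    for sub in parts[2:]:
        chunk = [sub & 0x7F]
        sub >>= 7
        while sub:
            chunk.append(0x80 | (sub & 0x7F))
            sub >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def build_get_request(community, oid, request_id):
    """Build an SNMPv2c GetRequest message carrying one varbind."""
    varbind = tlv(0x30, ber_oid(oid) + tlv(0x05, b""))  # OID + NULL value
    pdu = tlv(0xA0, ber_int(request_id) + ber_int(0) + ber_int(0)
              + tlv(0x30, varbind))
    # version field 1 means SNMPv2c
    return tlv(0x30, ber_int(1) + tlv(0x04, community.encode()) + pdu)


def probe(host, community, oid="1.3.6.1.2.1.1.3.0", timeout=2.0, port=161):
    """Send one GET; return (seconds, reply_length) or None on timeout."""
    request = build_get_request(community, oid, request_id=1)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # arbitrary default, not NAV's setting
        started = time.monotonic()
        sock.sendto(request, (host, port))
        try:
            reply, _ = sock.recvfrom(65535)
        except socket.timeout:
            return None
        return time.monotonic() - started, len(reply)
```

Calling `probe("192.0.2.10", "public")` (placeholder address and community) returns `None` if nothing comes back within the timeout, otherwise the round-trip time in seconds and the size of the raw reply. Any reply at all shows that the agent is reachable from that host with that community string.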
> I've had a rummage in the logs without seeing anything obvious. Any clues where to look to find why NAV is unable to talk SNMP to this one switch suddenly, after working fine for several weeks?
Could it be that responses from this particular switch have become too slow for NAV's default SNMP timeout value? You could try adjusting the timeout in `ipdevpoll.conf` and then restarting ipdevpoll.
`ipdevpoll.log` would be the log to look for relevant messages.
Thanks for the input. Nothing obvious in the logfiles for the switch in question - it seemed to be running OK:
2014-10-13 17:17:23,350 [INFO schedule.netboxjobscheduler] [snmpcheck library.keb] snmpcheck for library.keb completed in 0:00:00.004969. next run in 0:29:59.995051.
(However, only this check and three DNS checks were logged on each loop.)
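To see at a glance which jobs are actually being logged for one switch, the log can be tallied per job and level. This is a minimal sketch that assumes every relevant line follows the layout of the sample line above, `timestamp [LEVEL logger] [job netbox] message`; lines in any other shape are skipped. The names `LINE`, `lines_for_netbox` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Layout inferred from a single sample line; other layouts are ignored.
LINE = re.compile(
    r"^(?P<time>\S+ \S+) \[(?P<level>\w+) (?P<logger>[^\]]+)\] "
    r"\[(?P<job>\S+) (?P<netbox>[^\]]+)\] (?P<message>.*)$"
)


def lines_for_netbox(lines, netbox):
    """Yield the parsed fields of every line that mentions the netbox."""
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match and match.group("netbox") == netbox:
            yield match.groupdict()


def summarize(lines, netbox):
    """Count log entries per (job, level) pair for one netbox."""
    counts = Counter()
    for entry in lines_for_netbox(lines, netbox):
        counts[(entry["job"], entry["level"])] += 1
    return counts
```

Passing an open `ipdevpoll.log` file object and the name `library.keb` to `summarize` gives a count per job. A switch that only ever shows `snmpcheck` entries while its neighbours show several other jobs stands out immediately.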
However, I just rebooted the NAV appliance and suddenly all is well: it's talking to the SNMP agent on the switch again, and logging more results than above. I am pretty sure I had rebooted it before and that hadn't fixed it, but I can't be quite certain whether the last reboot was before or after it broke - though I assume the breakage was the reason I rebooted!
So it's a bit of a mystery why it stopped working and why it has started again. It clearly wasn't anything to do with the network or the configuration of either NAV or the switch, as nothing has changed on either.
I have also now configured the appliance to be able to access the Internet so that I can install tcpdump etc., and will do that in case it breaks again!
Steve.