Hello,
We are currently trying to set up a new installation of NAV (4.9.7). I have imported all of our devices and locations but am experiencing some issues.
Some of our devices have interfaces that are showing wildly inaccurate data. It's mostly on ten-gig interfaces, but also on a few gig interfaces. They're showing up to 15 Tbps of traffic when in reality they're pushing far less than that, and the graphs are very broken (see attached screenshots). Most ports on the same devices report properly, so it's just a few isolated interfaces.
The other issue we're running into is devices not showing linked to their neighbors. I tried working through the troubleshooting steps in the docs but the devices don't show up under unrecognized neighbors. On the devices however they show up in the ARP/CAM tables. We do have some links that don't support CDP but NAV should still be building these neighbor relationships through the ARP/CAM tables correct?
Thanks for any assistance that you can provide!
Regards,
Brian Mock
Unwired Broadband
On Tue, 2 Jul 2019 18:50:24 +0000 Brian Mock bmock@getunwired.com wrote:
Some of our devices have interfaces that are showing wildly inaccurate data. It's mostly on ten-gig interfaces, but also on a few gig interfaces. They're showing up to 15 Tbps of traffic when in reality they're pushing far less than that, and the graphs are very broken (see attached screenshots). Most ports on the same devices report properly, so it's just a few isolated interfaces.
Interesting. You are, of course, using SNMP v2c?
I did once speak to a customer who had made some live software or hardware upgrades to his Cisco switches, which resulted in erratic traffic counters on _some_ ports, but not all. The erratic counters could be confirmed by issuing SNMP requests from the command line against these devices. I've no idea if they ever opened a Cisco TAC case for this issue. You could use something like the `snmpdelta` command line program to verify NAV's reports for the TenGigabitEthernet ports.
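As a rough illustration of the kind of check this makes possible, the Python sketch below turns two readings of an octet counter into an average bit rate and compares it with the port's nominal speed. The function names, the sample readings, and the single-wrap assumption are hypothetical and not part of NAV or Net-SNMP; the readings themselves would come from whatever SNMP tool you use.

```python
def average_bps(first_octets, second_octets, interval_seconds, counter_bits=64):
    """Average bits per second between two octet-counter readings."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = second_octets - first_octets
    if delta < 0:
        # Assumption: the counter wrapped exactly once between the readings.
        delta += 2 ** counter_bits
    return delta * 8 / interval_seconds


def exceeds_link_speed(bps, link_speed_bps):
    """True when a measured rate is higher than the port could carry."""
    return bps > link_speed_bps


# Made-up readings taken 10 seconds apart on a 10 Gbit/s port.
rate = average_bps(1_000_000, 376_000_000, 10)
print(rate, exceeds_link_speed(rate, 10_000_000_000))
```

A rate in the Tbps range on a ten-gig port would fail the second check, which suggests a counter or collection problem rather than real traffic.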
The other issue we're running into is devices not showing linked to their neighbors. I tried working through the troubleshooting steps in the docs but the devices don't show up under unrecognized neighbors. On the devices however they show up in the ARP/CAM tables.
CAM is only used to identify _known_ neighbors. Unrecognized neighbors are only registered from CDP and LLDP records that NAV could not identify.
We do have some links that don't support CDP but NAV should still be building these neighbor relationships through the ARP/CAM tables correct?
It can do that, yes, but those aren't necessarily reliable in all cases. In any case, if you have a heterogeneous network with other vendors than Cisco, you'd be better off disabling CDP and enabling LLDP on everything. In our experience, mixing CDP and LLDP in a network tends to give strange topology results - especially when mixing in devices that reflect CDP records in their LLDP tables (as some HP devices will do) - and also because CDP packets will pass through non-CDP-speaking devices as if they were invisible.
Hello Morten,
To confirm your question, we are using SNMP v2c. I don't believe the issue is with the devices themselves. We currently use another monitoring tool that polls traffic data via SNMP and is reporting correctly on all of the interfaces in question within NAV. If you think it would make sense to still test the SNMP results from the NAV box command line, I'd be happy to try, but the incorrect data appears to be isolated to just that server.
As for the neighbor relationships, I'll talk with the rest of our engineering team to see if it'd make sense to move over to lldp instead of cdp.
Thanks for your help!
Brian Mock
________________________________
From: Morten Brekkevold morten.brekkevold@uninett.no
Sent: Wednesday, July 24, 2019 01:42
To: Brian Mock
Cc: nav-users@uninett.no
Subject: Re: New Install Issues
-- mvh Morten Brekkevold Uninett
On Mon, 29 Jul 2019 15:59:55 +0000 Brian Mock bmock@getunwired.com wrote:
To confirm your question, we are using SNMP v2c. I don't believe the issue is with the devices themselves. We currently use another monitoring tool that polls traffic data via SNMP and is reporting correctly on all of the interfaces in question within NAV.
If you think it would make sense to still test the SNMP results from the NAV box command line, I'd be happy to try, but the incorrect data appears to be isolated to just that server.
No, having a second monitoring system that reports correctly is equivalent to what I was asking for (and less work) :-)
Graphite can be a bit finicky if data is not inserted at the expected intervals, and this can lead to strange results in graphs. Sometimes, Graphite's expectations are wrong, due to misconfiguration. Sometimes, NAV's insertion of data is not happening at the expected intervals because of performance problems.
These issues are discussed in the documentation [1], which I urge you to read through.
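One quick way to see whether data is arriving at the intervals Graphite expects is to count the empty slots in a series. The Python sketch below assumes the series was fetched from Graphite-web's render endpoint with `format=json`, which returns each series as a list of `[value, timestamp]` pairs with `null` where nothing was stored. The function name and the sample payload are made up for illustration.

```python
import json


def null_fraction(render_json):
    """Map each target in a Graphite JSON response to its share of empty slots."""
    result = {}
    for series in json.loads(render_json):
        points = series["datapoints"]
        missing = sum(1 for value, _timestamp in points if value is None)
        result[series["target"]] = missing / len(points) if points else 0.0
    return result


# Made-up sample payload: four 5-minute slots, one of them empty.
sample = """[{"target": "nav.devices.knwt-mnda-bh1_example_org.ports.Te0_1.ifInOctets",
              "datapoints": [[100, 1564400000], [null, 1564400300],
                             [300, 1564400600], [400, 1564400900]]}]"""
print(null_fraction(sample))
```

A large share of empty slots might point toward the interval mismatch or performance problems mentioned above, though it does not by itself say which one.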
Some of these issues may already be known to you, but because the graphs you attached seem pretty strange, I would also like you to graph the raw data. NAV sends the traffic counter values verbatim to the carbon backend, and only asks Graphite to derive a rate from those data as it renders graphs of it. It would be interesting to see a graph of the (what should be) ever-increasing counter values of one of your Ten-gig ports.
NAV would produce a graph of a port's ifInOctets with a Graphite target equivalent to this long string (which asks Graphite to scale and derive the octet counter values into a rate of bits per second):
scaleToSeconds(nonNegativeDerivative(scale(nav.devices.knwt-mnda-bh1_example_org.ports.Te0_1.ifInOctets,8)),1)
To get the raw data, you would simply use a target of:
nav.devices.knwt-mnda-bh1_example_org.ports.Te0_1.ifInOctets
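To make the difference between the two targets concrete, here is a simplified Python emulation of what the derived target does to a list of raw counter samples: multiply by 8, take the non-negative difference between consecutive samples, and normalize by the step. It is a sketch of the idea, not Graphite's actual implementation; the function name, the 300-second step, and the sample values are assumptions.

```python
def bits_per_second(counters, step_seconds):
    """Derive a bits/s series from evenly spaced octet-counter samples.

    None marks a missing sample. A decrease (counter wrap or reset) and the
    first sample after a gap both yield None, since no valid delta exists.
    """
    rates = []
    previous = None
    for value in counters:
        if value is None or previous is None or value < previous:
            rates.append(None)
        else:
            # scale(..., 8) then derivative then scaleToSeconds(..., 1)
            rates.append((value - previous) * 8 / step_seconds)
        previous = value
    return rates


# Made-up counter samples at a 300-second step, with one gap and one reset.
raw = [0, 37500, None, 100000, 50000, 87500]
print(bits_per_second(raw, 300))
```

Graphing the raw target shows the `raw` list itself; the derived target shows only the rates, so a gap or a reset in the raw values can hide behind missing points in the rate graph.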
To have Graphite-web render a graph of the last day of raw values, you could use something like this URL:
https://your-nav-server/graphite/render?from=-1day&until=now&target=...
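If you would rather script this than paste URLs into a browser, the render URL can be assembled with Python's standard library. This is a minimal sketch: `render_url` is a hypothetical helper, `your-nav-server` is the placeholder used above, and the target is the raw-data series named earlier; substitute your own host and device.

```python
from urllib.parse import urlencode


def render_url(base_url, target, since="-1day", until="now"):
    """Build a Graphite-web render URL for one target and time window."""
    query = urlencode({"from": since, "until": until, "target": target})
    return f"{base_url}/graphite/render?{query}"


url = render_url(
    "https://your-nav-server",
    "nav.devices.knwt-mnda-bh1_example_org.ports.Te0_1.ifInOctets",
)
print(url)
```

`urlencode` also percent-encodes the parentheses and commas in the longer derived target, so the same helper should work for both.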
As for the neighbor relationships, I'll talk with the rest of our engineering team to see if it'd make sense to move over to lldp instead of cdp.
Great, let me know how it goes :)
[1] https://nav.uit.no/doc/faq/graph_gaps.html