Hi all, I have been tasked with creating a new NAV (3.9.1) install. All appears to be fine until I compare it against the previous, old (3.5.4) installation. The majority of our switches are Cisco 3750s, nearly all stacked, but the new install doesn't seem to acknowledge this under 'IP Device Info', where the 'Modules' field shows '0' while the 'Switch port status' table shows all interfaces spanning the stack. On the old install the 'Modules' field has the correct number and the 'Switch port status' table shows the right number of separate modules.
Looking under 'Report -> Vendors -> Cisco' confirms that the new install is receiving fewer SNMP OIDs per switch stack, typically 75, whereas the old install shows between 88 and 90 for the same stacks. Also, in some cases the new install is not showing any returns under #mod, #swp, #gwp and #prefixes, where it does on the old install.
Both installs are running libsnmp1.4-java; the old install has 1.4-5 and the new install 1.4.2-2. Would that account for the differences? What's the best way to test the two installs side by side in an output-based comparison?
Many thanks, seb.
On Tue, 04 Oct 2011 16:07:24 +0100 Seb Rupik snr@ecs.soton.ac.uk wrote:
[snip]
Both installs are running libsnmp1.4-java; the old install has 1.4-5 and the new install 1.4.2-2. Would that account for the differences? What's the best way to test the two installs side by side in an output-based comparison?
No. The collection system and central parts of NAV's data model changed from version 3.5 to 3.6.
It is likely what you're seeing is an effect of NAV no longer using Cisco-proprietary MIBs for discovering "modules", but the IETF-standard ENTITY-MIB (RFC 4133).
I haven't much experience with stacked 3750's, but barring use of proprietary MIBs, such a stack would present itself as a single device SNMP-wise. I'm guessing that your 3750's will list multiple chassis in the ENTITY-MIB::entPhysicalTable.
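A quick way to check that guess is to walk ENTITY-MIB::entPhysicalClass and count the chassis entries. A minimal sketch (hostname and community are placeholders, and it simply shells out to Net-SNMP's snmpwalk rather than using NAV's own SNMP code):

import subprocess
from collections import Counter

# entPhysicalClass (ENTITY-MIB, RFC 4133) is 1.3.6.1.2.1.47.1.1.1.1.5;
# a value of 3 means chassis(3). Hostname and community are placeholders.
output = subprocess.check_output(
    ["snmpwalk", "-v2c", "-c", "public", "-On",
     "sw01-3750", "1.3.6.1.2.1.47.1.1.1.1.5"],
    universal_newlines=True)
classes = Counter(line.rsplit(" ", 1)[-1]
                  for line in output.splitlines() if line.strip())
# A stacked 3750 would presumably report one chassis(3) row per stack member.
print("chassis entries: %d" % classes.get("3", 0))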
NAV currently has no way to model multiple stacked chassis, which could in principle each have multiple internal modules.
There are myriad ways to stack and/or aggregate multiple devices, depending on vendor and model. Some stacking methods aggregate multiple simple devices into a single superdevice, while others let each stack member retain its autonomy while channeling all management through a single IP address on a master device.
There is very little stacking going on in UNINETT's customer base, which means that for us there is little demand for supporting this, and there's little equipment for us to test against.
If you have ideas on how to model the various kinds of stacking paradigms that exist, and/or are willing/able to provide some sort of feedback or testbed for development (or patches, for that matter), we're all ears. I would suggest the nav-dev mailing list for technical discussions on the matter.
I'm having the same issue. I'm a consultant, and I have several customers who want a great network management/monitoring tool. I recommend NAV to them, since I've used it for a long time.
However, they almost all have 3750 stacks. At this one particular customer, I'm not really getting any inventory info from ipdevpoll when it talks to a 3750 stack. I see:
2011-11-14 18:31:10,711 [ERROR jobs.jobhandler] [inventory sw01-3750] Job 'inventory' for evv-sw01-3750.brake.local aborted: User timeout caused connection failure.
I've been searching for a way to fix this... Does anyone have any advice to help me? I'm at the latest available version of NAV (from the Debian sources).
Thanks!
Greg
On Mon, Nov 14, 2011 at 4:40 AM, Morten Brekkevold <morten.brekkevold@uninett.no> wrote:
[snip]
I am not sure whether this is specifically a 3750 stack problem. We have numerous 3750 and 3750X switches (48PS/12-S/24-S). The only oddity I've noticed about chassis-stacked switches is that the switch ports all appear under "No module", which is merely a cosmetic problem for me. NAV appears to be able to poll port data from the switch just fine. I'm not sure if there is any workaround to make NAV correctly determine whether it's a single- or multi-chassis stack (modular or stacked) and make it look pretty on the IP Device Info page, but as I said, it's just a cosmetic thing for me.
-Vidar
From: nav-users-request@uninett.no [mailto:nav-users-request@uninett.no] On behalf of Greg Cooper Sent: 14 November 2011 18:41 To: Morten Brekkevold Cc: Seb Rupik; nav-users@uninett.no Subject: Re: Differences in SNMP results between versions
[snip]
On Mon, 14 Nov 2011 11:40:45 -0600 Greg Cooper gwc2004@gmail.com wrote:
I'm having the same issue. I'm a consultant, and I have several customers who want a great network management/monitoring tool. I recommend NAV to them, since I've used it for a long time.
Glad to hear that :-)
However, they almost all have 3750 stacks. At this one particular customer, I'm not really getting any inventory info from ipdevpoll when it talks to a 3750 stack. I see:
2011-11-14 18:31:10,711 [ERROR jobs.jobhandler] [inventory sw01-3750] Job 'inventory' for evv-sw01-3750.brake.local aborted: User timeout caused connection failure.
This is an SNMP timeout. It means your device may be too slow to answer some SNMP requests, or it didn't send an answer at all.
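If you want a rough, NAV-independent feel for how slowly the agent answers, something like this sketch can help (hostname and community are placeholders; -t and -r are the standard Net-SNMP timeout/retry options):

import subprocess
import time

# Time a bulk walk of IF-MIB::ifTable (1.3.6.1.2.1.2.2) against the switch.
# -t 5 sets a 5-second per-request timeout, -r 1 allows one retry;
# hostname and community are placeholders.
start = time.time()
subprocess.call(["snmpbulkwalk", "-v2c", "-c", "public",
                 "-t", "5", "-r", "1",
                 "sw01-3750", "1.3.6.1.2.1.2.2"])
print("walk took %.1f seconds" % (time.time() - start))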
Now, I've been gone a while, but IIRC, there may be some issue with timeouts in the bridge plugin when working with Cisco devices (or when it thinks it's working with Cisco devices), so it would be helpful if you configured debug logging so we could see which plugin reported the timeout.
If you don't have the logging.conf file (not logger.conf) in your NAV config directory, please add it and insert the following two lines:
[levels]
nav.ipdevpoll.jobs = DEBUG
Then restart ipdevpoll.
Yes, it was the bridge plugin. It seems to time out in about 4 seconds... Here's some debug output:
2011-11-14 21:50:01,476 [INFO plugins.oidprofiler.oidprofiler] [profiling sw01-3750.] profile update: add 1 / del 0
2011-11-14 21:50:01,477 [DEBUG jobs.jobhandler.queue] [profiling sw01-3750.] [(59575120, NetboxSnmpOid(netbox=Netbox(id=17, sysname=u'sw01-3750.', up_to_date=True), frequency=3600)), Netbox(id=17, sysname=u'sw01-3750.', up_to_date=True))]
2011-11-14 21:50:01,488 [DEBUG jobs.jobhandler.queue] [profiling sw01-3750.] containers after save: ContainerRepository({<class 'nav.ipdevpoll.shadows.Netbox'>: {None: Netbox(id=17, sysname=u'sw01-3750.', up_to_date=True)}, <class 'nav.ipdevpoll.shadows.NetboxSnmpOid'>: {33: NetboxSnmpOid(id=2718L, netbox=Netbox(id=17, sysname=u'sw01-3750.', up_to_date=True), frequency=3600)}})
2011-11-14 21:50:01,489 [DEBUG jobs.jobhandler] [profiling sw01-3750.] Storing to database complete (11.542 ms)
2011-11-14 21:50:01,489 [DEBUG jobs.jobhandler] [profiling sw01-3750.] Running cleanup routines for 2 classes ([<class 'nav.ipdevpoll.shadows.Netbox'>, <class 'nav.ipdevpoll.shadows.NetboxSnmpOid'>])
2011-11-14 21:50:01,489 [INFO jobs.jobhandler] [profiling sw01-3750.] Job profiling for sw01-3750. done.
2011-11-14 21:50:01,489 [DEBUG jobs.jobhandler.timings] [profiling sw01-3750.] Job 'profiling' timings for sw01-3750.:
2011-11-14 21:50:01,490 [INFO schedule.netboxjobscheduler] [profiling sw01-3750.] Next 'profiling' job for sw01-3750. will be in 275 seconds (2011-11-14 21:54:37.156020)
2011-11-14 21:54:37,158 [DEBUG jobs.jobhandler] [profiling sw01-3750.] Job 'profiling' initialized with plugins: ['oidprofiler']
2011-11-14 21:54:37,159 [DEBUG jobs.jobhandler] [profiling sw01-3750.] AgentProxy created for sw01-3750.: <nav.ipdevpoll.snmp.pynetsnmp.AgentProxy instance at 0x20705f0>
2011-11-14 21:54:37,159 [DEBUG jobs.jobhandler] [profiling sw01-3750.] Plugin oidprofiler wouldn't handle sw01-3750.
2011-11-14 21:54:37,159 [DEBUG jobs.jobhandler] [profiling sw01-3750.] No plugins for this job
2011-11-14 21:54:37,159 [INFO schedule.netboxjobscheduler] [profiling sw01-3750.] Next 'profiling' job for sw01-3750. will be in 299 seconds (2011-11-14 21:59:37.158798)
2011-11-14 21:57:46,567 [DEBUG plugins.bridge.bridge] [inventory sw01-3750.] Collecting bridge data
2011-11-14 21:57:46,843 [DEBUG plugins.bridge.bridge] [inventory sw01-3750.] Alternate BRIDGE-MIB instances: [(...SNMP COMMUNITIES AND VLANS...)]
2011-11-14 21:57:46,844 [DEBUG plugins.bridge.bridge] [inventory sw01-3750.] Querying the following alternative instances: ['vlan50', 'vlan132', 'vlan52', 'vlan150', 'vlan30', 'vlan801', 'vlan32', 'vlan998', 'vlan39', 'vlan61', 'vlan41', 'vlan99', 'vlan1', 'vlan127', 'vlan1004', 'vlan130', 'vlan1003', 'vlan42', 'vlan11', 'vlan48', 'Catalyst 29xx/35xx/37xx', 'vlan51', 'vlan141', 'vlan20', 'vlan400', 'vlan31', 'vlan997', 'vlan33', 'vlan999', 'vlan40', 'vlan98', 'vlan126', 'vlan1002', 'vlan128', 'vlan100', 'vlan1005', 'vlan131', 'vlan10', 'vlan43', 'vlan49']
2011-11-14 21:57:46,844 [DEBUG plugins.bridge.bridge] [inventory sw01-3750.] Now querying None
2011-11-14 21:57:50,150 [DEBUG plugins.bridge.bridge] [inventory sw01-3750.] Now querying 'vlan50'
2011-11-14 21:57:54,441 [ERROR jobs.jobhandler] [inventory sw01-3750.] Job 'inventory' for sw01-3750. aborted: User timeout caused connection failure.

2011-11-15 17:31:43,387 [ERROR jobs.jobhandler] [inventory sw01-3750.] Job 'inventory' for sw01-3750. aborted: User timeout caused connection failure.
2011-11-15 17:37:00,211 [ERROR jobs.jobhandler] [inventory sw01-3750.] Job 'inventory' for sw01-3750. aborted: User timeout caused connection failure.
2011-11-15 17:45:10,271 [ERROR jobs.jobhandler] [inventory sw01-3750.] Job 'inventory' for sw01-3750. aborted: User timeout caused connection failure.
2011-11-15 17:50:55,175 [ERROR jobs.jobhandler] [inventory sw01-3750.] Job 'inventory' for sw01-3750. aborted: User timeout caused connection failure.
2011-11-15 17:51:59,675 [DEBUG jobs.jobhandler] [inventory sw01-3750.] Job 'inventory' initialized with plugins: ['dnsname', 'typeoid', 'modules', 'bridge', 'interfaces', 'dot1q', 'ciscovlan', 'extremevlan', 'prefix', 'lastupdated']
2011-11-15 17:51:59,676 [DEBUG jobs.jobhandler] [inventory sw01-3750.] AgentProxy created for sw01-3750.: <nav.ipdevpoll.snmp.pynetsnmp.AgentProxy instance at 0x213df38>
2011-11-15 17:51:59,676 [DEBUG jobs.jobhandler] [inventory sw01-3750.] Plugins to call: DnsName,TypeOid,Modules,Bridge,Interfaces,Dot1q,CiscoVlan,ExtremeVlan,Prefix,LastUpdated
2011-11-15 17:51:59,676 [INFO jobs.jobhandler] [inventory sw01-3750.] Starting job 'inventory' for sw01-3750.
2011-11-15 17:51:59,676 [DEBUG jobs.jobhandler] [inventory sw01-3750.] Now calling plugin: nav.ipdevpoll.plugins.dnsname.DnsName(u'sw01-3750.')
2011-11-15 17:51:59,720 [DEBUG jobs.jobhandler] [inventory sw01-3750.] Now calling plugin: nav.ipdevpoll.plugins.typeoid.TypeOid(u'sw01-3750.')
2011-11-15 17:51:59,754 [DEBUG jobs.jobhandler] [profiling sw01-3750.] Job 'profiling' initialized with plugins: ['oidprofiler']
2011-11-15 17:51:59,754 [DEBUG jobs.jobhandler] [profiling sw01-3750.] AgentProxy created for sw01-3750.: <nav.ipdevpoll.snmp.pynetsnmp.AgentProxy instance at 0x213f830>
2011-11-15 17:51:59,754 [DEBUG jobs.jobhandler] [profiling sw01-3750.] Plugin oidprofiler wouldn't handle sw01-3750.
2011-11-15 17:51:59,755 [DEBUG jobs.jobhandler] [profiling sw01-3750.] No plugins for this job
2011-11-15 17:51:59,789 [DEBUG jobs.jobhandler] [logging sw01-3750.] Job 'logging' initialized with plugins: ['arp']
2011-11-15 17:51:59,790 [DEBUG jobs.jobhandler] [logging sw01-3750.] AgentProxy created for sw01-3750.: <nav.ipdevpoll.snmp.pynetsnmp.AgentProxy instance at 0x22ab6c8>
2011-11-15 17:51:59,790 [DEBUG jobs.jobhandler] [logging sw01-3750.] Plugins to call: Arp
2011-11-15 17:51:59,790 [INFO jobs.jobhandler] [logging sw01-3750.] Starting job 'logging' for sw01-3750.
2011-11-15 17:51:59,790 [DEBUG jobs.jobhandler] [logging sw01-3750.] Now calling plugin: nav.ipdevpoll.plugins.arp.Arp(u'sw01-3750.')
2011-11-15 17:51:59,968 [DEBUG jobs.jobhandler] [inventory sw01-3750.] Now calling plugin: nav.ipdevpoll.plugins.modules.Modules(u'sw01-3750.')
2011-11-15 17:52:01,613 [DEBUG jobs.jobhandler.queue] [logging sw01-3750.] [(36345616, Netbox(id=17, sysname=u'sw01-3750.'))]
2011-11-15 17:52:01,618 [DEBUG jobs.jobhandler.queue] [logging sw01-3750.] containers after save: ContainerRepository({<class 'nav.ipdevpoll.shadows.Netbox'>: {None: Netbox(id=17, sysname=u'sw01-3750.')}})
2011-11-15 17:52:01,618 [DEBUG jobs.jobhandler] [logging sw01-3750.] Storing to database complete (4.845 ms)
2011-11-15 17:52:01,618 [DEBUG jobs.jobhandler] [logging sw01-3750.] Running cleanup routines for 1 classes ([<class 'nav.ipdevpoll.shadows.Netbox'>])
2011-11-15 17:52:01,698 [INFO jobs.jobhandler] [logging sw01-3750.] Job logging for sw01-3750. done.
2011-11-15 17:52:01,698 [DEBUG jobs.jobhandler.timings] [logging sw01-3750.] Job 'logging' timings for sw01-3750.:
2011-11-15 17:52:05,384 [DEBUG jobs.jobhandler] [inventory sw01-3750.] Now calling plugin: nav.ipdevpoll.plugins.bridge.Bridge(u'sw01-3750.')
2011-11-15 17:52:12,867 [DEBUG jobs.jobhandler] [inventory sw01-3750.] Plugin nav.ipdevpoll.plugins.bridge.Bridge(u'sw01-3750.') reported a timeout
2011-11-15 17:52:12,867 [DEBUG jobs.jobhandler.timings] [inventory sw01-3750.] Job 'inventory' timings for sw01-3750.:
2011-11-15 17:52:12,868 [ERROR jobs.jobhandler] [inventory sw01-3750.] Job 'inventory' for sw01-3750. aborted: User timeout caused connection failure.
SNMP is configured properly on the 3750. What do you think?
On Tue, Nov 15, 2011 at 2:30 AM, Morten Brekkevold <morten.brekkevold@uninett.no> wrote:
[snip]
On Tue, 15 Nov 2011 10:58:20 -0600 Greg Cooper gwc2004@gmail.com wrote:
Yes, it was the bridge plugin. It seems to time out in about 4 seconds... Here's some debug output:
[snip]
2011-11-15 17:52:12,867 [DEBUG jobs.jobhandler] [inventory sw01-3750.] Plugin nav.ipdevpoll.plugins.bridge.Bridge(u'sw01-3750.') reported a timeout
[snip]
SNMP is configured properly on the 3750. What do you think?
The Bridge plugin tries to figure out which interfaces on a device are switch ports. Cisco complicates matters by using alternate BRIDGE-MIB instances per VLAN, making community indexing necessary.
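For reference, a community-indexed query looks roughly like this sketch (hostname, community and VLAN are placeholders):

import subprocess

# Walk BRIDGE-MIB::dot1dBasePortIfIndex (1.3.6.1.2.1.17.1.4.1.2) in the
# BRIDGE-MIB instance for VLAN 50 by appending "@50" to the community
# (Cisco community string indexing). Hostname and community are placeholders.
subprocess.call(["snmpwalk", "-v2c", "-c", "public@50",
                 "sw01-3750", "1.3.6.1.2.1.17.1.4.1.2"])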
I suspect the plugin sometimes might query with a modified community that the switch doesn't respond to, causing a timeout. Unfortunately, that derails the entire collection job. I would consider making the plugin ignore timeouts and continue to the next query.
On the other hand, your 3750 may just be plain slow at responding to SNMP queries, as indicated by one of your other posts. Unfortunately, ipdevpoll has no mechanism to adjust timeout values individually.
There may be two locations in the code where you could patch NAV to increase the timeout on snmpwalks for all devices; if you care to try, I can write a quick patch.
Hi Morten, I'm still very much in the learning phase of SNMP, so I couldn't really make any suggestions about models. I do, however, have four 3750G-48TS switches which I'd happily wire together for the purpose of testing any relevant patches.
cheers, Seb.
On 14/11/11 10:40, Morten Brekkevold wrote:
[snip]
On Thu, 17 Nov 2011 08:18:39 +0000 Seb Rupik snr@ecs.soton.ac.uk wrote:
Hi Morten, I'm still very much in the learning phase of SNMP, so I couldn't really make any suggestions about models. I do, however, have four 3750G-48TS switches which I'd happily wire together for the purpose of testing any relevant patches.
Thanks, Seb, that could come in handy; however, I currently have several ideas that could help solve the apparently 3750-related timeout issues, while I have no good idea how to model the various stacking paradigms that are in the wild.
If you feel that 3750 stack members that don't appear as modules in NAV are more than just a cosmetic problem, I would really like some suggestions or a discussion on how to model things, before we blindly make some patch.
Warning: Brain dump follows.
NAV used to have a terrible data model for HP virtual stacks, which we entirely ripped out of the code at one point. An HP virtual stack is basically just a bunch of autonomous switches which take up only a single IP address for management - one still needs to communicate with each stack member individually through the stack commander as a proxy, by using a modified community for each member.
My best suggestion for modeling those (though we never implemented support for those again) was automatically adding the stack members as separate IP devices sharing the stack commander's IP, but with modified communities. That wouldn't necessitate changing NAV's code all over the place, which was necessary under the old model. This still left the problem of presentation, as network operators still wanted to display all the ports of a stack on a single NAV page.
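As a purely illustrative sketch of that idea (this is not NAV's actual schema, and the modified-community syntax below is hypothetical):

from collections import namedtuple

# One logical IP device per stack member: every member shares the
# commander's management IP, but each gets its own modified community
# (the "@sw<n>" form here is just a placeholder, not HP's real syntax)
# so it can be polled through the commander acting as a proxy.
IpDevice = namedtuple("IpDevice", "sysname ip community")

commander = IpDevice("hp-stack", "192.0.2.10", "public")
members = [IpDevice("hp-stack-member%d" % n, commander.ip, "public@sw%d" % n)
           for n in range(1, 5)]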
I guess Cisco clustering works much the same way as HP virtual stacking, and could be modeled the same way.
Although I have no experience with it, stacking of 3750's and Cisco VSS sound like they're basically the same thing model-wise: Via SNMP, a stack and a VSS will just look like a big switch with a lot of ports.
The 3750 stack may be simple to model. I don't know the hardware very well, but I'm guessing the 3750's aren't modular, so the switch ports in each will appear to be part of the chassis. Thus, the 3750 stack members could probably be modeled as modules in NAV.
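If a stack really does expose one chassis(3) entry per member in entPhysicalTable, mapping members to something module-like could start from the containment hierarchy. A rough sketch, with placeholder hostname/community and deliberately naive parsing:

import subprocess
from collections import defaultdict

def walk(column_oid):
    # Return {entPhysicalIndex: value} for one ENTITY-MIB column,
    # shelling out to Net-SNMP's snmpwalk with numeric output (-On).
    out = subprocess.check_output(
        ["snmpwalk", "-v2c", "-c", "public", "-On",
         "sw01-3750", column_oid],
        universal_newlines=True)
    table = {}
    for line in out.splitlines():
        if " = " not in line:
            continue
        oid, value = line.split(" = ", 1)
        table[int(oid.rsplit(".", 1)[-1])] = value.rsplit(" ", 1)[-1]
    return table

# ENTITY-MIB columns: entPhysicalContainedIn (.4) and entPhysicalClass (.5)
contained_in = walk("1.3.6.1.2.1.47.1.1.1.1.4")
phys_class = walk("1.3.6.1.2.1.47.1.1.1.1.5")

# Group entities by their direct container; chassis(3) rows would be the
# stack members, and each could conceivably be presented as a "module".
children = defaultdict(list)
for index, parent in contained_in.items():
    children[int(parent)].append(index)

for index, klass in sorted(phys_class.items()):
    if klass == "3":  # chassis(3)
        print("chassis %d directly contains %d entities"
              % (index, len(children[index])))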
But a VSS could consist of several modular switches. I'm guessing NAV would find the slot modules of each stack member and say that they're all modules in the same big box, but NAV wouldn't know about the multiple chassis, and has no good way of modeling that two-level hierarchy.
At the moment, the NAV model basically says that "a Netbox/IP Device = A chassis" and "a module = some replaceable sub-unit inside a chassis". Should we add an extra chassis-level, and say that the netbox/IP device is just a model of the details needed to communicate with some network host?