Hi,
Thanks for an excellent program! I have been test-running it here for about six months, but I am struggling with one serious and one less serious problem:
1: Cisco ASA. We use this series quite a lot, and in most cases NAV works well. Against boxes with 10-30 VLANs everything seems to work as it should. On boxes with many VLANs (128+), however, we get an error from the SNMP task every fifteen minutes:
2013-05-02T11:27:18.100301+02:00 XXX <164>May 02 2013 11:27:18: %ASA-4-711004: Task ran for 150 msec, Process = snmp, PC = 8c4b7c8, Call stack =
2013-05-02T11:27:18.100301+02:00 XXX <164>May 02 2013 11:27:18: %ASA-4-711004: Task ran for 150 msec, Process = snmp, PC = 8c4b7c8, Call stack = 0x08B70783 0x08B50B1D 0x08B4F87C 0x08063B63
2013-05-02T11:27:18.100311+02:00 XXX <164>May 02 2013 11:27:18: %ASA-4-711004: Task ran for 152 msec, Process = snmp, PC = 8b77990, Call stack =
2013-05-02T11:27:18.100311+02:00 XXX <164>May 02 2013 11:27:18: %ASA-4-711004: Task ran for 152 msec, Process = snmp, PC = 8b77990, Call stack = 0x08B77990 0x08B76699 0x08B734A9 0x08B75F48 0x08B50E0E 0x08B4F87C 0x08063B63
2013-05-02T11:27:23.113341+02:00 XXX <164>May 02 2013 11:27:23: %ASA-4-711004: Task ran for 150 msec, Process = snmp, PC = 8b7772e, Call stack =
2013-05-02T11:27:23.113417+02:00 XXX <164>May 02 2013 11:27:23: %ASA-4-711004: Task ran for 150 msec, Process = snmp, PC = 8b7772e, Call stack = 0x08B7772E 0x08B7645B 0x08B734A9 0x08B75F48 0x08B50E0E 0x08B4F87C 0x08063B63
Message ASA-4-711004 indicates that a task is a "CPU hog", which can have serious consequences on traditional (pre-SMP) ASA boxes. When this happens, the nodes in the ASA cluster experience packet loss and may lose contact with each other on both the inside and the outside. Whether they lose contact seems to depend on the rest of the CPU load at that moment.
2013-05-02T11:27:28.147580+02:00 XXX <161>May 02 2013 11:27:28: %ASA-1-105005: (Secondary) Lost Failover communications with mate on interface YYY
2013-05-02T11:27:28.147628+02:00 XXX <161>May 02 2013 11:27:28: %ASA-1-105008: (Secondary) Testing Interface YYY
2013-05-02T11:27:28.349692+02:00 XXX <161>May 02 2013 11:27:28: %ASA-1-105009: (Secondary) Testing on interface YYY Passed
2013-05-02T11:27:33.263707+02:00 XXX <161>May 02 2013 11:27:33: %ASA-1-105005: (Secondary) Lost Failover communications with mate on interface outside
2013-05-02T11:27:33.416249+02:00 XXX <161>May 02 2013 11:27:33: %ASA-1-105008: (Secondary) Testing Interface outside
2013-05-02T11:27:33.726430+02:00 XXX <161>May 02 2013 11:27:33: %ASA-1-105009: (Secondary) Testing on interface outside Passed
Is this a known problem? It may be a coincidence, but I got the same problem when I tried some SNMP bulk transfers.
2: Cisco switches. Here we get an error message from every single switch every fifteen minutes. The message is somewhat obscure:
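To check whether `snmp` is the only process reported as a CPU hog, the 711004 messages can be tallied per process. This is a minimal sketch under stated assumptions: the regular expression is derived only from the log lines quoted above, not from Cisco's message specification, and the 100 msec threshold and the helper names `parse_cpu_hog` and `hogs_by_process` are hypothetical.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional

# Pattern inferred from the quoted lines (an assumption, not Cisco's spec), e.g.
# "%ASA-4-711004: Task ran for 150 msec, Process = snmp, PC = 8c4b7c8, Call stack = ..."
HOG_RE = re.compile(
    r"%ASA-4-711004: Task ran for (?P<msec>\d+) msec, "
    r"Process = (?P<process>[^,]+), PC = (?P<pc>[0-9a-fA-F]+)"
)


class CpuHog(NamedTuple):
    process: str
    msec: int
    pc: str


def parse_cpu_hog(line: str) -> Optional[CpuHog]:
    """Return the hog details if the line is an ASA-4-711004 message, else None."""
    m = HOG_RE.search(line)
    if m is None:
        return None
    return CpuHog(m["process"].strip(), int(m["msec"]), m["pc"].lower())


def hogs_by_process(lines: Iterable[str], min_msec: int = 100) -> Dict[str, int]:
    """Count 711004 messages per process whose task ran for at least min_msec.

    The 100 msec default is a hypothetical threshold chosen for illustration.
    """
    counts: Dict[str, int] = {}
    for line in lines:
        hog = parse_cpu_hog(line)
        if hog is not None and hog.msec >= min_msec:
            counts[hog.process] = counts.get(hog.process, 0) + 1
    return counts
```

Lines that are not 711004 messages (such as the failover messages) are simply skipped.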
"2013-05-01T00:04:36.711253+02:00 ZZZ <132>266532: 147851: May 1 00:04:36.638 MEST: %BIT-4-OUTOFRANGE: bit 0 is not in the expected range of 1 to 4094"
Regards,
-- Inge Arnesen
On Thu, 2 May 2013 12:42:03 +0000 Inge Bjørnvall Arnesen <inge@basefarm.no> wrote:
> Hi,
> Thanks for an excellent program!
You're very welcome, and thank you for your kind words, though I would like to remind you that English is the official language of this mailing list. I will respond accordingly.
> I have been test-running it here for about six months, but I am struggling with one serious and one less serious problem:
> 1: Cisco ASA. We use this series quite a lot, and in most cases NAV works well. Against boxes with 10-30 VLANs everything seems to work as it should. On boxes with many VLANs (128+), however, we get an error from the SNMP task every fifteen minutes:
This may correspond to the 15-minute interval of the ipdevpoll topo job, which collects CDP/LLDP and the forwarding tables of switches.
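One way to test this hypothesis is to collapse the 711004 messages into bursts and check whether consecutive bursts are about 15 minutes apart. A minimal sketch, assuming the relayed syslog lines start with an RFC 3339 timestamp as in the excerpt above; the one-minute burst gap and tolerance, and all function names, are hypothetical choices.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

TOPO_INTERVAL = timedelta(minutes=15)  # topo job interval, as stated in the reply


def parse_syslog_time(line: str) -> datetime:
    """Parse the RFC 3339 timestamp that starts each relayed syslog line."""
    return datetime.fromisoformat(line.split(" ", 1)[0])


def burst_starts(timestamps: Iterable[datetime],
                 gap: timedelta = timedelta(minutes=1)) -> List[datetime]:
    """Collapse timestamps into bursts and return the start of each burst.

    Messages less than `gap` after the previous one belong to the same burst
    (assumption: one poll run yields a cluster of messages within seconds).
    """
    starts: List[datetime] = []
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous >= gap:
            starts.append(ts)
        previous = ts
    return starts


def matches_interval(timestamps: Iterable[datetime],
                     interval: timedelta = TOPO_INTERVAL,
                     tolerance: timedelta = timedelta(minutes=1)) -> bool:
    """True if there are at least two bursts and all are roughly `interval` apart."""
    starts = burst_starts(timestamps)
    if len(starts) < 2:
        return False
    return all(abs((b - a) - interval) <= tolerance
               for a, b in zip(starts, starts[1:]))
```

Fed with the timestamps of all 711004 lines from a day of syslog, a `True` result would support, but not prove, a link to the topo job.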
> Is this a known problem? It may be a coincidence, but I got the same problem when I tried some SNMP bulk transfers.
I don't have any direct experience with Cisco ASA devices, but I do know that some devices have problems breathing when SNMP bulk requests are used. On Cisco, there is also the issue that we need to use multiple SNMP sessions due to the way Cisco handles VLAN information using multiple VLAN-indexed communities.
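Cisco's community string indexing, where the VLAN ID is appended to the community as `community@vlan`, is why each VLAN needs its own session. The hypothetical helper below only illustrates how the number of communities (and thus sessions) grows linearly with the VLAN count; it is not NAV's code.

```python
from typing import Dict, Iterable


def vlan_communities(community: str, vlans: Iterable[int]) -> Dict[int, str]:
    """Build one Cisco-style indexed community ("community@vlan") per distinct VLAN.

    Hypothetical helper: each entry corresponds to a separate SNMP session
    against the per-VLAN view of the device.
    """
    return {vlan: f"{community}@{vlan}" for vlan in sorted(set(vlans))}
```

A box with 128 VLANs thus means 128 indexed communities, each queried separately, compared with a handful for a box with 10-30 VLANs.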
ipdevpoll uses a default max-repetitions value of 50 for bulk requests, which is known to cause problems with some devices. You might want to try to reduce this to a much lower number in `ipdevpoll.conf`. We've reduced this to 10 for many of our customers; for one of them we've even had to reduce it further.
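The trade-off behind lowering max-repetitions can be made concrete with GetBulk arithmetic. Per RFC 3416, one GetBulk response carries at most N + M*R varbinds (N non-repeaters, M max-repetitions, R repeating variables), so a smaller M means less work for the agent per request but more round trips. A sketch, assuming the agent returns every requested repetition and the walker stops at the first OID past the end of the column; the function names are hypothetical.

```python
def max_varbinds_per_response(max_repetitions: int, repeating: int,
                              non_repeaters: int = 0) -> int:
    """Upper bound on varbinds in one GetBulk response (RFC 3416): N + M*R."""
    return non_repeaters + max_repetitions * repeating


def bulk_requests_for_walk(rows: int, max_repetitions: int) -> int:
    """GetBulk requests needed to walk one table column of `rows` rows.

    Assumes the agent returns all requested repetitions. The walk ends at the
    first response containing an OID past the column, so a column whose length
    is an exact multiple of max-repetitions costs one extra request.
    """
    if max_repetitions < 1:
        raise ValueError("max-repetitions must be at least 1")
    return rows // max_repetitions + 1
```

For a hypothetical 2000-entry forwarding table walked as a single column, `bulk_requests_for_walk(2000, 50)` gives 41 requests of up to 50 varbinds each, while `bulk_requests_for_walk(2000, 10)` gives 201 smaller requests. On Cisco, this cost repeats for every VLAN-indexed session.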
We're considering extending the config file to allow changing the SNMP parameters for individual devices, so you don't need to run with a low max-repetitions for all your devices just because of one or two problem devices.
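A per-device override could be as simple as a lookup with a global fallback. This is a hypothetical sketch of the idea, not an existing ipdevpoll option; `max_repetitions_for` and the override mapping are invented for illustration, and the default of 50 is the value mentioned above.

```python
from typing import Mapping

DEFAULT_MAX_REPETITIONS = 50  # ipdevpoll's default, as mentioned in the reply


def max_repetitions_for(device: str, overrides: Mapping[str, int],
                        default: int = DEFAULT_MAX_REPETITIONS) -> int:
    """Return the device's max-repetitions override, else the global default."""
    value = overrides.get(device, default)
    if value < 1:
        raise ValueError(f"invalid max-repetitions {value} for {device}")
    return value
```

With this design only the problem devices get a low value, while all other devices keep the larger, faster default.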
> 2: Cisco switches. Here we get an error message from every single switch every fifteen minutes. The message is somewhat obscure:
> "2013-05-01T00:04:36.711253+02:00 ZZZ <132>266532: 147851: May 1 00:04:36.638 MEST: %BIT-4-OUTOFRANGE: bit 0 is not in the expected range of 1 to 4094"
I have no idea what this means. A bit has the entire range of 0 to 1, so 4094 is out of the question :)
The numbers themselves make it look like a VLAN related issue, but it still isn't very informative as to what's wrong. I'd ask Cisco.
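Since 1 to 4094 is exactly the range of usable IEEE 802.1Q VLAN IDs (0 and 4095 are reserved), one speculative reading is that some VLAN bitmap had the bit for VLAN 0 set. The sketch below decodes a bitmap laid out MSB-first with one bit per VLAN starting at VLAN 0 (the layout used by, for example, `vlanTrunkPortVlansEnabled` in CISCO-VTP-MIB) and reports set bits outside the valid range. It is purely illustrative; nothing in the thread confirms that a bitmap is involved, and the function names are hypothetical.

```python
from typing import List

VALID_VLANS = range(1, 4095)  # IEEE 802.1Q: VLAN IDs 0 and 4095 are reserved


def vlans_from_bitmap(bitmap: bytes) -> List[int]:
    """Decode a bitmap where bit n (MSB-first within each octet) stands for VLAN n."""
    vlans: List[int] = []
    for octet_index, octet in enumerate(bitmap):
        for bit in range(8):
            if octet & (0x80 >> bit):
                vlans.append(octet_index * 8 + bit)
    return vlans


def out_of_range_vlans(bitmap: bytes) -> List[int]:
    """Return set bits that do not correspond to a usable 802.1Q VLAN ID."""
    return [vlan for vlan in vlans_from_bitmap(bitmap) if vlan not in VALID_VLANS]
```

A bitmap whose first octet is `0b11000000` marks VLANs 0 and 1, and only VLAN 0 would be reported, which matches the shape of the "bit 0" message.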