Hi,
my graphs are getting white spaces and I'm trying to tune ipdevpolld:
Running with no tuning at all, about 80% of the graph is white on every port/device.
If I restart ipdevpolld manually with -m, most of my graphs are OK; occasionally I get white spaces, let's say around 1-2% of the last day.
First of all, is there a way to run ipdevpolld in multiprocess mode without starting the process manually? (I would expect something in /etc/default/nav.)
Besides running multiprocess, what else is useful? I've tried increasing max-repetitions without much luck. I do have some timeout issues on satellite links, so I've increased the timeout to 4.5.
I initially thought this was a graphite problem, but I've heavily tuned my graphite host and there seem to be no problems there: I'm running carbon-relay and four carbon-cache instances, and from what I can gather they are not hit hard at all.
I have about 46k ports and 1500 devices, one dedicated server for graphite, one for postgresql and one for gui, ipdevpolld and other nav processes.
Any hints?
Cheers, -Sigurd
On Thu, 25 Sep 2014 15:11:15 +0200 Sigurd Mytting <sigurd@mytting.no> wrote:
> my graphs are getting white spaces and I'm trying to tune ipdevpolld:
> Running with no tuning at all, about 80% of the graph is white on every port/device.
> If I restart ipdevpolld manually with -m, most of my graphs are OK; occasionally I get white spaces, let's say around 1-2% of the last day.
> First of all, is there a way to run ipdevpolld in multiprocess mode without starting the process manually? (I would expect something in /etc/default/nav.)
The multiprocess mode has never been officially taken out of "experimental" status, so there is no "normal" way to configure it as the default.
Usually, we have made it the default mode on individual installations by modifying the startup arguments in `/etc/nav/init.d/ipdevpoll`.
> Besides running multiprocess, what else is useful? I've tried increasing max-repetitions without much luck. I do have some timeout issues on satellite links, so I've increased the timeout to 4.5.
Please don't increase max-repetitions; that will likely only exacerbate your problems! Poorly implemented SNMP agents on some devices have a hard time assembling large bulk responses, and will respond very slowly to bulk requests with high max-repetitions values. You would probably do better to decrease it to a value of 10 or less.
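To make the trade-off concrete, here is a rough sketch of how max-repetitions affects the number of GETBULK round trips needed to walk a single table column. It is a simplified model, not ipdevpoll's actual code: `bulk_requests` is a hypothetical helper, and it assumes one varbind per request and a walker that stops once it sees a row past the end of the column. The per-device port count is just the average of the figures in this thread.

```python
import math


def bulk_requests(rows: int, max_repetitions: int) -> int:
    """Estimate GETBULK round trips needed to walk one column of `rows` entries.

    The walker must see one varbind past the end of the column to know
    the walk is finished, hence rows + 1.
    """
    if rows < 0 or max_repetitions < 1:
        raise ValueError("rows must be >= 0 and max_repetitions >= 1")
    return math.ceil((rows + 1) / max_repetitions)


# Average from the thread: about 46k ports spread over 1500 devices.
ports_per_device = round(46_000 / 1_500)

for reps in (50, 10, 5):
    # Fewer repetitions means more round trips, but each response is
    # smaller and cheaper for a slow agent to assemble.
    print(f"max-repetitions={reps}: {bulk_requests(ports_per_device, reps)} request(s) per column")
```

Under this model, a lower value costs a few extra round trips per column while capping how many rows a struggling agent has to assemble for any single response.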
> I have about 46k ports and 1500 devices, one dedicated server for graphite, one for postgresql and one for gui, ipdevpolld and other nav processes.
> Any hints?
I would use multiprocess mode (or possibly limit max_concurrent_jobs in `ipdevpoll.conf`), combined with a lower max-repetitions value.
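As a rough sketch, the settings discussed above might look something like this in `ipdevpoll.conf`. The section names, the `timeout` option name and the `max_concurrent_jobs` value are assumptions for illustration only; check the comments in the `ipdevpoll.conf` shipped with your NAV version for the actual names and defaults.

```ini
# Sketch only: section names, the "timeout" option name and the
# max_concurrent_jobs value below are assumptions, not verified defaults.

[ipdevpoll]
# Hypothetical cap on the number of simultaneously running jobs
max_concurrent_jobs = 200

[snmp]
# Raised because of the slow satellite links mentioned in the question
timeout = 4.5
# Lowered as recommended in the reply (10 or less)
max-repetitions = 10
```

ipdevpolld would need to be restarted after the change for the new values to take effect.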