On Mon, 24 Jun 2013 07:22:58 +0000 Mischa Diehm mischa.diehm@unibas.ch wrote:
Are all jobs overdue for these devices, or just some of the jobs? Does
As far as I can see it's mainly inventory and topo jobs which are overdue.
Those are the heaviest jobs, so that's not surprising, given the circumstances.
NAV consider the devices to be reachable and responding to SNMP requests? Does `ipdevpoll.conf` indicate that the jobs are failing due to errors, or just that they are delayed or time out?
Devices are marked up and snmp_status = ok. I don't understand what you mean by the last sentence.
I'm sorry, there's a typo there. I meant the log file `ipdevpoll.log`. Please check it for error messages: are jobs failing due to errors or timeouts, or are they just running for a very long time? If you send a SIGUSR1 signal to the ipdevpoll daemon process, it will log a list of currently running jobs and their current runtimes.
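For reference, sending that signal could look roughly like the sketch below. The helper name `send_usr1` is made up for illustration, and it assumes the daemon's PID is recorded in a PID file whose location you know; the path differs between installations, so it is passed in as an argument rather than hard-coded.

```shell
# Hypothetical helper (not part of NAV): send SIGUSR1 to the process whose
# PID is stored in the given PID file. The PID file location is an
# assumption; pass whatever path your installation actually uses.
send_usr1() {
    pidfile="$1"
    if [ ! -r "$pidfile" ]; then
        echo "cannot read PID file: $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    # kill -USR1 only delivers the signal; it does not stop the process
    # as long as the process installs a handler for it.
    kill -USR1 "$pid"
}
```

Call it as `send_usr1 <path to the ipdevpoll PID file>` and then look at the end of `ipdevpoll.log` for the list of running jobs.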
We will very soon be migrating PostgreSQL off this server and onto a dedicated server with identical specifications, specifically to alleviate some of the load issues we are experiencing.
ok. So we will be thinking about doing the same thing if that is what is necessary to return to more sane loads.
The first step to any scale-out operation involving a database is to give the database software dedicated hardware; it's also the fastest/easiest action to take (and as always: Adding more hardware is almost always cheaper than the man-hours required to optimize software).
ipdevpoll is currently running in its "experimental" multiprocess mode on this system, which means each of the configured jobs in `ipdevpoll.conf` gets its own dedicated process (which improves things on multicore systems). This can be achieved on a more permanent basis by adding the "-m" switch to the ipdevpoll command in the `/etc/nav/init.d/ipdevpoll` script.
Adding the -m switch does not seem so easy here. The problem is the way daemon() is written and how su - xxx -c $CMD works. How do I add the -m switch so that it is actually used?
You are correct, I checked up on what we did on this specific server, and we extracted the following separate bash function in `/etc/nav/init.d/ipdevpoll`:
runit() { su - ${user} -c "${IPDEVPOLLD} -m"; }
and replaced the daemon call under the start section with a call to "daemon runit". We'll be looking at different ways of implementing a multiprocess mode; whatever we come up with will probably use a config file option instead.
ipdevpoll once we migrate PostgreSQL to a dedicated server. I can post our findings here once we get there, but that probably won't be until August, as I'll be offline most of July.
That would still be very much appreciated.
Making a note of that, then :)