Hi,
On 19.06.13 12:54, "Morten Brekkevold" <morten.brekkevold@uninett.no> wrote:

> On Tue, 18 Jun 2013 10:25:25 +0000 Mischa Diehm <mischa.diehm@unibas.ch> wrote:

>> Hi,

> Hi Mischa!

>> first of all welcome to the list!

> Uh, thank you? Welcome yourself :)

>> A problem we have had for a while is that our system is permanently under a lot of load (mostly CPU-bound) and we haven't really found a way to reduce the pressure. The hardware we use is an HP blade (ProLiant BL460c G6) with:

>> At the moment we have 1460 active devices (mainly Cisco switches). Around 30 or 40 are OVERDUE in: https://urz-nav/report/lastupdated
> Are all jobs overdue for these devices, or just some of the jobs?

As far as I can see it's mainly inventory and topo jobs which are overdue.

> Does NAV consider the devices to be reachable and responding to SNMP requests? Does the ipdevpoll log indicate that the jobs are failing due to errors, or just that they are delayed or time out?

Devices are marked up and snmp_status = ok. I don't understand what you mean by the last sentence.
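If you mean looking for error messages in the log, I guess something like this would show it (the log path is an assumption on my part; adjust to wherever your NAV install writes its logs):

  # Look for errors/timeouts in the ipdevpoll log -- path is assumed, not verified
  grep -iE 'error|fail|timeout' /var/log/nav/ipdevpoll.log | tail -n 50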
>> So my question is: do you have any good experience with hardware systems that actually deal with this number of devices, or is there any tuning possibility (without losing functionality) we could try to reduce the pressure on the system?
> At the moment, the closest I have access to is a system monitoring 882 devices, but it still isn't in full production mode (meaning, they still have more devices to add). The load number of the system varies wildly with which collection jobs are running at any given moment. They might be seen as high numbers, but the system has 4 cores (with hyperthreading enabled), so the load average is mostly less than the number of cores.
Yes indeed. The load average on our machine is OK, but there are recurring peaks with very high load.
> This system is an HP DL360p Gen8 server, with 12GB RAM and 4x600GB SAS 10K drives mounted in a hardware RAID 1+0 configuration.

> We will very soon be migrating PostgreSQL off this server and onto a dedicated server with identical specifications, specifically to alleviate some of the load issues we are experiencing.
OK. So we will think about doing the same thing if that is what it takes to get back to more sane load levels.
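If I understand correctly, pointing NAV at a remote database would then mostly be a matter of editing db.conf, roughly like this (a sketch only; the hostname is made up and the option names are from memory, so please check against your own db.conf):

  # /etc/nav/db.conf (sketch; hostname is an example)
  dbhost=pgsql.example.org
  dbport=5432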
> ipdevpoll is currently running in its "experimental" multiprocess mode on this system, which means each of the configured jobs in `ipdevpoll.conf` gets its own dedicated process (which improves things on multicore systems). This can be achieved on a more permanent basis by adding the "-m" switch to the ipdevpoll command in the `/etc/nav/init.d/ipdevpoll` script.
Adding the -m switch doesn't seem so easy here. The problem is the way daemon() is written and how su - xxx -c $CMD is invoked. How do I add the -m switch so that it is actually used (root uses bash on this system)?

Debug output when starting via the init script:

  Starting ipdevpoll:
  + daemon 'su - navcron -c /usr/lib/nav/ipdevpolld -m'
  + su - navcron -c /usr/lib/nav/ipdevpolld -m

This way the -m never reaches ipdevpolld... I couldn't find a way to integrate -m without changing the NAV code.
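Looking at the trace, my understanding is that -m ends up as an extra argument to su instead of being part of the -c command string, so ipdevpolld never sees it. For it to take effect, the whole command line would presumably have to reach su as a single quoted -c argument, something like this (just a sketch of the target invocation, not of how to coax daemon() into producing it):

  # Target invocation (sketch): ipdevpolld and its -m flag passed as one -c string
  su - navcron -c '/usr/lib/nav/ipdevpolld -m'

Whether the init script's daemon() helper can be made to preserve that inner quoting without touching the code is exactly what I can't figure out.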
> We will be using this system for testing performance optimizations to ipdevpoll once we migrate PostgreSQL to a dedicated server. I can post our findings here once we get there, but that probably won't be until August, as I'll be offline most of July.
That would still be very much appreciated.
Cheers, Mischa
> --
> Morten Brekkevold
> UNINETT