On Tue, 18 Jun 2013 10:25:25 +0000 Mischa Diehm mischa.diehm@unibas.ch wrote:
Hi,
Hi Mischa!
first of all welcome to the list!
Uh, thank you? Welcome yourself :)
A problem we have had for a while is that our system is permanently under heavy load (mostly CPU bound), and we haven't really found a way to reduce the pressure. The hardware we use is an HP blade (ProLiant BL460c G6) with:
At the moment we have 1460 active Devices (mainly Cisco Switches). Around 30 or 40 are OVERDUE in: https://urz-nav/report/lastupdated
Are all jobs overdue for these devices, or just some of them? Does NAV consider the devices to be reachable and responding to SNMP requests? Does `ipdevpoll.log` indicate that the jobs are failing due to errors, or just that they are delayed or timing out?
So my question is: do you have any good experience with hardware that is actually handling this number of devices, or is there any tuning (without losing functionality) we could try to reduce the pressure on the system?
At the moment, the closest I have access to is a system monitoring 882 devices, but it isn't in full production yet (meaning they still have more devices to add). The system's load average varies widely depending on which collection jobs are running at any given moment. The numbers might look high, but the system has 4 cores (with hyperthreading enabled), and the load average mostly stays below the number of cores.
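For what it's worth, here is a rough sketch of the comparison I mean: divide the load average by the number of CPUs the OS reports. The function names are just ones I made up for illustration, and note that `os.cpu_count()` counts logical CPUs, so with hyperthreading enabled it reports more than the number of physical cores.

```python
import os


def load_per_core(load_average, cores):
    """Return the load average divided by the number of CPUs."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


def describe(load_average, cores):
    """Say whether the load average is below the CPU count."""
    if load_per_core(load_average, cores) < 1.0:
        return "below core count"
    return "at or above core count"


if __name__ == "__main__":
    # os.getloadavg() is only available on Unix-like systems.
    averages = os.getloadavg()
    # os.cpu_count() returns logical CPUs and may return None.
    cores = os.cpu_count() or 1
    for label, value in zip(("1 min", "5 min", "15 min"), averages):
        print(f"{label}: {value:.2f} on {cores} CPUs -> {describe(value, cores)}")
```

A ratio that stays under 1.0 roughly means runnable processes are not queueing for CPU time, though on Linux the load average also counts processes blocked on disk I/O, so it isn't a pure CPU measure.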
This system is an HP DL360p Gen-8 server, with 12GB RAM and 4x600GB SAS 10K drives in a hardware RAID 1+0 configuration.
We will very soon be migrating PostgreSQL off this server and onto a dedicated server with identical specifications, specifically to alleviate some of the load issues we are experiencing.
ipdevpoll is currently running in its "experimental" multiprocess mode on this system, which means each of the jobs configured in `ipdevpoll.conf` gets its own dedicated process (which improves things on multicore systems). To make this permanent, add the "-m" switch to the ipdevpoll command in the `/etc/nav/init.d/ipdevpoll` script.
We will be using this system for testing performance optimizations to ipdevpoll once we migrate PostgreSQL to a dedicated server. I can post our findings here once we get there, but that probably won't be until August, as I'll be offline most of July.