On Tue, 26 Mar 2019 07:12:58 +0100 ken.livesey@subzero.com wrote:
Greetings,
I am looking for assistance in determining suitable system parameters for a NAV VM that will manage 800-1000 network devices across two data centers and 30 sites. Most of the network devices are Cisco, and all sites are connected with high-speed, low-latency links. Please let me know if any additional information would be helpful.
Hi Ken,
Once you get to that size, I would normally recommend splitting your NAV installation across two or three servers. I think this works equally well for VMs, as you will have the flexibility of provisioning the resources where they are needed.
Some thoughts:
1. Split lines: The NAV, PostgreSQL and Graphite components are easily split into separate servers. If you are already up and running on one server, the simplest and most effective split is to move PostgreSQL to a dedicated server.
2. All our new servers are provisioned with 32GB of RAM these days, but they are normally intended to run all the components on a single server. If splitting, 16GB ought to be more than enough per server. You could even get away with less. If separating Graphite to a server of its own, it would probably need the least amount of RAM.
3. Storage I/O performance: It becomes crucial to have very high I/O throughput, especially for Graphite, which is really write-heavy. We _always_ deploy Graphite storage on SSD. It's not always necessary, but with the number of nodes you'll be monitoring, it will be.
4. Storage space: Compared to the smallest drives we can buy today, NAV uses very little storage. Some numbers: An installation having monitored ~500 nodes for several years now consumes approx. 47G for Graphite storage, and 40G for PostgreSQL storage. The same numbers for an installation monitoring ~800 nodes: 67G for Graphite, and 152G for PostgreSQL. Graphite's usage normally stays fixed over time for a given set of metrics, and grows linearly with the number of metrics. PostgreSQL grows over time, as events are logged, but normally you would run navclean regularly to throw out old ARP/CAM records (often depending on privacy laws in your locale), which take up most of the space.
5. Assuming you've provisioned NAV with multiple cores, configure ipdevpoll to run in multiprocess mode - one process per core: https://nav.uninett.no/doc/latest/reference/ipdevpoll.html#multiprocess-mode
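
To put the Graphite figures from point 4 into perspective for 800-1000 devices, here is a rough back-of-the-envelope sketch in Python. It only scales the two installations quoted above by node count, and it assumes your devices produce a comparable number of metrics per node, which may not hold. The function names are made up for this illustration and are not part of NAV. PostgreSQL is deliberately left out, since its size depends mostly on how long you retain ARP/CAM records.

```python
# Graphite storage observed in the two installations mentioned above:
# number of monitored nodes -> Graphite storage in GB.
OBSERVED = {500: 47.0, 800: 67.0}


def per_node_gb(observed):
    """Return the Graphite storage per node (GB) for each installation."""
    return {nodes: gb / nodes for nodes, gb in observed.items()}


def estimate_range_gb(target_nodes, observed=OBSERVED):
    """Scale the lowest and highest observed per-node usage to target_nodes.

    Assumes storage grows linearly with the number of nodes, i.e. that the
    number of metrics per node is similar to the observed installations.
    """
    rates = list(per_node_gb(observed).values())
    return target_nodes * min(rates), target_nodes * max(rates)


if __name__ == "__main__":
    for nodes in (800, 1000):
        low, high = estimate_range_gb(nodes)
        print(f"{nodes} nodes: roughly {low:.0f}-{high:.0f} GB of Graphite storage")
```

Either way, the result is small compared to the drives you can buy today, so I/O throughput (point 3) is the thing to size for, not capacity.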