On Fri, 24 Feb 2017 08:15:50 +0100 Ingeborg Hellemo ingeborg.hellemo@uit.no wrote:
> I hope optimization of ipdevpoll is on the list of tasks.
Hi Ingeborg,
may I remind you that the preferred language on our mailing lists is still English? :-)
Some work on ipdevpoll is definitely on the list; it's even 3rd on the nav-ref work list (which YOU voted on):
https://nav.uninett.no/wiki/nav-ref:nav-ref-arbeidsliste
Specifically, reworking the multiprocess model is registered here:
https://github.com/UNINETT/nav/issues/1174
And, in case you haven't seen it, a pull request for this has already been accepted (#1422). Sigmund gladly took on the task while recently learning his way around the NAV code.
So, in the 4.7 release, you can specify the number of worker processes, and jobs will be assigned in a round-robin fashion to free workers.
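For illustration only, here is a minimal sketch of that dispatch model, using Python's multiprocessing module (this is not NAV's actual code; the job and device names are made up):

    from multiprocessing import JoinableQueue, Process

    def worker(jobs):
        """Pull and run jobs for as long as any are queued."""
        while True:
            job = jobs.get()
            if job is None:          # sentinel: no more jobs for this worker
                jobs.task_done()
                break
            job_name, device = job
            print("running job %s for device %s" % (job_name, device))
            jobs.task_done()

    def main(num_workers=8):
        jobs = JoinableQueue()
        workers = [Process(target=worker, args=(jobs,))
                   for _ in range(num_workers)]
        for proc in workers:
            proc.start()
        # Queued jobs are handed to whichever worker is free next
        for device in ("gw1.example.org", "sw1.example.org"):
            jobs.put(("1minstats", device))
        for _ in workers:
            jobs.put(None)           # one stop sentinel per worker
        jobs.join()

    if __name__ == "__main__":
        main()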
Things like killing workers after a set number of jobs, or spawning remote workers, are not implemented yet.
> Today we run 'ipdevpoll -m', i.e. 8 parallel jobs that each have 10 open connections to the database.
[snip]
> This time last year I had a problem with individual queries against boxes taking too long (more than a minute). We solved this by tuning max_concurrent_jobs down to 20, since a (much) larger number left individual jobs hanging for want of a database connection.
> What I saw the beginning of back then, but which has become more and more pronounced over the course of the year, is that 1minstats in particular can't spin fast enough. Grep(1) on the log shows that we manage to push through between 170 and 200 jobs per minute. When we have 645 devices (GW, SW, GSW), it goes without saying that this doesn't work. We get graphs full of holes.
If you are unable to scale 1minstats through other means (including the new, upcoming multiprocess model), you might want to consider whether you need those stats every minute. I do believe NTNU have moved several of the plugins from the 1minstats job to the 5minstats job (though this requires a schema resize of the corresponding Whisper files that store these stats; see the sketch below).
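If you go down that route, the resize itself can be scripted. A rough sketch, assuming a standard Graphite/Whisper install with the whisper-resize.py tool on the path; the storage path and retention scheme below are made-up examples, so match them to your own storage-schemas.conf:

    import os
    import subprocess

    WHISPER_ROOT = "/opt/graphite/storage/whisper/nav"  # example path
    NEW_RETENTIONS = ["5m:7d", "30m:120d", "1d:3y"]     # example scheme

    for dirpath, dirnames, filenames in os.walk(WHISPER_ROOT):
        for name in filenames:
            if name.endswith(".wsp"):
                path = os.path.join(dirpath, name)
                # Rewrites the archive layout in place, leaving a .bak
                # backup file next to the original
                subprocess.check_call(["whisper-resize.py", path] +
                                      NEW_RETENTIONS)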
> What can be done? I don't think the challenge lies in CPU/memory, but in the implementation itself.
It is quite difficult to get useful metrics on how much time is spent waiting for free DB connections, waiting for actual PostgreSQL responses, waiting for SNMP responses, or simply running Python code.
Some rudimentary metrics can be had. If you set DEBUG level logging for `nav.ipdevpoll.jobs.jobhandler_timings`, each job will log a summary of time spent in each plugin, and how much was spent in overhead (i.e. updating the database at the end of the job). There is, however, no separate metric for how much time was spent talking to the DB inside the plugins (normally, all the DB access is before or after the job, not inside plugins).
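For reference, enabling that logger would look something like this in NAV's logging.conf (assuming the usual [levels] section; double-check against the file in your own install):

    [levels]
    nav.ipdevpoll.jobs.jobhandler_timings = DEBUG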
There are also issues like this one:
https://github.com/UNINETT/nav/issues/1403
Some SQL statements may take longer to complete because of locking in the database, and this can get worse as the number of parallel connections increases.
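One way to catch this in the act is to ask PostgreSQL which backends are waiting on locks while ipdevpoll is running. A quick sketch using psycopg2 (the DSN is a placeholder, and the query needs PostgreSQL 9.2 or newer for the pg_stat_activity.query column):

    import psycopg2

    conn = psycopg2.connect("dbname=nav user=nav")  # placeholder DSN
    cur = conn.cursor()
    # List backends waiting on an ungranted lock, and what they are running
    cur.execute("""
        SELECT a.pid, a.query
        FROM pg_locks l
        JOIN pg_stat_activity a ON a.pid = l.pid
        WHERE NOT l.granted
    """)
    for pid, query in cur.fetchall():
        print("pid %d is blocked on: %s" % (pid, query))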
> If the solution is to increase the number of connections to the database, it must also be made possible to tune this per job. The smaller jobs, like topo, dns and ip2mac, don't strictly need to run as separate jobs, and certainly don't need to tie up lots of database resources.
Tuning this per job becomes a moot point in 4.7, so it will not be an issue by then.
It would be interesting, however, to be able to log more of the potential metrics mentioned above, such as the time spent just waiting for free DB threads. Those kinds of numbers might help you decide on tuning parameters.
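As an illustration of how such a number could be collected (a sketch using Twisted's adbapi pool, which is not necessarily how ipdevpoll talks to the database; the DSN and pool sizes are placeholders): timestamp the query when it is queued, and again when it actually gets a pool thread:

    import time

    from twisted.enterprise import adbapi
    from twisted.internet import reactor

    pool = adbapi.ConnectionPool("psycopg2", "dbname=nav user=nav",
                                 cp_min=1, cp_max=10)

    def timed_interaction(txn, queued_at, sql):
        # Time between queueing and actually getting a connection thread
        wait = time.time() - queued_at
        print("waited %.3f s for a free connection" % wait)
        txn.execute(sql)
        return txn.fetchall()

    d = pool.runInteraction(timed_interaction, time.time(), "SELECT 1")
    d.addCallback(lambda rows: print(rows))
    d.addBoth(lambda _: reactor.stop())
    reactor.run()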