On Fri, Dec 10, 2010 at 12:17:34AM +0100, Sigurd Mytting wrote:
> Second; Anyone have a good tip on how to tune Cricket to run a sensible number of jobs to get thru it all in a sensible amount of time? Currently, if the largest job (switch interfaces) doesn't die after only a few minutes it usually runs for 25-30 minutes.
> Got a tip of running 50ish devices per Cricket-collector, this seems to both stop the collector from dying and lets Cricket finish before its next run.
Glad to see your problem was solved. I'll still add a few comments.
I've no experience with Cricket consistently crashing and leaving its lock files behind. I do have experience with Cricket going bananas and eating all available RAM and running forever until its config tree is recompiled.
If you set Cricket to log debug info, can you glean from the logs why it crashes?
Also, when NTNU originally added the Cricket integration to NAV, they were completely unable to collect traffic statistics from all their access ports - there were just too many of them to complete rounds in anything remotely close to five minutes.
They decided not to collect stats for access ports, and that is how the EDGE category was born. An EDGE device is the same as an SW device, except that no Cricket configuration is generated for switch-ports on EDGE devices.
I do remember talk of someone who wrote a program to automatically split the Cricket config tree into manageable chunks so that multiple collectors could run in parallel and complete collection in a timely manner. There's also the issue of optimizing RRD writes, which can be horribly inefficient when attempting to scale up.
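
A minimal sketch of the splitting idea, assuming the devices are available as a plain list of names. This is not the program mentioned above, and it does not read or write Cricket's config tree; the function name split_into_chunks, the example host names, and the default chunk size of 50 (taken from the "50ish devices per collector" tip) are all illustrative assumptions.

```python
from typing import Iterable, List


def split_into_chunks(devices: Iterable[str], chunk_size: int = 50) -> List[List[str]]:
    """Split device names into groups of at most chunk_size.

    Each group is meant to be handed to its own collector process.
    The default of 50 follows the "50ish devices" tip; it is not a
    Cricket setting.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    # Sort so that the same device lands in the same chunk on every run,
    # as long as the device list itself does not change.
    ordered = sorted(devices)
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]


if __name__ == "__main__":
    # Hypothetical device names, for illustration only.
    devices = [f"switch-{n:03d}.example.org" for n in range(1, 121)]
    for number, chunk in enumerate(split_into_chunks(devices), start=1):
        print(f"collector {number}: {len(chunk)} devices, {chunk[0]} .. {chunk[-1]}")
```

How each chunk is then turned into a separate subtree and handed to its own collector depends on the local setup and is left out here.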