Hi all,
I've been trying NAV out for the third time now, as I am experimenting with several similar tools. First of all, I really like NAV, and of all the tools I have tried it is the one that shows the most potential for network monitoring.
BUT (there is always a but in these emails), I am having trouble getting thresholds and their respective alerts to work. I have kept my installation as simple as possible, so at the moment I only have one device and I have only set up one single threshold, purposely chosen so that it is exceeded immediately. But after I do that, there is no event, and it is as though the threshold has not been exceeded.
When I look into the thresholdmon.log this is what I see repeatedly:
2016-09-21 14:40:01,992 [INFO nav.thresholdmon] evaluating 1 rules
2016-09-21 14:40:02,513 [WARNING nav.thresholdmon] did not find any matching values for rule u'nav.devices.XXXXXXXXX.ports.Bridge-Aggregation1.ifInOctets' <50%
2016-09-21 14:40:02,513 [INFO nav.thresholdmon] done
What does this mean? This interface has traffic and that traffic is being graphed without issues...
So my question is: what am I missing? This is supposed to be very simple to configure, but apparently it does not work. Am I making some basic error?
Thank you for your attention.
Best regards,
Óscar
On Wed, 21 Sep 2016 13:49:37 +0000 "Patricio, Oscar (Coriant - PT/Lisbon)" oscar.patricio@coriant.com wrote:
Thanks for checking out NAV, Óscar :)
This isn't necessarily simple for interface counters, as there are several timing issues involved.
My guess is that you have not specified a period value for your threshold rule, which means it defaults to a value of 5 minutes.
Interface counters are only collected every 5 minutes (as opposed to sensor values, which are collected every 60 seconds), and at least two such data points are required to calculate a traffic rate, which means a minimum period of 10 minutes to get an actual value.
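To make the arithmetic concrete, here is a minimal Python sketch. It is not NAV's or Graphite's actual code: the function name `rates_in_window`, the half-open window `(now - period, now]`, and the counter values are all assumptions chosen for illustration. It shows that a 5 minute window holds only one 5-minute data point, so no rate can be derived, while a 10 minute window holds two.

```python
from typing import List, Tuple

# Hypothetical sample type: (timestamp in seconds, octet counter value).
Sample = Tuple[int, int]


def rates_in_window(samples: List[Sample], now: int, period: int) -> List[float]:
    """Return octets/second rates from consecutive samples in (now - period, now].

    Simplified model (an assumption, not NAV's implementation): a rate
    needs two counter readings, so N points in the window give N - 1 rates.
    """
    window = [(t, v) for t, v in samples if now - period < t <= now]
    rates = []
    for (t0, v0), (t1, v1) in zip(window, window[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates


# Made-up counter readings, collected every 300 seconds.
samples = [(0, 1_000_000), (300, 4_000_000), (600, 7_000_000)]

# 5 minute period: only the point at t=600 is in the window, so no rate.
print(rates_in_window(samples, now=600, period=300))

# 10 minute period: points at t=300 and t=600 give exactly one rate.
print(rates_in_window(samples, now=600, period=600))
```

With an empty result there is nothing to compare against the threshold, which in this simplified model corresponds to a rule that never fires.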
Then there is another complication: Data points aren't necessarily inserted at exact 5-minute intervals. There may be delays of several seconds, meaning that if you are unlucky enough to take a reading just between the expected time of insertion and the actual time of insertion, there will be no data there yet - so you may fail to calculate a rate, even for the last 10-minute period. This is why the form hint for the "Period" field recommends using a 15m period for interface counters.
Incidentally, this last complication is probably also why you keep seeing the "did not find any matching values" message in the logs. Thresholdmon tries to get the counter values for the last 5 minutes, but that single data point hasn't been inserted yet - it is likely inserted just after thresholdmon checks (as thresholdmon is run at _exact_ 5-minute intervals by crond).
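The timing race can be sketched in a few lines of Python. This is a simplified model and not how NAV or its storage backend actually works: the names `visible_points` and `can_compute_rate`, the 5 second insertion delay, and the assumption that each point carries its scheduled timestamp but only becomes readable after the delay are all hypothetical.

```python
INTERVAL = 300  # collection interval for interface counters, in seconds


def visible_points(now, period, insert_delay, interval=INTERVAL):
    """Timestamps in (now - period, now] whose data has been inserted by `now`.

    Assumed model: a point scheduled for time t is stamped t, but is
    only readable from t + insert_delay onwards.
    """
    points = []
    t = 0
    while t <= now:
        inserted_at = t + insert_delay
        if now - period < t and inserted_at <= now:
            points.append(t)
        t += interval
    return points


def can_compute_rate(now, period, insert_delay):
    """A traffic rate needs at least two counter readings."""
    return len(visible_points(now, period, insert_delay)) >= 2


# The check runs at an exact 5-minute mark; the newest point arrives 5 s late.
now = 900
for minutes in (5, 10, 15):
    period = minutes * 60
    print(minutes,
          visible_points(now, period, insert_delay=5),
          can_compute_rate(now, period, insert_delay=5))
```

Under these assumptions the 5 minute window sees no points at all, the 10 minute window sees only one, and only the 15 minute window reliably contains the two readings needed for a rate.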