Hi list
In my search for the missing link, I'm looking for a clue as to where my interface graphs have disappeared to. I have installed NAV version 4.8.3 on Debian Stretch and am currently monitoring two HP switches, a ProCurve 2810 and a 2910.
The problem is on the "Ranked statistics" page: there is no interface traffic data in the graphs. Instead I get this message: "No data was found for the given timeframe. This may indicate a problem with the data collection and/or presentation. Try choosing a longer timeframe to see when the problem started."
However, I do get graphs if I choose "CPU highest average", so SNMP is apparently working fine. When I access Graphite directly on port 8000 and traverse the tree Metrics -> nav -> devices -> ip -> port -> ifInOctets, I can see a graph for that interface.
What link between NAV and Graphite am I missing?
Curiously, I tried the shrink-wrapped image in VirtualBox and had the opposite problem: monitoring another ProCurve 2910, I only got traffic graphs and no "CPU highest average" graphs.
Kind regards Søren Aurehøj - Fab:IT
On Thu, 1 Mar 2018 08:16:49 +0100 soren@fab-it.dk wrote:
> "No data was found for the given timeframe. This may indicate a problem with the data collection and/or presentation. Try choosing a longer timeframe to see when the problem started."
> But I do get graphs if I choose the "CPU highest average" - so SNMP apparently is working fine.
> When I access Graphite directly on port 8000 and traverse the tree Metrics-> nav-> devices-> ip-> port-> ifinOctets I am able to see a graph for that interface.
But are you able to browse the graphs for individual interfaces using NAV? How about the system/cpu graphs for individual devices?
From what you're saying, it sounds like your problem is only with the ranked statistics tool itself, not with data collection.
I'm curious whether this is the message Ranked statistics displays when the Graphite request times out before producing a response (I'm not sure).
Ranked statistics offloads all its work to graphite-web. While Graphite has an excellent query language for producing this kind of ranked data and/or graphs, it is sadly not very performant. If you have 10,000 interfaces, it needs to chew through the data of 10,000 whisper files, and since there is no indexing, that takes a long time. Sometimes it takes so long that something times out while waiting for a response from the graphite-web backend.
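The ranked-statistics workload described above can be sketched as a single Graphite render request. This is an illustration, not NAV's actual query: it assumes graphite-web on localhost:8000 (as in this thread), metric names of the form nav.devices.<ip>.ports.<port>.ifInOctets (matching the whisper path quoted in this thread), and Graphite's highestAverage() function. build_ranking_url() is a hypothetical helper.

```python
from urllib.parse import urlencode

# ASSUMPTION: graphite-web's render endpoint on port 8000, as used in this thread.
GRAPHITE = "http://localhost:8000/render"


def build_ranking_url(pattern, top_n, timeframe="-1d"):
    """Hypothetical helper: rank every series matching `pattern` by its average."""
    params = {
        # The wildcards expand to one whisper file per matching port; graphite-web
        # must read every one of them before it can rank anything.
        "target": f"highestAverage({pattern},{top_n})",
        "from": timeframe,
        "format": "json",
    }
    return f"{GRAPHITE}?{urlencode(params)}"


url = build_ranking_url("nav.devices.*.ports.*.ifInOctets", 10)
print(url)

# Fetching it with a client-side timeout shows where a slow backend would bite:
#   from urllib.request import urlopen
#   urlopen(url, timeout=30).read()
```

The cost grows with the number of files the pattern matches, which is why a large installation can hit a timeout while two switches should not.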
Hi Morten
Thank you for your reply.
On 1 Mar 2018, at 08:55, Morten Brekkevold morten.brekkevold@uninett.no wrote:
> On Thu, 1 Mar 2018 08:16:49 +0100 soren@fab-it.dk wrote:
>> "No data was found for the given timeframe. This may indicate a problem with the data collection and/or presentation. Try choosing a longer timeframe to see when the problem started."
>> But I do get graphs if I choose the "CPU highest average" - so SNMP apparently is working fine.
>> When I access Graphite directly on port 8000 and traverse the tree Metrics-> nav-> devices-> ip-> port-> ifinOctets I am able to see a graph for that interface.
> But are you able to browse the graphs for individual interfaces using NAV? How about the system/cpu graphs for individual devices?
When I access /ipdevinfo/x.x.x.x/#!ports, a.k.a. the "ports" tab, I can see the correct link state, VLAN, speed and last used. The next tab, /ipdevinfo/x.x.x.x/#!portmetrics, shows no graph data for any combination of date/packet type. Just to add to the fun, no matter what timeframe I choose, it only shows the last 24 hours. The same goes for the next tab, /ipdevinfo/x.x.x.x/#!sysmetrics.
> From what you're saying, it sounds like your problem is only with the ranked statistics tool itself, not with data collection.
It appears to me that NAV collects the data correctly but fails to display it. NAV can access the switches via SNMP and can display CPU load, although only for the last 24 hours. At /ipdevinfo/?query=x.x.x.x&submit=Search I can see the correct data for the switch in the device info tab, and "status" in the right corner is all green, except ip2mac, which is yellow and, according to the mouse-over, overdue.
> I'm curious as to whether this is the message displayed by Ranked statistics if the Graphite request times out before producing a response (I'm not sure).
It might be. I started Apache with no virtual host listening on port 8000, and NAV responded with the expected "internal error" page. But of course this blocks all communication to Graphite, so it's probably not a valid test.
> Ranked statistics offloads all its work to graphite-web, but while graphite has an excellent query language for producing this kind of ranked data and/or graphs, it is sadly not very performant. If you have 10.000 interfaces, it needs to chew through the data of 10.000 whisper files, and since there is no indexing, it's going to take a long time. Sometimes, so much that something will eventually time out while waiting for a response from the graphite-web backend.
I'm testing on two switches; I only added the second one to rule out an incompatibility after the graphs went missing. I have a total of 107 ports combined on two 24-port switches.
Kind regards from Copenhagen /Søren Aurehøj - Fab:IT
On Thu, 1 Mar 2018 09:54:53 +0100 Søren Aurehøj soren@fab-it.dk wrote:
> When I access /ipdevinfo/x.x.x.x/#!ports aka the “ports” tab I am able to see the correct link state, vlan, speed and last used. The next tab /ipdevinfo/x.x.x.x/#!portmetrics shows no graph data in any combination of date/packet type. Just to add to the fun - no matter what timeframe I choose, it only shows the last 24 hours. That also goes for the next tab /ipdevinfo/x.x.x.x/#!sysmetrics
Aha, this sounds like a classic case of not having configured carbon-cache at all. With carbon-cache's default storage schema, every metric is kept for only 24 hours, and all metrics are expected to arrive at 1-minute intervals. Anything arriving at 5-minute intervals will therefore appear gappy or be missing completely.
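A small simulation makes the "gappy or completely missing" effect concrete. It is a sketch under stated assumptions: a 24-hour archive at 60 seconds per point (the default described above), one sample every 300 seconds, and a renderer that only draws a line segment between two adjacent non-null points (graphite-web's default "slope" line mode behaves this way, as far as we know). The helper names are hypothetical.

```python
# ASSUMPTIONS (illustrative only): default schema of 60 s points kept for 24 h,
# and a collector sending one sample every 300 s.
SECONDS_PER_POINT = 60
RETENTION = 24 * 60 * 60
SAMPLE_INTERVAL = 300


def fill_archive(seconds_per_point, retention, sample_interval):
    """Return one slot per archive point: a value where a sample landed, else None."""
    slots = [None] * (retention // seconds_per_point)
    for t in range(0, retention, sample_interval):
        slots[t // seconds_per_point] = 1.0  # any non-null value will do
    return slots


def drawable_segments(slots):
    """Count adjacent pairs where both points are non-null, i.e. a line can be drawn."""
    return sum(1 for a, b in zip(slots, slots[1:]) if a is not None and b is not None)


slots = fill_archive(SECONDS_PER_POINT, RETENTION, SAMPLE_INTERVAL)
filled = sum(s is not None for s in slots)
print(f"{filled} of {len(slots)} slots filled")              # 288 of 1440 slots filled
print(f"{drawable_segments(slots)} drawable line segments")  # 0 drawable line segments

# With an archive matching the 300 s sample interval, every slot is filled:
slots_ok = fill_archive(300, RETENTION, SAMPLE_INTERVAL)
print(f"{drawable_segments(slots_ok)} drawable line segments")  # 287 drawable line segments
```

Only every fifth slot receives a value, so no two neighbouring points are ever both present and nothing gets connected.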
Please double-check your setup against [1] and [2].
[1] https://nav.uninett.no/doc/4.8/intro/install.html#integrating-graphite-with-...
[2] https://nav.uninett.no/doc/4.8/faq/graph_gaps.html#whisper-files-have-the-wr...
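When double-checking, it helps to know that carbon scans storage-schemas.conf top to bottom and the first section whose pattern matches the metric name wins; the resulting layout is fixed when the whisper file is created. The sketch below reproduces that lookup with Python's configparser. The [default_1min_for_1day] section mirrors carbon's stock catch-all rule. [nav_placeholder] is hypothetical: only its first archive (5m:7d) is taken from the whisper-info output in this thread, and it is not NAV's shipped schema. matching_rule() is a hypothetical helper.

```python
import configparser
import re

# Mirrors carbon's stock catch-all rule: 1-minute points kept for 1 day.
DEFAULT_ONLY = """
[default_1min_for_1day]
pattern = .*
retentions = 60s:1d
"""

# HYPOTHETICAL nav rule placed before the catch-all. Only its first archive
# (5m:7d) is taken from the whisper-info output in this thread.
WITH_NAV_RULE = """
[nav_placeholder]
pattern = ^nav\\.
retentions = 5m:7d

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d
"""


def matching_rule(schemas_text, metric):
    """Return (section, retentions) for the first section whose pattern matches."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(schemas_text)
    for section in parser.sections():  # sections() preserves file order
        if re.search(parser[section]["pattern"], metric):
            return section, parser[section]["retentions"]
    return None


metric = "nav.devices.192_168_0_10.ports.24.ifInOctets"
print(matching_rule(DEFAULT_ONLY, metric))   # ('default_1min_for_1day', '60s:1d')
print(matching_rule(WITH_NAV_RULE, metric))  # ('nav_placeholder', '5m:7d')
```

Because the layout is fixed at creation time, editing the config does not change existing files: they keep their old layout until they are resized with whisper-resize or deleted.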
Hi Morten
Spot on: it was the carbon-cache storage schema. While installing NAV, I started it before configuring Graphite according to your first link. Initially I skipped your second link in my search for the error, since I wasn't seeing gaps; the graphs were missing entirely.
Thank you very much for your help.
For the archives:
I confirmed the problem with whisper-info, which showed secondsPerPoint in Archive 0 set to 60 instead of the expected 300:
root@nav:~# whisper-info /var/lib/graphite/whisper/nav/devices/192_168_0_10/ports/24/ifInOctets.wsp
maxRetention: 51840000
xFilesFactor: 0.5
aggregationMethod: last
fileSize: 45568

Archive 0
retention: 604800
secondsPerPoint: 300   <- was wrongly set to 60
points: 2016
size: 24192
offset: 64
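whisper-info checks one file at a time. To sweep the whole NAV tree for files created with the wrong resolution, a sketch like the one below reads archive 0's secondsPerPoint straight from each file header. It assumes the whisper header layout (a "!2LfL" metadata block followed by one "!3L" block per archive, as in the whisper library's source), the /var/lib/graphite/whisper root used in this thread, and 300 s as the expected NAV step. archive0_step(), metric_name() and find_mismatched() are hypothetical helper names.

```python
import os
import struct

# ASSUMED whisper header layout:
#   metadata: aggregationType, maxRetention, xFilesFactor, archiveCount
#   then, per archive: offset, secondsPerPoint, points
METADATA = struct.Struct("!2LfL")
ARCHIVE_INFO = struct.Struct("!3L")


def archive0_step(header):
    """Return secondsPerPoint of archive 0 from the raw header bytes."""
    *_, archive_count = METADATA.unpack_from(header, 0)
    if archive_count == 0:
        return None
    _offset, seconds_per_point, _points = ARCHIVE_INFO.unpack_from(header, METADATA.size)
    return seconds_per_point


def metric_name(whisper_root, path):
    """Map <root>/nav/devices/.../ifInOctets.wsp back to nav.devices....ifInOctets."""
    rel = os.path.relpath(path, whisper_root)
    return rel[: -len(".wsp")].replace(os.sep, ".")


def find_mismatched(whisper_root, subdir="nav", expected=300):
    """Yield (metric, step) for each .wsp file whose archive 0 is not `expected` s."""
    for dirpath, _dirs, files in os.walk(os.path.join(whisper_root, subdir)):
        for name in files:
            if name.endswith(".wsp"):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    step = archive0_step(f.read(METADATA.size + ARCHIVE_INFO.size))
                if step != expected:
                    yield metric_name(whisper_root, path), step


if __name__ == "__main__":
    # ASSUMPTION: the whisper root used in this thread.
    for metric, step in find_mismatched("/var/lib/graphite/whisper"):
        print(f"{metric}: {step} s per point")
```

An empty result would suggest every NAV file was created with the expected 300 s first archive.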
To get back to a clean slate, I did the following:
** WARNING -- WILL REMOVE ALL GRAPHITE + NAV-DATA **
/etc/init.d/nav stop; /etc/init.d/carbon-cache stop; /etc/init.d/apache2 stop
rm -rf /var/lib/graphite/whisper/*
rm /var/lib/graphite/graphite.db
python /usr/share/pyshared/graphite/manage.py syncdb
chown _graphite:_graphite /var/lib/graphite/graphite.db
chmod o-r /var/lib/graphite/graphite.db
su - postgres -c 'psql nav -c "truncate device, vlan, netbios cascade;"'
/etc/init.d/carbon-cache start; /etc/init.d/nav start; /etc/init.d/apache2 start
I found this recipe on the nav-users mailing list [1].
[1] http://nav-users.itea.ntnu.narkive.com/wlPupwV8/setting-up-nav-from-scratch-...
On Fri, 2 Mar 2018 21:42:35 +0100 Søren Aurehøj soren@fab-it.dk wrote:
> ** WARNING -- WILL REMOVE ALL GRAPHITE + NAV-DATA **
> /etc/init.d/nav stop; /etc/init.d/carbon-cache stop; /etc/init.d/apache2 stop
> rm -rf /var/lib/graphite/whisper/*
> rm /var/lib/graphite/graphite.db
> python /usr/share/pyshared/graphite/manage.py syncdb
> chown _graphite:_graphite /var/lib/graphite/graphite.db
> chmod o-r /var/lib/graphite/graphite.db
> su - postgres -c 'psql nav -c "truncate device, vlan, netbios cascade;"'
> /etc/init.d/carbon-cache start; /etc/init.d/nav start; /etc/init.d/apache2 start
> Which I found in the nav-users mailing list [1]
I'm glad you got it to work, but to me, this looks like total overkill for your situation. You are completely trashing parts of your NAV database, when the problem is only with Graphite's whisper files.
Probably, this would have been more than enough:
cd /etc/carbon
rm storage-schemas.conf storage-aggregation.conf
ln -s /etc/nav/graphite/storage-schemas.conf .
ln -s /etc/nav/graphite/storage-aggregation.conf .
service carbon-cache restart
rm -rf /var/lib/graphite/whisper/*
Happy NAVing! :-)