Hi,
We upgraded to a new VM: we installed Debian Buster and NAV 4.9.8. After that we restored the PostgreSQL database with navsyncdb, and we also restored the /var/lib/graphite directory to keep the history of the graphs.
But now, if I reboot the new NAV server, the graphs are reset to the point where we did the restore. In the console log we see this error:
16/09/2019 08:02:28 :: Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/carbon/writer.py", line 189, in writeForever
    writeCachedDataPoints()
  File "/usr/lib/python3/dist-packages/carbon/writer.py", line 98, in writeCachedDataPoints
    (metric, datapoints) = cache.drain_metric()
  File "/usr/lib/python3/dist-packages/carbon/cache.py", line 187, in drain_metric
    metric = self.strategy.choose_item()
  File "/usr/lib/python3/dist-packages/carbon/cache.py", line 116, in choose_item
    return next(self.queue)
builtins.StopIteration:
Is this easy to fix?
Kind regards,
René.
On Mon, 16 Sep 2019 06:06:25 +0000 René Romijn Rene.Romijn@tabsholland.nl wrote:
> We upgraded to a new VM: we installed Debian Buster and NAV 4.9.8. After that we restored the PostgreSQL database with navsyncdb, and we also restored the /var/lib/graphite directory to keep the history of the graphs.
> But now, if I reboot the new NAV server, the graphs are reset to the point where we did the restore.
You mean your old time series data appears to be missing, even after a restore of the data files? That just seems strange. Either your carbon-cache has overwritten the files, which would be a serious bug, or your carbon-cache and/or graphite-web installations are not configured to use the directory where you restored your files.
Are you able to confirm that the files in use have the old data in them (e.g. using whisper-dump), and that their timestamps are being updated as carbon-cache writes to them?
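The timestamp part of that check can be sketched in Python. This is a minimal sketch, not a Graphite tool: the function name stale_whisper_files, the 15-minute threshold and the /var/lib/graphite/whisper path are assumptions (adjust the path to wherever LOCAL_DATA_DIR points in your carbon.conf). It lists .wsp files whose modification time is older than the threshold, which would suggest that carbon-cache is not writing to them.

```python
import os
import time


def stale_whisper_files(root, max_age_seconds, now=None):
    """Return sorted (path, age_in_seconds) pairs for .wsp files not modified recently."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wsp"):
                continue  # only Whisper data files are of interest
            path = os.path.join(dirpath, name)
            age = now - os.path.getmtime(path)
            if age > max_age_seconds:
                stale.append((path, age))
    return sorted(stale)


if __name__ == "__main__":
    # Assumed location of the Whisper files; not confirmed by the thread.
    root = "/var/lib/graphite/whisper"
    for path, age in stale_whisper_files(root, max_age_seconds=15 * 60):
        print(f"{age / 60:8.1f} min  {path}")
```

An empty result on a busy NAV server would indicate that the files are being updated; a long list of old files would point towards carbon-cache writing somewhere else, or not writing at all.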
Other than that, there's probably more knowledgeable support for Graphite issues if you ask questions in the Graphite project's preferred forums: [1].
>   File "/usr/lib/python3/dist-packages/carbon/cache.py", line 116, in choose_item
>     return next(self.queue)
> builtins.StopIteration:
> Is this easy to fix?
I have no idea, and I'm not sure how this would tie in with graphs appearing to be reset. This looks very much like this bug report from the graphite-project's issue tracker: [2]
Debian Buster features graphite-carbon version 1.1.4, but the bug was fixed in 1.1.5. I see a bug report [3] has been filed with Debian, as this bug causes a hard crash, but there's no fix yet. It might help to push the package maintainer along by adding your own feedback on [3].
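The failure mode visible in the traceback can be illustrated with a small stand-alone sketch. This is not carbon's code and not the actual 1.1.5 patch; the class names NaiveStrategy and GuardedStrategy are hypothetical. It only shows that calling next() on an exhausted generator raises StopIteration, and one defensive way of handling that.

```python
class NaiveStrategy:
    """Hypothetical drain strategy that holds a one-shot generator."""

    def __init__(self, metrics):
        self.metrics = metrics
        self.queue = self._generate_queue()

    def _generate_queue(self):
        # Yields each metric once; after that the generator is exhausted.
        for metric in list(self.metrics):
            yield metric

    def choose_item(self):
        # Raises StopIteration once the generator has run dry.
        return next(self.queue)


class GuardedStrategy(NaiveStrategy):
    """Same strategy, but it rebuilds the queue instead of raising."""

    def choose_item(self):
        try:
            return next(self.queue)
        except StopIteration:
            # Start a fresh pass over the metrics; None means "nothing to write".
            self.queue = self._generate_queue()
            return next(self.queue, None)
```

If such an exception escapes the writer loop, as the "Unhandled Error" in the log suggests, nothing is flushed to disk any more, which would be consistent with a cache that keeps growing in memory.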
Unless this is fixed, we'll probably run into this problem as we begin upgrading our production platforms to Buster as well :-P
[1] https://graphite.readthedocs.io/en/latest/install.html?highlight=help#help-i...
[2] https://github.com/graphite-project/carbon/issues/815
[3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923464
Hi Morten,
We installed a new VM with Buster and installed NAV using this script: https://github.com/Uninett/navappliance/blob/master/scripts/nav.sh
Then I restored the /var/lib/graphite directory from the old server to the new one. I could see all the old graphs, and new graphs were being created as well. But after a reboot three days later, Graphite had not written back the newly created graphs and 'reverted' to the point of the restore, so we still only have the old data.
René Romijn
-----Original message-----
From: Morten Brekkevold morten.brekkevold@uninett.no
Sent: Monday, 16 September 2019 09:26
To: René Romijn Rene.Romijn@tabsholland.nl
CC: 'nav-users@uninett.no' nav-users@uninett.no
Subject: Re: Upgrade Debian to Buster and NAV to 4.9.8
[...]
On Mon, 16 Sep 2019 10:55:54 +0000 René Romijn Rene.Romijn@tabsholland.nl wrote:
> We installed a new VM with Buster and installed NAV using this script: https://github.com/Uninett/navappliance/blob/master/scripts/nav.sh
> Then I restored the /var/lib/graphite directory from the old server to the new one. I could see all the old graphs, and new graphs were being created as well. But after a reboot three days later, Graphite had not written back the newly created graphs and 'reverted' to the point of the restore, so we still only have the old data.
You're saying you lost *three days* of Graphite data after a reboot of the VM? I cannot imagine carbon-cache could keep that much data in memory before committing to disk - that just sounds like something is very wrong with your VM hosting software.
Again, since this problem appears to be very specific to Graphite (or your VM infrastructure), I would advise you to ask somewhere where the level of Graphite expertise is higher. It seems their preferred forum is on Launchpad: https://answers.launchpad.net/graphite (and I'm going to watch that forum, because this issue makes me really curious!).
This is indeed what I am saying 😊
After a couple of days the memory consumption of the carbon process is sky-high; it seems like carbon cannot write back the cache, or something like that.
For now I have reverted to the older installation and I will try to investigate what's going on.
Kind regards,
René Romijn
-----Original message-----
From: Morten Brekkevold morten.brekkevold@uninett.no
Sent: Tuesday, 17 September 2019 07:51
To: René Romijn Rene.Romijn@tabsholland.nl
CC: 'nav-users@uninett.no' nav-users@uninett.no
Subject: Re: Upgrade Debian to Buster and NAV to 4.9.8
[...]
On Tue, 17 Sep 2019 05:54:18 +0000 René Romijn Rene.Romijn@tabsholland.nl wrote:
> This is indeed what I am saying 😊
> After a couple of days the memory consumption of the carbon process is sky-high; it seems like carbon cannot write back the cache, or something like that.
If your VM has been provisioned with enough RAM, that could actually be the case, depending on your data influx rate...
> For now I have reverted to the older installation and I will try to investigate what's going on.
If the carbon-cache bug I pointed out in their issue tracker can cause something like this, then that might be your culprit. In that case, I would advise you to either stay on Debian Stretch or patch carbon-cache yourself (until the Debian team patches the Buster package).
Knowing this, I would *not* deploy on Buster until I had a proper fix.
Hi Morten,
I replaced cache.py with the patched one from 1.1.5. It does seem to be stable now. I will see whether it is still OK tomorrow.
Kind regards,
René.
-----Original message-----
From: Morten Brekkevold morten.brekkevold@uninett.no
Sent: Tuesday, 17 September 2019 08:01
To: René Romijn Rene.Romijn@tabsholland.nl
CC: 'nav-users@uninett.no' nav-users@uninett.no
Subject: Re: Upgrade Debian to Buster and NAV to 4.9.8
[...]
Hi Morten,
I can confirm that the patched cache.py is the solution for this.
The only weird thing I still see is a lot of entries in tagdb.log like this:
18/09/2019 07:32:52 :: Error tagging nav.devices.172_30_104_21.ipdevpoll.statuscheck.success-count: Error requesting http://127.0.0.1:80/tags/tagMultiSeries: HTTPSConnectionPool(host='nav.<domainname>tags', port=443): Max retries exceeded with url: /tagMultiSeries (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fd98ce072e8>: Failed to establish a new connection: [Errno -2] Name or service not known'))
But in the GUI everything seems to be OK.
Kind regards,
René Romijn
On Tue, 17 Sep 2019 14:09:23 +0000 René Romijn Rene.Romijn@tabsholland.nl wrote:
> I replaced cache.py with the patched one from 1.1.5. It does seem to be stable now. I will see whether it is still OK tomorrow.
Waiting with bated breath for the results :-)
Hi Morten,
As I mentioned before, the patched cache.py worked! I can safely reboot if necessary, and the carbon process is no longer consuming excessive memory.
I also changed /etc/carbon/carbon.conf: I set ENABLE_TAGS to False. I am not really sure what this feature does, but the 'tagging' resulted in a lot of logging in /var/log/carbon/tagdb.log.
So, everything seems to be working fine now 😊 running on the latest.
Kind regards,
René Romijn
-----Original message-----
From: Morten Brekkevold morten.brekkevold@uninett.no
Sent: Wednesday, 18 September 2019 08:45
To: René Romijn Rene.Romijn@tabsholland.nl
CC: 'nav-users@uninett.no' nav-users@uninett.no
Subject: Re: Upgrade Debian to Buster and NAV to 4.9.8
[...]
On Thu, 19 Sep 2019 05:25:25 +0000 René Romijn Rene.Romijn@tabsholland.nl wrote:
> I also changed /etc/carbon/carbon.conf: I set ENABLE_TAGS to False. I am not really sure what this feature does, but the 'tagging' resulted in a lot of logging in /var/log/carbon/tagdb.log.
Graphite 1.1 introduced tagging support (which I have not used thus far). The TagDB is an SQL database managed by graphite-web, so it seems carbon-cache needs to know graphite-web's URL to send tag information to it.
I don't think the virtual appliance configures this URL, so carbon-cache will try to post tag data to a non-existent location.
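To see what a given carbon.conf says about tagging, something like the following sketch could be used. It assumes that ENABLE_TAGS and GRAPHITE_URL live in the [cache] section, that tagging defaults to enabled and that the URL defaults to http://127.0.0.1:80 (the address seen in the tagdb.log entry); the helper tag_settings and the SAMPLE text are made up for illustration, and it uses Python's generic configparser rather than carbon's own configuration loader.

```python
import configparser

# Hypothetical excerpt; a real carbon.conf contains many more settings.
SAMPLE = """\
[cache]
# Assumed setting names, see the note above.
ENABLE_TAGS = False
GRAPHITE_URL = http://127.0.0.1:80
"""


def tag_settings(config_text):
    """Extract the tag-related settings from carbon.conf-style text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    cache = parser["cache"]
    return {
        # Fallbacks are assumed defaults, used when a setting is absent.
        "enable_tags": cache.getboolean("ENABLE_TAGS", fallback=True),
        "graphite_url": cache.get("GRAPHITE_URL", fallback="http://127.0.0.1:80"),
    }


if __name__ == "__main__":
    print(tag_settings(SAMPLE))
```

With tagging enabled, the URL would have to point at a reachable graphite-web instance; with ENABLE_TAGS set to False, as René did, carbon-cache should stop attempting those requests.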
> So, everything seems to be working fine now 😊 running on the latest.
Glad to hear it! Please do consider bumping the Debian issue, though; anything that prods the package maintainer to make a more permanent solution would be helpful :)
On Fri, 20 Sep 2019 08:39:13 +0200 Morten Brekkevold morten.brekkevold@uninett.no wrote:
> On Thu, 19 Sep 2019 05:25:25 +0000 René Romijn Rene.Romijn@tabsholland.nl wrote:
>> So, everything seems to be working fine now 😊 running on the latest.
> Glad to hear it! Please do consider bumping the Debian issue, though; anything that prods the package maintainer to make a more permanent solution would be helpful :)
I have just uploaded an NMU package of graphite-carbon 1.1.5 in the buster archive at https://nav.uninett.no/debian , in an attempt to overcome the issues described (and not yet solved by the package maintainer) at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923464
If you're already pulling the NAV Debian packages from our repository, your Debian system should likely pull the new graphite-carbon package from there when you issue an `apt-get update && apt-get upgrade`.
-- mvh Morten Brekkevold Uninett