Hi,
I've run into trouble after upgrading to 3.9.3 via apt-source, system is debian 6.0.3, this was not a problem with 3.9.2.
First the message during upgrade:
Setting up nav (2+3.9.3-1) ... Installing new version of config file /etc/nav/sortedStats.conf ... Installing new version of config file /etc/nav/smsd.conf ... Installing new version of config file /etc/nav/report/report.conf ... Traceback (most recent call last): File "/usr/share/doc/nav/sql/syncdb.py", line 463, in <module> main() File "/usr/share/doc/nav/sql/syncdb.py", line 44, in main sync.synchronize() File "/usr/share/doc/nav/sql/syncdb.py", line 220, in synchronize self.apply_changes() File "/usr/share/doc/nav/sql/syncdb.py", line 305, in apply_changes self.apply_change_script(version, script) File "/usr/share/doc/nav/sql/syncdb.py", line 326, in apply_change_script self.execute_sql_file(script) File "/usr/share/doc/nav/sql/syncdb.py", line 352, in execute_sql_file self.cursor.execute(sql) psycopg2.extensions.QueryCanceledError: canceling statement due to statement timeout CONTEXT: SQL statement "DELETE FROM ONLY "manage"."rrd_datasource" WHERE $1 OPERATOR(pg_catalog.=) "rrd_fileid""
and if I try adding a new device I get:
Something went wrong Something went wrong while processing your request. Please try again later. If the problems persists, contact your system administrator and/or report a bug on NAV bugtracker.
And an error when trying to delete a device:
Error: canceling statement due to statement timeout CONTEXT: SQL statement "UPDATE ONLY "manage"."cam" SET "netboxid" = NULL WHERE $1 OPERATOR(pg_catalog.=) "netboxid""
When digging into the postgres-log I find (everything from a database restart with two delete attempts):
2011-12-07 10:11:04 CET LOG: database system was shut down at 2011-12-07 10:10:55 CET 2011-12-07 10:11:04 CET LOG: database system is ready to accept connections 2011-12-07 10:11:04 CET LOG: autovacuum launcher started 2011-12-07 10:11:05 CET LOG: incomplete startup packet 2011-12-07 10:12:00 CET ERROR: canceling statement due to statement timeout 2011-12-07 10:12:00 CET CONTEXT: SQL statement "UPDATE ONLY "manage"."cam" SET "netboxid" = NULL WHERE $1 OPERATOR(pg_catalog.=) "netboxid"" 2011-12-07 10:12:00 CET STATEMENT: DELETE FROM "netbox" WHERE "netboxid" IN (162) 2011-12-07 10:15:01 CET LOG: unexpected EOF on client connection 2011-12-07 10:18:22 CET ERROR: canceling statement due to statement timeout 2011-12-07 10:18:22 CET CONTEXT: SQL statement "UPDATE ONLY "manage"."cam" SET "netboxid" = NULL WHERE $1 OPERATOR(pg_catalog.=) "netboxid"" 2011-12-07 10:18:22 CET STATEMENT: DELETE FROM "netbox" WHERE "netboxid" IN (162) 2011-12-07 10:20:02 CET LOG: unexpected EOF on client connection
Anyone having any idea on what's going on? Postgresql has been tuned a bit with allowing 1000 simultaneous connections, but should otherwise be standard.
Best regards,
-Sigurd
On 2011-12-07 10:29, Sigurd Mytting wrote:
Hi,
Hi :-)
I've run into trouble after upgrading to 3.9.3 via apt-source, system is debian 6.0.3, this was not a problem with 3.9.2.
First the message during upgrade:
Setting up nav (2+3.9.3-1) ...
I did a little digging and found the statement timeout was due to a 30 second time limit on transactions in postgresql.conf, changed that to the default value of 0 (wait forever). After then trying to run syncdb.py for seven hours I terminated it. Some more digging showed the statement "DELETE FROM rrd_file WHERE netboxid IS NULL;" did not terminate, but "select *" gave over 7700 lines. Deleting them one by one using the rrd_fileid and "netboxid IS NULL" cleaned up the database in a few minutes.
After adjusting the timeout value I can delete devices, but still get errors when adding devices - however bulk import seems to work.
Don't know if it's related, but the postgres-log contains a lot of "LOG: unexpected EOF on client connection".
cheers,
-Sigurd
On Wed, 07 Dec 2011 21:39:34 +0100 Sigurd Mytting sigurd@mytting.no wrote:
On 2011-12-07 10:29, Sigurd Mytting wrote:
Hi,
Hi :-)
I've run into trouble after upgrading to 3.9.3 via apt-source, system is debian 6.0.3, this was not a problem with 3.9.2.
First the message during upgrade:
Setting up nav (2+3.9.3-1) ...
I did a little digging and found the statement timeout was due to a 30 second time limit on transactions in postgresql.conf, changed that to the default value of 0 (wait forever).
Ouch, that could have been a problem with many long-running queries.
After then trying to run syncdb.py for seven hours I terminated it. Some more digging showed the statement "DELETE FROM rrd_file WHERE netboxid IS NULL;" did not terminate, but "select *" gave over 7700 lines. Deleting them one by one using the rrd_fileid and "netboxid IS NULL" cleaned up the database in a few minutes.
"DELETE FROM rrd_file WHERE netboxid IS NULL" should never take 7 hours. I'm guessing something was holding on to a database lock that the statement needed to wait for.
Does a "ps axuw" listing show many postgres processes with the status "idle in transaction"?
After adjusting the timeout value I can delete devices, but still get errors when adding devices - however bulk import seems to work.
What kind of errors?
Don't know if it's related, but the postgres-log contains a lot of "LOG: unexpected EOF on client connection".
It's not related.
Hi :-)
On Thu, 08 Dec 2011 10:57:58 +0100, Morten Brekkevold morten.brekkevold@uninett.no wrote:
"DELETE FROM rrd_file WHERE netboxid IS NULL" should never take 7 hours. I'm guessing something was holding on to a database lock that the statement needed to wait for.
Does a "ps axuw" listing show many postgres processes with the status "idle in transaction"?
I tried restarting postgres just before issuing the command, so it shouldn't be any at the time.
For the last 30 minutes its been stable at 7. I noticed before upgrading I had up to max_connections and pping crashed when max_connections was reached. This behavior is gone now.
After adjusting the timeout value I can delete devices, but still get
errors
when adding devices - however bulk import seems to work.
What kind of errors?
Webpage says: "Something went wrong
Something went wrong while processing your request. Please try again later.
If the problems persists, contact your system administrator and/or report a bug on NAV bugtracker."
Apache access log: nav.vegvesen.no:80 146.2.x.y - - [08/Dec/2011:16:56:39 +0100] "POST /seeddb/netbox/add/ HTTP/1.1" 200 2011 "http://nav.vegvesen.no/seeddb/netbox/add/" "Mozilla/5.0 (Ubuntu; X11; Linux x86_64; rv:8.0) Gecko/20100101 Firefox/8.0" nav.vegvesen.no:80 146.2.x.y - - [08/Dec/2011:16:56:41 +0100] "POST /seeddb/netbox/add/ HTTP/1.1" 500 472 "http://nav.vegvesen.no/seeddb/netbox/add/" "Mozilla/5.0 (Ubuntu; X11; Linux x86_64; rv:8.0) Gecko/20100101 Firefox/8.0"
Error log: /usr/lib/python2.6/dist-packages/mod_python/importer.py:32: DeprecationWarning: the md5 module is deprecated; use hashlib instead import md5 (I guess not related)
Any other logs relevant?
Cheers,
-Sigurd
On Thu, 8 Dec 2011 17:01:19 +0100 Sigurd Mytting sigurd@mytting.no wrote:
What kind of errors?
Webpage says: "Something went wrong
Something went wrong while processing your request. Please try again later.
Please see the FAQ entry at [1].
Error log: /usr/lib/python2.6/dist-packages/mod_python/importer.py:32: DeprecationWarning: the md5 module is deprecated; use hashlib instead import md5 (I guess not related)
No, that's just a "cosmetic" problem with the mod_python package in Debian.
Any other logs relevant?
Follow the instructions from the FAQ entry and see what you get.
[1] http://metanav.uninett.no/navfaq#how_can_i_get_help_for_something_went_wrong...