Hi all!
As mentioned on the IRC channel before easter week, I've been taking a shot at the deprecate-psycopg1 blueprint (https://blueprints.launchpad.net/nav/+spec/deprecate-psycopg1)
Preliminary results can be found on this branch: http://metanav.uninett.no/hg/features/deprecate-psycopg1/
I would like to merge this code with the default branch as soon as possible. I would like it, though, if people who feel they have some sort of responsibility for code they've written would review the diffs.
A lot of code was using the dictfetchall, dictfetchone and dictfetchmany call on cursors. These are non-standard, and are no longer supported in psycopg2. To avoid making too many code changes related to this, I've opted to use a psycopg2 extension which allows me to use customized cursor classes.
The psycopg2 extensions already comes with a custom cursor class, which provides result rows that can be accessed both as tuples and dictionaries. To use the DictCursor class, I've had to manually specify the class as an argument to every call to connection.cursor(), where the code later expected to use the dictfetch methods on the cursor. All dictfetch calls have been reduced to regular fetch calls.
The custom cursor change affects the following subsystems:
* Arnold * deviceManagement * ipinfo * l2trace * lib-python (rrdPresenter) * maintenance * messages * Netmap * snmptrapd * statTools
See this changeset to see how I may have mauled your code: http://metanav.uninett.no/hg/features/deprecate-psycopg1/rev/4cbc70349d55
On another note: Does all this code _really_ need to use dictionary result rows? I don't know the internals of psycopg2's DictCursor, but I seems to me that it could be a real memory-waste to pull thousands of table rows as dictionary instances instead of simple tuples.
In many cases, I think looping over the results like this would suffice:
cursor.execute("SELECT column1, column2, column3 FROM foo") for (column1, column2, column3) in cursor.fetchall(): pass # or actually do something
On Tue, 14 Apr 2009 11:51:50 +0200 Morten Brekkevold morten.brekkevold@uninett.no wrote:
On another note: Does all this code _really_ need to use dictionary result rows? I don't know the internals of psycopg2's DictCursor, but I seems to me that it could be a real memory-waste to pull thousands of table rows as dictionary instances instead of simple tuples.
After stating this, I became a bit curious about the actual DictCursor implementation, of course.
It seems to be "properly" optimized, in that the cursor object caches a column name index after each query. Each DictRow object (which is what is returned by the fetch methods) keeps a reference to the column name index from the originating cursor, and uses this for lookups when the DictRow object is indexed by anything but integers.
On Tue, 14 Apr 2009 11:51:50 +0200 Morten Brekkevold morten.brekkevold@uninett.no wrote:
The custom cursor change affects the following subsystems:
- Arnold
- deviceManagement
- ipinfo
- l2trace
- lib-python (rrdPresenter)
- maintenance
- messages
- Netmap
- snmptrapd
- statTools
See this changeset to see how I may have mauled your code: http://metanav.uninett.no/hg/features/deprecate-psycopg1/rev/4cbc70349d55
Looks like I just found some breakage on my own, in the Netmap backend. datacollector.py assumes all results fetched from cursors are pure dicts, and attempts to modify them. DictRow objects do not support updating like a dictionary.
Also, DictRow objects use list semantics for the "in" operator: Checking for the existence of a key using "key in row" will fail, since this in reality checks the values of the row. datacollector.py does this too, and it causes it to crash on a failed assertion.
My quick workaround was to convert all result rows to regular dicts before they are used in datacollector.py, but I haven't committed these changes yet.
If any of you know of code that tries to update result row dictionaries, or check for the presence of specific keys in them using the 'in' operator, please give me a shout (or better yet, clone the deprecate-psycogp1 branch, write a patch and then give me shout)!
On Tue, 14 Apr 2009 11:51:50 +0200 Morten Brekkevold morten.brekkevold@uninett.no wrote:
Hi all!
As mentioned on the IRC channel before easter week, I've been taking a shot at the deprecate-psycopg1 blueprint (https://blueprints.launchpad.net/nav/+spec/deprecate-psycopg1)
Preliminary results can be found on this branch: http://metanav.uninett.no/hg/features/deprecate-psycopg1/
I would like to merge this code with the default branch as soon as possible. I would like it, though, if people who feel they have some sort of responsibility for code they've written would review the diffs.
Well, for all the recommendations to migrate from the unmaintained psycopg1 branch to psycopg2, we got bit by bugs in psycopg2 pretty quickly.
While using psycopg2, snmptrapd would promptly crash after receiving a trap, issuing a rather strange traceback for a psycopg2.OperationalError. The database connection was lost, and the nav.db API will try to create a new one, but the traceback shows that the OperationalError is seemingly raised in an unrelated part of the code (a line that iterates a list, which has nothing to do with calls to psycopg2).
So I found this psycopg mailing list thread, which basically is a bug report of the same issue, and a confirmation from the psycopg author that it is a bug. The posting is from 2007: http://osdir.com/ml/python.db.psycopg.devel/2007-04/msg00000.html
We're still developing mainly on Debian Etch, which has psycopg2 version 2.0.5 (released in 2006). Testing on our new dev server, which is running Debian Lenny and psycopg 2.0.7, I'm unable to reproduce the bug.
I will try to backport psycopg2 2.0.7 to Etch for time being, but this bug basically will make NAV 3.6 incompatible with Debian Etch (without supplying a psycopg2 backport).