OidTester dupeMap - Nav-dev

23 Jan 2007

Hi Kristian,

I'm debugging some getDeviceData problems that occur when devices change
type in NAV, where, among other things, OID testing sometimes does not
run properly.

Apparently, OID testing runs properly only every second time a device
changes its type (this is what my tests show when using the refresh
function and switching type several times in a row).  When OID testing
does not run properly, all the OIDTester logs is that OID testing is
done, without having tested anything.

I've found that the doTest method of the OIDTester exits early because
all the OIDs are found in a MultiMap called dupeMap.
What I would like to know, if you can remember, what exactly is the
purpose of this dupeMap?

Incidentally, the netbox refresh function of gDD suddenly stops working
after an arbitrary number of refreshes and will not work again until
after a gDD restart.  gDD logs the refresh event and says it will start
an immediate collection run for that box, but then proceeds to do nothing...
-- 
mvh
Morten Brekkevold
UNINETT
From: kreide at gmail.com (Kristian Eide)
Date: Mon Jan 29 10:25:32 2007
Subject: [Nav-dev] Re: OidTester dupeMap
In-Reply-To: 45B5EBF8.7010602@uninett.no
References: 45B5EBF8.7010602@uninett.no
Message-ID: b4c110fd0701290125i1668ea71we8fa4b812b335e7a@mail.gmail.com

> I've found that the doTest method of the OIDTester exits early because
> all the OIDs are found in a MultiMap called dupeMap.
> What I would like to know, if you can remember, what exactly is the
> purpose of this dupeMap?

Sorry for the late reply, but here goes: it is used to flag a
combination of IP and OID as already tested. I added it because
testing happens in parallel, and an IP/OID combination can be reached
both when testing a single OID against all IPs (netboxes) and when
testing a single IP against all OIDs.
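
A minimal sketch of the idea described above, in Java. It is not the
actual dupeMap code: the class name TestedPairs and the method names
are hypothetical, and a flat concurrent set stands in for the MultiMap.
The point it illustrates is that "check and mark" has to be a single
atomic step when two test paths can reach the same IP/OID pair in
parallel.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for dupeMap: remembers which IP/OID pairs are taken. */
final class TestedPairs {
    // Concurrent hash set: add() and contains() are O(1) on average,
    // and add() is atomic, so only one thread can win a given pair.
    private final Set<String> tested = ConcurrentHashMap.newKeySet();

    private static String key(String ip, String oid) {
        // '|' does not occur in an IP address or a dotted OID,
        // so two different pairs can never produce the same key.
        return ip + "|" + oid;
    }

    /** Returns true if the caller should run the test, false if the pair is a duplicate. */
    boolean claim(String ip, String oid) {
        return tested.add(key(ip, oid));
    }

    /** Read-only check; does not mark anything. */
    boolean isTested(String ip, String oid) {
        return tested.contains(key(ip, oid));
    }
}
```

With this shape, both the "one OID, all IPs" path and the "one IP, all
OIDs" path would call claim() before testing and skip the pair when it
returns false.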

I remember I tried to be clever when implementing this map since I
wanted it to have O(1) lookup cost, but I have no doubt that there are
bugs in the implementation, especially since I did not write any tests
for it, something which is a necessity for testing the corner cases of
something like this. I cannot remember how entries are removed from
it, but since this is going to be tricky with the current
implementation I probably got it wrong back then (and this is probably
the source of the problem).
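
To make the removal concern concrete, here is a hedged sketch of one
way entries could be indexed so that everything recorded for a netbox
can be dropped in one step, for instance when the box changes type.
The names TestedPairsByIp, claim and forget are assumptions for
illustration and do not come from the NAV source; this is not a
diagnosis of the existing implementation.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical variant of dupeMap, indexed by IP so a whole netbox can be forgotten. */
final class TestedPairsByIp {
    // IP -> set of OIDs already claimed for that IP; both levels are hash based.
    private final Map<String, Set<String>> oidsByIp = new ConcurrentHashMap<>();

    /** Returns true if the caller should run the test, false if the pair is a duplicate. */
    boolean claim(String ip, String oid) {
        return oidsByIp
                .computeIfAbsent(ip, k -> ConcurrentHashMap.<String>newKeySet())
                .add(oid);
    }

    /**
     * Drops every entry for one netbox so that its OIDs are tested again.
     * A claim() racing with this call may land in the discarded set; the
     * worst case is that the pair is tested once more later.
     */
    void forget(String ip) {
        oidsByIp.remove(ip);
    }
}
```

If forget() is never called for a box whose type has changed, every
later claim() for that box returns false, so a test run would find
only duplicates and finish without testing anything. A unit test that
claims a pair, forgets the box and claims the same pair again covers
that corner case.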

> Incidentally, the netbox refresh function of gDD suddenly stops working
> after an arbitrary number of refreshes and will not work again until
> after a gDD restart.  gDD logs the refresh event and says it will start
> an immediate collection run for that box, but then proceeds to do nothing...

I would recommend that you add proper test cases for the queueing
mechanism; I am sure you will find this bug (and probably a few more)
in short order.

-- 
Kristian