> code, adding more debug output (and cursing the logging system, which I
> would like to replace with Log4j). What I've found is that when the
Yes, if I had known better at the time I would have used log4j. It
should not be that hard to do a bulk replace of all the calls though
(or even easier, simply have my log system forward to log4j).
> and a null value otherwise. But it appears that it also checks whether
> the OID is "ready to be fetched"; i.e. it checks an internal OID
> run-queue in the NetboxImpl instance, and if its nextRun timestamp has
> passed, the OID is returned - otherwise, null is returned.
You are correct, this is the cause of the problem.
> What seems to be happening is that, for some reason, one hour after a
> normal collection run, another run is initiated (could this be a
> moduleMon run?). This seems to trigger the regular set of
Yes, this is the moduleMon run.
> What I have yet to figure out is what this oid run-queue and "oid is
> ready/not ready to be fetched" is good for?
The idea of the run queue is that different OIDs can be collected at
different intervals; however, the level of complexity due to this is
unfortunately (perhaps too) high - gDD certainly is an ambitious
project.
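
To make the run-queue behaviour concrete, here is a minimal sketch of a
lookup that returns an OID only when its nextRun timestamp has passed.
It is not gDD's actual NetboxImpl code: the class OidRunQueue, its
method names, the oidkey "someModuleOid", the OID strings and the
intervals are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the per-netbox OID run queue (not the real NetboxImpl).
class OidRunQueue {
    private final Map<String, String> oids = new HashMap<>();   // oidkey -> OID string
    private final Map<String, Long> nextRun = new HashMap<>();  // oidkey -> earliest next run (ms)

    // Register an oidkey the netbox supports, with the time it is next due.
    void schedule(String oidkey, String oid, long nextRunMillis) {
        oids.put(oidkey, oid);
        nextRun.put(oidkey, nextRunMillis);
    }

    // Returns the OID only if the oidkey is supported AND its nextRun has passed.
    // Otherwise returns null, so a plugin cannot tell "unsupported" from "not due yet".
    String getOid(String oidkey, long nowMillis) {
        Long due = nextRun.get(oidkey);
        if (due == null || nowMillis < due) {
            return null;
        }
        return oids.get(oidkey);
    }
}

class OidRunQueueDemo {
    public static void main(String[] args) {
        long hour = 3_600_000L;  // assumed intervals, for illustration only
        OidRunQueue queue = new OidRunQueue();
        queue.schedule("moduleMon", "1.3.6.1.4.1.99999.1", hour);
        queue.schedule("someModuleOid", "1.3.6.1.4.1.99999.2", 6 * hour);

        long now = hour;  // one hour after the previous full run
        System.out.println("moduleMon: " + queue.getOid("moduleMon", now));
        System.out.println("someModuleOid: " + queue.getOid("someModuleOid", now));
    }
}
```

In this sketch a plugin asking for "someModuleOid" one hour after the
full run gets null back even though the device supports it, which is
the ambiguity discussed above.
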
This problem is caused by a simple bug, however. The ModuleMon for
Cisco uses the CiscoModule plugin to note which modules are down, but
to avoid this situation ModuleMon will not mark modules as down unless
mmc.commit() is called. If you look in CiscoModule.java that method is
called regardless of whether module info was collected or not.
The solution is to only call this method if collection happened, e.g.
you can check if catModModel is non-null. This of course assumes that
the schedule for all the Cisco module OIDs is the same, but for now I
think you should make that a requirement.
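
A rough sketch of that guard follows. Only the commit() call and the
idea of testing the collected data for null come from the description
above; the class names, the moduleUp() and modulesDown() methods and
the use of plain strings for modules are hypothetical stand-ins, not
the real gDD classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the container that ModuleMon hands to plugins.
class ModuleMonContainerSketch {
    private final List<String> seen = new ArrayList<>();
    private boolean committed = false;

    void moduleUp(String module) { seen.add(module); }

    void commit() { committed = true; }

    // Modules are only judged down once a plugin has committed its results.
    List<String> modulesDown(List<String> knownModules) {
        List<String> down = new ArrayList<>();
        if (!committed) {
            return down;
        }
        for (String module : knownModules) {
            if (!seen.contains(module)) {
                down.add(module);
            }
        }
        return down;
    }
}

// Hypothetical stand-in for the relevant part of the CiscoModule plugin.
class CiscoModuleSketch {
    // collectedModules is null when the run queue withheld the module OIDs on this run.
    static void handle(List<String> collectedModules, ModuleMonContainerSketch mmc) {
        if (collectedModules == null) {
            return;  // nothing was collected, so do not commit
        }
        for (String module : collectedModules) {
            mmc.moduleUp(module);
        }
        mmc.commit();  // commit only after an actual collection
    }
}
```

With the guard in place, a run in which the module OIDs were not due
leaves the container uncommitted, so no module is marked as down on
the strength of missing data.
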
--
Kristian
From morten.brekkevold at uninett.no Thu Oct 12 11:31:37 2006
From: morten.brekkevold at uninett.no (Morten Brekkevold)
Date: Thu Oct 12 10:31:58 2006
Subject: [Nav-dev] Re: Modules up and down
In-Reply-To: <b4c110fd0610111157n33f01970gd77e1cdddb889384(a)mail.gmail.com>
References: <44C0D670.6050306(a)uninett.no> <451BCE52.4090903(a)uninett.no>
<b4c110fd0610111157n33f01970gd77e1cdddb889384(a)mail.gmail.com>
Message-ID: <452DFD69.2040402(a)uninett.no>
Kristian Eide wrote, On 11-10-2006 20:57:
> Yes, if I had known better at the time I would have used log4j. It
> should not be that hard to do a bulk replace of all the calls though
> (or even easier, simply have my log system forward to log4j).
Although the latter sounds tempting, I would rather make sure I had a
proper logger hierarchy based on class names. My primary need is to be
able to change the logging levels for different parts of gDD (and the
other Java applications), and filter out the irrelevant stuff, without
having to resort to grep.
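
As an illustration of what a class-name-based logger hierarchy buys,
here is a small sketch. It uses java.util.logging from the JDK as a
stand-in so that it runs without extra jars; Log4j offers the same
idea through Logger.getLogger(SomeClass.class) and per-logger levels.
The logger names below ("gdd", "gdd.plugins", and so on) are invented
for the example and do not reflect gDD's real package layout.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggerHierarchySketch {
    // Hypothetical logger names; in practice each class would use its own class name.
    static final Logger ROOT = Logger.getLogger("gdd");
    static final Logger PLUGINS = Logger.getLogger("gdd.plugins");
    static final Logger CISCO = Logger.getLogger("gdd.plugins.CiscoModule");
    static final Logger QUEUE = Logger.getLogger("gdd.queue.RunQueue");

    static void configure() {
        ROOT.setLevel(Level.WARNING);  // quiet by default for everything under "gdd"
        PLUGINS.setLevel(Level.FINE);  // verbose for the plugin subtree only
    }

    public static void main(String[] args) {
        configure();
        // CISCO has no level of its own and inherits FINE from "gdd.plugins".
        System.out.println("CiscoModule logs FINE: " + CISCO.isLoggable(Level.FINE));
        // QUEUE inherits WARNING from "gdd", so INFO is filtered out.
        System.out.println("RunQueue logs INFO: " + QUEUE.isLoggable(Level.INFO));
    }
}
```

Changing one level on a parent logger adjusts a whole subtree, which
is the kind of filtering that otherwise ends up being done with grep.
Note that with java.util.logging the handler level must also be
lowered before FINE messages are actually printed.
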
>> What seems to be happening is that, for some reason, one hour after a
>> normal collection run, another run is initiated (could this be a
>> moduleMon run?). This seems to trigger the regular set of
>
> Yes, this is the moduleMon run.
My previous understanding of this was that when the moduleMon oidkey is
scheduled for collection, only the device plugins that have expressed
their ability to handle this oid would be triggered. Am I now to
understand that all the regular plugins matching a given device will
trigger, but among all the SNMP queries they put forth, only a request
for the moduleMon OID will succeed?
> The idea of the run queue is that different OIDs can be collected at
> different intervals; however, the level of complexity due to this is
> unfortunately (perhaps too) high - gDD certainly is an ambitious
> project.
I knew about the idea of different collection intervals for OIDs, of
course, but I thought the implementation relied on triggering only the
relevant plugins. I did not know the design actually prevented the OIDs
from being collected at such a low level.
> This problem is caused by a simple bug, however. The ModuleMon for
> Cisco uses the CiscoModule plugin to note which modules are down, but
> to avoid this situation ModuleMon will not mark modules as down unless
> mmc.commit() is called. If you look in CiscoModule.java that method is
> called regardless of whether module info was collected or not.
Aha! Thanks for this valuable hint :)
> This of course assumes that the schedule for all the Cisco module
> OIDs is the same, but for now I think you should make that a
> requirement.
I'm sure a lot of things will break down if we start experimenting with
different collection intervals for arbitrary OIDs from the database ;)
So far, it's mostly the moduleMon OID that has been interesting to
collect more often than the rest.
--
mvh
Morten Brekkevold
UNINETT