Service port is down on 158.XX.XX.XXX
<alertengine@XXX.XXXX.no> 20.01.2006 10:35:22 >>> Service port on 158.XX.XX.XXX is down since 2006-01-20 10:34:34. Server up: Yes Status: Main building
This was a crash, so the alert should not say "Server up: Yes".

Some log output from servicemon.log:

[2006-01-20 10:34:06] abstractChecker.py:run:117 [Info] 158.XX.XX.XXX:port -> timed out
[2006-01-20 10:34:06] abstractChecker.py:run:131 [Notice] 158.XX.XX.XXX:port -> State changed. New check in 5 sec. (DOWN, timed out)

This is normal if we stop the service that we check on the server.

[2006-01-20 10:34:34] abstractChecker.py:run:117 [Info] 158.XX.XX.XXX:port -> (113, 'No route to host')
[2006-01-20 10:34:34] abstractChecker.py:run:140 [Alert ] 158.XX.XX.XXX:port -> DOWN, (113, 'No route to host')

But these lines are specific to a server that has crashed, and should not give "Server up: Yes".

Has this been solved in the latest nav-3.0.0-1.noarch.rpm?

Peder
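The difference between the two log excerpts above ("timed out" versus errno 113, 'No route to host') can be illustrated with a small Python sketch. This is not servicemon's actual code: `classify_error` and `check_port` are hypothetical names, and the category labels are assumptions chosen for illustration. Note that `errno.EHOSTUNREACH` is 113 on Linux but may have a different value on other platforms.

```python
import errno
import socket


def classify_error(exc):
    """Map a connection error to a coarse, hypothetical category."""
    if isinstance(exc, socket.timeout):
        # No answer at all; on its own this does not say whether the host is up.
        return "timed out"
    if isinstance(exc, ConnectionRefusedError):
        # The host answered, but nothing is listening on the port.
        return "service down, host up"
    if isinstance(exc, OSError) and exc.errno == errno.EHOSTUNREACH:
        # 'No route to host' (errno 113 on Linux).
        return "host unreachable"
    return "unknown error"


def check_port(host, port, timeout=5.0):
    """Try a TCP connect and return "up" or one of the categories above."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "up"
    except OSError as exc:
        return classify_error(exc)
```

A checker built this way could treat "host unreachable" as a hint that the server itself is down, rather than only the service.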
From magnus at ntnu.no Fri Jan 20 11:20:49 2006
From: magnus at ntnu.no (Magnus Nordseth)
Date: Fri Jan 20 11:21:07 2006
Subject: [Nav-users] Service port is down on 158.XX.XX.XXX
In-Reply-To: <s3d0c2ce.064@HVO-3.hivolda.no>
References: <s3d0c2ce.064@HVO-3.hivolda.no>
Message-ID: <20060120102048.GA15629@stud.ntnu.no>
Service state monitoring is handled by one subsystem, while pping handles netbox state monitoring. pping usually operates at a higher frequency than servicemon, and should detect the server crash before servicemon does. Both pping and statemon post events to eventengine. Eventengine should wait a few seconds before processing an event, to see whether correlated events come in.

To debug this, I suggest that you check the pping and eventengine logs to see whether a netbox down event was sent.

--
Magnus Nordseth
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
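The idea of holding an event for a few seconds to see whether a correlated event arrives can be sketched as follows. This is a minimal illustration, not eventengine's implementation: the `Event` class, the `correlate` function, the symmetric time window and the `hold_seconds` values are all assumptions. The 12-second gap in the sample data matches the servicemon alert (10:34:34) and the pping "marked as down" entry (10:34:46) reported in this thread.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float   # seconds, on any common clock
    source: str   # "pping" or "servicemon"
    host: str
    state: str    # "down" or "up"


def correlate(events, hold_seconds):
    """For each service-down event, report (host, time, server_up).

    server_up is False when a box-down event for the same host falls
    within hold_seconds of the service-down event (assumed rule).
    """
    box_down = [e for e in events if e.source == "pping" and e.state == "down"]
    report = []
    for ev in events:
        if ev.source == "servicemon" and ev.state == "down":
            host_down = any(
                b.host == ev.host and abs(b.time - ev.time) <= hold_seconds
                for b in box_down
            )
            report.append((ev.host, ev.time, not host_down))
    return report


events = [
    Event(0.0, "servicemon", "158.XX.XX.XXX", "down"),
    Event(12.0, "pping", "158.XX.XX.XXX", "down"),
]
print(correlate(events, hold_seconds=30.0))  # window covers the box-down event
print(correlate(events, hold_seconds=5.0))   # window too short to see it
```

With a window shorter than the gap between the two detections, the service alert goes out with the server still considered up, which is the symptom described in this thread.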
From peder.sefland at hivolda.no Fri Jan 20 12:57:44 2006
From: peder.sefland at hivolda.no (Peder Magne Sefland)
Date: Fri Jan 20 12:58:34 2006
Subject: SV: Re: [Nav-users] Service port is down on 158.XX.XX.XXX
Message-ID: <s3d0de62.023@HVO-3.hivolda.no>
Thank you for your response.

pping.log says:

[2006-01-20 10:34:46] pping.py:generateEvents:142 [Notice] 158.XX.XX.XXX (158.XX.XX.XXX) marked as down.
[2006-01-20 10:36:46] pping.py:generateEvents:159 [Notice] 158.XX.XX.XXX (158.XX.XX.XXX) marked as up.

eventEngine.log says:

Jan 20 10:34:47 2006 eventEngine BOX_STATE_EVENTHANDLER-6-HANDLE Box going down: Box [ip=158.XX.XX.XXX, sysname=158.XX.XX.XXX, status=up]
Jan 20 10:35:47 2006 eventEngine BOX_STATE_EVENTHANDLER-6-CALLBACK Box down: 158.XX.XX.XXX

If we look at the time in the alert mail, it was sent before pping detected that the server was down:
<alertengine@XXX.XXXX.no> 20.01.2006 10:35:22 >>> Service port on 158.XX.XX.XXX is down since 2006-01-20 10:34:34.
Peder
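To make the ordering of the timestamps easier to follow, the entries quoted in this thread can be sorted onto one timeline. The sketch below only uses timestamps that appear in the logs and the alert mail; the labels and the `timeline` helper are illustrative names, not part of NAV.

```python
from datetime import datetime

# Timestamps copied from servicemon.log, pping.log, eventEngine.log
# and the alert mail quoted in this thread.
EVENTS = [
    ("servicemon: timed out", "2006-01-20 10:34:06"),
    ("servicemon: DOWN, (113, 'No route to host')", "2006-01-20 10:34:34"),
    ("pping: marked as down", "2006-01-20 10:34:46"),
    ("eventEngine: Box going down", "2006-01-20 10:34:47"),
    ("alertengine: mail sent", "2006-01-20 10:35:22"),
    ("eventEngine: Box down (callback)", "2006-01-20 10:35:47"),
    ("pping: marked as up", "2006-01-20 10:36:46"),
]


def timeline(events):
    """Return (offset_in_seconds, label) pairs sorted by time."""
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), label)
        for label, ts in events
    )
    start = parsed[0][0]
    return [(int((t - start).total_seconds()), label) for t, label in parsed]


for offset, label in timeline(EVENTS):
    print(f"+{offset:>3d}s  {label}")
```

Read this way, servicemon flagged the service as down 12 seconds before pping marked the box as down, and the alert mail went out 25 seconds before eventEngine logged its "Box down" callback.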
From kaa at msu.net Mon Jan 23 16:00:56 2006
From: kaa at msu.net (Alexander Krapivin)
Date: Mon Jan 23 14:00:59 2006
Subject: [Nav-users] Errors while building NAV from sources
Message-ID: <43D4D388.2060801@msu.net>
Hello!

I'm trying to build NAV from the latest trunk sources. Everything is OK until 'make install'. Please advise me where I'm going wrong.

While doing 'make install' I got the following errors:

--cut here--
compile:
    [javac] Compiling 1 source file to /usr/local/src/nav/trunk/src/Database/build
    [javac] /usr/local/src/nav/trunk/src/Database/src/no/ntnu/nav/Database/Database.java:40: package no.ntnu.nav.logger does not exist
    [javac] import no.ntnu.nav.logger.Log;
    [javac] ^
    [javac] /usr/local/src/nav/trunk/src/Database/src/no/ntnu/nav/Database/Database.java:774: cannot find symbol
    [javac] symbol : variable Log
    [javac] location: class no.ntnu.nav.Database.Database
    [javac] Log.e("DATABASE-QUERY", "Got Exception; database is probably down: " + msg);
    [javac] ^
    [javac] /usr/local/src/nav/trunk/src/Database/src/no/ntnu/nav/Database/Database.java:1024: cannot find symbol
    [javac] symbol : variable Log
    [javac] location: class no.ntnu.nav.Database.Database
    [javac] Log.e("DATABASE-UPDATE", "Got Exception; database is probably down: " + msg);
    [javac] ^
    [javac] 3 errors

BUILD FAILED
/usr/local/src/nav/trunk/src/Database/build.xml:30: Compile failed; see the compiler error output for details.

Total time: 1 second
make[1]: *** [install] Error 1
make[1]: Leaving directory `/usr/local/src/nav/trunk/src'
make: *** [install-all] Error 1
--cut here--

--
With best regards,
Alexander Krapivin mailto:kaa@msu.net
NOC MSUNET ICQ UIN:3967345
MSU, Moscow, Russia
From magnus at ntnu.no Mon Jan 23 14:59:34 2006
From: magnus at ntnu.no (Magnus Nordseth)
Date: Mon Jan 23 14:59:37 2006
Subject: [Nav-users] Service port is down on 158.XX.XX.XXX
In-Reply-To: <s3d0de62.024@HVO-3.hivolda.no>
References: <s3d0de62.024@HVO-3.hivolda.no>
Message-ID: <20060123135934.GA9534@stud.ntnu.no>
Peder,

What are your pping and servicemon checkinterval values? Check pping.conf and servicemon.conf.

--
Magnus Nordseth
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc
From peder.sefland at hivolda.no Mon Jan 23 15:30:21 2006
From: peder.sefland at hivolda.no (Peder Magne Sefland)
Date: Mon Jan 23 15:30:48 2006
Subject: SV: Re: [Nav-users] Service port is down on 158.XX.XX.XXX
Message-ID: <s3d4f69e.082@HVO-3.hivolda.no>
pping.conf:

# How often do you want to ping
checkinterval = 20

servicemon.conf:

# How often do we want to check each service
checkinterval = 60

Peder
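The two settings above can be compared with a short sketch. It assumes the files are plain `key = value` lines with `#` comments and that both intervals are in seconds; neither assumption is confirmed in this thread, and `read_settings` is a hypothetical helper rather than NAV's own config reader.

```python
def read_settings(text):
    """Parse simple 'key = value' lines, ignoring '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


PPING_CONF = """# How often do you want to ping
checkinterval = 20
"""

SERVICEMON_CONF = """# How often do we want to check each service
checkinterval = 60
"""

pping = int(read_settings(PPING_CONF)["checkinterval"])
servicemon = int(read_settings(SERVICEMON_CONF)["checkinterval"])

# pping runs this many rounds for every servicemon round.
print(f"pping rounds per servicemon round: {servicemon // pping}")
```

With these values pping runs three rounds for each servicemon round. The intervals alone do not say how many missed pings pping requires before it marks a box as down.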