Hello all,
The metrics that we currently collect only keep increasing over time. We would like to separate the various parts of NAV onto different servers to spread the resource burden. I have explored the idea of moving PostgreSQL and Carbon to a different server. What would you recommend? Can I get away with just clustering Carbon?
Thanks,
William Daly
Network Specialist
Vacaville Unified School District
(707) 453-6170
On Thu, 16 Aug 2018 21:23:13 +0000 "William Daly - Network Specialist, VUSD Technology" WilliamD@VUSD.SolanoCOE.K12.CA.US wrote:
> The metrics that we currently collect only keep increasing over time. We would like to separate the various parts of NAV onto different servers to spread the resource burden.
I assume you've already gone down the path of multiprocess ipdevpoll?
> I have explored the idea of moving PostgreSQL and Carbon to a different server. What would you recommend? Can I get away with just clustering Carbon?
The largest installation under our control consists of three separate hardware servers: One for NAV, one for PostgreSQL, and one for Graphite/Carbon.
This split is simple to execute, and quickly improves the performance of large installations. We always recommend storing Graphite data on SSDs, and we never run fewer than two carbon-cache processes (fronted by carbon-relay) on a server. On a dedicated server, one carbon-cache process per core is probably a good goal.
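For illustration, the relevant carbon.conf sections for that kind of layout might look like the sketch below; the ports and instance names are examples only, to be adapted to the installation at hand:

  # Two carbon-cache instances, fronted by a carbon-relay that hashes metrics
  # between them. The relay keeps the default receiver ports, so NAV can keep
  # sending to port 2003 unchanged.
  [cache:a]
  LINE_RECEIVER_PORT = 2103
  PICKLE_RECEIVER_PORT = 2104
  CACHE_QUERY_PORT = 7002

  [cache:b]
  LINE_RECEIVER_PORT = 2203
  PICKLE_RECEIVER_PORT = 2204
  CACHE_QUERY_PORT = 7102

  [relay]
  LINE_RECEIVER_PORT = 2003
  PICKLE_RECEIVER_PORT = 2004
  RELAY_METHOD = consistent-hashing
  DESTINATIONS = 127.0.0.1:2104:a, 127.0.0.1:2204:b

  # Each instance is started separately, e.g.:
  #   carbon-cache.py --instance=a start
  #   carbon-cache.py --instance=b start
  #   carbon-relay.py start

graphite-web then needs to know about both cache instances (the CARBONLINK_HOSTS setting in local_settings.py), and on the NAV side, graphite.conf's [carbon] section should point at the relay while [graphiteweb] points at the web server on the Graphite box.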
We do have some future plans for extending the ipdevpoll multiprocess mode to include remote processes for either distributed collection or proxying, but at the moment I cannot say when we will have time to look at it.
Hello,
So I have five separate VMs running now. One is dedicated to NAV and PostgreSQL, with ipdevpoll running in multiprocess mode.
Another is dedicated to carbon-relay and the main Graphite web app. This small VM relays the metrics to three other servers running carbon-cache (each also fronted by carbon-relay). The three dedicated servers are older machines (a single quad-core CPU each), although I have added SSDs to them. This is why my installation is scaled in a more complicated manner. I do hope this all makes sense.
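(In carbon.conf terms, the fan-out on the relay VM is just consistent hashing across the three cache servers; the host names below are placeholders rather than my actual addresses:)

  [relay]
  LINE_RECEIVER_PORT = 2003
  PICKLE_RECEIVER_PORT = 2004
  RELAY_METHOD = consistent-hashing
  REPLICATION_FACTOR = 1
  # One destination per cache server; each entry points at the pickle port of
  # the carbon-relay running locally on that box.
  DESTINATIONS = cache1.example.org:2004, cache2.example.org:2004, cache3.example.org:2004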
Everything is now running great; however, there is one issue: when viewing the Netmap, the bandwidth usage does not update across all links. Some links show a color for the corresponding load, but others show as grey and never update.
Of course, this issue arose after I moved Graphite/Carbon onto four separate servers.
Thank you all for your time.
Hello,
I have attached a copy of the error I am receiving. I have tried researching the issue, but I am lost as to where to look. Perhaps one of you can point me in the right direction.
Internal Server Error: /netmap/traffic/layer2/
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 111, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/decorators/cache.py", line 52, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/decorators/csrf.py", line 57, in wrapped_view
    return view_func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/generic/base.py", line 69, in view
    return self.dispatch(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/rest_framework/views.py", line 403, in dispatch
    response = self.handle_exception(exc)
  File "/usr/lib/python2.7/dist-packages/rest_framework/views.py", line 400, in dispatch
    response = handler(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nav/web/netmap/views.py", line 110, in get
    traffic = get_layer2_traffic(roomid)
  File "/usr/lib/python2.7/dist-packages/nav/web/netmap/cache.py", line 43, in get_traffic
    result = func(location_or_room_id)
  File "/usr/lib/python2.7/dist-packages/nav/web/netmap/graph.py", line 175, in get_layer2_traffic
    get_traffic_interfaces(edges, interfaces))
  File "/usr/lib/python2.7/dist-packages/nav/netmap/traffic.py", line 105, in get_traffic_for
    data = get_metric_average(sorted(targets), start=TRAFFIC_TIMEPERIOD)
  File "/usr/lib/python2.7/dist-packages/nav/metrics/data.py", line 46, in get_metric_average
    data = get_metric_data(target, start, end)
  File "/usr/lib/python2.7/dist-packages/nav/metrics/data.py", line 122, in get_metric_data
    "{0} is unreachable".format(base), err)
GraphiteUnreachableError: http://10.170.199.21:80/ is unreachable (HTTP Error 400: Bad Request)
Request repr(): <WSGIRequest path:/netmap/traffic/layer2/, GET:<QueryDict: {u'_': [u'1536697886508']}>, POST:<QueryDict: {}>, COOKIES:{'nav_sessionid': 'wcfgw8hkv5k0ig9vquadgbldzgf0pakp'}, META:{'CONTEXT_DOCUMENT_ROOT': '/usr/share/nav/htdocs', 'CONTEXT_PREFIX': '', 'DOCUMENT_ROOT': '/usr/share/nav/htdocs', 'GATEWAY_INTERFACE': 'CGI/1.1', 'HTTP_ACCEPT': 'application/json, text/javascript, */*; q=0.01', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.9', 'HTTP_CONNECTION': 'keep-alive', 'HTTP_COOKIE': 'nav_sessionid=wcfgw8hkv5k0ig9vquadgbldzgf0pakp', 'HTTP_HOST': '10.170.199.20', 'HTTP_REFERER': 'http://10.170.199.20/netmap/', 'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36', 'HTTP_X_REQUESTED_WITH': 'XMLHttpRequest', 'PATH_INFO': u'/netmap/traffic/layer2/', 'PATH_TRANSLATED': '/usr/share/pyshared/nav/wsgi.py/netmap/traffic/layer2/', 'QUERY_STRING': '_=1536697886508', 'REMOTE_ADDR': '10.170.86.209', 'REMOTE_PORT': '21854', 'REQUEST_METHOD': 'GET', 'REQUEST_SCHEME': 'http', 'REQUEST_URI': '/netmap/traffic/layer2/?_=1536697886508', 'SCRIPT_FILENAME': '/usr/share/pyshared/nav/wsgi.py', 'SCRIPT_NAME': u'', 'SERVER_ADDR': '10.170.199.20', 'SERVER_ADMIN': 'williamd@vacavilleusd.org', 'SERVER_NAME': '10.170.199.20', 'SERVER_PORT': '80', 'SERVER_PROTOCOL': 'HTTP/1.1', 'SERVER_SIGNATURE': '<address>Apache/2.4.10 (Debian) Server at 10.170.199.20 Port 80</address>\n', 'SERVER_SOFTWARE': 'Apache/2.4.10 (Debian)', 'apache.version': (2, 4, 10), 'mod_wsgi.application_group': '', 'mod_wsgi.callable_object': 'application', 'mod_wsgi.daemon_connects': '1', 'mod_wsgi.daemon_restarts': '0', 'mod_wsgi.daemon_start': '1536699753682271', 'mod_wsgi.enable_sendfile': '0', 'mod_wsgi.handler_script': '', 'mod_wsgi.input_chunked': '0', 'mod_wsgi.listener_host': '', 'mod_wsgi.listener_port': '80', 'mod_wsgi.process_group': 'NAV', 'mod_wsgi.queue_start': '1536699753681908', 'mod_wsgi.request_handler': 'wsgi-script', 'mod_wsgi.request_start': '1536699753681569', 'mod_wsgi.script_reloading': '1', 'mod_wsgi.script_start': '1536699753682399', 'mod_wsgi.version': (4, 3, 0), 'wsgi.errors': <mod_wsgi.Log object at 0x7fb870651db0>, 'wsgi.file_wrapper': <type 'mod_wsgi.FileWrapper'>, 'wsgi.input': <mod_wsgi.Input object at 0x7fb85aa533b0>, 'wsgi.multiprocess': True, 'wsgi.multithread': True, 'wsgi.run_once': False, 'wsgi.url_scheme': 'http', 'wsgi.version': (1, 0)}>
On Tue, 11 Sep 2018 22:28:57 +0000 "William Daly - Network Specialist, VUSD Technology" WilliamD@VUSD.SolanoCOE.K12.CA.US wrote:
> I have attached a copy of the error I am receiving. I have tried researching the issue, but I am lost as to where to look. Perhaps one of you can point me in the right direction.
>
> [snip]
>
> GraphiteUnreachableError: http://10.170.199.21:80/ is unreachable (HTTP Error 400: Bad Request)
This means the (graphite) web server thinks the request NAV sent was a bad HTTP request.
You should check the graphite-web server's logs to see what the actual request was. NAV may also have logged what it did (which should have ended up in your Apache error log, in that case).
Newer graphite-web versions have become more picky about what requests they will accept, so it would be interesting to know the actual request NAV produced.
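One quick way to see how graphite-web answers outside of NAV is to replay a similar render request by hand. A rough sketch (the metric path and time window here are only examples; substitute a target that actually exists in your Graphite tree):

  # Replay a render request against the Graphite web server and inspect the
  # response status. The example metric path below is illustrative only.
  import requests

  url = "http://10.170.199.21/render"
  params = {
      "target": "nav.devices.example-sw_example_org.ports.Gi1_0_1.ifInOctets",
      "from": "-10min",
      "format": "json",
  }
  response = requests.get(url, params=params)
  print(response.status_code)
  print(response.text[:500])

Keep in mind that the netmap bundles a large number of target parameters into a single request, so a one-target replay may not trigger the same 400; the request logged on the Graphite side is still the best thing to compare against.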
We're seeing a lot of these errors too.
Andreas Dobloug
USIT/UiO