Hello,
I am currently experiencing an issue when multicore support is enabled for ipdevpoll, and I am encountering many errors.
Essentially, when I enable multi-core support, the web GUI frequently fails to render graphs and pages do not load. The logs show Django errors. As soon as multi-core support is turned off, everything returns to normal; however, my graphs become very spotty.
Thanks,
Will
On Wed, 11 Oct 2017 17:35:37 +0000 "William Daly - Network Specialist, VUSD Technology" WilliamD@VUSD.SolanoCOE.K12.CA.US wrote:
> I am currently experiencing an issue when multicore support is enabled for ipdevpoll, and I am encountering many errors.
> Essentially, when I enable multi-core support, the web GUI frequently fails to render graphs and pages do not load. The logs show Django errors. As soon as multi-core support is turned off, everything returns to normal; however, my graphs become very spotty.
Hi,
The ipdevpoll multiprocess mode is still considered somewhat experimental. It is not a good idea to enable it without adjusting its settings.
If you run ipdevpoll with `-m`, but don't specify a number of worker processes, it will automatically detect the number of cores and start a corresponding number of workers (which isn't necessarily a good idea). Each of those workers will start with a thread pool size of 10, meaning they will have 10 separate connections to PostgreSQL each. By default, PostgreSQL is configured to accept a maximum of 100 connections (having too many connections will slow things down).
So, you see, if you have a lot of cores, you will quickly eat up all the available database connections using ipdevpoll alone (8 cores will cause it to grab 80 connections!), and leave none for the other NAV daemons and the NAV web interface.
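If you want to put concrete numbers on that for your own server, here is a rough back-of-the-envelope sketch in plain Python. It uses the defaults discussed above (one worker per detected core, 10 connections per worker, PostgreSQL's default limit of 100); these are illustrative values, not numbers read from your configuration, so adjust them to match your setup:

    # Rough estimate of how many PostgreSQL connections ipdevpoll may grab in
    # multiprocess mode. The numbers are the defaults discussed above, not
    # values read from your configuration; adjust them to match your setup.
    import multiprocessing

    workers = multiprocessing.cpu_count()  # "-m" with no count starts one worker per core
    threads_per_worker = 10                # default thread pool size per worker
    max_connections = 100                  # PostgreSQL's default connection limit

    ipdevpoll_connections = workers * threads_per_worker
    remaining = max_connections - ipdevpoll_connections

    print("ipdevpoll alone may hold up to %d connections" % ipdevpoll_connections)
    print("leaving %d of %d for the web interface and the other NAV daemons"
          % (remaining, max_connections))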
These are just general observations - since you provide no log excerpts or other details, I cannot say for sure what your exact problem is.
Hello,
Thank you for the assistance.
Below is a copy of one of the errors:
Internal Server Error: /graphite/render
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 87, in get_response
    response = middleware_method(request)
  File "/usr/lib/python2.7/dist-packages/nav/django/auth.py", line 45, in process_request
    if ACCOUNT_ID_VAR not in session:
  File "/usr/lib/python2.7/dist-packages/django/contrib/sessions/backends/base.py", line 46, in __contains__
    return key in self._session
  File "/usr/lib/python2.7/dist-packages/django/contrib/sessions/backends/base.py", line 182, in _get_session
    self._session_cache = self.load()
  File "/usr/lib/python2.7/dist-packages/django/contrib/sessions/backends/db.py", line 21, in load
    expire_date__gt=timezone.now()
  File "/usr/lib/python2.7/dist-packages/django/db/models/manager.py", line 92, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 351, in get
    num = len(clone)
  File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 122, in __len__
    self._fetch_all()
  File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 966, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 265, in iterator
    for row in compiler.results_iter():
  File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 701, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 785, in execute_sql
    cursor = self.connection.cursor()
  File "/usr/lib/python2.7/dist-packages/django/db/backends/__init__.py", line 167, in cursor
    cursor = utils.CursorWrapper(self._cursor(), self)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/__init__.py", line 138, in _cursor
    self.ensure_connection()
  File "/usr/lib/python2.7/dist-packages/django/db/backends/__init__.py", line 133, in ensure_connection
    self.connect()
  File "/usr/lib/python2.7/dist-packages/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/__init__.py", line 133, in ensure_connection
    self.connect()
  File "/usr/lib/python2.7/dist-packages/django/db/backends/__init__.py", line 122, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/postgresql_psycopg2/base.py", line 130, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/usr/lib/python2.7/dist-packages/psycopg2/__init__.py", line 164, in connect
    conn = _connect(dsn, connection_factory=connection_factory, async=async)
OperationalError: FATAL: remaining connection slots are reserved for non-replication superuser connections
Request repr(): <WSGIRequest path:/graphite/render, GET:<QueryDict: {u'from': [u'-1day'], u'target': [u'alias(scaleToSeconds(nonNegativeDerivative(scale(nav.devices.esc-main-router.ports.A1.ifInOctets,8)),1),"In")', u'alias(scaleToSeconds(nonNegativeDerivative(scale(nav.devices.esc-main-router.ports.A1.ifOutOctets,8)),1),"Out")'], u'title': [u'Traffic on esc-main-router:A1 (10Gb to Comcast-Sites)'], u'height': [u'250'], u'width': [u'465'], u'template': [u'nav'], u'vtitle': [u'bits/s'], u'until': [u'now']}>, POST:<QueryDict: {}>, COOKIES:{'nav_sessionid': 'd9eq8pbvd8cpnxg1yz9t2tl4g5blshby'}, META:{'CONTEXT_DOCUMENT_ROOT': '/usr/share/nav/htdocs', 'CONTEXT_PREFIX': '', 'DOCUMENT_ROOT': '/usr/share/nav/htdocs', 'GATEWAY_INTERFACE': 'CGI/1.1', 'HTTP_ACCEPT': 'image/webp,image/apng,image/*,*/*;q=0.8', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.8', 'HTTP_CONNECTION': 'keep-alive', 'HTTP_COOKIE': 'nav_sessionid=d9eq8pbvd8cpnxg1yz9t2tl4g5blshby', 'HTTP_HOST': '10.170.199.20', 'HTTP_REFERER': 'http://10.170.199.20/', 'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36', 'PATH_INFO': u'/graphite/render', 'PATH_TRANSLATED': '/usr/share/pyshared/nav/wsgi.py/graphite/render', 'QUERY_STRING': 'from=-1day&target=alias%28scaleToSeconds%28nonNegativeDerivative%28scale%28nav.devices.esc-main-router.ports.A1.ifInOctets%2C8%29%29%2C1%29%2C%22In%22%29&target=alias%28scaleToSeconds%28nonNegativeDerivative%28scale%28nav.devices.esc-main-router.ports.A1.ifOutOctets%2C8%29%29%2C1%29%2C%22Out%22%29&title=Traffic+on+esc-main-router%3AA1+%2810Gb+to+Comcast-Sites%29&height=250&width=465&template=nav&vtitle=bits%2Fs&until=now', 'REMOTE_ADDR': '10.170.86.209', 'REMOTE_PORT': '56455', 'REQUEST_METHOD': 'GET', 'REQUEST_SCHEME': 'http', 'REQUEST_URI': '/graphite/render?from=-1day&target=alias%28scaleToSeconds%28nonNegativeDerivative%28scale%28nav.devices.esc-main-router.ports.A1.ifInOctets%2C8%29%29%2C1%29%2C%22In%22%29&target=alias%28scaleToSeconds%28nonNegativeDerivative%28scale%28nav.devices.esc-main-router.ports.A1.ifOutOctets%2C8%29%29%2C1%29%2C%22Out%22%29&title=Traffic+on+esc-main-router%3AA1+%2810Gb+to+Comcast-Sites%29&height=250&width=465&template=nav&vtitle=bits%2Fs&until=now', 'SCRIPT_FILENAME': '/usr/share/pyshared/nav/wsgi.py', 'SCRIPT_NAME': u'', 'SERVER_ADDR': '10.170.199.20', 'SERVER_ADMIN': 'webmaster@localhost', 'SERVER_NAME': '10.170.199.20', 'SERVER_PORT': '80', 'SERVER_PROTOCOL': 'HTTP/1.1', 'SERVER_SIGNATURE': '<address>Apache/2.4.10 (Debian) Server at 10.170.199.20 Port 80</address>\n', 'SERVER_SOFTWARE': 'Apache/2.4.10 (Debian)', 'apache.version': (2, 4, 10), 'mod_wsgi.application_group': '', 'mod_wsgi.callable_object': 'application', 'mod_wsgi.daemon_connects': '1', 'mod_wsgi.daemon_restarts': '0', 'mod_wsgi.daemon_start': '1507916018443386', 'mod_wsgi.enable_sendfile': '0', 'mod_wsgi.handler_script': '', 'mod_wsgi.input_chunked': '0', 'mod_wsgi.listener_host': '', 'mod_wsgi.listener_port': '80', 'mod_wsgi.process_group': 'NAV', 'mod_wsgi.queue_start': '1507916018443231', 'mod_wsgi.request_handler': 'wsgi-script', 'mod_wsgi.request_start': '1507916018443054', 'mod_wsgi.script_reloading': '1', 'mod_wsgi.script_start': '1507916018443415', 'mod_wsgi.version': (4, 3, 0), 'wsgi.errors': <mod_wsgi.Log object at 0x7fc85fd7a978>, 'wsgi.file_wrapper': <type 'mod_wsgi.FileWrapper'>, 'wsgi.input': <mod_wsgi.Input object at 0x7fc8644419b0>, 'wsgi.multiprocess': True, 'wsgi.multithread': True, 
'wsgi.run_once': False, 'wsgi.url_scheme': 'http', 'wsgi.version': (1, 0)}>
-----Original Message-----
From: Morten Brekkevold [mailto:morten.brekkevold@uninett.no]
Sent: Friday, October 13, 2017 3:32 AM
To: William Daly - Network Specialist, VUSD Technology
Cc: nav-users@uninett.no
Subject: Re: Multicore support IPdevpool and Django Errors
> On Wed, 11 Oct 2017 17:35:37 +0000 "William Daly - Network Specialist, VUSD Technology" WilliamD@VUSD.SolanoCOE.K12.CA.US wrote:
>> I am currently experiencing an issue when multicore support is enabled for ipdevpoll, and I am encountering many errors.
>> Essentially, when I enable multi-core support, the web GUI frequently fails to render graphs and pages do not load. The logs show Django errors. As soon as multi-core support is turned off, everything returns to normal; however, my graphs become very spotty.
> Hi,
> The ipdevpoll multiprocess mode is still considered somewhat experimental. It is not a good idea to enable it without adjusting its settings.
> If you run ipdevpoll with `-m`, but don't specify a number of worker processes, it will automatically detect the number of cores and start a corresponding number of workers (which isn't necessarily a good idea). Each of those workers will start with a thread pool size of 10, meaning they will have 10 separate connections to PostgreSQL each. By default, PostgreSQL is configured to accept a maximum of 100 connections (having too many connections will slow things down).
> So, you see, if you have a lot of cores, you will quickly eat up all the available database connections using ipdevpoll alone (8 cores will cause it to grab 80 connections!), and leave none for the other NAV daemons and the NAV web interface.
> These are just general observations - since you provide no log excerpts or other details, I cannot say for sure what your exact problem is.
> --
> Morten Brekkevold
> UNINETT
On Fri, 13 Oct 2017 17:36:53 +0000 "William Daly - Network Specialist, VUSD Technology" WilliamD@VUSD.SolanoCOE.K12.CA.US wrote:
> OperationalError: FATAL: remaining connection slots are reserved for non-replication superuser connections
As I suspected, you need to look at controlling the number of ipdevpoll processes and their threadpool sizes (you could also increase the number of allowed connections in PostgreSQL, but I would not normally recommend that).
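If you want to see for yourself which users and applications are occupying the connection slots (before and after tuning ipdevpoll), something along these lines will show the breakdown. This is only a quick sketch using psycopg2; the database name and credentials are placeholders you will need to replace with your own:

    # Quick sketch: list who is holding PostgreSQL connections and what the
    # server's limit is. The connection parameters below are placeholders;
    # replace them with credentials that are allowed to read pg_stat_activity.
    import psycopg2

    conn = psycopg2.connect(dbname="nav", user="postgres")
    cur = conn.cursor()

    cur.execute("SHOW max_connections")
    print("max_connections = %s" % cur.fetchone()[0])

    cur.execute("""
        SELECT usename, application_name, count(*)
          FROM pg_stat_activity
         GROUP BY usename, application_name
         ORDER BY count(*) DESC
    """)
    for user, app, count in cur.fetchall():
        print("%-16s %-32s %3d" % (user or "-", app or "-", count))

    cur.close()
    conn.close()

If the ipdevpoll user accounts for nearly all of max_connections, that confirms the picture described above.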