Hi,
we switched back to our normal production system at 11:50 today. The cause of the error appears to be a bug in the load balancers.
The timeline of the error is something like:
1. feide-prod02 is removed from the load balancer at 10:00:04.
2. feide-prod02 is upgraded & rebooted.
3. feide-prod04 is removed from the load balancer at 10:06:15.
4. feide-prod02 is added back to the load balancer at 10:07:10.
5. feide-prod04 is upgraded.
6. feide-prod04 is rebooted at 10:13.
7. The load balancer fails at 10:13:42.
8. We switch to the backup system at 10:21:33.
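For readers who want the durations implied by these timestamps, here is a small sketch that computes them. The helper name `seconds_between` and the variable names are only illustrative; the timestamps are the ones listed above, all assumed to fall on the same day.

```python
from datetime import datetime

def seconds_between(start: str, end: str) -> int:
    """Return the number of seconds from start to end (both HH:MM:SS, same day)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

# Timestamps taken from the timeline above.
prod02_removed = "10:00:04"
prod04_removed = "10:06:15"
prod02_added_back = "10:07:10"
lb_failed = "10:13:42"
switched_to_backup = "10:21:33"

# Time from the load balancer failure to the switch to the backup system.
outage = seconds_between(lb_failed, switched_to_backup)

# Time feide-prod02 spent out of the load balancer.
prod02_out = seconds_between(prod02_removed, prod02_added_back)

# Window in which both feide-prod02 and feide-prod04 were out of the load balancer.
both_out = seconds_between(prod04_removed, prod02_added_back)

for label, secs in [("outage", outage), ("prod02 out", prod02_out), ("both out", both_out)]:
    minutes, seconds = divmod(secs, 60)
    print(f"{label}: {minutes} min {seconds} s")
```

This gives 7 min 51 s between the failure and the switch, which matches the roughly eight minutes of downtime reported in the earlier message.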
When the load balancer failed, it stopped passing traffic to any of the backend servers. However, it didn't fail completely, so the standby load balancer didn't take over automatically.
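To illustrate the difference between "failed completely" and "stopped passing traffic", here is a minimal sketch of two failover rules. It is not a description of how our load balancers actually decide; the function names and the two inputs (`process_alive`, `healthy_backends`) are hypothetical and only serve to show why a liveness-only check would miss this kind of failure.

```python
def liveness_only_failover(process_alive: bool) -> bool:
    """Hypothetical rule: the standby takes over only if the active balancer is down."""
    return not process_alive

def end_to_end_failover(process_alive: bool, healthy_backends: int) -> bool:
    """Hypothetical rule: also take over if the active balancer is up
    but is not passing traffic to any backend server."""
    return (not process_alive) or healthy_backends == 0

# The situation described above: the balancer is still up,
# but no backend server is receiving traffic.
print(liveness_only_failover(process_alive=True))
print(end_to_end_failover(process_alive=True, healthy_backends=0))
```

Under these assumptions the first rule never triggers in the situation we saw, while the second one does. A rule like the second would also need care during planned maintenance, when several backends are intentionally out of rotation.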
Best regards, Olav Morken UNINETT / Feide
On Wed, Jan 28, 2015 at 10:42:36 +0100, Olav Morken wrote:
Hi,
unfortunately something went wrong during the update, and as one of the servers was taken down for reboot, the load balancers stopped distributing traffic to the other servers.
The consequence was that Feide login was unavailable from 10:13:30 until 10:21:30, at which point we switched to our backup system. We are currently running on the backup system, and will continue to do so until the problems on the production system are diagnosed.
Best regards, Olav Morken UNINETT / Feide
On Wed, Jan 28, 2015 at 09:05:04 +0100, Olav Morken wrote:
Hi,
due to a recently disclosed vulnerability in glibc, we will be rebooting the Feide production servers between 10:00 and 12:00 today. We are not aware of any applications on the login servers that are vulnerable to the issue, but to be certain the servers will be rebooted.
Since the servers will be rebooted one-by-one, the reboots will not directly affect user logins. However, because all Feide session data is stored in-memory, users who have logged in before the reboot will lose their single sign-on session and must enter their username and password the next time they access the Feide login page.
For more information about the vulnerability, see the following article at Ars Technica:
http://arstechnica.com/security/2015/01/highly-critical-ghost-allowing-code-...
Best regards, Olav Morken UNINETT / Feide