Hi,
due to a recently disclosed vulnerability in glibc, we will be rebooting the Feide production servers between 10:00 and 12:00 today. We are not aware of any applications on the login servers that are vulnerable to the issue, but we will reboot the servers to be certain.
Since the servers will be rebooted one by one, the reboots will not directly affect user logins. However, because all Feide session data is stored in memory, users who logged in before the reboot will lose their single sign-on session and must enter their username and password the next time they access the Feide login page.
For more information about the vulnerability, see the following article at Ars Technica:
http://arstechnica.com/security/2015/01/highly-critical-ghost-allowing-code-...
Best regards, Olav Morken UNINETT / Feide
Hi,
unfortunately something went wrong during the update: when one of the servers was taken down for reboot, the load balancers stopped distributing traffic to the other servers.
The consequence was that Feide login was unavailable from 10:13:30 until 10:21:30, at which point we switched to our backup system. We are currently running on the backup system, and will continue to do so until the problems on the production system are diagnosed.
Best regards, Olav Morken UNINETT / Feide
Hi,
we switched back to our normal production system at 11:50 today. The cause of the error appears to be a bug in the load balancers.
The timeline of the error is something like:
1. feide-prod02 is removed from the load balancer at 10:00:04.
2. feide-prod02 is upgraded & rebooted.
3. feide-prod04 is removed from the load balancer at 10:06:15.
4. feide-prod02 is added back to the load balancer at 10:07:10.
5. feide-prod04 is upgraded.
6. feide-prod04 is rebooted at 10:13.
7. The load balancer fails at 10:13:42.
8. We switch to the backup system at 10:21:33.
When the load balancer failed, it stopped passing traffic to any of the backend servers. However, it didn't fail completely, so the standby load balancer didn't take over automatically.
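As a quick check of the timeline, the length of the outage can be computed from the two timestamps given above. This is only a small Python sketch; the helper name `elapsed` is made up for illustration.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# Timestamps copied from the timeline: load balancer failure and
# the switch to the backup system.
outage = elapsed("10:13:42", "10:21:33")
print(outage)  # 0:07:51
```

This gives just under eight minutes, which matches the 10:13:30 to 10:21:30 window reported earlier.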
Best regards, Olav Morken UNINETT / Feide
Hi,
a delayed update regarding the cause of the error on the production system on Wednesday 28 January:
The cause was not, as we believed earlier, an error in the load balancer. Instead, the problem was caused by feide-prod04 "stealing" the IP address of the load balancer when it was rebooted.
Due to the way our load balancer works, the public IP of idp.feide.no is configured on the loopback interface on each of the production servers. During normal operation, these servers will not respond directly to ARP requests from the routers for this address, so all traffic will go via the load balancer.
Unfortunately, due to a configuration problem, there is a tiny window during system startup where the servers can respond to ARP requests for this address.
If we are unlucky, the routers may send an ARP request during this window. When they receive an ARP response from the production server, they start sending traffic directly to that server instead of to the load balancers.
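To illustrate the mechanism, here is a minimal Python sketch. It assumes a deliberately simplified router that forwards traffic for the service address to whichever MAC address answered its most recent ARP request. The MAC addresses, event labels, and function names are all hypothetical and are not taken from our systems.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical MAC addresses, for illustration only.
LOAD_BALANCER_MACS = {"02:00:00:00:00:01", "02:00:00:00:00:02"}
REBOOTED_SERVER_MAC = "02:00:00:00:00:44"

# (label for when the router saw the reply, MAC that sent the reply)
ArpReply = Tuple[str, str]


def current_next_hop(history: Sequence[ArpReply]) -> Optional[str]:
    """Simplified router model: the most recent ARP reply wins."""
    return history[-1][1] if history else None


def suspicious_replies(history: Sequence[ArpReply]) -> List[ArpReply]:
    """Replies for the service address that did not come from a load balancer."""
    return [(when, mac) for when, mac in history if mac not in LOAD_BALANCER_MACS]


# Made-up history: the load balancer answers first, then a rebooting
# server answers during its startup window.
history = [
    ("before reboot", "02:00:00:00:00:01"),
    ("during startup window", REBOOTED_SERVER_MAC),
]

print(current_next_hop(history))    # 02:00:00:00:00:44
print(suspicious_replies(history))  # [('during startup window', '02:00:00:00:00:44')]
```

In this model a single stray reply is enough to redirect all traffic to one server, bypassing the load balancers until the router's entry is refreshed.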
Based on the MAC address history of the routers, this is the most likely cause of the incident. To prevent it from occurring again, the most promising solution is to make sure that the configuration that prevents the ARP responses from being sent is applied before the network link is brought up.
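The ordering argument can be sketched in Python as follows. This is a model only: the three step names are hypothetical stand-ins, and the actual boot sequence on our servers is not described here. The function reports whether any point in a boot order has the address configured and the link up while ARP replies are not yet suppressed.

```python
from typing import Sequence

# Hypothetical step names, for illustration only.
ADD_ADDRESS = "configure service address on loopback"
LINK_UP = "bring network link up"
SUPPRESS_ARP = "apply configuration that prevents ARP replies"


def has_arp_window(boot_order: Sequence[str]) -> bool:
    """Return True if the server could answer ARP for the service address
    at some point during the given boot order."""
    done = set()
    for step in boot_order:
        done.add(step)
        exposed = ADD_ADDRESS in done and LINK_UP in done
        if exposed and SUPPRESS_ARP not in done:
            return True
    return False


# Assumed problematic order: the suppression is applied last.
print(has_arp_window([ADD_ADDRESS, LINK_UP, SUPPRESS_ARP]))  # True

# Proposed order: the suppression is applied before the link comes up.
print(has_arp_window([SUPPRESS_ARP, ADD_ADDRESS, LINK_UP]))  # False
```

As long as the suppression step comes before the link is brought up, the model never reaches an exposed state, regardless of when the address is configured.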
Best regards, Olav Morken UNINETT / Feide
On Wed, Jan 28, 2015 at 15:01:55 +0100, Olav Morken wrote:
Hi,
we switched back to our normal production system at 11:50 today. The cause of the error appears to be a bug in the load balancers.
The timeline of the error is something like:
- feide-prod02 is removed from the load balancer at 10:00:04.
- feide-prod02 is upgraded & rebooted.
- feide-prod04 is removed from the load balancer at 10:06:15.
- feide-prod02 is added back to the load balancer at 10:07:10.
- feide-prod04 is upgraded.
- feide-prod04 is rebooted at 10:13.
- The load balancer fails at 10:13:42.
- We switch to the backup system at 10:21:33.
When the load balancer failed, it stopped passing traffic to any of the backend servers. However, it didn't fail completely, so the standby load balancer didn't take over automatically.
Best regards, Olav Morken UNINETT / Feide
On Wed, Jan 28, 2015 at 10:42:36 +0100, Olav Morken wrote:
Hi,
unfortunately something went wrong during the update: when one of the servers was taken down for reboot, the load balancers stopped distributing traffic to the other servers.
The consequence was that Feide login was unavailable from 10:13:30 until 10:21:30, at which point we switched to our backup system. We are currently running on the backup system, and will continue to do so until the problems on the production system are diagnosed.
Best regards, Olav Morken UNINETT / Feide
On Wed, Jan 28, 2015 at 09:05:04 +0100, Olav Morken wrote:
Hi,
due to a recently disclosed vulnerability in glibc, we will be rebooting the Feide production servers between 10:00 and 12:00 today. We are not aware of any applications on the login servers that are vulnerable to the issue, but we will reboot the servers to be certain.
Since the servers will be rebooted one by one, the reboots will not directly affect user logins. However, because all Feide session data is stored in memory, users who logged in before the reboot will lose their single sign-on session and must enter their username and password the next time they access the Feide login page.
For more information about the vulnerability, see the following article at Ars Technica:
http://arstechnica.com/security/2015/01/highly-critical-ghost-allowing-code-...
Best regards, Olav Morken UNINETT / Feide