Server Outage Report, 10/1/2014

The recent outage was a doozy, lasting over 12 hours. This entry details what happened and what steps we are taking to mitigate the effect on our cloud beta testers. First, I received this email an hour before the servers went down:

Your security is important to use and we are therefore running an urgent
security update on our 1&1 DCS Servers this week. This is update is
necessary because a new vulnerability was discovered in the Citrix and Xen
Hypervisor, affecting all DCS servers.

We will begin resolving this issue shortly.

What this means for you: During the update, your server will be unavailable
between 10 minutes and 2 hours. Data loss is highly unlikely. We apologize
that we cannot give you a definite time frame for the maintenance of your
system.

Please also note: This vulnerability is not related to the current Shell
Shock vulnerability.

For more information about the vulnerability, refer to number XSA-108 on
the following page:
http://xenbits.xen.org/xsa/

We apologize for any inconvenience this may cause you.

In short, the vulnerability was leaked to security companies before it was made public, and the security people at the server company and their upstream data providers pulled the plug to prevent data theft of epic proportions.

What we learned, and what we’re changing:

  • The cloud will not progress out of beta until we have a fail-over server. With fail-over in place, you’ll never know one server is down unless both are down simultaneously. That is unlikely: before this outage, the average downtime for Cloud1.Amsoftcloud.com was under 300 seconds per month going back to Jan 1, 2014, including scheduled middle-of-the-night reboots after critical updates.
  • You need a quick way to know what is going on. To that end, I built http://status.myamsoft.com. At the very top of the page is a red or green light with a status message for the server. The site is phone- and tablet-friendly. In an upcoming Amsoft update, we will build a cloud status indicator into the program.
  • I need a quick way to tell you what is going on. To that end, I will be collecting an email address or a mobile number for text messages from each cloud user so that I can get a message to everyone. I will gather this information in the coming week and have it ready for any future outages.
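
To put the fail-over reasoning in numbers, here is a rough back-of-the-envelope sketch in Python. It takes the 300 seconds per month figure from above as an upper bound and makes two assumptions that are mine, not measurements: a 30-day month, and a second server with the same downtime record that fails independently of the first. The function names are made up for illustration.

```python
# Rough estimate only. Assumes a 30-day month and that the two servers
# fail independently of each other (an optimistic assumption).
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def downtime_fraction(down_seconds, period_seconds=SECONDS_PER_MONTH):
    """Fraction of the period during which one server is down."""
    return down_seconds / period_seconds


def both_down_fraction(fraction_a, fraction_b):
    """Fraction of the period both servers are down at once,
    assuming their outages are independent."""
    return fraction_a * fraction_b


single = downtime_fraction(300)              # 300 s/month, the bound quoted above
overlap = both_down_fraction(single, single)

print(f"One server up:     {100 * (1 - single):.4f}% of the time")   # 99.9884%
print(f"Both down at once: {overlap * SECONDS_PER_MONTH:.3f} s/month")  # 0.035
```

Under those assumptions, the expected overlap works out to a small fraction of a second per month. The independence assumption is the weak point: an event that hits both servers at the same time would not follow this math.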
