A Lesson in Network Redundancy with Healthcare.gov

The new government healthcare website, Healthcare.gov, has recently been featured heavily on news networks for the myriad problems that have repeatedly plagued the site. One of those problems was the service outage at the end of October, when its web host, Verizon Terremark, experienced a networking component failure. The outage prevented consumers from using the site’s application and enrollment system and further exacerbated the public’s already negative experience with the site. This outage is a cautionary tale for web hosts everywhere and underscores the importance of network redundancy and transparency policies in a data center. When choosing a web host, keeping a few best practices in mind can help you avoid similarly damaging outages.

While we can only speculate as to the nature of the failed networking component at Terremark’s data center, there are a number of redundancies that web hosts should build into their networks to minimize the risk of such issues. A few things to look for include:

  • Multiple transit providers: Multiple providers come into the data center through border routers, with a full view of the Internet from each provider. Because of this, traffic can easily fail over between providers if needed.
  • Redundant fiber lines: To prevent a fiber cut from causing an outage, fiber lines should follow different physical paths between data centers, and from a major Point of Presence (PoP; Chicago for Liquid Web) to each data center.
  • Redundant core routers: Each data center should have multiple core routers so that traffic can fail over if one of them goes down.
  • Redundant distribution routers: Within the data center, multiple distribution routers for each section provide failover capabilities.
  • Redundant Ethernet links to each rack switch: Multiple uplinks for each rack switch add redundancy and additional capacity.
  • Redundant power supplies: Redundant power to equipment prevents outages and keeps the network running.
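To see why doubling up on components matters, here is a rough back-of-the-envelope sketch in Python. It assumes that components fail independently of one another, which real hardware sharing the same power or cabling often does not, and the function names and the 99% figure are made up for illustration rather than taken from any real network.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of a group of redundant components.

    Assumes failures are independent: the group is down only when
    every copy is down at the same time.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    return 1.0 - (1.0 - component_availability) ** copies


def serial_availability(layer_availabilities) -> float:
    """Availability of a chain of layers that must all be up at once."""
    result = 1.0
    for availability in layer_availabilities:
        result *= availability
    return result


if __name__ == "__main__":
    # Hypothetical figure: a single router that is up 99% of the time.
    single = 0.99
    pair = parallel_availability(single, copies=2)
    print(f"one router:  {single:.4%}")
    print(f"two routers: {pair:.4%}")
    # Two redundant layers (for example, core and distribution) in series.
    print(f"two layers:  {serial_availability([pair, pair]):.4%}")
```

Under these assumptions, a pair of 99% components is unavailable only 0.01% of the time, which is why redundancy at every layer in the list above adds up quickly.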

There are additional redundancies that may not come with every hosting plan but are well worth the cost of upgrading. Whether you need these added features depends on your individual requirements. Such features include:

  • Firewalls: Support redundant uplinks and Active/Standby configurations
  • Load Balancers: Distribute traffic across a cluster of servers with failover abilities
  • Global Load Balancing: Distributes traffic across multiple locations and data centers based on availability and performance
  • Switches: Multiple switches for added redundancy
  • Servers: Specific servers support multiple uplinks to a customer’s private switch.
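As a simplified illustration of the global load balancing idea above, the Python sketch below sends traffic to the fastest location that is currently healthy. The `Location` class, the `pick_location` function, and the sample locations are all hypothetical; real global load balancers typically work at the DNS or routing level and weigh many more signals.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Location:
    """A hypothetical data center as seen by the load balancer."""
    name: str
    healthy: bool       # availability: did the last health check pass?
    latency_ms: float   # performance: most recent measured latency


def pick_location(locations: Sequence[Location]) -> Optional[Location]:
    """Return the healthy location with the lowest latency, or None."""
    healthy = [loc for loc in locations if loc.healthy]
    if not healthy:
        return None  # nothing is available; the caller decides what to do
    return min(healthy, key=lambda loc: loc.latency_ms)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    locations = [
        Location("dc-east", healthy=False, latency_ms=12.0),
        Location("dc-central", healthy=True, latency_ms=25.0),
        Location("dc-west", healthy=True, latency_ms=40.0),
    ]
    choice = pick_location(locations)
    print(choice.name if choice else "no healthy location")
```

In this sample, the fastest location has failed its health check, so traffic goes to the next-fastest one that is still up.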

Built-in network redundancy is important, but so is finding a web host that will solve problems quickly and offer complete transparency about their cause. If and when a problem happens, customer support should be on site around the clock to fix it. After the problem is fixed, customers should receive a full report on what happened, why it happened, and what the web host is doing to prevent it from happening again.

One example of effective transparency in customer support came earlier this year, when Liquid Web discovered an issue with one of our Power Distribution Units (PDUs). While that situation was not a networking issue like the one Verizon Terremark experienced, it does showcase complete transparency. We fixed the issue and provided our customers with regular updates on its status. No matter what type of problem occurs, 100% of your web host’s efforts should be devoted to fixing the problem and keeping customers informed throughout the process.

Downtime can happen to anyone, as the team behind Healthcare.gov recently found out, but a prepared web host can prevent many potential issues from occurring in the first place. Looking for these best practices when choosing your web host will help you avoid unfortunate outages. At Liquid Web, we follow these best practices as well, and our network comes with a comprehensive set of redundancies to minimize issues. And our responsive, 24/7/365 Heroic Support team helps keep our customers’ sites up and running, no matter what.

Have you experienced an outage due to a networking problem? How did your web host respond?