Liquid Web’s Incident Management Process

Posted by Nathan Mollenkopf
Reading Time: 4 minutes

Incident Management is the process of managing the lifecycle of all incidents, with the primary objective of returning systems to their working state as quickly as possible. Using ITIL (Information Technology Infrastructure Library) guidelines, Liquid Web has developed and follows its own Incident Management Process. This process covers all of our organization’s components, both customer-facing and internal.

Goals of Incident Management

Products

The primary goal of the Incident Management process is to provide the best possible products to our customers. Incident Management works to achieve this through internal monitoring of our infrastructure and continual improvement through new releases.

Liquid Web has many hosting solutions to meet your needs, whether you want a single-tenant dedicated server, a high-performance VPS, or scalable Cloud servers. You can also enhance your server’s security and protect against vulnerabilities with one of our protection package bundles or à la carte services.

Transparent Communication

Throughout an incident, especially when unplanned interruptions occur, transparent communication is a priority. We make information available for all of our teams to view in real time, and we post customer-facing notifications on our status pages. In addition to the status pages, our Support teams are available to answer customer questions and provide more information. These steps keep all parties informed so that decisions can be made with complete information.

Identification and Prioritization

Quickly identifying and prioritizing an incident reduces its potential impact. To this end, Liquid Web maintains internal monitoring systems that send alerts when a system enters an error state. These monitoring tools can also give a general idea of the scale of an incident: the larger the scale, the higher the priority and the level of response. Once an incident is identified and prioritized, we alert the proper incident response team to begin remediation steps.
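As a rough illustration of scale-based prioritization, here is a minimal Python sketch. The article does not describe how Liquid Web's monitoring tools assign priority, so the `Alert` class, the priority labels, and every threshold below are hypothetical values chosen only to show the idea that a larger scale leads to a higher priority.

```python
from dataclasses import dataclass

# Hypothetical thresholds: minimum number of affected customers for each
# priority level, ordered from highest priority to lowest. These numbers
# are assumptions for illustration, not values from the article.
PRIORITY_THRESHOLDS = [
    (1000, "P1"),
    (100, "P2"),
    (10, "P3"),
]
DEFAULT_PRIORITY = "P4"


@dataclass
class Alert:
    """A monitoring alert raised when a system enters an error state."""
    system: str
    affected_customers: int


def prioritize(alert: Alert) -> str:
    """Return the first priority whose threshold the alert's scale meets."""
    for minimum, priority in PRIORITY_THRESHOLDS:
        if alert.affected_customers >= minimum:
            return priority
    return DEFAULT_PRIORITY
```

In a sketch like this, the returned priority would decide which incident response team is alerted and how urgently.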

Minimize or Prevent Impact of Future Incidents

Once an incident has been resolved, a Root Cause Analysis (RCA) is conducted. An RCA draws on a variety of approaches and techniques to uncover the cause of a problem. It takes the form of a meeting in which the incident’s root cause is confirmed and action items are identified. The goal of these action items is to implement updates that will reduce the impact of similar incidents in the future or prevent them entirely.

Incident Management Roles

Every member of our organization has a defined role in our Incident Management Process, from employees on every team to executives during high-priority incidents. Support teams work directly with customers to help with communication and implementation of any initial mitigation. Operations and engineering teams work to identify the root cause and then implement updates to resolve the incident. This holistic view of responsibility ensures incidents are resolved efficiently and completely while maintaining transparency.

A high-priority incident is one that affects, or could potentially affect, a large number of our customers. An example is the vulnerability found in the sudo package, which affected a considerable portion of Linux servers and devices. Many teams across our organization worked together to successfully resolve this incident.

External Customer Communication

There are multiple ways customers can obtain information about ongoing incidents: viewing their customer account, checking our status pages, or contacting our Support teams.

My Liquid Web Account

If you are experiencing server issues or access difficulties, log in to your My Liquid Web Account to check the status of your server. To reach one of our Helpful Humans for assistance, you can open a ticket or start a live chat.

Status Page Updates

Throughout an incident, we communicate with our customers through our public-facing status pages.

These pages also notify customers of upcoming maintenance. Users can subscribe to receive page update notifications by clicking the Subscribe To Updates button located at the top of each status page.

We strongly recommend signing up for notifications to receive important updates from our teams. Updates regarding an incident are posted from discovery through resolution and recovery.

Contacting Support

During an incident, our Support teams are available to provide more information via phone, support ticket, or live chat, as well as the Help section of your My Liquid Web account.

Incident Review

After an incident has been resolved, stakeholders meet for an RCA. During the meeting, they discuss the root cause of the incident and identify action items to be carried out. The goal of these action items is to minimize or prevent future impact by updating the affected system(s).

Using the information obtained from RCAs, incident trends are reviewed each month and discussed further to determine whether additional mitigation steps should be taken. Examples of these mitigation steps include increased monitoring of a product and additional review of a specific system that may be the cause of multiple incidents.
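The monthly trend review can be pictured with a small Python sketch. It is only an illustration: the article does not say how incidents are recorded or what counts as a trend, so the `(date, system)` record shape, the function name, and the default threshold of two incidents are all assumptions.

```python
from collections import Counter
from datetime import date


def systems_needing_review(incidents, year, month, threshold=2):
    """Return systems with at least `threshold` incidents in the given month.

    `incidents` is an iterable of (occurred_on, system) pairs, where
    `occurred_on` is a datetime.date. Both the record shape and the
    threshold are hypothetical, chosen only for illustration.
    """
    counts = Counter(
        system
        for occurred_on, system in incidents
        if occurred_on.year == year and occurred_on.month == month
    )
    return sorted(system for system, total in counts.items() if total >= threshold)


# Made-up sample data, not real incident records.
sample_incidents = [
    (date(2021, 3, 2), "storage"),
    (date(2021, 3, 9), "storage"),
    (date(2021, 3, 15), "dns"),
    (date(2021, 4, 1), "dns"),
]
flagged = systems_needing_review(sample_incidents, 2021, 3)
```

A system flagged this way would be a candidate for the increased monitoring or additional review described above.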

Conclusion

Incident Management is a process that Liquid Web views as essential to providing our customers with the best possible products. Managing the lifecycle of an incident and resolving the root cause directly increases uptime for all products and services. The process is intended to be transparent, so that everyone involved understands the situation and can make informed decisions.

If you would like help accessing any of the external customer communication options listed above, or would like to discuss whether additional security and performance add-ons are right for you, reach out to one of our helpful Support team members today via phone, support ticket, or live chat. They will be happy to help you!

About the Author: Nathan Mollenkopf

Nathan Mollenkopf is the PIC Coordinator at Liquid Web. Nathan oversees the Problem, Incident, and Change Management processes on a day-to-day basis, ensuring the procedures run smoothly. When not at work, Nathan loves finding new music and playing Counter-Strike: Global Offensive.
