If you're invested in the world of web development, you might have already heard of containerization and all its advantages or even enjoyed them yourself by using one of the many available containerization solutions. It’s not an exaggeration to say that software containerization has changed the world, just as the change from bare metal to virtual machines (VMs) did.
What Is a Container?
A container is a package of software that bundles an application together with all of the operating system tools it relies on (also called dependencies) into a standard unit. Containerized applications are designed to run reliably regardless of the dependencies installed on the host machine.
Containers are usually a few dozen megabytes in size, compared to regular virtual machines, which clock in at tens of gigabytes. Containers also start up near-instantly and use far fewer host resources than full virtual machines, while still providing similar process isolation and environment consistency regardless of where they're running. How do they manage to achieve all those benefits?
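As a sketch of what this packaging looks like in practice, a minimal Dockerfile (the recipe Docker uses to build a container image) declares a base image, the application's dependencies, and the command to run. The file and package names here are hypothetical:

```dockerfile
# Start from a slim base image that provides just enough OS userland
FROM python:3.12-slim

WORKDIR /app

# Install the application's dependencies inside the image,
# so the host needs nothing beyond the container engine itself
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and define how the container starts
COPY app.py .
CMD ["python", "app.py"]
```

Everything the application needs travels with the image, which is why the resulting container behaves the same on any host that can run it.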
Looking Under the Hood
From a technical standpoint, a container can be considered a type of virtual machine. The critical difference is that container virtualization happens at the OS level, while traditional hypervisors virtualize at the hardware layer. In other words, instead of mimicking a hardware platform to run a complete operating system, containers mimic the operating system in order to run an application.
Thus, containers reduce the number of required resources dedicated to virtualization and share some of those resources with the host.
Containers can be described as a standard unit of code since they encapsulate applications and their dependencies to make sure they run reliably regardless of the computing environment. Because all interactions with the operating system are virtualized, containers run as isolated processes, much like a traditional virtual machine.
Container formats are standardized by the Open Container Initiative (OCI). The most popular containerization engine that implements these standards is Docker, which adds several helpful features on top of the standard capabilities of containers. Docker container images are read-only, with a thin read/write layer that handles any changes to the virtual filesystem. When spinning up multiple copies of a container, they all use the same base image; only the changes are stored separately, which dramatically cuts the disk space required to run dozens or hundreds of containers on the same physical machine.
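This layered model can be inspected directly with the Docker CLI. As a sketch (the image name is chosen purely for illustration):

```shell
# List the read-only layers that make up an image
docker history nginx:alpine

# Start two containers from the same image; each gets only a
# thin private read/write layer on top of the shared layers
docker run -d --name web1 nginx:alpine
docker run -d --name web2 nginx:alpine

# Show disk usage: the shared image is counted once,
# not once per container
docker system df
```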
Security and Performance
Part of what allows containers to achieve this remarkably low resource usage is that they share the host machine’s kernel despite being virtualized at the process level. This means that a container that compromises the OS kernel’s stability can potentially affect the host machine or other containers running on top of it.
Thankfully, even in the case of a misbehaving application, containers provide ways to recover from failure and mitigate this type of issue. Since the base image remains intact, it is possible to automatically restart the container into a known, clean state and immediately resume handling requests. With startup times measured in milliseconds, any downtime caused by a container restart is hardly noticeable.
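Docker exposes this recovery behavior as restart policies set when the container is launched. A sketch, assuming an image named `myapp` exists locally:

```shell
# Restart automatically if the main process exits with a
# non-zero status, trying at most five times
docker run -d --restart=on-failure:5 --name resilient-app myapp

# Or keep the container running indefinitely unless it is
# explicitly stopped by the operator
docker run -d --restart=unless-stopped --name always-app myapp
```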
Given how easy it is to spin up new containers, how few resources they use, and how cleanly they package individual applications, they can be provisioned on the fly to handle specific tasks and powered down when no longer needed. This technique, known as orchestration or cloud automation, is very valuable for infrastructure automation, for distributed applications, and for running large numbers of tests simultaneously during application development. When orchestration is combined with traditional virtualization, several containers run across multiple VMs to use data center resources more efficiently.
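In Kubernetes, this on-demand provisioning is typically expressed declaratively. A minimal Deployment sketch (names and image are placeholders) asks the orchestrator to keep three replicas of a container running and to replace any that fail:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3            # the orchestrator keeps three copies alive
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:alpine
          ports:
            - containerPort: 80
```

Scaling up for a traffic spike is then a one-line change (or a `kubectl scale deployment web --replicas=10` command) rather than a manual provisioning exercise.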
The two most prominent names in industry-standard container orchestration are Kubernetes and Docker Swarm. Both coordinate containers within a cluster at scale. Kubernetes was initially designed by Google and is now maintained by the Cloud Native Computing Foundation. Despite what the name might imply, Docker Swarm is not the only orchestration tool that can handle Docker images; Kubernetes is most often used with Docker images.
In general terms, Kubernetes is the more advanced tool, suited for high-load, high-availability workloads, while Docker Swarm is easier to configure and can be deployed with a simple installation.
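Docker Swarm's simplicity shows in how little is needed to get a cluster running. As a sketch on a single Docker host:

```shell
# Turn the current Docker host into a one-node Swarm manager
docker swarm init

# Deploy a replicated service; Swarm schedules the containers
docker service create --name web --replicas 3 -p 80:80 nginx:alpine

# Scale up or down with a single command
docker service scale web=5
```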
Storing Persistent Data
As we’ve just seen, one of the most valuable advantages of containers is the ease with which large numbers of them can be created and destroyed. However, files saved on the container’s virtual disk are lost when the container is deleted or restarted from its original image. Furthermore, since containers are process-isolated from each other and from the host operating system, these files are not easily accessible outside the container. That raises the question: how do you store persistent data so it isn’t lost once the container is no longer needed?
Docker offers multiple ways to handle persistent file storage, depending on your needs. The most basic type is bind mounts, which take a file or directory on the host machine and mount it into a container. Bind mounts are high-performance and can be used for sharing data in and out of the container. However, because they allow containers to access or change important files on the host filesystem, they should be used carefully.
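A bind mount is specified when the container is started. In this sketch the host path is whatever directory you happen to be in; mounting it read-only (`:ro`) limits what the container can do to host files:

```shell
# Mount the current host directory into the container read/write
docker run -d -v "$PWD":/usr/share/nginx/html nginx:alpine

# Safer: mount it read-only so the container can serve the
# files but cannot modify them on the host
docker run -d -v "$PWD":/usr/share/nginx/html:ro nginx:alpine
```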
For this reason, it is recommended to use volumes instead whenever possible. A volume is a dedicated storage area managed by the container engine, which can be mounted into many containers simultaneously and is also supported by Kubernetes. Volumes are not automatically deleted when no longer in use by a container, so their data stays on the host machine until manually removed.
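Volumes are created and managed through Docker itself, so the data lives in engine-controlled storage rather than at an arbitrary host path. A sketch with hypothetical names:

```shell
# Create a named volume managed by Docker
docker volume create app-data

# Mount the volume into a container; anything written to /data
# survives the container's deletion
docker run -d --name app1 -v app-data:/data alpine sleep 3600

# The volume persists until explicitly removed
# (this only succeeds once no container is using it)
docker volume rm app-data
```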
Other Storage Options
The difference between volumes and bind mounts mainly hinges on their underlying functionality, but to the container, these appear just the same as any other file or directory. However, other data stores don’t necessarily follow the same structure and therefore are suited for different tasks.
A key-value store like etcd is very useful for keeping track of critical infrastructure data such as configuration, state, and metadata when orchestrating a large number of containers using Kubernetes or Docker. Data is stored hierarchically, as in a regular filesystem, but it is read and written using standard HTTP tools such as cURL. The data stored in etcd is replicated to every node of the cluster, while a leader node handles all decisions that need consensus across the cluster, such as writes.
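Interacting with etcd looks much like working with any other HTTP service. As a sketch against a local etcd instance (the endpoint and key names are hypothetical): the bundled `etcdctl` client is the most common interface, while the v3 API is also reachable over an HTTP/JSON gateway, where keys and values are base64-encoded:

```shell
# Write and read a key with the official client
etcdctl put /config/db_host "10.0.0.5"
etcdctl get /config/db_host

# The same key read through the v3 HTTP/JSON gateway with cURL;
# "L2NvbmZpZy9kYl9ob3N0" is base64 for "/config/db_host"
curl -s http://127.0.0.1:2379/v3/kv/range \
  -X POST -d '{"key": "L2NvbmZpZy9kYl9ob3N0"}'
```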
To prevent the leader from becoming a single point of failure, an election process happens automatically whenever the leader dies or stops responding, making etcd very resilient even in unreliable environments.
For other types of structured information, a database is the most suitable option. This database can live externally and receive requests from the containers through the network, but it can also be hosted within a container (with the database engine and the actual database storage in separate container volumes) to take advantage of all the containerization benefits discussed above.
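As a sketch of the fully containerized approach, the official PostgreSQL image can be pointed at a named volume so the database files outlive the container; the container name and password here are placeholders:

```shell
# Keep the database files on a named volume
docker volume create pgdata

docker run -d --name db \
  -e POSTGRES_PASSWORD=example \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16

# Deleting and recreating the container preserves the data,
# because it lives on the volume, not in the container's layer
docker rm -f db
docker run -d --name db \
  -e POSTGRES_PASSWORD=example \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16
```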
The Purpose of Containers
The resource efficiency of containers makes them great for orchestration and load balancing across a large number of servers, but containers also fill a niche on the other end of the spectrum. Given that containers allow application dependencies to be standardized and run reliably from one environment to another, they’re perfect for developing applications on a workstation and smoothly pushing changes out to servers.
Given that containers share the kernel with the host machine, it would seem impossible to run a container made for Linux on a Windows workstation. But thanks to additional virtualization tools such as Windows Subsystem for Linux and Hyper-V, Docker for Windows can run Linux container images without creating a full virtual machine. This means that even a mixed development environment can work smoothly and share code between developers and servers.
Having the application fully containerized throughout the software development life cycle also makes it easy to build additional tooling that automates testing and deployment after every change, with orchestration tools running large numbers of test scenarios in parallel.
When Are Virtual Machines Preferred To Containers?
Some legacy applications that require an older kernel version may not run happily within a container. Since stability is much more critical with this type of application, fully virtualizing the operating system is the best option to migrate to a cloud-based environment without the need to maintain aging and failure-prone hardware.
Always run untrusted code in a fully virtual machine to mitigate the possibility of a kernel exploit allowing a containerized piece of malware to affect the host or other containers running on top of it. This also applies to critical security systems where a vulnerability might be devastating regardless of its low probability.