Nick Gerasimatos, senior director of cloud services engineering at FICO, dives into the lack of persistent storage with containers and how Docker volumes and data containers provide a fix.

The idea of running applications within containers is not new. Yes, it’s a trending, hot topic, full of individuals name-dropping containers as if they had discovered a cure for a terminal illness. Containers are going to fix everything.

The origins of containers can actually be traced back to mainframes. The technology is not new; it has finally started to mature and is gaining user interest and acceptance at a remarkable rate.

Containers enable applications to run concurrently on a single operating system (OS), either deployed directly onto a physical server or as a virtual instance.

This is achieved by providing the capability to execute multiple copies of “user space”, the place where applications run, while system or privileged code runs in the kernel.

The current appeal of containers stems from the issues and overheads associated with running virtual instances: each instance has to be provided with dedicated memory and storage resources, instances are typically over- or under-sized, and scaling them rapidly is slow going.

The design of virtual instances provides isolation and the ability to upgrade each individual instance uniquely, but in large environments running similar or identical OS releases, each instance runs a duplicate set of processes that consumes memory and retains a near-identical boot volume.

In the move toward web-scale computing, traditional virtualization can be argued to be inefficient, wasting physical resources such as memory, CPU, storage, rack space, power and cooling, and logical resources such as management and IP addressing.

Containers provide a degree of separation: each is isolated from its neighbors, making it look as if the container owns the whole operating system, while still allowing it to interact with the outside world.

After exponential growth in 2014, containers and their ecosystem gained traction in enterprise environments in 2015, but they are far from mass adoption.

A handful of backup software providers offer container backup support, but is there a way to back up containers with any backup software?

Containers are expected to be transient in nature compared to virtual instances, and that applies to the storage assigned to them as well. Containers use a feature known as an overlay file system to implement a copy-on-write process that stores any updated information in the container’s root file system, separate from the original image on which it is based. These changes are typically lost if the container is deleted, so a container does not have persistent storage by default.
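A quick way to see this in action from the command line (a minimal sketch, assuming a Docker host with the stock “ubuntu” image available):

```
# Write a file into a container's copy-on-write layer
docker run --name demo ubuntu bash -c 'echo hello > /data.txt'

# The change exists only in that container's writable layer
docker diff demo                      # reports "A /data.txt"

# Removing the container discards the layer, and the file with it
docker rm demo
docker run --rm ubuntu ls /data.txt   # fails: No such file or directory
```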

However, distributions like Docker provide two features that enable access to more persistent storage resources – Docker volumes and data containers.

A Docker volume permits data to be stored outside of a container’s boot volume while still appearing within its root file system, and can be implemented in a few ways. A container can be created with one or more volumes by passing a share name to the “-v” switch.
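For example, the following creates a container with an anonymous volume mounted at /data (the names here are illustrative):

```
# Create a container with a volume at /data inside the container
docker run -d --name app1 -v /data ubuntu sleep infinity
```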

This has the effect of creating an entity within the Docker configuration folder (/var/lib/docker) that represents the contents of the volume. Configuration data on volumes is stored in the /var/lib/docker/volumes folder, with each sub-directory representing a volume name based on a universally unique identifier (UUID). The data itself is stored in the /var/lib/docker/vfs/dir folder based on the UUID name.
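Those locations can be explored directly on the host (a sketch; the exact paths and inspect fields have shifted between Docker releases):

```
# Volume directories, named by UUID
ls /var/lib/docker/volumes/

# The backing data for each UUID-named volume
ls /var/lib/docker/vfs/dir/

# Map a container to its volumes (the field name varies by release;
# older daemons expose ".Volumes", newer ones ".Mounts")
docker inspect --format '{{ .Volumes }}' app1
```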

The data in any volume can be browsed and edited by the host operating system, and standard permissions apply; however, the use of volumes has advantages and disadvantages. As the data is stored in a standard file system, it can be backed up, copied or moved in and out by the operating system.

The disadvantage here is that the volume name is in UUID format, which makes it tricky to associate with a container name. Docker has made things easier by providing the “docker cp” command, which allows files and folders to be copied from a host directory to a container directory path by specifying the container name. This is similar to rsync.
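For instance, reusing the illustrative “app1” container from above:

```
# Copy a file from the host into a container...
docker cp ./config.json app1:/data/config.json

# ...and back out again for backup or inspection
docker cp app1:/data/config.json ./config.backup.json
```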

A Docker volume can also be associated with a host directory. This is again specified on the “-v” switch, using a format that separates the host and container paths with a colon, as follows: “-v /host:/container”. This method allows persistent data on the host to be accessed by a container.
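A short sketch of the host-directory form (again with illustrative names):

```
# Map the host directory /srv/appdata to /data inside the container
docker run -d --name app2 -v /srv/appdata:/data ubuntu sleep infinity

# Files written by the container persist on the host after it is removed
docker exec app2 bash -c 'echo persisted > /data/state.txt'
docker rm -f app2
cat /srv/appdata/state.txt   # prints "persisted"
```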

It would therefore be possible to provide access to external shared storage on an NFS share or LUN by using the volume option to access a host share created on the external storage. This method could also be used to back up the data accessed by containers.
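As a sketch, assuming an NFS export at “nfs-server:/export/shared” (all names here are illustrative):

```
# Mount the external share on the Docker host...
mount -t nfs nfs-server:/export/shared /mnt/shared

# ...then hand it to a container as a host-directory volume
docker run -d --name app3 -v /mnt/shared:/data ubuntu sleep infinity

# Back up the container's data from the host side
tar czf /backup/app3-data.tar.gz -C /mnt/shared .
```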

Another option for managing data in Docker is to use a Docker data container. This is essentially a dormant container that has one or more volumes created within it. These volumes can then be exported to one or more other containers using the “--volumes-from” switch when starting up additional containers.
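A minimal sketch of the pattern (names are illustrative):

```
# Create a dormant data container that owns a volume at /shared
docker create --name datastore -v /shared ubuntu /bin/true

# Application containers mount everything the data container exports
docker run -d --name app-a --volumes-from datastore ubuntu sleep infinity
docker run -d --name app-b --volumes-from datastore ubuntu sleep infinity
```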

The data volume container acts like an internal Docker NFS server, providing access to containers from a central mount point.

The benefit of this method is that it abstracts the location of the original data, making the data container a logical mount point. It also allows “application” containers accessing the data container volumes to be created and destroyed while keeping the data persistent in a dedicated container.

There are a number of issues to be aware of when using volumes and data containers.

Orphan storage

Currently it is possible to delete a container without deleting its related volumes. In fact, this is the default behavior unless specifically overridden, which makes it easy to end up with orphan volumes that no container references.

Cleaning up orphan storage is a difficult task that requires trawling through the container configuration files to match up containers and their associated volumes.
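The basic hygiene options look like this (reusing the illustrative container names from above; note that the “docker volume” subcommands only appeared in later releases):

```
# Default: the container goes, its volumes stay behind
docker rm app1

# Override: remove the container's volumes along with it
docker rm -v app2

# Later Docker releases add housekeeping commands for strays
docker volume ls -f dangling=true   # volumes no container references
```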

Security

There is no additional security on volumes or data containers other than standard file permissions and the option to configure read-only or read-write access. This means file access permissions for users in the containers need to match the host settings.
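The read-only option is expressed as a suffix on the mount specification, for example:

```
# Mount the host directory read-only inside the container
docker run -d --name app-ro -v /srv/appdata:/data:ro ubuntu sleep infinity
```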

Data integrity

Sharing data using volumes and data containers provides no data integrity protection. Features such as file locking need to be managed by the containers themselves, which represents an additional overhead that must be added to the application.

Containers provide no data protection facilities, such as snapshots or replication, so data management has to be handled by the host or the container.

There is also a lack of support for external storage: Docker provides nothing specific beyond the features offered by the host operating system.

Container volumes are stored by default in the /var/lib/docker directory, which can become a capacity and performance bottleneck. However, it is possible to change this location with a switch at startup of the Docker daemon.
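For example, the daemon can be pointed at dedicated storage at startup (a sketch; the flag has changed name across releases, “-g”/“--graph” historically and “--data-root” in newer versions):

```
# Relocate Docker's working directory (images, containers and volumes)
# from /var/lib/docker to dedicated storage
docker daemon -g /mnt/docker-storage
```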

The last point highlights one of the current problems with containers and storage: the inability to manage data shared between containers that run on separate physical hosts.

Container volumes can be placed on external storage, but the current design does not make it possible to use a volume from one host on another. To resolve this, solutions such as Flocker from ClusterHQ are emerging to address volume portability. There are also proposed changes to distributions like Docker to add more functionality around the management of volumes.

In the near term, data management will remain an issue, though hopefully one that will be addressed rapidly.

Catch a glimpse of how FICO combines container technology with OpenStack during Gerasimatos’ keynote at OpenStack Day Seattle last month.

Gerasimatos first posted this article on LinkedIn. You can also follow him on Twitter at @N_Gerasimatos. Superuser is always interested in how-tos and other contributions, please get in touch: [email protected].
