Lesson 3.1: Container Storage Basics
Welcome to Phase 3! So far, you've built, tagged, and pushed images, and you've run containers. But every container you've run has been stateless: any data created inside it vanished when the container was removed. In this lesson, we'll explore why containers are ephemeral by default and lay the groundwork for persistent storage solutions like volumes and bind mounts.
Learning Objectives
By the end of this lesson, you will be able to:
- Explain why containers are ephemeral and how the writable container layer works.
- Understand the difference between image layers and the container layer.
- Observe data loss when a container is removed.
- List common use cases where persistent storage is needed.
- Describe the storage drivers Docker uses (conceptually) and how they affect performance.
- Identify the three main ways to manage data in Docker: volumes, bind mounts, and tmpfs mounts.
1. The Ephemeral Nature of Containers
A container is an instance of an image, with a thin writable layer added on top of the image's read-only layers. When you create a file, modify a configuration, or install software inside a running container, all those changes are written to this writable layer.
Key points:
- The writable layer exists only as long as the container exists.
- When you stop and restart a container, the writable layer persists (unless you started it with --rm, which removes the container as soon as it exits).
- When you remove a container (docker rm), the writable layer is deleted, and all data stored there is lost.
This design is intentional: containers are meant to be disposable. It makes scaling, updating, and replacing containers simple and predictable.
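One way to observe the writable layer is the SIZE column that docker ps -s adds: the first value is the data held in the container's writable layer, and the "virtual" value also counts the read-only image layers. The sketch below is a minimal illustration; the function name show_writable_layer_size and the default container name layer-demo are hypothetical, chosen only for this example.

```shell
# Sketch: show how much data a container's writable layer holds.
# The function name and the default container name are hypothetical.
show_writable_layer_size() {
  name="${1:-layer-demo}"
  # Run a container that writes about 5 MB into its writable layer, then exits.
  docker run --name "$name" alpine sh -c 'dd if=/dev/zero of=/tmp/blob bs=1M count=5'
  # -a includes stopped containers; -s adds the SIZE column.
  docker ps -a -s --filter "name=$name"
  # Removing the container deletes its writable layer along with the file.
  docker rm "$name"
}
```

Calling show_writable_layer_size in a shell with Docker available should list the stopped container with a SIZE of roughly 5 MB before it is removed.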
1.1. Why Ephemeral Is Good
- Immutability: You can replace a container with a fresh one without worrying about leftover state.
- Scalability: Easily spin up multiple copies of a container; each has its own isolated writable layer.
- Consistency: A container started from the same image behaves identically, regardless of its history.
1.2. When Ephemeral Is a Problem
Many applications need to preserve data across container restarts and removals:
- Databases (MySQL, PostgreSQL)
- Content management systems (WordPress)
- File uploads, logs, configuration
- Any stateful service
For these, you need persistent storage that outlives the container.
2. Where Does Data Live?
Docker stores images and containers in its storage directory (usually /var/lib/docker on Linux). Inside, the storage driver (e.g., overlay2) manages the layers.
2.1. Image Layers
Read-only layers that are shared across containers.
2.2. Container Layer
The writable layer unique to each container.
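To make the read-only layers visible, you can ask Docker which layers an image is built from. The following is a small sketch using docker image inspect and docker history; the function name show_image_layers is hypothetical, and the image defaults to ubuntu only because this lesson uses it elsewhere.

```shell
# Sketch: list the read-only layers that make up an image.
# The function name is hypothetical; pass any locally available image.
show_image_layers() {
  image="${1:-ubuntu}"
  # Content digests of the image's filesystem layers, as a JSON array.
  docker image inspect --format '{{json .RootFS.Layers}}' "$image"
  # The build steps behind the image, with the size each step added.
  docker history "$image"
}
```

Every container started from that image shares these layers; only the thin writable layer on top is created per container.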
2.3. Volumes
Docker volumes are directories managed by Docker and stored outside the container's writable layer, so they survive container removal. We'll dive deep into volumes in the next lesson.
2.4. Bind Mounts
Bind mounts allow you to mount any directory from the host machine into a container. This also provides persistence, but with direct host filesystem access.
2.5. tmpfs Mounts
In-memory storage, not persisted to disk, useful for temporary files or secrets.
3. Demonstration: Data Loss on Container Removal
Let's see this in action.
3.1. Create a Container and Add Data
Run an Ubuntu container and create a file inside:
    docker run -it --name demo ubuntu bash
Inside the container:
    echo "Important data" > /tmp/data.txt
    exit
3.2. Check the Container and Data
List containers:
    docker ps -a
You'll see the demo container in the Exited state.
Restart the container and verify the file is still there:
    docker start -i demo
Inside:
    cat /tmp/data.txt   # Outputs: Important data
    exit
3.3. Remove the Container and Lose Data
Now remove the container:
    docker rm demo
Run a new container from the same image and try to find the file:
    docker run --rm ubuntu cat /tmp/data.txt
Error: cat: /tmp/data.txt: No such file or directory. The data is gone.
This illustrates that data stored in the writable layer does not survive container removal.
4. Why Not Just Keep Containers Forever?
You could choose not to remove containers, but that leads to:
- Accumulation of stopped containers consuming disk space.
- Inability to easily update the image (you'd have to recreate the container).
- State tied to a specific container instance, making scaling or moving workloads difficult.
Instead, we separate data from the container lifecycle.
5. Storage Drivers: Under the Hood (Conceptual)
Docker uses a storage driver to manage the layers and the container's writable layer. Common storage drivers:
- overlay2: Default on modern Linux, efficient and stable.
- aufs, devicemapper, btrfs, zfs (legacy or specialized).
The storage driver affects performance, especially for write-heavy workloads. When you use volumes or bind mounts, the storage driver is bypassed for the mounted directories: reads and writes go directly to the host filesystem, which often improves performance.
You can see the storage driver with docker info | grep "Storage Driver".
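As an alternative to grep, docker info accepts a Go template through --format, which prints just the fields you ask for. This sketch assumes the Driver and DockerRootDir fields reported by docker info; the function name show_storage_info is hypothetical.

```shell
# Sketch: print only the storage-related fields from docker info.
# The function name is hypothetical.
show_storage_info() {
  # Name of the active storage driver, for example overlay2.
  docker info --format 'Storage driver: {{.Driver}}'
  # Directory where Docker keeps images, containers, and volumes.
  docker info --format 'Docker root dir: {{.DockerRootDir}}'
}
```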
6. Introduction to Persistent Storage Options
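Docker provides three main ways to manage data outside the writable layer. Volumes and bind mounts persist data beyond the container's lifetime; tmpfs mounts do not: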
6.1. Volumes
- Managed by Docker: Stored in /var/lib/docker/volumes/ (on Linux).
- Portable: Can be backed up, restored, and managed with Docker CLI commands.
- Preferred for production: Volumes are the recommended way to persist data because they do not depend on the host's directory layout and work on all platforms (including Docker Desktop).
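As a preview of the next lesson, the sketch below repeats the earlier data-loss experiment with a named volume. The function name volume_survives_removal, the default volume name demo-data, and the /data mount path are all hypothetical choices for illustration.

```shell
# Sketch: data written to a named volume outlives the container that wrote it.
# The function name, volume name, and /data mount path are hypothetical.
volume_survives_removal() {
  vol="${1:-demo-data}"
  docker volume create "$vol"
  # The first container writes into the volume and is removed on exit (--rm).
  docker run --rm -v "$vol":/data alpine sh -c 'echo "Important data" > /data/data.txt'
  # A brand-new container mounting the same volume can still read the file.
  docker run --rm -v "$vol":/data alpine cat /data/data.txt
  # Delete the volume (and its data) once you are done.
  docker volume rm "$vol"
}
```

Unlike the demonstration in section 3, the second container here should print Important data, because the file lives in the volume rather than in either container's writable layer.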
6.2. Bind Mounts
- Host-controlled: You mount a specific directory from the host into the container.
- Flexible: Great for development (live code reload) or providing configuration files.
- Less portable: Paths are host-specific, and mounting sensitive host directories can be a security risk.
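A bind mount can be sketched in a few lines: create a file on the host, then read it from inside a container. The function name bind_mount_demo, the default directory bind-demo, and the /mnt target are hypothetical. The host path should be absolute, because -v treats a bare name as a volume name rather than a directory.

```shell
# Sketch: expose a host directory to a container through a bind mount.
# The function name, default directory, and /mnt target are hypothetical.
bind_mount_demo() {
  host_dir="${1:-$PWD/bind-demo}"
  mkdir -p "$host_dir"
  echo "hello from the host" > "$host_dir/hello.txt"
  # Mount the host directory at /mnt; :ro makes it read-only in the container.
  docker run --rm -v "$host_dir":/mnt:ro alpine cat /mnt/hello.txt
}
```

The :ro suffix is one way to limit the security exposure mentioned above, since the container can read the directory but not modify it.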
6.3. tmpfs Mounts
- In-memory: Stored in the host's memory only; never written to the host's filesystem.
- Ephemeral: Useful for temporary data that should not persist (e.g., secrets, cache).
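For completeness, here is a minimal sketch of a tmpfs mount using the --tmpfs flag of docker run, which applies to Linux containers. The function name tmpfs_demo and the /scratch path are hypothetical.

```shell
# Sketch: give a container an in-memory scratch directory.
# The function name and the /scratch path are hypothetical.
tmpfs_demo() {
  # --tmpfs mounts an in-memory filesystem at /scratch for this container only.
  # df reports the filesystem backing /scratch, which should be tmpfs.
  docker run --rm --tmpfs /scratch alpine sh -c 'echo temp > /scratch/t.txt && df /scratch'
}
```

Anything written under /scratch disappears when the container stops, and it never reaches the writable layer or the host disk.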
We'll cover each in depth in the next lessons.
Hands-On Tasks
Task 1: Verify Data Loss
- Run an interactive Alpine container, create a file in /tmp, and exit.
- Remove the container.
- Run a new container from the same image and confirm the file is missing.
Task 2: Explore Storage Driver and Docker Root
- Run docker info and look for "Storage Driver" and "Docker Root Dir".
- If you're on Linux, navigate to /var/lib/docker (requires root) and see the overlay2 subdirectories. (On Docker Desktop, this is inside a VM; you can explore with docker run -it --privileged --pid=host alpine nsenter -t 1 -m -u -i -n sh, but that's advanced.)
Task 3: Observe the Container Layer
- Run a container with --name test and create a file.
- Use docker diff test to see which files have been added (A), changed (C), or deleted (D) in the container's writable layer.
- Remove the container and note the changes disappear.
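If you want a starting point for this task, the sketch below adds one file and deletes another before running docker diff. The function name diff_demo and the file paths are hypothetical; /etc/alpine-release is used on the assumption that it ships with the alpine image.

```shell
# Sketch: inspect what a container changed in its writable layer.
# The function name and file paths are hypothetical.
diff_demo() {
  name="${1:-test}"
  # Add one file and delete another, then let the container exit.
  docker run --name "$name" alpine sh -c 'echo hi > /root/new.txt && rm /etc/alpine-release'
  # Expect an A entry for the new file and a D entry for the deleted one;
  # their parent directories typically appear as C (changed).
  docker diff "$name"
  # Removing the container discards the writable layer and these changes.
  docker rm "$name"
}
```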
Task 4: Run a Container with --rm and Attempt to Preserve Data
- Run docker run --rm -it alpine sh, create a file, then exit. What happened to the container? Why couldn't you inspect it later?
Task 5: Multiple Containers from Same Image
- Run two containers from the same image (ubuntu or alpine) in detached mode (with a sleep command).
- In one container, create a file. Is it visible in the other? (No, because each container has its own writable layer.)
- Stop and remove them.
Summary
Key Takeaways
- Containers are ephemeral by design; data written inside the container's writable layer is lost when the container is removed.
- The writable layer is separate from the image layers and is unique to each container.
- To keep data outside the writable layer, Docker offers volumes and bind mounts (both persistent) and tmpfs mounts (in-memory, not persistent).
- Storage drivers manage the layering; using volumes/bind mounts bypasses the storage driver for those paths.
- Understanding where data lives is the first step to designing stateful containerized applications.
Check Your Understanding
- What happens to the data stored in a container's writable layer when the container is removed?
- Why would a database need persistent storage in a container?
- What command shows the changes made to a container's filesystem?
- Name the three ways Docker can manage data outside the writable layer. Which of them persist data?
- What is the main difference between volumes and bind mounts in terms of Docker management?
Answers
- The data is permanently deleted. The writable layer is tied to the container and removed with it.
- Databases write data to disk that must persist across container restarts, updates, and removals. Without persistent storage, the database would lose all its data as soon as the container is removed or replaced (for example, during an image update).
- docker diff <container> shows added (A), changed (C), and deleted (D) files.
- Volumes, bind mounts, and tmpfs mounts. Volumes and bind mounts persist data; tmpfs mounts live in memory and do not.
- Volumes are created and managed by Docker (through the Docker CLI) and stored in a location Docker controls. Bind mounts map a host directory chosen by the user directly into the container, and managing that directory is the user's responsibility.
Next Up
In the next lesson, we'll dive into Volumes – the recommended way to persist data in production. We'll create volumes, inspect them, and see how they survive container removal. See you there!