Lesson 3.5: Backup and Restore

Welcome to the final lesson of Phase 3! You've learned how to persist data with volumes, bind mounts, and tmpfs. Now it's time to learn how to protect that data. In this lesson, you'll master techniques to back up and restore Docker volumes – a critical skill for any production deployment. By the end, you'll be able to safely back up your stateful containers and recover from disasters.


Learning Objectives

TIP

By the end of this lesson, you will be able to:

  • Explain why backing up volumes is essential for stateful applications.
  • Perform a backup of a volume using a temporary container.
  • Restore a volume from a backup using a similar technique.
  • Automate volume backups using simple scripts.
  • Understand best practices for volume backup strategies.
  • Differentiate between backing up volumes and backing up bind mounts.

1. Why Back Up Volumes?

Volumes are the primary way to persist data in Docker, such as database files and application data. They survive container removal, but they still live on the host filesystem and can be lost or left behind due to:

  • Host disk failure.
  • Accidental deletion (docker volume rm).
  • Corruption or human error.
  • Migration between hosts, since a volume does not follow its container to a new machine.

Regular backups ensure you can recover critical data. Docker itself does not provide a built-in backup command for volumes, but the Docker ecosystem makes it easy to back up volumes using simple containers.


2. Backup Strategy: Using a Temporary Container

The standard approach to backing up a volume is to run a temporary container that mounts the volume and creates an archive of its contents, then saves that archive to a safe location (e.g., host directory, cloud storage).

Basic pattern:

bash
docker run --rm -v volume_name:/source -v /host/backup/dir:/backup alpine tar czf /backup/backup.tar.gz -C /source .
  • --rm removes the container after it finishes.
  • -v volume_name:/source mounts the volume to be backed up at /source.
  • -v /host/backup/dir:/backup mounts a host directory where the backup file will be stored.
  • alpine is a small base image with tar.
  • tar czf /backup/backup.tar.gz -C /source . creates a gzip-compressed tarball of everything in /source. The -C option switches into that directory first, so paths inside the archive are relative.
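
The basic pattern can be wrapped in a small shell function so that the volume name and destination become parameters. This is only a sketch: the function name backup_volume and the archive naming scheme are illustrative choices, not part of Docker. It also mounts the volume read-only (:ro), which the pattern above does not do.

```bash
# Sketch: archive a named volume into a host directory.
# backup_volume is a hypothetical helper name, not a Docker command.
backup_volume() {
  local volume="$1"     # name of the volume to back up
  local dest_dir="$2"   # absolute host directory that receives the archive
  local archive
  archive="${volume}-$(date +%Y%m%d-%H%M%S).tar.gz"

  # Mount the volume read-only so the backup container cannot modify it.
  docker run --rm \
    -v "${volume}:/source:ro" \
    -v "${dest_dir}:/backup" \
    alpine tar czf "/backup/${archive}" -C /source . \
    && echo "${dest_dir}/${archive}"
}
```

Calling backup_volume postgres-data /host/backup/dir prints the path of the new archive on success, which makes the function easy to chain with upload or verification steps.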

3. Performing a Backup

3.1. Backup a Named Volume

Assume you have a volume named postgres-data used by a PostgreSQL container.

  1. Stop any containers using the volume (optional but recommended for data consistency). If the database is writing while you back up, the archive can contain half-written files and may not be usable after a restore. For databases, consider using native dump tools (e.g., pg_dump) instead of file-level backup.

  2. Create a backup:

    bash
    docker run --rm -v postgres-data:/source -v $(pwd):/backup alpine tar czf /backup/postgres-backup-$(date +%Y%m%d-%H%M%S).tar.gz -C /source .

    This creates a timestamped backup file in the current directory.
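
The two steps above (stop the writer, then archive) can be combined so the container is always started again, even when the backup fails. The following is a minimal sketch under the assumption that a single container writes to the volume; backup_stopped is a hypothetical helper name.

```bash
# Sketch: stop the writer, archive the volume, then start the writer again.
# backup_stopped is a hypothetical helper; all names are passed as arguments.
backup_stopped() {
  local container="$1" volume="$2" dest_dir="$3"
  local archive status
  archive="${volume}-$(date +%Y%m%d-%H%M%S).tar.gz"

  docker stop "$container" >/dev/null || return 1

  docker run --rm \
    -v "${volume}:/source:ro" \
    -v "${dest_dir}:/backup" \
    alpine tar czf "/backup/${archive}" -C /source .
  status=$?

  # Restart the container even if the backup failed, so downtime stays short.
  docker start "$container" >/dev/null
  return "$status"
}
```

The function returns the exit status of the tar step, so a caller can tell a failed backup apart from a successful one while the service comes back either way.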

3.2. Backup an Anonymous Volume

Anonymous volumes (with random names) are harder to identify. Use docker volume ls to find them, or better, give your volumes names. If you must back up an anonymous volume, you can reach it through the container that uses it:

bash
# Find the container using the anonymous volume
docker ps -a

# Back up through the container's mounts (--volumes-from reuses the same mount paths).
# /data must be the path where the volume is mounted inside some_container.
docker run --rm --volumes-from some_container -v $(pwd):/backup alpine tar czf /backup/backup.tar.gz -C /data .

But naming volumes is strongly recommended for manageability.
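
To find the name and mount path of an anonymous volume without guessing, docker inspect can list a container's mounts. A minimal sketch follows; list_container_volumes is a hypothetical helper name, and the Go template relies on the Type, Name, and Destination fields that docker inspect reports for each mount.

```bash
# Sketch: print "<volume name> <mount path>" for each volume of a container.
# list_container_volumes is a hypothetical helper name.
list_container_volumes() {
  docker inspect \
    --format '{{range .Mounts}}{{if eq .Type "volume"}}{{println .Name .Destination}}{{end}}{{end}}' \
    "$1"
}
```

The mount path in the second column is the directory to pass to tar -C when using --volumes-from. The name in the first column can also be used directly with -v, exactly like a named volume.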

3.3. Backup Bind Mounts

Bind mounts are host directories; you can back them up using standard host tools (e.g., tar, rsync). Docker doesn't provide special commands for them. For consistency, you may need to stop containers that write to the bind mount.
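
Because a bind mount is an ordinary host directory, the backup needs no container at all. The sketch below uses tar on the host; backup_dir is a hypothetical helper name, and it assumes the destination directory is not inside the source directory.

```bash
# Sketch: archive a host directory (for example a bind mount source) with tar.
# backup_dir is a hypothetical helper name.
backup_dir() {
  local src_dir="$1" dest_dir="$2"
  local archive
  archive="${dest_dir}/$(basename "$src_dir")-$(date +%Y%m%d-%H%M%S).tar.gz"

  mkdir -p "$dest_dir" || return 1
  # -C makes paths inside the archive relative to the source directory.
  tar czf "$archive" -C "$src_dir" . && echo "$archive"
}
```

For example, backup_dir ~/bind-data ~/backups writes a timestamped archive and prints its path. Restoring is the reverse: tar xzf with -C pointing at the target directory.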


4. Restoring a Volume

Restoring a volume involves unpacking a backup archive into the volume.

Basic pattern:

bash
docker run --rm -v volume_name:/target -v /host/backup/dir:/backup alpine sh -c "rm -rf /target/* && tar xzf /backup/backup.tar.gz -C /target"
  • rm -rf /target/* clears the volume's existing contents (optional; be careful). The glob does not match hidden files (names starting with a dot), so those are left in place.
  • tar xzf ... extracts the backup into the volume.
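
Since the restore deletes data before extracting, it helps to confirm that the archive is readable first. The following is a sketch of that idea, not a definitive implementation; restore_volume is a hypothetical helper name. It swaps the rm glob for find with -mindepth 1 -delete, which also removes dotfiles.

```bash
# Sketch: restore an archive into a volume, replacing its current contents.
# restore_volume is a hypothetical helper name.
restore_volume() {
  local volume="$1" archive="$2"
  local host_dir file
  host_dir=$(cd "$(dirname "$archive")" && pwd) || return 1
  file=$(basename "$archive")

  # Refuse to touch the volume if the archive cannot be read end to end.
  tar tzf "$archive" >/dev/null || return 1

  # find -mindepth 1 -delete also removes dotfiles, which the /target/* glob skips.
  # The file name is passed as $1 to the inner shell to avoid quoting problems.
  docker run --rm \
    -v "${volume}:/target" \
    -v "${host_dir}:/backup:ro" \
    alpine sh -c \
    'find /target -mindepth 1 -delete && tar xzf "/backup/$1" -C /target' \
    sh "$file"
}
```

If the archive is truncated or not a gzip tarball, the function returns before any container is started, so the volume keeps its current contents.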

4.1. Restore Example

  1. If the volume already exists, you can restore into it:

    bash
    docker run --rm -v postgres-data:/target -v $(pwd):/backup alpine sh -c "rm -rf /target/* && tar xzf /backup/postgres-backup-20250321-120000.tar.gz -C /target"
  2. If the volume doesn't exist, Docker will create it automatically when you mount it (if you use a named volume). The restore will populate it.


5. Database-Specific Backups

For databases, file-level backups of the data directory may be inconsistent if the database is running. It's better to use native tools:

  • PostgreSQL: pg_dump or pg_dumpall for logical backups, or use pg_basebackup for physical.
  • MySQL: mysqldump (mysqlpump also exists but is deprecated in recent MySQL releases).
  • MongoDB: mongodump.

You can run these tools in a temporary container that connects to the database container over a shared user-defined network (or, in legacy setups, via --link). Example for PostgreSQL:

bash
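docker run --rm --network my_network -v $(pwd):/backup postgres:13 pg_dump -h postgres-db -U postgres -f /backup/db.sql mydb

The -f option makes pg_dump write the file inside the container, where /backup is mounted. A shell redirect (> /backup/db.sql) would be resolved by the host shell instead, where that path usually does not exist. pg_dump also needs the database password, for example passed to the container through the PGPASSWORD environment variable. Restore the dump with psql. This approach is safer and often smaller than backing up the entire data directory.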
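
The dump and restore steps can be sketched as a pair of functions. The helper names pg_backup and pg_restore_sql are hypothetical, the postgres user is taken from the example above, and the network, host, and database arguments must match your own setup. The password is assumed to be exported as PGPASSWORD in the calling shell; -e PGPASSWORD without a value passes it through to the container.

```bash
# Sketch: logical PostgreSQL dump and restore through temporary containers.
# pg_backup and pg_restore_sql are hypothetical helper names.
# PGPASSWORD is read from the caller's environment and passed through.
pg_backup() {
  local network="$1" host="$2" db="$3" out_dir="$4"
  docker run --rm --network "$network" \
    -e PGPASSWORD \
    -v "${out_dir}:/backup" \
    postgres:13 pg_dump -h "$host" -U postgres -f "/backup/${db}.sql" "$db"
}

pg_restore_sql() {
  local network="$1" host="$2" db="$3" in_dir="$4"
  # ON_ERROR_STOP makes psql exit non-zero on the first failing statement.
  docker run --rm --network "$network" \
    -e PGPASSWORD \
    -v "${in_dir}:/backup:ro" \
    postgres:13 psql -h "$host" -U postgres -d "$db" \
    -v ON_ERROR_STOP=1 -f "/backup/${db}.sql"
}
```

For example, pg_backup my_network postgres-db mydb "$(pwd)" writes mydb.sql into the current directory, and pg_restore_sql with the same arguments replays it into an existing, empty database.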


6. Automating Backups

You can schedule backups using cron (on Linux) or a CI/CD pipeline. Example cron job (as root or via user crontab):

cron
0 2 * * * /usr/local/bin/backup-postgres.sh

Where backup-postgres.sh contains:

bash
#!/bin/bash
docker run --rm -v postgres-data:/source -v /backups:/backup alpine tar czf /backup/postgres-backup-$(date +%Y%m%d-%H%M%S).tar.gz -C /source .
# Keep only last 7 days
find /backups -name "postgres-backup-*.tar.gz" -mtime +7 -delete

For cloud storage, you can mount S3 via s3fs or use awscli inside the backup container.
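
A scheduled script benefits from two extra safeguards: never leave a half-written file that looks like a finished backup, and prune old archives only after a new one has been written. The sketch below illustrates both. BACKUP_DIR, VOLUME, RETENTION_DAYS, the .partial suffix, and the --run argument are all hypothetical choices for this example.

```bash
#!/bin/bash
# Sketch of a more defensive backup script. BACKUP_DIR, VOLUME, and
# RETENTION_DAYS are hypothetical settings; adjust them to your environment.
BACKUP_DIR="${BACKUP_DIR:-/backups}"
VOLUME="${VOLUME:-postgres-data}"
RETENTION_DAYS="${RETENTION_DAYS:-7}"

# Delete archives for one volume that are older than the retention window.
# Arguments: directory, volume name, number of days.
prune_backups() {
  find "$1" -name "$2-backup-*.tar.gz" -mtime "+$3" -delete
}

# Write to a .partial name first so an interrupted run never leaves a
# truncated file that looks like a finished backup. Prune only on success.
run_backup() {
  local name
  name="${VOLUME}-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  docker run --rm \
    -v "${VOLUME}:/source:ro" \
    -v "${BACKUP_DIR}:/backup" \
    alpine tar czf "/backup/${name}.partial" -C /source . \
    && mv "${BACKUP_DIR}/${name}.partial" "${BACKUP_DIR}/${name}" \
    && prune_backups "$BACKUP_DIR" "$VOLUME" "$RETENTION_DAYS"
}

# Only act when asked to, so the file can also be sourced for testing.
if [ "${1:-}" = "--run" ]; then
  run_backup
fi
```

With this layout the cron entry would call the script with --run. Leftover .partial files are not matched by the prune pattern, so they remain visible as evidence of a failed run.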


7. Backup Best Practices

TIP

  • Name your volumes: Use meaningful names (db-data, app-uploads) instead of anonymous ones.
  • Stop writes during backup if possible: For file-level backups, stop containers or put them in read-only mode to ensure consistency. For databases, use native dumps.
  • Test your restores: Regularly restore a backup to a test environment to verify it works.
  • Store backups off-host: Keep copies on a different machine, cloud storage, or tape.
  • Version your backups: Use timestamps or retention policies.
  • Document the process: Ensure others know how to restore.
  • Consider using volume drivers: Some volume drivers (like cloud-backed ones) offer snapshot capabilities that can be more efficient.
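
Testing restores and storing backups off-host both depend on knowing that an archive has not been damaged in transit. One lightweight check is to record a checksum next to each archive and verify it before use. This is a sketch, assuming sha256sum is available on the host (GNU coreutils or BusyBox); seal_backup and verify_backup are hypothetical helper names.

```bash
# Sketch: record a checksum next to each archive and verify it later.
# seal_backup and verify_backup are hypothetical helper names.
seal_backup() {
  # Store the hash with a bare file name so the pair can be moved together.
  ( cd "$(dirname "$1")" \
    && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

verify_backup() {
  # Check the recorded hash, then make sure tar can list the whole archive.
  ( cd "$(dirname "$1")" \
    && sha256sum -c "$(basename "$1").sha256" >/dev/null ) \
    && tar tzf "$1" >/dev/null
}
```

Run seal_backup right after creating an archive and verify_backup on the copy you are about to restore. A checksum only proves the file is unchanged; an actual trial restore is still the real test.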

Hands-On Tasks

Task 1: Backup and Restore a Simple Volume

  1. Create a volume test-data and put a file in it:
    bash
    docker run --rm -v test-data:/data alpine sh -c "echo 'Hello Backup' > /data/hello.txt"
  2. Back up the volume to your current directory:
    bash
    docker run --rm -v test-data:/source -v $(pwd):/backup alpine tar czf /backup/test-backup.tar.gz -C /source .
  3. Remove the volume and its contents:
    bash
    docker volume rm test-data
  4. Restore from the backup:
    bash
    docker run --rm -v test-data:/target -v $(pwd):/backup alpine sh -c "tar xzf /backup/test-backup.tar.gz -C /target"
  5. Verify the file is back:
    bash
    docker run --rm -v test-data:/data alpine cat /data/hello.txt

Task 2: Backup a Running Database (PostgreSQL)

  1. Run a PostgreSQL container with a volume:
    bash
    docker run -d --name postgres-db -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:13
  2. Create a database and table (optional, e.g., via docker exec).
  3. Perform a pg_dump backup:
    bash
    docker run --rm --network container:postgres-db -v $(pwd):/backup postgres:13 pg_dump -U postgres -h localhost -d postgres > pgdump.sql
  4. Stop and remove the container, and delete the volume.
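  5. Run a new PostgreSQL container, again named postgres-db, with the same volume name (or a new one), wait until it accepts connections, and restore. The -i flag keeps stdin open so the redirected dump actually reaches psql:
    bash
    docker run --rm -i --network container:postgres-db -v $(pwd):/backup postgres:13 psql -U postgres -h localhost -d postgres < pgdump.sql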

Task 3: Automate Backup with a Script

  1. Write a shell script that backs up a volume with a timestamp and deletes backups older than 7 days.
  2. Schedule it in cron to run daily.

Task 4: Backup a Bind Mount

  1. Create a bind mount directory ~/bind-data with some files.
  2. Use tar directly on the host to back it up (no Docker needed).
  3. Restore by extracting.

Task 5: Compare File-Level vs. Database-Level Backup

  1. Set up a database with some data.
  2. Perform a file-level backup of the volume.
  3. Perform a logical dump.
  4. Compare sizes and ease of restoration.

Summary

Key Takeaways

  • Backup volumes by running a temporary container that archives the volume's contents.
  • Use tar or native database tools for consistent backups.
  • Restore by extracting the archive into the volume.
  • For databases, prefer logical dumps (pg_dump, mysqldump) over file-level backups.
  • Automate backups with scripts and cron, and store them off-host.
  • Always test your restore process.

Check Your Understanding

  1. Why is it important to back up volumes?
  2. Write a command to back up a volume named app-data to a host directory /backups with a timestamp.
  3. How would you restore that backup into the same volume?
  4. Why might you prefer a database-specific dump over a file-level backup of the data directory?
  5. What is the purpose of --volumes-from in backup commands?
  6. List two best practices for volume backups.
Answers
  1. Volumes can be lost due to accidental deletion, disk failure, corruption, or when migrating between hosts. Regular backups ensure you can recover critical data.
  2. docker run --rm -v app-data:/source -v /backups:/backup alpine tar czf /backup/app-data-backup-$(date +%Y%m%d-%H%M%S).tar.gz -C /source .
  3. docker run --rm -v app-data:/target -v /backups:/backup alpine sh -c "rm -rf /target/* && tar xzf /backup/app-data-backup-YYYYMMDD-HHMMSS.tar.gz -C /target"
  4. Database dumps are consistent (they capture a point-in-time snapshot) and don't risk corruption from writing during the backup. File-level backups of a running database can capture partially written transactions.
  5. --volumes-from mounts all volumes from a specified container into the backup container, useful for backing up anonymous volumes whose names you don't know.
  6. Any two: name your volumes, stop writes before backing up, test restores, store backups off-host, use timestamps, use database-native dump tools.


Next Up

This concludes Phase 3: Data Persistence & Storage. You now have a solid understanding of how to manage data in containers. In Phase 4, we'll move on to Networking in Docker, where you'll learn how containers communicate with each other and the outside world. See you there!