Lesson 3.5: Backup and Restore
Welcome to the final lesson of Phase 3! You've learned how to persist data with volumes, bind mounts, and tmpfs. Now it's time to learn how to protect that data. In this lesson, you'll master techniques to back up and restore Docker volumes – a critical skill for any production deployment. By the end, you'll be able to safely back up your stateful containers and recover from disasters.
Learning Objectives
TIP
By the end of this lesson, you will be able to:
- Explain why backing up volumes is essential for stateful applications.
- Perform a backup of a volume using a temporary container.
- Restore a volume from a backup using a similar technique.
- Automate volume backups using simple scripts.
- Understand best practices for volume backup strategies.
- Differentiate between backing up volumes vs. bind mounts.
1. Why Back Up Volumes?
Volumes are the primary way to persist data in Docker (e.g., databases, application data). While they survive container removal, they are still stored on the host filesystem and can be lost due to:
- Host disk failure.
- Accidental deletion (`docker volume rm`).
- Corruption or human error.
- Migration between hosts.
Regular backups ensure you can recover critical data. Docker itself does not provide a built-in backup command for volumes, but the Docker ecosystem makes it easy to back up volumes using simple containers.
2. Backup Strategy: Using a Temporary Container
The standard approach to backing up a volume is to run a temporary container that mounts the volume and creates an archive of its contents, then saves that archive to a safe location (e.g., host directory, cloud storage).
Basic pattern:
```bash
docker run --rm -v volume_name:/source -v /host/backup/dir:/backup alpine tar czf /backup/backup.tar.gz -C /source .
```

- `--rm` removes the container after it finishes.
- `-v volume_name:/source` mounts the volume to be backed up at `/source`.
- `-v /host/backup/dir:/backup` mounts a host directory where the backup file will be stored.
- `alpine` is a small base image with `tar`.
- `tar czf /backup/backup.tar.gz -C /source .` creates a compressed tarball of the entire `/source` directory.
3. Performing a Backup
3.1. Backup a Named Volume
Assume you have a volume named postgres-data used by a PostgreSQL container.
1. Stop any containers using the volume (optional but recommended for data consistency). If the database is writing while you back up, you might get a corrupted backup. For databases, consider using native dump tools (e.g., `pg_dump`) instead of a file-level backup.
2. Create a backup:
   ```bash
   docker run --rm -v postgres-data:/source -v $(pwd):/backup alpine tar czf /backup/postgres-backup-$(date +%Y%m%d-%H%M%S).tar.gz -C /source .
   ```
   This creates a timestamped backup file in the current directory.
3.2. Backup an Anonymous Volume
Anonymous volumes (with random names) are harder to identify. Use `docker volume ls` to find them, or better, give your volumes names. If you must back up an anonymous volume, mount it via the container that uses it:

```bash
# Find the container using the anonymous volume
docker ps -a

# Back up by referencing the container's volume mounts (using --volumes-from)
docker run --rm --volumes-from some_container -v $(pwd):/backup alpine tar czf /backup/backup.tar.gz -C /data .
```

Here `/data` is the path where the anonymous volume is mounted inside `some_container`; adjust it to match your container. But naming volumes is strongly recommended for manageability.
3.3. Backup Bind Mounts
Bind mounts are host directories; you can back them up using standard host tools (e.g., tar, rsync). Docker doesn't provide special commands for them. For consistency, you may need to stop containers that write to the bind mount.
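For example, a plain `tar` round trip on the host covers both backup and restore of a bind mount. This sketch uses temporary directories as stand-ins for a real bind-mount path:

```shell
# Back up and restore a bind-mounted directory using host tools only.
# Paths here are illustrative; a real bind mount is whatever you pass to -v.
src=$(mktemp -d)                     # stand-in for the bind mount directory
echo "sample config" > "$src/app.conf"

# Create a compressed archive of the directory's contents
tar czf /tmp/bind-data-backup.tar.gz -C "$src" .

# Simulate data loss, then restore by extracting into a fresh directory
rm -rf "$src"
dest=$(mktemp -d)
tar xzf /tmp/bind-data-backup.tar.gz -C "$dest"

cat "$dest/app.conf"                 # prints: sample config
```

Because no container is involved, the same commands work whether or not Docker is running.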
4. Restoring a Volume
Restoring a volume involves unpacking a backup archive into the volume.
Basic pattern:
```bash
docker run --rm -v volume_name:/target -v /host/backup/dir:/backup alpine sh -c "rm -rf /target/* && tar xzf /backup/backup.tar.gz -C /target"
```

- `rm -rf /target/*` clears the volume's existing contents (optional; be careful).
- `tar xzf ...` extracts the backup into the volume.
4.1. Restore Example
If the volume already exists, you can restore into it:
```bash
docker run --rm -v postgres-data:/target -v $(pwd):/backup alpine sh -c "rm -rf /target/* && tar xzf /backup/postgres-backup-20250321-120000.tar.gz -C /target"
```

If the volume doesn't exist, Docker will create it automatically when you mount it (if you use a named volume). The restore will populate it.
5. Database-Specific Backups
For databases, file-level backups of the data directory may be inconsistent if the database is running. It's better to use native tools:
- PostgreSQL: `pg_dump` or `pg_dumpall` for logical backups, or `pg_basebackup` for physical backups.
- MySQL: `mysqldump` or `mysqlpump`.
- MongoDB: `mongodump`.
You can run these tools in a temporary container that connects to the database container (over a shared Docker network, or via the legacy `--link`). Example for PostgreSQL:

```bash
docker run --rm --network my_network -v $(pwd):/backup postgres:13 sh -c 'pg_dump -h postgres-db -U postgres mydb > /backup/db.sql'
```

Note that the redirection runs inside `sh -c`: if you redirect in your host shell instead, the output lands on the host filesystem, not in the mounted `/backup` directory. You may also need to supply credentials (e.g., `-e PGPASSWORD=...`). Then restore with `psql`. This approach is safer and often produces smaller backups than archiving the entire data directory.
6. Automating Backups
You can schedule backups using cron (on Linux) or a CI/CD pipeline. Example cron job (as root or via user crontab):
```
0 2 * * * /usr/local/bin/backup-postgres.sh
```

Where `backup-postgres.sh` contains:
```bash
#!/bin/bash
# Note: % needs escaping only in crontab lines, not in a script file.
docker run --rm -v postgres-data:/source -v /backups:/backup alpine tar czf /backup/postgres-backup-$(date +%Y%m%d-%H%M%S).tar.gz -C /source .
# Keep only the last 7 days of backups
find /backups -name "postgres-backup-*.tar.gz" -mtime +7 -delete
```

For cloud storage, you can mount S3 via s3fs or use awscli inside the backup container.
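The retention step relies on `find -mtime +7`, which matches files last modified more than 7 full days ago. You can try the rule safely on throwaway files (this sketch assumes GNU `touch -d` to backdate a file):

```shell
# Demonstrate the retention rule on dummy files (no Docker needed).
backups=$(mktemp -d)
touch "$backups/postgres-backup-old.tar.gz"
touch "$backups/postgres-backup-new.tar.gz"

# Backdate one file so it appears 10 days old (GNU touch syntax)
touch -d "10 days ago" "$backups/postgres-backup-old.tar.gz"

# Same pruning command as the script: delete backups older than 7 days
find "$backups" -name "postgres-backup-*.tar.gz" -mtime +7 -delete

ls "$backups"    # only postgres-backup-new.tar.gz remains
```

Dry-running the rule like this before wiring it into cron avoids deleting backups you meant to keep.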
7. Backup Best Practices
TIP
- Name your volumes: Use meaningful names (`db-data`, `app-uploads`) instead of anonymous ones.
- Stop writes during backup if possible: For file-level backups, stop containers or put them in read-only mode to ensure consistency. For databases, use native dumps.
- Test your restores: Regularly restore a backup to a test environment to verify it works.
- Store backups off-host: Keep copies on a different machine, cloud storage, or tape.
- Version your backups: Use timestamps or retention policies.
- Document the process: Ensure others know how to restore.
- Consider using volume drivers: Some volume drivers (like cloud-backed ones) offer snapshot capabilities that can be more efficient.
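The "test your restores" rule can be rehearsed without touching a live volume: restore the archive into a scratch directory and compare it against the source. A minimal sketch with plain `tar` and `diff` (all paths are illustrative):

```shell
# Verify a backup by restoring it into a scratch directory and diffing.
src=$(mktemp -d)
echo "v1" > "$src/data.txt"
tar czf /tmp/verify-backup.tar.gz -C "$src" .

# Restore into a scratch directory instead of the live volume
check=$(mktemp -d)
tar xzf /tmp/verify-backup.tar.gz -C "$check"

# diff -r exits non-zero (failing the script) if anything differs
diff -r "$src" "$check" && echo "backup verified"
```

The same pattern works for volume backups: restore into a throwaway volume and compare, rather than trusting that the archive is intact.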
Hands-On Tasks
Task 1: Backup and Restore a Simple Volume
- Create a volume `test-data` and put a file in it:
  ```bash
  docker run --rm -v test-data:/data alpine sh -c "echo 'Hello Backup' > /data/hello.txt"
  ```
- Back up the volume to your current directory:
  ```bash
  docker run --rm -v test-data:/source -v $(pwd):/backup alpine tar czf /backup/test-backup.tar.gz -C /source .
  ```
- Remove the volume and its contents:
  ```bash
  docker volume rm test-data
  ```
- Restore from the backup:
  ```bash
  docker run --rm -v test-data:/target -v $(pwd):/backup alpine sh -c "tar xzf /backup/test-backup.tar.gz -C /target"
  ```
- Verify the file is back:
  ```bash
  docker run --rm -v test-data:/data alpine cat /data/hello.txt
  ```
Task 2: Backup a Running Database (PostgreSQL)
- Run a PostgreSQL container with a volume:
  ```bash
  docker run -d --name postgres-db -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:13
  ```
- Create a database and table (optional, e.g., via `docker exec`).
- Perform a `pg_dump` backup (the redirection writes `pgdump.sql` to your current host directory; `PGPASSWORD` supplies the password non-interactively):
  ```bash
  docker run --rm -e PGPASSWORD=secret --network container:postgres-db postgres:13 pg_dump -U postgres -h localhost -d postgres > pgdump.sql
  ```
- Stop and remove the container, and delete the volume.
- Run a new PostgreSQL container named `postgres-db` (with the same volume name or a new one) and restore (`-i` is required so `psql` can read the dump from stdin):
  ```bash
  docker run --rm -i -e PGPASSWORD=secret --network container:postgres-db postgres:13 psql -U postgres -h localhost -d postgres < pgdump.sql
  ```
Task 3: Automate Backup with a Script
- Write a shell script that backs up a volume with a timestamp, deletes backups older than 7 days.
- Schedule it in cron to run daily.
Task 4: Backup a Bind Mount
- Create a bind mount directory `~/bind-data` with some files.
- Use `tar` directly on the host to back it up (no Docker needed).
- Restore by extracting.
Task 5: Compare File-Level vs. Database-Level Backup
- Set up a database with some data.
- Perform a file-level backup of the volume.
- Perform a logical dump.
- Compare sizes and ease of restoration.
Summary
Key Takeaways
- Backup volumes by running a temporary container that archives the volume's contents.
- Use `tar` or native database tools for consistent backups.
- Restore by extracting the archive into the volume.
- For databases, prefer logical dumps (`pg_dump`, `mysqldump`) over file-level backups.
- Automate backups with scripts and cron, and store them off-host.
- Always test your restore process.
Check Your Understanding
- Why is it important to back up volumes?
- Write a command to back up a volume named `app-data` to a host directory `/backups` with a timestamp.
- How would you restore that backup into the same volume?
- Why might you prefer a database-specific dump over a file-level backup of the data directory?
- What is the purpose of `--volumes-from` in backup commands?
- List two best practices for volume backups.
Click to see answers
- Volumes can be lost due to accidental deletion, disk failure, corruption, or when migrating between hosts. Regular backups ensure you can recover critical data.
- To back up:
  ```bash
  docker run --rm -v app-data:/source -v /backups:/backup alpine tar czf /backup/app-data-backup-$(date +%Y%m%d-%H%M%S).tar.gz -C /source .
  ```
- To restore:
  ```bash
  docker run --rm -v app-data:/target -v /backups:/backup alpine sh -c "rm -rf /target/* && tar xzf /backup/app-data-backup-YYYYMMDD-HHMMSS.tar.gz -C /target"
  ```
- Database dumps are consistent (they capture a point-in-time snapshot) and don't risk corruption from writes during the backup. File-level backups of a running database can capture partially written transactions.
- `--volumes-from` mounts all volumes from a specified container into the backup container, which is useful for backing up anonymous volumes whose names you don't know.
- Any two: name your volumes, stop writes before backing up, test restores, store backups off-host, use timestamps, use database-native dump tools.
Additional Resources
- Docker volume backup and restore (official docs)
- PostgreSQL backup and restore with Docker
- MySQL backup with Docker
- Using rsync for volume backups
Next Up
This concludes Phase 3: Data Persistence & Storage. You now have a solid understanding of how to manage data in containers. In Phase 4, we'll move on to Networking in Docker, where you'll learn how containers communicate with each other and the outside world. See you there!