Lesson 2.2: Layering and Caching
Welcome to Lesson 2.2! You've already built a few images, but now it's time to understand what's happening behind the scenes. Docker images are composed of layers, and Docker uses a build cache to speed up subsequent builds. In this lesson, you'll learn how layers work, how caching can dramatically improve build times, and best practices to write efficient Dockerfiles.
Learning Objectives
TIP
By the end of this lesson, you will be able to:
- Explain how Docker images are built as a stack of layers.
- Describe how layer caching works during docker build.
- Identify which instructions invalidate the cache and why.
- Reorder Dockerfile instructions to maximize cache reuse.
- Use --no-cache and other build options to control caching.
- Apply best practices to create smaller, faster-building images.
1. How Layers Work
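When you build an image from a Dockerfile, Docker processes the instructions in order and records each one as a step in the image's history. Instructions that change the filesystem (RUN, COPY, ADD) produce a new layer, while instructions such as CMD, LABEL, and ENV only add metadata (they still appear as zero-size entries in the history). Layers are stacked on top of each other, and each layer stores only the changes relative to the layer beneath it.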
1.1. Layered Filesystem
Docker uses a storage driver (like overlay2) to combine these layers into a single unified filesystem. When you run a container, Docker adds a thin writable container layer on top of the image layers.
INFO
Key characteristics:
- Layers are read-only (except the container layer).
- Each layer is identified by a unique hash (SHA256).
- Layers are cached and reused across images if they are identical.
- If a layer hasn't changed, Docker can reuse it from the cache, skipping the rebuild.
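The idea of read-only layers with a writable layer on top can be sketched with ordinary directories. This is only a toy model and not how overlay2 is implemented: the directory names and the lookup helper are hypothetical, and a real storage driver does this inside the kernel.

```shell
#!/bin/sh
# Toy model of a layered filesystem lookup (NOT how overlay2 is implemented).
work=$(mktemp -d)
mkdir -p "$work/layer0" "$work/layer1" "$work/container"

echo "base config" > "$work/layer0/app.conf"   # written by an early instruction
echo "print('v1')" > "$work/layer0/app.py"
echo "print('v2')" > "$work/layer1/app.py"     # a later layer replaces app.py

# Resolve a path by checking the topmost layer first.
lookup() {
    for layer in container layer1 layer0; do
        if [ -f "$work/$layer/$1" ]; then
            cat "$work/$layer/$1"
            return 0
        fi
    done
    return 1
}

lookup app.py     # served from layer1, which shadows layer0
lookup app.conf   # falls through to layer0

# A running container writes only to its own top layer.
echo "runtime change" > "$work/container/app.conf"
lookup app.conf                  # now served from the container layer
cat "$work/layer0/app.conf"      # the image layer itself is untouched
```

Because the lower directories are never modified, several containers could share them, which is the property that makes image layers reusable.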
1.2. Viewing Layers
You can see the layers of an image with:
docker history <image>
For example, docker history nginx:latest shows each layer with its creation command and size. Layers marked <missing> are intermediate layers from the build process (they don't exist as separate images but are part of the image history).
Visual: Image Layer Stack
+---------------------------+
| Writable Layer | <- Container (ephemeral)
+---------------------------+
| CMD / LABEL | <- Layer N
+---------------------------+
| COPY . . | <- Layer 3
+---------------------------+
| RUN pip install ... | <- Layer 2
+---------------------------+
| COPY requirements.txt | <- Layer 1
+---------------------------+
| WORKDIR /app | <- Layer 0
+---------------------------+
| FROM python:3.11-slim | <- Base Image
+---------------------------+
2. Layer Caching During Build
When you run docker build, Docker executes each instruction in order. For each instruction, Docker checks if it can reuse a cached layer from a previous build.
2.1. Cache Matching Rules
Docker looks for an existing layer that matches the instruction and the build context. The matching is based on:
- The instruction itself (e.g., RUN apt-get update).
- The exact command string. For RUN, only the text is compared, not what the command would produce.
- For COPY and ADD, the checksum of the files being copied.
- Base image and previous layers.
If a match is found, Docker uses the cached layer and moves to the next instruction. If not, the cache is invalidated from that point on: Docker executes the instruction and every instruction after it.
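The chaining effect of these rules can be modeled with plain checksums. The sketch below is a simplified illustration, not Docker's actual algorithm: the layer_key and build helpers, the key format, and the example-package contents are all hypothetical, and it assumes sha256sum is available.

```shell
#!/bin/sh
# Toy model of build-cache keys (NOT Docker's real algorithm).
# A step's key is a hash of: parent key + instruction text + input checksum.
layer_key() {
    # $1 = parent key, $2 = instruction text, $3 = checksum of copied files (may be empty)
    printf '%s\n%s\n%s\n' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
}

# Checksums of file contents stand in for the COPY inputs.
reqs_v1=$(printf 'example-package==1.0\n' | sha256sum | cut -d' ' -f1)
reqs_v2=$(printf 'example-package==1.1\n' | sha256sum | cut -d' ' -f1)

build() {
    # $1 = checksum of requirements.txt; prints the keys of three steps
    k0=$(layer_key "" "FROM python:3.11-slim" "")
    k1=$(layer_key "$k0" "COPY requirements.txt ." "$1")
    k2=$(layer_key "$k1" "RUN pip install -r requirements.txt" "")
    echo "$k0 $k1 $k2"
}

first=$(build "$reqs_v1")
second=$(build "$reqs_v2")
echo "build 1: $first"
echo "build 2: $second"
```

In the two printed lines, the first key is identical, while the second and third differ: the RUN text never changed, but its key depends on the COPY step before it.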
2.2. Cache Invalidation Triggers
WARNING
Cache is invalidated when:
- The instruction changes (e.g., you modify a RUN command).
- For COPY/ADD, if any file content changes (checksum differs).
- The base image changes (e.g., you update the tag from ubuntu:22.04 to ubuntu:23.04).
- A previous layer was rebuilt, forcing all later layers to rebuild.
2.3. Example: Cache in Action
Consider this Dockerfile:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "app.py"]
First build:
- All layers are built fresh.
Second build (no changes):
- Docker checks each instruction:
  - FROM – cached.
  - WORKDIR – cached.
  - COPY requirements.txt . – no changes → cached.
  - RUN pip install ... – cached.
  - COPY . . – no changes → cached.
- Entire build uses cache → near-instant.
Third build (you modify app.py but not requirements.txt):
- FROM, WORKDIR, COPY requirements.txt are cached.
- RUN pip install is cached (because neither the instruction nor its input – requirements.txt – has changed).
- COPY . . sees that files (including app.py) have changed → cache invalidated, this layer rebuilds.
- CMD is metadata only, but it comes after the rebuilt COPY layer, so it is re-applied as well.
Result: Only the final COPY . . and metadata steps are rebuilt – much faster than a full rebuild.
Fourth build (you modify requirements.txt):
- FROM, WORKDIR cached.
- COPY requirements.txt . – file changed → cache invalidated, this layer rebuilds.
- RUN pip install – because the previous layer changed, the cache is invalidated and it rebuilds.
- COPY . . – because the previous layer changed, it rebuilds (even though app.py didn't change, the cache chain was already broken earlier).
TIP
This shows the importance of ordering instructions: put things that change less often earlier in the Dockerfile.
3. Best Practices for Leveraging Cache
3.1. Order Instructions from Least to Most Frequently Changing
Typical order:
- Base image (FROM) – rarely changes.
- Metadata (LABEL, WORKDIR, ENV) – may change occasionally.
- Dependency definitions (COPY requirements.txt, package.json) – change moderately.
- Dependency installation (RUN pip install, npm install) – based on above.
- Source code (COPY . .) – changes most frequently.
This maximizes cache hits for expensive steps like dependency installation.
3.2. Combine RUN Commands to Reduce Layers
Each RUN creates a layer. More layers aren't necessarily bad, but combining related commands (e.g., apt-get update && apt-get install -y ...) reduces the layer count and avoids a common caching pitfall: if apt-get update sits in its own RUN, its layer can be served from cache while a later apt-get install runs against a stale package index. Clean up in the same RUN as well, because files deleted in a later layer still occupy space in the earlier one.
Bad:
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get clean
Good:
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
3.3. Use Specific Base Image Tags
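Using :latest makes builds unpredictable: the tag moves whenever a new version is published, so the same Dockerfile can produce a different image and invalidate every cached layer after FROM. Pin to a specific version (e.g., python:3.11-slim) for more reproducible builds.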
3.4. Leverage BuildKit for Better Caching (Optional)
Docker BuildKit (enabled by default in recent versions) offers advanced caching features such as cache mounts, which persist directories like package-manager caches between builds. Those are beyond the scope of this lesson.
3.5. Use --no-cache When You Need a Fresh Build
Sometimes you want to bypass the cache entirely, e.g., to force a re-download of packages or to ensure all steps run:
docker build --no-cache -t myimage .
3.6. Use --cache-from for CI/CD
In CI pipelines, you can specify an external image as a cache source. This is advanced but worth knowing.
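As a rough sketch of what such a pipeline step might look like, the script below pulls a previously published image, offers it as a cache source, and pushes the result. The registry name, the ci_build function, and the dry-run DOCKER variable are assumptions made for illustration. By default the script only prints the commands, so it can be read without a daemon or registry.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Set DOCKER=docker to run them against a real daemon and registry.
DOCKER="${DOCKER:-echo docker}"
IMAGE="registry.example.com/team/myimage:latest"   # hypothetical image reference

ci_build() {
    # Fetch the previously published image if there is one.
    # '|| true' lets the very first pipeline run continue when none exists yet.
    $DOCKER pull "$IMAGE" || true

    # Build, offering that image as a cache source. BUILDKIT_INLINE_CACHE=1
    # embeds cache metadata in the new image so later builds can reuse it.
    $DOCKER build \
        --cache-from "$IMAGE" \
        --build-arg BUILDKIT_INLINE_CACHE=1 \
        -t "$IMAGE" .

    # Publish the result for the next pipeline run.
    $DOCKER push "$IMAGE"
}

ci_build
```

The pull, build, push cycle matters because a fresh CI runner usually starts with an empty local cache; the registry image is what carries layers from one run to the next.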
4. Inspecting and Debugging Caching
4.1. --progress=plain to See Cache Status
When building, you can see which steps are using cache by setting build output to plain:
docker build --progress=plain -t myimage .
Lines with CACHED indicate cache hits.
4.2. docker history to Inspect Layers
docker history myimage
This shows the size and creation time of each layer, helping you spot unexpectedly large layers.
4.3. dive Tool for Advanced Layer Inspection
The open-source tool dive provides an interactive way to explore layers, see what each adds, and identify wasted space.
Once dive is installed, run:
dive myimage
Hands-On Tasks
Task 1: Observe Caching in Action
- Create a directory cache-demo.
- Create a Dockerfile:
FROM alpine:latest
RUN echo "Step 1: Installing packages" && sleep 2
RUN echo "Step 2: Configuring" && sleep 2
COPY . /app
RUN echo "Step 3: Building" && sleep 2
CMD echo "Done"
- Build with docker build -t cache-demo . – note the time.
- Build again (no changes) – observe that all steps are cached and it's instant.
- Modify a file in the context (e.g., create a new file) and rebuild. Which steps are cached? Which rebuild? (The COPY layer should invalidate and everything after it rebuilds.)
Task 2: Optimize a Dockerfile
Start with a suboptimal Dockerfile:
FROM node:18
COPY . /app
WORKDIR /app
RUN npm install
CMD ["npm", "start"]
- Build it (first build).
- Modify a source file (e.g., index.js) and rebuild. Notice that npm install reruns even though dependencies didn't change – this is because COPY . /app copies everything, including source changes, causing cache invalidation before RUN npm install.
- Optimize by reordering:
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["npm", "start"]
- Rebuild (fresh). Then modify a source file again and rebuild. Observe that npm install is now cached.
Task 3: Experiment with Cache Invalidation Triggers
- Create a Dockerfile that uses an environment variable:
FROM alpine
ENV GREETING="Hello"
RUN echo $GREETING > /message
CMD cat /message
- Build and run – prints "Hello".
- Change the instruction to ENV GREETING="Hi" and rebuild. Is the RUN layer cached? (No. The ENV instruction itself changed, so its cache entry is invalidated, and every instruction after it, including RUN, is rebuilt.)
- Notice that the RUN instruction text is identical in both builds. It still rebuilds because the step before it changed; an unchanged instruction is never enough on its own to guarantee a cache hit.
Task 4: Use .dockerignore to Improve Cache Efficiency
- Create a project with a large, irrelevant directory (e.g., node_modules or data).
- Write a Dockerfile that copies the entire context.
- Build, then modify a file inside that directory. Without .dockerignore, the COPY layer would detect changes and invalidate. With .dockerignore, changes to ignored files do not affect the checksum of the copy. Test this:
  - Create .dockerignore with data/.
  - Build.
  - Modify a file inside data/ and rebuild – the COPY layer should be cached.
  - Modify a non-ignored file – the COPY layer rebuilds.
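Before trying this with a real build, the effect can be previewed with a small simulation. This is a toy model under stated assumptions, not Docker's real checksum algorithm: the context_sum helper and the file names are hypothetical, and data/ plays the role of the directory listed in .dockerignore.

```shell
#!/bin/sh
# Toy model of how .dockerignore shields COPY from irrelevant changes.
# This is a simplification, not Docker's real checksum algorithm.
work=$(mktemp -d)
mkdir -p "$work/data"
echo "print('hello')" > "$work/app.py"
echo "rows v1" > "$work/data/big.csv"

# Hash every file that would be sent to the builder, skipping data/
# (the directory listed in the hypothetical .dockerignore).
context_sum() {
    (cd "$work" && find . -type f ! -path './data/*' | sort | xargs sha256sum | sha256sum | cut -d' ' -f1)
}

before=$(context_sum)

echo "rows v2" > "$work/data/big.csv"          # change an ignored file
after_ignored=$(context_sum)

echo "print('hello again')" > "$work/app.py"   # change a copied file
after_source=$(context_sum)

[ "$before" = "$after_ignored" ] && echo "ignored change: COPY would stay cached"
[ "$before" != "$after_source" ] && echo "source change: COPY would rebuild"
```

The checksum only moves when a file that is actually part of the copy changes, which is the behavior the task asks you to confirm with docker build.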
Task 5: Compare Layer Sizes with docker history
- Build the optimized and unoptimized versions of a Dockerfile (e.g., one that installs packages and copies source).
- Run docker history <image> on both and compare the sizes of layers. Note how combined RUN commands produce a smaller total size because intermediate files are removed in the same layer.
Summary
Key Takeaways
- Docker images consist of layers, each representing a set of changes.
- The build cache reuses unchanged layers, speeding up subsequent builds.
- Cache is invalidated when the instruction or its input (files, base image) changes.
- Order instructions from least to most frequently changing to maximize cache hits.
- Combine related commands in a single RUN to reduce layers and clean up.
- Use .dockerignore to prevent irrelevant file changes from invalidating cache.
- Tools like docker history and dive help analyze layer efficiency.
Check Your Understanding
- What is a Docker image layer?
- How does Docker decide whether to use a cached layer for a COPY instruction?
- If you change a file that is not copied into the image (i.e., it's excluded by .dockerignore), will that invalidate the cache for the COPY layer? Why or why not?
- Why is it beneficial to copy dependency files (requirements.txt, package.json) before copying the rest of the source code?
- What command can you use to see the layers of an existing image?
- When would you use docker build --no-cache?
Answers
- A Docker image layer is a read-only snapshot of filesystem changes created by a single Dockerfile instruction. Layers are stacked to form the complete filesystem of an image.
- Docker computes a checksum of the files being copied. If the checksum matches the cached layer's checksum, the cached layer is reused.
- No. Since .dockerignore excludes the file from the build context, Docker never sees it, so the checksum of the COPY operation remains unchanged.
- Because dependency files change less frequently than source code. This way, when you only modify source files, the dependency installation layer stays cached and doesn't need to rerun.
- docker history <image> shows all layers with their sizes and creation commands.
- When you need a completely fresh build, such as forcing re-download of packages, bypassing stale cache, or ensuring all steps run in a CI/CD environment.
Additional Resources
- Docker best practices: Understanding layer caching
- Docker build cache (official docs)
- dive tool for analyzing images
- BuildKit cache mounts (advanced)
Next Up
In the next lesson, we'll cover environment variables and build arguments, giving you more control over your images. See you there!