Docker Images & Registries

Docker Images

A Docker image, also called a container image, is a standardized package that includes all of the files, binaries, libraries, and configuration needed to run a container.

There are two important principles of images:

Images are immutable. Once an image is created, it can't be modified. Any changes require building a new image or adding layers on top of the existing one.
Container images are composed of layers. Each layer represents a set of filesystem changes that add, remove, or modify files.

Layers

Docker images follow a layered structure, where layers are stacked on top of each other to form the final image. Each layer represents a set of filesystem changes.

[TBD] Insert Image Layers pic here
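To make the stacking idea concrete, here is a minimal sketch that treats each layer as a plain directory of file changes and applies the layers in order. The `build_rootfs` helper and the directory names are hypothetical, invented for this illustration. Docker's storage drivers present layers as a union filesystem rather than copying files, and file removals are not modeled here.

```shell
# Sketch only: each "layer" is a directory holding the files it adds or changes.

# build_rootfs TARGET LAYER... : apply layers in order, lowest first.
build_rootfs() {
    target=$1
    shift
    mkdir -p "$target"
    for layer in "$@"; do
        # Files in later layers overwrite files with the same path from earlier ones.
        cp -R "$layer"/. "$target"/
    done
}

work=$(mktemp -d)
mkdir -p "$work/base/etc" "$work/app/etc" "$work/app/app"
echo "hello from base" > "$work/base/etc/motd"   # exists only in the base layer
echo "1.0" > "$work/base/etc/version"
echo "2.0" > "$work/app/etc/version"             # modifies a file from the base layer
echo "print('hi')" > "$work/app/app/app.py"      # adds a new file

build_rootfs "$work/rootfs" "$work/base" "$work/app"
cat "$work/rootfs/etc/version"
```

The final directory contains the base layer's files plus the upper layer's additions, and where both layers provide the same path the upper layer wins.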

Base Image

The base image is also called the base layer or parent image. Most Docker images start from one, typically an official OS image such as alpine, ubuntu, or debian.

To build an image without a base image, start the Dockerfile with FROM scratch, which refers to an empty image. This is useful when building base images themselves or very minimal images, such as ones that contain a single static binary. Unlike other base images, FROM scratch is a no-op in the Dockerfile, meaning it does not create an additional layer in the image.

Dockerfile

A Dockerfile is a text file that contains a set of instructions for building an image. It is covered in detail in a later page, but note that any instruction in a Dockerfile that modifies the filesystem, such as RUN, COPY, or ADD, results in a new layer.

Example:

# Base layer
FROM ubuntu:latest
# No new layer, just adds metadata
LABEL maintainer="Your Name <[email protected]>" \
    version="1.0" \
    description="This is a sample Docker image."
# Creates a new layer
RUN apt-get update && apt-get install -y curl python3
# Creates another layer
COPY app/ /app/
# No new file content (/app already exists), just sets working directory
WORKDIR /app
# No new layer, just sets an environment variable
ENV APP_ENV=prod
# No new layer, just sets default command
CMD ["python3", "app.py"]

Image Tags

A Docker image tag is a label used to reference a specific version of an image. Tags help differentiate between multiple versions of the same image stored in a registry.

When pulling or running an image, you specify its name followed by a tag using a colon (:):

# Here 22.04 is the tag
docker pull ubuntu:22.04

If no tag is specified, Docker assumes the latest tag by default:

docker pull ubuntu  # Equivalent to ubuntu:latest

A single image can have multiple tags pointing to it. For example, if ubuntu:latest and ubuntu:22.04 refer to the same image ID, both tags will pull the same content.

Common Tagging Conventions

Tags are often used to indicate:

Versions: node:18, python:3.11
Latest stable release: nginx:latest
Specific builds: myapp:20240212
Environments: app:dev, app:prod

The `latest` Tag Misconception

Many assume that latest always refers to the newest available version, but Docker does not enforce this. latest is just another tag, and it points to whatever image the maintainer last pushed or tagged as latest. Prefer specific version tags over latest, especially in production.
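One way to act on this advice is to check image references before deploying them. The following is a minimal sketch of such a check. The `is_pinned` function is a hypothetical helper, not part of Docker, and it assumes a reference counts as pinned when it carries a digest or an explicit tag other than latest.

```shell
# is_pinned REF: succeed when REF names a digest or an explicit tag other than "latest".
is_pinned() {
    image=$1
    case $image in
        *@sha256:*) return 0 ;;    # digest references always name the same content
    esac
    last=${image##*/}              # drop registry and namespace, keep "name[:tag]"
    case $last in
        *:latest) return 1 ;;      # explicit latest
        *:*)      return 0 ;;      # explicit version tag
        *)        return 1 ;;      # no tag means latest
    esac
}

for ref in nginx nginx:latest nginx:1.25.3 localhost:5000/app; do
    if is_pinned "$ref"; then
        echo "$ref: pinned"
    else
        echo "$ref: floating"
    fi
done
```

Only the last path component is inspected for a tag, so a registry port such as localhost:5000 is not mistaken for one.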

Reusing Tags & Dangling Images

Docker allows tag reuse, meaning you can overwrite an existing tag with a new image. However, reusing tags can cause unexpected issues, making it difficult to track which containers are using which version of the image. To ensure consistency and avoid confusion, it's best to use unique, versioned tags instead of reusing the same tag.

A dangling image is an image that has no tags assigned but still exists on your system. This typically occurs when an image is rebuilt using an existing tag, causing the tag to be reassigned to the new image. The older image loses its tag and becomes dangling. To list dangling images, use:

docker images --filter "dangling=true"

These images take up space but are not directly accessible by name. You can remove them using:

docker image prune
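The way a reused tag leaves an image dangling can be modeled without Docker at all. In this sketch, files stand in for images and tags. The `build` and `dangling` functions and the image IDs are made up for illustration and do not reflect how Docker stores images internally.

```shell
# Sketch only: files stand in for images and tags; the IDs are placeholders.
store=$(mktemp -d)
mkdir "$store/images" "$store/tags"

# build TAG IMAGE_ID: record a new image and point TAG at it,
# replacing whatever TAG pointed at before.
build() {
    : > "$store/images/$2"
    echo "$2" > "$store/tags/$1"
}

# dangling: print every image ID that no tag points at.
dangling() {
    for image in "$store"/images/*; do
        id=${image##*/}
        grep -qx "$id" "$store"/tags/* || echo "$id"
    done
}

build myapp:1.0 img-old
build myapp:1.0 img-new    # same tag, new image: img-old loses its tag
dangling
```

After the second build, the tag points at the new image only, so the old image is still in the store but can no longer be reached by name.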

Pull, List & Remove images

Pull Images

Pulling a Docker image means downloading it from a container registry (such as Docker Hub) to your local machine. The process involves multiple steps:

Docker resolves the requested reference (tag or digest). docker pull always asks the registry whether the tag points to newer content, whereas docker run uses a local copy if one exists unless --pull=always is passed.
Docker queries the specified registry (Docker Hub by default) to fetch the image manifest, which lists the image's layers.
Docker downloads all layers of the image. If some layers already exist locally (from other images), Docker reuses them instead of downloading again.
Once all layers are downloaded, Docker reconstructs the image and stores it in the local image cache.

Using default tag: latest
latest: Pulling from library/nginx
31dd5ebca5ef: Pull complete 
23b9c7b43572: Pull complete 
d64d424f0f4d: Pull complete 
1a261fd43d39: Pull complete 
e6ed9a9657ef: Pull complete 
c98339c0f285: Pull complete 
Digest: sha256:e37941f0e23c87a4c5f23a80c3a56a9f30a5df5e2b82a2c5e3822f8e63cb365a
Status: Downloaded newer image for nginx:latest
docker.io/library/nginx:latest

The output above comes from docker pull nginx. Each 'Pull complete' line corresponds to one image layer, so this example image has 6 layers.
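The layer reuse described in the steps above can be sketched with a directory that stands in for the local layer store. The `pull_layers` function and the layer IDs are hypothetical placeholders. Real layers are identified by digests and are downloaded and unpacked rather than created as empty files.

```shell
# Sketch only: a directory stands in for the local layer store.
store=$(mktemp -d)

# pull_layers LAYER_ID... : "download" only the layers missing from the store.
pull_layers() {
    for id in "$@"; do
        if [ -e "$store/$id" ]; then
            echo "$id: Already exists"
        else
            : > "$store/$id"       # stand-in for downloading and unpacking
            echo "$id: Pull complete"
        fi
    done
}

first=$(pull_layers layer-a layer-b)     # first image: both layers are new
second=$(pull_layers layer-a layer-c)    # second image shares layer-a
printf '%s\n' "$first" "$second"
```

The second call skips the shared layer, which is why pulling images built on a common base is usually faster than the first pull.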

There are several ways to pull Docker images:

Pulling by Tag (Specific or Default)
- This is the form used in the examples above.

docker pull nginx:1.25.3
docker pull nginx  # Defaults to `nginx:latest`

Pulling from a different registry
- By default, images are pulled from Docker Hub, but you can pull from another registry by prefixing the image name with the registry's domain name:

docker pull registry.example.com/myimage:1.0

Pulling by Digest (Immutable Reference)
- Each version of a Docker image has a unique SHA256 hash, known as a digest, which remains constant regardless of tag changes. Unlike tags, which can be reassigned to different versions of an image, a digest is immutable and only changes if the image content itself changes. This ensures that pulling an image by its digest always retrieves the exact same version.

docker pull nginx@sha256:<digest>
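The property that makes digests immutable is that they are derived from content. The sketch below shows that property on small strings, assuming sha256sum from GNU coreutils is available. The `digest_of` helper is hypothetical. Docker computes digests over image manifests and layer blobs, not over strings like these.

```shell
# digest_of TEXT: print a Docker-style "sha256:<hex>" digest of TEXT.
digest_of() {
    printf 'sha256:%s\n' "$(printf '%s' "$1" | sha256sum | cut -d' ' -f1)"
}

v1=$(digest_of "app contents v1")
v1_again=$(digest_of "app contents v1")
v2=$(digest_of "app contents v2")

[ "$v1" = "$v1_again" ] && echo "same content, same digest"
[ "$v1" != "$v2" ] && echo "changed content, different digest"
```

Because the digest follows from the bytes, nobody can repoint it at different content the way a tag can be repointed.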

Pulling for a Specific Architecture (Multi-Arch)
- You can also pull the image for a specific system architecture (such as arm64, amd64 etc.).

docker pull --platform linux/arm64 nginx

Pulling all available tags
- You can pull all available versions (or tags) of an image at once. Note that this may significantly increase disk usage and bandwidth consumption.

docker pull -a ubuntu

List Images

To list all locally available images, use:

docker images

To also show intermediate images, which are hidden by default, use:

docker images -a

Delete Images

NOTE

You cannot delete an image that is used by a container, whether running or stopped, without removing the container first.
A layer that is shared with another image stays on disk until no image uses it.

To delete an image, use the following command:

docker rmi <Image ID>

This removes the specified image and its layers from the system. However, if any layers are shared with other images, they will be retained.

In order to delete an image, you must first stop and remove all containers using that image.

To clean up all unused images, including dangling ones, use:

docker image prune -a

Inspecting an Image

To inspect the details of a Docker image, such as its metadata, environment variables, and layers, use:

docker inspect <image-name>

This command returns a JSON object containing information about the image, including its ID, parent image, exposed ports, labels, and entry point.

Multi-Architecture Images

Multi-Architecture (multi-arch) images allow a single image name (and tag) to support multiple CPU architectures, such as x86_64, arm64 etc.

Docker automatically pulls the correct image version based on the system architecture, but you can choose to pull the image for a specific architecture as shown above. This is useful for cross-platform development or when pulling images on a different architecture than your current system.

A multi-arch image is actually a manifest list, which is a collection of multiple images built for different architectures. When you pull an image, Docker:

Detects your system’s architecture.
Queries the registry for available architectures under the requested tag.
Downloads the appropriate image for your platform.

If no matching architecture is found, Docker returns an error.
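The selection steps above can be sketched in a few lines. The `docker_arch` and `select_image` functions are hypothetical helpers, the manifest list is reduced to "platform digest" pairs, and the digests are placeholders. A real manifest list is a JSON document, and Docker's own matching also considers OS and architecture variants.

```shell
# docker_arch MACHINE: map `uname -m` style names to image platform architectures.
docker_arch() {
    case $1 in
        x86_64|amd64)  echo amd64 ;;
        aarch64|arm64) echo arm64 ;;
        *)             echo "$1" ;;    # pass anything else through unchanged
    esac
}

# select_image PLATFORM: read "platform digest" pairs from stdin and
# print the digest listed for PLATFORM, or fail if there is none.
select_image() {
    while read -r platform digest; do
        if [ "$platform" = "$1" ]; then
            echo "$digest"
            return 0
        fi
    done
    echo "no matching manifest for $1" >&2
    return 1
}

manifest_list="linux/amd64 sha256:aaaa
linux/arm64 sha256:bbbb"

# On a real host the machine name would come from `uname -m`.
printf '%s\n' "$manifest_list" | select_image "linux/$(docker_arch x86_64)"
```

Asking for a platform that is not in the list makes `select_image` fail, which mirrors the error case described above.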

You can inspect a multi-arch image manifest using:

docker manifest inspect <image-name>:<tag>

This outputs a JSON manifest listing all supported architectures, such as linux/amd64, linux/arm64 etc.

Building Multi-Arch Images

To create multi-arch images, use BuildKit and buildx:

docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 -t my-image:latest .

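This command builds images for both AMD64 and ARM64 under the same tag. To also upload the result to a registry, add the --push flag and use a tag that includes a registry or namespace you can write to.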

Docker Registries

An image registry is a centralized location for storing and sharing your container images. It can be either public or private. Docker Hub is a public registry that anyone can use and is the default registry.

While Docker Hub is a popular option, many other container registries are available, such as:

Amazon Elastic Container Registry (ECR): Managed registry by AWS.
Azure Container Registry (ACR): Microsoft’s registry for Azure users.
Google Artifact Registry: Container registry service by Google Cloud, the successor to Google Container Registry (GCR).

You can also run a private registry on your local system or inside your organization, for example with Harbor, JFrog Artifactory, or the GitLab Container Registry.

Registry vs Repository

A registry is a centralized location that stores and manages container images, whereas a repository is a collection of related container images within a registry. Think of a repository as a folder where you organize your images based on projects.

Each repository contains one or more container images, and a registry contains one or more repositories.

TBD (Diagram here from https://docs.docker.com/get-started/docker-concepts/the-basics/what-is-a-registry/)

Docker Hub

Docker Hub is the default container registry, so when you pull an image without specifying a registry, Docker assumes docker.io. All the following commands pull the same image:

docker pull ubuntu
docker pull library/ubuntu
docker pull docker.io/library/ubuntu

Official images, like ubuntu, are stored under the library/ namespace. The _ that appears in Docker Hub web addresses for official images (for example hub.docker.com/_/ubuntu) is a website convention and is not a valid namespace in an image reference. Official images are maintained by Docker and the community, following security and other best practices.

Other images that belong to a user or an organization are stored under their respective namespaces, known as second-level namespaces. Unlike official images, these require specifying the namespace when pulling them. For example, the image docker.io/myuser/myapp belongs to the user or organization myuser. To pull it, use:

docker pull myuser/myapp
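The defaults described in this section (docker.io as the registry, library/ as the namespace for official images, latest as the tag) can be written down as a small function. This `normalize_ref` helper is a hypothetical sketch and not Docker's own parser. It assumes the first path component is a registry host when it contains a dot or a colon or equals localhost, and it does not handle digest references.

```shell
# normalize_ref REF: print REF with registry, namespace, and tag filled in.
normalize_ref() {
    name=$1
    tag=latest
    last=${name##*/}                      # final path component, "name[:tag]"
    case $last in
        *:*) tag=${last##*:}; name=${name%:*} ;;
    esac
    first=${name%%/*}
    case $name in
        */*)
            case $first in
                *.*|*:*|localhost) registry=$first; path=${name#*/} ;;
                *)                 registry=docker.io; path=$name ;;
            esac ;;
        *)  registry=docker.io; path=$name ;;
    esac
    if [ "$registry" = docker.io ]; then
        case $path in
            */*) ;;                       # user or organization namespace given
            *)   path=library/$path ;;    # official image
        esac
    fi
    echo "$registry/$path:$tag"
}

for ref in ubuntu library/ubuntu docker.io/library/ubuntu myuser/myapp; do
    echo "$ref -> $(normalize_ref "$ref")"
done
```

The first three references expand to the same full name, while myuser/myapp keeps its own namespace.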

Working with Registries

Here are the steps when working with Docker registries:

TIP

If you're using Docker Hub and only pulling public images, login isn't required; docker pull will work without authentication.

Login to a registry: docker login registry.example.com
Push an image: docker push registry.example.com/myapp:1.0
Pull an image: docker pull registry.example.com/myapp:1.0
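Before a locally built image can be pushed, its name has to include the registry host, which is what docker tag is for. The sketch below strings the steps together as a dry run that prints the commands instead of executing them. The `run` and `publish` functions and the DRY_RUN variable are hypothetical names chosen for this example, and registry.example.com is the placeholder host used above.

```shell
# With DRY_RUN=1 (the default here) the docker commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# publish LOCAL_IMAGE REGISTRY: log in, tag the local image for REGISTRY, push it.
publish() {
    local_image=$1
    registry=$2
    target="$registry/$local_image"
    run docker login "$registry" &&
    run docker tag "$local_image" "$target" &&
    run docker push "$target"
}

plan=$(publish myapp:1.0 registry.example.com)
printf '%s\n' "$plan"
```

Setting DRY_RUN=0 would run the same three commands for real, stopping at the first one that fails.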
