Anatomy of a docker-compose file

Docker is an excellent platform that uses OS-level virtualization to deliver software in packages called containers. A container bundles everything an app needs to run - whether it's a database, a web server, or a proxy, whatever is packaged inside the container is available to the app.

Docker can run on various platforms, whether bare-metal or a virtual machine. For all of its benefits, one major thing it lacked natively was a management interface that would give administrators an overview of the containers that are running and how they were started in the first place. We say it lacked this natively because docker-compose is (or was) considered a plug-in to docker, and until recently it wasn't installed together with the docker engine itself. Nowadays, it's simply installed via the apt install docker-compose-plugin command.

So, what is the point of docker-compose in a nutshell? Well, instead of running a kielbasa of a command to start a container, we can nicely organise everything inside a YAML-formatted docker-compose file. One thing to remember, though, is that in a YAML file, indentation is king. No tabs are allowed - just a lot of spacebar hits.
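To illustrate that "kielbasa of a command", the compose sample below could be approximated by a single docker run invocation - a sketch using standard docker CLI flags, not an exact equivalent (compose additionally manages networks and the container lifecycle for you):

```shell
# roughly the same container, started by hand (a sketch):
docker run -d \
  --name web-app \
  --hostname web.domain.com \
  -e PUID=1000 -e PGID=1000 -e TZ=America/Toronto \
  -v "$(pwd)/web:/usr/share/nginx/html" \
  -v /etc/localtime:/etc/localtime:ro \
  -p 80:80 \
  --dns 10.5.0.1 \
  --network ext-net \
  nginx:latest
```

Typing (and later reconstructing) that by hand is exactly what the compose file saves you from.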

Let's take a look at a sample of a docker-compose file:

version: '3.3'
services:
  simple-web-app:
    container_name: web-app
    image: nginx:latest
    hostname: web.domain.com
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Toronto
    volumes:
      - ./web:/usr/share/nginx/html
#or
      - /absolute/path/web:/usr/share/nginx/html
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 80:80
    dns:
      - 10.5.0.1
    networks:
      ext-net:

networks:
  ext-net:
    external: true

Let us analyse this sample in a little more detail. We'll start with the first line:

version: '3.3'

This will define the version of the compose file that we will use. You can read more about compose file versioning here:

Compose file versions and upgrading
Compose file reference

Depending on the version of the docker engine you're running, there are limitations and requirements on what the compose file can contain. Note that in the current Compose specification the version field is considered obsolete and recent versions of the compose plugin simply ignore it, so it can be omitted.

Next, we go with:

services:

This is where we actually start defining our services. Notice that everything nested one level below this line is treated as the name of a service - in our example, that would be:

  simple-web-app:

Everything indented below this is now defining this service that we called simple-web-app.

Let's look at the simple ones first:

    container_name: web-app
    image: nginx:latest
    hostname: web.domain.com

First we have container_name - instead of relying on the docker engine to generate a name for our container, we manually assign one - in this case, web-app. This is important because if we later reference the container, we do it by this name. container_name is an optional parameter.

Next one is image. This is where the image will be pulled from. By default, docker will use Docker's own registry, hub.docker.com, but other registries can be used as well. In this case, it will try to pull the image named nginx, and because we specified a tag (latest), it will look for that specifically tagged image. Images in a repository are tagged so that we can be very specific about which image we want to pull - this is the simplest explanation possible.
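As an illustration, pinning an exact version or pulling from a different registry would look like this (the tag and registry names here are examples, not recommendations):

```yaml
    # pin an exact version instead of the moving latest tag
    image: nginx:1.25.3
# or pull from a different registry by prefixing its hostname:
    image: ghcr.io/example-org/nginx:1.25.3
```

Pinning a tag means the container won't silently change when the image publisher pushes a new latest.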

Finally, we have the hostname parameter. Each image we pull contains a minimal OS environment that starts the app when the container starts. This allows container vs. host isolation - whatever runs in the container has no bearing on what's happening on the host (unless we expose the host to the container, but that's another topic). The hostname parameter sets the hostname of that environment once it's started. This is purely cosmetic, but we find it to be good practice as it helps keep things organized. In this case we gave the hostname an FQDN, but it can be whatever you choose - as long as the following is respected:

💡
Each element of the hostname must be from 1 to 63 characters long and the entire hostname, including the dots, can be at most 253 characters long. Valid characters for hostnames are ASCII(7) letters from a to z, the digits from 0 to 9, and the hyphen (-).

hostname is an optional parameter.

Next, we will deal with environment variables. In most cases each container will have its own specific environment variables and they will usually be listed with the container itself. In this case, we will specify some of the variables that are most widely used and are available in most containers:

    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Toronto

These variables define which user and group the container will run as. They map 1:1 to the host system: if a user and group with the same IDs exist on the host, they will own the container's files. Be careful with these settings, as they determine who owns files - and therefore who can read and execute them. If you're not sure, leave these variables out of your compose file - i.e. don't define them.
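To find the right values for your host user, standard coreutils commands will print them:

```shell
# print the numeric user ID and group ID of the current user;
# these are the values to use for PUID and PGID
id -u
id -g
```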

One parameter that usually isn't harmful to the container is TZ or timezone. This allows us to set a timezone for the container so that the logs are properly timestamped.

In general, a good rule of thumb is to define only environment variables that the app supports, and if you want, then you can start adding system-related environment variables. Do note, though, that sometimes containers will have internally pre-defined system environment variables and once you run the container it may already create files with certain permissions that can't be changed without root privileges.

environment is an optional parameter unless specified differently in the app documentation.
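As a side note, compose also accepts environment variables written as a map instead of a list, or loaded from a separate file via env_file (the file name web-app.env here is just an example):

```yaml
    environment:
      PUID: "1000"
      PGID: "1000"
      TZ: America/Toronto
# or keep them out of the compose file entirely:
    env_file:
      - ./web-app.env
```

The env_file variant is handy when the variables contain secrets you don't want committed alongside the compose file.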

Which actually leads us to our next section, volumes. You may have started your docker journey with simple, ephemeral docker containers. Or, simply put, docker containers that exist only while they're running - once they are shut down, all the data goes away.

Here is where volumes jump in. They allow us to define storage for our containers by mapping host storage resources to internal, container resources. The mapping rule is simple:

host_storage:container_storage

There are multiple ways to define volumes on docker hosts, but we prefer to do it using host volumes. These are nothing more than host directories mounted into the container itself. This also helps us keep everything organized, back up containers, and move them to another host if necessary. You can read more about volumes here:

Volumes
Learn how to create, manage, and use volumes instead of bind mounts for persisting data generated and used by Docker.
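For contrast, a named volume managed by Docker itself (rather than a host directory) would be declared like this - a minimal sketch:

```yaml
services:
  simple-web-app:
    image: nginx:latest
    volumes:
      - web-data:/usr/share/nginx/html

# named volumes are declared at the top level, like networks
volumes:
  web-data:
```

With named volumes Docker decides where the data lives on disk, which is convenient but makes the "copy a directory to move the container" workflow described below less direct.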

When defining host volumes (directories on the host), we can use system variables if we want, but we prefer absolute paths. This is also needed if we later want to utilize git and then use Portainer to pull docker-compose files from git to deploy apps. If you don't plan on doing that, you can stick with relative paths in your host volumes. In our case we have two volumes that we map to the container:

      - /absolute/path/web:/usr/share/nginx/html
      - /etc/localtime:/etc/localtime:ro

The first one uses an absolute path to the web subdirectory on the host system and maps it to the /usr/share/nginx/html directory in the container. Please note that for this specific image, nginx, the default web root directory for serving files is exactly this path - it may be different in other nginx images, especially unofficial ones (it may look something like /var/www/html or similar). What this allows us to do is simple: whatever we put in the web subdirectory will be served as part of the container's web root. If we were to create a simple index.html file and move it into the web subdirectory, that page would be displayed whenever we accessed the container's address in a browser.

The second entry is used in case the TZ environment variable is not supported by the container. It maps the /etc/localtime file from the host into the container, effectively hardcoding the host's timezone. This works with most containers, but we've encountered some that absolutely refused a hardcoded time and/or timezone and insisted on UTC.

volumes is an optional parameter unless specified differently in the app documentation.

So how do we actually access the container's app from the outside? This is where the next parameter comes in handy:

    ports:
      - 80:80

Similar to volumes, ports allows us to map TCP (the default) and UDP (which has to be explicitly specified) ports between the host and the container.
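For example, a UDP port must be marked explicitly, and compose also supports a more verbose long syntax (the 514 syslog port here is just an example):

```yaml
    ports:
      - 80:80          # TCP by default
      - 514:514/udp    # UDP must be stated explicitly
# or the equivalent long syntax:
    ports:
      - target: 80     # port inside the container
        published: 80  # port on the host
        protocol: tcp
```

The long syntax is more typing, but it makes the host/container direction unambiguous.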

Why is this important? First of all, there is a limited number of ports on any given system - 65,536 per protocol. Many of them are reserved, though that depends on the operating system you're running. You may already have apps and services on your host consuming some of those ports, and the rule is that a port already in use on the system cannot be used by another service.

In a nutshell, an app usually needs a port assigned with it so that it can listen for requests on that port and serve the clients when they request data. This is how most of the apps and services on the Internet operate. Some apps/services will have pre-defined (well-known) ports that are always associated with those services, but a lot of them will not. Most of the time apps/images that you want to serve via docker will have information in their documentation on what port they will expect to listen for requests, and sometimes they will even allow changing the listening port via environment variables.

So, once again, why is all of this important? Well, what if you have two apps that want to listen on the same port? Let's say that you want to run two instances of the webserver nginx. How will you access them if the docker engine doesn't know where to forward the traffic once it hits the host?

💡
Remember, unless you are running the docker engine on your desktop machine, you will most likely want to access the docker container through the IP address of the host (you can do that even if you're running docker engine on your desktop machine, but usually you would use localhost in your browser rather than the IP address - but that's also another story).

In both of these cases, port mapping is what resolves the issue. Port mapping tells the docker engine to listen on an external host port and map it to an internal container port. In our case, we have a simple 80:80 mapping, which means that whatever traffic comes to port 80 on the host's IP address will be mapped to port 80 on the docker container, and if there is a web server running in it, it will serve whatever file is in its web root directory. If we now want to add an additional web server, we can no longer use external port 80, but we can still use 80 on the internal (container) side, as each container runs independently. So another, similar container could have a port mapping that looks something like 8080:80. This means that whatever hits port 8080 on the external host IP address will be served by this particular container, which internally runs on port 80.
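The two-webserver scenario described above could be sketched like this (service names are illustrative):

```yaml
services:
  web-one:
    image: nginx:latest
    ports:
      - 80:80      # reachable at http://host-ip/
  web-two:
    image: nginx:latest
    ports:
      - 8080:80    # reachable at http://host-ip:8080/
```

Both containers listen on port 80 internally; only the host-side port has to differ.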

ports is an optional parameter unless specified differently in the app documentation.

Finally, we have another set of network parameters:

    dns:
      - 10.5.0.1
    networks:
      ext-net:

These are quite simple - dns allows us to set DNS servers that the container will use, while networks will define what network the container will belong to.

dns and networks are optional parameters. Setting dns is good practice if you're running your own DNS server on your network and want all services to resolve names consistently; defining networks is more than good practice. With the networks parameter we can deliberately place a container in a pre-defined network - whether because we want a network of a specific type, specific addressing, or any other reason.
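If you let compose create the network itself instead of using an external one, you can also pin its addressing - a sketch with an illustrative subnet, chosen to match the 10.5.0.1 DNS address used earlier:

```yaml
networks:
  ext-net:
    driver: bridge
    ipam:
      config:
        - subnet: 10.5.0.0/24
          gateway: 10.5.0.1
```

Fixed addressing like this is what makes pointing containers at a DNS server inside the same network predictable.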

The last, global part of the YAML file is the network configuration:

networks:
  ext-net:
    external: true

This part defines which networks are declared in the compose file and whether they are internal or external. You can read more about network types here:

Networking overview
Overview of Docker networks and networking concepts
💡
Important to remember: external: true means the network is created and managed outside of this compose file - compose will attach containers to it, but will not create or remove it. Whether the container is reachable from outside the host is governed by the ports mappings, not by the network being external.
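An external network must already exist before the container is started; it can be created once on the host with the docker CLI (a sketch, to be run before bringing the services up):

```shell
# create the network once; compose will then attach containers to it
docker network create ext-net
```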

And this is it - once you have defined all the mandatory parameters, you can start your container (it is implied that the file is named docker-compose.yml) by running a command from the same location as the docker-compose.yml file:

docker compose up -d

or, on systems that still use the older standalone plugin, it will be:

docker-compose up -d
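A few other subcommands of the compose plugin that come in handy once the container is running (all part of the standard docker compose CLI):

```shell
docker compose config    # validate and print the fully resolved compose file
docker compose logs -f   # follow the logs of all services
docker compose ps        # list the running services
docker compose down      # stop and remove the containers (host volumes are kept)
```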

Personally, I started my docker journey with docker-compose. It just made sense to start everything by organizing my containers in simple docker-compose files. It did make the learning curve a little bit steeper as I needed to learn all of the things related to docker-compose right from the start instead of learning them as I go, but I believe that the journey was worth it. We will be discussing more about organising our docker collection in one of the future articles that are part of our Starting a homelab collection.