Anatomy of a docker-compose file
Docker is an excellent platform that uses OS-level virtualization to provide software packages that we call containers. A container bundles basically everything an app needs to run - whether it's a database, a web server or a proxy, whatever is packaged inside the container is available to the app.
Docker can run on various platforms, whether it's bare metal or a virtual machine. With all of its benefits, one major thing it didn't have natively was a management layer that would allow administrators to see which docker containers are running and how they were started in the first place. We say "natively" because docker-compose is/was considered a plug-in to docker and, until recently, it wasn't installed together with the docker engine itself. Nowadays, it's simply installed via the apt install docker-compose-plugin command.
So, what is the point of docker-compose, in a nutshell? Well, instead of running a kielbasa of a command to start a container, we can nicely organise everything inside a YAML-formatted docker-compose file. One thing to remember, though, is that, being a YAML file, indentation is king. Basically, no tabs are allowed, just a lot of spacebar hits.
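For comparison, starting the same container from the sample below without compose would look roughly like this (a sketch only - the flags mirror the compose options we'll walk through in this article):
docker run -d \
  --name web-app \
  --hostname web.domain.com \
  -e PUID=1000 -e PGID=1000 -e TZ=America/Toronto \
  -v /absolute/path/web:/usr/share/nginx/html \
  -v /etc/localtime:/etc/localtime:ro \
  -p 80:80 \
  --dns 10.5.0.1 \
  --network ext-net \
  nginx:latest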
Let's take a look at a sample of a docker-compose file:
version: '3.3'
services:
  simple-web-app:
    container_name: web-app
    image: nginx:latest
    hostname: web.domain.com
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Toronto
    volumes:
      - ./web:/usr/share/nginx/html
      # or
      - /absolute/path/web:/usr/share/nginx/html
      - /etc/localtime:/etc/localtime:ro
    ports:
      - 80:80
    dns:
      - 10.5.0.1
    networks:
      ext-net:
networks:
  ext-net:
    external: true
Let us analyse this sample in a little more detail. We'll start with the first line:
version: '3.3'
This defines the version of the compose file format that we will use. You can read more about compose file versioning here:
Depending on the version of the docker engine you're running, there are limitations and requirements on how the compose file should look.
Next, we go with:
services:
This is where we will actually start defining our services. Notice that everything one level below this line is considered a service name - in our example that would be:
simple-web-app:
Everything indented below this now defines the service that we called simple-web-app.
Let's look at the simple ones first:
container_name: web-app
image: nginx:latest
hostname: web.domain.com
First we have container_name - instead of relying on the docker engine to name our container, we will manually assign a name to it - in this case, we named it web-app. This is important because if we later reference that container, we have to do it by the container_name parameter.
container_name is an optional parameter.
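As an illustration, once the container is running, the name we assigned is what we use with the usual docker CLI commands:
docker logs web-app
docker exec -it web-app sh
docker restart web-app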
Next one is image. This is where the image will be pulled from. By default, docker will use Docker's integrated hub, hub.docker.com, but other docker repositories can be used as well. In this case, it will try to pull an image named nginx, and because we specified a tag with it (latest), it will look for that specific image as well. Basically, images in a repository are tagged so that we can be very specific about which image we want to pull from the repository - this is the most simplistic explanation possible.
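For example, you can pin a specific version instead of latest, or pull from a different registry by prefixing the image name with the registry host (the exact tag and registry below are only illustrative - check the image's documentation for what is actually published):
    image: nginx:1.25-alpine
    # or, pulled from a different registry:
    image: ghcr.io/some-org/nginx:latest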
Finally, we have the hostname parameter. Each image we pull actually runs a limited version of an OS that starts the app when that OS boots. This allows complete container vs. host isolation - whatever runs in the container has no bearing on what's happening on the host (unless we expose the host to the container, but that's another topic). What the hostname parameter does is set the hostname of that underlying OS once it's started. This is purely cosmetic, but we find it to be a good practice as it helps keep things organized. In this case we gave the hostname an FQDN, but it can be whatever you choose, as long as it's a valid hostname.
hostname is an optional parameter.
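To see it in action, you can ask docker what hostname the container was configured with (standard docker CLI, using the container name from this example):
docker inspect -f '{{ .Config.Hostname }}' web-app
# should print: web.domain.com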
Next, we will deal with environment variables. In most cases each container will have its own specific environment variables, and they will usually be listed in the container's documentation. In this case, we will specify some of the variables that are most widely used and are available in most containers:
environment:
  - PUID=1000
  - PGID=1000
  - TZ=America/Toronto
These variables define what user and what group the container will run as. This has a 1:1 relationship with the host system; if there is a user and a group on the host system with the same IDs, they will be the owner/group of the container's files. You usually need to be very careful with these settings, as they define who owns the files and therefore who can read and execute them. If you're not sure, leave these variables out of your compose file - i.e. don't define them.
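To find the right values, check the UID and GID of the host user you want the container to run as (the id command is standard on Linux; the user name below is just a placeholder, and 1000 is simply the typical first non-system user):
id someuser
# uid=1000(someuser) gid=1000(someuser) groups=1000(someuser),...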
One parameter that usually isn't harmful to the container is TZ, or timezone. This allows us to set a timezone for the container so that the logs are properly timestamped.
In general, a good rule of thumb is to define only environment variables that the app supports, and if you want, then you can start adding system-related environment variables. Do note, though, that sometimes containers will have internally pre-defined system environment variables and once you run the container it may already create files with certain permissions that can't be changed without root privileges.
environment is an optional parameter unless specified differently in the app documentation.
Which actually leads us to our next section, volumes. You may have started your docker journey with simple, ephemeral docker containers. Or, simply put, docker containers that exist only while they're running - once they are shut down, all the data goes away.
Here is where volumes jump in. They allow us to define storage for our containers by mapping host storage resources to internal, container resources. The mapping rule is simple:
host_storage:container_storage
There are multiple ways to define volumes on docker hosts, but we prefer to use host volumes. These are nothing more than host directories mounted into the container itself. This helps us keep everything organized, and it also makes it easier to back up containers and move them to another host if necessary. You can read more about volumes here:
When defining host volumes (directories on the host), we can use system variables if we want, but we prefer to use absolute paths. This is also needed if we later utilize git and then use Portainer to pull docker-compose files from git to deploy apps. If you don't plan on doing that, you can stick with relative paths in your host volumes. In our case we have two volumes that we map to the container:
- /absolute/path/web:/usr/share/nginx/html
- /etc/localtime:/etc/localtime:ro
The first one uses an absolute path to the web subdirectory on the host system and maps it to the /usr/share/nginx/html directory in the container. Please note that for this specific image, nginx, this is exactly the default web root directory for serving files - it may be different in other nginx images, especially if they are not official images (it may be something like /var/www/html or similar). What this allows us to do is simple: whatever we put in the web subdirectory will be served from the web root directory of the container itself. If we were to create a simple index.html file and move it to the web subdirectory, that page would be displayed whenever we accessed the container's address in a browser.
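A quick sketch of that, assuming the compose file lives in the current directory and the container is already running on port 80:
mkdir -p ./web
echo '<h1>Hello from the container</h1>' > ./web/index.html
curl http://localhost/
# should return the HTML we just created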
The second entry is used in case the TZ environment variable is not supported by the container we use. It basically maps the /etc/localtime file from the host to the container and hardcodes the host's timezone into the container. This is usually not a problem with most containers, but we've had experience with some containers that absolutely didn't want the time and/or timezone hardcoded and needed to rely on UTC as the timezone.
volumes is an optional parameter unless specified differently in the app documentation.
So how do we actually access the container's app from the outside? This is where the next parameter comes in handy:
ports:
  - 80:80
Similar to volumes, ports allows us to map TCP (by default) and UDP (has to be explicitly specified) ports between the host and the container.
Why is this important? First of all, there's a limited number of available ports on any given system - 65536 of them. Many are reserved, though that depends on the operating system you're running. You may already have apps and services running on your host system that are consuming some of those ports - and a port that's already consumed on the system cannot be used by another service.
In a nutshell, an app usually needs a port assigned to it so that it can listen for requests on that port and serve clients when they request data. This is how most apps and services on the Internet operate. Some apps/services have pre-defined (well-known) ports that are always associated with them, but a lot of them do not. Most of the time, apps/images that you want to serve via docker will state in their documentation which port they expect to listen on, and sometimes they even allow changing the listening port via environment variables.
So, once again, why is all of this important? Well, what if you have two apps that want to listen on the same port? Let's say that you want to run two instances of the nginx webserver. How will you access them if the docker engine doesn't know where to forward the traffic once it hits the host?
This is where port mapping resolves the issue. Port mapping basically tells the docker engine to listen on an external host port and map it to an internal container port. In our case, we have a simple 80:80 mapping, which means that whatever traffic comes to port 80 on the host's IP address will be mapped to port 80 on the docker container, and if there is a web server running in it, it will serve whatever file is in its web root directory. If we now want to add an additional web server, we can no longer use external port 80, but we can still use 80 on the internal (container) side, as each container runs independently. So another, similar container would have a port mapping that could look something like 8080:80. This means that whatever hits port 8080 on the external, host IP address will be served by this particular container, which internally runs on port 80.
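To illustrate, two nginx instances on the same host could be defined like this (a sketch - the second service and its web2 directory are made up for this example):
services:
  simple-web-app:
    image: nginx:latest
    volumes:
      - ./web:/usr/share/nginx/html
    ports:
      - 80:80
  second-web-app:
    image: nginx:latest
    volumes:
      - ./web2:/usr/share/nginx/html
    ports:
      - 8080:80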
ports is an optional parameter unless specified differently in the app documentation.
Finally, we have another set of network parameters:
dns:
  - 10.5.0.1
networks:
  ext-net:
These are quite simple - dns allows us to set the DNS servers that the container will use, while networks defines which network the container will belong to.
dns and networks are optional parameters; while dns is a good practice if you're running your own DNS server in your network and want to make sure that all services have synced DNS between them, defining networks is more than good practice. By defining the networks parameter, we can explicitly put a container in a pre-defined network if we want to - whether it's because we want a network of a specific type, specific addressing, or for whatever other reason - it's a good practice.
The last, global part of the YAML file is the network configuration:
networks:
  ext-net:
    external: true
This part basically defines which networks are used in the compose file and whether they are internal or external. You can read more about network types here:
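One thing to note: because ext-net is declared as external, docker expects that network to already exist - compose will not create it for you. A minimal sketch of creating it up front (the subnet is just an example that matches the DNS address used above):
docker network create ext-net
# or, with explicit addressing:
docker network create --subnet 10.5.0.0/24 ext-net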
And this is it - once you have defined all the mandatory parameters, you can start your container (it is implied that the file's name will be docker-compose.yml) by running a command from the same location where the docker-compose.yml file is:
docker compose up -d
or, on systems that still use the older standalone docker-compose instead of the plugin, it will be:
docker-compose up -d
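A few companion commands that come in handy once the stack is up (standard docker compose subcommands; use the hyphenated form if that's what your system has):
docker compose ps        # list the containers from this compose file
docker compose logs -f   # follow the container logs
docker compose down      # stop and remove the containers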
Personally, I started my docker journey with docker-compose. It just made sense to start everything by organizing my containers in simple docker-compose files. It did make the learning curve a little bit steeper, as I needed to learn everything related to docker-compose right from the start instead of learning it as I went, but I believe the journey was worth it. We will be discussing more about organising our docker collection in one of the future articles that are part of our Starting a homelab collection.