dockerized postgresql with volumes - postgresql

I am relatively new to Docker. I'd like to set up a Postgres database, but I wonder how to make sure that the data isn't lost if I recreate the container.
Then I stumbled upon named volumes (not bind mounts) and how to use them.
But... in a Dockerfile you can't use named volumes, e.g. data:/var/lib/... etc.
As I understand it, with a Dockerfile you always get an anonymous volume.
So every single time I recreate a container, it would get its own new volume.
So here comes my question:
Firstly: how do I make sure, if the container gets updated or recreated, that the Postgres database inside the new container still references the same data and doesn't lose the reference to the previously created anonymous volume?
Secondly: how does this work with a YAML (Compose) file?
Is it possible to point multiple replicas of such a database container at one volume (high-availability mode)?
It would really be great if someone could give me a hint or some best practices.
Thank you in advance.

Looking at the Dockerfile for Postgres, you see that it declares a volume instruction:
VOLUME /var/lib/postgresql/data
Every time you run a new Postgres container without specifying a --volume option, Docker automatically creates a new volume. The volume is given a random name.
You can see all volumes by running the command:
docker volume ls
You can also inspect the files stored on the host by the volume, by inspecting the host path using:
docker volume inspect <volume-name>
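The output will look roughly like the following (the name, timestamp, and mountpoint will of course differ on your system):
[
    {
        "CreatedAt": "2023-01-01T12:00:00Z",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/<volume-name>/_data",
        "Name": "<volume-name>",
        "Options": null,
        "Scope": "local"
    }
]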
So when you don't specify the --volume option for the run command, Docker creates volumes for all volumes declared in the Dockerfile. This is mainly a safety net in case you forget to name your volume, so that the data isn't lost.
Firstly: how do I make sure, if the container gets updated or recreated, that the Postgres database inside the new container still references the same data and doesn't lose the reference to the previously created anonymous volume?
If you want Docker to use the same volume, you need to specify the --volume option. Once specified, Docker won't create a new volume; it will simply mount the existing volume onto the folder specified in the docker command.
As a best practice, name your volumes that have valuable data. For example:
docker run --volume postgresData:/var/lib/postgresql/data ...
If you run this command for the first time, the volume postgresData will be created and will store the contents of /var/lib/postgresql/data on the host. The second time you run it, the same data stored on the host will be mounted into the container.
Secondly: how does this work with a YAML (Compose) file? Is it possible to point multiple replicas of such a database container at one volume?
Yes, volumes can be shared between multiple containers. You can mount the same volume onto multiple containers, and the containers will use the same files. Docker Compose allows you to do that as well.
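As a minimal sketch (the service name and password are placeholders; the volume name matches the example above), a docker-compose.yml using a named volume could look like this:
version: "3.8"
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - postgresData:/var/lib/postgresql/data
volumes:
  postgresData:
The top-level volumes: key declares the named volume, so Compose reuses it across runs instead of creating a new anonymous one each time.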
However, beware that volumes are limited to the host on which they were created. When running containers on multiple machines, the volume needs to be accessible from all of them. There are ways and tools to achieve that, but they are a bit complex. This is still a limitation to be addressed in Docker.
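One such approach, as a hedged sketch, is to back a named volume with an NFS share through Docker's built-in local driver (the server address and export path below are purely illustrative):
docker volume create --driver local \
    --opt type=nfs \
    --opt o=addr=192.168.1.10,rw \
    --opt device=:/exports/pgdata \
    pgdata-nfs
Containers on any host that can reach that NFS server can then mount pgdata-nfs, at the cost of NFS's own performance and locking caveats.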

Related

PostgreSQL Docker image without a volume mount

For automated testing we can't use a DB Docker container with a defined volume. I'm just wondering whether an "official" Postgres image is available with no mounted volume or volume definitions.
Or, if someone has a Dockerfile that creates such a container without any volume definitions, it would be very helpful to see or try one.
Or is there any way to override a defined volume mount and just keep the data files inside the to-be-created Docker container running the DB?
I think you are mixing up volumes and bind mounts.
https://docs.docker.com/storage/
VOLUME Dockerfile command: a volume declared with the VOLUME command in a Dockerfile is created in Docker's own area on the host, which is /var/lib/docker/volumes/.
I don't think it is possible to run Docker without it having access to this directory, and it would not be advisable to restrict Docker's permissions on these directories; they are Docker's own directories, after all.
So the Postgres Dockerfile has this command, for example: https://github.com/docker-library/postgres/blob/master/15/bullseye/Dockerfile
line 186: VOLUME /var/lib/postgresql/data
This means that the /var/lib/postgresql/data directory inside the postgres container will be a volume stored on the host under /var/lib/docker/volumes/somerandomhashorguid....., i.e. in a directory with a random name.
You can also create a volume like this with docker run:
docker run --name mypostgres -e POSTGRES_PASSWORD=password -v /etc postgres:15.1
This way the /etc directory that is inside the container will be stored on the host in the /var/lib/docker/volumes/somerandomhashorguid.....
This volume solution is needed for containers that need extra IO, because the files of the containers (that are not in volumes) are stored in the writeable layer as per the docs: "Writing into a container’s writable layer requires a storage driver to manage the filesystem. The storage driver provides a union filesystem, using the Linux kernel. This extra abstraction reduces performance as compared to using data volumes, which write directly to the host filesystem."
So you could technically remove the VOLUME command from the postgres Dockerfile, rebuild the image for yourself and use that image to create your postgres container, but it would have lower performance.
Bind mounts are the type of data storage solution that can be mounted to anywhere on the host filesystem. For example if you would run:
docker run --name mypostgres -e POSTGRES_PASSWORD=password -v /tmp/mypostgresdata:/var/lib/postgresql/data postgres:15.1
(Take note of the -v flag here: there is a colon between the host and the container directory, whereas in the volume version of this flag above there was no host directory and no colon.)
then you would have a directory /tmp/mypostgresdata created on your Docker host machine, and the container's /var/lib/postgresql/data directory would be mapped there instead of to Docker's internal volumes directory /var/lib/docker/volumes/somerandomhashorguid.....
My general rule of thumb would be to use volumes - as in /var/lib/docker/volumes/ - whenever you can, and deviate only if really necessary. Bind mounts are not flexible enough to make an image/container portable, and the writable container layer has lower performance than Docker volumes.
You can list Docker volumes with docker volume ls, but you will not see bind-mounted directories there. For that you will need to run docker inspect containername.
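For instance, a quick way to narrow that output down to the mount information (container name as given with --name; the --format flag is part of docker inspect):
docker inspect --format '{{ json .Mounts }}' mypostgres
This prints both named volumes and bind mounts with their source and destination paths.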
"You could just copy one of the dockerfiles used by the postgres project, and remove the VOLUME statement. github.com/docker-library/postgres/blob/… –
Nick ODell
Nov 26, 2022 at 18:05"
answered Nick abow.
And that edited Dockerfile would build "almost" Docker Official Image.
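A hedged sketch of that approach, assuming the repository layout still matches the path referenced above (the sed line simply deletes the VOLUME instruction before building):
git clone https://github.com/docker-library/postgres.git
cd postgres/15/bullseye
sed -i '/^VOLUME /d' Dockerfile    # drops "VOLUME /var/lib/postgresql/data"
docker build -t postgres-novolume:15 .
The resulting image behaves like the official one, except that /var/lib/postgresql/data now lives only in the container's writable layer.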

How to add an already build docker container to docker-compose?

I have a container called "postgres", built with a plain docker command, that has a configured PostgreSQL inside it. I also have a docker-compose setup with two services - "api" and "nginx".
How do I add the "postgres" container to my existing docker-compose setup as a service, without rebuilding? The PostgreSQL database was configured manually and is filled with data, so rebuilding is a really, really bad option.
I went through the docker-compose documentation but, sadly, found no way to do this without a rebuild.
Unfortunately this is not possible.
You don't refer to containers in docker-compose; you use images.
You need to create a volume and/or a bind mount to keep your database data.
This is because containers do not persist data on their own: if you have filled one with data and did not attach a bind mount or a volume, you will lose everything once the container is removed.
Recommendation:
docker cp
Docker cp will copy the contents from container to host. https://docs.docker.com/engine/reference/commandline/container_cp/
Create a folder to save all your PostgreSQL data (e.g. /home/user/postgre_data/);
Save the contents of your PostgreSQL container's data directory to this folder (see the postgres Docker Hub page, referenced below, for details);
Run a new PostgreSQL container (same version) with a bind mount pointing to the new folder;
This will preserve all your data, and you will then be able to use a volume or bind mount for it in docker-compose.
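A rough sketch of those steps, assuming the container is named postgres and using the example folder above (the password and version tag are placeholders, and file ownership may need adjusting for the postgres user inside the new container):
mkdir -p /home/user/postgre_data
docker cp postgres:/var/lib/postgresql/data/. /home/user/postgre_data/
docker run -d --name postgres-restored \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -v /home/user/postgre_data:/var/lib/postgresql/data \
    postgres:<same-version-as-before>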
Reference of docker-compose volumes: https://docs.docker.com/compose/compose-file/#volumes
Reference of postgres docker image: https://hub.docker.com/_/postgres/
Reference of volumes and bind mounts: https://docs.docker.com/storage/bind-mounts/#choosing-the--v-or---mount-flag
You can also save this container as a new image using docker container commit and use that newly created image in your docker-compose:
docker container commit [OPTIONS] CONTAINER [REPOSITORY[:TAG]]
I, however, prefer creating images with Dockerfiles and scripts to load the data, etc.
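A rough sketch of the commit approach, with illustrative image and tag names (note that the contents of declared volumes, such as /var/lib/postgresql/data in the official image, are not included in a commit, so this only captures what sits in the container's writable layer):
docker container commit postgres my-postgres-snapshot:1.0
and then in docker-compose.yml:
services:
  postgres:
    image: my-postgres-snapshot:1.0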

Docker postgres persistance and container lifetime

I'm new to Docker. You can take a look at my previous questions and see that I've been asking along these lines. I read the docs carefully, and also read several articles on the web (which is pretty difficult given the rapid pace of Docker releases), but I still can't get a clear picture of how I'm supposed to use containers and what the impact on persistence is.
The official postgres image creates a volume in its Dockerfile using this command
VOLUME /var/lib/postgresql/data
And the readme.md file shows only one example of how to run the image
docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
When I try that, I can see (with docker inspect some-postgres) that the volume created lives in a random directory on my host, and it seems to "belong" to that particular container.
So here are some questions that may help my understanding:
It looks (from the official postgres image docs) like the expected usage is to use docker run to create the container and docker start afterwards (this last bit I inferred from the fact that -d and --name are used). This makes sense to me, but it conflicts with a lot of information I've seen saying that containers should be ephemeral. If I spin up a new container every time, then the default VOLUME config in the Dockerfile doesn't work for persistence. What's the right way of doing things?
Given the above is correct (that I can run once and start many times), the only reason I see for the VOLUME command in the Dockerfile is I/O performance because of the CoW filesystem bypass. Is this right?
Could you please clearly explain what's wrong with using this approach over the (I think unofficially) recommended way of using a data container? I'd like to know the pros and cons for my specific situation, which is a Node.js intranet application.
Thanks,
Awer
You're correct that you can create the container using docker run and start it again in the future using docker start, assuming you haven't removed the container. You're also correct that Docker containers are supposed to be ephemeral and that you shouldn't be in bad shape if a container disappears. What you can do is mount a volume into the Docker container at the storage location of the database.
docker run -v /postgres/storage:/var/lib/postgresql/data --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
If you know the location where the database writes inside the container (for the official image it is the /var/lib/postgresql/data path shown above), you can mount it correctly, and then even if you remove the postgres container, all your data will persist when you start a new one. You may also need to mount some other areas that hold configuration, unless you modify and save the container.
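A minimal sketch of that lifecycle using a named volume instead of a host path (names and password are illustrative), showing that the data outlives the container:
docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword \
    -v pgdata:/var/lib/postgresql/data -d postgres
docker stop some-postgres
docker start some-postgres    # same container, same data
docker rm -f some-postgres    # the pgdata volume still exists afterwards
docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword \
    -v pgdata:/var/lib/postgresql/data -d postgres    # reattaches to the same data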

Reliability of Docker containers

My question aims at verifying, and maybe rectifying, my idea of the reliability of Docker containers. I read both the Docker documentation and several articles on VOLUME in the Dockerfile and -v as an argument when running a container, as means to persist data outside a Docker container, be it in a data container or on the host system. As I would like to keep the complexity of my setup low, I would prefer not to copy/save/store data round and about but keep it in the Docker container itself.
There are several cases through which I discovered the behaviour of Docker containers. I'd like to know if I missed a scenario where a container can be 100% lost unintentionally, i.e. NOT by doing $ docker rm -f mycontainer:
docker commands to pause, stop and kill a container
-> restartable by $ docker restart mycontainer or $ docker run mycontainer
Host system reboot
-> docker container exits with 0 or 255
Host system unexpected power off
-> What happens?
Application exception
-> docker container exits with -1
Updating or restarting docker (as pointed out by Greg)
-> expected behavior: like on system reboot (?)
In all those cases, the docker container is still existent in the end. So is there any other scenario that can cause a docker container to be lost like with $ docker rm -f mycontainer?
The background is that I have read a lot about mounted volumes and external data storage on the host system for Postgres, but I'd like to avoid storing data outside my containers on the host system if possible. On the other hand, I don't want to wake up and find all data lost. (I do perform regular SQL dumps, but I don't want to do this every 5 minutes.) If a Docker container itself is not reliable for persistent data, I don't see why I should create a second container to hold the data for the first one, increasing the complexity of my system by adding a new container without gaining anything in terms of reliability.
Edit: There are two points in the Docker user guide on volumes which do not explicitly explain which behaviour to expect, and which therefore make me question whether these concepts provide extra reliability:
Changes to a data volume will not be included when you update an image
-> Does that mean that they get lost or that the content of the volume won't be changed?
Volumes persist until no containers use them
-> What's the definition of 'use'? As long as a container is not stopped, killed, or removed? Does that mean that the volume Docker created on the host system will get removed? Or does 'volume' only refer to a virtual bridge between a directory inside Docker and one on the host system?
If you store all your data in the container, what are you going to do when you need to update the image? Updates to images are normally done by changing the Dockerfile and rebuilding the image. If my data is kept separate from my container, I can start a new version of the image, mount the data with --volumes-from or -v, and kill the old container. In your case, you have to keep the container running and try to patch it in place with something like Puppet.
Also, I'm not sure what you think you're saving. If you run the official postgres image, it has volumes declared in its Dockerfile. Those volumes exist as normal directories on your host system whether you ran the container with -v or not. And even if your Dockerfile had no volumes, the union filesystem is clearly being stored on your host anyway.
In general, you should consider containers to be temporary and stateless. Whilst you don't have to do this, you will find most of the tooling and support services are designed around this idiom.
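As a rough sketch of that update flow with a named volume (image tags are illustrative; a Postgres major-version upgrade would additionally require a dump and restore):
docker run --name pg -e POSTGRES_PASSWORD=mysecretpassword \
    -v pgdata:/var/lib/postgresql/data -d postgres:15.1
# later, a new patch release comes out
docker stop pg && docker rm pg
docker run --name pg -e POSTGRES_PASSWORD=mysecretpassword \
    -v pgdata:/var/lib/postgresql/data -d postgres:15.2
The pgdata volume is untouched by removing the old container, so the new container picks up the existing database files.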
Regarding your scenarios, there are a few you're missing:
A bug could make it impossible to restart a stopped container
The updating issue mentioned above
If you want to change the storage driver. This will cause a great deal of problems, as you need to migrate your images.
Just for clarity on the commands, docker start will restart stopped or exited containers and docker unpause will unpause paused containers.
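In other words, with the container name from the question:
docker start mycontainer      # brings a stopped or exited container back up
docker unpause mycontainer    # resumes a paused container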

The right way to move a data-only docker container from one machine to another

I have a database docker container that is writing its data to another data-only container. The data-only container has a volume where it stores the data of the database. Is there a "docker" way of migrating this data-only container from one machine to another? I read about docker save and docker load but these commands save and load images, not containers. I want to be able to package the docker container along with its volumes and move it to another machine.
Check out the Flocker project. It is a very interesting solution to this problem, using ZFS to snapshot and replicate the storage volume between hosts.