I am looking at a sample Dockerfile to see how VOLUME is used, and I came across the following lines from https://github.com/docker-library/postgres/blob/master/Dockerfile-alpine.template:
ENV PGDATA /var/lib/postgresql/data
# this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)
RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 777 "$PGDATA"
VOLUME /var/lib/postgresql/data
What is the purpose of using a volume here? Here is my understanding; please confirm:
Create the directory pointed to by $PGDATA in the image file system.
Mark it with VOLUME so that any content created later (for example when docker-entrypoint.sh populates the data directory) lands in a predefined directory that the container exposes for persistence.
What if the VOLUME instruction is not defined? Without it, it might be more laborious for someone to figure out where to keep custom changes.
A volume is defined here, so when you start a container (from this image) a new anonymous volume is created.
The volume will hold your important data in this regard, so it is all you need to "persist" across the normal/soft Docker container lifecycle.
Usually, when the maintainers of a Docker image already know where the data worth keeping is located (as here), they mark that folder with VOLUME in the Dockerfile. As mentioned, this creates an anonymous volume at runtime, but it also makes you aware (via docker inspect or by reading the Dockerfile) of where the volumes for persistence are located.
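For example, a quick way to see where such an anonymous volume ends up (a sketch; "mypg" is a hypothetical container name):

docker run -d --name mypg -e POSTGRES_PASSWORD=secret postgres
# list the container's mounts; the anonymous volume shows up
# with a random hash as its name
docker inspect -f '{{ range .Mounts }}{{ .Name }} -> {{ .Destination }}{{ "\n" }}{{ end }}' mypg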
In production you will usually use a named volume or a path (bind) mount in your docker-compose file, mounted to this very folder.
docker-compose.yml with a named volume (note that a named volume must also be declared under the top-level volumes: key):
volumes:
  - mydbdata:/var/lib/postgresql/data
docker-compose.yml with a path (bind mount):
volumes:
  - ./local/path/data:/var/lib/postgresql/data
There are actually cons to defining such VOLUME directives in the Dockerfile, which I will not elaborate on here, but the main one is "lifetime".
Having no VOLUME in the Dockerfile and running
docker-compose up -d
# do something, manipulate the data
docker-compose down
# all your data would be lost when starting again
docker-compose up -d
would remove not only the running container but also all your DB data, which might not be what you intended (you just wanted to recreate the container).
With VOLUME in the Dockerfile, the anonymous volume is persisted even across docker-compose down.
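A rough way to observe this (a sketch; the volume name will be a random hash):

docker-compose up -d
docker volume ls         # an anonymous, hash-named volume appears
docker-compose down      # containers are removed, the volume object survives
docker volume ls         # the hash-named volume is still listed
# docker-compose down -v would have removed it together with the containers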
Related
I am a beginner Docker user.
I installed Docker and PostgreSQL on macOS.
Why do most documents mention the directory /var/lib/postgresql/data as a volume setting?
In my local file system, /var/lib/postgresql does not exist.
Is it a default option, or am I missing something?
Yes, correct: /var/lib/postgresql does not exist on your local computer, but it does in the created container. The volumes parameter is used to associate the local data with the container data, in order to preserve the data in case the container crashes.
For example:
volumes:
- ./../database/main:/var/lib/postgresql/data
Above, we map the local directory on the left side of the colon to the container directory on the right.
If you are using the official PostgreSQL image from Docker Hub, you can check the contents of its Dockerfile. E.g. here is the fragment of the postgres:15 image responsible for the data directory:
ENV PGDATA /var/lib/postgresql/data
# this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)
RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 777 "$PGDATA"
VOLUME /var/lib/postgresql/data
As you can see, Postgres is configured to keep its data in that directory. And to persist the data even when the container is stopped and removed, a volume is created. Volumes have a lifetime independent of the container, which allows them to "survive" it.
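For illustration, a named volume makes that independent lifetime easy to see (a minimal sketch; "pgdata" and the container names are placeholders):

docker run -d --name pg1 -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:15
docker rm -f pg1     # remove the container...
docker volume ls     # ...the "pgdata" volume is still there
docker run -d --name pg2 -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:15
# pg2 reattaches the same volume, so the data is intact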
For automated testing we can't use a DB Docker container with a defined volume. I am just wondering whether an "official" Postgres image is available with no mounted volume or volume definitions.
Or if someone has a Dockerfile that creates a container without any volume definitions, that would be very helpful to see or try.
Or is there any way to override a defined volume mount and just keep the data files inside the Docker container running the DB?
I think you are mixing up volumes and bind mounts.
https://docs.docker.com/storage/
VOLUME Dockerfile command: a volume declared with the VOLUME command in a Dockerfile is created in the Docker area on the host, which is /var/lib/docker/volumes/.
I don't think it is possible to run Docker without it having access to this directory, and it would not be advisable to restrict Docker's permissions on it; these are Docker's own directories, after all.
So the postgres Dockerfile has this command, for example: https://github.com/docker-library/postgres/blob/master/15/bullseye/Dockerfile
line 186: VOLUME /var/lib/postgresql/data
This means that the /var/lib/postgresql/data directory inside the postgres container will be a volume stored on the host somewhere under /var/lib/docker/volumes/somerandomhashorguid....., in a directory with a random name.
You can also create a volume like this with docker run:
docker run --name mypostgres -e POSTGRES_PASSWORD=password -v /etc postgres:15.1
This way the /etc directory that is inside the container will be stored on the host in the /var/lib/docker/volumes/somerandomhashorguid.....
This volume solution is needed for containers that need extra IO, because the files of a container that are not in volumes are stored in the writable layer, as per the docs: "Writing into a container’s writable layer requires a storage driver to manage the filesystem. The storage driver provides a union filesystem, using the Linux kernel. This extra abstraction reduces performance as compared to using data volumes, which write directly to the host filesystem."
So you could technically remove the VOLUME command from the postgres Dockerfile, rebuild the image for yourself, and use that image to create your postgres container, but it would have lower performance.
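A rough sketch of that rebuild (the repository layout below matches the link above, but it may change over time):

git clone https://github.com/docker-library/postgres.git
cd postgres/15/bullseye
sed -i '/^VOLUME /d' Dockerfile     # drop the VOLUME directive
docker build -t postgres-novolume .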
Bind mounts are the type of data storage that can be mounted anywhere on the host filesystem. For example, if you were to run:
docker run --name mypostgres -e POSTGRES_PASSWORD=password -v /tmp/mypostgresdata:/var/lib/postgresql/data postgres:15.1
(Take note of the -v flag here: there is a colon between the host and the container directory, whereas in the volume version of this flag above there was no host directory and no colon.)
Then you would have a directory /tmp/mypostgresdata created on your Docker host machine, and the container's /var/lib/postgresql/data directory would be mapped there instead of to Docker's internal volumes directory /var/lib/docker/volumes/somerandomhashorguid.....
My general rule of thumb would be to use volumes - as in /var/lib/docker/volumes/ - whenever you can, and deviate only if really necessary. Bind mounts are not flexible enough to make an image/container portable, and the writable container layer has lower performance than Docker volumes.
You can list Docker volumes with docker volume ls, but you will not see bind-mounted directories there. For those you will need docker inspect containername.
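For example (the container name is a placeholder):

docker volume ls                                    # named and anonymous volumes only
docker inspect -f '{{ json .Mounts }}' mypostgres   # shows bind mounts as well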
"You could just copy one of the dockerfiles used by the postgres project, and remove the VOLUME statement. github.com/docker-library/postgres/blob/… –
Nick ODell
Nov 26, 2022 at 18:05"
answered Nick abow.
And that edited Dockerfile would build "almost" Docker Official Image.
I have the following docker-compose file
version: '3.7'
volumes:
  postgres-data:
services:
  postgres:
    environment:
      - POSTGRES_PASSWORD=mypwd
      - POSTGRES_USER=randomuser
    image: 'postgres:14'
    restart: always
    volumes:
      - './postgres-data:/var/lib/postgresql/data'
I seem to have multiple issues regarding the volume:
A folder named postgres-data is created in the docker-compose file's location when I run up, though it seems that for other images the data gets placed in the /var/lib/docker/volumes folder instead (without creating such a folder). Is this expected? Is it good practice to have the volume folder created in the same location as the docker-compose file instead of under /var/lib/docker/volumes?
This folder has weird ownership; I can't get into it as my current user (though I am in the docker group).
I tried reading the image documentation, especially the "Arbitrary --user Notes", but didn't understand what to do with it. I also tried not setting the POSTGRES_USER (which then defaults to postgres), but the result is the same.
What's the correct way to create a volume using this image ?
Your volume mount is explicitly to a subdirectory of the current directory:
volumes:
  - './postgres-data:/var/lib/postgresql/data'
  #  ^^ (a slash before the colon always means a bind mount)
If you want to use a named volume, you need to declare it at the top level of the Compose file, and refer to the volume name (without a slash) when you use it:
volumes:
  postgres-data:
services:
  ...
    volumes:
      - 'postgres-data:/var/lib/postgresql/data'
      #   ^^ (no slash)
One isn't really "better" than the other for this case. A bind-mounted host directory is much easier to back up; a named volume will be noticeably faster on MacOS or Windows; you can directly see and edit the files with a bind mount; you can use the Docker ecosystem to clean up named volumes. For a database in particular, seeing the data files isn't very useful and I might prefer a named volume, but that's not at all a strong preference.
File ownership for bind mounts is a frequent question. On native Linux, the numeric user ID is the only thing that matters for permission checking. This is resolved by the /etc/passwd file into a username, but the host and container have different copies of this file (and that's okay). The unusual owner you're seeing with ls -l from the host matches the numeric uid of the default user in the postgres image.
That image is well designed, though, and the upshot of that section of the Docker Hub documentation is that you can specify any Compose user: you want, probably matching the host uid that owns the directory.
sudo rm -rf ./postgres-data   # remove the directory with the wrong owner
id -u                         # what's my current numeric uid?

version: '3.8'
services:
  postgres:
    volumes:          # using a host directory
      - './postgres-data:/var/lib/postgresql/data'
    user: 1000        # matches the `id -u` output
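To verify that it lines up, one possible flow (an assumption on my part: pre-creating the directory yourself means your uid owns it):

mkdir -p ./postgres-data    # pre-create so your uid owns it
docker-compose up -d
ls -ln ./postgres-data      # data files should carry your numeric uid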
The data in the database is intended to be surfaced by an API in another container. Previously, I successfully loaded the database at run time following this suggestion. However, my database is quite large (10 GB), and ideally I would not have to load it again each time I start a new container. I want the database to be loaded at build time. To accomplish this, I tried the following for my Dockerfile:
FROM mongo:4.0.6-xenial
COPY dump /data/dump
RUN mongod --fork --logpath /var/log/mongod.log \
&& mongorestore /data/dump \
&& mongo --eval "db.getSiblingDB('db').createUser({user:'user',pwd:'pwd',roles:['readWrite']})" \
&& mongod --shutdown
I expected the database to be in the container when I ran this image, but it was not, nor does the user exist. However, as far as I can tell, the log file /var/log/mongod.log indicates that the database loaded successfully. Why did this not work?
The official mongo Docker image writes the database data to a Docker volume.
At run time (thus in a Docker container), keep in mind that files written to volumes do not end up written to the container file system. This is done to persist your data so that it survives container deletion, but, more importantly in the context of databases, for performance reasons: to get good disk I/O performance, disk operations must be done on a volume, not on the container file system itself.
At build time (thus when creating a Docker image), if RUN/ADD/COPY directives in your Dockerfile write files to a location that is already declared as a volume, those files will be discarded. However, if you write the files to a directory first and only afterwards declare that directory as a volume, the volume will keep those files unless you start your container specifying a volume with the docker run -v option.
This means that in the case where your own Dockerfile is built FROM mongo, the /data location is already declared as a volume. Writing files to that location is pointless.
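You can confirm which paths an image declares as volumes with a quick check:

docker image inspect mongo -f '{{ json .Config.Volumes }}'
# typically prints something like {"/data/configdb":{},"/data/db":{}}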
What can be done?
Make your own mongo image from scratch
Knowing how volumes work, you could copy the contents of the Dockerfile of the official mongo Docker image and insert a RUN/ADD/COPY directive that writes the files you want to the /data/db location before the VOLUME /data/db /data/configdb directive.
Override the entrypoint
Assuming you have a tar archive named mongo-data-db.tar with the contents of the /data/db location from a mongo container holding all the databases and collections you want, you can use the following Dockerfile and copy-initial-data-entry-point.sh to build an image which copies those data to the /data/db location every time the container is started. This only makes sense in a use case where such a container is used for a test suite that requires the very same initial data every time the container starts, since previous data are replaced with the initial data at each start.
Dockerfile:
FROM mongo
COPY ./mongo-data-db.tar /mongo-data-db.tar
COPY ./copy-initial-data-entry-point.sh /
RUN chmod +x /copy-initial-data-entry-point.sh
ENTRYPOINT [ "/copy-initial-data-entry-point.sh"]
CMD ["mongod"]
copy-initial-data-entry-point.sh:
#!/bin/bash
set -e
tar xf /mongo-data-db.tar -C /
exec /usr/local/bin/docker-entrypoint.sh "$@"
In order to extract the contents of /data/db from the volume of a mongo container named my-mongo-container, proceed as follows:
stop the mongo container: docker stop my-mongo-container
create a temporary container to produce the tar archive from the volume: docker run --rm --volumes-from my-mongo-container -v $(pwd):/out ubuntu tar cvf /out/mongo-data-db.tar /data/db
Note that this archive will be quite large, as it contains the full contents of the mongo server data, including indexes, as described in the mongo documentation.
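Putting it together, a hypothetical build-and-run of the image above (the tag and container name are placeholders):

docker build -t mongo-with-data .
docker run -d --name test-mongo mongo-with-data
# every start re-extracts the archive, resetting /data/db to the initial data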
The postgres image, for example, has a volume baked in at /var/lib/postgresql/data, but it isn't bound to a particular host path. I'm wondering if the database work done in this container is wholly encapsulated by committing the container to an image, or if I need to separately pass along the contents of the unbound volume.
Example in commands
Create container vtest based on postgres image:
$ docker run -d --name vtest postgres
The container has a volume at /var/lib/postgresql/data that is not bound to a host path:
$ docker inspect -f '{{ .Volumes }}' vtest
map[/var/lib/postgresql/data:/var/lib/docker/vfs/dir/bc39da05ff1cd044d7a17bba61381e854a948fb70cf39f897247f5ada66ad906]
$ sudo docker inspect -f '{{ .HostConfig.Binds }}' vtest
<no value>
Create a database and add some records in the vtest container. Then, commit the changes to an image to be able to share with others:
$ docker commit -p vtest postgres:vtest
Will the changes made in the vtest container's /var/lib/postgresql/data persist in this new postgres:vtest image?
The volumes mounted in the container are not committed to the image, no matter whether they are mounted to a particular folder on your host or to a folder under /var/lib/docker. In fact, as you show in your message, the volume is mounted at /var/lib/docker/vfs/dir/bc39da05ff1cd044d7a17bba61381e854a948fb70cf39f897247f5ada66ad906 on the host machine. You can browse that folder as the root user.
If you want to save the data in a volume, you need an approach other than committing. One of the most used is data containers (which will also create a folder under /var/lib/docker/... with your data), then saving that volume using a new container and a packing tool like tar. Check the Docker documentation on this topic for further details.
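As a hedged sketch of that tar-based approach, reusing the vtest container from above:

# archive the volume contents into the current directory
docker run --rm --volumes-from vtest -v $(pwd):/backup ubuntu tar cvf /backup/pgdata.tar /var/lib/postgresql/data
# restore into a new container (here hypothetically named "newpg") that mounts the same path
docker run --rm --volumes-from newpg -v $(pwd):/backup ubuntu tar xvf /backup/pgdata.tar -C /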