Trying to create a Docker image with a MongoDB database, but the container does not have it even though it loaded successfully - mongodb

The data in the database is intended to be surfaced by an API in another container. Previously, I successfully loaded the database at run time using this suggestion. However, my database is quite large (10 GB), and ideally I would not have to load it again each time I start a new container. I want the database to be loaded at build time. To accomplish this, I tried the following Dockerfile:
FROM mongo:4.0.6-xenial
COPY dump /data/dump
RUN mongod --fork --logpath /var/log/mongod.log \
&& mongorestore /data/dump \
&& mongo --eval "db.getSiblingDB('db').createUser({user:'user',pwd:'pwd',roles:['readWrite']})" \
&& mongod --shutdown
I expected the database to be in the container when I ran this image, but it was not, nor did the user exist. However, the log file /var/log/mongod.log indicates that the database loaded successfully, as far as I can tell. Why did this not work?

The official mongo Docker image writes the database data in a docker volume.
At run time (thus in a docker container), keep in mind that files written to volumes do not end up in the container's filesystem. This is done to persist your data so that it survives container deletion but, more importantly in the context of a database, for performance reasons: to get good disk I/O, database writes must go to a volume, not to the container filesystem itself.
At build time (thus when creating a docker image), if RUN/ADD/COPY directives in your Dockerfile write files to a location which is already declared as a volume, those files are discarded. However, if you write the files to a directory in your Dockerfile first, and only afterwards declare that directory as a volume, the volume will keep those files unless you start your container with the docker run -v option pointing a different volume at that path.
This means that when your own Dockerfile is built FROM mongo, the /data location is already declared as a volume, so writing files to it is pointless.
What can be done?
Make your own mongo image from scratch
Knowing how volumes work, you could copy the contents of the Dockerfile of the official mongo Docker image and insert a RUN/ADD/COPY directive that writes the files you want to the /data/db location before the VOLUME /data/db /data/configdb directive.
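A minimal sketch of that ordering (only the tail of such a Dockerfile is shown; the lines above it would have to replicate the official image's installation steps, and the dump path is taken from the question above):
# ... installation steps copied from the official mongo Dockerfile ...

# Seed the database BEFORE declaring /data/db as a volume
COPY dump /data/dump
RUN mongod --fork --logpath /var/log/mongod.log \
    && mongorestore /data/dump \
    && mongod --shutdown

# Only now snapshot the populated directory into the volume
VOLUME /data/db /data/configdb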
Override the entrypoint
Assuming you have a tar archive named mongo-data-db.tar containing the contents of the /data/db location of a mongo container holding all the databases and collections you want, you can use the following Dockerfile and copy-initial-data-entry-point.sh to build an image which copies those data to the /data/db location every time the container is started. This only makes sense for a use case such as a test suite that requires the very same initial data every time a container starts, since previous data are replaced with the initial data at each start.
Dockerfile:
FROM mongo
COPY ./mongo-data-db.tar /mongo-data-db.tar
COPY ./copy-initial-data-entry-point.sh /
RUN chmod +x /copy-initial-data-entry-point.sh
ENTRYPOINT [ "/copy-initial-data-entry-point.sh"]
CMD ["mongod"]
copy-initial-data-entry-point.sh:
#!/bin/bash
set -e
# copy the initial data over the volume contents at each start
tar xf /mongo-data-db.tar -C /
# hand off to the official image's entrypoint, forwarding all arguments
exec /usr/local/bin/docker-entrypoint.sh "$@"
In order to extract the contents of the /data/db location from the volume of a mongo container named my-mongo-container, proceed as follows:
stop the mongo container: docker stop my-mongo-container
create a temporary container to produce the tar archive from the volume: docker run --rm --volumes-from my-mongo-container -v $(pwd):/out ubuntu tar cvf /out/mongo-data-db.tar /data/db
Note that this archive will be quite large, as it contains the full contents of the mongo server data, including indexes, as described in the mongo documentation.

Related

A container is a database server. How to ask its Dockerfile to complete its construction after that container has started?

I am using a postgis/postgis Docker image to set a database server for my application.
The database server must have a tablespace created, then a database.
Then each time another application starts from another container, it runs a Liquibase script that updates the database schema (creating tables, indexes...) when needed.
On a terminal, to prepare the database container, I'm running these commands:
# Run a naked Postgis container
sudo docker run --name ecoemploi-postgis \
    -e POSTGRES_PASSWORD=postgres \
    -d -v /data/comptes-france:/data/comptes-france postgis/postgis

# Send 'bash level' commands to create the directory for the tablespace
sudo docker exec -it ecoemploi-postgis \
    bin/sh -c 'mkdir /tablespace && chown postgres:postgres /tablespace'
Then to complete my step 1, I have to run SQL statements to create the tablespace from PostgreSQL's point of view, and to create the database with a CREATE DATABASE.
I connect manually to psql inside my container:
sudo docker exec -it ecoemploi-postgis bin/sh -c 'exec psql -h "$POSTGRES_PORT_5432_TCP_ADDR" -p "$POSTGRES_PORT_5432_TCP_PORT" -U postgres'
And I run these commands manually:
CREATE TABLESPACE data LOCATION '/tablespace';
CREATE DATABASE comptesfrance TABLESPACE data;
exit
But I would like to have a container created from a single Dockerfile that has done all the needed work. The difficulty is that it has to be done in two parts:
One before the container is started (creating directories, granting them user:group).
One after it is started for the first time: declaring the tablespace and creating the database. If I understand the base image correctly, this should be done after its docker-entrypoint.sh entrypoint has run?
What is the right way to write a Dockerfile that creates a container having done all these steps?
The PostGIS image "is based on the official postgres image", so it should be able to use the /docker-entrypoint-initdb.d mechanism. Any files you put in that directory will be run the first time the database container is started. The postgis Dockerfile already uses this directory to install the PostGIS extensions into the default database.
That means you can put your build-time setup directly into the Dockerfile, and copy the startup-time script into that directory.
FROM postgis/postgis:12-3.0
RUN mkdir /tablespace && chown postgres:postgres /tablespace
COPY createdb.sql /docker-entrypoint-initdb.d/20-createdb.sql
# Use default ENTRYPOINT/CMD from base image
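The createdb.sql script copied there can contain exactly the statements that were previously run by hand (contents taken from the question; adjust names and paths as needed):
-- 20-createdb.sql: executed once, on first start of the container
CREATE TABLESPACE data LOCATION '/tablespace';
CREATE DATABASE comptesfrance TABLESPACE data;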
For the particular setup you describe, this may not be necessary. Each database runs in an isolated filesystem space and starts with an empty data directory, so there's not a specific need to create an alternate data directory; Docker style is to just run multiple databases if you need isolated storage. Similarly, the base postgres image will create a database for you at first start (named by the POSTGRES_DB environment variable).
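For example, a sketch relying only on those standard environment variables (the database name comes from the question):
docker run --name ecoemploi-postgis \
    -e POSTGRES_PASSWORD=postgres \
    -e POSTGRES_DB=comptesfrance \
    -d postgis/postgis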
In order to run a container, your Dockerfile must be complete and functional.
Put the queries in a bash script and, on the last line of your Dockerfile, set that script as the ENTRYPOINT, as sketched below.
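A minimal sketch of what this answer describes (the script name init.sh is hypothetical, and note that overriding ENTRYPOINT this way replaces the base image's own entrypoint):
FROM postgis/postgis:12-3.0
COPY init.sh /init.sh
RUN chmod +x /init.sh
ENTRYPOINT ["/init.sh"]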

How to copy and use existing postgres data folder into docker postgres container

I want to build a postgres docker container for testing an issue.
I have:
An archived folder of postgres files (/var/lib/postgres/data/)
A Dockerfile that places the folder into docker's postgres:latest.
I want:
A Docker image that resets its own state each time a container is recreated
A container whose database state is based on the postgres files passed into it
I don't want to wait for the long-running backup and restore of the existing database in a /docker-entrypoint-initdb.d initialization script.
I DON'T WANT TO USE VOLUMES because I don't need to store new data between restart (That's why this post is different from How to use a PostgreSQL container with existing data?. In that post volumes are used)
My suggestion is to copy the postgres files (/var/lib/postgres/data/) from the host machine into the container's /var/lib/postgres/data/ during the build phase.
But the postgres docker image replaces these files when the initdb phase executes.
How can I tell the postgres docker image not to override the database files?
e.g.
Dockerfile
FROM postgres:latest
COPY ./postgres-data.tar.gz /opt/pg-data/
WORKDIR /opt/pg-data
RUN tar -xzf postgres-data.tar.gz
RUN mv ./data/ /var/lib/postgresql/data/pg-data/
Run command
docker run -p 5432:5432 -e PGDATA=/var/lib/postgresql/data/pg-data --name database-immage1 database-docker
If you don't really need to create a custom image with the database snapshot, you could use volumes. Un-tar the database files somewhere on the host, say ~/pgdata, then run the image. Example:
docker run -v ~/pgdata:/var/lib/postgresql/data/ -p 5432:5432 postgres:9.5
The files must be compatible with the postgres version of the image, so use the same image version as the one that produced the archived database.
If, instead, you must recreate the image, you don't need to uncompress the database archive. The ADD instruction will do that for you. Make sure the tar does not contain any leading directory.
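For instance, you can build such an archive with tar's -C option, so the data files sit at the archive root rather than under a leading directory (a sketch, assuming the archive is created on the machine holding the original data directory):
tar -czf postgres-data.tar.gz -C /var/lib/postgresql/data .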
The Dockerfile:
FROM postgres:latest
ADD ./postgres-data.tar.gz /var/lib/postgresql/data/
Build it:
docker build . -t database-docker
Run without overriding the PGDATA environment variable. Note that in your attempt you copied the files to /var/lib/postgresql/data while PGDATA pointed to /var/lib/postgresql/data/pg-data.
Run the container:
docker run -p 5432:5432 --name database-image1 database-docker
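You can then check that the restored databases are present (a quick sanity check, assuming the default postgres superuser):
docker exec -it database-image1 psql -U postgres -c '\l'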

Data from Dockerfile RUN step not in image

I'm setting up a MongoDB container to run software integration tests against, and I want the Dockerfile to add the test user to the database so that the tests can log in and execute their steps. I want to capture all of the steps in the Dockerfile, so I'm trying to avoid starting the container, adding the users manually, and then recapturing the image.
To that end, I've created this Dockerfile:
FROM mongo:3.2.0
COPY add_user.sh /
RUN /add_user.sh
and add_user.sh contains:
#!/bin/bash
mongod &
RET=1
while [[ $RET -ne 0 ]]; do
sleep 1
mongo admin --eval "help" >/dev/null 2>&1
RET=$?
done
echo "Adding testUser..."
mongo admin --eval "db.createUser({user:'testUser',pwd:'P#ssw0rd',roles:['readWrite']})"
echo "User added"
mongo admin --eval "db.getUsers()"
mongod --shutdown
While the image is building, I can see that the user has been successfully added, but when I run the image, the database in the container does not contain any users.
Why isn't the user being captured in the image? How can I add a user during the image build process?
The reason is that the mongo image uses a VOLUME for the data directory, which makes docker store the database data outside of the image, and that data is not persisted in the image (see the Dockerfile for the mongo:3.2 image; https://github.com/docker-library/mongo/blob/fcb9584617e63f1d3db8dc730fb8abb83653c7ad/3.2/Dockerfile#L54)
So, even though the command has run successfully, the database itself is stored outside of the image.
In docker, volumes are used so that the data is persisted independently of a container's lifecycle (i.e., you can destroy a container, but the data inside the volume is kept around, so that you can start a new container, using the same volume/data)
The downside is that you cannot "bake" the data into an image if the Dockerfile declares a VOLUME at that location.
For some discussions around this, see https://github.com/docker/docker/issues/18287, and https://github.com/docker/docker/issues/3465
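A workaround that comes out of those discussions (a sketch adapted from the question's own commands, not part of the original answer) is to bake the data into a path that is not declared as a volume and point mongod at it with --dbpath:
FROM mongo:3.2.0
# /data/db is a VOLUME in the base image, so build-time writes to it are
# discarded; a sibling directory is kept in the image layers
RUN mkdir -p /data/db2 \
    && mongod --fork --logpath /var/log/mongod.log --dbpath /data/db2 \
    && mongo admin --eval "db.createUser({user:'testUser',pwd:'P#ssw0rd',roles:['readWrite']})" \
    && mongod --dbpath /data/db2 --shutdown \
    && chown -R mongodb:mongodb /data/db2
# run mongod against the baked-in data directory
CMD ["mongod", "--dbpath", "/data/db2"]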

docker backup and restore mongodb

Create a data-only container:
docker create -v /mongodb_data --name mongodb_data mongo
Create a mongodb container:
docker run --volumes-from mongodb_data --name mongo_db --restart always -d mongo
Then to use mongodb in my own container I use the link command:
--link mongo_db:mongo
So everything works fine. Now I want to back up the mongodb database according to the docs: http://docs.docker.com/userguide/dockervolumes/#backup-restore-or-migrate-data-volumes and this command:
docker run --volumes-from mongodb_data -v $(pwd):/backup busybox tar cvf /backup/backup.tar /mongodb_data
However, the created tar file has just an empty /mongodb_data folder; it does not contain a single file.
Any ideas what's wrong? I am using docker 1.7 on ubuntu 14.04 LTS.
The problem is your data-only container. You made your volume with the path /mongodb_data, which doesn't store any mongodb data. By default, the mongodb storage path is /data/db (according to this #Where to Store Data Section).
As a result, your mongodb data is not saved in your data-only container. So here is a workaround:
copy /data/db to /mongodb_data inside your mongodb container: docker exec -it mongo_db bash, then cp -r /data/db/* /mongodb_data/
make a backup by following the doc you mentioned
build a new data-only container and load the backup into it
remove the current mongo_db container and recreate a new one from the new data-only container (see the command sketch below)
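Put together, the whole sequence might look like this (a sketch based on the commands above; mongodb_data2 is a hypothetical name for the new data-only container):
# 1. copy the live data into the data-only volume
docker exec -it mongo_db bash -c 'cp -r /data/db/* /mongodb_data/'
# 2. back up the volume to a tar archive on the host
docker run --rm --volumes-from mongodb_data -v $(pwd):/backup busybox tar cvf /backup/backup.tar /mongodb_data
# 3. create the new data-only container and restore the backup into it
# (busybox's working directory is /, so the archive's mongodb_data/ path lands back at /mongodb_data)
docker create -v /mongodb_data --name mongodb_data2 mongo
docker run --rm --volumes-from mongodb_data2 -v $(pwd):/backup busybox tar xvf /backup/backup.tar
# 4. recreate the mongodb container from the new data-only container
docker stop mongo_db && docker rm mongo_db
docker run --volumes-from mongodb_data2 --name mongo_db --restart always -d mongo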
OR, once you have copied all data from /data/db to /mongodb_data, you can modify your mongodb config to change the default directory to /mongodb_data. You may find this useful: Changing MongoDB data store directory
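For that second option, the mongod configuration change is small (a sketch using the YAML config format of MongoDB 2.6+):
# /etc/mongod.conf -- point the storage engine at the data-only volume
storage:
  dbPath: /mongodb_data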
You can use this image, which provides a docker container for many jobs (import, export, dump).

How to persist MongoDB data between container restarts?

It is quite easy to run MongoDB containerised using docker, though each time you start a new mongodb container, you will get a new empty database.
What should I do in order to keep the database content between container restarts? I tried to bind an external directory to the container using the -v option, but without any success.
I tried using the ehazlett/mongodb image and it worked fine.
With this image, you can easily specify where mongo stores its data with the DATA_DIR env variable. I am sure it must not be very difficult to change on your image too.
Here is what I did:
mkdir test; docker run -v `pwd`/test:/tmp/mongo -e DATA_DIR=/tmp/mongo ehazlett/mongodb
Notice the `pwd` within the -v: as the docker server and client might have different paths, it is important to specify an absolute path.
With this command, I can run mongo as many times as I want and the database will always be stored in the ./test directory I just created.
When using the official Mongo docker image (e.g. mongo:4.2.2-bionic at the time of writing this answer) together with docker-compose, you can achieve persistent data storage using the docker-compose.yml example below.
In the official mongo image, data is stored in the container under the root directory in the folder /data/db by default.
Map this folder to a folder in your local working directory called data (in this example).
Make sure ports are set and mapped, default 27017-27019:27017-27019.
Example of my docker-compose.yml:
version: "3.2"
services:
mongodb:
image: mongo:4.2.2-bionic
container_name: mongodb
restart: unless-stopped
ports:
- 27017-27019:27017-27019
volumes:
- ./data:/data/db
Run docker-compose up in the directory where the yml file is located to run the mongodb container with persistent storage. If you do not have the official image yet, it will pull it from Dockerhub first.
Old post, but maybe someone still needs a quick and easy solution...
The easiest way I found is using a bind-mounted volume.
This way you can easily attach existing MongoDB data, and it will live on even after you destroy the container.
Create a volume that points to your folder (which may contain an existing db). In my case it's done under Windows, but you can do it on any file system:
docker volume create --opt type=none --opt o=bind --opt device=d:/data/db db
Create/run a docker container with MongoDB using that volume binding:
docker run --name mongodb -d -p 27017:27017 -v db:/data/db mongo
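You can confirm the binding took effect with standard docker commands:
docker volume inspect db
docker exec -it mongodb ls /data/db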