I'm trying to avoid touching a shared dev database in my workflow; to make this easier, I want to keep Docker image definitions on my disk for the schemas I need. However, I'm stuck at writing a Dockerfile that creates a Postgres image with the dump already restored. My problem is that while the Docker image is being built, the Postgres server isn't running.
While messing around in a shell inside the container, I tried starting the server manually, but I'm not sure of the proper way to do so. /docker-entrypoint.sh doesn't seem to do anything, and I can't figure out how to "correctly" start the server.
So what I need to do is:
start with "FROM postgres"
copy the dump file into the container
start the PG server
run psql to restore the dump file
kill the PG server
(The steps I don't know how to do are starting and killing the PG server; the rest is easy.)
What I'd like to avoid is:
Running the restore manually against an existing container; the whole idea is to be able to switch between different databases without having to touch the application config.
Saving the restored container as an image; I'd like to be able to rebuild the image for a database easily with a different dump. (Also, it doesn't feel very Docker-like to have unrepeatable image builds.)
This can be done with the following Dockerfile, given an example.pg dump file:
FROM postgres:9.6.16-alpine
LABEL maintainer="lu@cobrainer.com"
LABEL org="Cobrainer GmbH"
ARG PG_POSTGRES_PWD=postgres
ARG DBUSER=someuser
ARG DBUSER_PWD=P@ssw0rd
ARG DBNAME=sampledb
ARG DB_DUMP_FILE=example.pg
ENV POSTGRES_DB launchpad
ENV POSTGRES_USER postgres
ENV POSTGRES_PASSWORD ${PG_POSTGRES_PWD}
ENV PGDATA /pgdata
COPY wait-for-pg-isready.sh /tmp/wait-for-pg-isready.sh
COPY ${DB_DUMP_FILE} /tmp/pgdump.pg
RUN set -e && \
    nohup bash -c "docker-entrypoint.sh postgres &" && \
    /tmp/wait-for-pg-isready.sh && \
    psql -U postgres -c "CREATE USER ${DBUSER} WITH SUPERUSER CREATEDB CREATEROLE ENCRYPTED PASSWORD '${DBUSER_PWD}';" && \
    psql -U ${DBUSER} -d ${POSTGRES_DB} -c "CREATE DATABASE ${DBNAME} TEMPLATE template0;" && \
    pg_restore -v --no-owner --role=${DBUSER} --exit-on-error -U ${DBUSER} -d ${DBNAME} /tmp/pgdump.pg && \
    psql -U postgres -c "ALTER USER ${DBUSER} WITH NOSUPERUSER;" && \
    rm -rf /tmp/pgdump.pg
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
  CMD pg_isready -U postgres -d launchpad
where the wait-for-pg-isready.sh is:
#!/bin/bash
set -e

# Find the container's first non-loopback IPv4 address by parsing `ifconfig`
# output, and store it in the variable named by $1.
get_non_lo_ip() {
  local _ip _non_lo_ip _line _nl=$'\n'
  while IFS=$': \t' read -a _line; do
    [ -z "${_line%inet}" ] &&
      _ip=${_line[${#_line[1]}>4?1:2]} &&
      [ "${_ip#127.0.0.1}" ] && _non_lo_ip=$_ip
  done < <(LANG=C /sbin/ifconfig)
  printf ${1+-v} $1 "%s${_nl:0:$[${#1}>0?0:1]}" $_non_lo_ip
}

get_non_lo_ip NON_LO_IP

# Poll the server over TCP until it accepts connections.
until pg_isready -h $NON_LO_IP -U "postgres" -d "launchpad"; do
  >&2 echo "Postgres is not ready - sleeping..."
  sleep 4
done

>&2 echo "Postgres is up - you can execute commands now"
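If you don't specifically need to probe the container's non-loopback address, a simpler wait loop over the default local socket should also work inside the build (this is my own sketch, not part of the linked repo):
#!/bin/bash
# Minimal alternative: poll the locally started server until it accepts connections.
set -e
until pg_isready -U postgres -d launchpad; do
  >&2 echo "Postgres is not ready - sleeping..."
  sleep 2
done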
For the two "unsure steps":
start the PG server
nohup bash -c "docker-entrypoint.sh postgres &" can take care of it
kill the PG server
It's not really necessary: the backgrounded server doesn't outlive the RUN step anyway, since each RUN instruction executes in its own intermediate container.
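If you do want an explicit clean shutdown before the layer is committed, appending something like this to the end of the RUN step should work (my own addition; the alpine-based image ships su-exec, while the Debian-based images have gosu):
# Stop the background server gracefully so everything is flushed to disk
# before the layer is committed (must run as the postgres user).
su-exec postgres pg_ctl stop -w -m fast -D "${PGDATA}"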
The above scripts together with a more detailed README are available at https://github.com/cobrainer/pg-docker-with-restored-db
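To use it, a build/run sequence along these lines should work (the argument values and image tag are examples, not from the repo):
# Build the image, baking in a specific dump and database name
docker build \
  --build-arg DB_DUMP_FILE=example.pg \
  --build-arg DBNAME=sampledb \
  -t my-restored-postgres .
# Run it; the restored database is already inside the image
docker run -d -p 5432:5432 my-restored-postgres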
You can utilise volumes.
The postgres image has an environment variable you can set: PGDATA
See docs: https://hub.docker.com/_/postgres/
You could then point PGDATA at a pre-created volume containing the exact DB data that you require, and mount that volume when you run the image.
https://docs.docker.com/storage/volumes/#start-a-container-with-a-volume
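A rough sketch of that idea (the volume name and PGDATA path are made up for illustration; the volume is assumed to be seeded ahead of time with the database files you need):
# Create the named volume (and seed it separately with the desired cluster)
docker volume create sampledb-data
# Start a container against the pre-populated volume
docker run -d \
  -e POSTGRES_PASSWORD=postgres \
  -e PGDATA=/pgdata \
  -v sampledb-data:/pgdata \
  -p 5432:5432 \
  postgres:9.6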
An alternate solution can also be found here: Starting and populating a Postgres container in Docker
A general approach to this, which I remember using on other projects and which should work for any system you want to initialize, is:
Instead of trying to do this during the build, use Docker Compose dependencies (a sketch follows the list below) so that you end up with:
your db service that fires up the database without any initialization that requires it to be live
a db-init service that:
takes a dependency on db
waits for the database to come up, using, say, dockerize
then initializes the database while maintaining idempotency (e.g. using schema migration)
and exits
your application services that now depend on db-init instead of db
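A minimal docker-compose sketch of that layout (service names, images, and the migration command are placeholders, not a definitive setup):
version: "3.8"
services:
  db:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: example
  # One-shot initializer: waits for db, runs idempotent migrations, then exits.
  db-init:
    image: my-migrations:latest       # hypothetical image containing dockerize + migration scripts
    depends_on:
      - db
    command: ["dockerize", "-wait", "tcp://db:5432", "-timeout", "60s", "./run-migrations.sh"]
  app:
    image: my-app:latest              # hypothetical application image
    depends_on:
      - db-init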
Related
I'm using Postgres within a docker-compose environment to host databases for multiple containers. I basically want to add a database per application directly from docker-compose, without having to create the databases and users manually. For this, I'm using the init script feature of the Postgres docker image and mount a bash script (shown further below) into /docker-entrypoint-initdb.d via a volume:
db:
  image: postgres:9.6
  container_name: db
  restart: always
  volumes:
    - postgres-data:/var/lib/postgresql/data
    - /opt/docker/pgsql-entrypoint:/docker-entrypoint-initdb.d
  environment:
    - POSTGRES_PASSWORD={{ vault_pgsql_root_password }}
    - POSTGRES_MULTIPLE_DATABASES=confluence-{{ confluence_pgsql_password }},keycloak-{{ keycloak_pgsql_password }},gitlab-{{ gitlab_pgsql_password }},jira-{{ jira_pgsql_password }}
Basically, the POSTGRES_MULTIPLE_DATABASES environment variable contains all the databases and users that should be created. The init script is as follows:
#!/bin/bash
set -e
set -u

function create_user_and_database() {
    local database=$1
    local password=$2
    echo " Creating user and database '$database'"
    psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" <<-EOSQL
        CREATE USER $database WITH PASSWORD '$password';
        CREATE DATABASE $database;
        GRANT ALL PRIVILEGES ON DATABASE $database TO $database;
EOSQL
}

if [ -n "$POSTGRES_MULTIPLE_DATABASES" ]; then
    echo "Multiple database creation requested: $POSTGRES_MULTIPLE_DATABASES"
    for entry in $(echo $POSTGRES_MULTIPLE_DATABASES | tr ',' ' '); do
        db=$(echo $entry | cut -f1 -d-)
        pw=$(echo $entry | cut -f2 -d-)
        create_user_and_database $db $pw
    done
    echo "Multiple databases created"
fi
My problem is: at a certain point (now ;) ) I may want to add an additional service. Just adding an additional pair to the environment variable does not work, as the Postgres image is skipping the init step if data already exists. Is there a way to still achieve this behaviour?
Edit: I should have specified that I want to do it automatically from the compose file, by just changing the environment variable. It's clear that it can be done manually, of course.
You can always connect to the server with a database client and do it manually. If you don't want to expose the database port to the host then you can run the postgres client from the terminal in the postgres container. To open the terminal in the container:
> docker exec -it <container_name> /bin/sh
Then switch to the postgres user and start the client:
# su postgres
postgres@1778e9755f65:/$ psql
Once inside, just create the database: https://www.postgresql.org/docs/9.0/sql-createdatabase.html
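For example, inside psql, creating a user and database for a hypothetical new service follows the same pattern as the init script above (names are placeholders):
-- run inside psql as the postgres superuser
CREATE USER newservice WITH PASSWORD 'newservice_password';
CREATE DATABASE newservice;
GRANT ALL PRIVILEGES ON DATABASE newservice TO newservice;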
I ended up not using docker-compose for this, but using Ansible to deploy the database container directly and also make sure the appropriate users, databases, and permissions are present. I could not find a meaningful way to do that at startup via environment variables.
Here is my docker file:
FROM ubuntu:14.04
RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys B97B0AFCAA1A47F044F244A07FCC7D46ACCC4CF8
RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ precise-pgdg main" > /etc/apt/sources.list.d/pgdg.list
RUN apt-get update && apt-get -y -q install python-software-properties software-properties-common \
    && apt-get -y -q install postgresql-9.3 postgresql-client-9.3 postgresql-contrib-9.3
USER postgres
RUN /etc/init.d/postgresql start \
    && psql --command "CREATE USER pguser WITH SUPERUSER PASSWORD 'pguser';" \
    && createdb -O pguser pgdb
USER root
RUN echo "host all all 0.0.0.0/0 md5" >> /etc/postgresql/9.3/main/pg_hba.conf
RUN echo "listen_addresses='*'" >> /etc/postgresql/9.3/main/postgresql.conf
EXPOSE 5432
RUN mkdir -p /var/run/postgresql && chown -R postgres /var/run/postgresql
VOLUME ["/etc/postgresql", "/var/log/postgresql", "/var/lib/postgresql"]
USER postgres
CMD ["/usr/lib/postgresql/9.3/bin/postgres", "-D", "/var/lib/postgresql/9.3/main", "-c", "config_file=/etc/postgresql/9.3/main/postgresql.conf"]
Here is what I did...
I build the docker image:
docker build --rm=true -t my_image/postgresql:9.3 .
Then I create a new directory called data in my current directory and run the following command:
docker run -i -t -v="data:/data" -p 5432:5432 my_image/postgresql:9.3
I open another terminal and enter the postgres shell by running:
psql -h my_docker_ip -p 5432 -U pguser -W pgdb
and I create a table:
pgdb=# create table test (test_id bigserial primary key);
I verify the table exists using \dt and exit the postgres shell
I terminate the docker process and rerun the following:
docker run -i -t -v="data:/data" -p 5432:5432 my_image/postgresql:9.3
I enter the postgres shell again and run \dt
I notice
there are no tables.
in the data directory there are no files.
I must be doing something wrong since I am assuming that the table I created will persist. Can someone point out my mistake?
There is something that confused me, and it was not very clear to me in the official documentation.
To my knowledge, persistent volumes can be created in three ways.
At container invocation time including full path ( -v ~/database:/data ): makes an external folder from the host available inside the docker container. Both can modify it.
At container invocation time using a volume name ( -v datamysql:/data ): makes a persistent volume available inside the container. It is created if it does not already exist. You can list them by name with docker volume ls. Internally, it will be stored in a place such as /var/lib/docker/volumes/ae4445f7c9317a22fe84726fb894c47754f38a7fd150c00fd877024889968750/_data.
At container build time ( VOLUME ["/database/data"] in the Dockerfile): every invocation of docker run will create a new volume that will persist even if you delete the container. This can be confusing because subsequent invocations will result in different volumes being created that will not be reused.
You can list both named (second case) and unnamed (third case) volumes with
$ docker volume ls
DRIVER VOLUME NAME
local 064593b3e65977097d4d0c8402a6c633f1af69be2937bf118678ab8f97ee9a7e
local 4753ad0437d13e54c76d9c34a30a1843396a1866a0cf9237d500fdcca0d78c5f
local 8d7a35354f666b2e8a26866a35bbae36bb9601701d4c6b505ab8ce6629f69415
local db48eefe8f189b36107ca9c4eebb792690590ab0ba055e7e4e2c9adfd1765b7e
local datamysql
You can see the exact location of a container's volume by using docker inspect mycontainer
{
    "Type": "volume",
    "Name": "8d7a35354f666b2e8a26866a35bbae36bb9601701d4c6b505ab8ce6629f69415",
    "Source": "/media/USBdrive/docker/volumes/8d7a35354f666b2e8a26866a35bbae36bb9601701d4c6b505ab8ce6629f69415/_data",
    "Destination": "/var/lib/mysql",
    "Driver": "local",
    "Mode": "",
    "RW": true,
    "Propagation": ""
},
It might be handy to remove unused volumes (especially for the third case).
$ docker volume prune
WARNING! This will remove all volumes not used by at least one container.
Are you sure you want to continue? [y/N] y
Deleted Volumes:
4753ad0437d13e54c76d9c34a30a1843396a1866a0cf9237d500fdcca0d78c5f
Total reclaimed space: 205MB
Because you used the VOLUME directive in your Dockerfile, you are in the third case. Inspect your container to look for the file, and specify the volume from the command line if you want repeated sessions to persist data.
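For example (the named volume pgdata is my own example, not from the question):
# See which anonymous volumes the VOLUME instruction created for the container
docker inspect -f '{{ json .Mounts }}' mycontainer
# Use a named volume instead, so later runs reuse the same data
docker run -i -t -v pgdata:/var/lib/postgresql -p 5432:5432 my_image/postgresql:9.3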
Based on your comment:
the data persisted, but I still can't find the persist data in my host ./data directory
and running this command:
docker run -i -t -v="data:/data" -p 5432:5432 my_image/postgresql:9.3
You appear to be confusing a named volume and a host volume. The named volume is used when you give the volume a name without a path, like data. The named volume stores the data using the docker driver (typically local) under a given name that you can reuse. It has the advantage of being listed in docker volume ls, and being initialized to the content of the image at the mounted location.
If you include a full path, like /home/username/data, that would mount the directory from the docker host instead of using the named volume. The biggest disadvantage is that you don't get the directory initialized with the contents from the image, and you will likely encounter permission issues where the uid of the container process won't match the uid you use on your host.
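To illustrate the difference with the image from the question:
# Named volume: managed by Docker, listed by `docker volume ls`, and
# initialized from the image's content at the mounted path.
docker run -i -t -v data:/var/lib/postgresql -p 5432:5432 my_image/postgresql:9.3
# Host (bind) mount: an absolute path on the host; NOT initialized from the
# image, and prone to uid/permission mismatches between host and container.
docker run -i -t -v /home/username/data:/var/lib/postgresql -p 5432:5432 my_image/postgresql:9.3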
For more details, see https://docs.docker.com/engine/tutorials/dockervolumes/
I've created an image for docker which hosts a postgresql server. In the Dockerfile I switch to the postgres user with the USER instruction, and I pass a constant password into a run of psql:
USER postgres
RUN /etc/init.d/postgresql start && psql --command "CREATE USER docker WITH SUPERUSER PASSWORD 'docker';" && createdb -O docker docker
Ideally either before or after calling 'docker run' on this image, I'd like the caller to have to input these details into the command line, so that I don't have to store them anywhere.
I'm not really sure how to go about this. Does docker have any support for reading stdin into an environment variable? Or perhaps there's a better way of handling this all together?
At build time
You can use build arguments in your Dockerfile:
ARG password=defaultPassword
USER postgres
RUN /etc/init.d/postgresql start && psql --command "CREATE USER docker WITH SUPERUSER PASSWORD '$password';" && createdb -O docker docker
Then build with:
$ docker build --build-arg password=superSecretPassword .
At run time
For setting the password at runtime, you can use an environment variable (ENV) that you can evaluate in an entrypoint script (ENTRYPOINT):
ENV PASSWORD=defaultPassword
ADD entrypoint.sh /docker-entrypoint.sh
USER postgres
ENTRYPOINT /docker-entrypoint.sh
CMD ["postgres"]
Within the entrypoint script, you can then create a new user with the given password as soon as the container starts:
#!/bin/bash
# Start the server locally, create the user with the password passed in via
# the PASSWORD environment variable, then stop the server and hand over to CMD.
pg_ctl -D /var/lib/postgresql/data \
    -o "-c listen_addresses='localhost'" \
    -w start
psql --command "CREATE USER docker WITH SUPERUSER PASSWORD '$PASSWORD';"
pg_ctl -D /var/lib/postgresql/data -m fast -w stop
exec "$@"
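Running it would then look something like this (the image name is a placeholder):
docker run -d -e PASSWORD=superSecretPassword my-postgres-image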
You can also have a look at the Dockerfile and entrypoint script of the official postgres image, from which I've borrowed most of the code in this answer.
A note on security
Storing secrets like passwords in environment variables (both build and run time) is not incredibly secure (unfortunately, to my knowledge, Docker does not really offer any better solution for this, right now). An interesting discussion on this topic can be found in this question.
You could use an environment variable in your Dockerfile and override the default value when you call docker run, using the -e or --env argument.
Also, you will need to amend the init script referenced by the CMD instruction to run the psql command on startup.
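A minimal sketch of that approach (the variable and image names are illustrative only):
# The Dockerfile would declare a default, e.g. ENV PG_PASSWORD=changeme,
# and the init script would read $PG_PASSWORD; override it at run time:
docker run -d -e PG_PASSWORD=superSecretPassword my-postgres-image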
I am migrating an application into Docker. One of the issues I am bumping into is: what is the correct way to load initial data into PostgreSQL running in Docker? My typical method of restoring a database backup file is not working. I have tried the following:
gunzip -c mydbbackup.sql.gz | psql -h <docker_host> -p <docker_port> -U <dbuser> -d <db> -W
That does not work, because PostgreSQL is prompting for a password, and I cannot enter a password because it is reading data from STDIN. I cannot use the $PGPASSWORD environment variable, because any environment variable I set on my host is not set in my container.
I also tried a similar command to the above, except using the -f flag and specifying the path to a SQL backup file. This does not work because the file is not in my container. I could copy the file into my container with the ADD statement in my Dockerfile, but this does not seem right.
So, I ask the community: what is the preferred method of loading PostgreSQL database backups into Docker containers?
I cannot use the $PGPASSWORD environment variable, because any
environment variable I set on my host is not set in my container.
I don't use docker, but your container looks like a remote host in the command shown, with psql running locally. So PGPASSWORD never has to be set on the remote host, only locally.
If the problem boils down to adding a password to this command:
gunzip -c mydbbackup.sql.gz |
psql -h <docker_host> -p <docker_port> -U <dbuser> -d <db> -W
you may supply it using several methods (in all cases, don't use the -W option to psql):
hardcoded in the invocation:
gunzip -c mydbbackup.sql.gz |
PGPASSWORD=something psql -h <docker_host> -p <docker_port> -U <dbuser> -d <db>
typed on the keyboard
echo -n "Enter password:"
read -s PGPASSWORD
export PGPASSWORD
gunzip -c mydbbackup.sql.gz |
psql -h <docker_host> -p <docker_port> -U <dbuser> -d <db>
Note about the -W or --password option to psql.
The point of this option is to ask for a password to be typed first thing, even if the context makes it unnecessary.
It's frequently misunderstood as the equivalent of the -p option of mysql. This is a mistake: while -p is required on password-protected connections, -W is never required and actually gets in the way when scripting.
-W, --password
Force psql to prompt for a password before connecting to a
database.
This option is never essential, since psql will automatically
prompt for a password if the server demands password
authentication. However, psql will waste a connection attempt
finding out that the server wants a password. In some cases it is
worth typing -W to avoid the extra connection attempt.
All the tutorials point to running postgres in a format like:
docker run -d -p 5432 \
-t <your username>/postgresql \
/bin/su postgres -c '/usr/lib/postgresql/9.2/bin/postgres \
-D /var/lib/postgresql/9.2/main \
-c config_file=/etc/postgresql/9.2/main/postgresql.conf'
Why can't we in our Docker file have:
ENTRYPOINT ["/etc/init.d/postgresql-9.2", "start"]
And simply start the container by
docker run -d psql
Is that not the purpose of Entrypoint or am I missing something?
The difference is that the init script provided in /etc/init.d is not an entry point. Its purpose is quite different: to get the entry point started, in the background, and then report the success or failure to the caller. That script causes a postgres process, usually indirectly via pg_ctl, to be started, detached from the controlling terminal.
For docker to work best, it needs to run the application directly, attached to the docker process. That way it can usefully and generically terminate it when the user asks for it, or quickly discover and respond to the process crashing.
To exemplify what IfLoop said.
Using CMD in your Dockerfile:
USER postgres
CMD ["/usr/lib/postgresql/9.2/bin/postgres", "-D", "/var/lib/postgresql/9.2/main", "-c", "config_file=/etc/postgresql/9.2/main/postgresql.conf"]
To run:
$ docker run -d -p 5432:5432 psql
Watching PostgreSQL logs:
$ docker logs -f POSTGRES_CONTAINER_ID