Build postgres docker container with initial schema - postgresql

I'm looking to build Dockerfiles that represent company databases that already exist. Similarly, I'd like to create a Dockerfile that starts by restoring a psql dump.
I have my psql_dump.sql in the current (.) directory.
FROM postgres
ADD . /init_data
run "createdb" "--template=template0" "my_database"
run "psql" "-d" "my_database" --command="create role my_admin superuser"
run "psql" "my_database" "<" "init_data/psql_dump.sql"
I thought this would be good enough to do it. I'd like to avoid solutions that use a .sh script, like this solution.
I use template0 since the psql documentation says you need the same users created that were in the original database, and you need to create the database with template0 before you restore.
However, it gives me an error:
createdb: could not connect to database template1: could not connect to server: No such file or directory
Is the server running locally and accepting
I'm also using docker-compose for the overall application, so if solving this problem in docker-compose is better, I'd be happy to use the base postgres image and do this in docker-compose instead.

According to the usage guide for the official PostgreSQL Docker image, all you need is:
Dockerfile
FROM postgres
ENV POSTGRES_DB my_database
COPY psql_dump.sql /docker-entrypoint-initdb.d/
The POSTGRES_DB environment variable instructs the container to create a database named my_database on first run.
And any .sql file found in the container's /docker-entrypoint-initdb.d/ directory will be executed.
If you want to execute .sh scripts, you can also provide them in the /docker-entrypoint-initdb.d/ directory.
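Since the question also mentions docker-compose: the same result can be had without a custom image, by mounting the dump into the init directory. A minimal sketch — the service name and password are assumptions:
docker-compose.yml
version: '3'
services:
  db:
    image: postgres
    environment:
      POSTGRES_DB: my_database
      POSTGRES_PASSWORD: example   # assumed; recent images refuse to start without one
    volumes:
      - ./psql_dump.sql:/docker-entrypoint-initdb.d/psql_dump.sql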

As said in the comments, @Thomasleveil's answer is great and simple if your schema recreation is fast.
But in my case it's slow, and I wanted to use docker volumes, so here is what I did
First, build a postgres image with all the schema initialization, as in @Thomasleveil's answer
Dockerfile:
FROM postgres
WORKDIR /docker-entrypoint-initdb.d
ADD psql_dump.sql /docker-entrypoint-initdb.d
EXPOSE 5432
Then run it and, once it has been populated from the "psql_dump.sql" file, copy the postgres data into a new local directory: docker cp mypg:/var/lib/postgresql/data ./postgres-data
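The build and run commands leading up to that docker cp are not spelled out above; a sketch, where the image tag and password are assumptions:
docker build -t mypg-image .
docker run -d --name mypg -e POSTGRES_PASSWORD=example mypg-image
# wait for the init scripts to finish (watch: docker logs -f mypg), then:
docker cp mypg:/var/lib/postgresql/data ./postgres-data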
Copy the data to a temp data folder, and start a new postgres docker-compose container whose volume is at the new temp data folder:
startPostgres.sh:
#!/bin/bash
# Replace the temp data dir with a fresh copy of the initialized data,
# then start the compose project against it.
rm -r ./temp-postgres-data/data
mkdir -p ./temp-postgres-data/data
cp -r ./postgres-data/data ./temp-postgres-data/
docker-compose -p mini-postgres-project up
and the docker-compose.yml file is:
version: '3'
services:
  postgres:
    container_name: mini-postgres
    image: postgres:9.5
    ports:
      - "5432:5432"
    volumes:
      - ./temp-postgres-data/data:/var/lib/postgresql/data
Now you can run steps #1 and #2 on a new machine, or whenever your psql_dump.sql changes. And each time you want a new clean (but already initialized) db, you only need to run startPostgres.sh from step #3.
And it still uses docker volumes.

@Thomasleveil's answer will re-create the database schema at runtime, which is fine for most cases.
If you want to recreate the database schema at build time (i.e. if your schema initialization is really slow), you can invoke the stock docker-entrypoint.sh from within your Dockerfile.
However, since docker-entrypoint.sh is designed to start a long-running database server, you have to add an extra script that exits the process after database initialization but before booting the long-running server.
Dockerfile (with build time database initialization)
# STAGE 1 - Equivalent to @Thomasleveil's answer
FROM postgres AS runtime_init
ENV POSTGRES_DB my_database
COPY 1-psql_dump.sql /docker-entrypoint-initdb.d/
# STAGE 2 - Initialize the database during the build
FROM runtime_init AS buildtime_init_builder
RUN echo "exit 0" > /docker-entrypoint-initdb.d/100-exit_before_boot.sh
ENV PGDATA=/pgdata
RUN docker-entrypoint.sh postgres
# STAGE 3 - Copy the initialized db to a new image to reduce size.
FROM postgres AS buildtime_init
ENV PGDATA=/pgdata
COPY --chown=postgres:postgres --from=buildtime_init_builder /pgdata /pgdata
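A sketch of how the final stage might be built and run (the my_database_image tag is an assumption):
docker build --target buildtime_init -t my_database_image .
docker run -d -p 5432:5432 my_database_image
No POSTGRES_* variables should be needed at run time here, since the data directory is already initialized; note that depending on the postgres image version, STAGE 2 may additionally need POSTGRES_PASSWORD (or POSTGRES_HOST_AUTH_METHOD=trust) set for the build-time initdb to succeed.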
Important Notes
The stock postgres image will run initialization scripts in alphabetical order, so ensure that your database restoration scripts appear earlier than the exit_before_boot.sh script created in the Dockerfile.
This is demonstrated by the 1 and 100 prefixes shown above. Modify them to your liking.
Database updates to a running instance of this image will not be persisted across reboots since the PGDATA path where the database files are stored no longer maps to a volume mounted from the host machine.
Further Reading
Instructions from the authors of the official postgres image about writing your own custom_entrypoint.sh. This is arguably the more "official" way to solve this problem, but I personally find my approach easier to understand and implement.
A demo of this concept for PostgreSQL 9, which uses the --help flag to exit the docker-entrypoint.sh before the long-running server boots. Unfortunately, this no longer works as of December 3, 2019
Two discussions (1) (2) of this same question from the official docker postgres repository.

Related

how to restore postgres database in docker when docker container not start?

I want to create a database in PostgreSQL and restore a backup in a docker container. I am able to create the database and run the docker container, and then run the pg_restore to restore the backup.
My Dockerfile is :
FROM postgres:latest
ENV POSTGRES_USER postgres
ENV POSTGRES_PASSWORD 123qwe
ENV POSTGRES_DB docker_pg
COPY createTable.sql /docker-entrypoint-initdb.d/
VOLUME /var/lib/postgresql/data
Then I run the command to restore the backup:
docker exec -i 0d96d6b59d74 pg_restore -U postgres -d docker_pg < backup_latest.sql
It is working fine.
But my requirement is that the database creation and the backup restore both happen at the time of container creation.
How can I do this?
But my requirement is that the database creation and the backup restore
both happen at the time of container creation.
Both tasks can be performed by the Docker container; all you need to do is place the restore script in the docker-entrypoint-initdb.d folder.
COPY createTable.sql /docker-entrypoint-initdb.d/a_createTable.sql
COPY backup_latest.sql /docker-entrypoint-initdb.d/
Since createTable.sql was renamed to a_createTable.sql, it sorts first: the tables are created before the backup is restored.
These initialization files will be executed in sorted name order as
defined by the current locale
Initialization scripts
Or the other option is to create a single SQL file, ordered so that all DDL comes first and all DML follows, and copy just that file:
COPY db.sql /docker-entrypoint-initdb.d/
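A sketch of what such a combined db.sql could look like (the table and row are purely illustrative):
-- DDL first
CREATE TABLE customers (id serial PRIMARY KEY, name text NOT NULL);
-- then DML
INSERT INTO customers (name) VALUES ('Example Corp');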

A container is a database server. How can I ask its Dockerfile to complete its construction after the container has started?

I am using a postgis/postgis Docker image to set a database server for my application.
The database server must have a tablespace created, then a database.
Then each time another application will start from another container, it will run a Liquibase script that will update the database schema (create tables, index...) when needed.
On a terminal, to prepare the database container, I'm running these commands:
# Run a naked Postgis container
sudo docker run --name ecoemploi-postgis \
  -e POSTGRES_PASSWORD=postgres \
  -d -v /data/comptes-france:/data/comptes-france postgis/postgis

# Send 'bash level' commands to create the directory for the tablespace
sudo docker exec -it ecoemploi-postgis \
  bin/sh -c 'mkdir /tablespace && chown postgres:postgres /tablespace'
Then, to complete step 1, I have to run SQL statements to create the tablespace from a PostGIS point of view, and to create the database with a CREATE DATABASE.
I connect manually to psql inside my container:
sudo docker exec -it ecoemploi-postgis bin/sh \
  -c 'exec psql -h "$POSTGRES_PORT_5432_TCP_ADDR" \
  -p "$POSTGRES_PORT_5432_TCP_PORT" -U postgres'
And I run these commands manually:
CREATE TABLESPACE data LOCATION '/tablespace';
CREATE DATABASE comptesfrance TABLESPACE data;
exit
But I would like to have a container created from a single Dockerfile that does all the needed work. The difficulty is that it has to be done in two parts:
One before the container is started (creating directories, granting them user:group).
One after it is started for the first time: declaring the tablespace and creating the database. If I understand the base image correctly, this should happen after the docker-entrypoint.sh entrypoint has run?
What is the right way to write a Dockerfile that creates a container with all these steps done?
The PostGIS image "is based on the official postgres image", so it should be able to use the /docker-entrypoint-initdb.d mechanism. Any files you put in that directory will be run the first time the database container is started. The postgis Dockerfile already uses this directory to install the PostGIS extensions into the default database.
That means you can put your build-time setup directly into the Dockerfile, and copy the startup-time script into that directory.
FROM postgis/postgis:12-3.0
RUN mkdir /tablespace && chown postgres:postgres /tablespace
COPY createdb.sql /docker-entrypoint-initdb.d/20-createdb.sql
# Use default ENTRYPOINT/CMD from base image
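For reference, the 20-createdb.sql copied above can simply contain the two statements that were previously run by hand:
CREATE TABLESPACE data LOCATION '/tablespace';
CREATE DATABASE comptesfrance TABLESPACE data;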
For the particular setup you describe, this may not be necessary. Each database runs in an isolated filesystem space and starts with an empty data directory, so there's not a specific need to create an alternate data directory; Docker style is to just run multiple databases if you need isolated storage. Similarly, the base postgres image will create a database for you at first start (named by the POSTGRES_DB environment variable).
In order to run a container, your Dockerfile must be functional and complete.
You can put the queries in a bash script, and on the last line add an ENTRYPOINT that runs this bash script.

How to copy and use existing postgres data folder into docker postgres container

I want to build a postgres docker container for testing an issue.
I have:
An archived folder of postgres files (/var/lib/postgres/data/)
A Dockerfile that places the folder into docker's postgres:latest.
I want:
A Docker image that resets its own state when the image is recreated.
A container whose database state is based on the postgres files passed into it.
I don't want to wait for the long operation of backing up and restoring the existing database via a /docker-entrypoint-initdb.d initialization script.
I DON'T WANT TO USE VOLUMES because I don't need to store new data between restarts. (That's why this post is different from How to use a PostgreSQL container with existing data?. In that post volumes are used.)
My suggestion is to copy the postgres files (/var/lib/postgres/data/) from the host machine into the container's /var/lib/postgres/data/ during the build phase.
But the postgres image replaces these files when the initdb phase executes.
How can I ask the postgres image not to override the database files?
e.g.
Dockerfile
FROM postgres:latest
COPY ./postgres-data.tar.gz /opt/pg-data/
WORKDIR /opt/pg-data
RUN tar -xzf postgres-data.tar.gz
RUN mv ./data/ /var/lib/postgresql/data/pg-data/
Run command
docker run -p 5432:5432 -e PGDATA=/var/lib/postgresql/data/pg-data --name database-image1 database-docker
If you don't really need to create a custom image with the database snapshot, you could use volumes. Un-tar the database files somewhere on the host, say ~/pgdata, then run the image. Example:
docker run -v ~/pgdata:/var/lib/postgresql/data/ -p 5432:5432 postgres:9.5
The files must be compatible with the postgres version of the image so use the same image version as the archived database.
If, instead, you must recreate the image, you don't need to uncompress the database archive: the ADD instruction will do that for you. Make sure the tar does not contain any leading directory.
The Dockerfile:
FROM postgres:latest
ADD ./postgres-data.tar.gz /var/lib/postgresql/data/
Build it:
docker build . -t database-docker
Run without overriding the PGDATA environment variable. Note that here the files end up in the default /var/lib/postgresql/data, whereas in the question PGDATA points to /var/lib/postgresql/data/pg-data.
Run the container:
docker run -p 5432:5432 --name database-image1 database-docker
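To verify that the snapshot came through, you could list the databases once the container is up (the container name is taken from the run command above):
docker exec -it database-image1 psql -U postgres -c '\l'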

docker compose: postgresql create db, user pass and grant permission

I have the following docker-compose file:
version: '3'
services:
  web:
    build:
      context: ./django_httpd_mod_wsgi
    ports:
      - "8000:80"
  db:
    build:
      context: ./postgresql
    volumes:
      - db-data:/var/lib/postgres/data
volumes:
  db-data:
I am building the postgresql image using archlinux:
The following is my postgresql Dockerfile:
FROM archlinux/base
RUN yes | pacman -S postgresql
RUN mkdir /run/postgresql/
RUN chown -R postgres:postgres /run/postgresql/
USER postgres
RUN initdb -D /var/lib/postgres/data
RUN psql -c 'CREATE DATABASE btgapp;'
RUN psql -c "CREATE USER simha WITH PASSWORD 'krishna';"
RUN psql -c 'GRANT ALL PRIVILEGES ON DATABASE btgapp TO simha;'
CMD ["/usr/bin/postgres","-D","/var/lib/postgres/data"]
When I try to do:
docker-compose up
I get the error:
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/run/postgresql/.s.PGSQL.5432"?
ERROR: Service 'db' failed to build: The command '/bin/sh -c psql -c 'CREATE DATABASE dbname;'' returned a non-zero code: 2
I understood that I have to run psql -c 'CREATE DATABASE dbname' after starting the postgresql server with /usr/bin/postgres -D /var/lib/postgres/data.
But I cannot start the server and then run further commands in a Dockerfile. So how do I do this?
One option is to start a script, but then it will be difficult to keep postgres running as a single process.
Based on the comments, I will try to answer here.
I believe that you should go with the postgres 11-alpine image. And I will try to explain why here.
Official docker images come with a number of benefits that you should always consider before starting your own.
Upgrade path is easy - when a new revision of the application wrapped in the image is released, the official docker image will in most cases be updated along with it. And usually the changes respect the configuration conventions that the image has established, such as environment variables and startup specifics, so that users can simply change the tag in their stacks and upgrade. There may of course be breaking changes - always check this.
Large user base - when images like postgres have been downloaded more than 10 million times (2019), this does not only mean that it is popular, but inherently works like a guarantee that the image has been tested thoroughly. Any elementary bugs have been weeded out already, and you will have an easy time with the image.
Optimized for size and performance - you can be sure that attention has been paid to a lot of details, minimizing the size of the image and maximizing performance. Many projects publish their applications on a few different linux distros. Like postgres - they publish debian- and alpine-based images. The alpine image is the smaller one, while the debian one is slightly larger, but gives you access to the vast debian package repositories if you need extra packages installed.
Easy configuration - maintainers of the official images usually understand the usecases of their userbase very well, and they try to make our lives as developers and admins easier (god bless them). Official images usually have some pretty good documentation sitting right on their docker hub landing page, or a link to a github repo where the README.md will cover common usecases. I find that these instructions are worth a good read from top to bottom.
I understand that you want to keep the image small, but what do you know - the postgres project has got your usecase covered.
The latest alpine postgres image tagged 11-alpine has a compressed footprint of 28 MB and a decompressed size of 70 MB, while the archlinux/base image that you want to start from has a compressed footprint of 153 MB and a decompressed size of 445 MB. And that's before you introduce postgres itself.
Add to that that the database and user you want created on startup can be handled with environment variables alone in the official postgres image. Like this:
docker run -d --name some-postgres \
-e POSTGRES_PASSWORD=mysecretpassword \
-e POSTGRES_USER=simha \
-e POSTGRES_DB=btgapp \
postgres:11-alpine
If that does not cover the initialization that you need for your database, then you can copy .sql scripts (and .sh scripts) into a special location in the image - and they will be executed on startup. For this you can extend their image like this:
init-user-db.sh
#!/bin/bash
set -e
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
CREATE USER simha;
CREATE DATABASE btgapp;
GRANT ALL PRIVILEGES ON DATABASE btgapp TO simha;
EOSQL
And then with a Dockerfile like this:
Dockerfile
FROM postgres:11-alpine
COPY ./init-user-db.sh /docker-entrypoint-initdb.d/init-user-db.sh
(This is taken from the postgres description on docker hub)
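To plug this back into the docker-compose setup from the question, the db service can build this image instead of the archlinux one. A sketch, assuming the Dockerfile above sits in ./postgresql (note that the official image keeps its data in /var/lib/postgresql/data):
version: '3'
services:
  db:
    build:
      context: ./postgresql
    environment:
      POSTGRES_PASSWORD: mysecretpassword
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data: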
In closing - I would recommend that you do not prioritize the distro an image is based on over usability and maintainability. Docker enables us to run applications in containers without really caring too much about what distro is inside the container. It's all linux anyway. At the end of the day, I expect that you want a stable postgres database container, like I do. This is what I get with the official postgres image.
I hope I helped you evaluate your options on this.

Creating a running Postgres service inside a docker container

I'm a bit new to Docker.
I have two containers running using docker-compose.
One is the API and the other is the actual application.
I want to add a new DB container using the Postgres official image.
It's a bit hard to find a simple tutorial on how to create the container and populate it with a predefined sql file (of schemas and data).
When I start with "CMD /etc/init.d/postgresql start" in the Dockerfile I get an error saying: "No PostgreSQL clusters exist; see "man pg_createcluster" ... (warning)."
Since it's taking me too much time to get things going, I was wondering if it might be better to get an Ubuntu image and install Postgres on my own, since there is only one source on how to use the image (docker hub) and I don't seem to understand it that well.
Any ideas or simple steps on how to compose and 'configure' this image?
If you want to populate your database with some file, a simple way to do this is:
How to extend this image
If you would like to do additional initialization in an image derived
from this one, add one or more *.sql, *.sql.gz, or *.sh scripts under
/docker-entrypoint-initdb.d (creating the directory if necessary).
After the entrypoint calls initdb to create the default postgres user
and database, it will run any *.sql files and source any *.sh scripts
found in that directory to do further initialization before starting
the service.
Dockerfile
FROM postgres:alpine
COPY init.sql /docker-entrypoint-initdb.d/init.sql
docker-compose.yml
version: '3'
services:
  app:
    # your app definition
  postgres:
    build: .
Pull the postgres image
docker pull postgres:14.2
Create the service with the below command
docker service create --name postgres --network my_overlay --env "POSTGRES_PASSWORD=password" --publish 5432:5432 postgres:14.2
Try to connect to the default postgres db, using postgres as the username and password as the password.
jdbc:postgresql://127.0.0.1:5432/postgres // JDBC connection
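If you'd rather test the connection from the command line than via JDBC, a psql equivalent (assuming the psql client is installed on the host):
psql -h 127.0.0.1 -p 5432 -U postgres -d postgres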