I'm trying to extend the official docker postgres image in order to install a custom Python module so I can use it from within a plpython3 stored procedure.
Here's my Dockerfile:
FROM postgres:9.5
RUN apt-get update && apt-get install -y postgresql-plpython3-9.5 python3
ADD ./repsug/ /opt/smtnel/repsug/
WORKDIR /opt/smtnel/repsug/
RUN ["python3", "setup.py", "install"]
WORKDIR /
My question is: do I need to add ENTRYPOINT and CMD instructions to my Dockerfile, or are they "inherited" FROM the base image?
The example in the official readme.md shows a Dockerfile which only changes the locale, without ENTRYPOINT or CMD.
I also read in the readme that I can extend the image by executing custom sh and/or sql scripts. Should I use that feature instead of creating my custom image? In that case, how do I make sure the scripts are run only once at "install time" and not on every start? If the database is already created and populated, I would not want to overwrite it.
If you define a new ENTRYPOINT in your Dockerfile, it will override the inherited ENTRYPOINT. In that case, Postgres will not be initialized automatically (unless you write the same ENTRYPOINT yourself).
https://docs.docker.com/engine/reference/builder/#entrypoint
Also, the official postgres image lets you add .sql/.sh files to the /docker-entrypoint-initdb.d folder; they are executed once, when the database is first initialized.
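A minimal sketch of that approach (the script name is illustrative, not from the question):
FROM postgres:9.5
# runs once, when the data directory is first initialized
COPY ./01-create-schema.sql /docker-entrypoint-initdb.d/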
Finally, if you don't want Postgres to remove your data, you can mount a volume mapping a local folder to /var/lib/postgresql/data in each docker run ... command to persist it.
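For example (the host path and image name are placeholders):
docker run --name my-postgres \
    -e POSTGRES_PASSWORD=postgres \
    -v /srv/pgdata:/var/lib/postgresql/data \
    -d my-extended-postgres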
Is it possible with Docker to combine two images into one?
Like this here:
genericA --
            \
             ---> specificAB
            /
genericB --
For example there's an image for Java and an image for MySQL.
I'd like to have an image with Java and MySQL.
No, you can only inherit from one image.
You probably don't want Java and MySQL in the same image anyway, as it's more idiomatic to run a single component per container, i.e. create a separate MySQL container and connect it to the Java container rather than put both into the same container.
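A sketch of that separate-containers approach, using a user-defined network in place of the legacy --link flag (names, tags, and the my-java-app image are illustrative):
docker network create app-net
docker run -d --name mysql --network app-net \
    -e MYSQL_ROOT_PASSWORD=secret mysql:5.7
docker run -d --name app --network app-net my-java-app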
However, if you really must have them in the same image, write a Dockerfile with Java as the base image (FROM statement) and install MySQL in the Dockerfile. You should be able to largely copy the statements from the official MySQL Dockerfile.
Docker doesn't directly support this, but you can use DockerMake (full disclosure: I wrote it) to manage this sort of "inheritance". It uses a YAML file to set up the individual pieces of the image, then drives the build by generating the appropriate Dockerfiles.
Here's how you would build this slightly more complicated example:
                              --> genericA --
                             /               \
debian:jessie --> customBase                  ---> specificAB
                             \               /
                              --> genericB --
You would use this DockerMake.yml file:
specificAB:
  requires:
    - genericA
    - genericB

genericA:
  requires:
    - customBase
  build_directory: [some local directory]
  build: |
    # Dockerfile commands go here, such as
    ADD installA.sh .
    RUN ./installA.sh

genericB:
  requires:
    - customBase
  build: |
    # Here are some other commands you could run
    RUN apt-get install -y genericB
    ENV PATH=$PATH:something

customBase:
  FROM: debian:jessie
  build: |
    RUN apt-get update && apt-get install -y build-essential
After installing the docker-make CLI tool (pip install dockermake), you can then build the specificAB image just by running
docker-make specificAB
If you use docker commit, it is not easy to see which commands were used to build your container; you have to run docker history <image>.
If you have a Dockerfile, you can just look at it and see how the image was built and what it contains.
docker commit is done "by hand", so it is prone to errors; docker build with a working Dockerfile is much better.
You can put multiple FROM instructions in a single Dockerfile (a multi-stage build), but each stage starts from its own base image: only the last stage produces the final image, and earlier stages can only contribute files to it via COPY --from.
https://docs.docker.com/reference/builder/#from
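A hedged sketch of what a multi-stage build looks like (image tags, paths, and the jar name are illustrative):
FROM maven:3-jdk-8 AS build
WORKDIR /src
COPY . .
# build the application jar in the first stage
RUN mvn -q package

FROM openjdk:8-jre
# only the built jar is carried over; the two environments are not merged
COPY --from=build /src/target/app.jar /app.jar
CMD ["java", "-jar", "/app.jar"]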
I am using a postgis/postgis Docker image to set a database server for my application.
The database server must have a tablespace created, then a database.
Then each time another application will start from another container, it will run a Liquibase script that will update the database schema (create tables, index...) when needed.
On a terminal, to prepare the database container, I'm running these commands:
# Run a naked Postgis container
sudo docker run --name ecoemploi-postgis \
    -e POSTGRES_PASSWORD=postgres \
    -d -v /data/comptes-france:/data/comptes-france postgis/postgis

# Send 'bash level' commands to create the directory for the tablespace
sudo docker exec -it ecoemploi-postgis \
    bin/sh -c 'mkdir /tablespace && chown postgres:postgres /tablespace'
Then to complete step 1, I have to run SQL statements to create the tablespace from the PostgreSQL point of view, and create the database with a CREATE DATABASE.
I connect manually to psql inside my container:
sudo docker exec -it ecoemploi-postgis bin/sh \
    -c 'exec psql -h "$POSTGRES_PORT_5432_TCP_ADDR" \
    -p "$POSTGRES_PORT_5432_TCP_PORT" -U postgres'
And I run these commands manually:
CREATE TABLESPACE data LOCATION '/tablespace';
CREATE DATABASE comptesfrance TABLESPACE data;
exit
But I would like to have a container created from a single Dockerfile that has done all the needed work. The difficulty is that it has to be done in two parts:
One before the container is started (creating directories, granting them user:group).
One after it is started for the first time: declaring the tablespace and creating the database. If I understand the base image correctly, this should be done after the docker-entrypoint.sh entrypoint has run?
What is the right way to write a Dockerfile that creates a container having done all these steps?
The PostGIS image "is based on the official postgres image", so it should be able to use the /docker-entrypoint-initdb.d mechanism. Any files you put in that directory will be run the first time the database container is started. The postgis Dockerfile already uses this directory to install the PostGIS extensions into the default database.
That means you can put your build-time setup directly into the Dockerfile, and copy the startup-time script into that directory.
FROM postgis/postgis:12-3.0
RUN mkdir /tablespace && chown postgres:postgres /tablespace
COPY createdb.sql /docker-entrypoint-initdb.d/20-createdb.sql
# Use default ENTRYPOINT/CMD from base image
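Here, createdb.sql would simply hold the SQL from the question:
CREATE TABLESPACE data LOCATION '/tablespace';
CREATE DATABASE comptesfrance TABLESPACE data;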
For the particular setup you describe, this may not be necessary. Each database runs in an isolated filesystem space and starts with an empty data directory, so there's not a specific need to create an alternate data directory; Docker style is to just run multiple databases if you need isolated storage. Similarly, the base postgres image will create a database for you at first start (named by the POSTGRES_DB environment variable).
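For instance, a sketch using only the stock image's environment variables (tag and values illustrative):
docker run --name ecoemploi-postgis \
    -e POSTGRES_PASSWORD=postgres \
    -e POSTGRES_DB=comptesfrance \
    -d postgis/postgis:12-3.0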
In order to run a container, your Dockerfile must be functional and complete.
You could put the queries in a bash script and, on the last line of the Dockerfile, set an ENTRYPOINT that runs that script.
I want to build postgres docker container for testing some issue.
I have:
An archived folder of postgres files (/var/lib/postgresql/data/)
A Dockerfile that places the folder into the docker postgres:latest image.
I want:
A Docker image that resets its state when a container is recreated.
A container whose database state is based on the postgres files passed into it.
I don't want to wait for the long-running operation of backing up and restoring the existing database in a /docker-entrypoint-initdb.d initialization script.
I DON'T WANT TO USE VOLUMES because I don't need to store new data between restarts. (That's why this post is different from How to use a PostgreSQL container with existing data? In that post, volumes are used.)
My suggestion is to copy the postgres files (/var/lib/postgresql/data/) from the host machine into the container's /var/lib/postgresql/data/ during the build phase.
But the postgres image replaces these files when the initdb phase executes.
How can I ask the postgres image not to override the database files?
e.g.
Dockerfile
FROM postgres:latest
COPY ./postgres-data.tar.gz /opt/pg-data/
WORKDIR /opt/pg-data
RUN tar -xzf postgres-data.tar.gz
RUN mv ./data/ /var/lib/postgresql/data/pg-data/
Run command
docker run -p 5432:5432 -e PGDATA=/var/lib/postgresql/data/pg-data --name database-immage1 database-docker
If you don't really need to create a custom image with the database snapshot, you could use volumes. Un-tar the database files somewhere on the host, say ~/pgdata, then run the image. Example:
docker run -v ~/pgdata:/var/lib/postgresql/data/ -p 5432:5432 postgres:9.5
The files must be compatible with the postgres version of the image so use the same image version as the archived database.
If, instead, you must recreate the image, you don't need to uncompress the database archive: the ADD instruction will do that for you. Make sure the tar does not contain any leading directory.
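For example, one way to create such an archive without a leading directory (paths illustrative):
tar -czf postgres-data.tar.gz -C /var/lib/postgresql/data .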
The Dockerfile:
FROM postgres:latest
ADD ./postgres-data.tar.gz /var/lib/postgresql/data/
Build it:
docker build . -t database-docker
Run it without overriding the PGDATA environment variable. Note that in your Dockerfile you copy the files to /var/lib/postgresql/data/pg-data and point PGDATA there, while the ADD above places them directly in /var/lib/postgresql/data.
Run the container:
docker run -p 5432:5432 --name database-image1 database-docker
I am trying to write a Dockerfile that runs PostgreSQL and then adds some tables to it.
This is my Dockerfile:
FROM postgres:9.6
MAINTAINER agomez#ikerlan.es
USER postgres
ADD ./backup.sql /backup.sql
ADD ./deploy/import_database.sh /import_database.sh
USER root
RUN chmod +x /import_database.sh
USER postgres
ENTRYPOINT ./docker-entrypoint.sh postgres
CMD ./import_database.sh
The entrypoint doesn't finish because it runs the postgres server.
How can I run the CMD, for example, 20 seconds after the ENTRYPOINT has started, but without the ENTRYPOINT finishing?
Is it possible?
This is not how the CMD and ENTRYPOINT work. I'd recommend reading https://docs.docker.com/engine/reference/builder/#understand-how-cmd-and-entrypoint-interact and checking the chart for a better understanding of how the ENTRYPOINT and CMD interact with each other.
That said, if all you want to do is import a SQL file or run a script at runtime, the official PostgreSQL image already has you covered. See https://github.com/docker-library/docs/tree/master/postgres#how-to-extend-this-image for information on how to do this.
An example in the case of your Dockerfile (if you just wanted to import the SQL file) would be to do something like:
FROM postgres:9.6
ADD ./backup.sql /docker-entrypoint-initdb.d/backup.sql
When starting a container from the image build on this Dockerfile, the default ENTRYPOINT script will start a temporary PostgreSQL instance, wait for it to be ready, import your data, and then restart to serve connections.
In the context of https://github.com/docker-library/postgres (GitHub repo) and https://registry.hub.docker.com/_/postgres/ (Docker Hub):
It can be seen that the database is started by ENTRYPOINT and CMD through the bash script /docker-entrypoint.sh, with
ENTRYPOINT ["/docker-entrypoint.sh"]
EXPOSE 5432
CMD ["postgres"]
Another hook provided to customize the database is the /docker-entrypoint-initdb.d directory,
which means the database starts (and can be reached with psql) only at runtime, when the docker run command is issued.
This causes a problem: we cannot customize the database before it runs, i.e. at build time, for example to add extensions or populate the db with data.
Of course, this could be done at run time, but that has the disadvantage of repeating the operation every time a container is started from the image.
So, what is the logic behind this design, from the Docker or Postgres perspective? And how could I add extensions and populate data at build time?
If you were to customize (create, populate data) a database at build time, that would imply that the database data is written into the docker image filesystem itself (as one cannot mount a volume at build time).
The issue with that is that the docker image filesystem is a special one (AUFS, btrfs, etc.) which doesn't deliver good I/O performance for data-intensive applications such as a database server.
As a consequence, you want your data written to a volume instead of to the docker container filesystem. Since you don't know at build time which volume will be used at run time, and since there is no way to mount volumes at build time anyway, no one should create a database at build time.
Furthermore, if you take a close look at the Dockerfile of the official PostgreSQL image, you will see that there is a VOLUME instruction that makes the path at which the data is written a volume. That means that the image is designed so that the data will never hit the docker container filesystem.
If you take a look at Dockerfiles for other databases or data-intensive applications, you will notice that they all operate in this manner. Another reason is that it is accepted as good practice to make your docker containers immutable.
If you want to install additional modules in your image, that is fine as long as they do not depend on data that would be written to a volume, and as long as you make sure to declare a volume for any path they would write data to.
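A sketch of that pattern (the extra package is an illustrative example, not from the question):
FROM postgres:9.6
# code and binaries belong in the image filesystem
RUN apt-get update && apt-get install -y postgresql-9.6-postgis-2.3
# any data path must be a volume (the base image already declares this one)
VOLUME /var/lib/postgresql/data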
tl;dr
Application code/binary → docker image filesystem
Application data → docker volume
This is right from the docker page for the postgres image (library/postgres):
If you would like to do additional initialization in an image derived from this one, add a *.sql or *.sh script under /docker-entrypoint-initdb.d (creating the directory if necessary). After the entrypoint calls initdb to create the default postgres user and database, it will run any *.sql files and source any *.sh script found in that directory to do further initialization before starting the service.
You can also extend the image with a simple Dockerfile to set the locale. The following example will set the default locale to de_DE.utf8:
FROM postgres:9.4
RUN localedef -i de_DE -c -f UTF-8 -A /usr/share/locale/locale.alias de_DE.UTF-8
ENV LANG de_DE.utf8
Since database initialization only happens on container startup, this allows us to set the language before it is created.
You have the ability to extend an image just as the example from the docs I pasted above shows. You can also use the exec command to execute virtually anything within the container right from your host machine. It took me a little while to get used to it; I continue to discover things as I play with it more and more.
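For example, to open a psql shell inside a running container named some-postgres:
docker exec -it some-postgres psql -U postgres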
UPDATE:
sudo docker run --name some-postgres \
    -v ~/PATH/TO/some-postgres/data:/var/lib/postgresql/data \
    -p 127.0.0.1:5432:5432 -e POSTGRES_PASSWORD=test -d postgres