How to safely stop/start my postgres server when using docker-compose - postgresql

When I am releasing new features in my application, I sometimes stop and start Docker very often:
docker-compose up -d
docker-compose stop
I am using pretty much the bare bones postgres docker setup (see below).
I am mapping the /data folder to my host.
Is there anything I should be worried about if I stop/start docker many times in a day in terms of data getting corrupted?
Is calling docker-compose stop the best way to be stopping my postgres instance?
My postgres service in my docker-compose looks like this:
db:
  image: postgres:9.4
  volumes:
    - "/home/deploy/data/pgdata:/var/lib/postgresql/data"
  restart: always
This setup currently is running smoothly in development, but once it goes to production I want to make sure I am following best practices etc.

Use
docker-compose down -v
This stops and removes the containers and networks that docker-compose up created. With -v it also removes the named volumes declared in the volumes section of the compose file and any anonymous volumes attached to the containers. If you don't remove those volumes, they hang around and eat up disk space. It only removes volumes that Docker manages. A host directory that you bind-mount into the container (such as /home/deploy/data/pgdata in your setup) is not touched, so that data survives container removal. Be careful, though: if the database data lived in a named or anonymous volume instead of a host directory, -v would delete it.
Whenever you create a container with docker run, Docker creates a directory to keep the details about that container. After you execute docker run, if you look into /var/lib/docker/containers, you will see one directory for each container that exists, running or stopped. The names of these directories are long hexadecimal container IDs. If you only stop containers and never remove them, these directories stay there forever; they are deleted when the container is removed, which docker-compose down does. Docker-managed volumes live separately under /var/lib/docker/volumes, and those are what the -v option mentioned above deletes.
Keep in mind that you can view the contents of /var/lib/docker only as root. To become root, use sudo -i before you attempt to view the contents of the directory.

Databases in particular are usually designed so that it's very hard to lose data, even if the machine loses power in the middle of writing something to disk. (This comes at some performance cost.) So long as you don't have more than one PostgreSQL instance at a time using the same backing data store, I'd expect it to not lose data or otherwise corrupt itself; the worst you should expect to see is a message at startup that it's recovering from a write-ahead log or something along those lines.
docker stop will send a signal to a container that prompts it to shut down cleanly, and PostgreSQL will take this as a cue to shut down. It looks like docker-compose stop, docker-compose down, and sending ^C to docker-compose up all use the same mechanism. So the way you're doing it now should result in a clean shutdown (provided PostgreSQL finishes its cleanup within 10 seconds).
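The 10-second grace period mentioned above is only the default, and docker-compose stop accepts -t/--timeout to lengthen it. A minimal sketch, assuming the service is called db as in the question; the helper name stop_db and the COMPOSE variable are made up for illustration and are not part of Docker or Compose:

```shell
# COMPOSE is a hypothetical override so the helper works with either
# "docker-compose" or "docker compose"; it is not a Docker convention.
COMPOSE="${COMPOSE:-docker-compose}"

# Stop only the db service, giving it up to N seconds (default 60) to shut
# down cleanly before Docker kills the container.
stop_db() {
    $COMPOSE stop -t "${1:-60}" db
}
```

Calling stop_db uses the 60-second default and stop_db 120 allows two minutes. Newer Compose file formats also offer a per-service stop_grace_period key, which may be a better fit if the longer timeout should always apply.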
I believe you can docker-compose restart specific services, or docker-compose up --force-recreate them. This would help if you rebuilt your application container and needed to restart that, but not its database.

Related

can we remove docker-compose down from steps?

Is it necessary to use "docker-compose down" before "docker-compose up"? I don't want my application to go down. I am using docker-compose at this point in time and have no plans to move to Kubernetes etc.
If we remove "docker-compose down", how will orphaned volumes and images be handled?
Any pointer is highly appreciated.
Thanks,
Sanjiv
No, you don't necessarily have to use docker-compose down before a docker-compose up. If you use docker-compose up on a running service stack, docker-compose will just recreate the services that have been changed. Changed either through:
a changed docker-compose.yml, or
updated images (either because you pulled new images, or rebuilt them yourself).
The --remove-orphans flag (see docker-compose up) removes orphaned containers, that is, containers for services that are no longer defined in the compose file; it does not remove volumes. docker-compose down accepts the same flag, so the behavior is the same there. Volumes are only removed by docker-compose down -v.
Also, neither command removes or changes images by default. The difference is that with docker-compose down followed by docker-compose up, all containers are removed and recreated from their images. So if data was written inside a container rather than to a volume, that data will be lost.
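The "no down needed" workflow described above can be sketched as a small helper. This is only an illustration: the function name redeploy and the COMPOSE variable are assumptions, while pull, up -d and --remove-orphans are existing Compose options:

```shell
# COMPOSE is a hypothetical override ("docker-compose" or "docker compose").
COMPOSE="${COMPOSE:-docker-compose}"

# Pull newer images, then let "up" recreate only the services whose image or
# configuration changed. Containers of services that were dropped from the
# compose file are removed by --remove-orphans.
redeploy() {
    $COMPOSE pull && $COMPOSE up -d --remove-orphans
}
```

Services whose image and configuration are unchanged, for example a database, should be left running by this.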

Docker postgres persistence and container lifetime

I'm new to docker. You can take a look at my last questions here and see that I've been asking questions along these lines. I read the docs carefully, and also read several articles on the web (which is pretty difficult given the rapid versioning in docker), but I still can't get a clear picture of how I am supposed to use containers and what the impact is on persistence.
The official postgres image creates a volume in its Dockerfile using this command
VOLUME /var/lib/postgresql/data
And the readme.md file shows only one example of how to run the image
docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
When I try that, I can see (with "docker inspect some-postgres") that the volume created lives in a random directory in my host, and it seems to "belong" to that particular container.
So here are some questions that may help my understanding:
It looks (from the official postgres image docs) as if the expected usage is to use "docker run" to create the container and "docker start" afterwards (this last bit I inferred from the fact that -d and --name are used). This makes sense to me, but conflicts with a lot of information I've seen saying that containers should be ephemeral. If I spin up a new container every time, then the default VOLUME config in the Dockerfile doesn't work for persistence. What's the right way of doing things?
Given the above is correct (that I can run once and start many times), the only reason I see for the VOLUME command in the Dockerfile is I/O performance because of the CoW filesystem bypass. Is this right?
Could you please clearly explain what's wrong with using this approach over the (I think unofficially) recommended way of using a data container? I'd like to know the pros/cons to my specific situation, which is a node js intranet application.
Thanks,
Awer
You're correct that you can start the container using 'docker run' and start it again in the future using 'docker start' assuming you haven't removed the container. You're also correct that docker containers are supposed to be ephemeral and you shouldn't be in bad shape if the container disappears. What you can do is mount a volume into the docker container to the storage location of the database.
docker run -v /postgres/storage:/var/lib/postgresql/data --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
If you know the location of where the database writes to inside the container you can mount it correctly and then even if you remove the postgres container, when you start back up all your data will persist. You may need to mount some other areas that control configurations as well unless you modify and save the container.
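To illustrate that the data outlives the container, here is a rough sketch that throws the container away and starts a fresh one on the same host directory. It assumes the data path /var/lib/postgresql/data declared by the image's VOLUME instruction (quoted in the question); the helper name recreate_postgres, the DOCKER variable and the default host path are illustrative only:

```shell
# DOCKER is a hypothetical override for the docker binary.
DOCKER="${DOCKER:-docker}"

# Remove the existing container (if any) and start a new one that mounts the
# same host directory as its data directory.
recreate_postgres() {
    host_dir="${1:-/postgres/storage}"
    # Ignore the error if the container does not exist yet.
    $DOCKER rm -f some-postgres 2>/dev/null || true
    $DOCKER run --name some-postgres \
        -v "$host_dir:/var/lib/postgresql/data" \
        -e POSTGRES_PASSWORD=mysecretpassword -d postgres
}
```

Because the database files live in the host directory, the new container should come up with the same databases the old one had.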

How am I supposed to use a Postgresql docker image/container?

I'm new to docker. I'm still trying to wrap my head around all this.
I'm building a node application (REST api), using Postgresql to store my data.
I've spent a few days learning about docker, but I'm not sure whether I'm doing things the way I'm supposed to.
So here are my questions:
I'm using the official docker postgres 9.5 image as a base to build my own (my Dockerfile only adds plpython on top of it, and installs a custom python module for use within plpython stored procedures). I created my container as suggested by the postgres image docs:
docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
After I stop the container I cannot run it again using the above command, because the container already exists. So I start it using docker start instead of docker run. Is this the normal way to do things? I will generally use docker run the first time and docker start every other time?
Persistence: I created a database and populated it on the running container. I did this using pgadmin3 to connect. I can stop and start the container and the data is persisted, although I'm not sure why or how this is happening. I can see in the Dockerfile of the official postgres image that a volume is created (VOLUME /var/lib/postgresql/data), but I'm not sure that's the reason persistence is working. Could you please briefly explain (or point to an explanation) about how this all works?
Architecture: from what I read, it seems that the most appropriate architecture for this kind of app would be to run 3 separate containers. One for the database, one for persisting the database data, and one for the node app. Is this a good way to do it? How does using a data container improve things? AFAIK my current setup is working ok without one.
Is there anything else I should pay attention to?
Thanks
EDIT: adding to my confusion, I just ran a new container from the official debian image (no Dockerfile, just docker run -i -t -d --name debtest debian /bin/bash). With the container running in the background, I attached to it using docker attach debtest and then proceeded to apt-get install postgresql. Once it was installed I ran psql (still from within the container), created a table in the default postgres database, and populated it with 1 record. Then I exited the shell and the container stopped automatically, since the shell wasn't running anymore. I started the container again using docker start debtest, then attached to it and finally ran psql again. I found that everything had persisted since the first run. Postgresql is installed, my table is there, and of course the record I inserted is there too. I'm really confused as to why I need a VOLUME to persist data, since this quick test didn't use one and everything appears to work just fine. Am I missing something here?
Thanks again
1. docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
After I stop the container I cannot run it again using the above command, because the container already exists.
Correct. You named it (--name some-postgres) hence before starting a new one, the old one has to be deleted, e.g. docker rm -f some-postgres
So I start it using docker start instead of docker run. Is this the normal way to do things? I will generally use docker run the first time and docker start every other time?
No, that is by no means the normal way for Docker. Docker containers are normally supposed to be ephemeral, that is, easily thrown away and started anew.
Persistence: ... I can stop and start the container and the data is persisted, although I'm not sure why or how this is happening. ...
That's because you are reusing the same container. Remove the container and the data is gone.
Architecture: from what I read, it seems that the most appropriate architecture for this kind of app would be to run 3 separate containers. One for the database, one for persisting the database data, and one for the node app. Is this a good way to do it? How does using a data container improve things? AFAIK my current setup is working ok without one.
Yes, this is the good way to go by having separate containers for separate concerns. This comes in handy in many cases, say when for example you need to upgrade the postgres base image without losing your data (that's in particular where the data container starts to play its role).
Is there anything else I should pay attention to?
Once you are acquainted with the Docker basics, you may want to take a look at Docker Compose or similar tools, which make it easier to run multi-container applications.
Short and simple:
What you get from the official postgres image is a ready-to-go postgres installation along with some gimmicks which can be configured through environment variables. With docker run you create a container. The container lifecycle commands are docker start/stop/restart/rm. Yes, this is the Docker way of things.
Everything inside a volume is persisted. Every container can have an arbitrary number of volumes. Volumes are directories either defined inside the Dockerfile, the parent Dockerfile or via the command docker run ... -v /yourdirectoryA -v /yourdirectoryB .... Everything outside volumes is lost with docker rm. With docker rm -v the anonymous volumes attached to the container are removed as well; named volumes and mounted host directories are left alone.
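The difference between a named volume and the anonymous one created by the image's VOLUME instruction can be sketched roughly as follows. The volume name pgdata, the helper names and the DOCKER variable are all illustrative assumptions, not something the image prescribes:

```shell
# DOCKER is a hypothetical override for the docker binary.
DOCKER="${DOCKER:-docker}"

# Start postgres with its data directory on a named volume.
run_with_named_volume() {
    $DOCKER volume create pgdata
    $DOCKER run --name some-postgres \
        -v pgdata:/var/lib/postgresql/data \
        -e POSTGRES_PASSWORD=mysecretpassword -d postgres
}

# Remove the container together with its anonymous volumes. The named volume
# "pgdata" is not anonymous, so it is kept.
remove_container_keep_data() {
    $DOCKER rm -f -v some-postgres
}
```

After remove_container_keep_data, running run_with_named_volume again should start a new container on the existing data, and docker volume ls should still list pgdata in between.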
It's easier to show than to explain. See this readme with Docker commands on Github, read how I use the official PostgreSQL image for Jira and also add NGINX to the mix: Jira with Docker PostgreSQL. Also a data container is a cheap trick to being able to remove, rebuild and renew the container without having to move the persisted data.
Congratulations, you have managed to grasp the basics! Keep it up! Try docker-compose to better manage those nasty docker run ... commands and to be able to manage multiple containers and data containers.
Note: You need a blocking (foreground) process in order to keep a container running! Either this command must be explicitly set inside the Dockerfile, see CMD, or given at the end of the docker run -d ... /usr/bin/myexamplecommand command. If your command is NON blocking, e.g. /bin/bash, then the container will always stop immediately after executing the command.

Why doesn't postgres official docker repo start db service at build time?

Background: https://github.com/docker-library/postgres (GitHub repo) and https://registry.hub.docker.com/_/postgres/ (Docker Hub).
It can be seen that the database is started by the ENTRYPOINT and CMD via the bash script
/docker-entrypoint.sh
with
ENTRYPOINT ["/docker-entrypoint.sh"]
EXPOSE 5432
CMD ["postgres"]
Another hook provided for changing the database is the directory
/docker-entrypoint-initdb.d
which means the database starts (and can be reached with psql) only at run time, when the docker run command is issued.
This causes a problem: we cannot customize the database at build time, before it runs, for example to add extensions and populate it with data.
Of course, it could be done at run time. But that has the disadvantage of repeating the operation every time the image is run.
So, what is the logic behind this design from the docker or postgres perspective? How could I add extensions and populate data at build time?
If you were to customize (create, populate data) a database at build time, that would imply that the database data is written into the docker image filesystem itself (as one cannot mount a volume at build time).
The issue with that is that the docker image filesystem is a special one (AUFS or btrfs, etc) which doesn't deliver good I/O performance for data-intensive applications such as a database server.
As a consequence, you want to have your data written to a volume instead of to the docker container filesystem. As you don't know at build time which volume will be used at run time, and as there is no way to mount volumes at build time anyway, you should not create the database at build time.
Furthermore, if you take a close look at the Dockerfile of the official PostgreSQL image, you will see that there is a VOLUME instruction that makes the path at which the data is written a volume. That means that the image is designed so that the data will never hit the docker container filesystem.
If you take a look at other Dockerfiles for other databases or data-intensive applications, you will notice that they all operate in this manner. Another reason for that is that it is accepted as good practice to make your docker containers immutable.
If you want to install additional modules to your image, it is fine as long as those do not depend on data that would be written on a volume, and as long as you make sure to declare a volume for any path they would write data on.
tl;dr
Application code/binary → docker image filesystem
Application data → docker volume
This is right from the docker page for the postgres image (library/postgres):
If you would like to do additional initialization in an image derived from this one, add a *.sql or *.sh script under /docker-entrypoint-initdb.d (creating the directory if necessary). After the entrypoint calls initdb to create the default postgres user and database, it will run any *.sql files and source any *.sh script found in that directory to do further initialization before starting the service.
You can also extend the image with a simple Dockerfile to set the locale. The following example will set the default locale to de_DE.utf8:
FROM postgres:9.4
RUN localedef -i de_DE -c -f UTF-8 -A /usr/share/locale/locale.alias de_DE.UTF-8
ENV LANG de_DE.utf8
Since database initialization only happens on container startup, this allows us to set the language before it is created.
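As a rough sketch of the /docker-entrypoint-initdb.d hook described in the quoted docs, the helper below writes a small init script into a directory that a derived image could copy to /docker-entrypoint-initdb.d. The helper name write_init_script, the file name 10-extensions.sh and the choice of the hstore extension are assumptions made for illustration:

```shell
# Hypothetical helper: write an init script into the directory given as $1.
write_init_script() {
    target_dir="$1"
    mkdir -p "$target_dir"
    # The quoted delimiter keeps the script text literal (no expansion here).
    cat > "$target_dir/10-extensions.sh" <<'SCRIPT'
#!/bin/sh
set -e
# Run by the image's entrypoint while it initializes an empty data directory.
psql -v ON_ERROR_STOP=1 --username "${POSTGRES_USER:-postgres}" <<'SQL'
CREATE EXTENSION IF NOT EXISTS hstore;
SQL
SCRIPT
}
```

For example, write_init_script ./initdb followed by a COPY of that directory to /docker-entrypoint-initdb.d/ in a Dockerfile based on the postgres image. The extension is then created when a container first initializes its data directory, not at build time, which fits the design explained above.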
You can extend an image just as the example from the docs that I pasted above shows. You can also use the exec command to execute virtually anything within the container right from your host machine. It took me a little while to get used to it, and I continue to discover things as I play with it more and more.
UPDATE:
sudo docker run --name some-postgres -v ~/PATH/TO/some-postgres/data:/var/lib/postgresql/data -p 127.0.0.1:5432:5432 -e POSTGRES_PASSWORD=test -d postgres

Reliability of Docker containers

My question aims at verifying, and maybe rectifying, my idea of the reliability of Docker containers. I read both the Docker documentation and several articles on VOLUME in the Dockerfile and -v as an argument when running a container as means to persist data outside a Docker container, be it in a data container or on the host system. As I would like to keep my setup simple, I would prefer not to copy/save/store data round and about but keep it in the Docker container itself.
There are several cases through which I discovered the behaviour of Docker containers. I'd like to know if I missed a scenario where a container can be 100% lost unintentionally, i.e. NOT by doing $ docker rm -f mycontainer
docker commands to pause, stop and kill a container
-> restartable by $ docker restart mycontainer or $ docker run mycontainer
Host system reboot
-> docker container exits with 0 or 255
Host system unexpected power off
-> What happens?
Application exception
-> docker container exits with -1
Updating or restarting docker (as pointed out by Greg)
-> expected behavior: like on system reboot (?)
In all those cases, the docker container is still existent in the end. So is there any other scenario that can cause a docker container to be lost like with $ docker rm -f mycontainer?
The background is that I read a lot about mounted volumes and external data storage on the host system for Postgres, but I'd like to avoid storing data outside my containers on the host system if possible. On the other hand, I don't want to wake up and have all data lost. (I do perform regular SQL dumps, but I don't want to do this every 5 minutes.) If a docker container itself is not reliable for persistent data, I don't see why I should create a second container to hold the data for a first one and increase the complexity of my system by adding a new container without gaining anything in terms of reliability.
Edit: There are two points in the Docker userguide on Volumes which do not explicitly explain which behaviour to expect and therefore making me question if these concepts provide extra reliability:
Changes to a data volume will not be included when you update an image
-> Does that mean that they get lost or that the content of the volume won't be changed?
Volumes persist until no containers use them
-> What's the definition of 'use'? As long as a container is not stopped, killed, removed? Does that mean that the volume Docker created on the host system will get removed? Or does volume only refer to a virtual bridge between a directory inside Docker and one on the host system?
If you store all your data in the container, what are you going to do when you need to update the image? Updates to images are normally done by changing the Dockerfile and rebuilding the image. If my data is kept separate to my container, I can start a new version of the image, mount the data with --volumes-from or -v and kill the old container. In your case, you have to keep the container running and try to patch in place with something like puppet.
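The update path described above (start a new version of the image on the old container's data, then get rid of the old container) might look roughly like this. The helper name upgrade_container, its arguments and the DOCKER variable are illustrative; the old container is stopped first so that two PostgreSQL instances never use the same data directory at once:

```shell
# DOCKER is a hypothetical override for the docker binary.
DOCKER="${DOCKER:-docker}"

# Replace container $1 with a new container $2 started from image $3,
# reusing the volumes of $1.
upgrade_container() {
    old_name="$1"
    new_name="$2"
    image="$3"
    $DOCKER pull "$image" || return 1
    # Stop the old instance before anything else touches its data.
    $DOCKER stop "$old_name" || return 1
    # --volumes-from also works when the source container is stopped.
    $DOCKER run -d --name "$new_name" --volumes-from "$old_name" "$image" || return 1
    # No -v here: the volumes are now used by the new container.
    $DOCKER rm "$old_name"
}
```

A call might look like upgrade_container pg-old pg-new postgres:9.4, where both container names are placeholders. This sketch only covers image updates that keep the on-disk data format compatible; a PostgreSQL major version change needs a dump and restore or a similar migration instead.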
Also, I'm not sure what you think you're saving. If you run the official postgres image, it will have declared volumes in the Dockerfile. Those volumes exist as normal directories on your host system whether you ran the container with -v or not. Even if your Dockerfile has no volumes, clearly the UFS is being stored on your host anyway.
In general, you should consider containers to be temporary and stateless. Whilst you don't have to do this, you will find most of the tooling and support services are designed around this idiom.
Regarding your scenarios, there are a few you're missing:
A bug could make it impossible to restart a stopped container
The updating issue mentioned above
Changing the storage driver. This will cause a great deal of problems, as you need to migrate your images.
Just for clarity on the commands, docker start will restart stopped or exited containers and docker unpause will unpause paused containers.