Slow query time with Postgres 10 inside Docker vs bare-metal for AWS Linux 2

I've been trying to deploy Postgres within Docker for portability reasons, and noticed that query performance, as measured by "explain analyze", is painfully slow compared to bare metal.
For a table with 1.7 million rows, a query takes about 1.2 sec on bare-metal Postgres vs 4.8 sec on Dockerized Postgres, four times slower! Both setups use the same mounted volume (for Docker, I'm using the -v option). The volume is a 60GB gp2 volume, attached through the AWS console.
A couple of things I tried:
Increasing the shared memory buffer option in postgresql.conf, which had a negligible effect
Trying several volume mapping options (delegated, cached, consistent)
Upgrading Docker from 17.06-ce to 17.12-ce
This is all done on an AWS Linux 2 instance. At this point I'm hoping to get more suggestions on how to improve performance.
The docker run command I use:
docker run -p 5432:5432 --name postgres -v /vol/pgsql/10.0/data:/var/lib/postgresql/data postgres:latest

Related

What if two Postgres containers map the same host volume?

I'm currently learning Docker, experimenting with Postgres containers, and asking myself the following questions:
I launch a first Postgres container like this:
docker run -e POSTGRES_PASSWORD=secret -p 5464:5432 -v postgres-data:/var/lib/postgresql/data -d postgres
Then I launch a second container with the following command, which consequently uses EXACTLY THE SAME VOLUME:
docker run -p 5465:5432 -v postgres-data:/var/lib/postgresql/data -d postgres
Is this a problem?
And my most important question is:
should I think of this as two Postgres servers sharing the same configuration files,
or as two Postgres containers sharing the same Postgres server?
It's not really clear to me.
Thanks in advance.
Yes, that's a problem. I think PostgreSQL is clever enough that one of the servers just won't start up. In the worst case, this is a recipe for data corruption. This isn't specific to Docker; in general, you can't run two database servers against the same physical storage.
A typical container-oriented setup is to have two separate databases with two separate volumes, one for each service that requires a database.
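The "one of them won't start" behavior relies on an interlock in the data directory: a running PostgreSQL server keeps a lock file, postmaster.pid, there. The toy Python sketch below models only the basic idea with an atomically, exclusively created file; the helper names are hypothetical, and PostgreSQL's real check is more elaborate (among other things it checks whether the PID recorded in the file is still alive), so this interlock should not be relied on to make a shared data directory safe.

```python
import os


def acquire_data_dir_lock(data_dir):
    """Toy interlock (hypothetical helper, not PostgreSQL's code).

    Atomically create a lock file in data_dir. Returns True if this caller
    created it (and may "start"), False if a lock file already exists
    (another "server" already owns the directory).
    """
    lock_path = os.path.join(data_dir, "postmaster.pid")
    try:
        # O_CREAT | O_EXCL makes creation fail atomically if the file exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")  # record the owner's PID
    return True


def release_data_dir_lock(data_dir):
    """Remove the toy lock file, as a clean shutdown would."""
    os.remove(os.path.join(data_dir, "postmaster.pid"))
```

Calling acquire_data_dir_lock twice on the same directory returns True and then False, which corresponds to the second server refusing to start. A crash would leave a stale file behind, which is why a real server has to inspect the recorded PID instead of trusting the file's mere existence.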

Unable to resize /dev/sda1 of GCP postgres

I created a Postgres VM in GCP with a 10GB disk, using these instructions: https://joncloudgeek.com/blog/deploy-postgres-container-to-compute-engine/#create-a-compute-instance-running-a-postgres-container. Everything has worked fine for the last couple of months, but I seem to have run out of space on /dev/sda1. So I increased the disk size to 400GB, but I can't resize /dev/sda1 with the standard command "sudo growpart /dev/sda 1"; I keep getting "command not found".
Solution for me:
Create a machine image of the VM.
Spin up a new VM based on the machine image created.
Delete the old VM.
This created a new Postgres VM with a 400GB disk.

Limit Disk usage in Docker+MongoDB

I am using the official mongo Docker image to start a MongoDB container on a machine whose boot disk is limited (e.g. 10G). I configured Docker to use the Google Cloud Logging driver, hoping that Google would store all the logs and save my local disk space. However, I notice that disk usage keeps growing:
$ df -h
/dev/sda1 9.9G 4.5G 4.9G 49%
Digging deeper, I realized that the size of the Docker containers seems to grow over time.
$ sudo du -sh /var/lib/docker/
3.6G /var/lib/docker/
However, I can't go any further, because somehow I can't access the directories inside it.
If I go inside the container and run du -sh on the root, I don't find any suspicious directories taking up space.
So my question is: how do I find out where the disk space is being used, and how do I reclaim it?
My docker startup command (shown without project options):
docker run -d --log-driver=gcplogs mongo mongod
EDIT: I noticed the growth has stopped at 4.5GB (up from ~3GB) for a while now, so I suppose it has reached some equilibrium.
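To see which part of /var/lib/docker is growing, one approach is to total each top-level subdirectory, much like running du -s on every entry as root. A minimal Python sketch follows; dir_sizes is a hypothetical helper, it has to run with enough privileges to read the directory (e.g. under sudo), and it sums apparent file sizes, which can differ from du's allocated-block figures for sparse files. On Docker 1.13 and later, docker system df also reports usage broken down by images, containers and volumes.

```python
import os


def dir_sizes(root):
    """Return (path, total_bytes) for each immediate subdirectory of root,
    largest first. Files directly in root are ignored; symlinks are not
    followed; unreadable entries are skipped rather than raising."""
    results = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        # onerror swallows errors for directories we cannot list.
        for dirpath, _dirnames, filenames in os.walk(entry.path, onerror=lambda e: None):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable
        results.append((entry.path, total))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

For example, calling dir_sizes("/var/lib/docker") as root and printing the first few entries shows which subdirectory (container layers, volumes, logs, and so on) accounts for the growth; you can then descend into that one.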

How am I supposed to use a Postgresql docker image/container?

I'm new to docker. I'm still trying to wrap my head around all this.
I'm building a Node application (REST API), using PostgreSQL to store my data.
I've spent a few days learning about docker, but I'm not sure whether I'm doing things the way I'm supposed to.
So here are my questions:
I'm using the official docker postgres 9.5 image as a base to build my own (my Dockerfile only adds plpython on top of it and installs a custom Python module for use within plpython stored procedures). I created my container as suggested by the postgres image docs:
docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
After I stop the container, I cannot run it again using the above command, because the container already exists. So I start it using docker start instead of docker run. Is this the normal way to do things? Should I generally use docker run the first time and docker start every other time?
Persistence: I created a database and populated it on the running container, using pgadmin3 to connect. I can stop and start the container and the data is persisted, although I'm not sure why or how this is happening. I can see in the Dockerfile of the official postgres image that a volume is created (VOLUME /var/lib/postgresql/data), but I'm not sure that's the reason persistence is working. Could you please briefly explain (or point to an explanation of) how this all works?
Architecture: from what I read, it seems that the most appropriate architecture for this kind of app would be to run 3 separate containers. One for the database, one for persisting the database data, and one for the node app. Is this a good way to do it? How does using a data container improve things? AFAIK my current setup is working ok without one.
Is there anything else I should pay attention to?
Thanks
EDIT: adding to my confusion, I just ran a new container from the official debian image (no Dockerfile, just docker run -i -t -d --name debtest debian /bin/bash). With the container running in the background, I attached to it using docker attach debtest and then proceeded to apt-get install postgresql. Once it was installed, I ran psql (still from within the container), created a table in the default postgres database, and populated it with 1 record. Then I exited the shell, and the container stopped automatically since the shell wasn't running anymore. I started the container again using docker start debtest, then attached to it and finally ran psql again. I found that everything had persisted since the first run: PostgreSQL is installed, my table is there, and of course the record I inserted is there too. I'm really confused as to why I need a VOLUME to persist data, since this quick test didn't use one and everything appears to work just fine. Am I missing something here?
Thanks again
1.
docker run --name some-postgres -e POSTGRES_PASSWORD=mysecretpassword -d postgres
After I stop the container I cannot run it again using the above command, because the container already exists.
Correct. You named it (--name some-postgres), so before starting a new one with that name, the old one has to be deleted, e.g. with docker rm -f some-postgres.
So I start it using docker start instead of docker run. Is this the normal way to do things? I will generally use docker run the first time and docker start every other time?
No, that is by no means the normal way with Docker. Docker containers are normally supposed to be ephemeral, that is, easily thrown away and started anew.
Persistence: ... I can stop and start the container and the data is persisted, although I'm not sure why or how this is happening. ...
That's because you are reusing the same container. Remove the container and start a new one, and the new container will not see the old data.
Architecture: from what I read, it seems that the most appropriate architecture for this kind of app would be to run 3 separate containers. One for the database, one for persisting the database data, and one for the node app. Is this a good way to do it? How does using a data container improve things? AFAIK my current setup is working ok without one.
Yes, having separate containers for separate concerns is a good way to go. It comes in handy in many cases, for example when you need to upgrade the postgres base image without losing your data (that is exactly where the data container comes into play).
Is there anything else I should pay attention to?
Once you are acquainted with the Docker basics, take a look at Docker Compose or similar tools, which make it easier to run multi-container applications.
Short and simple:
What you get from the official postgres image is a ready-to-go postgres installation along with some gimmicks that can be configured through environment variables. With docker run you create (and start) a container. The container lifecycle commands are docker start/stop/restart/rm. Yes, this is the Docker way of doing things.
Everything inside a volume is persisted. Every container can have an arbitrary number of volumes. Volumes are directories defined either inside the Dockerfile, the parent Dockerfile, or via the command docker run ... -v /yourdirectoryA -v /yourdirectoryB .... Everything outside volumes is lost with docker rm. Everything, including volumes, is lost with docker rm -v.
It's easier to show than to explain. See this README with Docker commands on GitHub, which shows how I use the official PostgreSQL image for Jira and also adds NGINX to the mix: Jira with Docker PostgreSQL. Also, a data container is a cheap trick for being able to remove, rebuild and renew the container without having to move the persisted data.
Congratulations, you have managed to grasp the basics! Keep it up! Try docker-compose to better manage those nasty docker run ... commands and to manage multi-container setups and data containers.
Note: a container only keeps running as long as its main process keeps running in the foreground! That command must either be set explicitly inside the Dockerfile (see CMD) or be given at the end of the docker run command, e.g. docker run -d ... /usr/bin/myexamplecommand. If your command does not block (e.g. /bin/bash started without -i -t), the container will stop immediately after executing it.

How to restrict cpu usage from host to docker container

I have one VM host on a physical server, with many Docker containers running inside it.
Here is a fragment of my fig.yml:
pg:
  image: pg...
redis:
  image: redis...
mongodb:
  image: mongodb...
app:
  image: myapp...
I want to limit the pg container to 25% of the host CPU, the app container to 50%, and so on.
Can I do this with fig, or do I have to use docker run and manage the links by hand?
In my case, when one of these containers runs a costly task, it affects the CPU performance of the others. And when other VMs with a similar deployment run on the same physical server, the problem gets dramatically worse.
For now, Fig doesn't support setting CPU and memory limits. Maybe it will in the future.
I encourage you to experiment with docker run -m for memory limits and docker run -c for CPU shares. These flags let you set memory and CPU values when starting a container. Read more about the flags you can use with docker run here:
https://docs.docker.com/reference/commandline/cli/#run
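But these values can only be set when you create a new container; after the container is created, you cannot change them. (Newer Docker releases, 1.10 and later, provide docker update, which can change these limits on an existing container.)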
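Because -c (--cpu-shares) is a relative weight (Docker's default is 1024) that only matters when containers compete for CPU, percentages like the 25% and 50% in the question translate into share ratios rather than hard caps. The idealized Python sketch below shows that arithmetic; cpu_share_fractions and cpus_for_fraction are hypothetical helpers, the weights 256/512/128/128 are just one choice that yields 25% for pg and 50% for app, and the real Linux CFS scheduler behaves less neatly on multi-core hosts. For a hard cap, Docker 1.13 and later also accept --cpus, whose value is the desired fraction of host CPU times the number of host CPUs.

```python
def cpu_share_fractions(shares, busy=None):
    """Idealized CPU fraction per container when the host CPU is saturated.

    shares: container name -> --cpu-shares weight.
    busy:   names of containers currently wanting CPU (default: all).
    Idle containers reserve nothing, so busy ones split the whole CPU.
    """
    if busy is None:
        busy = set(shares)
    total = sum(shares[name] for name in busy)
    return {name: shares[name] / total for name in shares if name in busy}


def cpus_for_fraction(fraction, host_cpus):
    """Value to pass to --cpus for a hard cap at `fraction` of the host."""
    return fraction * host_cpus


shares = {"pg": 256, "app": 512, "redis": 128, "mongodb": 128}
print(cpu_share_fractions(shares))                      # pg 0.25, app 0.5, redis/mongodb 0.125 each
print(cpu_share_fractions(shares, busy={"pg", "app"}))  # pg about 0.333, app about 0.667
print(cpus_for_fraction(0.25, 4))                       # 1.0 on a 4-CPU host
```

The second call illustrates the main caveat of shares: when redis and mongodb are idle, pg and app are not held to 25% and 50% but split the freed CPU in the same 1:2 ratio, which is why a hard cap is the better fit when a container must never exceed a given share.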