postgresql and docker: which directory is needed to persist data

In PostgreSQL, which directories do we need to persist in general so that I can use the same data again even if I rebuild the container?
Like:
I know the main directory:
/var/lib/postgres or /var/lib/postgres/data (small confusion, which one)
and any others, like the logs etc.

You can define the PGDATA environment variable in your docker container to specify where postgres will save its database files.
From the documentation of the official postgres Docker image:
PGDATA:
This optional variable can be used to define another location - like a subdirectory - for the database files. The default is /var/lib/postgresql/data, but if the data volume you're using is a filesystem mountpoint (like with GCE persistent disks), Postgres initdb recommends a subdirectory (for example /var/lib/postgresql/data/pgdata) be created to contain the data.
Additionally, from the postgres documentation, transaction log files are also written to PGDATA:
By default the transaction log is stored in a subdirectory of the main Postgres data folder (PGDATA).
So by default the postgres image will write database files to /var/lib/postgresql/data.
To answer your question: it should be sufficient to bind mount a directory to /var/lib/postgresql/data inside of your postgres container.
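For example, a minimal sketch with docker run, bind mounting a host directory over the default data location (the host path and password are placeholders):
docker run -d --name some-postgres \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -v /path/on/host/pgdata:/var/lib/postgresql/data \
  postgres
The database then survives removing and recreating the container, as long as the same host directory is mounted again.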

Related

How to indicate an external path for postgres data?

I have this volume
- ./var/volume/postgres/db:/var/lib/postgresql/data
for the postgres container:
image: postgres:10
and I want to point it at an external folder on another disk:
- /media/ubuntuuser/Data/data/db:/var/lib/postgresql/data
but a path outside the working directory doesn't work for me.
Can I fix it somehow?
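For reference, compose does accept absolute host paths in bind mounts (relative paths like ./... are resolved against the compose file's directory), so a sketch using the paths from the question would look like:
services:
  postgres:
    image: postgres:10
    volumes:
      # absolute host path on another disk; the postgres user inside the container still needs write access to it
      - /media/ubuntuuser/Data/data/db:/var/lib/postgresql/data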

How can a docker-compose configuration transition from using anonymous volumes to named volumes while maintaining existing data?

Is there a way to migrate from a docker-compose configuration using all anonymous volumes to one using named volumes without needing manual intervention to maintain data (e.g. manually copying folders)? This could entail having users run a script on the host machine but there would need to be some safeguard against a subsequent docker-compose up succeeding if the script hadn't been run.
I contribute to an open source server application that users install on a range of infrastructure. Our users are typically not very technical and are resource-constrained. We have provided a simple docker-compose-based setup. Persistent data is in a containerized postgres database which stores its data on an anonymous volume. All of our administration instructions involve stopping running containers but not bringing them down.
This works well for most users but some users have ended up doing docker-compose down either because they have a bit of Docker experience or by simple analogy to up. When they bring their server back up, they get new anonymous volumes and it looks like they have lost their data. We have provided instructions for recovering from this state but it's happening often enough that we're reconsidering our configuration and exploring transitioning to named volumes.
We have many users happily using anonymous volumes and following our administrative instructions exactly. These are our least technical users and we want to make sure that they are not negatively affected by any change we make to the docker-compose configuration. For that reason, we can't "just" change the docker-compose configuration to use named volumes and provide a script to migrate data. There's too high of a risk that users would forget/fail to run the script and end up thinking they had lost all their data. This kind of approach would be fine if we could somehow ensure that bringing the service back up with the new configuration only succeeds if the data migration has been completed.
Side note for those wondering about our choice to use a containerized database: we also have a path for users to specify an external db server (e.g. RDS) but this is only accessible to our most resourced users.
Edit: Here is a similar ServerFault question.
Given that you're using the official PostgreSQL image, you can exploit its database initialization system:
If you would like to do additional initialization in an image derived from this one, add one or more *.sql, *.sql.gz, or *.sh scripts under /docker-entrypoint-initdb.d (creating the directory if necessary). After the entrypoint calls initdb to create the default postgres user and database, it will run any *.sql files, run any executable *.sh scripts, and source any non-executable *.sh scripts found in that directory to do further initialization before starting the service.
with a change of PGDATA
This optional variable can be used to define another location - like a subdirectory - for the database files. The default is /var/lib/postgresql/data. If the data volume you're using is a filesystem mountpoint (like with GCE persistent disks) or remote folder that cannot be chowned to the postgres user (like some NFS mounts), Postgres initdb recommends a subdirectory be created to contain the data.
to solve the problem. The idea is that you define a different location for the Postgres files and mount a named volume there. The new location will be empty initially, and that will trigger the database initialization scripts. You can use this to move the data from the anonymous volume, and it will happen exactly once.
I've prepared an example for you to test this out. First, create a database on an anonymous volume with some sample data in it:
docker-compose.yml:
version: "3.7"
services:
postgres:
image: postgres
environment:
POSTGRES_PASSWORD: test
volumes:
- ./test.sh:/docker-entrypoint-initdb.d/test.sh
test.sh:
#!/bin/bash
set -e

psql -v ON_ERROR_STOP=1 --username "postgres" --dbname "postgres" <<-EOSQL
    CREATE TABLE public.test_table (test_column integer NOT NULL);
    INSERT INTO public.test_table VALUES (1);
    INSERT INTO public.test_table VALUES (2);
    INSERT INTO public.test_table VALUES (3);
    INSERT INTO public.test_table VALUES (4);
    INSERT INTO public.test_table VALUES (5);
EOSQL
Note how this test.sh is mounted: it has to be in the /docker-entrypoint-initdb.d/ directory in order to be executed at the initialization stage. Bring the stack up and down once to initialize the database with this sample data.
Now create a script to move the data:
move.sh:
#!/bin/bash
set -e

# remove the freshly initialized (empty) database in the new location...
rm -rf "$PGDATA"/*
# ...and move the old data from the anonymous volume into it
mv /var/lib/postgresql/data/* "$PGDATA/"
and update the docker-compose.yml with a named volume and a custom location for data:
docker-compose.yml:
version: "3.7"
services:
postgres:
image: postgres
environment:
POSTGRES_PASSWORD: test
# set a different location for data
PGDATA: /pgdata
volumes:
# mount the named volume
- pgdata:/pgdata
- ./move.sh:/docker-entrypoint-initdb.d/move.sh
volumes:
# define a named volume
pgdata: {}
When you bring this stack up, Postgres won't find a database (because the named volume is initially empty) and will run its initialization scripts. First it runs its own script to create an empty database, then it runs the custom scripts from the /docker-entrypoint-initdb.d directory. In this example I mounted move.sh into that directory; it erases the temporary database and moves the old database to the new location.
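To sanity-check the migration you could run something along these lines (the query assumes the test_table created above; these are plain docker-compose and psql invocations):
docker-compose up -d
# the rows created on the anonymous volume should now live on the named volume
docker-compose exec postgres psql -U postgres -c "SELECT * FROM public.test_table;"
# a subsequent down/up no longer loses data, because named volumes survive "down"
docker-compose down
docker-compose up -d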

dockerfile for backend and a separate one for dbms because compose won't let me copy an sql file into the dbms container?

I have a dockerfile for frontend, one for backend, and one for the database.
In the backend portion of the project, I have a dockerfile and a docker-compose.yml file.
The dockerfile is great for the backend because it configures the backend, copies and sets up the information etc. I like it a lot.
The issue I have come to, though, is that I can easily create a dockerfile for the dbms, but it requires me to put it in a different directory, whereas I was hoping to just define it in the same directory as the backend; and because the backend and the dbms are so tightly coupled, I figured this is where docker-compose would come in.
The issue I ran into is that in a compose file I can't do a COPY into the dbms container. I would just have to create another dockerfile to set that up. I was thinking that would work.
When looking on GitHub, there was a big enhancement thread about it, but the closest people got was just creating a volume relationship, which fails to do what I want.
Ideally, all I want to be able to do is stand up a postgres dbms in such a fashion that I could do load balancing on it later down the line (with 1 write, 5 read or something), and have its initial db defined in my one sql file.
Am I missing something? I thought I was going about it correctly, but maybe I need to create a whole new directory with a dockerfile for the dbms.
Thoughts on how I should accomplish this?
Right now I was doing something like:
version: '2.0'
services:
  backend:
    build: .
    ports:
      - "8080:8080"
  database:
    image: "postgres:10"
    environment:
      POSTGRES_USER: "test"
      POSTGRES_PASSWORD: "password"
      POSTGRES_DB: "foo"
    # I shouldn't have this volume, as it would copy the entire folder and its contents into the db:
    volumes:
      - ./:/var/lib/postgresql/data
To copy things with docker there is an almost infinite set of possibilities.
At image build time:
use COPY or ADD instructions
use shell commands, including cp, ssh, wget and many others
From the docker command line:
use docker cp to copy from/to hosts and containers
use docker exec to run arbitrary shell commands, including cp, ssh and many others...
In docker-compose / kubernetes (or through the command line):
use a volume to share data between containers
a volume can be a local or a distant file system (a network disk, for example)
potentially combine that with shell commands, for example to perform backups
Still, how you should do it depends heavily on the use case.
If the data you copy is linked to the code and versioned (in the git repo...), then treat it as if it were code and build the image with it thanks to the Dockerfile. This is for me a best practice (see the sketch after this list).
If the data is configuration dependent on the environment (like test vs prod, farm 1 vs farm 2), then go for docker config/secret + ENV variables.
If the data is dynamic and generated at production time (like a DB that is filled with user data as the app is used), use persistent volumes and be sure you understand the impact of container failure for your data.
For a database in a test system it can make sense to relaunch the DB from a backup dump, a read-only persistent volume, or, much simpler, to back up the whole container at a known state (with docker commit).
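As a sketch of that first option applied to the question above: a tiny Dockerfile next to the compose file can bake the SQL into the database image, and the official postgres image will run it on first initialization (the db/ directory, init.sql name and service layout are placeholders, not from the question):
db/Dockerfile:
FROM postgres:10
# files in /docker-entrypoint-initdb.d are executed once, when the data directory is first initialized
COPY init.sql /docker-entrypoint-initdb.d/init.sql
docker-compose.yml (database service only):
  database:
    build: ./db
    environment:
      POSTGRES_USER: "test"
      POSTGRES_PASSWORD: "password"
      POSTGRES_DB: "foo"
This keeps the SQL versioned alongside the code while letting compose build the image, which is what the question was after.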

Backup and restore postgresql data folder directly

Till now I've been backing up my postgresql data using pg_dump, which exports the data to an sql file mydb.sql, and then restoring from that sql file using psql -U user -d db < mydb.sql.
For one reason or another it would be more convenient to restore the database content more directly, in an environment where psql does not exist... specifically on a host server where postgresql is installed in a docker container running on the host, but not on the host itself.
My plan is to back up the content of /var/lib/postgresql/data/ to a tar file, and when required (e.g. when a new server is created that hosts the postgresql container) just restore that to the same path. The folder /var/lib/postgresql/data/ in the docker container is mapped to a folder on the host server, so I would create this backup on the host, not inside the postgres container.
Is this a valid approach? Any "gotchas"? And are there any subfolders within /var/lib/postgresql/data/ that I can exclude from the tar file? I don't want to back up mere 'housekeeping' information.
You can do that, but you have to do it properly if you don't want your database to become corrupted.
Either stop PostgreSQL before copying the data directory or follow the instructions from the documentation for an online backup.
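For the offline variant, a sketch of what that could look like on the host (container name, host path and archive name are placeholders):
# stop the container so the data directory is in a consistent state
docker stop my_postgres
# archive the host folder that is mapped to /var/lib/postgresql/data in the container
tar -czf pgdata-backup.tar.gz -C /path/on/host/pgdata .
# later, on the new host: extract into the (empty) mapped folder and start the container
tar -xzf pgdata-backup.tar.gz -C /path/on/host/pgdata
docker start my_postgres
Keep in mind that a file-level copy like this can only be restored under the same PostgreSQL major version and architecture.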

Restoring Database PostgreSQL

One of my servers has a virus, the Postgres service on Windows is not running, and there is no backup; I'm using Odoo 8 and even the Odoo service is not running.
Is it possible to restore a database using only an OID directory, which from what I know is where Postgres keeps the database files?
I assume you mean the /data/base/<oid> directory. Unfortunately it's not enough. There are some settings stored outside the database OID directory, as you called it.
For example:
/data/global/ - cluster users' settings (passwords, roles etc.)
/data/pg_xlog/ - WAL entries - possibly with transaction changes not yet "transferred" to the database files
/data/pg_tblspc/ - tablespaces
You need the whole /data directory. Read more about PHYSICAL BACKUP.
Edit:
So, if the whole /data directory is available to you, you can restore the database to another server. There's one thing you should remember: the destination postgres cluster must be at the same version, e.g. 9.4.1. When the first and second numbers match (e.g. 9.2.10 and 9.2.16) this should also work most of the time. Keeping that in mind, you just need to replace the /data/ directory on the destination server with your source /data directory (the destination server must be stopped during that operation).
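A rough sketch of that replacement on a Linux destination server (paths, version number and service name are illustrative, not from the question):
# stop the destination cluster first
sudo systemctl stop postgresql
# keep the existing data directory around, just in case
sudo mv /var/lib/postgresql/9.4/main /var/lib/postgresql/9.4/main.old
# drop in the source /data directory and fix ownership and permissions
sudo cp -a /path/to/source/data /var/lib/postgresql/9.4/main
sudo chown -R postgres:postgres /var/lib/postgresql/9.4/main
sudo chmod 700 /var/lib/postgresql/9.4/main
sudo systemctl start postgresql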