Postgres empty directories and GitHub - postgresql

EDIT: Some context was missing from the original question.
My goal in using Git was to share the table structure and stored procedures of a mostly empty database under development with a second person building a REST API server.
Git does not track empty directories, but PostgreSQL requires them at startup. How can I get the two working together?
Every search I make on the topic brings me to some variation of a Git workaround that involves adding dummy files to directories that would otherwise be empty. However, PostgreSQL did not appreciate that solution, at least not in the pg_tblspc directory.
The directories required by PostgreSQL at startup are:
pg_notify
pg_tblspc
pg_replslot
pg_twophase
pg_stat
pg_snapshots
pg_commit_ts
pg_logical/mappings
pg_logical/snapshots
I am running PostgreSQL from a Docker container as follows:
docker-compose.yml
version: "3.9"
services:
db:
build: ./pg_db/
image: db_MyDB
ports:
- "5432:5432"
volumes:
- ./pg_db/db_data:/var/lib/postgresql/data
./pg_db/Dockerfile
FROM postgres:14
ENV POSTGRES_PASSWORD=<admin password>
ENV POSTGRES_USER=admin
ENV POSTGRES_DB=db_MyDB
Then in the CLI...
git add .
git commit -m "something"
git push -u origin HEAD
Later, if I or another user try to replicate the database from GitHub via the CLI...
git clone https://github.com/<user>/<project>.git
cd <project>
docker-compose up
The result is a set of startup failures related to the missing empty directories listed above. Once I manually create them, PostgreSQL starts up without any issues, to my knowledge.
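For reference, manually recreating them after a clone looks roughly like this (a sketch, assuming the data directory is bind-mounted at ./pg_db/db_data as in the compose file above):
# recreate the empty directories PostgreSQL expects at startup
cd pg_db/db_data
mkdir -p pg_notify pg_tblspc pg_replslot pg_twophase pg_stat \
         pg_snapshots pg_commit_ts pg_logical/mappings pg_logical/snapshots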

Generally speaking, it is best to keep the schema object definitions in text files external to the database, in some sort of change management framework. This can either be a home-grown solution or one of the many existing solutions shown in your link, Change Management. In either case, keeping this under version control, e.g. Git, adds an extra layer of flexibility and redundancy.
The basic principle is to have schema (database object) definitions that live outside the database itself, and then a procedure to apply those definitions, or changes to existing definitions, in an orderly fashion, preferably in a way that allows you both to move forward to a new state and to revert to a previous state in the database itself. A version control tool on top of this maintains a history of changes to the schema, along with the other text, code, etc. that goes with any project. It also allows for branching to try out new ideas without interfering with an existing setup.
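As a concrete illustration, a minimal sketch of keeping the definitions outside the database with pg_dump; the user, output path and connection details here are placeholders based on the question's setup:
# dump only object definitions (no data) into a plain-text file that Git can track
pg_dump -h localhost -p 5432 -U admin --schema-only db_MyDB > schema/db_MyDB_schema.sql
git add schema/db_MyDB_schema.sql
git commit -m "Snapshot current schema"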

Thank you to Richard and Adrian for responding so promptly.
EDIT: Adrian highlights in a comment below that Git is a strong complement to a schema management tool.
Git is the wrong tool. What I need is a schema management tool, several of which are highlighted in the PostgreSQL wiki here:
https://wiki.postgresql.org/wiki/Change_management_tools_and_techniques
Some context was missing from the original question.
My goal in using Git was to share the table structure and stored procedures of a mostly empty database under development with a second person building a REST API server. My approach of sharing the entire data directory caused some confusion, and in hindsight I can see how pushing likely stale data around with schema changes would be a terrible idea in most scenarios.
I have what I need, but if anyone would like to provide a more insightful answer for anyone else who might wind up here in a search, I'd be happy to accept your answer as the correct one.

Related

Should I commit Postgresql data into Github?

When dockerizing my app, a data folder was created with postgresql elements and data. It comes from this docker-compose.yml snippet:
volumes:
  - ./data/db:/var/lib/postgresql/data
This folder is constantly being populated with data from running the app (still not in production, but running in docker compose), which means that every time I commit any changes I make to the app code, various data files are also created. Should I include that folder in .gitignore, or should I really commit them?
You should include this directory in your .gitignore file (and, if appropriate, .dockerignore as well).
Since the PostgreSQL data consists of opaque binary files, it's not something that can be stored well in source control. The database storage is also specific to your instance of the application. If you and a colleague are both working on the same application, and they commit changed database files to source control, your only real choices are to overwrite your local data with theirs, or to ignore their changes.
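A minimal sketch of doing that from a shell, assuming the bind-mounted path from the compose snippet above:
# keep the database files out of both Git and the Docker build context
echo "data/db/" >> .gitignore
echo "data/db/" >> .dockerignore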
Most application frameworks have two key features to support working with databases. You can generally run migrations, which generally do things like create tables and indexes, and can be run incrementally to update an older database to a newer schema. You can also supply seed data to initialize a developer's database. Both of these things are in plain text, often SQL files or CSV-format data, and these can be committed to source control.
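For example, a rough sketch of that layout applied with psql; the file names and connection details here are hypothetical:
# plain-text SQL kept in source control:
#   db/migrations/001_create_tables.sql
#   db/seed/seed_data.sql
psql -h localhost -U postgres -d myapp -f db/migrations/001_create_tables.sql
psql -h localhost -U postgres -d myapp -f db/seed/seed_data.sql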
In a Docker context, you have one more option. Since you can't usefully read the opaque database data and you don't want it committed to source control, you can ask Docker to manage the storage in a named volume. On some platforms this can also be noticeably faster; but if you do need to back up the data or otherwise interact with it as files and not via the database layer, it can be harder.
version: '3.8'
services:
  database:
    image: postgres
    volumes:
      - 'database_data:/var/lib/postgresql/data'
      #  ^^^^^^^^^^^^^ volume name, not directory path
volumes:
  database_data:
  # empty, but must be declared
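If you later do need the files, a sketch of reaching into the named volume from a throwaway container; note that Compose usually prefixes the volume name with the project name, so check docker volume ls first (the name below is hypothetical):
docker volume ls
# archive the volume contents to the current directory via a short-lived Alpine container
docker run --rm \
  -v myproject_database_data:/var/lib/postgresql/data:ro \
  -v "$PWD":/backup \
  alpine tar czf /backup/database_data.tar.gz -C /var/lib/postgresql/data .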

What is the easiest way to generate a script to drop and create all objects in a database?

I'm used to working with SQL Server, and SQL Server Management Studio has the option to automatically generate a script to drop and recreate everything in a database (tables/views/procedures/etc). I find that when developing a new application and writing a bunch of junk in a local database for basic testing, it's very helpful to have the option to just nuke the whole thing and recreate it from a clean slate, so I'm looking for similar functionality within postgres/pgadmin.
PGAdmin has an option to generate a create script for a specific table, but right-clicking each table would be very tedious, and I'm wondering if there's another way to do it.
To recreate a clean, schema-only database you can use the pg_dump client included with a Postgres server install. The options to use are:
-c
--clean
Output commands to clean (drop) database objects prior to outputting the commands for creating them. (Unless --if-exists is also specified, restore might generate some harmless error messages, if any objects were not present in the destination database.)
This option is ignored when emitting an archive (non-text) output file. For the archive formats, you can specify the option when you call pg_restore.
and:
-s
--schema-only
Dump only the object definitions (schema), not data.
This option is the inverse of --data-only. It is similar to, but for historical reasons not identical to, specifying --section=pre-data --section=post-data.
(Do not confuse this with the --schema option, which uses the word “schema” in a different meaning.)
To exclude table data for only a subset of tables in the database, see --exclude-table-data.
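Putting the two together, a sketch of a drop-and-recreate cycle; connection details and file names are placeholders:
# generate a script that drops and recreates every object, without data
pg_dump -h localhost -U postgres --clean --if-exists --schema-only mydb > recreate_schema.sql
# re-apply it to wipe and rebuild the schema
psql -h localhost -U postgres -d mydb -f recreate_schema.sql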
clean in Flyway
The database migration tool Flyway offers a clean command that drops all objects in the configured schemas.
To quote the documentation:
Clean is a great help in development and test. It will effectively give you a fresh start, by wiping your configured schemas completely clean. All objects (tables, views, procedures, …) will be dropped.
Needless to say: do not use against your production DB!
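From the command line that looks roughly like this (a sketch with placeholder connection values; recent Flyway releases disable clean by default, hence the cleanDisabled override):
# drop every object in the configured schemas
flyway -url=jdbc:postgresql://localhost:5432/mydb \
       -user=postgres -password=secret \
       -cleanDisabled=false clean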

dockerfile for backend and a separate one for dbms because compose won't let me copy sql file into dbms container?

I have a dockerfile for frontend, one for backend, and one for the database.
In the backend portion of the project, I have a dockerfile and a docker-compose.yml file.
The dockerfile is great for the backend because it configures the backend, copies and sets up the information, etc. I like it a lot.
The issue I have come to, though, is that I can easily create a dockerfile for the dbms, but it requires me to put it in a different directory, where I was hoping to just define it in the same directory as the backend; and because the backend and the dbms are so tightly coupled, I figured this is where docker-compose would come in.
The issue I ran into is that in a compose file I can't do a COPY into the dbms container. I would just have to create another dockerfile to set that up. I was thinking that would work.
When looking on GitHub, there was a big enhancement thread about it, but the closest people would get is just creating a volume relationship, which fails to do what I want.
Ideally, all I want to be able to do is stand up a postgres dbms in such a fashion that I could conduct load balancing on it later down the line, with 1 write and 5 reads or something, and have its initial db defined in my one sql file.
Am I missing something? I thought I was going about it correctly, but maybe I need to create a whole new directory with a dockerfile for the dbms.
Thoughts on how I should accomplish this?
Right now I am doing something like:
version: '2.0'
services:
  backend:
    build: .
    ports:
      - "8080:8080"
  database:
    image: "postgres:10"
    environment:
      POSTGRES_USER: "test"
      POSTGRES_PASSWORD: "password"
      POSTGRES_DB: "foo"
    # I shouldn't have volumes, as it would copy the entire folder and its contents to the db.
    volumes:
      - ./:/var/lib/postgresql/data
To copy things with Docker, there is an almost infinite set of possibilities.
At image build time:
use COPY or ADD instructions
use shell commands, including cp, ssh, wget and many others.
From the docker command line:
use docker cp to copy from/to hosts and containers
use docker exec to run arbitrary shell commands, including cp, ssh and many others (see the sketch after this list)
In docker-compose / kubernetes (or through command line):
use volume to share data between containers
volume can be local or distant file systems (network disk for example)
potentially combine that with shell commands for example to perform backups
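As an illustration of the docker cp / docker exec route, a sketch of loading a SQL file into the running database container from the compose file above; the container name is a placeholder (check docker ps), and the credentials come from the question:
# copy the file into the container, then run it with psql
docker cp ./init.sql <db-container>:/tmp/init.sql
docker exec -it <db-container> psql -U test -d foo -f /tmp/init.sql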
Still, how you should do it depends heavily on the use case.
If the data you copy is linked to the code and versioned (in the git repo...), then treat it as code and build the image with it thanks to the Dockerfile. For me this is a best practice.
If the data is configuration dependent on the environment (like test vs prod, farm 1 vs farm 2), then go for docker config/secret + ENV variables.
If the data is dynamic and generated at production time (like a DB that is filled with user data as the app is used), use persistent volumes and be sure you understand the impact of container failure on your data.
For a database in a test system it can make sense to relaunch the DB from a backup dump, a read-only persistent volume, or, much simpler, to back up the whole container at a known state (with docker commit).

postgresql docker replication

I'm relatively new to Docker, but I'm wondering whether it is possible for me to create two master-slave postgres containers. I can do it on virtual machines, but I'm a bit confused about how to do it in Docker.
If it's possible can someone please point me to right directions?
I have tried docker exec -it, but all the files are missing and I cannot edit the files inside.
Since you are new to Docker, and you wish to get up and running quickly, you can try using Bitnami's images, which allow you to specify a POSTGRESQL_REPLICATION_MODE environment variable, which will allow you to designate a container as a standby/slave.
Just save their docker-compose-replication.yml as docker-compose.yml in the directory of your choice, run docker-compose up -d, and it will pull the necessary image and set everything up for you quickly.
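If you prefer plain docker run over their compose file, a rough sketch built from Bitnami's documented replication variables; the names, passwords and values here are assumptions and may differ between image versions:
docker network create pg-net
# primary
docker run -d --name pg-primary --network pg-net \
  -e POSTGRESQL_REPLICATION_MODE=master \
  -e POSTGRESQL_REPLICATION_USER=repl_user \
  -e POSTGRESQL_REPLICATION_PASSWORD=repl_password \
  -e POSTGRESQL_PASSWORD=secret \
  bitnami/postgresql
# standby
docker run -d --name pg-replica --network pg-net \
  -e POSTGRESQL_REPLICATION_MODE=slave \
  -e POSTGRESQL_REPLICATION_USER=repl_user \
  -e POSTGRESQL_REPLICATION_PASSWORD=repl_password \
  -e POSTGRESQL_MASTER_HOST=pg-primary \
  -e POSTGRESQL_MASTER_PORT_NUMBER=5432 \
  -e POSTGRESQL_PASSWORD=secret \
  bitnami/postgresql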
However, I would highly encourage you to tinker on your own to learn how Docker works. Specifically, you could just use the community Postgres image, and then write your own entrypoint.sh file (along with any additional helper files as necessary), and customize the setup to your requirements.
Disclosure: I work for EnterpriseDB (EDB)

Can I copy the postgresql /base directory as a DB backup?

Don't shoot me, I'm only the OP!
When we need to back up our DB, we are always able to shut down postgresql completely. After it is down, I found I could copy the "/base" directory with the binary data in it to another location, checksum it for accuracy, and later restore it if/when necessary. This has even worked when upgrading to a later version of postgresql. Integrity of the various 'conf' files is not an issue, as that is handled elsewhere (i.e. by other processes/procedures!) in the system.
Is there any risk to this approach that I am missing?
The "File System Level Backup" link in Abelisto's comment is what JoeG is talking about doing. https://www.postgresql.org/docs/current/static/backup-file.html
To be safe, I would go up one more level, to "main" on our Ubuntu systems, to take the snapshot, and I would thoroughly go through the caveats of doing file-level backups. I was tempted to post the caveats here, but I'd end up quoting the entire page.
The thing to be most aware of (in a 'simple' postgres environment) is the relationship between the postgres database, a user database, and the pg_clog and pg_xlog files. If you only take "base", you lose the transaction and WAL information, and in more complex installations, other necessary information.
If the caveat conditions listed do not exist in your environment and you can do a full shutdown, this is a valid backup strategy, and it can be much faster than a pg_dump.
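For what it's worth, a sketch of that whole-directory approach on a Debian/Ubuntu-style layout; the paths, version number and cluster name are assumptions and will differ per install:
# stop the cluster so the files are consistent on disk
sudo systemctl stop postgresql
# archive the entire data directory (not just base/), then checksum it
sudo tar czf /backups/pg_main_$(date +%F).tar.gz -C /var/lib/postgresql/14 main
sha256sum /backups/pg_main_$(date +%F).tar.gz > /backups/pg_main_$(date +%F).tar.gz.sha256
sudo systemctl start postgresql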