Should I commit PostgreSQL data into GitHub? - postgresql

When dockerizing my app, a data folder was created containing PostgreSQL files and data. It comes from this docker-compose.yml snippet:
volumes:
  - ./data/db:/var/lib/postgresql/data
This folder is constantly being populated with data while the app runs (it is not in production yet, just running under Docker Compose), which means that every time I commit a change to the app code, various new data files show up as well. Should I add that folder to .gitignore, or should I really commit it?

You should include this directory in your .gitignore file (and, if appropriate, .dockerignore as well).
Since the PostgreSQL data directory consists of opaque binary files, it's not something that can be stored well in source control. The database storage is also specific to your instance of the application. If you and a colleague are both working on the same application and they commit changed database files to source control, your only real choices are to overwrite your local data with theirs or to ignore their changes.
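As a minimal sketch, assuming the bind-mounted directory from the question (./data/db), the ignore entries could be added like this:
# Keep the local PostgreSQL data directory out of Git and out of the image build context.
# The path is the bind mount from the question: ./data/db
cat >> .gitignore <<'EOF'
data/db/
EOF
cat >> .dockerignore <<'EOF'
data/db/
EOF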
Most application frameworks have two key features to support working with databases. You can usually run migrations, which do things like create tables and indexes, and which can be run incrementally to update an older database to a newer schema. You can also supply seed data to initialize a developer's database. Both of these are plain text, often SQL files or CSV-format data, and they can be committed to source control.
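For instance, a migration and a seed file kept in the repository might look like this (a hedged sketch; the db/ layout, file names, and users table are invented for illustration):
# Create hypothetical migration and seed files that live in source control.
mkdir -p db/migrations db/seeds
cat > db/migrations/001_create_users.sql <<'SQL'
CREATE TABLE IF NOT EXISTS users (
    id         serial PRIMARY KEY,
    email      text NOT NULL UNIQUE,
    created_at timestamptz NOT NULL DEFAULT now()
);
SQL
cat > db/seeds/users.sql <<'SQL'
INSERT INTO users (email) VALUES ('dev@example.com');
SQL
# Apply them to a development database (the connection string is a placeholder).
psql "$DATABASE_URL" -f db/migrations/001_create_users.sql
psql "$DATABASE_URL" -f db/seeds/users.sql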
In a Docker context, you have one more option. Since you can't usefully read the opaque database data and you don't want it committed to source control, you can ask Docker to manage the storage in a named volume. On some platforms this can also be noticeably faster, but if you ever need to back up the data or otherwise interact with it as files rather than through the database layer, it becomes harder.
version: '3.8'
services:
  database:
    image: postgres
    volumes:
      - 'database_data:/var/lib/postgresql/data'
      #  ^^^^^^^^^^^^^ volume name, not directory path
volumes:
  database_data:
    # empty, but must be declared

Postgres empty directories and GitHub

EDIT: Some context was missing from the original question.
My goal in using Git was to share the table structure and stored procedures of a mostly empty database under development with a second person building a REST API server.
Git does not track empty directories, but PostgreSQL requires them at startup. How can I get the two working together?
Every search I make on the topic brings me to some variation of a Git workaround that involves adding dummy files to directories that would otherwise be empty. However, PostgreSQL did not appreciate that solution, at least in the pg_tblspc directory.
The directories required by PostgreSQL at startup are:
pg_notify
pg_tblspc
pg_replslot
pg_twophase
pg_stat
pg_snapshots
pg_commit_ts
pg_logical/mappings
pg_logical/snapshots
I am running PostgreSQL from a Docker container as follows:
docker-compose.yml
version: "3.9"
services:
db:
build: ./pg_db/
image: db_MyDB
ports:
- "5432:5432"
volumes:
- ./pg_db/db_data:/var/lib/postgresql/data
./pg_db/Dockerfile
FROM postgres:14
ENV POSTGRES_PASSWORD=admin
ENV POSTGRES_USER=<admin password>
ENV POSTGRES_DB=db_MyDB
Then in the CLI...
git add .
git commit -m "something"
git push -u origin HEAD
Later, if I or another user try to replicate the database from GitHub via the CLI...
git clone https://github.com/<user>/<project>.git
cd <project>
docker-compose up
The result is a set of startup failures related to the missing empty directories listed above. Once I manually create them, PostgreSQL starts up without any issues, as far as I can tell.
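For reference, a minimal sketch of that manual step, assuming the bind mount ./pg_db/db_data from the compose file above:
# Recreate the empty directories that Git drops but PostgreSQL expects at startup.
cd pg_db/db_data
mkdir -p pg_notify pg_tblspc pg_replslot pg_twophase pg_stat \
         pg_snapshots pg_commit_ts pg_logical/mappings pg_logical/snapshots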
Generally speaking, it is best to keep the schema object definitions in text files external to the database, in some sort of change management framework. This can either be a home-grown solution or one of the many existing solutions shown in your link Change Management. In either case, keeping this under version control, e.g. Git, adds an extra layer of flexibility and redundancy.
The basic principle is to have schema (database object) definitions that live outside the database itself, and then a procedure to apply those definitions, or changes to existing definitions, in an orderly fashion - preferably in a way that allows you both to move forward to a new state and to revert to a previous state of the database itself. A version control tool over this maintains a history of changes to the schema, along with the other text, code, etc. that goes with any project. It also allows branching to try out new ideas without interfering with an existing setup.
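As a hedged sketch of the simplest form of this - no dedicated tool, just a schema dump kept in the repository (database name, user, and file path are placeholders):
# Dump only the object definitions (tables, stored procedures, etc.), not the data,
# and keep the resulting SQL file under version control.
pg_dump --schema-only --no-owner -U admin -d db_MyDB -f schema/db_MyDB_schema.sql
git add schema/db_MyDB_schema.sql
git commit -m "Update database schema definitions"
# A collaborator can then recreate the schema in their own empty database:
psql -U admin -d db_MyDB -f schema/db_MyDB_schema.sql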
Thank you to Richard and Adrian for responding so promptly.
EDIT: Adrian highlights in a comment below that Git is a strong complement to a schema management tool.
Git is the wrong tool. What I need is a schema management tool, several of which are highlighted in the PostgreSQL wiki here:
https://wiki.postgresql.org/wiki/Change_management_tools_and_techniques
Some context was missing from the original question.
My goal in using Git was to share the table structure and stored procedures of a mostly empty database under development with a second person building a REST API server. My approach of sharing the entire data directory caused some confusion, and in hindsight I can see how pushing likely stale data around with schema changes would be a terrible idea in most scenarios.
I have what I need, but if anyone would like to provide a more insightful answer for anyone else who might wind up here in a search, I'd be happy to accept your answer as the correct one.

Link mongo-data to /data/db folder to a volume Mongodb Docker

I accidentally deleted a Docker volume, mongo-data:/data/db. I have a copy of that folder, but now when I run docker-compose up the MongoDB container doesn't start and gives the error mongo_1 exited with code 14. Below are more details of the error and the mongo-data folder. Can someone help me, please?
in docker-compose.yml
volumes:
  - ./mongo-data:/data/db
Restore from backup files
A step-by-step process to repair the corrupted files from a failed MongoDB in a Docker container:
! Before you start, make a copy of the files. !
Make sure you know which version of the image was running in the container.
Spawn a new container to run the repair process as follows:
docker run -it -v <data folder>:/data/db <image-name>:<image-version> mongod --repair
Once the files are repaired, you can start the containers from docker-compose as usual.
If the repair fails, it usually means that the files are corrupted beyond repair. There is still a chance to recover by exporting the data as described here.
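For instance, a concrete version of the repair command, using the bind mount from the question and assuming the container had been running the official mongo:4.4 image (adjust the tag to whatever version was actually in use):
# Repair the copied data files in place; ./mongo-data is the host folder from the question.
docker run -it --rm -v "$(pwd)/mongo-data:/data/db" mongo:4.4 mongod --repair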
How to secure proper backup files
The database is constantly working with its files, so the files on disk are constantly changing. In addition, the database keeps some of the changes in internal memory buffers before they are flushed to the filesystem. Although database engines do a very good job of ensuring that the database can recover from an abrupt failure by using a two-stage commit process (first update the transaction log, then the data file), when the files are simply copied there can be corruption that prevents the database from recovering.
The reason for such corruption is that the copy process is not aware of the database writer's progress, which creates a race condition. In very simple words: while the database is in the middle of writing, the copy process can capture a file that is only half-updated, and that copy will be corrupted.
When the database writer is in the middle of writing to the files, we call them hot files. "Hot files" is a term from the OS perspective, while MongoDB also uses the term hot backup, which is from the MongoDB perspective. A hot backup means that the backup was taken while the database was running.
To take a proper snapshot (ensuring the files are cold) you need to follow the procedure explained here. In short, the db.fsyncLock() command issued during this process tells the database engine to flush all buffers and stop writing to the files. This makes the files cold, while the database itself remains hot - hence the difference between the terms hot files and hot backup. Once the copy is done, the database is told to start writing to the filesystem again by issuing db.fsyncUnlock().
Note that the process is more complex and can change between database versions. Here I give a simplification of it, in order to illustrate the point about the problems with file snapshots. To secure a proper and consistent backup, always follow the documented procedure for the database version that you use.
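A minimal sketch of that lock/copy/unlock sequence, assuming the mongo shell can reach the running instance and the data lives in ./mongo-data as in the question:
# Flush buffers and block writes so the files on disk are cold.
mongo --eval "db.fsyncLock()"
# Copy the now-cold files to a backup location (the path is an example).
cp -a ./mongo-data "./mongo-data-backup-$(date +%F)"
# Let the database resume writing to the filesystem.
mongo --eval "db.fsyncUnlock()"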
Suggested backup method
The preferred backup should always be the data dump method, since this ensures that you can restore even across upgraded or downgraded database engines. MongoDB provides a very useful tool called mongodump that can be used to create database backups by dumping the data instead of copying the files.
For more details on how to use the backup tools, as well as for the other backup methods, read the MongoDB Backup Methods chapter of the MongoDB documentation.
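For example (a hedged sketch; the database name and output directory are placeholders, and the commands assume a local instance on the default port):
# Dump the database to BSON and metadata files under ./dump.
mongodump --db=mydb --out=./dump
# Restore the dump into a (possibly different) MongoDB instance.
mongorestore ./dump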

Dockerfile for backend and a separate one for DBMS because compose won't let me copy a SQL file into the DBMS container?

I have a dockerfile for frontend, one for backend, and one for the database.
In the backend portion of the project, I have a dockerfile and a docker-compose.yml file.
The Dockerfile is great for the backend because it configures the backend, copies and sets up the information, etc. I like it a lot.
The issue I have come to, though, is that I can easily create a Dockerfile for the DBMS, but it requires me to put it in a different directory. I was hoping to just define it in the same directory as the backend, and because the backend and the DBMS are so tightly coupled, I figured this is where docker-compose would come in.
The issue I ran into is that in a compose file I can't do a COPY into the DBMS container. I would just have to create another Dockerfile to set that up. I was thinking that would work.
When looking on GitHub, there was a big enhancement thread about it, but the closest people would get is just creating a volume relationship, which fails to do what I want.
Ideally, all I want is to be able to stand up a Postgres DBMS in such a fashion that I could do load balancing on it later down the line, with 1 write and 5 reads or something, and have its initial database defined in my one SQL file.
Am I missing something? I thought I was going about it correctly, but maybe I need to create a whole new directory with a Dockerfile for the DBMS.
Thoughts on how I should accomplish this?
Right now I am doing something like:
version: '2.0'
services:
  backend:
    build: .
    ports:
      - "8080:8080"
  database:
    image: "postgres:10"
    environment:
      POSTGRES_USER: "test"
      POSTGRES_PASSWORD: "password"
      POSTGRES_DB: "foo"
    # I shouldn't have volumes as it would copy the entire folder and its contents to db.
    volumes:
      - ./:/var/lib/postgresql/data
To copy things with Docker there is a whole range of possibilities.
At image build time:
use COPY or ADD instructions
use shell commands, including cp, ssh, wget and many others
From the docker command line:
use docker cp to copy from/to hosts and containers
use docker exec to run arbitrary shell commands, including cp, ssh and many others...
In docker-compose / kubernetes (or through the command line):
use volumes to share data between containers
volumes can be local or remote file systems (a network disk, for example)
potentially combine that with shell commands, for example to perform backups
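For example, a hedged sketch of the docker cp / docker exec route with the compose file from the question (init.sql is a hypothetical file; the service, user, and database names come from that file):
# Copy a SQL file from the host into the running database container, then execute it there.
docker cp ./init.sql "$(docker-compose ps -q database)":/tmp/init.sql
docker exec "$(docker-compose ps -q database)" psql -U test -d foo -f /tmp/init.sql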
Still, how you should do it depends heavily on the use case.
If the data you copy is linked to the code and versioned (in the Git repo...), then treat it as if it were code and build the image with it via the Dockerfile (see the sketch after these cases). This is, for me, a best practice.
If the data is configuration dependent on the environment (like test vs prod, farm 1 vs farm 2), then go for docker config/secret + ENV variables.
If the data is dynamic and generated at production time (like a DB that is filled with user data as the app is used), use persistent volumes and be sure you understand well the impact of container failure on your data.
For a database in a test system it can make sense to relaunch the DB from a backup dump, a read-only persistent volume, or, much simpler, to back up the whole container at a known state (with docker commit).
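As a hedged sketch of the first case above: the official postgres image runs any *.sql files found in /docker-entrypoint-initdb.d when the database is initialized for the first time, so the SQL file could be baked into a small custom image (file names here are placeholders):
# Hypothetical layout: the schema lives next to the compose file as init.sql.
cat > Dockerfile.db <<'EOF'
FROM postgres:10
# Executed automatically on first startup, while the data directory is still empty.
COPY init.sql /docker-entrypoint-initdb.d/
EOF
# Point the database service at this Dockerfile instead of image: "postgres:10"
# (e.g. build: { context: ., dockerfile: Dockerfile.db }), then:
docker-compose build database
docker-compose up -d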

Change path of Firebird Secondary database files

I have created a Firebird multi-file database:
Main database file: D:\Database\MainDB.fdb
Secondary files (240 files) located under D:\Database\DBFiles\Data001.fdb to D:\Database\DBFiles\Data240.fdb
When I copy the database to another location and try to open it, Firebird doesn't locate the secondary files if they are not on the D:\ partition.
I want Firebird to locate the secondary files under the Database\DBFiles folder at the new path.
So if I copy the database to C:\Database\MainDB.fdb,
Firebird would open Data001.fdb at the new path, C:\Database\DBFiles, instead of the old path, D:\Database\DBFiles, where the files were initially created.
Can that be done with Firebird? If not, how should it be done?
Update:
Finally I found out it's not possible to change the paths of Firebird database secondary files using Firebird itself.
I did find this Firebird FAQ that mentions the GLINK tool, but it doesn't support Firebird 3.x, so I didn't test it, and it's not recommended to use it even with supported versions of Firebird.
Done what exactly?
UPD. I edited the very vague original question to make clear WHAT the topic starter wants.
You cannot reliably "copy files with Firebird" - Firebird is not a file-copying tool. You can, to a degree, use EXTERNAL TABLE for raw file access, but that is very limited and does not apply to the database itself.
It is dangerous practice to "copy databases" while Firebird is working, because you would only copy part of the data. Recently updated data that is in the memory cache but has not yet made it to disk would be lost, and the database file would be inconsistent, with some data updated and some not. Before you "copy database files" you first have to shut down either those databases or the whole Firebird server.
Firebird has its own tools for moving databases around - they are called backup/restore tools. Maybe what you need is the nbackup tool, if gbak is too slow for you.
Finally, you can list the files that comprise the database. You can do it via the gstat utility or via the "Services API" it uses, and you can also select from the RDB$FILES system table. However, what would you do after that? The very act of accessing the database makes it badly suited for subsequent copying (see the second point above). You would probably need to shut the database down, turn it to read-only AND single-user state, and only then attach to it and read RDB$FILES. And after the copying is done, you would have to bring the database back out of shutdown. That is rather more complex than nbackup.
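A hedged sketch of the backup/restore route (user, password, and paths are placeholders): gbak packs the whole multi-file database into one portable backup file and recreates the database at the new location on restore, which sidesteps the hard-coded secondary file paths.
gbak -b -user SYSDBA -password masterkey D:\Database\MainDB.fdb D:\Backup\MyDB.fbk
gbak -c -user SYSDBA -password masterkey D:\Backup\MyDB.fbk C:\Database\MainDB.fdb
Secondary files and their sizes can be re-specified as extra arguments during the restore if the database should remain multi-file.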
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/gstat-example-header.html
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/gfix-dbstartstop.html
https://www.firebirdsql.org/file/documentation/reference_manuals/fblangref25-en/html/fblangref-appx04-files.html
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/gbak.html
https://www.firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/nbackup.html

Restoring Database PostgreSQL

One of my servers has a virus, the Postgres service on Windows is not running, and there is no backup. I'm using Odoo 8, and even the Odoo service is not running.
Is it possible to restore a database using only an OID directory, which from what I know is the database file storage of Postgres?
I assume you mean the /data/base/<oid> directory. Unfortunately, it's not enough: there are some settings stored outside the database OID directory, as you called it.
Ex:
/data/global/ - cluster users' settings (passwords, roles etc.)
/data/pg_xlog/ - WAL entries - possibly with transaction changes not yet "transferred" to the database files
/data/pg_tblspc/ - tablespaces
You need whole /data directory. Read more about PHYSICAL BACKUP.
Edit:
So, if the whole /data directory is available to you, you can restore the database on another server. There's one thing you should remember: the destination Postgres cluster must be at the same version, e.g. 9.4.1. When the first and second numbers match (e.g. 9.2.10 and 9.2.16) this should also work most of the time. Keeping that in mind, you just need to replace the /data/ directory on the destination server with your source /data directory (the destination server must be stopped during that operation).
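A minimal sketch of that replacement on a Linux destination server (paths, version, and service name are placeholders; on Windows the steps are the same, using the service manager and the corresponding data directory):
# Stop the destination cluster before touching its files.
sudo systemctl stop postgresql
# Move the existing data directory aside and drop in the rescued one.
sudo mv /var/lib/postgresql/9.4/main /var/lib/postgresql/9.4/main.old
sudo cp -a /path/to/rescued/data /var/lib/postgresql/9.4/main
sudo chown -R postgres:postgres /var/lib/postgresql/9.4/main
# Start it again and watch the logs for crash-recovery messages.
sudo systemctl start postgresql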