Spoofing free space available in Docker, or tricking Postgres and RabbitMQ - postgresql

I'm using Google Cloud Run to host some solutions. When the containers start, programs can write to disk, and the data persists until the container stops. However, from a system point of view, all partitions of the container always report zero free space. I confirmed this in a few ways:
Running df from start.sh shows zero free space when the container starts.
Deleting a large file and then running df from start.sh still shows zero free space.
It is possible to write to disk via start.sh, PHP scripts, etc., so the system DOES have free space to write to, yet df still reports zero free space.
(All of the above applies once the container is deployed to Cloud Run. Running the same container manually via docker from Cloud Shell and executing df reports free space.)
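The checks above can be reproduced with a short script. This is only a minimal sketch, not the actual start.sh: the function names avail_kb, can_write and check_dir are made up for illustration, and it assumes a POSIX df and awk are present in the image.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the original start.sh.

# Print the space df reports as available (in KiB) for the filesystem holding a path.
# -P selects the POSIX output format, where "Available" is the fourth column
# of the second line; -k forces 1024-byte units.
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return success if a small file can actually be created and removed in a directory.
can_write() {
    probe="$1/.space_probe.$$"
    if ( echo probe > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# Compare what df claims with what a real write does.
check_dir() {
    if can_write "$1"; then
        echo "$1: df reports $(avail_kb "$1") KiB available, write succeeded"
    else
        echo "$1: df reports $(avail_kb "$1") KiB available, write failed"
    fi
}
```

Calling check_dir /tmp at container start would print both results on one line, which makes the mismatch described above (zero reported, write succeeds) easy to spot in the logs.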
The problem is that certain applications perform disk space checks when they start, and they fail to load in Google Cloud Run. For example, MariaDB uses df in its init script, so commenting out these lines makes it possible to add a static yet functional MariaDB instance to a Cloud Run container.
MariaDB made it easy. Now, I'm trying to do the same thing with PostgreSQL and RabbitMQ, but I'm having trouble figuring out how to override their disk space checks. Here are the two options I am considering:
Keep digging through the source of PostgreSQL and RabbitMQ until I find the disk space check and override it. I don't speak Erlang, so this is a pain, and I would have to do it for every application with this issue.
Programs are probably using coreutils to determine disk size. I could edit the coreutils source and rebuild it as part of my Dockerfile routine so the system always reports free space (this could have unintended side effects).
Is anyone familiar with the source of Postgres or RabbitMQ, or does anyone have a system-wide solution that I could implement to "spoof" the free space available?
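A lighter-weight variant of the second option would be to leave coreutils alone and put a wrapper named df earlier in PATH. The sketch below is an assumption-heavy illustration, not a tested fix: the function name spoof_df_output, the FAKE_AVAIL_KB variable and the /usr/local/bin install path are all hypothetical. It can only affect programs that shell out to df (as the MariaDB init script does); anything that calls statvfs() directly never sees it.

```shell
#!/bin/sh
# Hypothetical filter: rewrites `df -P` style output read from stdin so that
# a zero in the "Available" column (field 4) is replaced by a fake value.
# FAKE_AVAIL_KB is an illustrative variable name; the default is 1 GiB in KiB.
spoof_df_output() {
    awk -v fake="${FAKE_AVAIL_KB:-1048576}" '
        NR == 1 { print; next }      # pass the header line through unchanged
        $4 == 0 { $4 = fake }        # replace a zero "Available" value
        { print }
    '
}

# A wrapper script installed as, say, /usr/local/bin/df (ahead of /bin in PATH)
# could then consist of the function above followed by:
#   /bin/df -P "$@" | spoof_df_output
```

Note that forcing -P changes the output layout for callers that asked for another format, and that rewritten lines come out single-space separated, so this is only plausible for scripts that parse df output by column.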
EDIT: Here are the error messages given by RabbitMQ and PostgreSQL
RabbitMQ:
{error,{cannot_log_to_file,"/var/log/rabbitmq/rabbit@localhost.log",{error,einval}}}
Postgres:
Error: /usr/lib/postgresql/10/bin/pg_ctl /usr/lib/postgresql/10/bin/pg_ctl start -D /var/lib/postgresql/10/main -l /var/log/postgresql/postgresql-10-main.log -s -o -c config_file="/etc/postgresql/10/main/postgresql.conf" exited with status 1:

Related

delete postgres logs when database is running - Get back disk space

Postgres log files are saturating the disk, and I intend to delete them all after backing them up. Should I restart the Postgres service, or can Postgres see the new free space after deletion without a restart? If not, is there a command that forces Postgres to see the new free space while it is running?
You can delete PostgreSQL log files at any time. Note, however, that deleting (unlinking) a file does not actually delete it as long as a process still holds it open. So you have to notify PostgreSQL with
pg_ctl logrotate
just like the documentation describes.
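As a rough illustration of that sequence, the sketch below removes old log files and then asks the server to switch to a new log file. The function names, the example paths and the 7-day cutoff are assumptions for illustration only. pg_ctl logrotate is available from PostgreSQL 12 onward; on older servers, running SELECT pg_rotate_logfile(); from psql should play the same role when the logging collector is in use.

```shell
#!/bin/sh
# Hypothetical helpers; names, paths and the retention period are illustrative.

# Delete *.log files older than a number of days from a log directory.
prune_old_logs() {
    log_dir=$1
    days=$2
    find "$log_dir" -type f -name '*.log' -mtime +"$days" -exec rm -f {} +
}

# Ask the running server to switch to a new log file so that it stops
# holding the old (possibly already unlinked) one open.
rotate_server_log() {
    pg_ctl logrotate -D "$1"
}

# Example (paths are assumptions):
#   prune_old_logs /var/lib/postgresql/data/log 7
#   rotate_server_log /var/lib/postgresql/data
```

Until the server lets go of a deleted file, df will keep counting its blocks as used, which is why the rotation step comes after the deletion.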

Automatically initialize replica set for mongoDB in docker fails

I have a NodeJS Express App that depends on MongoDB change streams. For them to be available, MongoDB has to be configured to run as a replica set (even if there is only one node in that set).
I'm working on Windows 10 pro.
I'm trying to dockerize this App, basing the MongoDB container off the official mongo:5 image.
For this to work, I want an automated way of initializing the DB as a replica set. The tutorials I've found either rely on exec-ing into the container and running rs.initiate() from mongosh (or similar approaches), which is manual work I want to avoid, or use hacks like wait-for-it.sh.
I feel there must be a better solution, based somehow on the paragraph "Initializing a fresh instance", from the docs.
It describes that
When a container is started for the first time it will execute files with extensions .sh and .js that are found in /docker-entrypoint-initdb.d.
When exactly in the container lifecycle does that happen? After the container is initialized? Or after the DB is ready? Because this seems to be the perfect place for this initialization logic, which runs flawlessly when executed manually, from within the container.
However, placing the following script in that directory
// initReplSet.js
print('Script running');
config={"_id":"rs0", "members":[{"_id":0,"host":"app-db:27017"}]};
print(JSON.stringify(rs.initiate(config)));
print('Script end');
fails with the error {"ok":0,"errmsg":"No host described in new configuration with {version: 1, term: 0} for replica set rs0 maps to this node","code":93,"codeName":"InvalidReplicaSetConfig"}, yet the database is available under the hostname app-db from other containers. This makes me feel that this code runs too early, before all other initialization logic (networking) is done.
Another approach is to place a bash script that executes code via mongosh. Here's what I've tried:
#!/bin/bash
mongosh "mongodb://app-db:27017/app_db" "initiateReplSet"
where initiateReplSet is
config={"_id":"rs0", "members":[{"_id":0,"host":"app-db:27017"}]}
rs.initiate(config)
exit
but this crashes the container with the error
/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/initiateReplSetWrapper.sh
{"t":{"$date":"2022-02-15T11:31:23.353+00:00"},"s":"I", "c":"-", "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"NotYetInitialized: Cannot use non-local read concern until replica set is finished initializing.","nextWakeupMillis":600}}
Warning: Could not access file: EACCES: permission denied, mkdir '/home/mongodb'
Current Mongosh Log ID: 620b8f0b04b7ad69b446768d
Connecting to: mongodb://app-db:27017/app_db?directConnection=true&appName=mongosh+1.1.9
Only the first and the last three lines seem to really belong to the bash script; the second line is repeated constantly.
I'm not sure whether the error originates from the permission denied issue, or whether the DB really can't be accessed. However, specifying
RUN mkdir -p /home/mongodb/.mongodb
RUN chown -R 777 /home/mongodb
in the Dockerfile did not improve the situation (same error nevertheless).
Could you please explain either why this approach can not work, or how to make it work? Is there another, better, automated way to initialize the replica set? Could the docker image be improved to allow such initialization logic?
I just made it work with a wild experiment: I simply left out the config in my call to rs.initiate() in the JS script. For some reason, the script then runs successfully and change streams become available to my NodeJS backend.
I will post everything that's needed to run a MongoDB docker with change streams enabled:
# Dockerfile
From mongo
WORKDIR .
COPY initiateReplSet.js ./docker-entrypoint-initdb.d/
CMD ["-replSet", "rs0"]
// initiateReplSet.js
rs.initiate()

postgresql docker replications

I'm relatively new to Docker, and I'm wondering whether it is possible to create two master-slave Postgres containers. I can do it on virtual machines, but I'm a bit confused about how to do it in Docker.
If it's possible, can someone please point me in the right direction?
I have tried docker exec -it, but the files are all missing and I cannot edit the files inside.
Since you are new to Docker and wish to get up and running quickly, you can try Bitnami's images. They let you set a POSTGRESQL_REPLICATION_MODE environment variable to designate a container as a standby/slave.
Just save their docker-compose-replication.yml as docker-compose.yml in the directory of your choice and run docker-compose up -d; it will pull the necessary image and set everything up for you quickly.
However, I would highly encourage you to tinker on your own to learn how Docker works. Specifically, you could just use the community Postgres image, and then write your own entrypoint.sh file (along with any additional helper files as necessary), and customize the setup to your requirements.
Disclosure: I work for EnterpriseDB (EDB)

Persisting a single, static, large Postgres database beyond removal of the db cluster?

I have an application which, for local development, has multiple Docker containers (organized under Docker Compose). One of those containers is a Postgres 10 instance, based on the official postgres:10 image. That instance has its data directory mounted as a Docker volume, which persists data across container runs. All fine so far.
As part of testing the creation and initialization of the postgres cluster, it is frequently the case that I need to remove the Docker volume that holds the data. (The official postgres image runs cluster init if-and-only-if the data directory is found to be empty at container start.) This is also fine.
However! I now have a situation where in order to test and use a third party Postgres extension, I need to load around 6GB of (entirely static) geocoding lookup data into a database on the cluster, from Postgres backup dump files. It's certainly possible to load the data from a local mount point at container start, and the resulting (very large) tables would persist across container restarts in the volume that holds the entire cluster.
Unfortunately, they won't survive the removal of the docker volume which, again, needs to happen with some frequency. I am looking for a way to speed up or avoid the rebuilding of the single database which holds the geocoding data.
Approaches I have been or currently am considering:
Using a separate Docker volume on the same container to create persistent storage for a separate Postgres tablespace that holds only the geocoder database. This appears to be unworkable because while I can definitely set it up, the official PG docs say that tablespaces and clusters are inextricably linked such that the loss of the rest of the cluster would render the additional tablespace unusable. I would love to be wrong about this, since it seems like the simplest solution.
Creating an entirely separate container running Postgres, which mounts a volume to hold a separate cluster containing only the geocoding data. Presumably I would then need to do something kludgy with foreign data wrappers (or some more arcane postgres admin trickery that I don't know of at this point) to make the data seamlessly accessible from the application code.
So, my question: Does anyone know of a way to persist a single database from a dockerized Postgres cluster, without resorting to a dump and reload strategy?
If you want to speed things up, you could convert your database dump to a data directory: import your dump into a clean postgres container, stop it, create a tarball of the data directory, and upload it somewhere. Then, when you need to create a new postgres container, use an init script to stop the database, download and unpack your tarball into the data directory, and start the database again. This way you skip the whole restore process.
Note: The data tarball has to match the postgres major version so that the container has no problem starting from it.
If you want to speed things up even more, create a custom postgres image with the tarball and init script bundled, so that every time it starts it wipes the empty cluster and copies in your own.
You could even change the entrypoint to use your custom script to load the database data and then call docker-entrypoint.sh, so there is no need to delete a possibly empty cluster.
This will only work if you are OK with replacing the whole cluster every time you want to run your tests; otherwise you are stuck with importing the database dump.
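The custom-entrypoint variant could look roughly like the sketch below. The function name restore_cluster and the /seed/cluster.tar.gz path are hypothetical, and the tarball is assumed to hold the data directory's contents at its top level. It relies on the behaviour described in the question: the official image only runs cluster init when the data directory is empty, so unpacking first means init is skipped.

```shell
#!/bin/sh
# Hypothetical helper: replace the contents of a data directory with the
# contents of a gzip-compressed tarball. Must run while PostgreSQL is stopped.
restore_cluster() {
    tarball=$1
    datadir=$2
    if [ ! -f "$tarball" ]; then
        echo "tarball not found: $tarball" >&2
        return 1
    fi
    mkdir -p "$datadir" || return 1
    # Remove any existing (for example freshly initialized) cluster files,
    # including dotfiles, without removing the directory itself.
    find "$datadir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
    # Unpack the prepared cluster; the archive is assumed to contain the
    # data directory's contents at its top level (PG_VERSION, base/, ...).
    tar -xzf "$tarball" -C "$datadir"
}

# In a wrapper entrypoint this could be followed by handing control to the
# image's own entrypoint, e.g. (the tarball path is an assumption):
#   restore_cluster /seed/cluster.tar.gz "$PGDATA"
#   exec docker-entrypoint.sh postgres
```

Ownership and permissions of the unpacked files still have to match what the postgres user expects, so it is worth creating the tarball from inside a container of the same image.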

Postgres 9.2 pg_largeobject tablespace

I am currently moving some data around and I am running into an interesting issue.
I have a CentOS server (6.3) up and running with Postgres 9.2 on a server with limited built in disk space; however, I do have a large amount of extremely reliable external network disk space available.
I have set the tablespace to a directory on this storage device for my database and everything seems to be working well, until...
I realized that I have a large amount of BLOB data that needs to be stored in pg_largeobject.
I have been googling how to set the tablespace of pg_largeobject and I did find some results, but they are horribly outdated.
I did find one article that looks promising, but I'm hesitant because the thread also references that things will/should have changed.
I have two questions...
In an ideal world, I would like to move all of postgres (including pg_largeobject) onto this external storage for ease of maintenance. Is this possible?
If not, how can I get pg_largeobject to use my network storage?
As you alluded to, your best bet is to move the entirety of PostgreSQL onto the remote storage, assuming that storage uses a reliable network block device like iSCSI, ATAoE or NBD. I wouldn't recommend running Pg on NFS, and running it on CIFS/SMBFS just won't work.
Just:
Make a backup
Take a note of the output of SHOW data_directory; in psql
Shut PostgreSQL down
Move the data directory (the folder containing pg_xlog, pg_clog, etc) to the remote storage
Adjust the permissions on the parent directories of the datadir's new location so that the postgres user, the postgres group, or the others permissions block has at least execute permission on each parent directory and the server can traverse the tree.
Adjust your system startup scripts to set the new location as the PostgreSQL datadir or symlink the old datadir location (output by SHOW data_directory) to the new location.
Start PostgreSQL
Unfortunately, different systems and packages find the datadir in different ways. Debian/Ubuntu use pg_wrapper, for example.
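The move and symlink steps above can be sketched as a small helper. This is an illustration under stated assumptions rather than a packaged procedure: the function name relocate_datadir and both example paths are hypothetical, absolute paths are expected, and the backup and shutdown steps must already have been done.

```shell
#!/bin/sh
# Hypothetical helper covering the "move" and "symlink" steps. PostgreSQL must
# already be shut down, and a backup taken, before this is called.
relocate_datadir() {
    old=$1
    new=$2
    if [ ! -d "$old" ] || [ -L "$old" ]; then
        echo "not a real directory: $old" >&2
        return 1
    fi
    if [ -e "$new" ]; then
        echo "target already exists: $new" >&2
        return 1
    fi
    mkdir -p "$(dirname "$new")" || return 1
    # mv copies and then deletes when the target is on another filesystem.
    mv "$old" "$new" || return 1
    # Leave a symlink at the old path so existing startup scripts still work.
    ln -s "$new" "$old"
}

# Example (both paths are assumptions):
#   relocate_datadir /var/lib/pgsql/9.2/data /mnt/netstore/pgsql/9.2/data
```

The symlink route avoids touching the startup scripts at all, which sidesteps the per-distribution differences in how the datadir is located.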