Automatically initializing a replica set for MongoDB in Docker fails

I have a NodeJS Express App that depends on MongoDB change streams. For them to be available, MongoDB has to be configured to run as a replica set (even if there is only one node in that set).
I'm working on Windows 10 pro.
I'm trying to dockerize this App, basing the MongoDB container off the official mongo:5 image.
For this to work, I want an automated way of initializing the DB as a replica set. The tutorials I've found rely either on exec'ing into the container and running rs.initiate() from mongosh (or similar manual approaches), or on hacks like wait-for-it.sh.
I feel there must be a better solution, based somehow on the paragraph "Initializing a fresh instance", from the docs.
It describes that
When a container is started for the first time it will execute files with extensions .sh and .js that are found in /docker-entrypoint-initdb.d.
When exactly in the container lifecycle does that happen? After the container is initialized? Or after the DB is ready? Because this seems to be the perfect place for this initialization logic, which runs flawlessly when executed manually, from within the container.
However, placing
// initReplSet.js
print('Script running');
config={"_id":"rs0", "members":[{"_id":0,"host":"app-db:27017"}]};
print(JSON.stringify(rs.initiate(config)));
print('Script end');
fails with the error {"ok":0,"errmsg":"No host described in new configuration with {version: 1, term: 0} for replica set rs0 maps to this node","code":93,"codeName":"InvalidReplicaSetConfig"}, yet the database is available under the hostname app-db from other containers. This makes me feel that this code runs too early, before all other initialization logic (networking) is done.
Another approach is to place a bash script that executes code via mongosh. Here's what I've tried:
#!/bin/bash
mongosh "mongodb://app-db:27017/app_db" "initiateReplSet"
where initiateReplSet is
config={"_id":"rs0", "members":[{"_id":0,"host":"app-db:27017"}]}
rs.initiate(config)
exit
but this crashes the container with the error
/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-initdb.d/initiateReplSetWrapper.sh
{"t":{"$date":"2022-02-15T11:31:23.353+00:00"},"s":"I", "c":"-", "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"NotYetInitialized: Cannot use non-local read concern until replica set is finished initializing.","nextWakeupMillis":600}}
Warning: Could not access file: EACCES: permission denied, mkdir '/home/mongodb'
Current Mongosh Log ID: 620b8f0b04b7ad69b446768d
Connecting to: mongodb://app-db:27017/app_db?directConnection=true&appName=mongosh+1.1.9
Only the first and the last three lines seem to really belong to the bash script; the second line is repeated constantly.
I'm not sure whether the error originates from the permission-denied issue, or whether the DB really can't be accessed. However, specifying
RUN mkdir -p /home/mongodb/.mongodb
RUN chown -R 777 /home/mongodb
in the Dockerfile did not improve the situation (same error nevertheless).
Could you please explain either why this approach can not work, or how to make it work? Is there another, better, automated way to initialize the replica set? Could the docker image be improved to allow such initialization logic?

I just made it work with a wild experiment: I simply left out the config in my call to rs.initiate() in the JS script. The script then runs successfully and change streams become available to my NodeJS backend, presumably because rs.initiate() without a config lets mongod derive a single-member configuration with its own hostname, avoiding the host-mapping mismatch from before.
I will post everything that's needed to run a MongoDB docker with change streams enabled:
# Dockerfile
FROM mongo
COPY initiateReplSet.js /docker-entrypoint-initdb.d/
CMD ["--replSet", "rs0"]
// initiateReplSet.js
rs.initiate()
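For reference, building and running it could look like the following; the image tag, network name, and container name (app-db, matching the hostname used earlier) are my own assumptions:
# Build the image; the container name must match the hostname
# other containers use to reach the DB (here: app-db)
docker build -t app-db .
docker network create app-net
docker run -d --name app-db --network app-net app-db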

Related

Docker Run MongoDB replica set... how is it working?

So I'm creating some software that makes heavy use of mongo transactions.
So far, I've only tried it with Testcontainers mongo and pure unit testing.
Now I'm moving to test it manually, and I get an error that says something like: Transaction numbers are only allowed on a replica set ..., yet that error doesn't happen during unit tests.
I read that this error happens because transactions are only possible on a replica set, but then, how is Testcontainers working? I checked docker ps while the tests were running and only one mongo docker container is up.
I checked the args passed by Testcontainers, and it turned out they pass --replSet docker-rs. So I did the same, but then I get this error: NotYetInitialized: Cannot use non-local read concern until replica set is finished initializing.
I'm scratching my head, wondering how Testcontainers runs ONE mongo docker container that behaves like a replica set.
Assuming you're using the Testcontainers MongoDB module, the missing part in your manual setup is most probably the replica set initiation.
This is mentioned in the testcontainers module docs as:
Initialize a single replica set via executing a proper command
Also feel free to take a look at the module sources itself to dig into implementation details. For example, initReplicaSet() part.
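If you want to mimic by hand what initReplicaSet() does, a rough shell sketch would be the following; the container name and the fixed sleep are my assumptions:
# Start a single mongod with the replica set flag Testcontainers passes
docker run -d --name mongo-rs mongo:5 --replSet docker-rs
sleep 3   # crude wait until mongod accepts connections; poll in real scripts
# Initiate the single-member replica set, which clears the NotYetInitialized error
docker exec mongo-rs mongosh --quiet --eval "rs.initiate()"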

Linking the mongo-data volume to the /data/db folder in MongoDB Docker

I accidentally deleted the docker volume mongo-data:/data/db. I have a copy of that folder. The problem is that when I run docker-compose up, the mongodb container doesn't start and gives the error mongo_1 exited with code 14. Below are more details of the error and the mongo-data folder. Can someone help me, please?
in docker-compose.yml
volumes:
- ./mongo-data:/data/db
Restore from backup files
A step-by-step process to repair the corrupted files from a failed mongodb in a docker container:
! Before you start, make a copy of the files. !
Make sure you know which version of the image was running in the container
Spawn a new container to run the repair process as follows:
docker run -it -v <data folder>:/data/db <image-name>:<image-version> mongod --repair
Once the files are repaired, you can start the containers from docker-compose again.
If the repair fails, it usually means the files are corrupted beyond repair. There is still a chance to recover the data by exporting it, as described here.
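If it comes to that, a rough sketch of the export route could look like this; the angle-bracket names are placeholders, as above, and the container names are my own:
# Start a throwaway container on the repaired files and dump what is readable
docker run -d --name rescue -v <data folder>:/data/db <image-name>:<image-version>
docker exec rescue mongodump --out /data/db/dump   # dump lands in the mounted folder
docker rm -f rescue
# Restore the dump into a fresh, empty instance
docker run -d --name fresh -v <new data folder>:/data/db -v <data folder>/dump:/dump <image-name>:<image-version>
docker exec fresh mongorestore /dump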
How to secure proper backup files
The database is constantly working with its files, so the files are constantly changing on disk. In addition, the database keeps some of the changes in internal memory buffers before they are flushed to the filesystem. Although database engines do a very good job of ensuring that the database can recover from an abrupt failure, by using a two-stage commit process (first update the transaction log, then the data file), copying the files while the database is running can produce a corruption that prevents recovery.
The reason for such corruption is that the copy process is not aware of the database's write progress, which creates a race condition. In very simple terms: while the database is in the middle of writing, the copy process creates a copy of the file(s) that is half-updated, hence corrupted.
When the database writer is in the middle of writing to the files, we call them hot files. Hot files is a term from the OS perspective; MongoDB also uses the term hot backup, which is a term from the MongoDB perspective. A hot backup means the backup was taken while the database was running.
To take a proper snapshot (ensuring the files are cold) you need to follow the procedure explained here. In short, the db.fsyncLock() command issued during this process tells the database engine to flush all buffers and stop writing to the files. This makes the files cold, while the database itself remains hot, hence the difference between the terms hot files and hot backup. Once the copy is done, the database is told to resume writing to the filesystem by issuing db.fsyncUnlock().
Note that the process is more complex and can change between database versions. I'm giving a simplification here, to illustrate the problems with file snapshots. To secure a proper and consistent backup, always follow the documented procedure for the database version you use.
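A bare-bones illustration of that lock-copy-unlock sequence, run on the database host; the data path and backup target are my assumptions:
mongosh --eval "db.fsyncLock()"      # flush all buffers and stop writes: files go cold
cp -a /data/db /backup/db-snapshot   # copy while nothing is being written
mongosh --eval "db.fsyncUnlock()"    # let the database write again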
Suggested backup method
The preferred backup method should always be the data dump, since it ensures you can restore even across upgraded or downgraded database engines. MongoDB provides a very useful tool called mongodump that creates database backups by dumping the data instead of copying the files.
For more details on how to use the backup tools, as well as the other backup methods, read the MongoDB Backup Methods chapter of the MongoDB documentation.
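For illustration, a minimal dump-and-restore round trip; the URI and output path are my assumptions:
mongodump --uri="mongodb://localhost:27017" --out=/backup/dump    # dump all databases
mongorestore --uri="mongodb://localhost:27017" /backup/dump       # restore from the dump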

Best practices for initializing a PostgreSQL docker container with some data

I've created a docker image with PostgreSQL running inside and exposing port 5432.
This image doesn't contain any database; the container is an empty PostgreSQL database server.
I'd like, in (or during) the docker run command, to:
attach a db file
create a db via sql query execution
restore a db from a dump
I don't want to keep the data after the container is closed. It's just a temporary development server.
I suspect it's possible to keep my docker run command string quite short/simple.
It is probably possible to mount some external folder with the db/sql/dump in the run command and then create the db during container initialization.
What are the best/recommended ways and best practices to accomplish this task? Perhaps somebody can point me to corresponding docker examples.
This is a good question and probably something other folks asked themselves more than once.
According to the docker guide you would not do this in a RUN command. Instead you would create an ENTRYPOINT or CMD in your Dockerfile that calls a custom shell script instead of calling the postgres process directly. In this scenario the DB would be created in a "real" filesystem, but then cleaned up during shutdown of the container.
How would this work? The container starts, calls the ENTRYPOINT or CMD as usual, and runs the init script to fill the DB. Then, the moment the container is stopped, the same script is notified with a signal and manually drops the database content.
CMD ["cleanAndRun.sh"]
A sketch of the script cleanAndRun.sh, taken from the Docker documentation and modified for this purpose. Remember it is a sketch only and needs modification:
#!/bin/sh
# The command run by the trap must also stop the DB; the dropdb call below is
# not enough on its own, it just demonstrates how to run anything in the
# stop-container scenario.
trap "dropdb <params>" HUP INT QUIT TERM
# init your DB -every- time the container starts
<init script to clean and import dump>
# Start postgres in the background so the shell stays free to receive signals;
# wait returns when postgres exits or when a trapped signal arrives.
postgres &
wait $!
echo "exited $0"

How is sync between replica set members in MongoDB achieved? Automatic or manual triggering needed?

Going through the MongoDB documentation, I missed clarity regarding the above question. Please provide the commands, if any, that are used to manually trigger a sync among replica set members.
Replica set members are always synced automatically, but if you need to do a manual re-sync you have a couple of options, as explained here: https://docs.mongodb.org/manual/tutorial/resync-replica-set-member/
So basically you can stop the member you want to re-sync and empty its data directory. When you restart it, Mongo will automatically start the sync process.
Stop the member’s mongod instance. To ensure a clean shutdown, use the db.shutdownServer() method from the mongo shell or on Linux systems, the mongod --shutdown option.
Delete all data and sub-directories from the member’s data directory. By removing the data dbPath, MongoDB will perform a complete resync. Consider making a backup first.
Another way MongoDB suggests is to copy the data from another member; once that is done, MongoDB will sync the rest of the data from the master. Similar to the first solution, but faster, because you already have some data and don't need to start from scratch.
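In shell terms, the empty-data-directory variant looks roughly like this; the service name and dbPath are assumptions for a typical Linux install:
mongosh admin --eval "db.shutdownServer()"   # clean shutdown of the member
rm -rf /var/lib/mongodb/*                    # empty the dbPath (make a backup first!)
sudo service mongod start                    # on restart the member runs a full initial sync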

MongoDB does not see database or collections after migrating from localhost to EBS volume

full disclosure: I am a complete n00b to mongodb and am just getting my feet wet using mongo on AWS (but I have two decades working in IT, so not a total n00b :P)
I set up an EBS volume and installed mongo on an EC2 instance.
My problem is that I provisioned too small an EBS volume initially.
When I realized this I:
created a new larger EBS volume
mounted it on the server
stopped mongo ( $ sudo service mongod stop)
copied all my /data/db files into the new volume
updated conf files and fstab (dbpath, logpath, pidfilepath and mount point for new volume respectively)
restarted mongod
When I execute: $ sudo service mongod start
- everything runs fine.
- I can futz about in the admin and local databases.
However, when I run the mongo shell command: > show databases
- I only see the admin and local.
- the database I copied into the new volume (named encompass) is not listed.
I still have a working local copy of the database so my data is not lost, just not sure how best to move mongo data around other than:
A) start all over importing the data to the db on the AWS server (not what I would like since it is already loaded in my local db)
B) copy the local db to the new EBS volume again (also not preferred, but better than importing all the data from scratch again!).
NOTE: originally I secure copied the data into the EBS volume with this command:
$ scp -r -i <key.pem> <local path> ec2-user@<host>:<remote path>
then when I copied between volumes I used a vanilla cp command.
Did I miss something here?
The best I could find on SO and the web was this process (How to scale MongoDB?), but perhaps I missed a switch in a command or a nuance to the process that rendered my database files inert/useless?
Any idea how I can get mongo to see my other database files and collections?
Or did I make an irreversible error somewhere along the way?
Thanks for any help!!
Are you sure your conf file is being loaded? You can, as a test, run mongod and point it directly at your db path, i.e.:
mongod --dbpath c:\mongo\data\db (unix syntax may vary a bit, this is windows)
run this from the command line and see what, if anything, mongo complains about.
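To verify which configuration the running mongod actually loaded, something like this helps; the config path is the typical Linux default:
ps -ef | grep '[m]ongod'          # shows the --config/--dbpath flags mongod was started with
grep dbPath /etc/mongod.conf      # confirm the configured data directory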
Database files are finicky and easy to damage. Before copying from one database to another, you should probably seed the target database; a few dummy entries will tell you the database is working.