Rescue data from mongodb instance - mongodb

Recently I've had some problems with my old hard drive and Windows, so I switched over to a new one. I can still access all the files on the old hard drive, and I still have an installation of MongoDB on there with some pretty important data (I should have done a backup at some point).
What would be the smartest way to get this data and transfer it to my new instance? Just copying the data files over? This "old" instance is, by the way, not running, and it's not possible for me to start it again.

You could get the "old" instance running again by running mongod on the new system and pointing --dbpath at the data folder on the old hard drive.
It looks like copying the files over is a valid option though.
See https://docs.mongodb.com/manual/core/backups/#back-up-with-cp-or-rsync
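If you go the "start it again" route, a rough sketch would be to point a temporary mongod at the old files and dump/restore them into the new instance (the drive letter, ports and paths below are placeholders for your actual setup):
mongod --dbpath "E:\data\db" --port 27018
mongodump --port 27018 --out C:\mongo-rescue
mongorestore --port 27017 C:\mongo-rescue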

Related

Docker - under Windows 10 Pro - Need to map volumes and have them work, not quietly fail

I ran various containers on two different Windows 10 Pro machines and thought that I had the data drives mapped correctly, but now I'm finding out that it isn't writing the data there at all. One example was MongoDB, where I mapped /mongodb/database:/data/db. I upgraded Docker, and when it restarted MongoDB... POOF! No data. I thought that was weird, looked in /mongodb/database, and the directory is empty. Thankfully, the app is still in the development phase, and it's not critical that the data was lost...
The line from the docker-compose file:
volumes:
- /mongodb/database:/data/db
Different machine:
I installed Gogs/gogs image, mapping the data:
docker run --name=Gogs-Git -p 10022:22 -p 10080:3000 -v /var/gogs:/Docker/Gogs-GitServer/Data gogs/gogs
It seemed to work perfectly, so I was thinking everything was fine, and I pushed a repo up to it... and today I looked at \Docker\Gogs-gitserver\data and there are no files... so where did it write the data?
I also installed TeamCity, mapping that data... nope, it has no logs, no data...
This feature seems to just not work at all. I found a reference from 2016 saying I need to look at the 'shared' tab (below General) and check C: to be shared, but well, no, that isn't a tab, so it isn't that.
There is no way someone would write a system that just quietly wrote the data some other place, or didn't bother actually mapping it without giving an error - that would be nuts.
So, there must be some other explanation... One of the machines has Hyper-V enabled in the BIOS, the other one doesn't even support it as far as I know.
I think some of the images are Linux, and some are Windows (TeamCity I'm pretty sure is)
OK, this is interesting... If I look at the volumes, and enter one that is in use, I get this:
The target looks about like the right path, but I'm not sure about the /backup and the /data on the last two lines; if these are supposed to be directories under that, they don't exist. But if I click on the Data tab, I can see the data: it is in Docker, hidden and not shared, in spite of there being a 'target' that points at the right directory... how do I get it to start writing this data correctly to that folder?
I've not confirmed this yet with the above configuration, but I found that for other containers I needed to specify the path as 'c:/data/MongoDb/Database'. When I created the container using that as the path, it worked, and I have data there now. I just need to go back and fix all these VMs so their data is mapped correctly...
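In docker-compose terms that would be something like the following (untested sketch; the host path is the example above and will differ on your machine):
volumes:
- c:/data/MongoDb/Database:/data/db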

Default UNIX permissions of MongoDB files on the hard drive

I noticed that the files in the data/ directory, which hosts the databases and collections, have the r (read) permission for others.
So basically, anyone can read the data! Isn't that strange, or is it something I'm missing?
I found no way to change this behavior in the MongoDB configuration (Ubuntu 18.04). When you search for MongoDB file permissions, you will only find threads about user permissions inside the database.
Thank you!
I'm going to assume you're using WiredTiger, the default storage engine for MongoDB. Either way, the same concept applies.
You'll see that the .wt files (the ones you're talking about), although readable by permission, are not very readable to the eye. Take a look for yourself with less <example>.wt.
They're stored in a specific format, with compression and, in some configurations, encryption. Realistically, they shouldn't be retrievable from outside your server, and the users on the server should be trusted or given limited access to the locations of these files.
In short, if you apply the proper policies, and keep your actual database and server secure, then this is normal and expected. I hope this makes sense.
When you launch mongod you need to specify a path to the data directory, and this directory must already exist.
You can set the permissions on this directory to deny world-read access by running:
chmod o-rwx /path/to/data/dir
Normally this would be done prior to the first start of mongod.
Once this is done, none of the files in the data directory will be world-readable regardless of their individual permissions.
MongoDB does not need to have a provision to do this because it never creates the data directory.
A different way of accomplishing a similar end result is to use umask, but changing the permissions on the data directory is generally more reliable.
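As an illustration only of the umask alternative (assuming mongod is launched from a shell or wrapper script you control; the path is a placeholder):
umask 027    # files created from here on get no permissions for others
mongod --dbpath /path/to/data/dir
If mongod is started by systemd instead, the equivalent would be a UMask= setting in the service unit.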

Why is gsutil rsync re-downloading all our files?

We've been using gsutil -m rsync -r to keep dev and deploy boxes in sync with a GCS bucket for nearly 2 years without any problem. There are about 85k objects in the bucket.
Until recently, this worked perfectly: we'd run a deploy-box -> GCS rsync every 15 mins or so, to keep all newly uploaded resources backed up, and then a GCS -> dev box rsync whenever we wanted to refresh the local dev data (running on OSX El Capitan).
Within the last couple of months, though, the GCS->dev rsync has started to bloat, downloading more and more images.
Initially I just thought "great, we're getting more resources uploaded", but it's been growing way faster than the data, until today when it seems to be downloading the whole 85k images.
I've double-checked that I'm in the right place, the command is correct, the paths are correct, etc. Even though the gsutil output is scrolling by with reams and reams of "Copying..." and "Downloading..." messages, making good parallel use of our 100 Mbps connection, when I go to another terminal and run find . -type f | wc -l on the destination directory every 10 seconds, it shows that barely 2 or 3 new files are being added a minute. I look at the modification times on files that gsutil says it's downloading right now, and the large majority are old; plenty haven't changed in a year or more. Meaning: it's downloading all the data, using tons of time and bandwidth, all for the sake of a few hundred files.
Has something changed in recent OSX gsutil versions? Is there possibly a bug? How would I even start to go about tracking this down? Or reporting it? The newsgroups gsutil-discuss and gs-discussion have been archived, and the talk in gce-discussion is all about using gsutil from GCE instances.
Thanks!
I had a similar issue where the same files were synced over and over. I don't have that many files, so you might need to check the performance impact, but I decided to use the -c option to force using checksums instead of mtime, which was being modified locally by my build process.
I think (and hope) the documentation is slightly wrong in stating that it will "compare checksums for files if the size of source and destination as well as mtime match", as it seems to use checksums even if the mtime does not match.
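For reference, the full command with the checksum option looks something like this (the bucket name and local path are placeholders):
gsutil -m rsync -c -r gs://my-bucket /path/to/local/dir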
gsutil 4.20 (released 2016-07-20) modified the change detection algorithm for rsync. Instead of comparing only the size of the local file with its cloud counterpart, it now compares both the size and file modification time of local files. The file modification time is stored in the custom user metadata for the file when it is uploaded with rsync. If that doesn't exist the object creation time is used.
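If you want to check whether a given object actually carries that modification-time metadata, you can inspect it with something like the following (bucket and object names are placeholders); objects uploaded by rsync should show it among the custom metadata:
gsutil ls -L gs://my-bucket/path/to/object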

MongoDB replica set in Azure "Waiting for role to start... Calling OnRoleStart()"

I have a problem trying to implement a MongoDB replica set as a worker role instance in Windows Azure. In the Windows Azure portal, one of the instances is shown as busy with the status:
Waiting for role to start... Calling OnRoleStart()
I have checked all the settings and everything seems to be ok, what could the problem be?
Denis Markelov's blog post helped me solve this problem. The solution is mainly his, however I had to take an extra step to get it to work and thought others might find it useful.
Solution from blog:
Windows Azure reuses virtual machines for roles, so after a fresh deployment you can find files on a hard drive that were created during previous sessions. If MongoDB was terminated improperly, there might be a lock file (a "persisted mutex" analogue) because of which MongoDB refuses to start. It is located on the drive with the label "WindowsAzureDrive" (say it is F:), at the path:
F:\data\mongod.lock
In the case of production use this situation might require recovery procedures, but if you are just in the process of initial setup, it is safe to remove this file, letting MongoDB start again.
I was having this problem and did as suggested; however, I was still having the same problem. So I took a look at the log file at
C:\Resources\Directory\.MongoDB.WindowsAzure.MongoDBRole.MongodLogDir\mongod.txt
And saw that another file was also giving an error. In order to fix the problem, you also have to delete the file local.ns in the same directory as mongod.lock.
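For anyone who wants the exact commands, this is roughly what I ran from an elevated command prompt on the instance, with mongod stopped (F: is whatever drive carries the "WindowsAzureDrive" label on your VM):
del F:\data\mongod.lock
del F:\data\local.ns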

Restore full external ESENT backup

I've written the code that creates full backups of my ESENT database, using the JetBeginExternalBackup API.
Following the MSDN guidelines, I backed up every file returned by JetGetAttachInfo and JetGetLogInfo.
I've made the backup, erased the old database, and copied the backup data to the database folder.
The DB engine was unable to start; the JetInit error code was JET_errMissingLogFile.
I've checked the backup, it only contains the database file, and "<inst>XXXXX.log" log files. It lacks the current log file (I'm using circular logging, BTW).
Is there any way to restore such backup?
I don't want to use the JetExternalRestore API because it's too complex: I don't need to restore to another location, I don't understand why there are 3 input folders rather than 2, and I don't know what values to supply in the genLow and genHigh arguments.
I do need external backups: the ESENT database is used by ASP.NET on a remote server, and I'm backing it up over the Internet.
Or, maybe there's a way to retrieve the name of the current log file, and I should just add it to the backup?
Thanks in advance!
P.S. I have no permission to spawn processes on my web server, so using eseutil.exe is not an option.
Unpack all backed up files to a single folder.
Take the name of your main database file and change its extension to .pat. Create a zero-length file with that name, e.g. database.pat.
After this simple step, call the JetRestoreInstance API; it will restore the backup from that folder.
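As a concrete illustration of the preparation steps on Windows (the folder and file names below are just examples; use your actual database name):
mkdir C:\restore
copy <all backed-up files> C:\restore
type nul > C:\restore\database.pat
Then call JetRestoreInstance from your code with C:\restore as the backup folder.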