I'm having trouble finding out whether this is even a supported operation. I found things suggesting it didn't use to be, but I'm not getting any logs indicating it's unsupported (just confusing logs), and I wasn't able to find anything in MongoDB's docs that says it isn't allowed.
For reference:
OS: CentOS 6.6
Mongod: v3.0.2
Mongo Shell: v3.0.2
Mongodump: v3.0.2
Mongorestore: v3.0.2
Here's the command I'm running to create my dump (I am using auth):
mongodump -u username -p password --authenticationDatabase admin --oplog
Here's the original file structure after a dump.
└── dump
├── oplog.bson
├── admin
│ ├── system.users.bson
│ ├── system.users.metadata.json
│ ├── system.version.bson
│ └── system.version.metadata.json
├── dogs
│ ├── tails.bson
│ └── tails.metadata.json
└── mydata
├── objects.bson
├── objects.metadata.json
├── fs.chunks.bson
├── fs.chunks.metadata.json
├── fs.files.bson
├── fs.files.metadata.json
├── configuration.bson
└── configuration.metadata.json
I've tried a few different variations of restore to get what I want, but they each seem a little off. After reading the following in mongo's docs concerning mongorestore:
--db does not control which BSON files mongorestore restores. You must use the mongorestore path option to limit that restored data.
it seems to me that I should be able to copy oplog.bson into the folder of the particular database I want to restore and then run the following from inside dump/:
mongorestore -u username -p password --authenticationDatabase admin --oplogReplay --db dogs dogs
I found this confusing because it gives these logs:
2015-05-13T22:10:12.694+0000 building a list of collections to restore from dogs dir
2015-05-13T22:10:12.695+0000 reading metadata file from dogs/tails.metadata.json
2015-05-13T22:10:12.695+0000 restoring dogs.oplog from file dogs/oplog.bson
2015-05-13T22:10:12.696+0000 no indexes to restore
2015-05-13T22:10:12.696+0000 finished restoring dogs.oplog
2015-05-13T22:10:12.696+0000 restoring dogs.tails from file dogs/tails.bson
2015-05-13T22:10:12.697+0000 restoring indexes for collection dogs.tails from metadata
2015-05-13T22:10:12.697+0000 finished restoring dogs.tails
2015-05-13T22:10:12.697+0000 replaying oplog
2015-05-13T22:10:12.697+0000 no oplog.bson file in root of the dump directory, skipping oplog application
2015-05-13T22:10:12.697+0000 done
The first part about dogs.oplog makes it seem as if things are working; however, the later message about the oplog confuses me.
No matter which variations of directories and paths I try, I can't seem to get rid of this message in particular:
2015-05-13T22:10:12.697+0000 replaying oplog
2015-05-13T22:10:12.697+0000 no oplog.bson file in root of the dump directory, skipping oplog application
Does this mean my oplog replay isn't happening? Is my point-in-time backup / restore still doing what I expect? I recall seeing some tickets about improving the log messages of mongotools, perhaps this is just poor logging?
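For reference, the unfiltered restore I've been comparing against (run from the directory containing dump/, with oplog.bson left in the dump root) is:
mongorestore -u username -p password --authenticationDatabase admin --oplogReplay dump
That variant doesn't seem to hit the skip message, but it restores every database rather than just dogs.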
Related
The gsutil command supports options (-J, or -j ext) to compress data during transport only, thereby saving network bandwidth and speeding up the copy itself.
Is there an equivalent way to do this when downloading from GCS to local machine? That is, if I have an uncompressed text file at gs://foo/bar/file.json, is there some equivalent to -J that will compress the contents of "file.json" during transport only?
The goal is to speed up a copy from remote to local, and not just for a single file but dozens. I'm already using -m to do parallel copies, but would like to transmit compressed data to reduce network transfer time.
I didn't find anything relevant in the docs, and including -J doesn't appear to do anything during downloads. I've tried the following, but the "ETA" numbers printed by gsutil look identical whether -J is present or absent:
gsutil cp -J gs://foo/bar/file.json .
This feature is not yet available.
As an alternative, you will need to implement your own solution for compression, whether with App Engine, Cloud Functions or Cloud Run. Your application would compress your files while they sit in Cloud Storage.
The ideal approach would be to use -m along with compressed files, i.e. making parallel copies of compressed archives. Consider the following structures: if [1] is how you are organized now, you are downloading each file individually; if you look at [2], you would only download the compressed files (a download sketch follows [2] below).
[1]
Bucket Foo
├───FooScripts
│ ├───SysWipe.sh
│ └───DropAll.sql
├───barconfig
│ ├───barRecreate.sh
│ └───reGenAll.sql
├───Baz
│ ├───BadBaz.sh
│ └───Drop.sh
...
[2]
Bucket Foo
├───FooScripts
│ ├───SysWipe.sh
│ ├───DropAll.sql
│ └───FooScripts.zip
├───barconfig
│ ├───barRecreate.sh
│ ├───reGenAll.sql
│ └───barconfig.zip
├───Baz
│ ├───BadBaz.sh
│ ├───Drop.sh
│ └───Baz.zip
...
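A rough sketch of the download step under layout [2] (the bucket and archive names are just the illustrative ones from above):
# pull only the per-folder archives, in parallel
gsutil -m cp gs://Foo/FooScripts/FooScripts.zip gs://Foo/barconfig/barconfig.zip gs://Foo/Baz/Baz.zip .
# unpack everything locally
unzip '*.zip'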
Once your data has been downloaded, consider deleting the compressed files, as they are no longer needed for your operations and you will be charged for storing them. Alternatively, you can raise a Feature Request on the Public Issue Tracker; it will be routed to the Cloud Storage team, who can look into the feasibility of this request.
I'm deploying Dataverse using docker.
The containers were working nicely; however, a few days ago and without any changes, the db container (PostgreSQL) stopped starting when I run docker-compose up -d. This is the error from docker logs db:
PostgreSQL Database directory appears to contain a database; Skipping initialization
LOG: could not create IPv6 socket: Address family not supported by protocol
FATAL: could not open directory "pg_tblspc": No such file or directory
LOG: database system is shut down
Can anyone help me, please?
What are the contents of the PGDATA directory, and are you able to share the docker-compose section for the database?
According to your logs, one of the important directories underneath PGDATA is missing. This can happen if the data is not persisted at all and only an empty data directory is present. I assume you have a volume mounted into the container, so showing the contents of that directory (PGDATA) will help us understand whether only pg_tblspc is missing or everything is.
If it is only pg_tblspc, there are already threads on SO that discuss recovery. I don't know the cause yet, but understanding what exactly is missing (pg_tblspc or everything) is important for understanding the problem.
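For example, if the data directory is bind-mounted from the host, you can inspect it directly (adjust the host path to whatever your docker-compose file uses); otherwise a throwaway container attached to the db container's volumes works even while db is stopped:
# host-side bind mount (path is a placeholder)
ls -la /path/to/host/postgres/data
# or reuse the stopped container's volumes
docker run --rm --volumes-from db busybox ls -la /var/lib/postgresql/data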
I had a similar issue when restarting my container:
ERROR: could not open file "pg_tblspc/18684/PG_9.6_201608131/16385/26851": No such file or directory
Inspecting the container internally, I found the following:
├── pg_tblspc
│ └── 18684 -> /var/lib/postgres/pg_dirs/pg_indices
In my docker-compose configuration I only had the volume:
volumes:
- postgres/data:/var/lib/postgresql/data
so when the container restarted, all the data referenced by 18684 -> /var/lib/postgres/pg_dirs/pg_indices was lost.
To avoid this issue, I added that path to the volumes like this:
volumes:
- postgres/data:/var/lib/postgresql/data
- postgres/pg_indices:/var/lib/postgres/pg_dirs/pg_indices
The above does not restore the already-lost data (I had to restore my entire database), but it prevents further loss.
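For reference, a sketch of what the full service block might look like with both mounts. The image tag is an assumption based on the PG_9.6 path in my error, and I've written the relative host paths with a leading ./; adjust names and paths to your setup:
services:
  db:
    image: postgres:9.6
    volumes:
      - ./postgres/data:/var/lib/postgresql/data
      - ./postgres/pg_indices:/var/lib/postgres/pg_dirs/pg_indices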
I'm using MongoDB to store the data for my application, and I want to back up that database as a gzip file. I searched for this and found questions posted by other users:
link https://stackoverflow.com/questions/24439068/tar-gzip-mongo-dump-like-mysql
link https://stackoverflow.com/questions/52540104/mongodump-failed-bad-option-can-only-dump-a-single-collection-to-stdout
I used these commands, but they did not give me the expected output. I want a command that creates a gzip-compressed file of my database so that, after extraction, I can restore that database folder into MongoDB.
Currently I'm using the command below:
mongodump --db Database --gzip --archive=pathDatabase.gz
which creates a .gz archive, but when I extract it, it shows me nothing.
Can you please give me a command that I can use, or any suggestions? They will be appreciated.
When you use mongodump --db Database --gzip --archive=pathDatabase.gz, you create a single archive file (it does not create a folder) for the specified DB and compress it with gzip. The resulting file will be pathDatabase.gz in your current directory.
To restore from such file, you'd do this
mongorestore --gzip --archive=pathDatabase.gz
This will restore the db "Database" with all its collections.
You can check out these MongoDB documentation pages for more info
Dump: https://docs.mongodb.com/manual/reference/program/mongodump/
Restore: https://docs.mongodb.com/manual/reference/program/mongorestore/
Edit: Removed --db flag from restore command as it is not supported when used with --archive.
mongodump --archive=/path/to/archive.gz --gzip will actually create an archive which interleaves the data from all your collections in a single file. Each block of data is then compressed using gzip.
That file cannot be read by any tool other than mongorestore, and you need to use the same flags (i.e. mongorestore --archive=/path/to/archive.gz --gzip) to restore your dump on another deployment.
The resulting archive cannot be extracted using gunzip or tar.
If you need to change the target namespace, use the --nsFrom, --nsTo and --nsInclude options to restore under a different database name.
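For example, a sketch of restoring the same archive under a different database name (the target name DatabaseCopy is just illustrative; the patterns use mongorestore's namespace wildcard syntax):
mongorestore --gzip --archive=pathDatabase.gz --nsFrom='Database.*' --nsTo='DatabaseCopy.*'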
I'm using Scrapy to fetch data from websites, MongoDB for persistence, and Elasticsearch for searching.
My problem is that when Scrapy inserts data into MongoDB, Elasticsearch is not aware of it, even with the listener set to inserts, updates, and deletes.
Should I add a new plugin for Scrapy to communicate directly with Elasticsearch? If so, why doesn't the listener pick up what happens to the database? Thanks!
Rivers in Elasticsearch are deprecated.
Instead, you can use Transporter to sync data between MongoDB and Elasticsearch.
How To Sync Transformed Data from MongoDB to Elasticsearch with Transporter
Installing Go
In order to install Compose's Transporter, we first need to install the Go language.
sudo apt-get install golang
Create a folder for Go from your $HOME directory:
mkdir ~/go; echo "export GOPATH=$HOME/go" >> ~/.bashrc
Update your path:
echo "export PATH=$PATH:$HOME/go/bin:/usr/local/go/bin" >> ~/.bashrc
Now go to the $GOPATH directory and create the subdirectories src, pkg and bin. These directories constitute a workspace for Go.
cd $GOPATH
mkdir src pkg bin
Installing Transporter
Now create and move into a new directory for Transporter. Since the utility was developed by Compose, we'll call the directory compose.
mkdir -p $GOPATH/src/github.com/compose
cd $GOPATH/src/github.com/compose
This is where compose/transporter will be installed.
Clone the Transporter GitHub repository:
git clone https://github.com/compose/transporter.git
Move into the new directory:
cd transporter
Take ownership of the /usr/lib/go directory:
sudo chown -R $USER /usr/lib/go
Make sure build-essential is installed for GCC:
sudo apt-get install build-essential
Run the go get command to get all the dependencies:
go get -a ./cmd/...
This step might take a while, so be patient. Once it's done you can build Transporter.
go build -a ./cmd/...
If all goes well, it will complete without any errors or warnings. Check that Transporter is installed correctly by running this command:
transporter
So the installation is complete.
Create some sample data in MongoDB.
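For example, a couple of throwaway documents matching the foo.bar namespace and the firstName/lastName fields used below can be inserted from the command line (the values are purely illustrative):
mongo foo --eval 'db.bar.insert({firstName: "John", lastName: "Doe"}); db.bar.insert({firstName: "Jane", lastName: "Roe"})'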
Then we have to configure the transporter.
Transporter requires a config file (config.yaml), a transform file (myTransformation.js), and an application file (application.js) to migrate our data from MongoDB to Elasticsearch.
Move to the transporter directory:
cd ~/go/src/github.com/compose/transporter
Config File
You can take a look at the example config.yaml file if you like. We're going to back up the original and then replace it with our own contents.
mv test/config.yaml test/config.yaml.00
The new file is similar but updates some of the URIs and a few of the other settings to match what's on our server. Let's copy the contents from here and paste them into the new config.yaml file. Use the nano editor again.
nano test/config.yaml
Copy the contents below into the file. Once done, save the file as described earlier.
# api:
#   interval: 60s
#   uri: "http://requestb.in/13gerls1"
#   key: "48593282-b38d-4bf5-af58-f7327271e73d"
#   pid: "something-static"
nodes:
  localmongo:
    type: mongo
    uri: mongodb://localhost/foo
    tail: true
  es:
    type: elasticsearch
    uri: http://localhost:9200/
  timeseries:
    type: influx
    uri: influxdb://root:root@localhost:8086/compose
  debug:
    type: file
    uri: stdout://
  foofile:
    type: file
    uri: file:///tmp/foo
Application File
Now, open the application.js file in the test directory.
nano test/application.js
Replace the sample contents of the file with the contents shown below:
Source({name:"localmongo", namespace:"foo.bar"})
.transform({filename: "transformers/addFullName.js", namespace: "foo.bar"})
.save({name:"es", namespace:"foo.bar"});
Transformation File
Let's say we want the documents being stored in Elasticsearch to have another field called fullName. For that, we need to create a new transform file, test/transformers/addFullName.js.
nano test/transformers/addFullName.js
Paste the contents below into the file. Save and exit as described earlier.
module.exports = function(doc) {
  console.log(JSON.stringify(doc)); // if you are curious, you can listen in on what's changed and being copied
  doc._id = doc.data._id['$oid'];
  doc["fullName"] = doc["firstName"] + " " + doc["lastName"];
  return doc;
}
The doc._id line is necessary to tackle the way Transporter handles MongoDB's ObjectId() field. The fullName line tells Transporter to concatenate the firstName and lastName fields from MongoDB to form fullName in ES.
This is a simple transformation for the example, but with a little JavaScript you can do more complex data manipulation as you prepare your data for searching.
Executing Transporter:
If you have a simple standalone instance of MongoDB, it isn't being replicated, so there is no oplog and Transporter won't be able to detect changes. To convert a standalone MongoDB into a single-node replica set, start the server with --replSet rs0 (rs0 is just a name for the set) and, once it's running, log in with the mongo shell and run rs.initiate() so the server configures itself.
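A minimal sketch of that conversion, assuming a default dbpath (adjust the path and options to your installation):
# start (or restart) mongod as a one-member replica set
mongod --replSet rs0 --dbpath /data/db
# then, from another terminal, initialize the set
mongo --eval 'rs.initiate()'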
Make sure you are in the transporter directory:
cd ~/go/src/github.com/compose/transporter
Execute the following command to sync the data:
transporter run --config ./test/config.yaml ./test/application.js
The synchronizing mechanism that you are looking for is called rivers in Elasticsearch.
In this specific case, you should sync the specific Mongodb collection that you are using to save your Scrapy data with the Elasticsearch index.
For details about how to proceed you should check out the following links:
setting up the mongodb river
mongodb river plugin
I also recommend checking out the answers under the Elasticsearch tag here on Stack Overflow. I have found detailed answers to most of the common problems regarding implementation details.
Like the title says, I have a MongoDB GridFS database with a whole range of file types (e.g., text, pdf, xls), and I want to backup this database the easiest way.
Replication is not an option. Preferably I'd like to do it the usual database way of dumping the database to a file and then backing up that file (which could be used to restore the entire database 100% later on if needed). Can that be done with mongodump? I also want the backup to be incremental. Will that be a problem with GridFS and mongodump?
Most importantly, is that the best way of doing it? I am not that familiar with MongoDB; will mongodump work as well as mysqldump does with MySQL? What's the best practice for MongoDB GridFS and incremental backups?
I am running Linux if that makes any difference.
GridFS stores files in two collections: fs.files and fs.chunks.
More information on this may be found in the GridFS Specification document:
http://www.mongodb.org/display/DOCS/GridFS+Specification
Both collections may be backed up using mongodump, the same as any other collection. The documentation on mongodump may be found here:
http://www.mongodb.org/display/DOCS/Import+Export+Tools#ImportExportTools-mongodump
From a terminal, this would look something like the following:
For this demonstration, my db name is "gridFS":
First, mongodump is used to back up the fs.files and fs.chunks collections to a folder on my desktop:
$ bin/mongodump --db gridFS --collection fs.chunks --out /Desktop
connected to: 127.0.0.1
DATABASE: gridFS to /Desktop/gridFS
gridFS.fs.chunks to /Desktop/gridFS/fs.chunks.bson
3 objects
$ bin/mongodump --db gridFS --collection fs.files --out /Desktop
connected to: 127.0.0.1
DATABASE: gridFS to /Desktop/gridFS
gridFS.fs.files to /Users/mbastien/Desktop/gridfs/gridFS/fs.files.bson
3 objects
Now, mongorestore is used to pull the backed-up collections into a new (for the purpose of demonstration) database called "gridFScopy"
$ bin/mongorestore --db gridFScopy --collection fs.chunks /Desktop/gridFS/fs.chunks.bson
connected to: 127.0.0.1
Thu Jan 19 12:38:43 /Desktop/gridFS/fs.chunks.bson
Thu Jan 19 12:38:43 going into namespace [gridFScopy.fs.chunks]
3 objects found
$ bin/mongorestore --db gridFScopy --collection fs.files /Desktop/gridFS/fs.files.bson
connected to: 127.0.0.1
Thu Jan 19 12:39:37 /Desktop/gridFS/fs.files.bson
Thu Jan 19 12:39:37 going into namespace [gridFScopy.fs.files]
3 objects found
Now the Mongo shell is started, so that the restore can be verified:
$ bin/mongo
MongoDB shell version: 2.0.2
connecting to: test
> use gridFScopy
switched to db gridFScopy
> show collections
fs.chunks
fs.files
system.indexes
>
The collections fs.chunks and fs.files have been successfully restored to the new DB.
You can write a script to perform mongodump on your fs.files and fs.chunks collections periodically.
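A minimal sketch of such a script, assuming the gridFS database from the example and a /backups target directory (schedule it with cron as needed):
#!/bin/sh
# dump both GridFS collections into a timestamped folder
STAMP=$(date +%Y%m%d-%H%M%S)
mongodump --db gridFS --collection fs.files --out /backups/$STAMP
mongodump --db gridFS --collection fs.chunks --out /backups/$STAMP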
As for incremental backups, they are not really supported by MongoDB. A Google search for "mongodb incremental backup" reveals a good mongodb-user Google Groups discussion on the subject:
http://groups.google.com/group/mongodb-user/browse_thread/thread/6b886794a9bf170f
For continuous backups, many users use a replica set. (I realize that in your original question you stated that this is not an option; this is included for other members of the community who may be reading this response.) A member of a replica set can be hidden to ensure that it will never become Primary and will never be read from. More information on this may be found in the "Member Options" section of the Replica Set Configuration documentation.
http://www.mongodb.org/display/DOCS/Replica+Set+Configuration#ReplicaSetConfiguration-Memberoptions
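For example, an existing secondary can be hidden from the mongo shell like this (the member index 2 is just an illustration; a hidden member must also have priority 0):
mongo --eval 'cfg = rs.conf(); cfg.members[2].priority = 0; cfg.members[2].hidden = true; rs.reconfig(cfg)'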