How can I import bson and json files into MongoDB?

I have the following .bson and .json files from https://github.com/Apress/def-guide-to-mongodb/tree/master/9781484211830/The%20Definitive%20Guide%20to%20MongoDB
$ ls .
aggregation.bson aggregation.metadata.json mapreduce.bson mapreduce.metadata.json storage.bson text.json
How can I import them into MongoDB?
I tried to import each of them as a collection, but failed:
$ mongorestore -d test -c aggregation
2018-07-18T01:44:25.376-0400 the --db and --collection args should only be used when restoring from a BSON file. Other uses are deprecated and will not exist in the future; use --nsInclude instead
2018-07-18T01:44:25.377-0400 using default 'dump' directory
2018-07-18T01:44:25.377-0400 see mongorestore --help for usage information
2018-07-18T01:44:25.377-0400 Failed: mongorestore target 'dump' invalid: stat dump: no such file or directory
I am not sure whether I specified the file aggregation.bson correctly, but the above command is what I learned from a similar example in a book.
Thanks.
UPDATE
In the following, why did the first command fail and the second succeed? Which command should I use?
$ mongoimport -d test -c aggregation --file aggregation.bson
2018-07-18T09:45:44.698-0400 connected to: localhost
2018-07-18T09:45:44.720-0400 Failed: error processing document #1: invalid character 'ยบ' looking for beginning of value
2018-07-18T09:45:44.720-0400 imported 0 documents
$ mongoimport -d test -c aggregation --file aggregation.metadata.json
2018-07-18T09:46:05.058-0400 connected to: localhost
2018-07-18T09:46:05.313-0400 imported 1 document

For the JSON files, mongoimport is the right tool:
mongoimport --db dbName --collection collectionName --type json --file fileName.json
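mongoimport only parses JSON, CSV, or TSV input, which is why it failed on the binary aggregation.bson but succeeded on the plain-JSON aggregation.metadata.json. For the .bson dumps, use mongorestore; a minimal sketch, assuming the files are in the current directory:
mongorestore -d test -c aggregation aggregation.bson
mongoimport -d test -c text --file text.json
mongorestore also reads the matching aggregation.metadata.json automatically to rebuild indexes, so the metadata files do not need to be imported by hand.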
Update:
C:\Program Files\MongoDB\Server\4.0\bin>mongorestore -d test -c aggregation aggregation.bson
2018-07-19T10:28:39.963+0300 checking for collection data in aggregation.bson
2018-07-19T10:28:40.099+0300 restoring test.aggregation from aggregation.bson
2018-07-19T10:28:41.113+0300 no indexes to restore
2018-07-19T10:28:41.113+0300 finished restoring test.aggregation (1000 documents)
2018-07-19T10:28:41.113+0300 done
So I tried it and it worked fine for me. Do you have the file in your bin folder, or maybe the command you used wasn't complete?
db.aggregation.find().pretty().limit(2)
{
"_id" : ObjectId("51de841747f3a410e3000001"),
"num" : 1,
"color" : "blue",
"transport" : "train",
"fruits" : [
"orange",
"banana",
"kiwi"
],
"vegetables" : [
"corn",
"broccoli",
"potato"
]
}
{
"_id" : ObjectId("51de841747f3a410e3000005"),
"num" : 5,
"color" : "yellow",
"transport" : "plane",
"fruits" : [
"lemon",
"cherry",
"dragonfruit"
],
"vegetables" : [
"mushroom",
"capsicum",
"zucchini"
]
}
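To double-check the restore, you can compare the collection count with the restore log above:
> db.aggregation.count()
1000
The count should match the 1000 documents that mongorestore reported.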

Related

mongodb - issue with same file name in fs.files GridFS

I have multiple files in the fs.files collection in MongoDB GridFS with the same name but for different users.
When I use the query below:
db.fs.files.find({"metadata.folder": {"$exists": false}, "metadata.msgid": {"$exists": false}}, {"metadata.user": 1, "_id": 0, "filename": 1}).pretty()
I get results like:
{ "filename" : "standard.wav", "metadata" :
{ "user" : "101" }
}
{ "filename" : "standard.wav", "metadata" :
{ "user" : "100" }
}
{ "filename" : "standard.wav", "metadata" :
{ "user" : "104" }
}
The files are different for each user but have the same name.
So when I use the following commands to save the files to the local system for different users, the same file is always stored for all users.
For User 101 :
mongofiles --uri MONGO_DSN -d test -l /home/user/101/standard.wav get standard.wav
For User 100 :
mongofiles --uri MONGO_DSN -d test -l /home/user/100/standard.wav get standard.wav
For User 104 :
mongofiles --uri MONGO_DSN -d test -l /home/user/104/standard.wav get standard.wav
It should store different files for different users.
Thanks in advance.
I have solved it using the get_id command instead of get.
So my commands now are:
For User 101 :
mongofiles --uri MONGO_DSN -d test -l /home/user/101/standard.wav get_id $object101
For User 100 :
mongofiles --uri MONGO_DSN -d test -l /home/user/100/standard.wav get_id $object100
For User 104 :
mongofiles --uri MONGO_DSN -d test -l /home/user/104/standard.wav get_id $object104
Here $object101, $object100, and $object104 are the extended JSON _id values of the objects in GridFS.
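For example, you can look up each user's _id first and then fetch by it; a minimal sketch, where the ObjectId value is a placeholder:
db.fs.files.find({"filename": "standard.wav", "metadata.user": "101"}, {"_id": 1})
mongofiles --uri MONGO_DSN -d test -l /home/user/101/standard.wav get_id '{"$oid": "<id-for-user-101>"}'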
References :
mongofiles: get file by _id in addition to filename
MongoFiles

Export array of documents from MongoDB in csv

I'm working on a Java program to move data from MongoDB to Neo4j.
I have to export some Mongo documents to a csv file.
I have, for example, this document:
"coached_Team" : [
{
"team_id" : "Pal.00",
"in_charge" : {
"from" : {
"day" : 25,
"month" : 9,
"year" : 2013
}
},
"matches" : 75
}
]
I have to export it to csv. I read some other questions, for example this one, and I used that tip to export my document.
To export in csv I use this command:
Z:\path\to\Mongo\3.0\bin>mongoexport --db <database> --collection
<collection> --type=csv --fields coached_Team.0.team_id,coached_Team.0.in_charge.from.day,
coached_Team.0.in_charge.from.month,coached_Team.0.in_charge.from.year,
coached_Team.0.matches --out "C:\path\to\output\file\output.csv
But it did not work for me.
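A likely cause (not confirmed in the thread): the --fields list is split across several lines and the --out path is missing its closing quote, so the shell never sees one complete command. A sketch of the same command as a single line with the quote closed:
mongoexport --db <database> --collection <collection> --type=csv --fields coached_Team.0.team_id,coached_Team.0.in_charge.from.day,coached_Team.0.in_charge.from.month,coached_Team.0.in_charge.from.year,coached_Team.0.matches --out "C:\path\to\output\file\output.csv"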

Importing json from file into mongodb using mongoimport

I have my json_file.json like this:
[
{
"project": "project_1",
"coord1": 2,
"coord2": 10,
"status": "yes",
"priority": 7
},
{
"project": "project_2",
"coord1": 2,
"coord2": 10,
"status": "yes",
"priority": 7
},
{
"project": "project_3",
"coord1": 2,
"coord2": 10,
"status": "yes",
"priority": 7
}
]
When I run the following command to import this into mongodb:
mongoimport --db my_db --collection my_collection --file json_file.json
I get the following error:
Failed: error unmarshaling bytes on document #0: JSON decoder out of sync - data changing underfoot?
If I add the --jsonArray flag to the command, it imports like this:
imported 3 documents
instead of one document with the json format as shown in the original file.
How can I import json into mongodb with the original format in the file shown above?
The mongoimport tool has an option:
--jsonArray treat input source as a JSON array
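So for the file above, this should work:
mongoimport --db my_db --collection my_collection --file json_file.json --jsonArray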
Alternatively, it is possible to import from a file containing the same data format as the result of the db.collection.find() command. Here is an example from the university.mongodb.com courseware, showing some content from grades.json:
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb577" }, "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb578" }, "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb579" }, "student_id" : 0, "type" : "homework", "score" : 14.8504576811645 }
As you can see, no array is used, and there are no comma delimiters between documents either.
I discovered recently that this complies with the JSON Lines text format,
like the one used by the apache.spark.sql.DataFrameReader.json() method.
Side note:
$ python -m json.tool --sort-keys --json-lines < data.jsonl
can also handle this format;
see the demo and details here.
Perhaps the following reference could help you gain insight into how arrays work in MongoDB:
https://blog.mlab.com/2013/04/thinking-about-arrays-in-mongodb/
I would frame your import otherwise, and either:
a) import the three different objects separately into the collection as you say, using the --jsonArray flag; or
b) encapsulate the complete array within a single object, for example in this way:
{
"mydata":
[
{
"project": "project_1",
...
"priority": 7
}
]
}
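That wrapped file can then be imported without --jsonArray, and it should land as a single document that preserves the array:
mongoimport --db my_db --collection my_collection --file json_file.json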
HTH.
I faced the opposite problem today; my conclusion would be:
If you wish to insert an array of JSON objects at once, where each array entry shall be treated as a separate database entry, you have two options of syntax:
An array of objects with valid comma positions; the --jsonArray flag is obligatory:
[
{obj1},
{obj2},
{obj3}
]
A file with technically invalid JSON formatting (i.e. no commas between the JSON object instances), without the --jsonArray flag:
{obj1}
{obj2}
{obj3}
If you wish to insert only an array (i.e. an array as the top-level citizen of your database), I think it's not possible and not valid, because MongoDB by definition supports documents as top-level objects, which are mapped to JSON objects afterwards. In other words, you must wrap your array in a JSON object, as ALAN WARD pointed out.
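As a side note, if you have the array form and want the comma-less JSON Lines form instead, one way to convert (assuming the jq utility is installed) is:
jq -c '.[]' json_file.json > json_lines.json
mongoimport --db my_db --collection my_collection --file json_lines.json
jq -c '.[]' prints each array element as one compact JSON object per line, which is exactly the second syntax above.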
Error:
$ ./mongoimport --db bookings --collection user --file user.json
2021-06-12T18:52:13.256+0530 connected to: localhost
2021-06-12T18:52:13.261+0530 Failed: error unmarshaling bytes on document #0: JSON decoder out of sync - data changing underfoot?
2021-06-12T18:52:13.261+0530 imported 0 documents
Solution: When your JSON data contains an array of objects, you need to use --jsonArray during the import, with a command like the one below:
$ ./mongoimport --db bookings --collection user --file user.json --jsonArray
2021-06-12T18:53:44.164+0530 connected to: localhost
2021-06-12T18:53:44.532+0530 imported 414 documents

tar gzip mongo dump like MySQL

Is there any way to tar/gzip mongo dumps like you can with MySQL dumps?
For example, for mysqldumps, you can write a command as such:
mysqldump -u <username> --password=<password> --all-databases | gzip > all-databases.`date +%F`.gz
Is there an equivalent way to do the same for mongo dumps?
For mongo dumps I run this command:
mongodump --host localhost --out /backup
Is there a way to just pipe that to gzip? I tried, but that didn't work.
Any ideas?
Version 3.2 introduced the --gzip and --archive options:
mongodump --db <yourdb> --gzip --archive=/path/to/archive
Then you can restore with:
mongorestore --gzip --archive=/path/to/archive
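To mirror the date-stamped mysqldump pattern from the question, you can name the archive accordingly, for example:
mongodump --db <yourdb> --gzip --archive=/backup/<yourdb>.`date +%F`.gz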
Update (July 2015):
TOOLS-675 is now marked as complete, which allows dumping to an archive format in 3.2, and gzip will be one of the options in the 3.2 versions of the mongodump/mongorestore tools. I will update with the relevant docs once they are live for 3.2.
Original answer (3.0 and below):
You can do this with a single collection by outputting mongodump to stdout and then piping it to a compression program (gzip, bzip2), but you will only get the data (no index information), and you cannot do it for a full database (multiple collections) for now. The relevant feature request for this functionality is SERVER-5190, for upvoting/watching purposes.
Here is a quick sample run through of what is possible, using bzip2 in this example:
./mongo
MongoDB shell version: 2.6.1
connecting to: test
> db.foo.find()
{ "_id" : ObjectId("53ad8a3eb74b5ae2ff0ec93a"), "a" : 1 }
{ "_id" : ObjectId("53ad8ba445be9c4f7bd018b4"), "a" : 2 }
{ "_id" : ObjectId("53ad8ba645be9c4f7bd018b5"), "a" : 3 }
{ "_id" : ObjectId("53ad8ba845be9c4f7bd018b6"), "a" : 4 }
{ "_id" : ObjectId("53ad8baa45be9c4f7bd018b7"), "a" : 5 }
>
bye
$ ./mongodump -d test -c foo -o - | bzip2 - > foo.bson.bz2
connected to: 127.0.0.1
$ bunzip2 foo.bson.bz2
$ ./bsondump foo.bson
{ "_id" : ObjectId( "53ad8a3eb74b5ae2ff0ec93a" ), "a" : 1 }
{ "_id" : ObjectId( "53ad8ba445be9c4f7bd018b4" ), "a" : 2 }
{ "_id" : ObjectId( "53ad8ba645be9c4f7bd018b5" ), "a" : 3 }
{ "_id" : ObjectId( "53ad8ba845be9c4f7bd018b6" ), "a" : 4 }
{ "_id" : ObjectId( "53ad8baa45be9c4f7bd018b7" ), "a" : 5 }
5 objects found
Compare that with a straight mongodump (you get the same foo.bson but the extra foo.metadata.json describing the indexes is not included above):
$ ./mongodump -d test -c foo -o .
connected to: 127.0.0.1
2014-06-27T16:24:20.802+0100 DATABASE: test to ./test
2014-06-27T16:24:20.802+0100 test.foo to ./test/foo.bson
2014-06-27T16:24:20.802+0100 5 documents
2014-06-27T16:24:20.802+0100 Metadata for test.foo to ./test/foo.metadata.json
$ ./bsondump test/foo.bson
{ "_id" : ObjectId( "53ad8a3eb74b5ae2ff0ec93a" ), "a" : 1 }
{ "_id" : ObjectId( "53ad8ba445be9c4f7bd018b4" ), "a" : 2 }
{ "_id" : ObjectId( "53ad8ba645be9c4f7bd018b5" ), "a" : 3 }
{ "_id" : ObjectId( "53ad8ba845be9c4f7bd018b6" ), "a" : 4 }
{ "_id" : ObjectId( "53ad8baa45be9c4f7bd018b7" ), "a" : 5 }
5 objects found
Export a MongoDB dump as:
mongodump --host <host-ip> --port 27017 --db <database> --authenticationDatabase admin --username <username> --password <password> --gzip --archive > dump_`date "+%Y-%m-%d"`.gz
Import as:
mongorestore --host <host-ip> --port 27017 --db <database> --authenticationDatabase admin --username <username> --password <password> --gzip --archive=mongodump.gz
If you want to do it by passing a URI for your MongoDB replica set cluster:
Dump:
mongodump --uri='mongodb://user:pass@primary_host,secondary_host/<db-name>?replicaSet=<replica-name>&authSource=admin' --gzip --archive > dump_`date "+%Y-%m-%d"`.gz
Restore:
mongorestore --uri='mongodb://user:pass@primary_host,secondary_host/<db-name>?replicaSet=<replica-name>&authSource=admin' --gzip --archive=<dump-file>.gz

Export Mongo Query

I have the following DB:
"_id" : ...,
"index" : [
...,
...,
...
],
"value" : {
...,
...,
...,
...,
...,
...
}
I want to export all records for which the second element of index is "London", so I used:
mongoexport --db DbReport --collection cityconsumption --query {'index.1':"London"} --csv --out /tmp/me/Query1.csv --username 'DBReport' --password '...' --fields 'index,value'
but I got an error:
assertion: 10340 Failure parsing JSON string near: index.1:1^
Could you please help me?
Thanks,
Amir
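The likely cause is that the shell strips or mangles the unquoted JSON in --query before mongoexport sees it. A sketch of the corrected command, with the query wrapped in single quotes as one argument:
mongoexport --db DbReport --collection cityconsumption --query '{"index.1": "London"}' --csv --out /tmp/me/Query1.csv --username 'DBReport' --password '...' --fields 'index,value'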