I'm working on a Java program to migrate data from MongoDB to Neo4j.
I have to export some Mongo documents to a CSV file.
I have, for example, this document:
"coached_Team" : [
    {
        "team_id" : "Pal.00",
        "in_charge" : {
            "from" : {
                "day" : 25,
                "month" : 9,
                "year" : 2013
            }
        },
        "matches" : 75
    }
]
I have to export this to CSV. I read some other questions on the topic and used their tips to export my document.
To export to CSV I use this command:
Z:\path\to\Mongo\3.0\bin>mongoexport --db <database> --collection <collection> --type=csv --fields coached_Team.0.team_id,coached_Team.0.in_charge.from.day,coached_Team.0.in_charge.from.month,coached_Team.0.in_charge.from.year,coached_Team.0.matches --out "C:\path\to\output\file\output.csv"
But it did not work for me.
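For context on those dotted field paths: mongoexport addresses one array element at a time by numeric index. A small Python sketch (the resolve helper is hypothetical, for illustration only) of how a path like coached_Team.0.in_charge.from.day is resolved against the document above:

```python
# Resolve a mongoexport-style dotted field path (e.g. "coached_Team.0.team_id")
# against a document: numeric path parts index into arrays, others into objects.
def resolve(doc, path):
    for part in path.split("."):
        doc = doc[int(part)] if part.isdigit() else doc[part]
    return doc

doc = {
    "coached_Team": [
        {
            "team_id": "Pal.00",
            "in_charge": {"from": {"day": 25, "month": 9, "year": 2013}},
            "matches": 75,
        }
    ]
}

print(resolve(doc, "coached_Team.0.team_id"))             # Pal.00
print(resolve(doc, "coached_Team.0.in_charge.from.day"))  # 25
```

Note that the index 0 is baked into the path, which is why only the first array element can ever be exported this way.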
I am trying to export MongoDB output to CSV format, but I'm having trouble.
See the following document in my collection:
db.save.find().pretty();
{
    "_id" : ObjectId("58884b11e1370511b89d8267"),
    "domain" : "google.com",
    "emails" : [
        {
            "email" : "f#google.com",
            "first" : "James",
            "Last" : "fer"
        },
        {
            "email" : "d#gmail.com",
            "first" : "dear",
            "last" : "near"
        }
    ]
}
Exporting the document to CSV:
C:\MongoDB\Server\bin>mongoexport.exe -d Trial -c save -o file.csv --type csv --fields domain,emails
2017-01-25T12:50:54.927+0530 connected to: localhost
2017-01-25T12:50:54.929+0530 exported 1 record
The output file is:
domain,emails
google.com,"[{""email"":""f#google.com"",""first"":""James"",""Last"":""fer""},{""email"":""d#gmail.com"",""first"":""dear"",""last"":""near""}]"
But if I import the same file, the output is different from what is in the actual collection. See the example:
> db.sir.find().pretty()
{
    "_id" : ObjectId("5888529fa26b65ae310d026f"),
    "domain" : "google.com",
    "emails" : "[{\"email\":\"f#google.com\",\"first\":\"James\",\"Last\":\"fer\"},{\"email\":\"d#gmail.com\",\"first\":\"dear\",\"last\":\"near\"}]"
}
I do not want that extra \ in my imported document. Please tell me if it is avoidable and, if so, what format the CSV should have for import.
This is not the expected format, so let me know how I can produce the proper format. Kindly help me with this query.
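The extra backslashes appear because mongoexport stores the whole array as one JSON string inside a CSV cell, and mongoimport reads it back as a plain string rather than as structured data. A hedged Python sketch of what the CSV cell actually contains, and how it can be decoded with json.loads:

```python
import csv
import io
import json

# The CSV produced by mongoexport (header + one row), as in the output above.
csv_text = (
    'domain,emails\n'
    'google.com,"[{""email"":""f#google.com"",""first"":""James"",""Last"":""fer""},'
    '{""email"":""d#gmail.com"",""first"":""dear"",""last"":""near""}]"\n'
)

row = next(csv.DictReader(io.StringIO(csv_text)))
emails = json.loads(row["emails"])  # the cell is a JSON string, not structured data

print(emails[0]["email"])  # f#google.com
print(len(emails))         # 2
```

If the array structure must survive a round trip, exporting as JSON instead of CSV avoids the problem, since CSV has no native notion of nested arrays.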
My doc:
db.org.insert({
    "id" : 28,
    "organisation" : "Mickey Mouse company",
    "country" : "US",
    "contactpersons" : [
        {
            "title" : "",
            "typecontact" : "D",
            "mobilenumber" : "757784854",
            "firstname" : "Mickey",
            "lastname" : "Mouse",
            "emailaddress" : "mickey#mouse.com"
        },
        {
            "title" : "",
            "typecontact" : "E",
            "mobilenumber" : "757784854",
            "firstname" : "Donald",
            "lastname" : "Duck",
            "emailaddress" : "donald#duck.com"
        }
    ],
    "modifieddate" : "2013-11-21T16:04:49+0100"
});
My query:
mongoexport --host localhost --db sample --collection org --type csv --fields country,contactpersons.0.firstname,contactpersons.0.emailaddress --out D:\info_docs\org.csv
With this query I'm able to get only the first element's values from contactpersons, but I'm trying to export the second element's values as well.
How can I resolve this issue? Can anyone please help me out with this?
You're getting exactly the first element of contactpersons because you are only exporting the first element of the array (contactpersons.0.firstname). mongoexport can't export several or all elements of an array, so what you need to do is unwind the array and save the result in another collection. You can do this with the aggregation framework.
First, do an $unwind of contactpersons, then $project the fields you want to use (in your example, country and contactpersons), and finally save the output in a new collection with $out.
db.org.aggregate([
    {$unwind: '$contactpersons'},
    {$project: {_id: 0, org_id: '$id', contacts: '$contactpersons', country: 1}},
    {$out: 'aggregate_org'}
])
Now you can do a mongoexport of contacts (which is the result of the $unwind of contactpersons) and country.
mongoexport --host localhost --db sample --collection aggregate_org --type=csv --fields country,contacts.firstname,contacts.emailaddress --out D:\info_docs\org.csv
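In plain Python terms (my own sketch, not MongoDB code), the $unwind stage turns the single document into one record per array element, which is what makes flat CSV rows possible:

```python
doc = {
    "id": 28,
    "country": "US",
    "contactpersons": [
        {"firstname": "Mickey", "emailaddress": "mickey#mouse.com"},
        {"firstname": "Donald", "emailaddress": "donald#duck.com"},
    ],
}

# Equivalent of {$unwind: '$contactpersons'} followed by the $project stage:
# one output record per array element, scalar fields copied into each.
rows = [
    {"org_id": doc["id"], "country": doc["country"], "contacts": person}
    for person in doc["contactpersons"]
]

for r in rows:
    print(r["country"], r["contacts"]["firstname"], r["contacts"]["emailaddress"])
```

Each of these records becomes one CSV line in the final export, so both Mickey and Donald appear.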
I'm trying to run mongoexport and am having problems with my query parameter.
mongoexport -d test-copy -c collection -o /home/ubuntu/mongodb-archiving/mongodump/collection.json --query '{"created_at": {\$lte: new Date(1451577599000) } }'
Collection is:
{"created_at" : ISODate("2014-03-07T06:32:19.172Z")}
I can query this just fine in the mongo shell.
But the export results in the following error:
Assertion: 10340:Failure parsing JSON string near: "created_a
You have a \ in your query. Please remove it.
--query '{"created_at": {$lte: new Date(1451577599000)}}'
You should use $date with mongoexport:
mongoexport.exe -h *HOST* -p *PORT* -q "{ 'created_at' : { '$lt' : { '$date' : '2014-03-07T06:32:19.172Z' } } }"
Remove the \$lte and change it to a quoted "$lt" in your query, and the mongodump will work fine.
Tested on MongoDB 3.0.8:
> use appdb
> db.testcoll.find({})
{ "_id" : 1, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
{ "_id" : 2, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
{ "_id" : 3, "created_at" : ISODate("2016-09-16T08:46:30.736Z") }
{ "_id" : 4, "created_at" : ISODate("2016-09-16T08:47:12.368Z") }
{ "_id" : 5, "created_at" : ISODate("2016-09-16T08:47:15.562Z") }
> db.testcoll.find({"created_at":{"$lt":new Date("2016-09-16")}})
{ "_id" : 1, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
{ "_id" : 2, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
> db.testcoll.find({"created_at":{"$lt":new Date(1473984000)}})
// make sure you are using millisecond version of epoch
> db.testcoll.find({"created_at":{"$lt":new Date(1473984000000)}})
{ "_id" : 1, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
{ "_id" : 2, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
Now the mongodump part:
dp#xyz:~$ mongodump -d appdb -c testcoll --query '{"created_at":{"$lt":new Date(1473984000000)}}'
2016-09-16T14:21:27.695+0530 writing appdb.testcoll to dump/appdb/testcoll.bson
2016-09-16T14:21:27.696+0530 writing appdb.testcoll metadata to dump/appdb/testcoll.metadata.json
2016-09-16T14:21:27.708+0530 done dumping appdb.testcoll (2 documents)
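To double-check the millisecond epoch value used in the query above, one way (a Python sketch) is to compute it from the UTC date:

```python
from datetime import datetime, timezone

# 2016-09-16 00:00:00 UTC as milliseconds since the epoch,
# i.e. the value to pass to new Date(...) in the --query document.
cutoff = datetime(2016, 9, 16, tzinfo=timezone.utc)
millis = int(cutoff.timestamp() * 1000)
print(millis)  # 1473984000000
```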
The mongoexport and mongodump tools require a valid JSON object for the --query parameter. From https://docs.mongodb.com/manual/reference/program/mongodump/#cmdoption--query:
--query , -q
Provides a JSON document as a query that optionally limits the documents included in the output of mongodump.
You must enclose the query in single quotes (e.g. ') to ensure that it does not interact with your shell environment.
The command failed because the query parameter you passed to mongoexport is not a valid JSON document: new Date() is a JavaScript expression, not JSON.
The required modification is simply to use the ISODate() form from your example, e.g.:
mongoexport -d test-copy -c collection -o /home/ubuntu/mongodb-archiving/mongodump/collection.json --query '{"created_at": {$lte: ISODate("2014-03-07T06:32:19.172Z") } }'
You just need to replace the contents of the ISODate() with the date you require.
I'm trying to combine 2 collections into one (not join them). I have 2 databases with the same collections and collection structure.
As example:
Collection test1 db1:
{
    "_id" : ObjectId("574c339b3644a65b36e77359"),
    "appName" : "App1",
    "customerId" : "Client1",
    "environment" : "PROD",
    "methods" : []
}
Collection test2 db2:
{
    "_id" : ObjectId("574c367d627b45ef0abc00e5"),
    "appName" : "App2",
    "customerId" : "Client2",
    "environment" : "PROD",
    "methods" : []
}
I'm trying to create the following: one collection test in database db, containing the documents from both test1 and test2 side by side (not merged into each other). What would be the proper way to achieve this?
{
    "_id" : ObjectId("574c339b3644a65b36e77359"),
    "appName" : "App1",
    "customerId" : "Client1",
    "environment" : "PROD",
    "methods" : []
},
{
    "_id" : ObjectId("574c367d627b45ef0abc00e5"),
    "appName" : "App2",
    "customerId" : "Client2",
    "environment" : "PROD",
    "methods" : []
}
The complication is that the _id values are referenced in other MongoDB collections.
The fastest way is to create dumps (using mongodump) and restore them into the same collection (the example uses Windows paths):
mongodump --db test1 --collection test1 --out c:\dump\test1
mongodump --db test2 --collection test2 --out c:\dump\test2
mongorestore --db test3 --collection test3 c:\dump\test1
mongorestore --db test3 --collection test3 c:\dump\test2
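One caveat: mongorestore keeps the original _id values, so this merge only works cleanly if the two source collections have no _id collisions (duplicate _ids would fail to insert). A quick illustrative Python check using the two sample documents:

```python
# Stand-ins for the two source collections; _id values taken from the example docs.
test1 = [{"_id": "574c339b3644a65b36e77359", "appName": "App1"}]
test2 = [{"_id": "574c367d627b45ef0abc00e5", "appName": "App2"}]

ids = [d["_id"] for d in test1 + test2]
assert len(ids) == len(set(ids)), "duplicate _id: mongorestore would reject it"

merged = test1 + test2
print(len(merged))  # 2
```

Since the _ids are unique here, the restore preserves them, and references from other collections remain valid.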
I have my json_file.json like this:
[
    {
        "project": "project_1",
        "coord1": 2,
        "coord2": 10,
        "status": "yes",
        "priority": 7
    },
    {
        "project": "project_2",
        "coord1": 2,
        "coord2": 10,
        "status": "yes",
        "priority": 7
    },
    {
        "project": "project_3",
        "coord1": 2,
        "coord2": 10,
        "status": "yes",
        "priority": 7
    }
]
When I run the following command to import this into mongodb:
mongoimport --db my_db --collection my_collection --file json_file.json
I get the following error:
Failed: error unmarshaling bytes on document #0: JSON decoder out of sync - data changing underfoot?
If I add the --jsonArray flag to the command, the import reports:
imported 3 documents
instead of one document with the JSON structure shown in the original file.
How can I import the JSON into MongoDB with the original format shown in the file above?
The mongoimport tool has an option:
--jsonArray treat input source as a JSON array
Alternatively, it is possible to import from a file containing the same data format as the result of the db.collection.find() command. Here is some example content from grades.json, taken from the university.mongodb.com courseware:
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb577" }, "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb578" }, "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb579" }, "student_id" : 0, "type" : "homework", "score" : 14.8504576811645 }
As you can see, no array is used and there are no comma delimiters between documents either.
I discovered recently that this complies with the JSON Lines text format, like the one used by the apache.spark.sql.DataFrameReader.json() method.
Side note:
$ python -m json.tool --sort-keys --json-lines < data.jsonl
can also handle this format.
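The JSON Lines content above is also straightforward to consume line by line; a small Python sketch (the sample lines are copied from the grades.json excerpt):

```python
import json

jsonl = '''{ "_id" : { "$oid" : "50906d7fa3c412bb040eb577" }, "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb578" }, "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }'''

# Each non-empty line is one standalone JSON document.
docs = [json.loads(line) for line in jsonl.splitlines() if line.strip()]
print(len(docs))        # 2
print(docs[0]["type"])  # exam
```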
Perhaps the following reference from the MongoDB project blog could help you gain insight on how arrays work in Mongo:
https://blog.mlab.com/2013/04/thinking-about-arrays-in-mongodb/
I would frame your import differently and either:
a) import the three objects separately into the collection, as you say, using the --jsonArray flag; or
b) encapsulate the complete array within a single object, for example like this:
{
    "mydata": [
        {
            "project": "project_1",
            ...
            "priority": 7
        }
    ]
}
HTH.
I faced the opposite problem today; my conclusion would be:
If you wish to insert an array of JSON objects at once, where each array entry is treated as a separate database entry, you have two syntax options:
1. An array of objects with valid comma placement; the --jsonArray flag is obligatory:
[
    {obj1},
    {obj2},
    {obj3}
]
2. A file with technically invalid JSON formatting (i.e. missing commas between the JSON object instances), used without the --jsonArray flag:
{obj1}
{obj2}
{obj3}
If you wish to insert only an array (i.e. an array as the top-level citizen of your database), I think it's not possible and not valid, because MongoDB by definition stores documents as its top-level objects, which map to JSON objects. In other words, you must wrap your array in a JSON object, as ALAN WARD pointed out.
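Both layouts can be produced from the same list of documents; a short Python sketch writing each form (the docs list stands in for real documents):

```python
import json

docs = [{"n": 1}, {"n": 2}, {"n": 3}]

# Option 1: a single JSON array -- import this file WITH --jsonArray.
as_array = json.dumps(docs)

# Option 2: one document per line (JSON Lines) -- import WITHOUT --jsonArray.
as_lines = "\n".join(json.dumps(d) for d in docs)

print(as_array)
print(as_lines)
```

Either file imports the same three documents; the only difference is which mongoimport flag it requires.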
Error:
$ ./mongoimport --db bookings --collection user --file user.json
2021-06-12T18:52:13.256+0530 connected to: localhost
2021-06-12T18:52:13.261+0530 Failed: error unmarshaling bytes on document #0: JSON decoder out of sync - data changing underfoot?
2021-06-12T18:52:13.261+0530 imported 0 documents
Solution: when your JSON data contains an array of objects, you need to use --jsonArray while importing, with a command like the one below:
$ ./mongoimport --db bookings --collection user --file user.json --jsonArray
2021-06-12T18:53:44.164+0530 connected to: localhost
2021-06-12T18:53:44.532+0530 imported 414 documents