How to Export Mongo Data with UTC Timestamps?

I'm trying to use mongoexport to export a bunch of data as JSON so I can read it in a different program. I use the command:
mongoexport --jsonArray -h some_ip -d some_db -c some_collection -o mongo_dump.json
The problem is, all of my datetime objects come out looking like:
"time_created" : { "$date" : 1344000402000 }
"time_created" : { "$date" : 1343999298000 }
which is MongoDB's 64-bit date representation (milliseconds since the Unix epoch). Is there something simple I can specify to just get Unix timestamps? The Mongo format is useless to me and annoying to convert from.

I don't think there's a flag to change them in the output, unfortunately.
However, since a millisecond timestamp is just a second timestamp with three extra digits on the end, you can do something like this:
sed -e 's/{ "\$date" : \([0-9]*\)[0-9]\{3\}/{ "\$date" : \1/' mongo_dump.json > unixstyle.json
It converted:
"time_created" : { "$date" : 1344000402000 }
"time_created" : { "$date" : 1343999298000 }
to:
"time_created" : { "$date" : 1344000402 }
"time_created" : { "$date" : 1343999298 }
Edited to handle any trailing digits, not just zeros.
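If sed feels too fragile, you can do the same conversion by actually parsing the JSON. A minimal Python sketch, assuming the --jsonArray export above (file names are placeholders); note that it drops the { "$date" : ... } wrapper entirely and leaves a plain integer:

import json

def strip_millis(value):
    # Recursively rewrite {"$date": <millis>} into plain Unix seconds.
    if isinstance(value, dict):
        if set(value) == {"$date"} and isinstance(value["$date"], int):
            return value["$date"] // 1000  # milliseconds -> seconds
        return {k: strip_millis(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_millis(v) for v in value]
    return value

with open("mongo_dump.json") as f:
    docs = json.load(f)

with open("unixstyle.json", "w") as f:
    json.dump(strip_millis(docs), f, indent=2)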

Related

How to export MongoDB long format collection as wide format csv

I have a MongoDB collection in the following format:
brand1,trouser
brand1,jeans
brand2,belt
brand2,shoes
brand2,jeans
I want to export this data as a CSV file in the following format:
brand1,trouser,jeans,belt
brand2,belt,shoes,jeans
Can someone please help me?
I managed to work out the answer to my own question.
db.brand.aggregate([{ $group : { _id: "$brand_id", "items": { $push: '$item' } }}], {allowDiskUse : true})
It gives the result as follows:
{ "_id" : "brand1","items" : [ 'trouser','jeans'] }
{ "_id" : "brand2","items" : [ 'belt','shoes','jeans'] }

Update with $dateToString aggregation causes "can't convert from BSON type object to Date" in MongoDB

I have the following document in the database (MongoDB 4.2):
{
  "_id": ObjectId("5e58dd49103bba2c961e7d80"),
  "launchProducts": {
    "scheduledLaunchDate": {
      "$date": "2020-02-03T23:00:00.000Z"
    }
  }
}
I would like to update the document, converting the existing date object into a formatted string (dd-mm-yyyy) using the aggregation-pipeline update functionality of MongoDB 4.2: https://docs.mongodb.com/manual/tutorial/update-documents-with-aggregation-pipeline/
I'm running the following query in the Mongo shell:
db.collection.updateMany({}, [{"$set": {"launchProducts.scheduledLaunchDate": {"$dateToString": {"date":"$launchProducts.scheduledLaunchDate","format":"%d-%m-%Y"}}}}])
Unfortunately, I'm getting the following error:
2020-02-28T11:07:50.375+0100 E QUERY [js] WriteError({
  "index" : 0,
  "code" : 16006,
  "errmsg" : "can't convert from BSON type object to Date",
  "op" : {
    "q" : { },
    "u" : [
      {
        "$set" : {
          "launchProducts.scheduledLaunchDate" : {
            "$dateToString" : {
              "date" : "$launchProducts.scheduledLaunchDate",
              "format" : "%d-%m-%Y"
            }
          }
        }
      }
    ],
    "multi" : true,
    "upsert" : false
  }
})
Let me know if you have any ideas how to fix this.
You should never store dates as strings; you will just create problems for yourself later. If you need a specific format, format the Date value on the client side at output time.
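For example, formatting at output time on the client side (a small Python sketch; the stored value remains a real Date):

from datetime import datetime, timezone

# the value as a driver would hand it to you: a native datetime
launch = datetime(2020, 2, 3, 23, 0, tzinfo=timezone.utc)
print(launch.strftime("%d-%m-%Y"))  # 03-02-2020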
If you want to get rid of the time part of scheduledLaunchDate you can use this pipeline:
db.collection.aggregate([
  {
    $set: {
      "launchProducts.scheduledLaunchDate": {
        $dateFromParts: {
          year: { $year: "$launchProducts.scheduledLaunchDate" },
          month: { $month: "$launchProducts.scheduledLaunchDate" },
          day: { $dayOfMonth: "$launchProducts.scheduledLaunchDate" }
        }
      }
    }
  }
])
The problem was that the date was stored improperly in MongoDB: the "$date" key was stored literally as a subdocument instead of as a BSON Date, which is why $dateToString could not convert it.
Inserting the document with a real Date first, i.e. running db.collection.insert({"launchProducts": {"scheduledLaunchDate": ISODate("2020-02-03T23:00:00.000Z")}}), solved my issue.
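For reference, once the field is a real BSON Date, the same pipeline-style update also works from a driver (bearing in mind the advice above about storing dates as strings). A sketch assuming pymongo 3.9+ against MongoDB 4.2+; the names are placeholders:

from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod
coll = client["some_db"]["collection"]  # placeholder names

# Passing a list as the update makes this a pipeline-style update,
# which is what allows $dateToString inside update_many.
coll.update_many({}, [
    {"$set": {"launchProducts.scheduledLaunchDate": {
        "$dateToString": {
            "date": "$launchProducts.scheduledLaunchDate",
            "format": "%d-%m-%Y",
        }
    }}}
])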

Export Mongodb subdocuments to CSV

I am having problems exporting subdocuments stored in MongoDB to a .CSV file.
My data: a Mongo collection that contains a unique user ID and scores from personality quizzes.
I would like a CSV with three columns: user_id, name, raw_score. To add a further layer of complexity, some users will have more than two entries within the 'scales' subdocument (some quizzes produced more than two personality scores).
An example of my data, minus the fields I am not interested in:
"assessment":{
"user_id" : "5839b1a654842f35617ad100",
"submissions" : {
"results" : {
"scales" : [
{
"scale" : {
"name" : "Security",
"code" : "SEC",
"multiplier" : 1
},
"raw_score" : 2
},
{
"scale" : {
"name" : "Power",
"code" : "POW",
"multiplier" : -1
},
"raw_score" : 3
}
],
}
}
}
}
I have tried using mongoexport but this produces a CSV that only has a user_id column.
rekuss$ mongoexport -d production_hoganx_app -c assessments --type=csv -o app_personality.csv -f user_id,results.scales.scale.name,results.scales.raw_score
Any ideas where I am going wrong?
Please let me know if you need any more information.
Many thanks
You should try removing the '=' sign from the type option, i.e. --type csv instead of --type=csv.
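mongoexport on its own won't unwind the scales array into one row per score, so a short script may be easier. A sketch assuming pymongo and the field paths from the sample document above:

import csv
from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod
coll = client["production_hoganx_app"]["assessments"]

# One output row per entry in the scales array.
pipeline = [
    {"$unwind": "$assessment.submissions.results.scales"},
    {"$project": {
        "_id": 0,
        "user_id": "$assessment.user_id",
        "name": "$assessment.submissions.results.scales.scale.name",
        "raw_score": "$assessment.submissions.results.scales.raw_score",
    }},
]

with open("app_personality.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["user_id", "name", "raw_score"])
    for doc in coll.aggregate(pipeline):
        writer.writerow([doc["user_id"], doc["name"], doc["raw_score"]])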

Mongoexport date range query results in "Failure parsing"

Trying to run mongoexport and having problems with my query parameter.
mongoexport -d test-copy -c collection -o /home/ubuntu/mongodb-archiving/mongodump/collection.json --query '{"created_at": {\$lte: new Date(1451577599000) } }'
Collection is:
{"created_at" : ISODate("2014-03-07T06:32:19.172Z")}
I can query this just fine in the Mongo client.
This results in the following error:
Assertion: 10340:Failure parsing JSON string near: "created_a
You have a \ in your query. Please remove it.
--query '{"created_at": {$lte: new Date(1451577599000)}}'
You should use $date with mongoexport:
mongoexport.exe -h *HOST* -p *PORT* -q "{ 'created_at' : { '$lt' : { '$date' : '2014-03-07T06:32:19.172Z' } } }"
Remove the \$lte and change it to a quoted "$lt" in your query, and the mongodump will work fine.
Tested on MongoDB 3.0.8:
> use appdb
> db.testcoll.find({})
{ "_id" : 1, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
{ "_id" : 2, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
{ "_id" : 3, "created_at" : ISODate("2016-09-16T08:46:30.736Z") }
{ "_id" : 4, "created_at" : ISODate("2016-09-16T08:47:12.368Z") }
{ "_id" : 5, "created_at" : ISODate("2016-09-16T08:47:15.562Z") }
> db.testcoll.find({"created_at":{"$lt":new Date("2016-09-16")}})
{ "_id" : 1, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
{ "_id" : 2, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
> db.testcoll.find({"created_at":{"$lt":new Date(1473984000)}})
// no documents returned: 1473984000 is seconds; make sure you are using the millisecond version of the epoch
> db.testcoll.find({"created_at":{"$lt":new Date(1473984000000)}})
{ "_id" : 1, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
{ "_id" : 2, "created_at" : ISODate("2016-09-15T08:46:12.272Z") }
Now the mongodump part:
dp#xyz:~$ mongodump -d appdb -c testcoll --query '{"created_at":{"$lt":new Date(1473984000000)}}'
2016-09-16T14:21:27.695+0530 writing appdb.testcoll to dump/appdb/testcoll.bson
2016-09-16T14:21:27.696+0530 writing appdb.testcoll metadata to dump/appdb/testcoll.metadata.json
2016-09-16T14:21:27.708+0530 done dumping appdb.testcoll (2 documents)
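If you are computing that epoch value yourself, remember it has to be in milliseconds; for example, in Python:

from datetime import datetime, timezone

# 2016-09-16T00:00:00Z as a millisecond epoch, matching the query above
millis = int(datetime(2016, 9, 16, tzinfo=timezone.utc).timestamp() * 1000)
print(millis)  # 1473984000000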
The mongoexport and mongodump tools require a valid JSON object for the --query parameter. From https://docs.mongodb.com/manual/reference/program/mongodump/#cmdoption--query:
--query , -q
Provides a JSON document as a query that optionally limits the documents included in the output of mongodump.
You must enclose the query in single quotes (e.g. ') to ensure that it does not interact with your shell environment.
The command failed because the query parameter you passed to mongoexport is not a valid JSON object: new Date() is a JavaScript expression, not JSON.
The required modification is to simply use the ISODate() form you provided in your example, e.g.:
mongoexport -d test-copy -c collection -o /home/ubuntu/mongodb-archiving/mongodump/collection.json --query '{"created_at": {$lte: ISODate("2014-03-07T06:32:19.172Z") } }'
You just need to replace the contents of the ISODate() with the date you require.
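As an aside, the shell-quoting problem disappears if you run the same filter through a driver, where the cutoff is a native datetime rather than JSON text. A pymongo sketch using the names from the command above:

from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod
coll = client["test-copy"]["collection"]

cutoff = datetime(2014, 3, 7, 6, 32, 19, 172000, tzinfo=timezone.utc)
for doc in coll.find({"created_at": {"$lte": cutoff}}):
    print(doc)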

Importing json from file into mongodb using mongoimport

I have my json_file.json like this:
[
  {
    "project": "project_1",
    "coord1": 2,
    "coord2": 10,
    "status": "yes",
    "priority": 7
  },
  {
    "project": "project_2",
    "coord1": 2,
    "coord2": 10,
    "status": "yes",
    "priority": 7
  },
  {
    "project": "project_3",
    "coord1": 2,
    "coord2": 10,
    "status": "yes",
    "priority": 7
  }
]
When I run the following command to import this into mongodb:
mongoimport --db my_db --collection my_collection --file json_file.json
I get the following error:
Failed: error unmarshaling bytes on document #0: JSON decoder out of sync - data changing underfoot?
If I add the --jsonArray flag to the command, the import reports:
imported 3 documents
i.e. three separate documents, instead of one document with the JSON structure shown in the original file.
How can I import the JSON into MongoDB keeping the original format from the file above?
The mongoimport tool has an option:
--jsonArray treat input source as a JSON array
Alternatively, it is possible to import from a file containing the same data format as the output of a db.collection.find() command. Here is an example from the university.mongodb.com courseware, some content from grades.json:
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb577" }, "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb578" }, "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }
{ "_id" : { "$oid" : "50906d7fa3c412bb040eb579" }, "student_id" : 0, "type" : "homework", "score" : 14.8504576811645 }
As you can see, no array is used and there are no comma delimiters between documents.
I discovered recently that this complies with the JSON Lines text format, the same one used by the apache.spark.sql.DataFrameReader.json() method.
Side note:
$ python -m json.tool --sort-keys --json-lines < data.jsonl
can also handle this format.
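If you have an array-style file and want the find()-style format instead, the conversion is mechanical. A minimal Python sketch:

import json

# Rewrite a JSON-array file as JSON Lines (one document per line),
# which mongoimport accepts without the --jsonArray flag.
with open("json_file.json") as f:
    docs = json.load(f)

with open("json_file.jsonl", "w") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")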
Perhaps the following reference from the MongoDB project blog could help you gain insight on how arrays work in Mongo:
https://blog.mlab.com/2013/04/thinking-about-arrays-in-mongodb/
I would frame your import differently, and either:
a) import the three objects separately into the collection, as you say, using the --jsonArray flag; or
b) encapsulate the complete array within a single object, for example in this way:
{
  "mydata": [
    {
      "project": "project_1",
      ...
      "priority": 7
    }
  ]
}
HTH.
I faced the opposite problem today; my conclusion would be:
If you wish to insert an array of JSON objects at once, where each array entry is to be treated as a separate database entry, you have two syntax options:
An array of objects with commas in valid positions; the --jsonArray flag is obligatory:
[
{obj1},
{obj2},
{obj3}
]
A file with technically invalid JSON formatting (i.e. missing commas between the JSON object instances), without the --jsonArray flag:
{obj1}
{obj2}
{obj3}
If you wish to insert only an array (i.e. an array as a top-level citizen of your database), I think it's not possible and not valid, because MongoDB by definition supports documents as top-level objects, which are mapped to JSON objects. In other words, you must wrap your array in a JSON object, as ALAN WARD pointed out above.
Error:
$ ./mongoimport --db bookings --collection user --file user.json
2021-06-12T18:52:13.256+0530 connected to: localhost
2021-06-12T18:52:13.261+0530 Failed: error unmarshaling bytes on document #0: JSON decoder out of sync - data changing underfoot?
2021-06-12T18:52:13.261+0530 imported 0 documents
Solution: when your JSON data contains an array of objects, you need to use --jsonArray during the import, with a command like the one below:
$ ./mongoimport --db bookings --collection user --file user.json --jsonArray
2021-06-12T18:53:44.164+0530 connected to: localhost
2021-06-12T18:53:44.532+0530 imported 414 documents