Using pymongo, how can I deal with nested JSON format? - mongodb

To be more specific, I loaded the data into MongoDB with pymongo using this script.
header = ['id', 'info']
for each in reader:
    row = {}
    for field in header:
        row[field] = each[field]
    db.segment.insert_one(row)
The id column holds the unique ID of each user, and the info column contains nested JSON.
For example, here is the data set in the db
{
u'_id': ObjectId('111'),
u'id': u'123',
u'info': {
"TYPE": "food",
"dishes":"166",
"cc": "20160327 040001",
"country": "japan",
"money": 3521,
"info2": [{"type": "dishes", "number":"2"}]
}
}
What I want to do is read a value inside the nested JSON.
So this is what I did:
pipe = [{"$group":{"_id":"$id", "Totalmoney":{"$sum":"$info.money"}}}]
total_money = db.segment.aggregate(pipeline=pipe)
But the result of the sum is always 0 for every id.
What am I doing wrong? How can I fix it?
I have to use MongoDB because the data is too big to be handled in Python.
Thank you in advance.
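One possible cause, assuming `reader` is a `csv.DictReader`: every cell arrives as a string, so `info` is stored as text rather than as a sub-document. In that case `$info.money` resolves to nothing and `$sum` ignores non-numeric values, which gives 0. A minimal sketch of decoding the column before inserting follows; the `build_documents` helper and the sample CSV are hypothetical and only stand in for the real loader.

```python
import csv
import io
import json


def build_documents(reader, header=("id", "info")):
    """Copy the chosen CSV fields, decoding the JSON text in 'info' into a dict."""
    docs = []
    for each in reader:
        row = {field: each[field] for field in header}
        # csv returns every cell as str; decode it so MongoDB stores a sub-document
        row["info"] = json.loads(row["info"])
        docs.append(row)
    return docs


# Hypothetical sample standing in for the real CSV file
sample = io.StringIO('id,info\n123,"{""money"": 3521, ""country"": ""japan""}"\n')
docs = build_documents(csv.DictReader(sample))

# Same pipeline as in the question; it only sums when info.money is a number
pipe = [{"$group": {"_id": "$id", "Totalmoney": {"$sum": "$info.money"}}}]

# With a live connection (not run here):
# db.segment.insert_many(docs)
# total_money = list(db.segment.aggregate(pipe))
```

If the documents are already in the collection as strings, they would need to be re-imported or converted before the pipeline can see `info.money`.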

Related

JSON data getting retrieved from mongodb with formats added explicitly inside field e.g.({"field": {$numberInt: "20"}}). How to process that data?

I used mongoimport to import data into MongoDB from CSV files. I am trying to retrieve the data from a MongoDB Realm service. The returned data for one entry is as follows:
{
"_id": "6124edd04543fb222e",
"Field1": "some string",
"Field2": {
"$numberDouble": "145.81"
},
"Field3": {
"$numberInt": "0"
},
"Field4": {
"$numberInt": "15"
},
"Field5": {
"$numberInt": "0"
}
}
How do I convert this into normal JSON by removing $numberInt and $numberDouble like :
{
"_id": "6124edd04543fb222e",
"Field1": "some string",
"Field2": 145.81,
"Field3": 0,
"Field4": 15,
"Field5": 0
}
The fields also differ between documents, so I cannot use Mongoose directly. Are there any solutions to this?
It would also help to know why the numbers are being stored as $numberInt:"".
Edit:
For anyone with the same problem, this is how I solved it.
As the upvoted answer says, the array of documents is in EJSON format rather than plain JSON. To convert it back into normal JSON, I first used JSON.stringify to turn each document from the map function into a string, then parsed it with EJSON.parse using the {strict:false} option (this option is important).
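const plainRestaurants = restaurants.map((restaurant) => {
  // Round-trip through a string so EJSON.parse can unwrap $numberInt / $numberDouble
  return EJSON.parse(JSON.stringify(restaurant), {strict: false});
});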
See the EJSON.parse documentation for details. The module to install and import is mongodb-extjson.
The format with $numberInt etc. is called (MongoDB) Extended JSON.
You are getting it on the output side either because this is how you inserted your data (meaning the inserted data was incorrect and you need to fix the ingestion side), or because you requested Extended JSON serialization.
If the data in the database is correct, and you want non-extended JSON output, you generally need to write your own serializers to JSON since there are multiple possibilities of how to format the data. MongoDB's JSON output format is the Extended JSON you're seeing in your first quote.
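As a rough illustration of the "write your own serializer" route, here is a small Python sketch that unwraps the numeric wrappers shown in the question. The `unwrap_extended_json` name is hypothetical, and the sketch only handles `$numberInt`, `$numberLong`, and `$numberDouble`; Extended JSON defines other wrappers (dates, ObjectIds, and so on) that it leaves untouched.

```python
import json

# Wrapper keys handled by this sketch; other Extended JSON types are left as-is
_NUMERIC_WRAPPERS = {"$numberInt": int, "$numberLong": int, "$numberDouble": float}


def unwrap_extended_json(value):
    """Recursively replace {"$numberInt": "15"}-style wrappers with plain numbers."""
    if isinstance(value, dict):
        if len(value) == 1:
            (key, inner), = value.items()
            if key in _NUMERIC_WRAPPERS:
                return _NUMERIC_WRAPPERS[key](inner)
        return {k: unwrap_extended_json(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap_extended_json(v) for v in value]
    return value


doc = json.loads(
    '{"_id": "6124edd04543fb222e", "Field1": "some string",'
    ' "Field2": {"$numberDouble": "145.81"}, "Field4": {"$numberInt": "15"}}'
)
plain = unwrap_extended_json(doc)
print(json.dumps(plain))
```

Because the function walks whatever keys it finds, it does not depend on the documents sharing a schema.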

MySQL Column Containing JSON string in Text Data Type to Mongo DB

I am new to Mongo but need to use it for a project at work.
I need to move some data into MongoDB from a MySQL server. I can import the column from the MySQL server into Mongo, but the data stays in column form because it is a string stored in a text data type. Is there a way to parse the JSON from the column after importing into Mongo? I could use ColdFusion to convert each cell to a file, but there are millions of records, so I am looking for a better way. Thanks for the help!
Here is an example of one cell in the column that was imported from MySQL. This is not complete, there is a lot more there. Also the JSON in each cell is valid:
{"cfid": 131146, "noun": "Cart", "value": 3, "cftoken": "b0ccc0c923077c03-D2ACF941-5056-A000-51C81C89B1058012", "cursign": "$", "currency": "US", "urltoken": "CFID=131146&CFTOKEN=b0ccc0c923077c03-D2ACF941-5056-A000-51C81C89B1058012", "cartvalue": 4, "isloginok": false, "itemavail": {"1": "In Stock", "2": "In Stock", "3": "will contact you"}, "sessionid": "MENU_131146_b0ccc0c923077c03-D2ACF941-5056-A000-51C81C89B1058012", "alert_list": "", "cartitems1": {"Custom": false, "PartNo": "224225", "Checked": false, "Quantity": 1, "extprice": 39, "subtotal": 39, "ArrayName": "Session.CartItems1", "FromSaved": false, "unitprice": 39}
Format the data before inserting it into the database so that MongoDB can recognize it as a document.
If it is a string or comes from a file, read it and parse it first with something like the Document.parse() function.
You can also read the data from MySQL, put it into a map/JsonObject, and then insert it.
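A minimal Python sketch of that parse-then-insert approach, assuming the cells have already been fetched from MySQL as strings. The helper names `parse_cells` and `batched` and the batch size are hypothetical; the insert call is left as a comment because it needs a live connection.

```python
import json
from itertools import islice


def parse_cells(cells):
    """Decode each JSON string from the MySQL text column into a dict."""
    for cell in cells:
        yield json.loads(cell)


def batched(iterable, size):
    """Yield lists of at most `size` items so millions of rows are not held at once."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical cells standing in for rows read from MySQL
cells = [
    '{"cfid": 131146, "noun": "Cart", "value": 3}',
    '{"cfid": 131147, "noun": "Cart", "value": 1}',
    '{"cfid": 131148, "noun": "Cart", "value": 5}',
]
batches = list(batched(parse_cells(cells), 2))

# With a live pymongo collection (not run here):
# for batch in batched(parse_cells(cells), 1000):
#     collection.insert_many(batch)
```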

mongoDB ignoring unique index

I built a REST service and found that the JSON string Gson generates from an ObjectId has a different format from the one spring-boot produces. If I send the ObjectId of an existing document's _id field in the Gson format to my REST service and save it to the collection with MongoRepository's save function, a new document with a duplicated _id is still inserted, even though a unique index is set on that field. If I send the ObjectId in the format spring-boot produces, everything works perfectly. What causes this problem?
Generated by spring-boot:
"timestamp": 1558461711,
"machineIdentifier": 5077764,
"processIdentifier": 21816,
"counter": 13546695,
"date": "2019-05-21T18:01:51.000+0000",
"time": 1558461711000,
"timeSecond": 1558461711
Generated by GSON:
"counter": 13546695,
"randomValue1": 9256029,
"randomValue2": 856,
"timestamp": 1558461711
If you are working with MongoDB, it is better to use org.bson.Document (provided by the MongoDB dependency) or another MongoDB class to convert a document to JSON rather than GSON.
Document document = new Document();
document.put("_id", new ObjectId());
String json = document.toJson();
document.toJson() should stringify the ObjectId correctly.
The output of the code above will be:
{ "_id" : { "$oid" : "5ce51fb47dda11a8507087eb" } }
This is a valid format for MongoDB, though I am not sure how Spring Boot will react to it.
Anyway, hope it'll help.

How do I use ADF copy activity with multiple rows in source?

I have source which is JSON array, sink is SQL server. When I use column mapping and see the code I can see mapping is done to first element of array so each run produces single record despite the fact that source has multiple records. How do I use copy activity to import ALL the rows?
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"schemaMapping": {
"['#odata.context']": "BuyerFinancing",
"['#odata.nextLink']": "PropertyCondition",
"value[0].AssociationFee": "AssociationFee",
"value[0].AssociationFeeFrequency": "AssociationFeeFrequency",
"value[0].AssociationName": "AssociationName",
Use * as the source field to indicate all elements in json format. For example, with json:
{
"results": [
{"field1": "valuea", "field2": "valueb"},
{"field1": "valuex", "field2": "valuey"}
]
}
and a database table with a column named result to store the JSON, a mapping with results as the collection and * as the sub-element will create two records with:
{"field1": "valuea", "field2": "valueb"}
{"field1": "valuex", "field2": "valuey"}
in the result field.
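To make the effect of that mapping concrete, here is a Python sketch of the same flattening. It only illustrates the resulting rows and is not ADF configuration; the `cross_apply` function name is hypothetical.

```python
import json


def cross_apply(payload, collection_key):
    """Return one JSON string per element of the array at payload[collection_key]."""
    return [json.dumps(item) for item in payload[collection_key]]


payload = {
    "results": [
        {"field1": "valuea", "field2": "valueb"},
        {"field1": "valuex", "field2": "valuey"},
    ]
}

# One entry per array element, each destined for the `result` column
rows = cross_apply(payload, "results")
for row in rows:
    print(row)
```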
ADF supports cross-apply for JSON arrays. Please check the example in this doc: https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#jsonformat-example
For schema mapping: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#schema-mapping

Cloudant Query: Access data in nested array AND sort by if-statement

I have a question pertaining to how to query in cloudant with (1) a nested array and (2) sort by using an if-statement. I have been to the following websites, but I still need assistance.
https://docs.cloudant.com/guides/cloudant-query.html
https://docs.cloudant.com/api/cloudant-query.html
I want to return the documents that satisfy the following pseudocode:
if event[i].type == "check_in" then sort by event[i].time
Here is a snippet of the JSON data structure that I am using.
{
"status" : "active",
"event": [
{
"type": "check_in",
"time": "11/19/2014 15:34:12"
},
{
"type": "check_out",
"time": "11/20/2014 17:54:22"
}
]
}
Here are some questions I have that may break this problem down:
(1) How can I access event[0].type data?
(2) How can I loop through the entire event array inside a Cloudant Query and check whether event[i].type == "check_in" for each object in the event array?
(3) How can I sort on the timestamp data (assume it is an integer for simplicity)?
(4) What format does the timestamp have to be in for me to sort it in a Cloudant Query?
Could you help point me in the right direction to help accomplish this? Please let me know if you need more information.
Cloudant Query doesn't seem to support indexing individual array elements and querying against them.
You can get the same result by creating a view that indexes each array element and then querying that view.
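The logic such a view would implement can be sketched in Python. This is only a stand-in: a real Cloudant view map function is written in JavaScript, and the `map_check_ins` name and the `_id` values below are hypothetical. The sketch also converts the timestamp to ISO 8601, since the MM/DD/YYYY strings in the sample do not sort chronologically as plain text.

```python
from datetime import datetime


def map_check_ins(doc):
    """Yield (key, value) pairs the way a view map function would emit them."""
    for event in doc.get("event", []):
        if event.get("type") == "check_in":
            # ISO 8601 strings sort in chronological order
            when = datetime.strptime(event["time"], "%m/%d/%Y %H:%M:%S")
            yield when.isoformat(), doc["_id"]


docs = [
    {"_id": "a", "status": "active", "event": [
        {"type": "check_in", "time": "11/19/2014 15:34:12"},
        {"type": "check_out", "time": "11/20/2014 17:54:22"},
    ]},
    {"_id": "b", "status": "active", "event": [
        {"type": "check_in", "time": "02/03/2014 08:00:00"},
    ]},
]

# A view keeps its rows ordered by key; sorted() plays that role here
index = sorted(pair for doc in docs for pair in map_check_ins(doc))
```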