How to query a nested JSON structure in MongoDB/Mongoid

The following are documents from the MongoDB database.
I want to select documents where the year is 2007 or 2008,
the key includes "Actual" or "Upper End of Range",
and the table_name equals "Unemployment rate".
How can I do this with a Mongoid or MongoDB query?
Or can I only do it in the application layer, e.g. in Ruby or Python?
id 2012-04-25_unemployment_rate
{
"_id": "2012-04-25_unemployment_rate",
"table_name": "Unemployment rate",
"unit": "Percent",
"data": [
{
"2007": [
{
"Actual": "3.5"
},
{
"Upper End of Range": "-"
},
{
"Upper End of Central Tendency": "-"
},
{
"Lower End of Central Tendency": "-"
},
{
"Lower End of Range": "-"
}
]
},
{
"2008": [
{
"Actual": "1.7"
},
{
"Upper End of Range": "-"
},
{
"Upper End of Central Tendency": "-"
},
{
"Lower End of Central Tendency": "-"
},
{
"Lower End of Range": "-"
}
]
}
]
}
id 2014-04-25_unemployment_rate
{
"_id": "2014-04-25_unemployment_rate",
"table_name": "Unemployment rate",
"unit": "Percent",
"data": [
{
"2008": [
{
"Actual": "3.5"
},
{
"Upper End of Range": "-"
},
{
"Upper End of Central Tendency": "-"
},
{
"Lower End of Central Tendency": "-"
},
{
"Lower End of Range": "-"
}
]
},
{
"2009": [
{
"Actual": "1.7"
},
{
"Upper End of Range": "-"
},
{
"Upper End of Central Tendency": "-"
},
{
"Lower End of Central Tendency": "-"
},
{
"Lower End of Range": "-"
}
]
}
]
}

You don't select documents by keys; you select documents by values. You should restructure your documents to have fields like "year" : 2007. For example,
{
"_id": "2012-04-25_unemployment_rate",
"table_name": "Unemployment rate",
"unit": "Percent",
"data": [
{
"year" : 2007,
"Actual": "3.5",
"Upper End of Range": "-",
"Upper End of Central Tendency": "-",
"Lower End of Central Tendency": "-",
"Lower End of Range": "-"
}
]
}
I'm not sure what you mean by the condition that "key includes 'Actual' or 'Upper End of Range'", but if you want documents with a data element with year 2007 or 2008 and table_name equal to "Unemployment rate", use the query spec
{ "table_name" : "Unemployment rate", "data.year" : { "$in" : [2007, 2008] } }

Related

mongodb update document from first element of array

Consider a collection client with the following documents:
[
{
"id": 1,
"Name": "Susie",
"ownership" : {
"ownershipContextCode" : "C1"
},
"clientIds": [
{
"clientClusterCode": "clientClusterCode_1",
"clientId": "11"
}
]
},
{
"id": 2,
"Name": "John",
"ownership" : {
"ownershipContextCode" : "C2"
},
"clientIds": [
{
"clientClusterCode": "clientClusterCode_2",
"clientId": "22"
}
]
}
]
I am attempting to set a field (ownershipClientCode) based on the first element of the clientIds array.
The result should look like this:
[
{
"id": 1,
"Name": "Susie",
"ownership" : {
"ownershipContextCode" : "C1",
"ownershipClientCode" : "clientClusterCode_1"
},
"clientIds": [
{
"clientClusterCode": "clientClusterCode_1",
"clientId": "11"
}
]
},
{
"id": 2,
"Name": "John",
"ownership" : {
"ownershipContextCode" : "C2",
"ownershipClientCode" : "clientClusterCode_2"
},
"clientIds": [
{
"clientClusterCode": "clientClusterCode_2",
"clientId": "22"
}
]
}
]
I'm using this query, but I can't get a sub-field from the first element of the array:
db.collection.aggregate([
{
$addFields: {
"Last Semester": {
"$arrayElemAt": [
"$clientIds",
0
]
}
}
}
])
This query adds the whole object, but I want only the clientClusterCode field.
Something like this:
db.collection.aggregate([
{
$addFields: {
"Last Semester": {
"$arrayElemAt": [
"$clientIds",
0
].clientClusterCode
}
}
}
])
I'm using MongoDB 4.0.0.
You're very close: https://mongoplayground.net/p/HY1Pj0P4z12
db.collection.aggregate([
{
$addFields: {
"ownership.ownershipClientCode": {
"$arrayElemAt": [
"$clientIds.clientClusterCode",
0
]
}
}
}
])
You can use dot notation within $arrayElemAt as well as when defining the field name.
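For instance, in the first sample document, "$clientIds.clientClusterCode" first resolves to the array of clientClusterCode values, and $arrayElemAt then picks index 0. A small sketch (allCodes and firstCode are hypothetical output field names):
db.collection.aggregate([
  {
    $project: {
      allCodes: "$clientIds.clientClusterCode",                            // [ "clientClusterCode_1" ]
      firstCode: { "$arrayElemAt": [ "$clientIds.clientClusterCode", 0 ] } // "clientClusterCode_1"
    }
  }
])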
To set the field directly, do something like this (using an aggregation pipeline in the update): https://mongoplayground.net/p/js-usEJSH_A
db.collection.update({},
[
{
$set: {
"ownership.ownershipClientCode": {
"$arrayElemAt": [
"$clientIds.clientClusterCode",
0
]
}
}
}
],
{
multi: true
})
Note: the update in the second method needs to be an array so that it runs as an aggregation pipeline.
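On MongoDB 4.2 or newer (pipeline-style updates are not available before 4.2), the same update can also be written with updateMany, which makes the multi: true option unnecessary. A minimal sketch:
db.collection.updateMany(
  {},
  [
    {
      $set: {
        "ownership.ownershipClientCode": {
          "$arrayElemAt": [ "$clientIds.clientClusterCode", 0 ]
        }
      }
    }
  ]
)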

Need to fetch a phone number from collection abc in MongoDB

{
"id": "1234",
"applicant": [
{
"phone": [
{
"prirotynumber": "1",
"areacode": "407",
"linenumber": "1234",
"exchangenumber": "7899"
},
{
"prirotynumber": "27",
"areacode": "407",
"linenumber": "1234",
"exchangenumber": "79999"
}
]
}
]
}
For this id = 1234 I need to fetch homephonenumber as applicant.phone.areacode + applicant.phone.linenumber + applicant.phone.exchangenumber if prirotynumber = 1,
and
cellphone as applicant.phone.areacode + applicant.phone.linenumber + applicant.phone.exchangenumber if prirotynumber = 27.
Expected result here:
{
"key":"value"
}
If this isn't what you need, please clarify your expected result with correct sample data.
db.collection.aggregate([
{
"$match": {
"id": "1234",
"applicant.phone.prirotynumber": "1"
}
},
{
"$unwind": "$applicant"
},
{
"$unwind": "$applicant.phone"
},
{
"$match": {
"applicant.phone.prirotynumber": "1"
}
},
{
"$set": {
"homePhoneNumber ": {
$concat: [
"$applicant.phone.areacode",
"-",
"$applicant.phone.linenumber",
"-",
"$applicant.phone.exchangenumber"
]
}
}
}
])
mongoplayground
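If both numbers are needed in a single result document, a variant without a second $unwind could use $filter to pick each priority from the phone array. This is only a sketch; the cellPhoneNumber field name is my own choice, mirroring the question:
db.collection.aggregate([
  { "$match": { "id": "1234" } },
  { "$unwind": "$applicant" },
  {
    "$set": {
      "homePhoneNumber": {
        "$let": {
          "vars": {
            "p": {
              "$arrayElemAt": [
                { "$filter": { "input": "$applicant.phone", "cond": { "$eq": [ "$$this.prirotynumber", "1" ] } } },
                0
              ]
            }
          },
          "in": { "$concat": [ "$$p.areacode", "-", "$$p.linenumber", "-", "$$p.exchangenumber" ] }
        }
      },
      "cellPhoneNumber": {
        "$let": {
          "vars": {
            "p": {
              "$arrayElemAt": [
                { "$filter": { "input": "$applicant.phone", "cond": { "$eq": [ "$$this.prirotynumber", "27" ] } } },
                0
              ]
            }
          },
          "in": { "$concat": [ "$$p.areacode", "-", "$$p.linenumber", "-", "$$p.exchangenumber" ] }
        }
      }
    }
  }
])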

Optimizing MongoDB aggregate query on large Index objects

I have 20 million documents in my MongoDB collection, currently running on an M30 MongoDB instance with 7.5 GB RAM and 40 GB disk.
Data is stored in the collection like this:
{
_id:xxxxx,
id : 1 (int),
from : xxxxxxxx (int),
to : xxxxxx (int),
status : xx (int)
.
.
.
.
},
{
_id:xxxxx,
id : 2 (int),
from : xxxxxxxx (int),
to : xxxxxx (int),
status : xx (int)
.
.
.
.
}
.
.
.
. and so on..
id is a unique index and from is an index in this collection.
I am running a query that groups on 'to' and returns the max id per group, sorted by that max id, for a given 'from' condition:
$collection->aggregate([
['$project' => ['id'=>1,'to'=>1,'from'=>1]],
[ '$match'=> [
'$and'=>
[
[ 'from'=> xxxxxxxxxx],
[ 'status'=> xx ],
]
]
],
['$group' => [
'_id' =>
'$to',
'max_revision'=>['$max' => '$id'],
]
],
['$sort' => ['max_revision' => -1]],
['$limit' => 20],
]);
The above query runs just fine (~2 sec) on a small data set for the indexed from field, e.g. when 50-100k documents in the collection share the same 'from' value. But when, for example, 2M documents share the same 'from' value, it takes more than 10 seconds to execute and return the result.
A quick example:
Case 1: the same query runs in under 2 seconds when executed with from = 12345, as 12345 is present 50k times in the collection.
Case 2: the query takes over 10 seconds when executed with from = 98765, as 98765 is present 2M times in the collection.
Edit: the explain output for the query is below:
{
"command": {
"aggregate": "mycollection",
"pipeline": [
{
"$project": {
"id": 1,
"to": 1,
"from": 1
}
},
{
"$match": {
"$and": [
{
"from": {
"$numberLong": "12345"
}
},
{
"status": 22
}
]
}
},
{
"$group": {
"_id": "$to",
"max_revision": {
"$max": "$id"
}
}
},
{
"$sort": {
"max_revision": -1
}
},
{
"$limit": 20
}
],
"allowDiskUse": false,
"cursor": {},
"$db": "mongo_jc",
"lsid": {
"id": {
"$binary": "8LktsSkpTjOzF3GIC+m1DA==",
"$type": "03"
}
},
"$clusterTime": {
"clusterTime": {
"$timestamp": {
"t": 1597230985,
"i": 1
}
},
"signature": {
"hash": {
"$binary": "PHh4eHh4eD4=",
"$type": "00"
},
"keyId": {
"$numberLong": "6859724943999893507"
}
}
}
},
"planSummary": [
{
"IXSCAN": {
"from": 1
}
}
],
"keysExamined": 1246529,
"docsExamined": 1246529,
"hasSortStage": 1,
"cursorExhausted": 1,
"numYields": 9747,
"nreturned": 0,
"queryHash": "29DAFB9E",
"planCacheKey": "F5EBA6AE",
"reslen": 231,
"locks": {
"ReplicationStateTransition": {
"acquireCount": {
"w": 9847
}
},
"Global": {
"acquireCount": {
"r": 9847
}
},
"Database": {
"acquireCount": {
"r": 9847
}
},
"Collection": {
"acquireCount": {
"r": 9847
}
},
"Mutex": {
"acquireCount": {
"r": 100
}
}
},
"storage": {
"data": {
"bytesRead": {
"$numberLong": "6011370213"
},
"timeReadingMicros": 4350129
},
"timeWaitingMicros": {
"cache": 2203
}
},
"protocol": "op_msg",
"millis": 8548
}
For this specific case the mongod query executor can use an index for the initial match, but not for the sort.
If you were to reorder and modify the stages a bit, it could use an index on {from:1, status:1, id:1} for both matching and sorting:
$collection->aggregate([
[ '$match'=> [
'$and'=>
[
[ 'from'=> xxxxxxxxxx],
[ 'status'=> xx ],
]
]
],
['$sort' => ['id' => -1]],
['$project' => ['id'=>1,'to'=>1,'from'=>1]],
['$group' => [
'_id' => '$to',
'max_revision'=>['$first' => '$id'],
]
],
['$limit' => 20],
]);
This way it should be able to combine the $match and $sort stages into a single index scan.
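For completeness, the compound index suggested above could be created like this in the mongo shell (collection name taken from the explain output; adjust to your own collection):
db.mycollection.createIndex({ from: 1, status: 1, id: 1 })
The descending $sort on id can still use this index, since an index can be traversed in reverse once the equality conditions on from and status fix the prefix.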

Upsert KV pair in subdocument for specific rules

How do I update a document and insert a key-value pair into a subdocument matching specific rules?
MongoDB version: 3.4
Use this shell command to insert the sample data:
db.country.insertMany([{"_id":"us","groups":[{"group":"1"},{"group":"2"} ]},{"_id":"eu","groups":[{"group":"1"},{"group":"2"}]}, {"_id":"jp","groups":[{"group":"2"}]}])
Original data:
db.country.find()
{
"_id": "us", "groups": [ { "group" : "1" }, { "group": "2" } ]
}
{
"_id": "eu", "groups": [ { "group" : "1" }, { "group" : "2" } ]
}
{
"_id": "jp", "groups": [ { "group" : "2" } ]
}
How do I get this result? (just add status: happy to group 1)
{
"_id": "us", "groups": [ { "group" : "1", "status": "happy" }, { "group": "2" } ]
}
{
"_id": "eu", "groups": [ { "group" : "1", "status": "happy" }, { "group" : "2" } ]
}
{
"_id": "jp", "groups": [ { "group" : "2" } ]
}
I know how to select all groups that match group=1
db.country.aggregate([
{'$unwind': '$groups'},
{'$match': {'groups.group': '1'}} ,
{'$project': {'group': '$groups.group', _id:0 }}
])
{ "group" : "1" }
{ "group" : "1" }
and I also know how to use update + $set, like this:
// { "_id": 1, "people": {"name": "tony" } }
db.test.update({_id: 1}, { $set: {'people.country': 'taiwan'}})
// { "_id": 1, "people": {"name": "tony" , "country": "taiwan" } }
But how do I combine update + $set with the aggregation? Please help me.
pymongo is OK for me.
How do I get this result? (just add status: happy to group 1)
Use the positional operator $ to refer to the position of the matched sub-document in the array.
db.coll.update_many({'groups.group':'1'}, {'$set': {'groups.$.status': 'happy'}})
See the positional $ operator documentation for more details.
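For reference, a minimal mongo shell equivalent of that update, using the country collection from the question:
db.country.updateMany(
  { "groups.group": "1" },
  { "$set": { "groups.$.status": "happy" } }
)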

Order by date in sub-document and then by document

I have a simple "Event" mongo schema. Two sample documents are below:
Event Document #1
{
"event_name": "Some nice event",
"venues": [
{
"venue_name": "venue #1",
"shows": [
{
"show_time": "2014-06-18T07:46:02.415Z",
"capacity": 20
},
{
"show_time": "2014-06-20T07:46:02.415Z",
"capacity": 40
}
]
},
{
"venue_name": "venue #2",
"shows": [
{
"show_time": "2014-06-17T07:46:02.415Z",
"capacity": 20
},
{
"show_time": "2014-06-24T07:46:02.415Z",
"capacity": 40
}
]
}
]
}
Event Document #2
{
"event_name": "Another nice event",
"venues": [
{
"venue_name": "venue #1",
"shows": [
{
"show_time": "2014-06-19T07:46:02.415Z",
"capacity": 20
},
{
"show_time": "2014-06-16T07:46:02.415Z",
"capacity": 40
}
]
}
]
}
I need to query this collection of event documents and fetch the events with the closest shows, with respect to a particular time.
So, for example, if I had to find events happening on or after 16 Jun, I should get document #2 followed by document #1, with document #1's venue sub-documents ordered as [venue #2, venue #1].
On the other hand, if I wanted events happening on or after 18 Jun, I should get document #1, with [venue #1, venue #2], followed by document #2.
Essentially, I need to be able to sort by the show_time of the nested sub-documents, and this sorting should work across multiple venue sub-documents.
According to mongo's documentation, this doesn't seem to be supported, so is there a way of using aggregation to achieve this?
Or is there a way to rejig the schema to support such queries?
Or is mongoDB the wrong use-case for such scenarios altogether?
Really good question. Hopefully your dates are real dates, but the lexical form should not really matter here. The following should do it, as long as you take the dates into consideration:
db.event.aggregate([
// Match the "documents" that meet the condition
{ "$match": {
"venues.shows.show_time": { "$gte": new Date("2014-06-16") }
}},
// Unwind the arrays
{ "$unwind": "$venues" },
{ "$unwind": "$venues.shows" },
// Sort the entries just to float the nearest result
{ "$sort": { "venues.shows.show_time": 1 } },
// Find the "earliest" for the venue while grouping
{ "$group": {
"_id": {
"_id": "$_id",
"event_name": "$event_name",
"venue_name": "$venues.venue_name"
},
"shows": {
"$push": "$venues.shows"
},
"earliest": {
"$min": {
"$cond": [
{ "$gte": [
"$venues.shows.show_time",
new Date("2014-06-16")
]},
"$venues.shows.show_time",
null
]
}
}
}},
// Sort those because of the order you want
{ "$sort": { "earliest": 1 } },
// Group back and with the "earliest" document
{ "$group": {
"_id": "$_id._id",
"event_name": { "$first": "$_id.event_name" },
"venues": {
"$push": {
"venue_name": "$_id.venue_name",
"shows": "$shows"
}
},
"earliest": {
"$min": {
"$cond": [
{ "$gte": [
"$earliest",
new Date("2014-06-16")
]},
"$earliest",
null
]
}
}
}},
// Sort by the earliest document
{ "$sort": { "earliest": 1 } },
// Project the fields
{ "$project": {
"event_name": 1,
"venues": 1
}}
])
So most of this looks reasonably straightforward if you have some experience with the aggregation framework. If not, there is some general explanation here, plus there are some "funky" things happening as we evaluate further.
The first steps in aggregation are to $match just like any normal query and then to $unwind the arrays you want to process. The "unwind" statement effectively "de-normalizes" the documents contained in the array to be standard documents by themselves.
The next $sort ends up as a "prettying up" function as the "earliest" event in each "set" will be at the top as a result.
As there are "two" levels of arrays, you do the grouping in two stages via the $group pipeline stage.
The first $group "groups" by "document", "event_name" and "venue". All of the shows are put back into their original array form, but at this time we extract the $min value for the "show_time".
The value taken is not just the ordinary "minimal" value. Here we use the $cond operator to make sure that the value returned must be "greater than or equal to" the date that you were requesting in the query initially. This makes sure that any "earlier" values are not taken into consideration when "sorting".
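To make that concrete: for venue #2 of "Some nice event", if the query date were 18 Jun (as in the second run shown further below), the accumulator would behave roughly like this; the arrow comments are my own walkthrough, not pipeline output:
"earliest": {
  "$min": {
    "$cond": [
      { "$gte": [ "$venues.shows.show_time", new Date("2014-06-18") ] },
      "$venues.shows.show_time",   // the 24 Jun show passes the test and keeps its date
      null                         // the 17 Jun show maps to null, which $min skips
    ]
  }
}
// => "earliest" for that venue group becomes ISODate("2014-06-24T07:46:02.415Z")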
The next thing to do is to $sort on that "earliest" date, to keep the entries for the "venues" in order. The following stages then do the same as above, but "grouping" back to the original documents this time, then finally "sorting" by whichever document has the "earliest" qualifying "show_time".
The result from the dates shown as input would be your desired result for the 16th:
{
"_id" : ObjectId("53a95263a1923f45a6c2d3dd"),
"event_name" : "Another nice event",
"venues" : [
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-16T07:46:02.415Z"),
"capacity" : 40
},
{
"show_time" : ISODate("2014-06-19T07:46:02.415Z"),
"capacity" : 20
}
]
}
]
}
{
"_id" : ObjectId("53a952b5a1923f45a6c2d3de"),
"event_name" : "Some nice event",
"venues" : [
{
"venue_name" : "venue #2",
"shows" : [
{
"show_time" : ISODate("2014-06-17T07:46:02.415Z"),
"capacity" : 20
},
{
"show_time" : ISODate("2014-06-24T07:46:02.415Z"),
"capacity" : 40
}
]
},
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-18T07:46:02.415Z"),
"capacity" : 20
},
{
"show_time" : ISODate("2014-06-20T07:46:02.415Z"),
"capacity" : 40
}
]
}
]
}
And by changing the input to the 18th you also get the desired result:
{
"_id" : ObjectId("53a952b5a1923f45a6c2d3de"),
"event_name" : "Some nice event",
"venues" : [
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-18T07:46:02.415Z"),
"capacity" : 20
},
{
"show_time" : ISODate("2014-06-20T07:46:02.415Z"),
"capacity" : 40
}
]
},
{
"venue_name" : "venue #2",
"shows" : [
{
"show_time" : ISODate("2014-06-17T07:46:02.415Z"),
"capacity" : 20
},
{
"show_time" : ISODate("2014-06-24T07:46:02.415Z"),
"capacity" : 40
}
]
}
]
}
{
"_id" : ObjectId("53a95263a1923f45a6c2d3dd"),
"event_name" : "Another nice event",
"venues" : [
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-16T07:46:02.415Z"),
"capacity" : 40
},
{
"show_time" : ISODate("2014-06-19T07:46:02.415Z"),
"capacity" : 20
}
]
}
]
}
Also, if you want to go further with this, just add an additional $match stage after the $unwind stages; that filters out the individual "shows" that occur before the date requested in the query:
db.event.aggregate([
{ "$match": {
"venues.shows.show_time": { "$gte": new Date("2014-06-18") }
}},
{ "$unwind": "$venues" },
{ "$unwind": "$venues.shows" },
{ "$match": {
"venues.shows.show_time": { "$gte": new Date("2014-06-18") }
}},
{ "$sort": { "venues.shows.show_time": 1 } },
{ "$group": {
"_id": {
"_id": "$_id",
"event_name": "$event_name",
"venue_name": "$venues.venue_name"
},
"shows": {
"$push": "$venues.shows"
},
"earliest": {
"$min": {
"$cond": [
{ "$gte": [
"$venues.shows.show_time",
new Date("2014-06-18")
]},
"$venues.shows.show_time",
null
]
}
}
}},
{ "$sort": { "earliest": 1 } },
{ "$group": {
"_id": "$_id._id",
"event_name": { "$first": "$_id.event_name" },
"venues": {
"$push": {
"venue_name": "$_id.venue_name",
"shows": "$shows"
}
},
"earliest": {
"$min": {
"$cond": [
{ "$gte": [
"$earliest",
new Date("2014-06-18")
]},
"$earliest",
null
]
}
}
}},
{ "$sort": { "earliest": 1 } },
{ "$project": {
"event_name": 1,
"venues": 1
}}
])
With the result:
{
"_id" : ObjectId("53a952b5a1923f45a6c2d3de"),
"event_name" : "Some nice event",
"venues" : [
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-18T07:46:02.415Z"),
"capacity" : 20
},
{
"show_time" : ISODate("2014-06-20T07:46:02.415Z"),
"capacity" : 40
}
]
},
{
"venue_name" : "venue #2",
"shows" : [
{
"show_time" : ISODate("2014-06-24T07:46:02.415Z"),
"capacity" : 40
}
]
}
]
}
{
"_id" : ObjectId("53a95263a1923f45a6c2d3dd"),
"event_name" : "Another nice event",
"venues" : [
{
"venue_name" : "venue #1",
"shows" : [
{
"show_time" : ISODate("2014-06-19T07:46:02.415Z"),
"capacity" : 20
}
]
}
]
}