How to sort before querying in an embedded document - MongoDB

I know how to sort an embedded document after find returns its results, but how do I sort before the find, so that the query itself runs on the sorted array? I know this must be possible with aggregate, but I would really like to know whether it is possible without it, so that I better understand how this works.
This is my embedded document:
"shipping_charges" : [
{
"region" : "region1",
"weight" : 500,
"rate" : 10
},
{
"region" : "Bangalore HQ",
"weight" : 200,
"rate" : 40
},
{
"region" : "region2",
"weight" : 1500,
"rate" : 110
},
{
"region" : "region3",
"weight" : 100,
"rate" : 50
},
{
"region" : "Bangalore HQ",
"weight" : 100,
"rate" : 150
}
]
This is the query I use to match the 'region' and the 'weight' to get the pricing for that match:
db.clients.find( { "shipping_charges.region" : "Bangalore HQ" , "shipping_charges.weight" : { $gte : 99 } }, { "shipping_charges.$" : 1 } ).pretty()
This query currently returns:
{
"shipping_charges" : [
{
"region" : "Bangalore HQ",
"weight" : 200,
"rate" : 40
}
]
}
It probably returns this element because of the order in which the elements appear (and match) in the embedded document.
But I want it to return the last element, which best matches the closest weight slab (100 grams).
What changes does my existing query need so that the embedded document is sorted before the find runs on it and I get the results I want?
If you are sure this can't be done without map-reduce (MPR), let me know so that I can stay away from this method and focus only on MPR to get the desired results.

You can use an aggregation pipeline instead of map-reduce:
db.clients.aggregate([
// Filter the docs to what we're looking for.
{$match: {
'shipping_charges.region': 'Bangalore HQ',
'shipping_charges.weight': {$gte: 99}
}},
// Duplicate the docs, once per shipping_charges element
{$unwind: '$shipping_charges'},
// Filter again to get the candidate shipping_charges.
{$match: {
'shipping_charges.region': 'Bangalore HQ',
'shipping_charges.weight': {$gte: 99}
}},
// Sort those by weight, ascending.
{$sort: {'shipping_charges.weight': 1}},
// Regroup and take the first shipping_charge which will be the one closest to 99
// because of the sort.
{$group: {_id: '$_id', shipping_charges: {$first: '$shipping_charges'}}}
])
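For intuition, the stages above can be mirrored in plain JavaScript on a single document. This is only a sketch of what the pipeline computes, not MongoDB code; the helper name closestCharge is made up for illustration, and the data is the shipping_charges array from the question.

```javascript
// Plain-JavaScript sketch (not MongoDB API): mimics $unwind + $match + $sort + $first
// for one document. closestCharge is a hypothetical helper name.
const shippingCharges = [
  { region: "region1", weight: 500, rate: 10 },
  { region: "Bangalore HQ", weight: 200, rate: 40 },
  { region: "region2", weight: 1500, rate: 110 },
  { region: "region3", weight: 100, rate: 50 },
  { region: "Bangalore HQ", weight: 100, rate: 150 },
];

function closestCharge(charges, region, minWeight) {
  const candidates = charges
    // Second $match: keep only elements that satisfy both conditions themselves.
    .filter((c) => c.region === region && c.weight >= minWeight)
    // $sort: ascending by weight (filter returned a copy, so the input is untouched).
    .sort((a, b) => a.weight - b.weight);
  // $first: the lightest candidate, or null when nothing matched.
  return candidates.length > 0 ? candidates[0] : null;
}

console.log(closestCharge(shippingCharges, "Bangalore HQ", 99));
```

With the question's data this selects the "Bangalore HQ" element with weight 100 and rate 150, which is the element the asker wants.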
You could also use find, but you'd need to pre-sort the shipping_charges array by weight in the documents themselves. You can do that by using a $push update with the $sort modifier:
db.clients.update({}, {
$push: {shipping_charges: {$each: [], $sort: {weight: 1}}}
}, {multi: true})
After doing that, your existing query will return the right element:
db.clients.find({
"shipping_charges.region" : "Bangalore HQ",
"shipping_charges.weight" : { $gte : 99 }
}, { "shipping_charges.$" : 1 } )
You would, of course, need to consistently include the $sort modifier on any further updates to your docs' shipping_charges array to ensure it stays sorted.
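As a rough illustration of why every later update needs the $sort modifier, here is a plain-JavaScript sketch of what $push with $each and $sort does to the array. It is not MongoDB code; pushSorted is a hypothetical helper name, and the added element (weight 150, rate 90) is made-up data.

```javascript
// Plain-JavaScript sketch (not MongoDB API) of $push with $each + $sort: {weight: 1}.
// pushSorted is a hypothetical helper; it returns a new, sorted array.
function pushSorted(charges, newCharges) {
  return charges
    .concat(newCharges) // $each: append the new elements (possibly none)
    .sort((a, b) => a.weight - b.weight); // $sort: {weight: 1}
}

let charges = [
  { region: "Bangalore HQ", weight: 200, rate: 40 },
  { region: "Bangalore HQ", weight: 100, rate: 150 },
];

// Like $each: [] in the update above: nothing is added, the array is only re-sorted.
charges = pushSorted(charges, []);

// A later insert (made-up element) goes through the same helper, so order is kept.
charges = pushSorted(charges, [{ region: "Bangalore HQ", weight: 150, rate: 90 }]);

console.log(charges.map((c) => c.weight));
```

If a later element were appended without re-sorting, the first matching element would again depend on insertion order.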

Related

How do I access values in nested documents in MongoDB?

I am trying to find the average sell amount for each console: Xbox, PS4, and Wii. I am working with nested documents, and I tried to filter on type "sell" using db.console.find({"market.type":"sell"}); but I end up getting the "online" type values as well.
Document 1:
_id:("111111111111111111111111")
market:Array
0:Object
type:"sell"
console:"Xbox"
amount:399
1:Object
type:"online"
console:"PS4"
amount:359
2:Object
type:"sell"
console:"xbox"
amount:349
Because you need to filter the sub-documents within a document, a simple find will not work: it returns the whole matching document, including its non-matching sub-documents.
You have to use an aggregation pipeline, as below:
> db.st9.aggregate([
{
$unwind:"$market"
},
{
$match: {"market.type":"sell"}
},
{
$group: {_id:"$market.console", "avg": {$avg:"$market.amount"}, "count": {$sum:1}, "totalSum": {$sum: "$market.amount"} }
}
])
Output:
{ "_id" : "PS4", "avg" : 300, "count" : 1, "totalSum" : 300 }
{ "_id" : "Xbox", "avg" : 359, "count" : 3, "totalSum" : 1077 }
For more on the aggregation pipeline, see the official MongoDB documentation for:
$unwind
$match
$group
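To make the three stages concrete, here is a plain-JavaScript sketch of the same unwind, match, and group steps applied to the single sample document from the question. It is not MongoDB code, and averageSellByConsole is a hypothetical helper name.

```javascript
// Plain-JavaScript sketch (not MongoDB API) of $unwind -> $match -> $group with $avg.
// averageSellByConsole is a hypothetical helper name.
const docs = [
  {
    _id: "111111111111111111111111",
    market: [
      { type: "sell", console: "Xbox", amount: 399 },
      { type: "online", console: "PS4", amount: 359 },
      { type: "sell", console: "xbox", amount: 349 },
    ],
  },
];

function averageSellByConsole(documents) {
  const groups = new Map();
  documents
    .flatMap((doc) => doc.market) // $unwind: one entry per array element
    .filter((m) => m.type === "sell") // $match on the unwound element
    .forEach((m) => {
      // $group by console, accumulating a count and a running total
      const g = groups.get(m.console) || { count: 0, totalSum: 0 };
      g.count += 1;
      g.totalSum += m.amount;
      groups.set(m.console, g);
    });
  return [...groups].map(([id, g]) => ({
    _id: id,
    avg: g.totalSum / g.count,
    count: g.count,
    totalSum: g.totalSum,
  }));
}

console.log(averageSellByConsole(docs));
```

In this sketch "Xbox" and "xbox" end up in separate groups because the key comparison is case-sensitive, so the sample data would need consistent casing before the two sell entries could be averaged together.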

For each document retrieve object with $max field from array

I have the following documents in my collection. Each document contains historical weather data about a specific location:
{
'location':'new york',
'history':[
{'timestamp':1524542400, 'temp':79, 'wind_speed':1, 'wind_direction':'SW'},
{'timestamp':1524548400, 'temp':80, 'wind_speed':2, 'wind_direction':'SW'},
{'timestamp':1524554400, 'temp':82, 'wind_speed':3, 'wind_direction':'S'},
{'timestamp':1524560400, 'temp':78, 'wind_speed':4, 'wind_direction':'S'}
]
},
{
'location':'san francisco',
'history':[
{'timestamp':1524542400, 'temp':80, 'wind_speed':5, 'wind_direction':'SW'},
{'timestamp':1524548400, 'temp':81, 'wind_speed':6, 'wind_direction':'SW'},
{'timestamp':1524554400, 'temp':82, 'wind_speed':7, 'wind_direction':'S'},
{'timestamp':1524560400, 'temp':73, 'wind_speed':8, 'wind_direction':'S'}
]
},
{
'location':'miami',
'history':[
{'timestamp':1524542400, 'temp':84, 'wind_speed':9, 'wind_direction':'SW'},
{'timestamp':1524548400, 'temp':85, 'wind_speed':10, 'wind_direction':'SW'},
{'timestamp':1524554400, 'temp':86, 'wind_speed':11, 'wind_direction':'S'},
{'timestamp':1524560400, 'temp':87, 'wind_speed':12, 'wind_direction':'S'}
]
}
I would like to get a list of the most recent weather data for each location (more or less) like so:
{
'location':'new york',
'history':{'timestamp':1524560400, 'temp':78, 'wind_speed':4, 'wind_direction':'S'}
},
{
'location':'san francisco',
'history':{'timestamp':1524560400, 'temp':73, 'wind_speed':8, 'wind_direction':'S'}
},
{
'location':'miami',
'history':{'timestamp':1524560400, 'temp':87, 'wind_speed':12, 'wind_direction':'S'}
}
I was pretty sure it needed some sort of $group aggregate but can't figure out how to select an entire object by $max:<field>. For example the below query only returns the max timestamp itself, without any of the accompanying fields.
db.collection.aggregate([{
'$unwind': '$history'
}, {
'$group': {
'_id': '$location',
'timestamp': {
'$max': '$history.timestamp'
}
}
}])
returns
{ "_id" : "new york", "timestamp" : 1524560400 }
{ "_id" : "san francisco", "timestamp" : 1524560400 }
{ "_id" : "miami", "timestamp" : 1524560400 }
The actual collection and arrays are very large so client side processing won't be ideal. Any help would be much appreciated.
Well, as the author of the answer you found, I think we can actually do a bit better with modern MongoDB versions.
Single match per document
In short we can actually apply $max to your particular case, used with $indexOfArray and $arrayElemAt to extract the matched value:
db.collection.aggregate([
{ "$addFields": {
"history": {
"$arrayElemAt": [
"$history",
{ "$indexOfArray": [ "$history.timestamp", { "$max": "$history.timestamp" } ] }
]
}
}}
])
Which will return you:
{
"_id" : ObjectId("5ae9175564de8a00a66b3974"),
"location" : "new york",
"history" : {
"timestamp" : 1524560400,
"temp" : 78,
"wind_speed" : 4,
"wind_direction" : "S"
}
}
{
"_id" : ObjectId("5ae9175564de8a00a66b3975"),
"location" : "san francisco",
"history" : {
"timestamp" : 1524560400,
"temp" : 73,
"wind_speed" : 8,
"wind_direction" : "S"
}
}
{
"_id" : ObjectId("5ae9175564de8a00a66b3976"),
"location" : "miami",
"history" : {
"timestamp" : 1524560400,
"temp" : 87,
"wind_speed" : 12,
"wind_direction" : "S"
}
}
That is, of course, without needing to "group" anything: it simply finds the $max value within each document, as you seem to be trying to do. This avoids "mangling" any other document output by forcing it through a $group or indeed an $unwind.
In essence, $max returns the "maximum" value from the specified array property, since "$history.timestamp" is shorthand for extracting "just those values" from the objects in the array.
That value is then compared against the same "list of values" to determine the matching "index" via $indexOfArray, which takes an array as its first argument and the value to match as its second.
The $arrayElemAt operator also takes an array as its first argument; here we use the full "$history" array, since we want to extract the "full object", which we select by the "returned index" value from the $indexOfArray operator.
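The three operators can be pictured with ordinary array operations. The following plain-JavaScript sketch is not MongoDB code; latestEntry is a hypothetical helper name, and it assumes a non-empty history array.

```javascript
// Plain-JavaScript sketch (not MongoDB API) of $max + $indexOfArray + $arrayElemAt.
// latestEntry is a hypothetical helper name; it assumes history is non-empty.
const newYork = {
  location: "new york",
  history: [
    { timestamp: 1524542400, temp: 79, wind_speed: 1, wind_direction: "SW" },
    { timestamp: 1524548400, temp: 80, wind_speed: 2, wind_direction: "SW" },
    { timestamp: 1524554400, temp: 82, wind_speed: 3, wind_direction: "S" },
    { timestamp: 1524560400, temp: 78, wind_speed: 4, wind_direction: "S" },
  ],
};

function latestEntry(doc) {
  // "$history.timestamp": just the timestamp values, in array order
  const timestamps = doc.history.map((h) => h.timestamp);
  // $max: the largest timestamp
  const max = timestamps.reduce((a, b) => Math.max(a, b));
  // $indexOfArray: index of the first element equal to that maximum
  const index = timestamps.indexOf(max);
  // $arrayElemAt: the full object at that index
  return doc.history[index];
}

console.log(latestEntry(newYork));
```

For the "new york" document this returns the entry with timestamp 1524560400 and temp 78, matching the aggregation output shown above.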
"Multiple" matches per document
Of course that's fine for "single" matches, but if you wanted to expand that to "multiple" matches of the same $max value, then you would use $filter instead:
db.collection.aggregate([
{ "$addFields": {
"history": {
"$filter": {
"input": "$history",
"cond": { "$eq": [ "$$this.timestamp", { "$max": "$history.timestamp" } ] }
}
}
}}
])
Which would output:
{
"_id" : ObjectId("5ae9175564de8a00a66b3974"),
"location" : "new york",
"history" : [
{
"timestamp" : 1524560400,
"temp" : 78,
"wind_speed" : 4,
"wind_direction" : "S"
}
]
}
{
"_id" : ObjectId("5ae9175564de8a00a66b3975"),
"location" : "san francisco",
"history" : [
{
"timestamp" : 1524560400,
"temp" : 73,
"wind_speed" : 8,
"wind_direction" : "S"
}
]
}
{
"_id" : ObjectId("5ae9175564de8a00a66b3976"),
"location" : "miami",
"history" : [
{
"timestamp" : 1524560400,
"temp" : 87,
"wind_speed" : 12,
"wind_direction" : "S"
}
]
}
The main difference is that the "history" property is still an "array", since that is what $filter produces. Note also that if there were in fact "multiple" entries with the same timestamp value, this would return all of them and not just the "first index" matched.
Here the comparison is instead done against "each" array element, to see whether the timestamp of the "current" ( "$$this" ) object matches the $max result, ultimately returning only those array elements which satisfy the supplied condition.
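The $filter variant corresponds to an ordinary array filter. This plain-JavaScript sketch is not MongoDB code; latestEntries is a hypothetical helper name, and the sample document is made up so that two entries share the maximum timestamp.

```javascript
// Plain-JavaScript sketch (not MongoDB API) of $filter with $eq against $max.
// latestEntries is a hypothetical helper; the data below is invented for illustration.
const sample = {
  location: "example",
  history: [
    { timestamp: 10, temp: 70 },
    { timestamp: 30, temp: 71 },
    { timestamp: 30, temp: 72 },
    { timestamp: 20, temp: 73 },
  ],
};

function latestEntries(doc) {
  // $max over "$history.timestamp"
  const max = doc.history.reduce((m, h) => Math.max(m, h.timestamp), -Infinity);
  // $filter: keep every element whose timestamp equals the maximum ($$this.timestamp)
  return doc.history.filter((h) => h.timestamp === max);
}

console.log(latestEntries(sample));
```

Unlike the index-based version, the result is always an array, and here it holds both entries with timestamp 30.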
These are essentially your "modern" approaches, which avoid the overhead of $unwind, and indeed of $sort and $group, where they are not needed. They are not needed when dealing only with individual documents.
If, however, you really need to $group across "multiple documents" by a specific grouping key, taking into account values "inside" the array, then the initial approach you discovered is actually the right fit for that scenario, since you ultimately "must" $unwind to deal with items "inside" an array in that way, and also "across documents".
So be mindful to use stages like $group and $unwind only where you actually need them and where "grouping" is your actual intent. If you are just looking to find something "in the document", there are far more efficient ways to do it, without all the additional overhead those stages bring to processing.

MongoDB unwind and match vs. match and unwind

I'm looking to optimize MongoDB performance by minimizing the number of records to unwind.
Currently I do something like:
unwind(injectionRecords),
match("machineID" : "machine1"),
count(counter)
But because of the huge amount of data, the unwind operation takes a lot of time, and only afterwards is the match applied to the unwound result.
It unwinds all 4 records, then matches machineID against the result and gives me the count.
Instead, I would like to do something like:
match("machineID": "machine1"),
unwind(injectionRecords),
count(counter)
That way it would match the records with that machineID, unwind only 2 records instead of 4, and give me the count.
Is this possible? How can I do this?
Here are sample docs:
{
"_id" : ObjectId("5981c24b90a7c215e4f166dd"),
"machineID" : "machine1",
"injectionRecords" : [
{
"startTime" : ISODate("2017-08-02T17:45:04.779+05:30"),
"endTime" : ISODate("2017-08-02T17:45:07.763+05:30"),
"counter" : 1
},
{
"startTime" : ISODate("2017-08-02T17:45:24.417+05:30"),
"endTime" : ISODate("2017-08-02T17:45:27.402+05:30"),
"counter" : 2
}
]
},
{
"_id" : ObjectId("5981c24b90a7c215e4f166de"),
"machineID" : "machine2",
"injectionRecords" : [
{
"startTime" : ISODate("2017-08-02T17:46:04.779+05:30"),
"endTime" : ISODate("2017-08-02T17:46:07.763+05:30"),
"counter" : 1
},
{
"startTime" : ISODate("2017-08-02T17:46:24.417+05:30"),
"endTime" : ISODate("2017-08-02T17:46:27.402+05:30"),
"counter" : 2
}
]
}
The following query returns a count of injectionRecords for a given machineID. I think this is what you are asking for.
db.collection.aggregate([
{$match: {machineID: 'machine1'}},
{$unwind: '$injectionRecords'},
{$group:{_id: "$_id",count:{$sum:1}}}
])
Of course, this query (where the unwind takes place before the match) is functionally equivalent:
db.collection.aggregate([
{$unwind: '$injectionRecords'},
{$match: {machineID: 'machine1'}},
{$group:{_id: "$_id",count:{$sum:1}}}
])
However, running that query with explain ...
db.collection.aggregate([
{$unwind: '$injectionRecords'},
{$match: {machineID: 'machine1'}},
{$group:{_id: "$_id",count:{$sum:1}}}
], {explain: true})
... shows that the unwind stage applies to the entire collection whereas if you match before unwinding then only the matched documents are unwound.
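The difference in intermediate work can be sketched in plain JavaScript using the two sample documents (trimmed to the fields that matter). This is not MongoDB code, and the unwind and match helpers are hypothetical stand-ins for the stages.

```javascript
// Plain-JavaScript sketch (not MongoDB API) comparing the two stage orders.
// unwind and match are hypothetical stand-ins for $unwind and $match.
const machines = [
  { machineID: "machine1", injectionRecords: [{ counter: 1 }, { counter: 2 }] },
  { machineID: "machine2", injectionRecords: [{ counter: 1 }, { counter: 2 }] },
];

// $unwind: one output document per element of injectionRecords
function unwind(documents) {
  return documents.flatMap((d) =>
    d.injectionRecords.map((r) => ({ machineID: d.machineID, injectionRecords: r }))
  );
}

// $match on machineID
function match(documents, machineID) {
  return documents.filter((d) => d.machineID === machineID);
}

// Order 1: unwind everything, then match.
const unwoundFirst = unwind(machines);
const countUnwindFirst = match(unwoundFirst, "machine1").length;

// Order 2: match, then unwind only the matched documents.
const unwoundAfterMatch = unwind(match(machines, "machine1"));
const countMatchFirst = unwoundAfterMatch.length;

console.log(unwoundFirst.length, countUnwindFirst);
console.log(unwoundAfterMatch.length, countMatchFirst);
```

Both orders produce a count of 2 for machine1, but unwinding first creates 4 intermediate documents while matching first creates only 2.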

Mongo DB specify date query for all subobjects

I'm still new to MongoDB and I'm not able to achieve the following.
This is an object from one of the collections I have to deal with:
{
"_id" : ObjectId("5306ad28e4b04bd6667b03bf"),
"name" : "FOOBAR",
"Items" : [
{
"price" : 0,
"currency" : "EUR",
"expiryDate" : ISODate("2014-03-15T23:00:00Z"),
},
{
"price" : 0,
"currency" : "EUR",
"expiryDate" : ISODate("2015-03-15T23:00:00Z"),
},
{
"currency" : "EUR",
"expiryDate" : ISODate("2015-04-16T23:00:00Z"),
}, ...}
I now need to find objects where the "expiryDate" timestamp of all sub-objects inside "Items" is less than a certain value (ISODate).
Here's what I tried:
1. First try
db.COLL.findOne({"Items.expiryDate": { $lt : ISODate("2015-02-10T00:00:00.000Z") }});
This also returns objects where only one "expiryDate" is less than the given value.
2. Second try
db.COLL.findOne({"Items": { $all : [ { "$elemMatch" : { expiryDate: { $lt: ISODate("2015-02-10T00:00:00.000Z") } } } ] }});
Every query also gives me objects where only some, not all, of the sub-objects have a timestamp less than the given time.
Please help me write this query!
You can use the aggregation framework to achieve this.
Using the $size operator, we find the size of Items, which is used in the later stages of the aggregation.
The $unwind stage deconstructs the Items array so that we can apply the condition to individual items in the next $match stage.
In the $group stage we count the filtered Items, and in the following $project stage we compare that count with the original size using the $cmp operator. This identifies the documents in which all sub-documents have an expiryDate earlier than the date supplied in the $match condition.
db.Coll.aggregate([
{'$project': {'name': 1, 'size': {'$size': '$Items'},'Items': 1}},
{'$unwind': '$Items'},
{'$match': {'Items.expiryDate': { $lt: ISODate("2015-03-15T23:40:00.000Z")}}},
{'$group': { '_id': '$_id','Items': { '$push': '$Items'},'size':{'$first': '$size'}, 'newsize': {'$sum':1}}},
{'$project': {'cmp_value': { $cmp : ['$size', '$newsize']},'name' :1 ,'Items': 1}},
{'$match': {'cmp_value': 0}}
])
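The size-comparison idea can be sketched in plain JavaScript. This is not MongoDB code; allItemsExpireBefore is a hypothetical helper name, and the sample uses only the three Items visible in the question (the original array is elided after them).

```javascript
// Plain-JavaScript sketch (not MongoDB API) of the $size / $match / $cmp idea.
// allItemsExpireBefore is a hypothetical helper name.
const foobar = {
  name: "FOOBAR",
  Items: [
    { price: 0, currency: "EUR", expiryDate: new Date("2014-03-15T23:00:00Z") },
    { price: 0, currency: "EUR", expiryDate: new Date("2015-03-15T23:00:00Z") },
    { currency: "EUR", expiryDate: new Date("2015-04-16T23:00:00Z") },
  ],
};

function allItemsExpireBefore(doc, cutoff) {
  // $size: number of elements in the original array
  const size = doc.Items.length;
  // $unwind + $match + $sum: count the elements that pass the date condition
  const newsize = doc.Items.filter(
    (item) => item.expiryDate instanceof Date && item.expiryDate < cutoff
  ).length;
  // $cmp equal to 0: every element passed (an empty array never qualifies)
  return size > 0 && size === newsize;
}

console.log(allItemsExpireBefore(foobar, new Date("2015-03-15T23:40:00.000Z")));
console.log(allItemsExpireBefore(foobar, new Date("2016-01-01T00:00:00Z")));
```

With the cutoff used in the pipeline above, the third item expires later, so the document is rejected; with a cutoff after all three dates it is accepted.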
Even though this is a very old question, let me add my comment for others:
db.collection.findOne
will return only one record. Instead, you should use
db.collection.find
Hope this helps!

Remove element from array in mongodb

I am new to MongoDB and I want to remove some elements from an array.
My document is as below:
{
"_id" : ObjectId("4d525ab2924f0000000022ad"),
"name" : "hello",
"time" : [
{
"stamp" : "2010-07-01T12:01:03.75+02:00",
"reason" : "new"
},
{
"stamp" : "2010-07-02T16:03:48.187+03:00",
"reason" : "update"
},
{
"stamp" : "2010-07-02T16:03:48.187+04:00",
"reason" : "update"
},
{
"stamp" : "2010-07-02T16:03:48.187+05:00",
"reason" : "update"
},
{
"stamp" : "2010-07-02T16:03:48.187+06:00",
"reason" : "update"
}
]
}
In this document, I want to remove the first element (reason: new) and the last element (+06:00).
I want to do it with a mongo query; I am not using any Java/PHP driver.
If I'm understanding you correctly, you want to remove the first and last elements of the array if the size of the array is greater than 3. You can do this by using the findAndModify query. In mongo shell you would be using this command:
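// A JavaScript object literal keeps only the last of two identical keys, so
// { $pop: {time: 1}, $pop: {time: -1} } would remove just the first element.
// Remove the two ends with two separate updates instead.
var doc = db.collection.findAndModify({
query: { $where: "this.time.length > 3" },
update: { $pop: {time: 1} }, // remove the last element
new: true
});
// Then remove the first element of the same document.
if (doc) db.collection.update({ _id: doc._id }, { $pop: {time: -1} });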
This finds a document in your collection that matches the $where clause.
The $where field lets you specify any valid JavaScript expression. Please note that findAndModify applies the update only to the first matched document.
You might also want to look at the following docs:
http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-JavascriptExpressionsand%7B%7B%24where%7D%7D for more on the $where clause.
http://www.mongodb.org/display/DOCS/Updating#Updating-%24pop for more on $pop.
http://www.mongodb.org/display/DOCS/findAndModify+Command for more on findAndModify.
You could update it with { $pop: { time: 1 } } to remove the last one, and { $pop: { time : -1 } } to remove the first one. There is probably a better way to handle it though.
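The net effect of popping both ends can be pictured on a plain array. The following JavaScript sketch is not MongoDB code; trimEnds is a hypothetical helper name, and the data is the time array from the question.

```javascript
// Plain-JavaScript sketch (not MongoDB API): the combined effect of
// $pop: {time: -1} (drop first) and $pop: {time: 1} (drop last).
// trimEnds is a hypothetical helper name.
function trimEnds(time) {
  // Mirrors the "this.time.length > 3" guard: short arrays are left alone.
  if (time.length <= 3) return time;
  // slice(1, -1) drops the first and the last element and returns a copy.
  return time.slice(1, -1);
}

const time = [
  { stamp: "2010-07-01T12:01:03.75+02:00", reason: "new" },
  { stamp: "2010-07-02T16:03:48.187+03:00", reason: "update" },
  { stamp: "2010-07-02T16:03:48.187+04:00", reason: "update" },
  { stamp: "2010-07-02T16:03:48.187+05:00", reason: "update" },
  { stamp: "2010-07-02T16:03:48.187+06:00", reason: "update" },
];

console.log(trimEnds(time).map((t) => t.stamp));
```

For the five-element array this leaves the three middle entries (+03:00, +04:00, +05:00).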
@javaamtho, with $size you cannot test for a size greater than 3, only for a size of exactly 3. To test for a size greater than some number x, keep a counter in a separate field outside the array (time_count below) and use the $inc operator to add 1 or -1 to it whenever you add or remove items:
{
"_id" : ObjectId("4d525ab2924f0000000022ad"),
"name" : "hello",
"time_count" : 5,
"time" : [
{
"stamp" : "2010-07-01T12:01:03.75+02:00",
"reason" : "new"
},
{
"stamp" : "2010-07-02T16:03:48.187+03:00",
"reason" : "update"
},
{
"stamp" : "2010-07-02T16:03:48.187+04:00",
"reason" : "update"
},
{
"stamp" : "2010-07-02T16:03:48.187+05:00",
"reason" : "update"
},
{
"stamp" : "2010-07-02T16:03:48.187+06:00",
"reason" : "update"
}
]
}
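The bookkeeping this implies can be sketched in plain JavaScript. This is not MongoDB code; pushTime and popLast are hypothetical helper names that pair each array change with the matching counter change, as a $push or $pop combined with $inc would.

```javascript
// Plain-JavaScript sketch (not MongoDB API): keep time_count in step with the array.
// pushTime and popLast are hypothetical helper names.
function pushTime(doc, entry) {
  doc.time.push(entry); // like $push: {time: entry}
  doc.time_count += 1; // like $inc: {time_count: 1}
}

function popLast(doc) {
  if (doc.time.length === 0) return; // nothing to remove, counter unchanged
  doc.time.pop(); // like $pop: {time: 1}
  doc.time_count -= 1; // like $inc: {time_count: -1}
}

const record = { name: "hello", time_count: 0, time: [] };
for (let i = 0; i < 5; i++) {
  pushTime(record, { stamp: "stamp-" + i, reason: i === 0 ? "new" : "update" });
}
popLast(record);

// A plain comparison on the counter replaces the array-length test.
const hasMoreThanThree = record.time_count > 3;
console.log(record.time_count, hasMoreThanThree);
```

With the counter kept in step, the size test becomes an ordinary range condition such as { time_count: { $gt: 3 } } instead of a $where expression.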
If you would like to keep some of these time elements, you can use the aggregate command from MongoDB 2.2+ to retrieve the min and max time elements, unset all time elements, and push the min and max versions back (with some modifications it could do your job):
smax=db.collection.aggregate([{$unwind: "$time"},
{$project: {tstamp:"$time.stamp",treason:"$time.reason"}},
{$group: {_id:"$_id",max:{$max: "$tstamp"}}},
{$sort: {max:1}}])
smin=db.collection.aggregate([{$unwind: "$time"},
{$project: {tstamp:"$time.stamp",treason:"$time.reason"}},
{$group: {_id:"$_id",min:{$min: "$tstamp"}}},
{$sort: {min:1}}])
db.collection.update({},{$unset: {"time": 1}},false,true)
smax.result.forEach(function(o)
{db.collection.update({_id:o._id},{$push:
{"time": {stamp: o.max ,reason: "new"}}},false,true)})
smin.result.forEach(function(o)
{db.collection.update({_id:o._id},{$push:
{"time": {stamp: o.min ,reason: "update"}}},false,true)})
db.collection.findAndModify({
query: {$where: "this.time.length > 3"},
update: {$pop: {time: 1}, $pop: {time: -1}},
new: true });
convert to PHP