MongoDB aggregation framework sort by length of array

Given the following data set:
{ "_id" : ObjectId("510458b188ce1d16e616129b"), "codes" : [ "oxtbyr", "xstute" ], "name" : "Ciao Mambo", "permalink" : "ciaomambo", "visits" : 1 }
{ "_id" : ObjectId("510458b188ce1d16e6161296"), "codes" : [ "zpngwh", "odszfy", "vbvlgr" ], "name" : "Anthony's at Spokane Falls", "permalink" : "anthonysatspokanefalls", "visits" : 0 }
How can I convert this python/pymongo sort into something that will work with the MongoDB aggregation framework? I'm sorting results based on the number of codes within the codes array.
z = [(x['name'], len(x['codes'])) for x in restaurants]
sorted_by_second = sorted(z, key=lambda tup: tup[1], reverse=True)
for x in sorted_by_second:
    print(x[0], x[1])
This works in Python; I just want to know how to accomplish the same thing on the MongoDB query side.

> db.z.aggregate([
  { $unwind: "$codes" },
  { $group: { _id: "$_id", name: { $first: "$name" }, count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])
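If the server supports the $size aggregation operator (added in MongoDB 2.6, so this is an assumption for older deployments), the array length can be computed directly instead of unwinding and regrouping. The following is a minimal sketch, not a definitive implementation: the helper names are made up for illustration, and $size raises an error when `codes` is missing or not an array. The pure-Python function mirrors what the pipeline computes, so the expected order can be checked without a server.

```python
def codes_count_pipeline():
    """Build an aggregation pipeline that sorts by the length of `codes`."""
    return [
        # $size returns the number of elements in the array (MongoDB 2.6+).
        {"$project": {"name": 1, "count": {"$size": "$codes"}}},
        # -1 sorts descending, matching reverse=True in the Python version.
        {"$sort": {"count": -1}},
    ]


def sort_by_code_count(docs):
    """Pure-Python equivalent, useful for checking the expected order."""
    pairs = [(d["name"], len(d["codes"])) for d in docs]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


restaurants = [
    {"name": "Ciao Mambo", "codes": ["oxtbyr", "xstute"]},
    {"name": "Anthony's at Spokane Falls", "codes": ["zpngwh", "odszfy", "vbvlgr"]},
]
ranked = sort_by_code_count(restaurants)
```

With pymongo, the pipeline would typically be run as `db.z.aggregate(codes_count_pipeline())`, where `db.z` is the same collection as in the shell example.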

Related

MongoDB searching for gaps in indices

I am caching data from an online resource for future use in machine learning. This data is canonical and has no missing entries.
In the event that the real-time connection is dropped or the machine rebooted, I have a safeguard in place that does a historical search for a range of ids that are missing from the cache.
What I have yet to implement, however, is a mechanism for searching through the collection and identifying ranges where id values have been skipped.
For instance:
{"entry_id": 27497713, ...}
{"entry_id": 27497761, ...}
This data has a clear gap where entries are missing between 27497713 and 27497761.
Is there a way I can find such a gap using queries? Perhaps at least narrowing it down by selecting values between two ranges and checking the count of returned entries? Given how many entries the collection contains, I am trying to avoid lots of queries for efficiency.
You can try this aggregation:
$group - get the $min and $max entry_id
$addFields - generate a $range of ids from $min up to (but not including) $max
$lookup - self-lookup matching the generated range ids against existing entry_id values
$project - keep only the range ids with no match, using $setDifference
pipeline
db.entries.aggregate(
[
{$group : {_id : null, min : {$min : "$entry_id"}, max : {$max : "$entry_id"}}},
{$addFields : {rangeIds : {$range : ["$min", "$max"]}}},
{$lookup : {from : "entries", localField : "rangeIds", foreignField : "entry_id", as : "entries"}},
{$project : {_id :0, missingIds : {$setDifference : ["$rangeIds", "$entries.entry_id"]}}}
]
)
collection
> db.entries.find()
{ "_id" : ObjectId("5a6fea9b7346ce591a17ad22"), "entry_id" : 27497713 }
{ "_id" : ObjectId("5a6fea9b7346ce591a17ad23"), "entry_id" : 27497761 }
{ "_id" : ObjectId("5a6fea9b7346ce591a17ad24"), "entry_id" : 27497750 }
>
aggregate result
> db.entries.aggregate( [ {$group : {_id : null, min : {$min : "$entry_id"}, max : {$max : "$entry_id"}}}, {$addFields : {rangeIds : {$range : ["$min", "$max"]}}}, {$lookup : {from : "entries", localField : "rangeIds", foreignField : "entry_id", as : "entries"}}, {$project : {_id :0, missingIds : {$setDifference : ["$rangeIds", "$entries.entry_id"]}}} ] )
{ "missingIds" : [ 27497714, 27497715, 27497716, 27497717, 27497718, 27497719, 27497720, 27497721, 27497722, 27497723, 27497724, 27497725, 27497726, 27497727, 27497728, 27497729, 27497730, 27497731, 27497732, 27497733, 27497734, 27497735, 27497736, 27497737, 27497738, 27497739, 27497740, 27497741, 27497742, 27497743, 27497744, 27497745, 27497746, 27497747, 27497748, 27497749, 27497751, 27497752, 27497753, 27497754, 27497755, 27497756, 27497757, 27497758, 27497759, 27497760 ] }
>
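The pipeline above lists every missing id individually, which can get large when the gaps are wide. If only the boundaries of each gap are needed, one pass over the sorted ids on the client side may be enough. This is a rough sketch under the assumption that the entry_id values have already been fetched (for example with a query that projects only entry_id); `find_gaps` is a hypothetical helper name.

```python
def find_gaps(entry_ids):
    """Return inclusive (first_missing, last_missing) ranges between known ids."""
    ordered = sorted(set(entry_ids))
    gaps = []
    # Compare each id with its successor; a difference above 1 means a gap.
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > 1:
            gaps.append((previous + 1, current - 1))
    return gaps


cached_ids = [27497713, 27497761, 27497750]
gaps = find_gaps(cached_ids)
# gaps == [(27497714, 27497749), (27497751, 27497760)]
```

Each returned pair can then be handed to the historical search as a range to refetch.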

Using Mongo query to find an in array element

I have records in my db with this structure:
{
"_id" : "YA14163134",
"discount" : "",
"retail" : "115.0000",
"cost" : "",
"description" : "Caterpillar Mens Big Twist Analog Watch",
"stock_update" : "05",
"brand" : "Kronos",
"img_url" : "image2342000.jpg",
"UPC" : "4895053708012",
"stock" : [ [ "1611292138", "5" ], [ "1612032232", "4" ], [ "1612050918", "0" ] ]
}
I am looking for a query that returns all records whose "stock" array contains the value "1612050918", which is the update id.
I have tried queries such as:
db.vlc.find({stock: {$elemMatch:{$all:["1612050918"]}}})
or
db.vlc.find({stock: { $in : ['1611292138']}})
or
db.vlc.find({stock: { $all : [[1611292138]]}})
with no results. It works only if I also include the second element of the sub-array in the request, like this:
db.vlc.find({stock: { $all : [['1611292138', '7']]}})
but that limits the results to items from that update with quantity 7, whereas I need items with any quantity. Thank you in advance!
Use this query:
{
"stock" : {
"$elemMatch" : {
"$elemMatch" : {
"$eq" : "1611292138"
}
}
}
}
Explanation:
The first $elemMatch scans all three arrays under stock.
The next $elemMatch scans the two elements in each sub-array.
Since $elemMatch requires a query object, the $eq notation is used for a literal match.
If you know that "1611292138" will always be the first element of the sub-array, your query becomes simpler:
{ "stock" : { "$elemMatch" : { "0" : "1611292138" } } }
Explanation:
Scan all arrays under stock
Look for "1611292138" in the first slot of each sub-array
Alternatively, use nested $elemMatch as below:
db.vlc.find({stock: { "$elemMatch":{"$elemMatch":{"$all":["1612050918"]}}}})
Or
db.vlc.find({stock: {"$elemMatch":{ "$elemMatch":{"$in" : ["1612050918"]}}}})
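For readers working from Python, the nested $elemMatch filter can be built as a plain dict. The sketch below is illustrative only: both helper names are hypothetical, and the second function is a pure-Python mirror of the matching rule (some inner array contains the update id in any slot), so it can be tried without a database.

```python
def stock_update_filter(update_id):
    """Build the nested $elemMatch filter from the answers above."""
    return {"stock": {"$elemMatch": {"$elemMatch": {"$eq": update_id}}}}


def matches_update(doc, update_id):
    """Pure-Python mirror: True if any inner array contains update_id."""
    return any(update_id in pair for pair in doc.get("stock", []))


record = {
    "_id": "YA14163134",
    "stock": [["1611292138", "5"], ["1612032232", "4"], ["1612050918", "0"]],
}
query = stock_update_filter("1612050918")
found = matches_update(record, "1612050918")
```

With pymongo, the filter would typically be passed as `db.vlc.find(stock_update_filter("1612050918"))`. Like the nested $elemMatch, this check would also match a quantity slot that happened to equal the update id.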

Mongo aggregation on array elements

I have a mongo document like
{ "_id" : 12, "location" : [ "Kannur","Hyderabad","Chennai","Bengaluru"] }
{ "_id" : 13, "location" : [ "Hyderabad","Chennai","Mysore","Ballary"] }
From these, how can I get a count of documents per distinct location?
Something like:
Hyderabad 2,
Kannur 1,
Chennai 2,
Bengaluru 1,
Mysore 1,
Ballary 1
With the aggregation stages used here you cannot get the exact output that you want. One limitation of the aggregation pipeline, at least in older MongoDB releases, is that it cannot turn values into keys in the output document. (Releases from 3.4.4 onward add $arrayToObject, which relaxes this.)
For example, Kannur is one of the values of the location field in the input document. In your desired output structure it needs to be a key ("Kannur": 1). This can be achieved with map-reduce; with aggregation you can still get a closely related and useful structure:
Unwind the location array.
Group by the location field and get the count for each location using the $sum operator.
Group all the documents once more to get a consolidated array of results.
Code:
db.collection.aggregate([
{$unwind:"$location"},
{$group:{"_id":"$location","count":{$sum:1}}},
{$group:{"_id":null,"location_details":{$push:{"location":"$_id",
"count":"$count"}}}},
{$project:{"_id":0,"location_details":1}}
])
Sample o/p:
{
"location_details" : [
{
"location" : "Ballary",
"count" : 1
},
{
"location" : "Mysore",
"count" : 1
},
{
"location" : "Bengaluru",
"count" : 1
},
{
"location" : "Chennai",
"count" : 2
},
{
"location" : "Hyderabad",
"count" : 2
},
{
"location" : "Kannur",
"count" : 1
}
]
}
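If the documents are few enough to read on the client side (an assumption that may not hold for large collections), the keyed shape from the question can be built in application code instead. A minimal sketch using the standard library; `location_counts` is a hypothetical helper, and like $unwind it counts a location once per occurrence.

```python
from collections import Counter


def location_counts(docs):
    """Count occurrences of each location across all documents."""
    counts = Counter()
    for doc in docs:
        # Each array element adds one, just as $unwind emits one doc per element.
        counts.update(doc.get("location", []))
    return dict(counts)


docs = [
    {"_id": 12, "location": ["Kannur", "Hyderabad", "Chennai", "Bengaluru"]},
    {"_id": 13, "location": ["Hyderabad", "Chennai", "Mysore", "Ballary"]},
]
counts = location_counts(docs)
```

For the two sample documents, `counts` maps Hyderabad and Chennai to 2 and every other location to 1.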

MongoDB Group querying for Embedded Document

I have mongo documents with a structure like:
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-26",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-26T08:08:38.716Z"),
"value" : 98.5
},
{
"dateTime" : ISODate("2014-11-26T08:18:38.716Z"),
"value" : 95.5
},
{
"dateTime" : ISODate("2014-11-26T08:28:38.663Z"),
"value" : 90.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-26T08:08:38.716Z"),
"from" : ISODate("2014-11-26T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.776Z")
}
{
"_id" : "THIS_IS_A_DHP_USER_ID+2014-11-25",
"_class" : "weight",
"items" : [
{
"dateTime" : ISODate("2014-11-25T08:08:38.716Z"),
"value" : 198.5
},
{
"dateTime" : ISODate("2014-11-25T08:18:38.716Z"),
"value" : 195.5
},
{
"dateTime" : ISODate("2014-11-25T08:28:38.716Z"),
"value" : 190.5
}
],
"source" : "MANUAL",
"to" : ISODate("2014-11-25T08:08:38.716Z"),
"from" : ISODate("2014-11-25T08:08:38.716Z"),
"userId" : "THIS_IS_A_DHP_USER_ID",
"createdDate" : ISODate("2014-11-26T08:38:38.893Z")
}
The query that I want to run on this document structure:
finding documents for a particular user id
unwinding the embedded array
grouping the documents by _id, while
summing the items.value of the embedded array
getting the minimum of the items.dateTime of the embedded array
Note: I want to get the sum and min as an object, i.e. { value : sum , dateTime : min of the items.dateTime }, inside an array of items.
Can this be achieved in a single aggregation call using $push or some other technique?
When you group over a particular _id and apply aggregation operators such as $min and $sum, there is only one record per group (_id), holding the sum and the minimum date for that group. So there is no way to obtain a different sum and a different minimum date for the same _id, which would also make no logical sense.
What you would want to do is:
db.collection.aggregate([
{$match:{"userId":"THIS_IS_A_DHP_USER_ID"}},
{$unwind:"$items"},
{$group:{"_id":"$_id",
"values":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}}
])
If you do not query for a particular userId, you will have multiple groups, each with its own sum and min date. Then it makes sense to accumulate all these results in an array using the $push operator:
db.collection.aggregate([
{$unwind:"$items"},
{$group:{"_id":"$_id",
"result":{$sum:"$items.value"},
"dateTime":{$min:"$items.dateTime"}}},
{$group:{"_id":null,"result":{$push:{"value":"$result",
"dateTime":"$dateTime",
"id":"$_id"}}}},
{$project:{"_id":0,"result":1}}
])
You could use the following aggregation; it may work:
db.collectionName.aggregate([
// Filter first so that only this user's documents are unwound.
{"$match":{"userId":"THIS_IS_A_DHP_USER_ID"}},
{"$unwind":"$items"},
{"$group":{"_id":"$_id","sum":{"$sum":"$items.value"},
"minDate":{"$min":"$items.dateTime"}}}
])
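To make the grouping logic concrete, here is a pure-Python sketch of what the $match, $unwind and $group stages compute, with the result shaped as the { value, dateTime } object the question asks for. It is only an illustration under assumed names (`summarize_items` is hypothetical) and runs on in-memory dicts, not against a server.

```python
from datetime import datetime


def summarize_items(docs, user_id):
    """Per document: sum of items.value and earliest items.dateTime."""
    summaries = {}
    for doc in docs:
        # Skip other users; also skip empty arrays, which $unwind would drop.
        if doc.get("userId") != user_id or not doc.get("items"):
            continue
        summaries[doc["_id"]] = {
            "value": sum(item["value"] for item in doc["items"]),
            "dateTime": min(item["dateTime"] for item in doc["items"]),
        }
    return summaries


docs = [
    {
        "_id": "THIS_IS_A_DHP_USER_ID+2014-11-26",
        "userId": "THIS_IS_A_DHP_USER_ID",
        "items": [
            {"dateTime": datetime(2014, 11, 26, 8, 8, 38, 716000), "value": 98.5},
            {"dateTime": datetime(2014, 11, 26, 8, 18, 38, 716000), "value": 95.5},
            {"dateTime": datetime(2014, 11, 26, 8, 28, 38, 663000), "value": 90.5},
        ],
    }
]
summary = summarize_items(docs, "THIS_IS_A_DHP_USER_ID")
```

For the sample document dated 2014-11-26, the summed value is 284.5 and the earliest dateTime is 08:08:38.716.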

Map reduce in mongodb

I have mongo documents in this format.
{"_id" : 1,"Summary" : {...},"Examples" : [{"_id" : 353,"CategoryId" : 4},{"_id" : 239,"CategoryId" : 28}, ... ]}
{"_id" : 2,"Summary" : {...},"Examples" : [{"_id" : 312,"CategoryId" : 2},{"_id" : 121,"CategoryId" : 12}, ... ]}
How can I map/reduce them to get a hash like:
{ [ result[categoryId] : count_of_examples , .....] }
I.e. count of examples of each category.
I have 30 categories in all, all specified in the Categories collection.
If you can use 2.1 (dev version of upcoming release 2.2) then you can use Aggregation Framework and it would look something like this:
db.collection.aggregate( [
{$project:{"CatId":"$Examples.CategoryId","_id":0}},
{$unwind:"$CatId"},
{$group:{_id:"$CatId","num":{$sum:1} } },
{$project:{CategoryId:"$_id",NumberOfExamples:"$num",_id:0 }}
] );
The first step projects the CategoryId subfield of Examples into a top-level field of the document (not necessary, but it helps with readability). Then we unwind the array, which creates a separate document for each value of CatId. Next we do a "group by" and count them (I assume each instance of CategoryId is one example, right?). Finally we use a projection again to relabel the fields, so the result looks like this:
"result" : [
{
"CategoryId" : 12,
"NumberOfExamples" : 1
},
{
"CategoryId" : 2,
"NumberOfExamples" : 1
},
{
"CategoryId" : 28,
"NumberOfExamples" : 1
},
{
"CategoryId" : 4,
"NumberOfExamples" : 1
}
],
"ok" : 1
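The question asks for a hash keyed by category id, while the aggregation returns a list of rows. Reshaping that list on the client side is a one-liner. The sketch below assumes the rows have already been read from the `result` array shown above (how they are obtained depends on the driver version); `to_category_hash` is a hypothetical helper name.

```python
def to_category_hash(rows):
    """Turn aggregation rows into {CategoryId: NumberOfExamples}."""
    return {row["CategoryId"]: row["NumberOfExamples"] for row in rows}


rows = [
    {"CategoryId": 12, "NumberOfExamples": 1},
    {"CategoryId": 2, "NumberOfExamples": 1},
    {"CategoryId": 28, "NumberOfExamples": 1},
    {"CategoryId": 4, "NumberOfExamples": 1},
]
category_hash = to_category_hash(rows)
```

Categories with no examples do not appear in the aggregation output, so they would be absent from the hash unless filled in from the Categories collection.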