MongoDB Aggregation count over a relation - mongodb

I have two collections, Buildings and Orders. A Building can have many Orders (a 1:N relation).
I'm trying to build a "top ten" statistic (which buildings have the most orders) with the aggregation framework.
My problem is: how can I get the total number of orders per building? Is there a way to "mix" data from two collections in one aggregation?
Currently I'm doing something like this:
db.buildings.aggregate( [
    { $group : { _id : { street : "$street",
                         city : "$city",
                         orders_count : "$orders_count" } } },
    { $sort : { "_id.orders_count" : -1 } },
    { $limit : 10 }
] );
But in this case "orders_count" is a pre-calculated value. It works, but it is very inefficient and too slow for "live" aggregation.
Is there a way to count the related orders per building directly in the aggregation (I'm sure there is a way...)?
Many Thanks

You don't say how orders relate to buildings in your schema, but if an order has a building id or name that it references, just group by that:
db.orders.aggregate(
    { $group : { _id: "$buildingId",
                 sum: { $sum: 1 } } }
    /* then $sort by sum: -1 and $limit: 10 like you already have */
)
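If you want the whole top-ten in one pipeline, it might look like this (assuming each order references its building in a buildingId field; adjust the field name to your schema):
db.orders.aggregate([
    // count orders per building
    { $group : { _id: "$buildingId", orders_count: { $sum: 1 } } },
    // highest counts first
    { $sort : { orders_count: -1 } },
    // keep only the top ten buildings
    { $limit : 10 }
])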

Related

How to perform sort and limit on whole group by in MongoDB - mongoose?

I am trying to sort the whole group and then limit the results, but my Mongoose code below applies the sort only to the limited results.
collection.aggregate([
    { $sort : { NAME: -1 } },
    { $match : { NAME : { $regex : `.*${query.NAME.toUpperCase()}.*` } } },
    { $group : { _id : "$NAME", NAME: { $first: "$NAME" } } },
    { $skip : 1 },
    { $limit : 10 }
], function(err, data){ /* ... */ });
That is, it sorts only the first 10 results of the group, instead of sorting everything and then showing the first 10 results.
Thanks in advance.
See the documentation. I haven't tested this since I don't have the environment or the database to do so, but I believe you want to move your $sort stage so that it comes just before $limit in the pipeline.
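Untested, but one way that reordering might look, with the $sort moved after the $group so the whole grouped set is sorted before it is paginated:
collection.aggregate([
    { $match : { NAME : { $regex : `.*${query.NAME.toUpperCase()}.*` } } },
    { $group : { _id : "$NAME", NAME : { $first: "$NAME" } } },
    // sort the complete grouped result set...
    { $sort : { NAME : -1 } },
    // ...then apply the pagination
    { $skip : 1 },
    { $limit : 10 }
], function(err, data){ /* ... */ });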

Find duplicates across multiple MongoDb collections

Coming from MySQL, I am wondering how we can find duplicates across multiple collections in MongoDB.
Let's say I have two (or more) collections:
human :
_id
firstname
cat:
_id
nickname
What would be an efficient solution to list duplicated names? This includes a name used by 2+ humans only, by 2+ cats only, or by at least one human and one cat. The result should therefore contain duplicates within each collection AND duplicates across those collections (cats and humans with the same name).
Expected result:
The list of the duplicated values; the number of occurrences could be interesting but is not essential.
The question is not about whether the proposed db schema is appropriate in this situation, but about the best MongoDB solution.
Edit
My description of a duplicate was not quite what I intended: even if a name does not exist in one collection but is duplicated within another collection, it still counts as a duplicate.
For two collections, with MongoDB 3.2 you can use the $lookup aggregation stage (it's the equivalent of the left outer join you would use in MySQL):
db.human.aggregate([
    {$group: {_id: "$firstname"}},
    {$lookup: {
        from: "cat",
        localField: "_id",
        foreignField: "nickname",
        as: "cats"
    }},
    {$match: {cats: {$ne: []}}},
    {$project: {catsCount: {$size: "$cats"}}}
])
Stages:
Group humans by name, as there could be several humans with the same name
Attach to each group the array of cats whose nickname matches the group id (i.e. the human firstname)
Filter out the humans that don't have any matches in the cat collection
Project the result to get only the names and the count of matches
Result will look like
[
{ _id: "Bob", catsCount: 2 },
{ _id: "Alex", catsCount: 1 }
]
NOTE: If you need to join several collections, you can apply $lookup stage several times.
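For example, with a hypothetical third collection dog that also stores its name in a nickname field, the pipeline could be extended like this:
db.human.aggregate([
    {$group: {_id: "$firstname"}},
    // cats sharing the name
    {$lookup: {from: "cat", localField: "_id", foreignField: "nickname", as: "cats"}},
    // dogs sharing the name (hypothetical extra collection)
    {$lookup: {from: "dog", localField: "_id", foreignField: "nickname", as: "dogs"}},
    // keep names that match in at least one of the other collections
    {$match: {$or: [{cats: {$ne: []}}, {dogs: {$ne: []}}]}},
    {$project: {catsCount: {$size: "$cats"}, dogsCount: {$size: "$dogs"}}}
])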
A solution I found:
Combine the data into one collection
Find the duplicates in that collection
And export them to a separate collection
Code:
mapHuman = function() {
    var values = {
        name: this.firstname
    };
    emit(this._id, values);
};
mapCat = function() {
    var values = {
        name: this.nickname
    };
    emit(this._id, values);
};
reduce = function(k, values) {
    var result = {names: []};
    values.forEach(function(value) {
        result.names.push(value.name);
    });
    return result;
};
// merge both collections into the intermediate "name" collection
db.human.mapReduce(mapHuman, reduce, {"out": {"reduce": "name"}});
db.cat.mapReduce(mapCat, reduce, {"out": {"reduce": "name"}});
// group by name, keep names seen more than once, and write them to "duplicate"
db.name.aggregate(
    {"$group" : { "_id": "$value.name", "count": { "$sum": 1 } } },
    {"$match": {"_id" : { "$ne" : null } , "count" : {"$gt": 1} } },
    {"$project": {"name" : "$_id", "_id" : 0}},
    {"$out": "duplicate"}
)
db.duplicate.find()

Run map reduce for all keys in collections - mongodb

I am using map-reduce in MongoDB to find the number of orders for a customer, like this:
db.order.mapReduce(
function() {
emit (this.customer,{count:1})
},
function(key,values){
var sum =0 ;
values.forEach(
function(value) {
sum+=value['count'];
}
);
return {count:sum};
},
{
query:{customer:ObjectId("552623e7e4b0cade517f9714")},
out:"order_total"
}).find()
which gives me an output like this
{ "_id" : ObjectId("552623e7e4b0cade517f9714"), "value" : { "count" : 13 } }
Currently it works for a single customer, whose id is the key. Now I want to run this map-reduce query for all customers in the order collection and get the result for each of them in the same form as this single output. Is there any way I can do that for all customers in order?
Using a map/reduce for that simple task is a bit like using a (comparatively slow) sledgehammer to crack a nut. The aggregation framework was basically invented for this kind of simple aggregation (and can do a lot more for you!):
db.order.aggregate([
{ "$group":{ "_id":"$customer", "orders":{ "$sum": 1 }}},
{ "$out": "order_total"}
])
Depending on your use case, you can even omit the $out stage and consume the results directly.
> db.orders.aggregate([{ "$group":{ "_id":"$customer", "orders":{ "$sum": 1 }}}])
{ "_id" : "b", "orders" : 2 }
{ "_id" : "a", "orders" : 3 }
Note that with very large collections, this most likely is not suitable, as it may take a while (but it should still be faster than a map/reduce operation).
For finding the number of orders of a single customer, you can use a simple query and use the cursor.count() method:
> db.orders.find({ "customer": "a" }).count()
3

Repeating results in grouped sort on pagination with skip and limit

I have the following aggregate query to sort grouped data and return it in pages:
Product.aggregate([
{ $match : { categories : category, brand : { $ne: null } }},
{ $group : { _id : '$brand', rating: { $max: '$rating' } } },
{ $sort : { rating : -1 } },
{ $skip : skip },
{ $limit : limit }], function(error, results){
....
})
This is meant to find the brands of the products with the input category and with a brand, group them by brand, and sort the brand groups by the highest rated product in the brand group. This is then meant to be paginated, using skip and limit parameters.
When I paginate it I end up getting occasional results repeating (just one group every now and then, I haven't noticed a pattern). I know that the data does not include repeated products, and none of the data is changing between calls, so what am I doing wrong with the query to get these results?
The issue in the aggregation arises when several brands have the same rating. In that case, you cannot guarantee that they will appear in the same order on every call.
The solution is to add a tie-breaking sort on the brand for the equal-rating cases:
{ $sort : { rating : -1, "_id" : 1 } }
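Applied to the original query, the pipeline would then look something like this (only the $sort stage changes):
Product.aggregate([
    { $match : { categories : category, brand : { $ne: null } } },
    { $group : { _id : '$brand', rating: { $max: '$rating' } } },
    // _id breaks ties, so brands with equal ratings keep a stable order across pages
    { $sort : { rating : -1, _id : 1 } },
    { $skip : skip },
    { $limit : limit }
], function(error, results){
    // ...
});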

Get documents with tags in list, ordered by total number of matches

Given the following MongoDB collection of documents :
{
    title : 'shirt one',
    tags : [
        'shirt',
        'cotton',
        't-shirt',
        'black'
    ]
},
{
    title : 'shirt two',
    tags : [
        'shirt',
        'white',
        'button down collar'
    ]
},
{
    title : 'shirt three',
    tags : [
        'shirt',
        'cotton',
        'red'
    ]
},
...
How do you retrieve a list of items matching a list of tags, ordered by the total number of matched tags? For example, given this list of tags as input:
['shirt', 'cotton', 'black']
I'd want to retrieve the items ranked in desc order by total number of matching tags:
item          total matches
-----------   --------------------------------
Shirt One     3 (matched shirt + cotton + black)
Shirt Three   2 (matched shirt + cotton)
Shirt Two     1 (matched shirt)
In a relational schema, tags would be a separate table, and you could join against that table, count the matches, and order by the count.
But, in Mongo... ?
It seems like this approach could work:
break the input tags into multiple "IN" statements
query for items by "OR"'ing together the tag inputs
i.e. where ( 'shirt' IN items.tags ) OR ( 'cotton' IN items.tags )
this would return, for example, three instances of "Shirt One", two instances of "Shirt Three", etc.
map/reduce that output
map: emit(this._id, {...});
reduce: count total occurrences of _id
finalize: sort by counted total
But I'm not clear on how to implement this as a Mongo query, or if this is even the most efficient approach.
As I answered in "In MongoDB search in an array and sort by number of matches", it's possible using the Aggregation Framework.
Assumptions
tags attribute is a set (no repeated elements)
Query
This approach forces you to unwind the tags and re-evaluate the match predicate against the unwound results, so it's really inefficient.
db.test_col.aggregate(
    {$match: {tags: {$in: ["shirt", "cotton", "black"]}}},
    {$unwind: "$tags"},
    {$match: {tags: {$in: ["shirt", "cotton", "black"]}}},
    {$group: {
        _id: {"_id": "$_id"},
        matches: {$sum: 1}
    }},
    {$sort: {matches: -1}}
);
Expected Results
{
"result" : [
{
"_id" : {
"_id" : ObjectId("5051f1786a64bd2c54918b26")
},
"matches" : 3
},
{
"_id" : {
"_id" : ObjectId("5051f1726a64bd2c54918b24")
},
"matches" : 2
},
{
"_id" : {
"_id" : ObjectId("5051f1756a64bd2c54918b25")
},
"matches" : 1
}
],
"ok" : 1
}
Right now, it isn't possible to do this unless you use MapReduce. The only problem with MapReduce is that it is slow (compared to a normal query).
The aggregation framework is slated for 2.2 (so should be available in 2.1 dev release) and should make this sort of thing much easier to do without MapReduce.
Personally, I do not think using M/R is an efficient way to do it. I would rather query for all the documents and do those calculations on the application side. It is easier and cheaper to scale your app servers than it is to scale your database servers, so let the app servers do the number crunching. Of course, this approach may not work for you given your data access patterns and requirements.
An even simpler approach may be to just include a count property in each of your tag objects and whenever you $push a new tag to the array, you also $inc the count property. This is a common pattern in the MongoDB world, at least until the aggregation framework.
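As a rough, hypothetical sketch of that pattern (shirtId and the field names here are only illustrative, not from the question's schema):
// store each tag with its own counter and bump a document-level total on every push
db.shirts.update(
    { _id: shirtId },  // shirtId: the document being updated (illustrative)
    {
        $push: { tags: { name: "cotton", count: 1 } },
        $inc: { tagCount: 1 }
    }
)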
I'll second @Bryan in saying that MapReduce is the only possible way at the moment (and it's far from perfect). But, in case you desperately need it, here you go :-)
var m = function() {
var searchTerms = ['shirt', 'cotton', 'black'];
var me = this;
this.tags.forEach(function(t) {
searchTerms.forEach(function(st) {
if(t == st) {
emit(me._id, {matches : 1});
}
})
})
};
var r = function(k, vals) {
var result = {matches : 0};
vals.forEach(function(v) {
result.matches += v.matches;
})
return result;
};
db.shirts.mapReduce(m, r, {out: 'found01'});
db.found01.find();
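If you'd rather not hard-code the search terms inside the map function, the mapReduce scope option can make them available as a global instead (remove the local var searchTerms line from m first); roughly:
var terms = ['shirt', 'cotton', 'black'];
db.shirts.mapReduce(m, r, {
    out: 'found01',
    // exposes searchTerms as a global variable inside the map/reduce functions
    scope: { searchTerms: terms }
});
db.found01.find();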