MongoDB 2.2 Aggregation Framework group by field name

Is it possible to group-by field name? Or do I need a different structure so I can group-by value?
I know we can use group by on values and we can unwind arrays, but is it possible to get total apples, pears and oranges owned by John amongst the three houses here without specifying "apples", "pears" and "oranges" explicitly as part of the query? (so NOT like this);
// total all the fruit John has at each house
db.houses.aggregate([
{
$group: {
_id: null,
"apples": { $sum: "$people.John.items.apples" },
"pears": { $sum: "$people.John.items.pears" },
"oranges": { $sum: "$people.John.items.oranges" },
}
},
])
In other words, can I group-by the first field-name under "items" and get the aggregate sum of apples:104, pears:202 and oranges:306, but also bananas, melons and anything else that might be there? Or do I need to restructure the data into an array of key/value pairs like categories?
db.createCollection("houses");
db.houses.remove();
db.houses.insert(
[
{
House: "birmingham",
categories : [
{
k : "location",
v : { d : "central" }
}
],
people: {
John: {
items: {
apples: 2,
pears: 1,
oranges: 3,
}
},
Dave: {
items: {
apples: 30,
pears: 20,
oranges: 10,
},
},
},
},
{
House: "London", categories: [{ k: "location", v: { d: "central" } }, { k: "type", v: { d: "rented" } }],
people: {
John: { items: { apples: 2, pears: 1, oranges: 3, } },
Dave: { items: { apples: 30, pears: 20, oranges: 10, }, },
},
},
{
House: "Cambridge", categories: [{ k: "type", v: { d: "rented" } }],
people: {
John: { items: { apples: 100, pears: 200, oranges: 300, } },
Dave: { items: { apples: 0.3, pears: 0.2, oranges: 0.1, }, },
},
},
]
);
Secondly, and more importantly, could I then also group by "house.categories.k" ? In other words, is it possible to find out how many "apples" "John" has in "rented" vs "owned" or "friends" houses (so group by "categories.k.type")?
Finally - if this is even possible, is it sensible? At first I thought it was quite useful to create dictionaries of nested objects using actual field names of the object, as it seemed a logical use of a document database, and it seemed to make the MR queries easier to write vs arrays, but now I'm starting to wonder if this is all a bad idea and having variable field names makes it very tricky/inefficient to write aggregation queries.

OK, so I think I have this partially solved. At least for the shape of data in the initial question.
// How many of each type of fruit does John have at each location
db.houses.aggregate([
{
$unwind: "$categories"
},
{
$match: { "categories.k": "location" }
},
{
$group: {
_id: "$categories.v.d",
"numberOf": { $sum: 1 },
"Total Apples": { $sum: "$people.John.items.apples" },
"Total Pears": { $sum: "$people.John.items.pears" },
}
},
])
which yields;
{
"result" : [
{
"_id" : "central",
"numberOf" : 2,
"Total Apples" : 4,
"Total Pears" : 2
}
],
"ok" : 1
}
Note that there's only "central", but if I had other "location"s in my DB I'd get a range of totals for each location. I wouldn't need the $unwind step if I had named properties instead of an array for "categories", but this is where I find the structure is at odds with itself. There are several keywords likely under "categories". The sample data shows "type" and "location" but there could be around 10 of these categorizations all with different values. So if I used named fields;
"categories": {
location: "london",
type: "owned",
}
...the problem I then have is indexing. I can't afford to simply index "location" since those are user-defined categories, and if 10,000 users choose 10,000 different ways of categorizing their houses I'd need 10,000 indexes, one for each field. But by making it an array I only need one on the array field itself. The downside is the $unwind step. I ran into this before with MapReduce. The last thing you want to be doing is a ForEach loop in JavaScript to cycle an array if you can help it. What you really want is to filter out the fields by name because it's much quicker.
Now this is all well and good where I already know what fruit I'm looking for, but if I don't, it's much harder. I can't (as far as I can see) $unwind or otherwise ForEach "people.John.items" here. If I could, I'd be overjoyed. So since the names of fruit are again user-defined, it looks like I need to convert them to an array as well, like this;
{
"people" : {
"John" : {
"items" : [
{ k:"apples", v:100 },
{ k:"pears", v:200 },
{ k:"oranges", v:300 },
]
},
}
}
So that now allows me to get the fruit (where I don't know which fruit to look for) totalled, again by location;
db.houses.aggregate([
{
$unwind: "$categories"
},
{
$match: { "categories.k": "location" }
},
{
$unwind: "$people.John.items"
},
{
$group: { // compound key - thanks to Jenna
_id: { fruit:"$people.John.items.k", location:"$categories.v.d" },
"numberOf": { $sum: 1 },
"Total Fruit": { $sum: "$people.John.items.v" },
}
},
])
So now I'm doing TWO $unwinds. If you're thinking that looks grotesquely inefficient, you'd be right. If I have just 10,000 house records, with 10 categories each, and 10 types of fruit, this query takes half a minute to run.
OK, so I can see that moving the $match before the $unwind improves things significantly, but then the output is wrong: I don't want an entry for every category, I only want the "location" categories.
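A minimal sketch of the idea, for anyone landing here later. The first part is a plain-JavaScript emulation (totalsByCategory is a hypothetical helper, not a MongoDB API) of "unwind the categories, keep one category key, turn the item field names into values, then sum". The second part is a pipeline that I would expect to do the same on MongoDB 3.4.4 or later, assuming $objectToArray is available (it is not in 2.2). It also matches both before the $unwind, to cut down the documents, and after it, to drop the unwanted array elements.

```javascript
// Sample documents shaped like the ones in the question (John's items only).
const houses = [
  { House: "birmingham",
    categories: [{ k: "location", v: { d: "central" } }],
    people: { John: { items: { apples: 2, pears: 1, oranges: 3 } } } },
  { House: "London",
    categories: [{ k: "location", v: { d: "central" } }, { k: "type", v: { d: "rented" } }],
    people: { John: { items: { apples: 2, pears: 1, oranges: 3 } } } },
  { House: "Cambridge",
    categories: [{ k: "type", v: { d: "rented" } }],
    people: { John: { items: { apples: 100, pears: 200, oranges: 300 } } } },
];

// Hypothetical helper: total one person's items per (category value, item name).
function totalsByCategory(docs, person, categoryKey) {
  const totals = {};
  for (const doc of docs) {
    const items = (doc.people[person] || {}).items || {};
    for (const cat of doc.categories) {            // the $unwind step
      if (cat.k !== categoryKey) continue;         // the $match step
      for (const [fruit, count] of Object.entries(items)) { // field names become values
        const key = cat.v.d + "/" + fruit;
        totals[key] = (totals[key] || 0) + count;  // the $group / $sum step
      }
    }
  }
  return totals;
}

// Assumption: MongoDB 3.4.4+ for $objectToArray. Not runnable on 2.2.
const pipeline = [
  { $match: { "categories.k": "location" } },   // narrows documents, can use an index
  { $unwind: "$categories" },
  { $match: { "categories.k": "location" } },   // keeps only the "location" elements
  { $project: {
      location: "$categories.v.d",
      items: { $objectToArray: "$people.John.items" } } },
  { $unwind: "$items" },
  { $group: {
      _id: { fruit: "$items.k", location: "$location" },
      total: { $sum: "$items.v" } } },
];

console.log(totalsByCategory(houses, "John", "location"));
```

With the sample data above, the emulation gives 4 apples, 2 pears and 6 oranges for "central", without naming any fruit in the code.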

I would have made this a comment, but it's easier to format in a response text box.
{ _id: 1,
house: "New York",
people: {
John: {
items: {apples: 1, oranges:2}
},
Dave: {
items: {apples: 2, oranges: 1}
}
}
}
{ _id: 2,
house: "London",
people: {
John: {
items: {apples: 3, oranges:2}
},
Dave: {
items: {apples: 1, oranges:3}
}
}
}
Just to make sure I understand your question, is this what you're trying to accomplish?
{location: "New York", johnFruit:3}
{location: "London", johnFruit: 5}
Since categories is not nested under house, you can't group by "house.categories.k", but you can use a compound key for the _id of $group to get this result:
{ $group: { _id: { house: "$House", category: "$categories.k" } } }
Although "k" doesn't contain the information that you're presumably trying to group by. And as for "categories.k.type", type is the value of k, so you can't use this syntax. You would have to group by "categories.v.d".
It may be possible with your current schema to accomplish this aggregation using $unwind, $project, possibly $match, and finally $group, but the command won't be pretty. If possible, I would highly recommend restructuring your data to make this aggregation much simpler. If you would like some help with schema, please let us know.

I'm not sure if this is a possible solution, but what if you begin the aggregation process by determining the number of different locations using distinct(), and run separate aggregation commands for each location? distinct() may not be efficient, but every subsequent aggregation will be able to use $match, and therefore, the index on categories. You could use the same logic to count the fruit for "categories.type".
{
"_id" : 1,
"house" : "New York",
"people" : {
"John" : [{"k" : "apples","v" : 1},{"k" : "oranges","v" : 2}],
"Dave" : [{"k" : "apples","v" : 2},{"k" : "oranges","v" : 1}]
},
"categories" : [{"location" : "central"},{"type" : "rented"}]
}
{
"_id" : 2,
"house" : "London",
"people" : {
"John" : [{"k" : "apples","v" : 3},{"k" : "oranges","v" : 2}],
"Dave" : [{"k" : "apples","v" : 3},{"k" : "oranges","v" : 1}]
},
"categories" : [{"location" : "suburb"},{"type" : "rented"}]
}
{
"_id" : 3,
"house" : "London",
"people" : {
"John" : [{"k" : "apples","v" : 0},{"k" : "oranges","v" : 1}],
"Dave" : [{"k" : "apples","v" : 2},{"k" : "oranges","v" : 4}]
},
"categories" : [{"location" : "central"},{"type" : "rented"}]
}
Run distinct() and iterate through the results by running aggregate() commands for each unique value of "categories.location":
db.agg.distinct("categories.location")
[ "central", "suburb" ]
db.agg.aggregate(
  // The index entry is on the entire embedded document {location:"central"},
  // so match on the whole document to use the index.
  {$match: {categories: {location:"central"}}},
  {$unwind: "$people.John"},
  {$group: {
    _id: "$people.John.k",
    "numberOf": { $sum: 1 },
    "Total Fruit": { $sum: "$people.John.v" }
  }}
)
{
"result" : [
{
"_id" : "oranges",
"numberOf" : 2,
"Total Fruit" : 3
},
{
"_id" : "apples",
"numberOf" : 2,
"Total Fruit" : 1
}
],
"ok" : 1
}

Related

MongoDB - average of a feature after slicing the max of another feature in a group of documents

I am very new to MongoDB and trying to work out a couple of queries, which I am not even sure are feasible.
The structure of each document is:
{
"_id" : {
"$oid": Text
},
"grade": Text,
"type" : Text,
"score": Integer,
"info" : {
"range" : NumericText,
"genre" : Text,
"special": {keys:values}
}
};
The first query would give me:
per grade (thinking I have to group by "grade")
the highest range (thinking I have to call $max:$range, it should work with a string)
the score average (thinking I have to call $avg:$score)
I tried something like the following, which apparently is wrong:
collection.aggregate([{
'$group': {'_id':'$grade',
'highest_range': {'$max':'$info',
'average_score': {'$avg':'$score'}}}
}])
The second query would give the distinct genre records.
Any help is valuable!
ADDITION - providing an example of the document and the output:
{
"_id" : {
"$oid": '60491ea71f8'
},
"grade": D,
"type" : Shop,
"score": 4,
"info" : {
"range" : "2",
"genre" : 'Pet shop',
"special": {'ClientsParking':True,
'AcceptsCreditCard':True,
'BikeParking':False}
}
};
And the output I am looking for is something along these lines:
[{grade: A, "highest_range":"4", "average_score":3.5},
{grade: B, "highest_range":"7", "average_score":8.3},
{grade: C, "highest_range":"3", "average_score":2.4}]
I think you are looking for this:
db.collection.aggregate([
{
'$group': {
'_id': '$grade',
'highest_range': { '$max': '$info.range' },
'average_score': { '$avg': '$score' }
}
}
])
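However, $avg works only on numbers and ignores non-numeric values. $min and $max do accept strings, but they compare them as strings, so "10" sorts before "9"; convert range to a number first if you need numeric ordering.
You could try { '$first': '$info.range' } or { '$last': '$info.range' }, but that requires a preceding $sort to give a meaningful result. It is not clear what you mean by "highest range".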
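To show why the string-typed range matters, here is a minimal plain-JavaScript sketch (not MongoDB itself; the sample documents and the summarize helper are made up for illustration). It groups by grade and computes the maximum range both as a string and as a number, next to the average score. On MongoDB 4.0 or later, I would expect { '$max': { '$toInt': '$info.range' } } to give the numeric behaviour inside the $group stage.

```javascript
// Made-up sample data; "range" is stored as text, as in the question.
const docs = [
  { grade: "A", score: 3, info: { range: "9" } },
  { grade: "A", score: 4, info: { range: "10" } },
  { grade: "B", score: 8, info: { range: "7" } },
];

// Hypothetical helper emulating a $group by grade.
function summarize(rows) {
  const groups = new Map();
  for (const d of rows) {
    const g = groups.get(d.grade) ||
      { stringMax: null, numericMax: -Infinity, sum: 0, n: 0 };
    // String comparison: "9" > "10" because "9" > "1" character by character.
    if (g.stringMax === null || d.info.range > g.stringMax) {
      g.stringMax = d.info.range;
    }
    // Numeric comparison after converting the text to a number.
    g.numericMax = Math.max(g.numericMax, Number(d.info.range));
    g.sum += d.score;
    g.n += 1;
    groups.set(d.grade, g);
  }
  return [...groups].map(([grade, g]) => ({
    grade,
    stringMax: g.stringMax,
    numericMax: g.numericMax,
    average_score: g.sum / g.n,
  }));
}

console.log(summarize(docs));
```

For grade A the string maximum is "9" while the numeric maximum is 10, which is the difference to watch for.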

$sum aggregate with distinct condition in MongoDB

Collection of Player Stats
{
"_id" : ObjectId("612d07a8a8a6a71ee90dadc3"),
"player" : ObjectId("612895db68507d470af61326"),
"team" : ObjectId("611cd49e9be5e530de899065"),
"role" : "IGL",
"steamid" : "STEAM_1:1:239330230",
"kills" : NumberInt(19),
"deaths" : NumberInt(17),
"assists" : NumberInt(3),
"match" : ObjectId("6128d5ffa2988ca4cb03a958"),
"game" : ObjectId("606701ed58d20c0e47d95fee"),
"league" : ObjectId("60f139faf2dfa914ad5682b6"),
"map" : "de_dust2",
"win" : 1.0
}
I want the total wins per map of a team based on distinct matches, only $sum win for a unique match.
Query
let teamWinsOnMap = await PlayerStats.aggregate()
.group({ _id: {team: "$team", map: "$map"},
match: { $addToSet: '$match' },
team: { $first: '$team' },
map: { $first: "$map" },
win: { $sum: '$win' }
})
Result
[
{
_id: { team: 611cd49e9be5e530de899065, map: 'de_inferno' },
match: [ 612cc791b0c36af46dbd0c01, 612cc791b0c36af46dbd0c08 ],
team: 611cd49e9be5e530de899065,
map: 'de_inferno',
win: 0
},
{
_id: { team: 611cd49e9be5e530de899065, map: 'de_dust2' },
match: [ 6128d5ffa2988ca4cb03a958 ],
team: 611cd49e9be5e530de899065,
map: 'de_dust2',
win: 2
}
]
PROBLEM
The team shows 2 wins after playing just one match, because we have stats for two players for that match.
The problem is that whilst you are returning unique matches from addToSet, you aren't actually grouping by them and it's just counting the wins from all the documents that are inside that group stage.
If you perform two separate groups, it should work. The first gets all unique matches per team on a given map and takes the first win value from each group, since they should all be the same. The second groups again and applies the sum, which then returns the correct value.
Group
{
_id: {
match:"$match", team:"$team", map:"$map"
},
team:{$first:"$team"},
map:{$first:"$map"},
match:{$first:"$match"},
win:{$first:"$win"}
}
And then by
Group
{
_id: {
team:"$team", map:"$map"
},
team:{$first:"$team"},
map:{$first:"$map"},
matches:{$addToSet:"$match"},
win:{$sum:"$win"}
}
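The two group stages above can be sketched in plain JavaScript to check the logic. This is an emulation with made-up sample rows (short string ids instead of ObjectIds), and winsPerMap is a hypothetical helper. With the Mongoose builder from the question, the two stages would presumably be chained as two .group() calls.

```javascript
// Made-up rows: two players of the same team in match M1, one player each in M2 and M3.
const stats = [
  { team: "T1", map: "de_dust2", match: "M1", win: 1 },
  { team: "T1", map: "de_dust2", match: "M1", win: 1 },
  { team: "T1", map: "de_inferno", match: "M2", win: 0 },
  { team: "T1", map: "de_inferno", match: "M3", win: 0 },
];

// Hypothetical helper emulating the two $group stages.
function winsPerMap(rows) {
  // Stage 1: keep one row per (match, team, map), like $first on each field.
  const perMatch = new Map();
  for (const r of rows) {
    const key = JSON.stringify([r.match, r.team, r.map]);
    if (!perMatch.has(key)) perMatch.set(key, r);
  }
  // Stage 2: group the de-duplicated rows by (team, map) and sum the wins.
  const perMap = new Map();
  for (const r of perMatch.values()) {
    const key = JSON.stringify([r.team, r.map]);
    const g = perMap.get(key) || { team: r.team, map: r.map, matches: [], win: 0 };
    g.matches.push(r.match);
    g.win += r.win;
    perMap.set(key, g);
  }
  return [...perMap.values()];
}

console.log(winsPerMap(stats));
```

The de_dust2 group now reports a single win for the single match, instead of one win per player.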

How to update a document with a reference to its previous state?

Is it possible to reference the root document during an update operation such that a document like this:
{"name":"foo","value":1}
can be updated with new values and have the full (previous) document pushed into a new field (creating an update history):
{"name":"bar","value":2,"previous":[{"name":"foo","value":1}]}
And so on..
{"name":"baz","value":3,"previous":[{"name":"foo","value":1},{"name":"bar","value":2}]}
I figure I'll have to use the new aggregate set operator in Mongo 4.2, but how can I achieve this?
Ideally I don't want to have to reference each field explicitly. I'd prefer to push the root document (minus the _id and previous fields) without knowing what the other fields are.
In addition to the new $set operator, what makes your use case much easier with Mongo 4.2 is the fact that db.collection.update() now accepts an aggregation pipeline, finally allowing a field to be updated based on its current value:
// { name: "foo", value: 1 }
db.collection.update(
{},
[{ $set: {
previous: {
$ifNull: [
{ $concatArrays: [ "$previous", [{ name: "$name", value: "$value" }] ] },
[ { name: "$name", value: "$value" } ]
]
},
name: "bar",
value: 2
}}],
{ multi: true }
)
// { name: "bar", value: 2, previous: [{ name: "foo", value: 1 }] }
// and if applied again:
// { name: "baz", value: 3, previous: [{ name: "foo", value: 1 }, { name: "bar", value: 2 } ] }
The first part {} is the match query, filtering which documents to update (in our case probably all documents).
The second part [{ $set: { previous: { $ifNull: [ ... } ] is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline):
$set is a new aggregation operator and an alias of $addFields. It's used to add/replace a new field (in our case "previous") with values from the current document.
Using an $ifNull check, we can determine whether "previous" already exists in the document or not (this is not the case for the first update).
If "previous" doesn't exist (is null), then we have to create it and set it with an array of one element: the current document: [ { name: "$name", value: "$value" } ].
If "previous" already exists, then we concatenate ($concatArrays) the existing array with the current document.
Don't forget { multi: true }, otherwise only the first matching document will be updated.
As you mentioned "root" in your question and if your schema is not the same for all documents (if you can't tell which fields should be used and pushed in the "previous" array), then you can use the $$ROOT variable which represents the current document and filter out the "previous" array. In this case, replace both { name: "$name", value: "$value" } from the previous query with:
{ $arrayToObject: { $filter: {
input: { $objectToArray: "$$ROOT" },
as: "root",
cond: { $ne: [ "$$root.k", "previous" ] }
}}}
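Note that the filter above only drops "previous", while the question also wants _id left out of the history entries. Below is a small plain-JavaScript sketch of the intended result (applyUpdate is a hypothetical helper, not a MongoDB call), followed by a condition that should exclude both fields in the pipeline, assuming the $in aggregation operator (MongoDB 3.4+) is available.

```javascript
// Hypothetical helper: emulate "push the current document, minus some fields,
// into previous, then apply the new values".
function applyUpdate(doc, changes, excluded = ["_id", "previous"]) {
  const snapshot = Object.fromEntries(
    Object.entries(doc).filter(([key]) => !excluded.includes(key))
  );
  const previous = [...(doc.previous || []), snapshot];
  return { ...doc, ...changes, previous };
}

// Assumption: a replacement for the "cond" of the $filter that drops both
// _id and previous from the $$ROOT snapshot.
const cond = { $not: [{ $in: ["$$root.k", ["_id", "previous"]] }] };

let doc = { _id: 1, name: "foo", value: 1 };
doc = applyUpdate(doc, { name: "bar", value: 2 });
doc = applyUpdate(doc, { name: "baz", value: 3 });
console.log(JSON.stringify(doc));
```

The helper does not need to know which fields exist, which is the point of using $$ROOT in the pipeline version.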
Imho, you are making your life infinitely more complex for no reason with such complicated data models.
Think of what you really want to achieve. You want to correlate different values in one or more interconnected series which are written to the collection consecutively.
Storing this in one document comes with some strings attached. While it seems to be reasonable in the beginning, let me name a few:
How do you get the most current document if you do not know its value for name?
How do you deal with very large series, which make the document hit the 16MB limit?
What is the benefit of the added complexity?
Simplify first
So, let's assume you have only one series for a moment. It gets as simple as
[{
"_id":"foo",
"ts": ISODate("2019-07-03T17:40:00.000Z"),
"value":1
},{
"_id":"bar",
"ts": ISODate("2019-07-03T17:45:00.000"),
"value":2
},{
"_id":"baz",
"ts": ISODate("2019-07-03T17:50:00.000"),
"value":3
}]
Assuming the name is unique, we can use it as _id, potentially saving an index.
You can actually get the semantic equivalent by simply doing a
> db.seriesa.find().sort({ts:-1})
{ "_id" : "baz", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "_id" : "bar", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
{ "_id" : "foo", "ts" : ISODate("2019-07-03T17:40:00Z"), "value" : 1 }
Say you only want to have the two latest values, you can use limit():
> db.seriesa.find().sort({ts:-1}).limit(2)
{ "_id" : "baz", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "_id" : "bar", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
Should you really need to have the older values in a queue-ish array
db.seriesa.aggregate([{
$group: {
_id: "queue",
name: {
$last: "$_id"
},
value: {
$last: "$value"
},
previous: {
$push: {
name: "$_id",
value: "$value"
}
}
}
}, {
$project: {
name: 1,
value: 1,
previous: {
$slice: ["$previous", {
$subtract: [{
$size: "$previous"
}, 1]
}]
}
}
}])
Nail it
Now, let us say you have more than one series of data. Basically, there are two ways of dealing with it: put different series in different collections or put all the series in one collection and make a distinction by a field, which for obvious reasons should be indexed.
So, when to use what? It boils down to whether you want to do aggregations over all series (maybe later down the road) or not. If you do, you should put all series into one collection. Of course, we have to slightly modify our data model:
[{
  "name":"foo",
  "series": "a",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value":1
},{
  "name":"bar",
  "series": "a",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value":2
},{
  "name":"baz",
  "series": "a",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value":3
},{
  "name":"foo",
  "series": "b",
  "ts": ISODate("2019-07-03T17:40:00.000Z"),
  "value":1
},{
  "name":"bar",
  "series": "b",
  "ts": ISODate("2019-07-03T17:45:00.000Z"),
  "value":2
},{
  "name":"baz",
  "series": "b",
  "ts": ISODate("2019-07-03T17:50:00.000Z"),
  "value":3
}]
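Note that for demonstration purposes, I fell back to the default ObjectId value for _id.
Next, we create an index over series and ts, as we are going to need it for our query (on current versions use createIndex instead; ensureIndex is a deprecated alias that has since been removed):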
> db.series.ensureIndex({series:1,ts:-1})
And now our simple query looks like this
> db.series.find({"series":"b"},{_id:0}).sort({ts:-1})
{ "name" : "baz", "series" : "b", "ts" : ISODate("2019-07-03T17:50:00Z"), "value" : 3 }
{ "name" : "bar", "series" : "b", "ts" : ISODate("2019-07-03T17:45:00Z"), "value" : 2 }
{ "name" : "foo", "series" : "b", "ts" : ISODate("2019-07-03T17:40:00Z"), "value" : 1 }
In order to generate the queue-ish document, we need to add a $match stage:
> db.series.aggregate([{
$match: {
"series": "b"
}
},
// other stages omitted for brevity
])
Note that the index we created earlier will be utilized here.
Or, we can generate a document like this for every series by simply using series as the _id in the $group stage and replace _id with name where appropriate
db.series.aggregate([{
$group: {
_id: "$series",
name: {
$last: "$name"
},
value: {
$last: "$value"
},
previous: {
$push: {
name: "$name",
value: "$value"
}
}
}
}, {
$project: {
name: 1,
value: 1,
previous: {
$slice: ["$previous", {
$subtract: [{
$size: "$previous"
}, 1]
}]
}
}
}])
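One thing to keep in mind with the $group above: $last depends on the order in which documents reach the stage, so it is probably safer to put a { $sort: { ts: 1 } } stage in front of it. The following plain-JavaScript sketch emulates that (queuePerSeries is a hypothetical helper, and the timestamps are plain ISO strings rather than ISODate values).

```javascript
// Made-up rows, deliberately not in timestamp order.
const rows = [
  { name: "bar", series: "a", ts: "2019-07-03T17:45:00Z", value: 2 },
  { name: "foo", series: "a", ts: "2019-07-03T17:40:00Z", value: 1 },
  { name: "baz", series: "a", ts: "2019-07-03T17:50:00Z", value: 3 },
  { name: "foo", series: "b", ts: "2019-07-03T17:40:00Z", value: 1 },
];

// Hypothetical helper emulating $sort by ts, $group by series, then $slice.
function queuePerSeries(docs) {
  // ISO 8601 strings in the same time zone sort chronologically as text.
  const sorted = [...docs].sort((x, y) => (x.ts < y.ts ? -1 : x.ts > y.ts ? 1 : 0));
  const groups = new Map();
  for (const d of sorted) {
    const all = groups.get(d.series) || [];
    all.push({ name: d.name, value: d.value });   // the $push
    groups.set(d.series, all);
  }
  return [...groups].map(([series, all]) => {
    const last = all[all.length - 1];             // the $last values
    return {
      _id: series,
      name: last.name,
      value: last.value,
      previous: all.slice(0, -1),                 // everything except the newest entry
    };
  });
}

console.log(JSON.stringify(queuePerSeries(rows)));
```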
Conclusion
Stop Being Clever when it comes to data models in MongoDB. Most of the problems with data models I saw in the wild and the vast majority I see on SO come from the fact that someone tried to be Smart (by premature optimization) ™.
Unless we are talking of ginormous series (which can not be, since you settled for a 16MB limit in your approach), the data model and queries above are highly efficient without adding unneeded complexity.
addMultipleData: (req, res, next) => {
  const { name, value } = req.body;
  // Validate up front and respond once; sending two responses for the
  // same request throws an error.
  if (!name) { return res.json({ message: "Please enter Name" }); }
  if (!value) { return res.json({ message: "Please Enter Value" }); }
  // Step 1: fetch the current document, if there is one
  models.dynamic.findOne({}, function (findError, findResponse) {
    if (findError) { return next(findError); }
    if (findResponse == null) {
      // Step 2: nothing stored yet, so insert the first version
      models.dynamic.create({ name: name, value: value }, function (error, response) {
        if (error) { return next(error); }
        res.json({ message: "successfully inserted" });
      });
    } else {
      // Step 3: push the current values into "previous" and set the new ones
      let pushedValue = { name: findResponse.name, value: findResponse.value };
      let updateWith = {
        $set: { name: name, value: value },
        $push: { previous: pushedValue }
      };
      let options = { upsert: true };
      models.dynamic.updateOne({}, updateWith, options, function (error, updatedResponse) {
        if (error) { return next(error); }
        if (updatedResponse.nModified == 1) {
          res.json({ message: "successfully inserted" });
        } else {
          res.json({ message: "no document was modified" });
        }
      });
    }
  });
}
//This is the schema
var multipleAddSchema = mongoose.Schema({
"name":String,
"value":Number,
"previous":[]
})

Project values of different columns into one field

{
"_id" : ObjectId("5ae84dd87f5b72618ba7a669"),
"main_sub" : "MATHS",
"reporting" : [
{
"teacher" : "ABC"
}
],
"subs" : [
{
"sub" : "GEOMETRIC",
"teacher" : "XYZ",
}
]
}
{
"_id" : ObjectId("5ae84dd87f5b72618ba7a669"),
"main_sub" : "SOCIAL SCIENCE",
"reporting" : [
{
"teacher" : "XYZ"
}
],
"subs" : [
{
"sub" : "CIVIL",
"teacher" : "ABC",
}
]
}
I have simplified the structure of the documents that I have.
The basic structure is a parent subject with an array of reporting teachers and an array of sub-subjects (each having a teacher).
I now want to extract all the subjects (parent subjects and sub-subjects) taught by a particular teacher, along with a flag saying whether each one is a parent subject or not.
For example, for teacher ABC I want the following structure:
[{'subject':'MATHS', 'is_parent':'True'}, {'subject':'CIVIL', 'is_parent':'FALSE'}]
-- What is the most efficient query possible? I have tried $project with $cond and $switch, but in both cases I had to repeat the conditional statement for 'subject' and 'is_parent'.
-- Is it advisable to do the computation in a query, or should I fetch the data and then modify the structure in the server code? For instance, I could $unwind to get a mapping of the parent subjects to each sub-subject and then loop over it.
I have tried
db.collection.aggregate(
{$unwind:'$reporting'},
{$project:{
'result':{$cond:[
{$eq:['ABC', '$reporting.teacher']},
"$main_sub",
"$subs.sub"]}
}}
)
Then I realised that even if I transform the else part into another query for the sub-subjects, I will have to write exactly the same thing for the is_parent property.
You have 2 arrays, so you need to unwind both - the reporting and the subs.
After that stage each document will have at most 1 parent teacher-subj and at most 1 sub teacher-subj pairs.
You need to unwind them again to have a single teacher-subj per document, and it's where you define whether it is parent or not.
Then you can group by teacher. No need for $conds, $filters, or $facets. E.g.:
db.collection.aggregate([
{ $unwind: "$reporting" },
{ $unwind: "$subs" },
{ $project: {
teachers: [
{ teacher: "$reporting.teacher", sub: "$main_sub", is_parent: true },
{ teacher: "$subs.teacher", sub: "$subs.sub", is_parent: false }
]
} },
{ $unwind: "$teachers" },
{ $group: {
_id: "$teachers.teacher",
subs: { $push: {
subject: "$teachers.sub",
is_parent: "$teachers.is_parent"
} }
} }
])
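On the second part of the question (doing it in server code instead), here is a minimal sketch of the application-side alternative, assuming the documents have already been fetched. subjectsByTeacher is a hypothetical helper and the sample data is trimmed from the question. Unlike the pipeline, it walks each array once, so a parent subject is not repeated when a document has several sub-subjects.

```javascript
// Documents trimmed from the question (ids omitted).
const subjects = [
  { main_sub: "MATHS",
    reporting: [{ teacher: "ABC" }],
    subs: [{ sub: "GEOMETRIC", teacher: "XYZ" }] },
  { main_sub: "SOCIAL SCIENCE",
    reporting: [{ teacher: "XYZ" }],
    subs: [{ sub: "CIVIL", teacher: "ABC" }] },
];

// Hypothetical helper: map each teacher to the subjects they teach.
function subjectsByTeacher(docs) {
  const byTeacher = {};
  const add = (teacher, subject, is_parent) => {
    if (!byTeacher[teacher]) byTeacher[teacher] = [];
    byTeacher[teacher].push({ subject, is_parent });
  };
  for (const doc of docs) {
    // Reporting teachers teach the parent subject.
    for (const r of doc.reporting) add(r.teacher, doc.main_sub, true);
    // Each sub-subject has its own teacher.
    for (const s of doc.subs) add(s.teacher, s.sub, false);
  }
  return byTeacher;
}

console.log(subjectsByTeacher(subjects).ABC);
```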

Search on multiple collections in MongoDB

I know the theory of MongoDB and the fact that it doesn't support joins, and that I should use embedded documents or denormalize as much as possible, but here goes:
I have multiple documents, such as:
Users, which embed Suburbs, but also has: first name, last name
Suburbs, which embed States
Child, which embeds School, belongs to a User, but also has: first name, last name
Example:
Users:
{ _id: 1, first_name: 'Bill', last_name: 'Gates', suburb: 1 }
{ _id: 2, first_name: 'Steve', last_name: 'Jobs', suburb: 3 }
Suburb:
{ _id: 1, name: 'Suburb A', state: 1 }
{ _id: 2, name: 'Suburb B', state: 1 }
{ _id: 3, name: 'Suburb C', state: 3 }
State:
{ _id: 1, name: 'LA' }
{ _id: 3, name: 'NY' }
Child:
{ _id: 1, _user_id: 1, first_name: 'Little Billy', last_name: 'Gates' }
{ _id: 2, _user_id: 2, first_name: 'Little Stevie', last_name: 'Jobs' }
The search I need to implement is on:
first name, last name of Users and Child
State from Users
I know that I have to do multiple queries to get it done, but how can that be achieved? With mapReduce or aggregate?
Can you point out a solution please?
I've tried to use mapReduce but that didn't get me to have documents from Users which contained a state_id, so that's why I brought it up here.
This answer is outdated. Since version 3.2, MongoDB has limited support for left outer joins with the $lookup aggregation operator
MongoDB does not do queries which span multiple collections - period. When you need to join data from multiple collections, you have to do it on the application level by doing multiple queries.
Query collection A
Get the secondary keys from the result and put them into an array
Query collection B passing that array as the value of the $in-operator
Join the results of both queries programmatically on the application layer
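The steps above can be sketched as follows. This is an in-memory illustration using the sample data from the question: the arrays stand in for collections, and findIn is a hypothetical stand-in for a find() with the $in operator, so no driver calls are shown.

```javascript
// In-memory stand-ins for the collections in the question.
const users = [
  { _id: 1, first_name: "Bill", last_name: "Gates", suburb: 1 },
  { _id: 2, first_name: "Steve", last_name: "Jobs", suburb: 3 },
];
const suburbs = [
  { _id: 1, name: "Suburb A", state: 1 },
  { _id: 2, name: "Suburb B", state: 1 },
  { _id: 3, name: "Suburb C", state: 3 },
];
const states = [
  { _id: 1, name: "LA" },
  { _id: 3, name: "NY" },
];

// Hypothetical stand-in for collection.find({ [field]: { $in: values } }).
function findIn(collection, field, values) {
  return collection.filter((doc) => values.includes(doc[field]));
}

// Find the users living in a given state by chaining the lookups.
function usersInState(stateName) {
  // Query collection A and collect the keys from the result.
  const stateIds = states.filter((s) => s.name === stateName).map((s) => s._id);
  // Query collection B, passing the keys as the value of $in.
  const suburbIds = findIn(suburbs, "state", stateIds).map((s) => s._id);
  // One more hop, from suburbs to users.
  return findIn(users, "suburb", suburbIds);
}

console.log(usersInState("NY").map((u) => u.first_name));
```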
Having to do this should be rather the exception than the norm. When you frequently need to emulate JOINs like that, it either means that you are still thinking too relational when you design your database schema or that your data is simply not suited for the document-based storage concept of MongoDB.
Joins are now possible in MongoDB: you can achieve this using the $lookup and $facet aggregation stages, which is probably the best way to search in multiple collections:
db.collection.aggregate([
{ "$limit": 1 },
{ "$facet": {
"c1": [
{ "$lookup": {
"from": Users.collection.name,
"pipeline": [
{ "$match": { "first_name": "your_search_data" } }
],
"as": "collection1"
}}
],
"c2": [
{ "$lookup": {
"from": State.collection.name,
"pipeline": [
{ "$match": { "name": "your_search_data" } }
],
"as": "collection2"
}}
],
"c3": [
{ "$lookup": {
"from": State.collection.name,
"pipeline": [
{ "$match": { "name": "your_search_data" } }
],
"as": "collection3"
}}
]
}},
{ "$project": {
"data": {
"$concatArrays": [ "$c1", "$c2", "$c3" ]
}
}},
{ "$unwind": "$data" },
{ "$replaceRoot": { "newRoot": "$data" } }
])
You'll find MongoDB easier to understand if you take a denormalized approach to schema design. That is, you want to structure your documents the way the requesting client application understands them. Essentially, you are modeling your documents as domain objects with which the application deals. Joins become less important when you model your data this way. Consider how I've denormalized your data into a single collection:
{
_id: 1,
first_name: 'Bill',
last_name: 'Gates',
suburb: 'Suburb A',
state: 'LA',
child : [ 3 ]
}
{
_id: 2,
first_name: 'Steve',
last_name: 'Jobs',
suburb: 'Suburb C',
state: 'NY',
child: [ 4 ]
}
{
_id: 3,
first_name: 'Little Billy',
last_name: 'Gates',
suburb: 'Suburb A',
state: 'LA',
parent : [ 1 ]
}
{
_id: 4,
first_name: 'Little Stevie',
last_name: 'Jobs',
suburb: 'Suburb C',
state: 'NY',
parent: [ 2 ]
}
The first advantage is that this schema is far easier to query. Plus, updates to address fields are now consistent with the individual Person entity since the fields are embedded in a single document. Notice also the bidirectional relationship between parent and children? This makes this collection more than just a collection of individual people. The parent-child relationships mean this collection is also a social graph. Here are some resources which may be helpful to you when thinking about schema design in MongoDB.
Here's a JavaScript function that will return an array of all records matching specified criteria, searching across all collections in the current database:
function searchAll(query,fields,sort) {
var all = db.getCollectionNames();
var results = [];
for (var i in all) {
var coll = all[i];
if (coll == "system.indexes") continue;
db[coll].find(query,fields).sort(sort).forEach(
function (rec) {results.push(rec);} );
}
return results;
}
From the Mongo shell, you can copy/paste the function in, then call it like so:
> var recs = searchAll( {filename: {$regex:'.pdf$'} }, {moddate:1,filename:1,_id:0}, {filename:1} )
> recs
Based on @brian-moquin's answer and others, I made a set of functions to search all collections, across all keys (fields), by a simple keyword.
It's in my gist; https://gist.github.com/fkiller/005dc8a07eaa3321110b3e5753dda71b
For more detail, I first made a function to gather all keys.
function keys(collectionName) {
var mr = db.runCommand({
'mapreduce': collectionName,
'map': function () {
for (var key in this) { emit(key, null); }
},
'reduce': function (key, stuff) { return null; },
'out': collectionName + '_keys'
});
return db[mr.result].distinct('_id');
}
Then one more to generate $or query from keys array.
function createOR(fieldNames, keyword) {
var query = [];
fieldNames.forEach(function (item) {
var temp = {};
temp[item] = { $regex: '.*' + keyword + '.*' };
query.push(temp);
});
if (query.length == 0) return false;
return { $or: query };
}
Below is a function to search a single collection.
function findany(collection, keyword) {
var query = createOR(keys(collection.getName()));
if (query) {
return collection.findOne(query, keyword);
} else {
return false;
}
}
And, finally, a search function for every collection.
function searchAll(keyword) {
var all = db.getCollectionNames();
var results = [];
all.forEach(function (collectionName) {
print(collectionName);
if (db[collectionName]) results.push(findany(db[collectionName], keyword));
});
return results;
}
You can simply load all functions in Mongo console, and execute searchAll('any keyword')
You can achieve this using the $mergeObjects aggregation operator.
Example
Create a collection orders with the following documents:
db.orders.insert([
{ "_id" : 1, "item" : "abc", "price" : 12, "ordered" : 2 },
{ "_id" : 2, "item" : "jkl", "price" : 20, "ordered" : 1 }
])
Create another collection items with the following documents:
db.items.insert([
{ "_id" : 1, "item" : "abc", description: "product 1", "instock" : 120 },
{ "_id" : 2, "item" : "def", description: "product 2", "instock" : 80 },
{ "_id" : 3, "item" : "jkl", description: "product 3", "instock" : 60 }
])
The following operation first uses the $lookup stage to join the two collections by the item fields and then uses $mergeObjects in the $replaceRoot to merge the joined documents from items and orders:
db.orders.aggregate([
{
$lookup: {
from: "items",
localField: "item", // field in the orders collection
foreignField: "item", // field in the items collection
as: "fromItems"
}
},
{
$replaceRoot: { newRoot: { $mergeObjects: [ { $arrayElemAt: [ "$fromItems", 0 ] }, "$$ROOT" ] } }
},
{ $project: { fromItems: 0 } }
])
The operation returns the following documents:
{ "_id" : 1, "item" : "abc", "description" : "product 1", "instock" : 120, "price" : 12, "ordered" : 2 }
{ "_id" : 2, "item" : "jkl", "description" : "product 3", "instock" : 60, "price" : 20, "ordered" : 1 }
This technique merges the objects and returns the result.
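Minime's solution worked, except that it required a fix. In findany, the line
var query = createOR(keys(collection.getName()));
needs keyword as the second argument to the createOR call. The keyword should also be dropped from the findOne call in the same function, because the second argument of findOne is a projection, not a search term.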
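A sketch of the corrected functions, under a few assumptions of mine: the field names are passed in directly (so the block does not depend on the map-reduce keys() helper), the keyword is escaped before being used as a regular expression (escapeRegex is a hypothetical helper, not part of the original answer), and the '.*' wrappers are dropped because an unanchored $regex already matches anywhere in the value.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an $or query that looks for the keyword in every given field.
function createOR(fieldNames, keyword) {
  if (fieldNames.length === 0) return false;
  return {
    $or: fieldNames.map((name) => ({ [name]: { $regex: escapeRegex(keyword) } })),
  };
}

// Search one collection. "collection" only needs a findOne(query) method here.
function findAny(collection, fieldNames, keyword) {
  const query = createOR(fieldNames, keyword); // keyword passed as 2nd argument
  return query ? collection.findOne(query) : false;
}

console.log(JSON.stringify(createOR(["filename", "title"], "report.pdf")));
```

In the Mongo shell the call would then look like findAny(db.mycoll, keys("mycoll"), "report.pdf"), where mycoll is a placeholder collection name.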