MongoDB: Query to list records that have at least 5 fields not equal to null or " ", or records with more than 4 'X's - mongodb

I have a collection with 5 fields:
Chocolate, Biscuits, Wafers, Truffles , Mints
The value of each field is either X, null, or " ".
I am trying to list all the records that have at least 4 'X' values.

This input:
var r = [
{store: "A", chocolate:"X", biscuits:"X", wafers: "X", truffles:"X", mints:"X"},
{store: "B", chocolate:null, biscuits:"", wafers: "X", truffles:"X", mints:"X"},
{store: "C", chocolate:null, biscuits:"", wafers: "", truffles:"X", mints:"X"},
{store: "D", chocolate:"X", biscuits:"", wafers: "X", truffles:"X", mints:"X"}
];
db.foo.insert(r);
The following pipeline will yield the desired result. $reduce is a powerful array-oriented operator. In the pipeline below, we are saying "for each item in the list of fields, if it equals X, add 1, else add 0; when done, N is the result."
db.foo.aggregate([
{$addFields: {N: {$reduce: {
input: ["$chocolate","$biscuits","$wafers","$truffles","$mints"],
initialValue: 0,
in: {$sum: [ "$$value", {"$cond":[ {"$eq": ["$$this","X"]}, 1, 0]} ]}
}}
}}
,{$match: {"N": {$gte:4} }}
]);
{
"_id" : ObjectId("5dda931f76e431f9c9169859"),
"store" : "A",
"chocolate" : "X",
"biscuits" : "X",
"wafers" : "X",
"truffles" : "X",
"mints" : "X",
"N" : 5
}
{
"_id" : ObjectId("5dda931f76e431f9c916985c"),
"store" : "D",
"chocolate" : "X",
"biscuits" : "",
"wafers" : "X",
"truffles" : "X",
"mints" : "X",
"N" : 4
}
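To see what the $reduce stage computes without a running server, the same fold can be sketched in plain JavaScript. This is only an illustration of the logic, not MongoDB code; countX and FIELDS are hypothetical names introduced here.

```javascript
// Plain-JavaScript analogue of the $reduce + $match pipeline above.
const stores = [
  { store: "A", chocolate: "X", biscuits: "X", wafers: "X", truffles: "X", mints: "X" },
  { store: "B", chocolate: null, biscuits: "", wafers: "X", truffles: "X", mints: "X" },
  { store: "C", chocolate: null, biscuits: "", wafers: "", truffles: "X", mints: "X" },
  { store: "D", chocolate: "X", biscuits: "", wafers: "X", truffles: "X", mints: "X" }
];

// Hypothetical constant: the fields that take part in the count.
const FIELDS = ["chocolate", "biscuits", "wafers", "truffles", "mints"];

// Hypothetical helper: fold over the field list, adding 1 for each "X".
function countX(doc) {
  return FIELDS.reduce((total, field) => total + (doc[field] === "X" ? 1 : 0), 0);
}

const matching = stores
  .map(doc => ({ ...doc, N: countX(doc) })) // plays the role of $addFields
  .filter(doc => doc.N >= 4);               // plays the role of $match

console.log(matching.map(doc => [doc.store, doc.N]));
```

Stores B and C score 3 and 2 respectively, so only A and D survive the filter, as in the shell output above.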

I arrived at this solution for this query:
db.getCollection("stores").aggregate([
{
$addFields: {
fieldsToMatch: ["$chocolate", "$biscuits", "$wafers", "$truffles", "$mints"]
}
},
{
$unwind: "$fieldsToMatch"
},
{
$match: {
fieldsToMatch: "X"
}
},
{
$group: {
_id: "$name",
count:{$sum:1}
}
},
{
$match: {
count: {
$gte: 4
}
}
}])
I'm basically collecting all the values into an array, "fieldsToMatch", then using the $unwind operator to create one record per array element, keeping only the records whose element equals X, grouping and counting by name (or the _id), and finally keeping the groups that have at least 4 occurrences.
The only problem with this approach is that it returns only the _ids of the documents matching the criteria; to get the entire documents you will need to either use $lookup or query the collection again.
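The stages described above can be mimicked in plain JavaScript to show where the rest of the document gets lost and one way to keep it. This is a sketch, not MongoDB code; the sample documents and their name field are assumptions made for illustration. Inside the real pipeline, an accumulator such as {$first: "$$ROOT"} in the $group stage might serve a similar purpose, but that is a suggestion and not part of the answer above.

```javascript
// Hypothetical sample data with a "name" field, as the pipeline above assumes.
const docs = [
  { name: "A", chocolate: "X", biscuits: "X", wafers: "X", truffles: "X", mints: "X" },
  { name: "B", chocolate: null, biscuits: "", wafers: "X", truffles: "X", mints: "X" },
  { name: "D", chocolate: "X", biscuits: "", wafers: "X", truffles: "X", mints: "X" }
];
const fields = ["chocolate", "biscuits", "wafers", "truffles", "mints"];

// $addFields + $unwind: one row per (document, field value) pair.
const unwound = docs.flatMap(doc =>
  fields.map(field => ({ doc, fieldsToMatch: doc[field] }))
);

// First $match: keep only the rows whose value is "X".
const onlyX = unwound.filter(row => row.fieldsToMatch === "X");

// $group by name: count the rows, and also remember the source document.
const groups = new Map();
for (const row of onlyX) {
  const group = groups.get(row.doc.name) || { count: 0, doc: row.doc };
  group.count += 1;
  groups.set(row.doc.name, group);
}

// Final $match: at least 4 occurrences.
const result = [...groups.values()].filter(group => group.count >= 4);

console.log(result.map(group => [group.doc.name, group.count]));
```

Because each group keeps a reference to its source document, no second query is needed in this sketch.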
Hope this helps!

Related

Finding the mongoDB object

I have a sequence, [k1,k4,k6,k10], and MongoDB objects as follows:
{"_id" : "x" , "k" : [ "k1", "k2", "k6"]},
{"_id" : "y" , "k" : [ "k2", "k4", "k10", "k11"]},
{"_id" : "z" , "k" : [ "k4", "k6", "k10", "k12"]}
I have to find the particular object which has the maximum number of matching elements from the array. In this case, it will be "z", as it has three matching elements in the "k" array, i.e. ["k4","k6","k10"].
Is there a MongoDB way of doing this?
You can use this query :
db.data.aggregate([{
$match: {
k: {
$in: ["k1", "k4", "k6", "k10"]
}
}
}, {
$addFields: {
count: {
$size:{
$setIntersection: [
["k1", "k4", "k6", "k10"], "$k"
]
}
}
}
}, {
$sort: {
count: -1
}
}, {
$limit: 1
}])
This query will:
filter on relevant records (not strictly necessary, but it performs better with it)
calculate the count of matching elements using the set intersection
sort the records by count, descending
return the first record
Note: There can be multiple records with maximum matches, the above query gets you only one.
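The scoring step, and one way to handle the tie case mentioned in the note, can be sketched in plain JavaScript. This is an illustration of the logic only, not MongoDB code; the variable names are hypothetical.

```javascript
const wanted = ["k1", "k4", "k6", "k10"];
const data = [
  { _id: "x", k: ["k1", "k2", "k6"] },
  { _id: "y", k: ["k2", "k4", "k10", "k11"] },
  { _id: "z", k: ["k4", "k6", "k10", "k12"] }
];

const wantedSet = new Set(wanted);

// Like $size of $setIntersection: count the distinct values present in both arrays.
const scored = data.map(doc => ({
  _id: doc._id,
  count: new Set(doc.k.filter(value => wantedSet.has(value))).size
}));

// Instead of sorting and taking one record, keep every record that ties for the maximum.
const best = Math.max(...scored.map(doc => doc.count));
const topMatches = scored.filter(doc => doc.count === best);

console.log(topMatches);
```

With this data only "z" reaches the maximum of 3, but if another record also had 3 matches it would be returned as well.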

Aggregate on array of embedded documents

I have a MongoDB collection with multiple documents. Each document has an array with multiple subdocuments (or embedded documents, I guess?). Each of these subdocuments is in this format:
{
"name": string,
"count": integer
}
Now I want to aggregate these subdocuments to find:
1. The top X counts and their names.
2. Same as 1., but the names have to match a regex before sorting and limiting.
I have tried the following for 1. already. It does return the top X, but unordered, so I'd have to order them again, which seems somewhat inefficient.
[{
$match: {
_id: id
}
}, {
$unwind: {
path: "$array"
}
}, {
$sort: {
'count': -1
}
}, {
$limit: x
}]
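Since I'm rather new to MongoDB, this is pretty confusing for me. Happy for any help. Thanks in advance.
The sort key has to include the array name (for example "students.count" rather than just "count"), because after $unwind the count lives under the array field. Sorting on the bare "count" sorts on a field that does not exist at the top level, which is why the result comes back unordered and an additional sort would be needed later on.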
Given the following document to work with:
{
students: [{
count: 4,
name: "Ann"
}, {
count: 7,
name: "Brad"
}, {
count: 6,
name: "Beth"
}, {
count: 8,
name: "Catherine"
}]
}
As an example, the following aggregation query will match any name containing the letter "h" or the letter "e" (the character class [he] matches either one). This needs to happen after the "$unwind" step in order to only keep the ones you need.
db.tests.aggregate([
{$match: {
_id: ObjectId("5c1b191b251d9663f4e3ce65")
}},
{$unwind: {
path: "$students"
}},
{$match: {
"students.name": /[he]/
}},
{$sort: {
"students.count": -1
}},
{$limit: 2}
])
This is the output given the above mentioned input:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 6, "name" : "Beth" } }
Both names contain the letters "h" and "e", and the output is sorted from high to low.
When setting the limit to 1, the output is limited to:
{ "_id" : ObjectId("5c1b191b251d9663f4e3ce65"), "students" : { "count" : 8, "name" : "Catherine" } }
In this case only the highest count has been kept after having matched the names.
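The same unwind, match, sort, limit sequence can be traced in plain JavaScript to check which names the regex keeps. This is a sketch of the logic rather than MongoDB code; topStudents is a hypothetical helper and the _id value is a placeholder.

```javascript
const doc = {
  _id: 1, // placeholder id for the sketch
  students: [
    { count: 4, name: "Ann" },
    { count: 7, name: "Brad" },
    { count: 6, name: "Beth" },
    { count: 8, name: "Catherine" }
  ]
};

// Hypothetical helper mirroring the pipeline stages in order.
function topStudents(document, pattern, limit) {
  return document.students
    .map(student => ({ _id: document._id, students: student }))  // $unwind
    .filter(row => pattern.test(row.students.name))              // $match on "students.name"
    .sort((a, b) => b.students.count - a.students.count)         // $sort on "students.count"
    .slice(0, limit);                                            // $limit
}

const topTwo = topStudents(doc, /[he]/, 2);
console.log(topTwo.map(row => row.students.name));
```

"Ann" and "Brad" contain neither letter, so "Brad" is dropped despite having the second-highest count.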
=====================
Edit for the extra question:
Yes, the first $match can be changed to filter on specific universities.
{$match: {
university: "University X"
}},
That will give one or more matching documents (in case you have a document per year or so) and the rest of the aggregation steps would still be valid.
The following match would retrieve the students for the given university for a given academic year in case that would be needed.
{$match: {
university: "University X",
academic_year: "2018-2019"
}},
That should narrow it down to get the correct documents.

(mongo) How could i get the documents that have a value in array along with size

I have a mongo collection with something like the below:
{
"_id" : ObjectId("59e013e83260c739f029ee21"),
"createdAt" : ISODate("2017-10-13T01:16:24.653+0000"),
"updatedAt" : ISODate("2017-11-11T17:13:52.956+0000"),
"age" : NumberInt(34),
"attributes" : [
{
"year" : "2017",
"contest" : [
{
"name" : "Category1",
"division" : "Department1"
},
{
"name" : "Category2",
"division" : "Department1"
}
]
},
{
"year" : "2016",
"contest" : [
{
"name" : "Category2",
"division" : "Department1"
}
]
},
{
"year" : "2015",
"contest" : [
{
"name" : "Category1",
"division" : "Department1"
}
]
}
],
"name" : {
"id" : NumberInt(9850214),
"first" : "john",
"last" : "afham"
}
}
Now, how could I get the number of documents that have a contest with the name Category1 more than one time, or more than 2 times, and so on?
I tried to use $size and $gt but couldn't get a correct result.
Assuming that a single contest will never contain the same name (e.g. "Category1") value more than once, here is what you can do.
The absence of any unwinds will result in improved performance in particular on big collections or data sets with loads of entries in your attributes arrays.
db.collection.aggregate([{
$project: {
"numberOfOccurrences": {
$size: { // count the number of matching attributes entries
$filter: { // keep only the attributes entries whose contest array has at least one entry with name "Category1"
input: "$attributes",
cond: { $in: [ "Category1", "$$this.contest.name" ] }
}
}
}
}
}, {
$match: { // filter the number of documents
"numberOfOccurrences": {
$gt: 1 // put your desired min. number of matching contest entries here
}
}
}, {
$count: "numberOfDocuments" // count the number of matching documents
}])
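What the $filter, $size, $match and $count stages compute can be sketched in plain JavaScript. This is an illustration only, not MongoDB code; occurrences is a hypothetical helper, and the second sample document is invented to show a non-matching case.

```javascript
const people = [
  { // trimmed copy of the document from the question
    _id: "john",
    attributes: [
      { year: "2017", contest: [{ name: "Category1" }, { name: "Category2" }] },
      { year: "2016", contest: [{ name: "Category2" }] },
      { year: "2015", contest: [{ name: "Category1" }] }
    ]
  },
  { // invented document with a single Category1 year
    _id: "other",
    attributes: [
      { year: "2017", contest: [{ name: "Category1" }] }
    ]
  }
];

// Like $size of $filter: how many attributes entries contain a contest with this name.
function occurrences(doc, contestName) {
  return doc.attributes.filter(attribute =>
    attribute.contest.some(contest => contest.name === contestName)
  ).length;
}

// Like $match followed by $count.
const numberOfDocuments = people.filter(doc => occurrences(doc, "Category1") > 1).length;

console.log(numberOfDocuments);
```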
Try this on for size.
db.foo.aggregate([
// Start with breaking down attributes:
{$unwind: "$attributes"}
// Next, extract only name = Category1 from the contest array. This will yield
// an array of 0 or 1 because I am assuming that contest names WITHIN
// the contest array are unique. If found and we get an array of 1, turn that
// into a single doc instead of an array of a single doc by taking arrayElemAt 0.
// Otherwise, "x" is not set into the doc AT ALL. All other vars in the doc
// will go away after $project; if you want to keep them, change this to
// $addFields:
,{$project: {x: {$arrayElemAt: [ {$filter: {
input: "$attributes.contest",
as: "z",
cond: {$eq: [ "$$z.name", "Category1" ]}
}}, 0 ]}
}}
// We split up attributes before, creating multiple docs with the same _id. We
// must now "recombine" these _id (OP said he wants # of docs with name).
// We now have to capture all the single "x" that we created above; docs without
// Category1 will have NO "x" and we don't want to include them in the count.
// Also, we KNOW that name can only be Category 1 but division could vary, so
// let's capture that in the $push in case we might want it:
,{$group: {_id: "$_id", x: {$push: "$x.division"}}}
// One more pass to compute length of array:
,{$addFields: {len: {$size: "$x"}} }
// And lastly, the filter for one time or two times or n times:
,{$match: {len: {$gt: 2} }}
]);
First, we need to flatten the document by the attributes and contest fields. Then we group by the document's original _id and the contest name, counting the contests along the way. Finally, we filter the result.
db.person.aggregate([
{ $unwind: "$attributes" },
{ $unwind: "$attributes.contest" },
{$group: {
_id: {initial_id: "$_id", contest: "$attributes.contest.name"},
count: {$sum: 1}
}
},
{$match: {$and: [{"_id.contest": "Category1"}, {"count": {$gt: 1}}]}}]);

MongoDB filter inner array of object

I have documents like this:
-collection: users-
{name: "x", password: "x", recipes: [{title: "eggs", dificult: 1}, {title: "pizza", dificult: 2}]},
{name: "y", password: "y", recipes: [{title: "oil", dificult: 2}, {title: "eggandpotatoes", dificult: 2}]}
I want to get all recipes, filtering by title and dificult.
I have tried something like this:
db.users.find({$and: [{"recipes.title": /.*egg.*/},{"recipes.dificult": 2}]},
{"recipes.title": 1,"recipes.dificult": 1}).toArray();
This should return
{title: "eggandpotatoes", dificult: 2}
but it returns
{title: "eggs", dificult: 1}
{title: "eggandpotatoes", dificult: 2}
Once the filter works, I would like to limit the result with start: 2 and end: 5, that is, starting from the 2nd result, return the next 5.
Thanks in advance.
You can use the $filter aggregation operator to filter the recipes and $slice to limit the results. It will give you everything you are looking for except the regex search, which is not possible inside $filter as of now (MongoDB 4.2 and later add a $regexMatch expression for this).
db.users.aggregate([{
$project: {
_id: 0,
recipes: {
$slice: [{
$filter: {
input: "$recipes",
as: "recipe",
"cond": {
$and: [{
$eq: ["$$recipe.title", "eggandpotatoes"]
}, {
$eq: ["$$recipe.dificult", 2]
}]
}
}
}, 2, 3]
}
}
}]);
If you need to match a regular expression against the title, use $elemMatch as below:
db.users.find({"recipes":{"$elemMatch":{"title": /.*egg.*/,"dificult": 2}}}, {"recipes.$":1})
One thing to mention here: if your recipes array contains something like this
"recipes" : [ { "title" : "egg", "dificult" : 2 }, { "title" : "eggand", "dificult" : 2 } ]
then the $ projection returns only the first matching array element; in the above case it will be
"recipes" : [ { "title" : "egg", "dificult" : 2 } ]
If you need both array elements, use @Sagar's answer with aggregation.
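The difference between two separate dotted conditions and $elemMatch can be traced in plain JavaScript. With separate conditions, each one may be satisfied by a different array element; $elemMatch requires a single element to satisfy both. The sketch below only mirrors that logic and is not MongoDB code; the variable names are hypothetical.

```javascript
const users = [
  { name: "x", recipes: [{ title: "eggs", dificult: 1 }, { title: "pizza", dificult: 2 }] },
  { name: "y", recipes: [{ title: "oil", dificult: 2 }, { title: "eggandpotatoes", dificult: 2 }] }
];

// Separate conditions: "some recipe matches the title" AND "some recipe has dificult 2".
const looseMatch = users.filter(user =>
  user.recipes.some(recipe => /egg/.test(recipe.title)) &&
  user.recipes.some(recipe => recipe.dificult === 2)
);

// $elemMatch-like: one and the same recipe must satisfy both conditions.
const strictMatch = users.filter(user =>
  user.recipes.some(recipe => /egg/.test(recipe.title) && recipe.dificult === 2)
);

// $filter-like: return every matching recipe, not just the first one per user.
const matchingRecipes = users.flatMap(user =>
  user.recipes.filter(recipe => /egg/.test(recipe.title) && recipe.dificult === 2)
);

console.log(looseMatch.map(user => user.name), strictMatch.map(user => user.name), matchingRecipes);
```

User "x" passes the loose check because "eggs" matches the title and "pizza" has dificult 2, which explains the unexpected result in the question.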

MongoDB 2.2 Aggregation Framework group by field name

Is it possible to group-by field name? Or do I need a different structure so I can group-by value?
I know we can use group by on values and we can unwind arrays, but is it possible to get total apples, pears and oranges owned by John amongst the three houses here without specifying "apples", "pears" and "oranges" explicitly as part of the query? (so NOT like this);
// total all the fruit John has at each house
db.houses.aggregate([
{
$group: {
_id: null,
"apples": { $sum: "$people.John.items.apples" },
"pears": { $sum: "$people.John.items.pears" },
"oranges": { $sum: "$people.John.items.oranges" },
}
},
])
In other words, can I group-by the first field-name under "items" and get the aggregate sum of apples:104, pears:202 and oranges:306, but also bananas, melons and anything else that might be there? Or do I need to restructure the data into an array of key/value pairs like categories?
db.createCollection("houses");
db.houses.remove();
db.houses.insert(
[
{
House: "birmingham",
categories : [
{
k : "location",
v : { d : "central" }
}
],
people: {
John: {
items: {
apples: 2,
pears: 1,
oranges: 3,
}
},
Dave: {
items: {
apples: 30,
pears: 20,
oranges: 10,
},
},
},
},
{
House: "London", categories: [{ k: "location", v: { d: "central" } }, { k: "type", v: { d: "rented" } }],
people: {
John: { items: { apples: 2, pears: 1, oranges: 3, } },
Dave: { items: { apples: 30, pears: 20, oranges: 10, }, },
},
},
{
House: "Cambridge", categories: [{ k: "type", v: { d: "rented" } }],
people: {
John: { items: { apples: 100, pears: 200, oranges: 300, } },
Dave: { items: { apples: 0.3, pears: 0.2, oranges: 0.1, }, },
},
},
]
);
Secondly, and more importantly, could I then also group by "house.categories.k" ? In other words, is it possible to find out how many "apples" "John" has in "rented" vs "owned" or "friends" houses (so group by "categories.k.type")?
Finally - if this is even possible, is it sensible? At first I thought it was quite useful to create dictionaries of nested objects using actual field names of the object, as it seemed a logical use of a document database, and it seemed to make the MR queries easier to write vs arrays, but now I'm starting to wonder if this is all a bad idea and having variable field names makes it very tricky/inefficient to write aggregation queries.
OK, so I think I have this partially solved. At least for the shape of data in the initial question.
// How many of each type of fruit does John have at each location
db.houses.aggregate([
{
$unwind: "$categories"
},
{
$match: { "categories.k": "location" }
},
{
$group: {
_id: "$categories.v.d",
"numberOf": { $sum: 1 },
"Total Apples": { $sum: "$people.John.items.apples" },
"Total Pears": { $sum: "$people.John.items.pears" },
}
},
])
which yields;
{
"result" : [
{
"_id" : "central",
"numberOf" : 2,
"Total Apples" : 4,
"Total Pears" : 2
}
],
"ok" : 1
}
Note that there's only "central", but if I had other "location"s in my DB I'd get a range of totals for each location. I wouldn't need the $unwind step if I had named properties instead of an array for "categories", but this is where I find the structure is at odds with itself. There are several keywords likely under "categories". The sample data shows "type" and "location" but there could be around 10 of these categorizations all with different values. So if I used named fields;
"categories": {
location: "london",
type: "owned",
}
...the problem I then have is indexing. I can't afford to simply index "location" since those are user-defined categories, and if 10,000 users choose 10,000 different ways of categorizing their houses I'd need 10,000 indexes, one for each field. But by making it an array I only need one on the array field itself. The downside is the $unwind step. I ran into this before with MapReduce. The last thing you want to be doing is a ForEach loop in JavaScript to cycle an array if you can help it. What you really want is to filter out the fields by name because it's much quicker.
Now this is all well and good where I already know what fruit I'm looking for, but if I don't, it's much harder. I can't (as far as I can see) $unwind or otherwise ForEach "people.John.items" here. If I could, I'd be overjoyed. So since the names of fruit are again user-defined, it looks like I need to convert them to an array as well, like this;
{
"people" : {
"John" : {
"items" : [
{ k:"apples", v:100 },
{ k:"pears", v:200 },
{ k:"oranges", v:300 },
]
},
}
}
So that now allows me get the fruit (where I don't know which fruit to look for) totalled, again by location;
db.houses.aggregate([
{
$unwind: "$categories"
},
{
$match: { "categories.k": "location" }
},
{
$unwind: "$people.John.items"
},
{
$group: { // compound key - thanks to Jenna
_id: { fruit:"$people.John.items.k", location:"$categories.v.d" },
"numberOf": { $sum: 1 },
"Total Fruit": { $sum: "$people.John.items.v" },
}
},
])
So now I'm doing TWO $unwinds. If you're thinking that looks grotesquely inefficient, you'd be right. If I have just 10,000 house records, with 10 categories each, and 10 types of fruit, this query takes half a minute to run.
OK, so I can see that moving the $match before the $unwind improves things significantly but then it's the wrong output. I don't want an entry for every category, I want to filter out just the "location" categories.
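The reshaping described above, from named fields to an array of k/v pairs, and the resulting "group by field name" can be sketched in plain JavaScript. This is not MongoDB code; toKeyValueArray is a hypothetical helper, and the sample data keeps only John's items from the question. Later MongoDB releases provide an $objectToArray operator that performs a similar conversion inside a pipeline; whether it is available depends on the server version.

```javascript
// John's items from the three houses in the question.
const houses = [
  { House: "birmingham", people: { John: { items: { apples: 2, pears: 1, oranges: 3 } } } },
  { House: "London", people: { John: { items: { apples: 2, pears: 1, oranges: 3 } } } },
  { House: "Cambridge", people: { John: { items: { apples: 100, pears: 200, oranges: 300 } } } }
];

// Hypothetical helper: turn { apples: 2, pears: 1 } into [{ k: "apples", v: 2 }, { k: "pears", v: 1 }].
function toKeyValueArray(items) {
  return Object.entries(items).map(([k, v]) => ({ k, v }));
}

// Group by the field name without listing the fruit names anywhere.
const totals = {};
for (const house of houses) {
  for (const { k, v } of toKeyValueArray(house.people.John.items)) {
    totals[k] = (totals[k] || 0) + v;
  }
}

console.log(totals);
```

The totals agree with the figures quoted in the question: 104 apples, 202 pears and 306 oranges.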
I would have made this a comment, but it's easier to format in a response text box.
{ _id: 1,
house: "New York",
people: {
John: {
items: {apples: 1, oranges:2}
},
Dave: {
items: {apples: 2, oranges: 1}
}
}
}
{ _id: 2,
house: "London",
people: {
John: {
items: {apples: 3, oranges:2}
},
Dave: {
items: {apples: 1, oranges:3}
}
}
}
Just to make sure I understand your question, is this what you're trying to accomplish?
{location: "New York", johnFruit:3}
{location: "London", johnFruit: 5}
Since categories is not nested under house, you can't group by "house.categories.k", but you can use a compound key for the _id of $group to get this result:
{ $group: { _id: {house: "$House", category: "$categories.k"} } }
Although "k" doesn't contain the information that you're presumably trying to group by. And as for "categories.k.type", type is the value of k, so you can't use this syntax. You would have to group by "categories.v.d".
It may be possible with your current schema to accomplish this aggregation using $unwind, $project, possibly $match, and finally $group, but the command won't be pretty. If possible, I would highly recommend restructuring your data to make this aggregation much simpler. If you would like some help with schema, please let us know.
I'm not sure if this is a possible solution, but what if you begin the aggregation process by determining the number of different locations using distinct(), and run separate aggregation commands for each location? distinct() may not be efficient, but every subsequent aggregation will be able to use $match, and therefore, the index on categories. You could use the same logic to count the fruit for "categories.type".
{
"_id" : 1,
"house" : "New York",
"people" : {
"John" : [{"k" : "apples","v" : 1},{"k" : "oranges","v" : 2}],
"Dave" : [{"k" : "apples","v" : 2},{"k" : "oranges","v" : 1}]
},
"categories" : [{"location" : "central"},{"type" : "rented"}]
}
{
"_id" : 2,
"house" : "London",
"people" : {
"John" : [{"k" : "apples","v" : 3},{"k" : "oranges","v" : 2}],
"Dave" : [{"k" : "apples","v" : 3},{"k" : "oranges","v" : 1}]
},
"categories" : [{"location" : "suburb"},{"type" : "rented"}]
}
{
"_id" : 3,
"house" : "London",
"people" : {
"John" : [{"k" : "apples","v" : 0},{"k" : "oranges","v" : 1}],
"Dave" : [{"k" : "apples","v" : 2},{"k" : "oranges","v" : 4}]
},
"categories" : [{"location" : "central"},{"type" : "rented"}]
}
Run distinct() and iterate through the results by running aggregate() commands for each unique value of "categories.location":
db.agg.distinct("categories.location")
[ "central", "suburb" ]
db.agg.aggregate(
{$match: {categories: {location:"central"}}}, //the index entry is on the entire
{$unwind: "$people.John"}, //document {location:"central"}, so
{$group:{ //use this syntax to use the index
_id:"$people.John.k",
"numberOf": { $sum: 1 },
"Total Fruit": { $sum: "$people.John.v"}
}
}
)
{
"result" : [
{
"_id" : "oranges",
"numberOf" : 2,
"Total Fruit" : 3
},
{
"_id" : "apples",
"numberOf" : 2,
"Total Fruit" : 1
}
],
"ok" : 1
}
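The distinct-then-aggregate idea can be traced in plain JavaScript against the three sample documents above. This is a sketch of the logic only, not MongoDB code; johnFruitAt is a hypothetical helper, and Dave's items are omitted because the aggregation does not use them.

```javascript
const docs = [
  { _id: 1, people: { John: [{ k: "apples", v: 1 }, { k: "oranges", v: 2 }] },
    categories: [{ location: "central" }, { type: "rented" }] },
  { _id: 2, people: { John: [{ k: "apples", v: 3 }, { k: "oranges", v: 2 }] },
    categories: [{ location: "suburb" }, { type: "rented" }] },
  { _id: 3, people: { John: [{ k: "apples", v: 0 }, { k: "oranges", v: 1 }] },
    categories: [{ location: "central" }, { type: "rented" }] }
];

// Like distinct("categories.location"): the unique location values.
const locations = [...new Set(
  docs.flatMap(doc => doc.categories.map(category => category.location))
      .filter(location => location !== undefined)
)];

// Hypothetical helper: like $match on one location, $unwind on John's items, $group by fruit.
function johnFruitAt(location) {
  const totals = {};
  docs
    .filter(doc => doc.categories.some(category => category.location === location))
    .flatMap(doc => doc.people.John)
    .forEach(({ k, v }) => {
      const entry = totals[k] || (totals[k] = { numberOf: 0, totalFruit: 0 });
      entry.numberOf += 1;
      entry.totalFruit += v;
    });
  return totals;
}

for (const location of locations) {
  console.log(location, johnFruitAt(location));
}
```

For "central" this reproduces the totals shown in the shell output above.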