How do i find the total number of subjectsthat has no prerequisites using agregation? - mongodb

I have tried several codes but it didn't work.
Example from the database,
one has a prerequisite and one does not have prerequisites and I would like to find the total number of the subject with no prerequisites :
db.Subject.insert(
{
"_id":ObjectId(),
"subject":{
"subCode":"CSCI321",
"subTitle":"Final Year Project",
"credit":6,
"type":"Core",
"assessments": [
{ "assessNum": 1,
"weight":30,
"assessType":"Presentation",
"description":"Prototype demonstration" },
{ "assignNum": 2,
"weight":70,
"assessType":"Implementation and Presentation",
"description":"Final product Presentation and assessment of product implementation by panel of project supervisors" }
]
}
}
)
db.Subject.insert(
{
"_id":ObjectId(),
"subject":{
"subCode":"CSCI203",
"subTitle":"Algorithm and Data Structures",
"credit":3,
"type":"Core",
"prerequisite": ["csci103"]
}})
one of the few codes that I tried using :
db.Subject.aggregate({$group:{"prerequisite":{"$exists": null}, count:{$sum:1}}});
Results :
_getErrorWithCode#src/mongo/shell/utils.js:25:13
doassert#src/mongo/shell/assert.js:18:14
_assertCommandWorked#src/mongo/shell/assert.js:534:17
assert.commandWorked#src/mongo/shell/assert.js:618:16
DB.prototype._runAggregate#src/mongo/shell/db.js:260:9
DBCollection.prototype.aggregate#src/mongo/shell/collection.js:1062:12
#(shell):1:1

You can use $match to eliminate unwanted documents and $group to calculate sum
db.collection.aggregate([
{
$match: {
"subject.prerequisite": {
"$exists": false
}
}
},
{
$group: {
_id: null,
total: {
$sum: 1
}
}
}
])
Working Mongo playground

This can be achieved within a single aggregation pipeline stage i.e. the $group step where you can use the BSON Types comparison order to aggregate the
documents where the 'subjects.prerequisites' field exists and has at least an element. The condition can be used as the group by key i.e. the _id field
in $group.
Consider running the following aggregation pipeline to get the desired results:
db.Subject.aggregate([
{ $group: {
_id: {
$cond: [
{
$or: [
{ $lte: ['$subject.prerequisite', null] },
{
$eq: [
{ $size: { $ifNull: ['$subject.prerequisite', [] ] } },
0
]
}
]
},
'noPrerequisite',
'havePrerequisite'
]
},
count: { $sum: 1 }
} }
])
The first condition in the OR simply returns true if a document does not have the embedded prerequisites field and the other satisfies these set of conditions:
if length of ( prerequisites || [] ) is zero
In the above, $cond takes a logical condition as its first argument (if) and then returns the second argument where the evaluation is true (then) or the third argument where false (else). When used as an expression in the _id field for $group, it groups all the documents into either true/false which is conditionally projected as "noPrerequisite" (true) OR "havePrerequisite" (false) in the group key.
The results will contain both counts for documents where the prerequisite field exists and for those without the field OR it has an empty array.

Related

MongoDB - count by field, and sort by count

I am new to MongoDB, and new to making more than super basic queries and i didn't succeed to create a query that does as follows:
I have such collection, each document represents one "use" of a benefit (e.g first row states the benefit "123" was used once):
[
{
"id" : "1111",
"benefit_id":"123"
},
{
"id":"2222",
"benefit_id":"456"
},
{
"id":"3333",
"benefit_id":"456"
},
{
"id":"4444",
"benefit_id":"789"
}
]
I need to create q query that output an array. at the top is the most top used benefit and how many times is was used.
for the above example the query should output:
[
{
"benefit_id":"456",
"cnt":2
},
{
"benefit_id":"123",
"cnt": 1
},
{
"benefit_id":"789",
"cnt":1
}
]
I have tried to work with the documentation and with $sortByCount but with no success.
$group
$group by benefit_id and get count using $sum
$sort by count descending order
db.collection.aggregate([
{
$group: {
_id: "$benefit_id",
count: { $sum: 1 }
}
},
{ $sort: { count: -1 } }
])
Playground
$sortByCount
Same operation using $sortByCount operator
db.collection.aggregate([
{ $sortByCount: "$benefit_id" }
])
Playground

mongo aggregation - number of documents where field in one array is also in another one

I have a Movies collection
...
{
...
"cast":[ "First Actor", "Second Actor" ],
"directors":[ "First Director", "Second Director" ]
},
{
...
"cast": [ "Actor Director", "First Actor" ],
"directors": [ "Actor Director", "Firt Director" ]
}
...
Using aggregation framework I need to get number of documents where at least one value from directors array is also in a cast array. How could I achieve it?
You can use $setIntersection to find common entries in both arrays, then filter documents by $size of the result gt than 0 (means that at least one element is common to arrays), and finally use $count stage to count documents that match this condition.
-- EDIT : Add $addFields stage in case of no array present for cast or directors
In case of any document that doesn't contain cast or directors array, you will get an error for size waiting for an array and getting a null value.
In order to avoid this, you need to add an $addField stage to define empty array instead of null, for cast and directors.
Here's the query :
db.collection.aggregate([
{
$addFields: {
directors: {
$cond: {
if: {
$isArray: "$directors"
},
then: "$directors",
else: []
}
},
cast: {
$cond: {
if: {
$isArray: "$cast"
},
then: "$cast",
else: []
}
}
}
},
{
$match: {
$expr: {
$gt: [
{
$size: {
$setIntersection: [
"$cast",
"$directors"
]
}
},
0
]
}
}
},
{
$count: "have_common_value"
}
])
You can test it here

MongoDB aggregate query return document based on match query and priority values

In a mongodb collection , i have following documents :
{"id":"1234","name":"John","stateCode":"CA"}
{"id":"1234","name":"Smith","stateCode":"CA"}
{"id":"1234","name":"Tony","stateCode":"GA"}
{"id":"3323", "name":"Neo","stateCode":"OH"}
{"id":"3323", "name":"Sam","stateCode":"US"}
{"id":"4343","name":"Bruce","stateCode":"NV"}
I am trying to write a mongo aggregate query which do following things:
match based on id field
Give more priority to document having values other than "NV" or "GA" in stateCode field.
If all the document have values either "NV" or "GA" then ignore the priority.
If any of the document have stateCode other than "NV" or "GA" , then return those document.
Example 1:
id = "1234"
then return
{"id":"1234","name":"John","stateCode":"CA"}
{"id":"1234","name":"Smith","stateCode":"CA"}
Example 2:
id = "4343"
then return
{"id":"4343","name":"Bruce","stateCode":"NV"}
Could you please help with a query to achieve this.
I tried with a query , but i am stuck with error:
Failed to execute script.
Error: command failed: {
"ok" : 0,
"errmsg" : "input to $filter must be an array not string",
"code" : 28651,
"codeName" : "Location28651"
} : aggregate failed
Query :
db.getCollection('emp').aggregate([{$match:{
'id': "1234"
}
},
{
$project: {
"data": {
$filter: {
input: "$stateCode",
as: "data",
cond: { $ne: [ "$data", "GA" ],$ne: [ "$data", "NV" ] }
}
}
}
}
])
I actually recommend you split this into 2 queries, first try to find documents with a different status code and if that fails then retrieve the rest.
With that said here is a working pipeline that does it in one go, Due to the fact we cant know in advance whether the condition is true or not we need to iterate all the documents who match the id, this fact makes it VERY inefficient in the case the id is shared by many documents, if this is not possible then using this pipeline is fine.
db.getCollection('emp').aggregate([
{
$match: {
'id': "1234"
}
},
{ //we have to group so we can check
$group: {
_id: null,
docs: {$push: "$$ROOT"}
}
},
{
$addFields: {
highPriorityDocs: {
$filter: {
input: "$docs",
as: "doc",
cond: {$and: [{$ne: ["$$doc.stateCode", "NV"]}, {$ne: ["$$doc.stateCode", "GA"]}]}
}
}
}
},
{
$project: {
finalDocs: {
$cond: [ // if size of high priority docs gt 0 return them.
{$gt: [{$ize: "$highPriorityDocs"}, 0]},
"$highPriorityDocs",
"$docs"
]
}
}
},
{
$unwind: "$finalDocs"
},
{
$replaceRoot: {newRoot: "$finalDocs"}
}
])
The last two stages are just to restore the original structure, you can drop them if you don't care about it.

Mongo Query to return common values in array

I need a Mongo Query to return me common values present in an array.
So if there are 4 documents in match, then the values are returned if those are present in in all the 4 documents
Suppose I have the below documents in my db
Mongo Documents
{
"id":"0",
"merchants":["1","2"]
}
{
"id":"1",
"merchants":["1","2","4"]
}
{
"id":"2",
"merchants":["4","5"]
}
Input : List of id
(i) Input with id "0" and "1"
Then it should return me merchants:["1","2"] as both are present in documents with id "0" & id "1"
(ii) Input with id "1" and "2"
Then it should return me merchants:["4"] as it is common and present in both documents with id "1" & id "2"
(iii) Input with id "0" and "2"
Should return empty merchants:[] as no common merchants between these 2 documents
You can try below aggregation.
db.collection.aggregate(
{$match:{id: {$in: ["1", "2"]}}},
{$group:{_id:null, first:{$first:"$merchants"}, second:{$last:"$merchants"}}},
{$project: {commonToBoth: {$setIntersection: ["$first", "$second"]}, _id: 0 } }
)
Say you have a function query that does the required DB query for you, and you'll call that function with idsToMatch which is an array containing all the elements you want to match. I have used JS here as the driver language, replace it with whatever you are using.
The following code is dynamic, will work for any number of ids you give as input:
const query = (idsToMatch) => {
db.collectionName.aggregate([
{ $match: { id: {$in: idsToMatch} } },
{ $unwind: "$merchants" },
{ $group: { _id: { id: "$id", data: "$merchants" } } },
{ $group: { _id: "$_id.data", count: {$sum: 1} } },
{ $match: { count: { $gte: idsToMatch.length } } },
{ $group: { _id: 0, result: {$push: "$_id" } } },
{ $project: { _id: 0, result: "$result" } }
])
The first $group statement is to make sure you don't have any
repetitions in any of your merchants attribute in a document. If
you are certain that in your individual documents you won't have any
repeated value for merchants, you need not include it.
The real work happens only upto the 2nd $match phase. The last two
phases ($group and $project) are only to prettify the result,
you may choose not use them, and instead use the language of your
choice to transform it in the form you want
Assuming you want to reduce the phases as per the points given above, the actual code will reduce to:
aggregate([
{ $match: { id: {$in: idsToMatch} } },
{ $unwind: "$merchants" },
{ $group: { _id: "merchants", count: {$sum: 1} } },
{ $match: { count: { $gte: idsToMatch.length } } }
])
Your required values will be at the _id attribute of each element of the result array.
The answer provided by #jgr0 is correct to some extent. The only mistake is the intermediate match operation
(i) So if input ids are "1" & "0" then the query becomes
aggregate([
{"$match":{"id":{"$in":["1","0"]}}},
{"$unwind":"$merchants"},
{"$group":{"_id":"$merchants","count":{"$sum":1}}},
{"$match":{"count":{"$eq":2}}},
{"$group":{"_id":null,"merchants":{"$push":"$_id"}}},
{"$project":{"_id":0,"merchants":1}}
])
(ii) So if input ids are "1", "0" & "2" then the query becomes
aggregate([
{"$match":{"id":{"$in":["1","0", "2"]}}},
{"$unwind":"$merchants"},
{"$group":{"_id":"$merchants","count":{"$sum":1}}},
{"$match":{"count":{"$eq":3}}},
{"$group":{"_id":null,"merchants":{"$push":"$_id"}}},
{"$project":{"_id":0,"merchants":1}}
])
The intermediate match operation should be the count of ids in input. So in case (i) it is 2 and in case (2) it is 3.

mongodb aggregation framework group + project

I have the following issue:
this query return 1 result which is what I want:
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } } }])
{
"result" : [
{
"_id" : "b91e51e9-6317-4030-a9a6-e7f71d0f2161",
"version" : 1.2000000000000002
}
],
"ok" : 1
}
this query ( I just added projection so I can later query for the entire document) return multiple results. What am I doing wrong?
> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } }, $project: { _id : 1 } }])
{
"result" : [
{
"_id" : ObjectId("5139310a3899d457ee000003")
},
{
"_id" : ObjectId("513931053899d457ee000002")
},
{
"_id" : ObjectId("513930fd3899d457ee000001")
}
],
"ok" : 1
}
found the answer
1. first I need to get all the _ids
db.items.aggregate( [
{ '$match': { 'owner.id': '9e748c81-0f71-4eda-a710-576314ef3fa' } },
{ '$group': { _id: '$item.id', dbid: { $max: "$_id" } } }
]);
2. then i need to query the documents
db.items.find({ _id: { '$in': "IDs returned from aggregate" } });
which will look like this:
db.items.find({ _id: { '$in': [ '1', '2', '3' ] } });
( I know its late but still answering it so that other people don't have to go search for the right answer somewhere else )
See to the answer of Deka, this will do your job.
Not all accumulators are available in $project stage. We need to consider what we can do in project with respect to accumulators and what we can do in group. Let's take a look at this:
db.companies.aggregate([{
$match: {
funding_rounds: {
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
funding: {
$push: {
amount: "$funding_rounds.raised_amount",
year: "$funding_rounds.funded_year"
}
}
}
}, ]).pretty()
Where we're checking if any of the funding_rounds is not empty. Then it's unwind-ed to $sort and to later stages. We'll see one document for each element of the funding_rounds array for every company. So, the first thing we're going to do here is to $sort based on:
funding_rounds.funded_year
funding_rounds.funded_month
funding_rounds.funded_day
In the group stage by company name, the array is getting built using $push. $push is supposed to be part of a document specified as the value for a field we name in a group stage. We can push on any valid expression. In this case, we're pushing on documents to this array and for every document that we push it's being added to the end of the array that we're accumulating. In this case, we're pushing on documents that are built from the raised_amount and funded_year. So, the $group stage is a stream of documents that have an _id where we're specifying the company name.
Notice that $push is available in $group stages but not in $project stage. This is because $group stages are designed to take a sequence of documents and accumulate values based on that stream of documents.
$project on the other hand, works with one document at a time. So, we can calculate an average on an array within an individual document inside a project stage. But doing something like this where one at a time, we're seeing documents and for every document, it passes through the group stage pushing on a new value, well that's something that the $project stage is just not designed to do. For that type of operation we want to use $group.
Let's take a look at another example:
db.companies.aggregate([{
$match: {
funding_rounds: {
$exists: true,
$ne: []
}
}
}, {
$unwind: "$funding_rounds"
}, {
$sort: {
"funding_rounds.funded_year": 1,
"funding_rounds.funded_month": 1,
"funding_rounds.funded_day": 1
}
}, {
$group: {
_id: {
company: "$name"
},
first_round: {
$first: "$funding_rounds"
},
last_round: {
$last: "$funding_rounds"
},
num_rounds: {
$sum: 1
},
total_raised: {
$sum: "$funding_rounds.raised_amount"
}
}
}, {
$project: {
_id: 0,
company: "$_id.company",
first_round: {
amount: "$first_round.raised_amount",
article: "$first_round.source_url",
year: "$first_round.funded_year"
},
last_round: {
amount: "$last_round.raised_amount",
article: "$last_round.source_url",
year: "$last_round.funded_year"
},
num_rounds: 1,
total_raised: 1,
}
}, {
$sort: {
total_raised: -1
}
}]).pretty()
In the $group stage, we're using $first and $last accumulators. Right, again we can see that as with $push - we can't use $first and $last in project stages. Because again, project stages are not designed to accumulate values based on multiple documents. Rather they're designed to reshape documents one at a time. Total number of rounds is calculated using the $sum operator. The value 1 simply counts the number of documents passed through that group together with each document that matches or is grouped under a given _id value. The project may seem complex, but it's just making the output pretty. It's just that it's including num_rounds and total_raised from the previous document.