Getting the name of the field with maximum count in mongodb - mongodb

I am new to MongoDB and want to get the name of the field (spare part type) which has the maximum count. A sample document in my collection (the original collection has 50 documents) is given below:
[
{
"Vehicle": {
"licensePlateNo": "111A",
"vehicletype": "Car",
"model": "Nissan Sunny",
"VehicleCategory": [
{
"name": "Passenger"
}
],
"SparePart": [
{
"sparePartID": 4,
"Type": "Wheel",
"Price": 10000,
"Supplier": [
{
"supplierNo": 10,
"name": "Saman",
"contactNo": 112412634
}
]
}
],
"Employee": [
{
"employeeNo": 3,
"job": "Painter",
"jobCategory": "",
"salary": 100000
}
]
}
}
]
How can I write a query to obtain the name of the spare part with the highest count?

Use the aggregation framework for this type of query. In particular you'd need to run an aggregation operation where the pipeline consists of the following stages (in order):
$unwind - You need this as the first pipeline step to flatten the SparePart array, so that each array element becomes its own denormalised document further down the pipeline. Without this you won't get the desired result, since the data would still be in array format and the accumulator operators in the following stage work on single documents to aggregate the counts.
$group - This step calculates the counts for you, with documents grouped by the Type field. The accumulator operator $sum returns the total number of documents within each group.
$sort - Order the results of the $group stage by the count field in descending order, so that the document with the highest count comes first.
$limit - This keeps only the top document.
Now, assembling the above together you should run the following pipeline to get the desired result:
db.AutoSmart.aggregate([
{ "$unwind": "$Vehicle.SparePart" },
{
"$group": {
"_id": "$Vehicle.SparePart.Type",
"count": { "$sum": 1 }
}
},
{ "$sort": { "count": -1 } },
{ "$limit": 1 }
])
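To see what each stage contributes, the same logic can be mimicked in plain JavaScript on an in-memory array. This is only a sketch of the pipeline's behaviour, not MongoDB code; the second and third vehicles and the helper name topSparePartType are made up for illustration.

```javascript
// Hypothetical sample shaped like the question's documents (trimmed to the relevant fields).
const vehicles = [
  { Vehicle: { licensePlateNo: "111A", SparePart: [{ Type: "Wheel" }, { Type: "Mirror" }] } },
  { Vehicle: { licensePlateNo: "222B", SparePart: [{ Type: "Wheel" }] } },
  { Vehicle: { licensePlateNo: "333C", SparePart: [{ Type: "Battery" }, { Type: "Wheel" }] } },
];

function topSparePartType(docs) {
  // $unwind: one record per element of Vehicle.SparePart
  const unwound = docs.flatMap((d) => (d.Vehicle.SparePart || []).map((part) => part.Type));

  // $group with { $sum: 1 }: count the records per Type
  const counts = new Map();
  for (const type of unwound) counts.set(type, (counts.get(type) || 0) + 1);

  // $sort by count descending, then $limit 1
  const sorted = [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count);
  return sorted[0] || null;
}

console.log(topSparePartType(vehicles)); // { _id: 'Wheel', count: 3 }
```

If two types tie for the highest count, $limit returns only one of them; adding a second sort key such as _id makes that choice deterministic.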

Suppose we want to get the maximum age of the users in the DB:
db.collection.find().sort({age:-1}).limit(1) // for MAX
You can then inspect the returned document.

Related

Efficiently find the most recent filtered document in MongoDB collection using datetime field

I have a large collection of documents with datetime fields in them, and I need to retrieve the most recent document for any given queried list.
Sample data:
[
{"_id": "42.abc",
"ts_utc": "2019-05-27T23:43:16.963Z"},
{"_id": "42.def",
"ts_utc": "2019-05-27T23:43:17.055Z"},
{"_id": "69.abc",
"ts_utc": "2019-05-27T23:43:17.147Z"},
{"_id": "69.def",
"ts_utc": "2019-05-27T23:44:02.427Z"}
]
Essentially, I need to get the most recent record for the "42" group as well as the most recent record for the "69" group. Using the sample data above, the desired result for the "42" group would be document "42.def".
My current solution is to query each group one at a time (looping with PyMongo), sort by the ts_utc field, and limit it to one, but this is really slow.
// Requires official MongoShell 3.6+
db = db.getSiblingDB("someDB");
db.getCollection("collectionName").find(
{
"_id" : /^42\..*/
}
).sort(
{
"ts_utc" : -1.0
}
).limit(1);
Is there a faster way to get the results I'm after?
Assuming all your documents have the format displayed above, you can split the _id into two parts (using the dot character) and use aggregation to find the maximum ts_utc for each first (numeric) element of the resulting array.
That way you can do it in one shot, instead of iterating over each group.
db.foo.aggregate([
{ $project: { id_parts : { $split: ["$_id", "."] }, ts_utc : 1 }},
{ $group: {"_id" : { $arrayElemAt: [ "$id_parts", 0 ] }, max : {$max: "$ts_utc"}}}
])
As @danh mentioned in the comment, the best approach is probably to add an auxiliary field that indicates the grouping. You can then index the auxiliary field to improve performance.
Here is an ad-hoc way to derive the field and get the latest result per grouping:
db.collection.aggregate([
{
"$addFields": {
"group": {
"$arrayElemAt": [
{
"$split": [
"$_id",
"."
]
},
0
]
}
}
},
{
$sort: {
ts_utc: -1
}
},
{
"$group": {
"_id": "$group",
"doc": {
"$first": "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
}
])
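The grouping logic of both pipelines can be checked in plain JavaScript against the sample data. This is a sketch only: the function name latestPerGroup is hypothetical, and it assumes every ts_utc is an ISO 8601 string in the same UTC format, so that string comparison matches chronological order.

```javascript
const docs = [
  { _id: "42.abc", ts_utc: "2019-05-27T23:43:16.963Z" },
  { _id: "42.def", ts_utc: "2019-05-27T23:43:17.055Z" },
  { _id: "69.abc", ts_utc: "2019-05-27T23:43:17.147Z" },
  { _id: "69.def", ts_utc: "2019-05-27T23:44:02.427Z" },
];

function latestPerGroup(records) {
  const latest = new Map();
  for (const doc of records) {
    // Same derivation as $split + $arrayElemAt: the part of _id before the first dot
    const group = doc._id.split(".")[0];
    const current = latest.get(group);
    // Keep the document with the greatest timestamp seen so far for this group
    if (!current || doc.ts_utc > current.ts_utc) latest.set(group, doc);
  }
  return latest;
}

const result = latestPerGroup(docs);
console.log(result.get("42")._id); // 42.def
console.log(result.get("69")._id); // 69.def
```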
Here is the Mongo playground for your reference.

MongoDB 4.1 TotalRecords and Data in Aggregate

I'm facing a problem because I'm new to Mongo, but I want to solve it.
I have different collections which I aggregate with lookups, which works perfectly.
But now I want to have the total number of records in the header of my result.
My first problem is that my actor relation is an array, and my second problem is that I don't know how to separate TotalCount and data from each other in the response.
The result should look like:
{
"totalRecords": 12,
"itemsPerPage": 10,
"docs": {
"_id": "7429437848adssk",
"title": "abc",
"actors": [
{"name": "Mr.x" },
{"name": "Mrs.Y"}
]
}
}
I solved my aggregation without the total count with the following stages:
unwind actors
lookup on actors
group with $first on first collection and $addToSet of the actors
The response without the count is as expected, but if I add a $count to the group it counts the actors: for 1 document with 2 actors it counts 2. I want a count of documents instead.
Could someone provide me with a simple working example on my problem?
You need to add these 3 steps at the end of your aggregation
{
$facet: {
totalRecords: [
{
$count: "totalRecords"
}
],
docs: [
{
$match: {}
}
]
}
},
{
$unwind: "$docs"
},
{
$addFields: {
totalRecords: {
$arrayElemAt: [
"$totalRecords.totalRecords",
0
]
}
}
}
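To make the effect of these three stages concrete, here is a plain-JavaScript imitation of what they do to the documents arriving from the earlier stages. It is only a sketch; the function name withTotalRecords and the sample titles are invented for illustration.

```javascript
function withTotalRecords(docs) {
  // $facet: run two sub-pipelines over the same input and return ONE document
  const faceted = {
    // $count emits no document at all when there is no input
    totalRecords: docs.length > 0 ? [{ totalRecords: docs.length }] : [],
    // { $match: {} } passes every document through unchanged
    docs: docs,
  };

  // $unwind "$docs": one output document per element of the docs array
  const unwound = faceted.docs.map((doc) => ({ totalRecords: faceted.totalRecords, docs: doc }));

  // $addFields: replace the one-element array with the number inside it
  return unwound.map((row) => ({ ...row, totalRecords: row.totalRecords[0].totalRecords }));
}

const out = withTotalRecords([{ title: "abc" }, { title: "def" }]);
console.log(out.length, out[0].totalRecords); // 2 2
```

Each output document carries the same totalRecords value next to one entry under docs. If a single document with the full docs array is preferred, leaving out the $unwind stage should keep the $facet output as one document.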
MongoPlayground

MongooseJS: Best approach for a derived/calculated value

I am creating a college football betting app for my family.
Here are my schemas:
const GameSchema = new mongoose.Schema({
home: {
type: String,
required: true
},
opponent: {
type: String,
required: true
},
homeScore: Number,
opponentScore: Number,
week:{
type: Number,
required: true
},
winner: String,
userPicks: [
{
user: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User'
},
chosenTeam: String
}
]
});
const UserSchema = new mongoose.Schema({
name: String
});
I need to be able to calculate each user's weekly score (i.e. the number of football games they predict correctly each week) and their cumulative score (i.e. the number of games each user predicts correctly overall).
I am still very new to MongoDB and Mongoose, so I am unsure how to handle the issue. Since the Game document will never grow beyond 200 records, I think both scores should be derived or calculated from the data stored in the database.
Here are the possible solutions that I have thought of so far:
Make both scores virtual attributes, not sure how this would work for the multiple users
Persist the attributes to the document, but use middleware to re-calculate the scores, when the results for the week's games are saved to the database.
Use a static method to calculate the scores.
Any advice would be appreciated.
You could use the aggregation framework for calculating the aggregates. This is a faster alternative to Map/Reduce for common aggregation operations.
In MongoDB, a pipeline consists of a series of special operators applied to a collection to process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. For more details, please consult the documentation.
Consider running the following pipeline to get the desired result:
var pipeline = [
{ "$unwind": "$userPicks" },
{
"$group": {
"_id": {
"week": "$week",
"user": "$userPicks.user"
},
"weeklyScore": {
"$sum": {
"$cond": [
{ "$eq": ["$userPicks.chosenTeam", "$winner"] },
1, 0
]
}
}
}
},
{
"$group": {
"_id": "$_id.user",
"weeklyScores": {
"$push": {
"week": "$_id.week",
"score": "$weeklyScore"
}
},
"totalScores": { "$sum": "$weeklyScore" }
}
}
];
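// Callback style as in the original answer; Mongoose 7+ dropped callbacks in favour of promises.
Game.aggregate(pipeline, function(err, results){
if (err) throw err; // stop early if the aggregation itself failed
User.populate(results, { "path": "_id" }, function(err, results) {
if (err) throw err;
console.log(JSON.stringify(results, undefined, 4));
});
});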
In the above pipeline, the first step is the $unwind operator
{ "$unwind": "$userPicks" }
which comes in quite handy when the data is stored as an array. When the unwind operator is applied on a list data field, it will generate a new record for each and every element of the list data field on which unwind is applied. It basically flattens the data.
This is a necessary operation for the next pipeline stage, the $group step, where you group the flattened documents by the fields "week" and "userPicks.user":
{
"$group": {
"_id": {
"week": "$week",
"user": "$userPicks.user"
},
"weeklyScore": {
"$sum": {
"$cond": [
{ "$eq": ["$userPicks.chosenTeam", "$winner"] },
1, 0
]
}
}
}
}
The $group pipeline operator is similar to SQL's GROUP BY clause. In SQL, GROUP BY is typically paired with aggregate functions; in the same way, $group in MongoDB is typically used with accumulator operators. You can read more about the accumulator operators in the MongoDB documentation.
In this $group operation, each user's weekly score (i.e. the number of football games they predict correctly each week) is calculated with the ternary operator $cond, which takes a logical condition as its first argument (if), then returns the second argument when the condition is true (then) or the third argument when it is false (else). This turns true/false results into 1 and 0 respectively, which are fed to $sum:
"$cond": [
{ "$eq": ["$userPicks.chosenTeam", "$winner"] },
1, 0
]
So, if within the document being processed the "$userPicks.chosenTeam" field is the same as the "$winner" field, the $cond operator feeds the value 1 to $sum; otherwise it feeds 0.
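The conditional sum can be pictured in plain JavaScript as follows. This sketch works on records that are already unwound; the sample picks, team names and the function name countCorrect are hypothetical.

```javascript
// One record per (game, pick), as $unwind would produce
const unwoundPicks = [
  { week: 1, winner: "Alabama", userPicks: { user: "u1", chosenTeam: "Alabama" } },
  { week: 1, winner: "Alabama", userPicks: { user: "u2", chosenTeam: "Auburn" } },
  { week: 1, winner: "Georgia", userPicks: { user: "u1", chosenTeam: "Georgia" } },
];

function countCorrect(records, user) {
  return records
    .filter((r) => r.userPicks.user === user)
    // $cond: 1 when the pick equals the winner, otherwise 0; $sum adds them up
    .reduce((sum, r) => sum + (r.userPicks.chosenTeam === r.winner ? 1 : 0), 0);
}

console.log(countCorrect(unwoundPicks, "u1")); // 2
console.log(countCorrect(unwoundPicks, "u2")); // 0
```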
The second group pipeline:
{
"$group": {
"_id": "$_id.user",
"weeklyScores": {
"$push": {
"week": "$_id.week",
"score": "$weeklyScore"
}
},
"totalScores": { "$sum": "$weeklyScore" }
}
}
takes the documents from the previous pipeline and groups them further by the user field and calculates another aggregate i.e. the total score, using the $sum accumulator operator. Within the same pipeline, you can aggregate a list of the weekly scores by using the $push operator which returns an array of expression values for each group.
One thing to note here is when executing a pipeline, MongoDB pipes operators into each other. "Pipe" here takes the Linux meaning: the output of an operator becomes the input of the following operator. The result of each operator is a new collection of documents. So Mongo executes the above pipeline as follows:
collection | $unwind | $group | $group => result
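The same idea can be expressed in plain JavaScript: each stage is a function from an array of documents to a new array, and the pipeline feeds each output into the next stage. This is a conceptual sketch, not how MongoDB is implemented; runPipeline, unwindPicks, groupByWeekAndUser and groupByUser are hypothetical names, as is the sample data.

```javascript
// Feed the output of each stage into the next one
const runPipeline = (docs, stages) => stages.reduce((input, stage) => stage(input), docs);

// $unwind: one document per element of userPicks
const unwindPicks = (docs) =>
  docs.flatMap((d) => d.userPicks.map((pick) => ({ ...d, userPicks: pick })));

// First $group: correct picks per (week, user)
const groupByWeekAndUser = (docs) => {
  const groups = new Map();
  for (const d of docs) {
    const key = `${d.week}|${d.userPicks.user}`;
    const g = groups.get(key) || { _id: { week: d.week, user: d.userPicks.user }, weeklyScore: 0 };
    g.weeklyScore += d.userPicks.chosenTeam === d.winner ? 1 : 0;
    groups.set(key, g);
  }
  return [...groups.values()];
};

// Second $group: collect the weekly scores and the total per user
const groupByUser = (docs) => {
  const groups = new Map();
  for (const d of docs) {
    const g = groups.get(d._id.user) || { _id: d._id.user, weeklyScores: [], totalScores: 0 };
    g.weeklyScores.push({ week: d._id.week, score: d.weeklyScore });
    g.totalScores += d.weeklyScore;
    groups.set(d._id.user, g);
  }
  return [...groups.values()];
};

const games = [
  { week: 1, winner: "A", userPicks: [{ user: "u1", chosenTeam: "A" }, { user: "u2", chosenTeam: "B" }] },
  { week: 2, winner: "C", userPicks: [{ user: "u1", chosenTeam: "C" }, { user: "u2", chosenTeam: "C" }] },
];

const scores = runPipeline(games, [unwindPicks, groupByWeekAndUser, groupByUser]);
console.log(JSON.stringify(scores, undefined, 2));
```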
Now, when you run the aggregation pipeline in Mongoose, the results will have an _id key which is the user id and you need to populate the results on this field i.e. Mongoose will perform a "join" on the users collection and return the documents with the user schema in the results.
As a side note, to help with understanding the pipeline or to debug it should you get unexpected results, run the aggregation with just the first pipeline operator. For example, run the aggregation in mongo shell as:
db.games.aggregate([
{ "$unwind": "$userPicks" }
])
Check the result to see if the userPicks array is deconstructed properly. If that gives the expected result, add the next:
db.games.aggregate([
{ "$unwind": "$userPicks" },
{
"$group": {
"_id": {
"week": "$week",
"user": "$userPicks.user"
},
"weeklyScore": {
"$sum": {
"$cond": [
{ "$eq": ["$userPicks.chosenTeam", "$winner"] },
1, 0
]
}
}
}
}
])
Repeat the steps till you get to the final pipeline step.

MongoDB find a given field and get average value

I'd like to get the average in a collection for a given property value. What am I doing wrong?
[{name:'Bob',city:'Barcelona',trips: 1 },
{name:'Bruce',city:'Barcelona',trips: 5 },
{name:'Bruno',city:'València',trips: 2 },
{name:'Bart',city:'Barcelona',trips: 3 }]
db.x.aggregate([{$group:{city:'Barcelona', $avg:"$trips"}}]);
You need to filter the documents using the $match operator, i.e. add a stage before the $group operator which filters the documents in the collection by the given city value.
In the following $group stage, you can then use a null key (as denoted by the _id field) to group all the documents from the previous stage. The $avg accumulator must be assigned to an output field (called avgTrips here) rather than used as a field name itself:
db.x.aggregate([
{ "$match": { "city": "Barcelona" } },
{ "$group": { "_id": null, "avgTrips": { "$avg": "$trips" } } }
]);
Another approach (not as optimal as the above) would be to group all the documents in the collection by the city key and then filter afterwards:
db.x.aggregate([
{ "$group": { "_id": "$city", "avgTrips": { "$avg": "$trips" } } },
{ "$match": { "_id": "Barcelona" } }
]);
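As a quick sanity check of the number to expect, the $match-then-$group approach corresponds to the following plain JavaScript on the question's sample data. It is a sketch only; averageTrips is a hypothetical helper.

```javascript
const people = [
  { name: "Bob", city: "Barcelona", trips: 1 },
  { name: "Bruce", city: "Barcelona", trips: 5 },
  { name: "Bruno", city: "València", trips: 2 },
  { name: "Bart", city: "Barcelona", trips: 3 },
];

function averageTrips(docs, city) {
  // $match: keep only the requested city
  const matched = docs.filter((d) => d.city === city);
  // With no matching documents $group emits nothing; this sketch models that as null
  if (matched.length === 0) return null;
  // $group with _id: null and $avg: a single average over everything that matched
  const total = matched.reduce((sum, d) => sum + d.trips, 0);
  return total / matched.length;
}

console.log(averageTrips(people, "Barcelona")); // 3
```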

MongoDB Nested Array Intersection Query

Thank you in advance for your help.
I have a mongoDB database structured like this:
{
'_id' : objectID(...),
'userID' : id,
'movies' : [{
'movieID' : movieID,
'rating' : rating
}]
}
My question is:
I want to search for a specific user, for example the one with 'userID' : 3, and get all his movies. Then I want to get all the other users that have at least 15 movies with the same 'movieID'. From that group I want to select only the users that share those 15 movies and also have one extra 'movieID' that I choose.
I already tried aggregation, but failed. If I do single queries, such as getting all the movies of one user and then cycling through every other user's movies and comparing them, it takes a lot of time.
Any ideas?
Thank you
There are a couple of ways to do this using the aggregation framework.
Just a simple set of data for example:
{
"_id" : ObjectId("538181738d6bd23253654690"),
"movies": [
{ "_id": 1, "rating": 5 },
{ "_id": 2, "rating": 6 },
{ "_id": 3, "rating": 7 }
]
},
{
"_id" : ObjectId("538181738d6bd23253654691"),
"movies": [
{ "_id": 1, "rating": 5 },
{ "_id": 4, "rating": 6 },
{ "_id": 2, "rating": 7 }
]
},
{
"_id" : ObjectId("538181738d6bd23253654692"),
"movies": [
{ "_id": 2, "rating": 5 },
{ "_id": 5, "rating": 6 },
{ "_id": 6, "rating": 7 }
]
}
Using the first "user" as an example, now you want to find if any of the other two users have at least two of the same movies.
For MongoDB 2.6 and upwards you can simply use the $setIntersection operator along with the $size operator:
db.users.aggregate([
// Match the possible documents to reduce the working set
{ "$match": {
"_id": { "$ne": ObjectId("538181738d6bd23253654690") },
"movies._id": { "$in": [ 1, 2, 3 ] },
"$and": [
{ "movies": { "$not": { "$size": 1 } } }
]
}},
// Project a copy of the document if you want to keep more than `_id`
{ "$project": {
"_id": {
"_id": "$_id",
"movies": "$movies"
},
"movies": 1,
}},
// Unwind the array
{ "$unwind": "$movies" },
// Build the array back with just `_id` values
{ "$group": {
"_id": "$_id",
"movies": { "$push": "$movies._id" }
}},
// Find the "set intersection" of the two arrays
{ "$project": {
"movies": {
"$size": {
"$setIntersection": [
[ 1, 2, 3 ],
"$movies"
]
}
}
}},
// Filter the results to those that actually match
{ "$match": { "movies": { "$gte": 2 } } }
])
This is still possible in earlier versions of MongoDB that do not have those operators, just using a few more steps:
db.users.aggregate([
// Match the possible documents to reduce the working set
{ "$match": {
"_id": { "$ne": ObjectId("538181738d6bd23253654690") },
"movies._id": { "$in": [ 1, 2, 3 ] },
"$and": [
{ "movies": { "$not": { "$size": 1 } } }
]
}},
// Project a copy of the document along with the "set" to match
{ "$project": {
"_id": {
"_id": "$_id",
"movies": "$movies"
},
"movies": 1,
"set": { "$cond": [ 1, [ 1, 2, 3 ], 0 ] }
}},
// Unwind both those arrays
{ "$unwind": "$movies" },
{ "$unwind": "$set" },
// Group back the count where both `_id` values are equal
{ "$group": {
"_id": "$_id",
"movies": {
"$sum": {
"$cond":[
{ "$eq": [ "$movies._id", "$set" ] },
1,
0
]
}
}
}},
// Filter the results to those that actually match
{ "$match": { "movies": { "$gte": 2 } } }
])
In Detail
That may be a bit to take in, so we can take a look at each stage and break those down to see what they are doing.
$match : You do not want to operate on every document in the collection, so this is an opportunity to remove the items that cannot possibly match, even if there is still more work to do to find the exact ones. So the obvious things are to exclude the same "user" and then only match the documents that have at least one of the same movies as was found for that "user".
The next thing that makes sense is to consider that when you want to match n entries then only documents that have a "movies" array that is larger than n-1 can possibly actually contain matches. The use of $and here looks funny and is not required specifically, but if the required matches were 4 then that actual part of the statement would look like this:
"$and": [
{ "movies": { "$not": { "$size": 1 } } },
{ "movies": { "$not": { "$size": 2 } } },
{ "movies": { "$not": { "$size": 3 } } }
]
So you basically "rule out" arrays that cannot possibly be long enough to have n matches. Note that the $size operator in the query form is different from $size in the aggregation framework. There is no way, for example, to use it with an inequality operator such as $gt, as its purpose is to match exactly the requested "size". Hence this query form, which lists all of the possible sizes that are too small.
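Since the list of excluded sizes depends only on n, it can be generated rather than typed out. A minimal sketch, assuming a hypothetical helper named excludeShortArrays:

```javascript
// Build the $and clauses that rule out arrays with fewer than n elements
function excludeShortArrays(field, n) {
  const clauses = [];
  for (let size = 1; size < n; size++) {
    clauses.push({ [field]: { $not: { $size: size } } });
  }
  return clauses;
}

console.log(JSON.stringify(excludeShortArrays("movies", 4)));
// [{"movies":{"$not":{"$size":1}}},{"movies":{"$not":{"$size":2}}},{"movies":{"$not":{"$size":3}}}]
```

For n = 1 the helper returns an empty array; in that case the $and key is best left out of the query altogether, since MongoDB is expected to reject an empty $and.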
$project : There are a few purposes in this statement, of which some differ depending on the MongoDB version you have. Firstly, and optionally, a document copy is being kept under the _id value so that these fields are not modified by the rest of the steps. The other part here is keeping the "movies" array at the top of the document as a copy for the next stage.
What is also happening in the version presented for pre-2.6 releases is that there is an additional array representing the _id values of the "movies" to match. The $cond operator here is just a way of creating a "literal" representation of the array. Funnily enough, MongoDB 2.6 introduced an operator known as $literal to do exactly this without the odd use of $cond shown here.
$unwind : To do anything further the movies array needs to be unwound as in either case it is the only way to isolate the existing _id values for the entries that need to be matched against the "set". So for the pre 2.6 version you need to "unwind" both of the arrays that are present.
$group : For MongoDB 2.6 and greater you are just grouping back to an array that only contains the _id values of the movies with the "ratings" removed.
Pre 2.6, since all values are presented "side by side" (with lots of duplication), you compare the two values to see whether they are the same. The $cond operator returns 1 where that is true and 0 where it is false. This is passed directly to $sum to total up the number of elements in the array that match the required "set".
$project: This is where MongoDB 2.6 and greater differs: since you have pushed back an array of the "movies" _id values, you use $setIntersection to compare those arrays directly. As the result is an array containing the elements that are the same, it is wrapped in a $size operator to determine how many elements were returned in that matching set.
$match: The final stage keeps only those documents whose count of intersecting elements is greater than or equal to the required number.
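The counting step that both pipelines arrive at can be sketched in plain JavaScript on the sample data above. It is illustrative only; sharedMovieCount and the shortened _id strings are made up.

```javascript
const users = [
  { _id: "user-690", movies: [{ _id: 1, rating: 5 }, { _id: 2, rating: 6 }, { _id: 3, rating: 7 }] },
  { _id: "user-691", movies: [{ _id: 1, rating: 5 }, { _id: 4, rating: 6 }, { _id: 2, rating: 7 }] },
  { _id: "user-692", movies: [{ _id: 2, rating: 5 }, { _id: 5, rating: 6 }, { _id: 6, rating: 7 }] },
];

// Size of the intersection between the reference ids and a user's movie ids
function sharedMovieCount(referenceIds, user) {
  const reference = new Set(referenceIds);
  const own = new Set(user.movies.map((m) => m._id)); // a set, so duplicates count once
  let count = 0;
  for (const id of own) if (reference.has(id)) count++;
  return count;
}

const referenceIds = users[0].movies.map((m) => m._id); // [1, 2, 3]
const similar = users
  .slice(1) // exclude the reference user, as the first $match does
  .map((u) => ({ _id: u._id, movies: sharedMovieCount(referenceIds, u) }))
  .filter((u) => u.movies >= 2); // final $match

console.log(similar); // [ { _id: 'user-691', movies: 2 } ]
```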
Final
That is basically how you do it. The pre-2.6 approach is a bit clunkier and requires more memory, because each array member is duplicated for every possible value of the set, but it is still a valid way to do this.
All you need to do is apply this with the larger n to meet your conditions, and of course make sure your original user has at least the required n movies. Otherwise just generate this on n-1 from the length of the "user's" array of "movies".