MongooseJS: Best approach for a derived/calculated value - mongodb

I am creating a college football betting app for my family.
Here are my schemas:
const GameSchema = new mongoose.Schema({
home: {
type: String,
required: true
},
opponent: {
type: String,
required: true
},
homeScore: Number,
opponentScore: Number,
week:{
type: Number,
required: true
},
winner: String,
userPicks: [
{
user: {
type: mongoose.Schema.Types.ObjectId,
ref: 'User'
},
chosenTeam: String
}
]
});
const UserSchema = new mongoose.Schema({
name: String
});
I need to be able to calculate each user's weekly score (i.e. the number of football games they predict correctly each week) and their cumulative score (i.e. the number of games each user predicts correctly overall).
I am still very new to MongoDB and Mongoose, so I am unsure how to handle the issue. Since the Game collection will never grow beyond 200 records, I think both scores should be derived or calculated from the data stored in the database.
Here are the possible solutions that I have thought of so far:
Make both scores virtual attributes, not sure how this would work for the multiple users
Persist the attributes to the document, but use middleware to re-calculate the scores, when the results for the week's games are saved to the database.
Use a static method to calculate the scores.
Any advice would be appreciated.

You could use the aggregation framework for calculating the aggregates. This is a faster alternative to Map/Reduce for common aggregation operations.
In MongoDB, a pipeline consists of a series of special operators applied to a collection to process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. For more details, please consult the documentation.
Consider running the following pipeline to get the desired result:
var pipeline = [
{ "$unwind": "$userPicks" },
{
"$group": {
"_id": {
"week": "$week",
"user": "$userPicks.user"
},
"weeklyScore": {
"$sum": {
"$cond": [
{ "$eq": ["$userPicks.chosenTeam", "$winner"] },
1, 0
]
}
}
}
},
{
"$group": {
"_id": "$_id.user",
"weeklyScores": {
"$push": {
"week": "$_id.week",
"score": "$weeklyScore"
}
},
"totalScores": { "$sum": "$weeklyScore" }
}
}
];
Game.aggregate(pipeline, function(err, results){
if (err) throw err;
// Replace the user id stored in _id with the matching User document
User.populate(results, { "path": "_id" }, function(err, populated) {
if (err) throw err;
console.log(JSON.stringify(populated, undefined, 4));
});
});
In the above pipeline, the first step is the $unwind operator
{ "$unwind": "$userPicks" }
which comes in quite handy when the data is stored as an array. When the unwind operator is applied on a list data field, it will generate a new record for each and every element of the list data field on which unwind is applied. It basically flattens the data.
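To make the flattening concrete, here is a minimal plain-JavaScript sketch of what $unwind does to a single document. It runs in memory rather than against MongoDB; the sample game, the user ids "u1"/"u2" and the helper name unwind are made up for illustration.

```javascript
// Hypothetical sample document shaped like the Game schema.
const game = {
  home: "Alabama",
  opponent: "Auburn",
  week: 1,
  winner: "Alabama",
  userPicks: [
    { user: "u1", chosenTeam: "Alabama" },
    { user: "u2", chosenTeam: "Auburn" }
  ]
};

// Emit one output document per array element: the array field is
// replaced by the single element, all other fields are copied as-is.
function unwind(doc, field) {
  return doc[field].map(element => ({ ...doc, [field]: element }));
}

const flattened = unwind(game, "userPicks");
console.log(flattened);
```

Each resulting document carries the game's week and winner next to exactly one pick, which is the shape the following $group stage relies on.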
This is a necessary operation for the next pipeline stage, the $group step where you group the flattened documents by the fields week and the "userPicks.user"
{
"$group": {
"_id": {
"week": "$week",
"user": "$userPicks.user"
},
"weeklyScore": {
"$sum": {
"$cond": [
{ "$eq": ["$userPicks.chosenTeam", "$winner"] },
1, 0
]
}
}
}
}
The $group pipeline operator is similar to SQL's GROUP BY clause. In SQL, GROUP BY is normally used together with aggregate functions; in the same way, a $group stage in MongoDB computes its output fields with accumulator operators such as $sum. You can read more about the accumulator operators in the MongoDB documentation.
In this $group operation, the logic to calculate each user's weekly score (i.e. the number of football games they predict correctly each week) is done through the ternary operator $cond, which takes a logical condition as its first argument (if) and then returns the second argument when the condition is true (then) or the third argument when it is false (else). This turns the true/false result into 1 or 0, which is then fed to $sum:
"$cond": [
{ "$eq": ["$userPicks.chosenTeam", "$winner"] },
1, 0
]
So, if within the document being processed the "$userPicks.chosenTeam" field is the same as the "$winner" field, the $cond operator feeds the value 1 to $sum; otherwise it feeds 0.
The second group pipeline:
{
"$group": {
"_id": "$_id.user",
"weeklyScores": {
"$push": {
"week": "$_id.week",
"score": "$weeklyScore"
}
},
"totalScores": { "$sum": "$weeklyScore" }
}
}
takes the documents from the previous pipeline and groups them further by the user field and calculates another aggregate i.e. the total score, using the $sum accumulator operator. Within the same pipeline, you can aggregate a list of the weekly scores by using the $push operator which returns an array of expression values for each group.
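As a rough cross-check of the two $group stages, the same arithmetic can be sketched in plain JavaScript over an in-memory array. This is only an illustration under assumed sample data (the teams "A" to "F", the user ids and the function name scoreGames are hypothetical), not a replacement for the pipeline.

```javascript
// Hypothetical finished games with each user's pick.
const games = [
  { week: 1, winner: "A", userPicks: [{ user: "u1", chosenTeam: "A" }, { user: "u2", chosenTeam: "B" }] },
  { week: 1, winner: "C", userPicks: [{ user: "u1", chosenTeam: "C" }, { user: "u2", chosenTeam: "C" }] },
  { week: 2, winner: "E", userPicks: [{ user: "u1", chosenTeam: "F" }, { user: "u2", chosenTeam: "E" }] }
];

function scoreGames(games) {
  const byUser = new Map();
  for (const game of games) {
    for (const pick of game.userPicks) {                        // like $unwind
      const point = pick.chosenTeam === game.winner ? 1 : 0;    // like $cond
      if (!byUser.has(pick.user)) {
        byUser.set(pick.user, { weekly: new Map(), totalScores: 0 });
      }
      const entry = byUser.get(pick.user);
      // like the first $group: one running sum per (week, user)
      entry.weekly.set(game.week, (entry.weekly.get(game.week) || 0) + point);
      // like the second $group: one running sum per user
      entry.totalScores += point;
    }
  }
  return [...byUser].map(([user, entry]) => ({
    _id: user,
    weeklyScores: [...entry.weekly].map(([week, score]) => ({ week, score })),
    totalScores: entry.totalScores
  }));
}

const scores = scoreGames(games);
console.log(JSON.stringify(scores, undefined, 4));
```

With only a couple of hundred games this in-memory approach would also be workable as a static method, but the aggregation keeps the work inside the database.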
One thing to note here is when executing a pipeline, MongoDB pipes operators into each other. "Pipe" here takes the Linux meaning: the output of an operator becomes the input of the following operator. The result of each operator is a new collection of documents. So Mongo executes the above pipeline as follows:
collection | $unwind | $group | $group => result
Now, when you run the aggregation pipeline in Mongoose, the results will have an _id key which is the user id and you need to populate the results on this field i.e. Mongoose will perform a "join" on the users collection and return the documents with the user schema in the results.
As a side note, to help with understanding the pipeline or to debug it should you get unexpected results, run the aggregation with just the first pipeline operator. For example, run the aggregation in mongo shell as:
db.games.aggregate([
{ "$unwind": "$userPicks" }
])
Check the result to see if the userPicks array is deconstructed properly. If that gives the expected result, add the next:
db.games.aggregate([
{ "$unwind": "$userPicks" },
{
"$group": {
"_id": {
"week": "$week",
"user": "$userPicks.user"
},
"weeklyScore": {
"$sum": {
"$cond": [
{ "$eq": ["$userPicks.chosenTeam", "$winner"] },
1, 0
]
}
}
}
}
])
Repeat the steps till you get to the final pipeline step.

Related

How to group documents of a collection to a map with unique field values as key and count of documents as mapped value in mongodb?

I need a mongodb query to get the list or map of values with unique value of the field(f) as the key in the collection and count of documents having the same value in the field(f) as the mapped value. How can I achieve this ?
Example:
Document1: {"id":"1","name":"n1","city":"c1"}
Document2: {"id":"2","name":"n2","city":"c2"}
Document3: {"id":"3","name":"n1","city":"c3"}
Document4: {"id":"4","name":"n1","city":"c5"}
Document5: {"id":"5","name":"n2","city":"c2"}
Document6: {"id":"6","name":"n1","city":"c8"}
Document7: {"id":"7","name":"n3","city":"c9"}
Document8: {"id":"8","name":"n2","city":"c6"}
Query result should be something like this if group by field is "name":
{"n1":"4",
"n2":"3",
"n3":"1"}
It would be nice if the list is also sorted in the descending order.
It's worth noting, using data points as field names (keys) is somewhat considered an anti-pattern and makes tooling difficult. Nonetheless if you insist on having data points as field names you can use this complicated aggregation to perform the query output you desire...
Aggregation
db.collection.aggregate([
{
$group: { _id: "$name", "count": { "$sum": 1} }
},
{
$sort: { "count": -1 }
},
{
$group: { _id: null, "values": { "$push": { "name": "$_id", "count": "$count" } } }
},
{
$project:
{
_id: 0,
results:
{
$arrayToObject:
{
$map:
{
input: "$values",
as: "pair",
in: ["$$pair.name", "$$pair.count"]
}
}
}
}
},
{
$replaceRoot: { newRoot: "$results" }
}
])
Aggregation Explanation
This is a 5 stage aggregation consisting of the following...
$group - get the count of the data as required by name.
$sort - sort the results with count descending.
$group - place results into an array for the next stage.
$project - use the $arrayToObject and $map to pivot the data such that a data point can be a field name.
$replaceRoot - make results the top level fields.
Sample Results
{ "n1" : 4, "n2" : 3, "n3" : 1 }
For whatever reason, you show desired results having count as a string, but my results show the count as an integer. I assume that is not an issue, and may actually be preferred.
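If the pivot is done in application code instead, the same group, sort and key/value conversion can be sketched in a few lines of plain JavaScript. This assumes the documents have already been fetched into an array; the variable names are illustrative only.

```javascript
// The eight example documents from the question.
const docs = [
  { id: "1", name: "n1", city: "c1" },
  { id: "2", name: "n2", city: "c2" },
  { id: "3", name: "n1", city: "c3" },
  { id: "4", name: "n1", city: "c5" },
  { id: "5", name: "n2", city: "c2" },
  { id: "6", name: "n1", city: "c8" },
  { id: "7", name: "n3", city: "c9" },
  { id: "8", name: "n2", city: "c6" }
];

// Count documents per name (the first $group).
const counts = new Map();
for (const doc of docs) {
  counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
}

// Sort by count descending (the $sort), then turn the [name, count]
// pairs into an object (the $arrayToObject step).
const sortedPairs = [...counts].sort((a, b) => b[1] - a[1]);
const result = Object.fromEntries(sortedPairs);

console.log(JSON.stringify(result)); // {"n1":4,"n2":3,"n3":1}
```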

How to use two MongoDB aggregations to perform an updateMany

I am trying to write a script that uses 2 aggregates and saves the results as an array to be used for an updateMany.
The first aggregate finds any documents that have a firstTrackingId and a secondTrackingId on them. I save this into an array. This aggregate is working correctly when tested alone.
The second aggregate will use the first aggregate's result array, pulling all documents that have a firstTrackingId from the first aggregate's results. This one will pull any documents that do NOT have a secondTrackingId on it, and save the unique mongo _id/ObjectId to an array.
The updateMany will use all of the results from the second aggregation to update all relevant documents with a status of void.
All these functions are working when I give them hard-coded data, but I can't figure out how to pull the data from the arrays. I am not even sure if I'm "saving" it correctly, or if there is something else I should be doing aside from just initializing the aggregation as an array.
var ids = db.getCollection('Test').aggregate([
{
$match: {
"firstTrackingId": { "$ne": "" },
"secondTrackingId": { "$exists": true }
}
},
{
$group: {
_id: "$firstTrackingId",
}
},
])
var secondIds = db.getCollection('Test').aggregate([
{
$match: {
"firstTrackingId": { $in: ids },
"secondTrackingId": { $exists: false }
}
},
{
$group: {
"_id": "$_id",
}
},
])
db.getCollection('Test').updateMany({
"_id": {
"$in": secondIds
},
}, { $set: {
"status": "VOID"
} })
I tried printing the first aggregation's results out... can't really figure out how... so for the first one if I do:
print(ids.next(ids._id))
I get:
[object BSON]
Which leads me to believe I need to somehow perform an $objectToArray. If anyone has any insight, that'd be awesome. Thank you!
If you are using MongoDB 4.4+, you can do that with a single aggregation pipeline:
match documents with both first and second tracking ID
lookup an array of all documents with the same first tracking ID
unwind the array
consider the array elements as the root document
match to eliminate any that have a second tracking ID
set the desired status field
merge the results with the original collection
db.getCollection('Test').aggregate([
{$match: {
firstTrackingId: { $ne: "" },
secondTrackingId: { $exists: true }
}},
{$lookup:{
from: "Test",
localField:"firstTrackingId",
foreignField:"firstTrackingId",
as:"matched"
}},
{$unwind:"$matched"},
{$replaceRoot:{newRoot:"$matched"}},
{$match:{secondTrackingId:{"$exists":false}}},
{$addFields:{status:"VOID"}},
{$merge: {into: "Test"}}
])
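On the original two-step approach: in the shell, aggregate() returns a cursor rather than an array, while $in expects a plain array of values, so the results would first need converting (for example with the cursor's toArray() and then mapping each document to its _id). The sketch below mirrors the intended three steps on an in-memory array so the data flow is easy to follow; the sample documents are hypothetical and nothing here talks to MongoDB.

```javascript
// Hypothetical documents standing in for the Test collection.
const testDocs = [
  { _id: 1, firstTrackingId: "A", secondTrackingId: "X" },
  { _id: 2, firstTrackingId: "A" },
  { _id: 3, firstTrackingId: "B" },
  { _id: 4, firstTrackingId: "", secondTrackingId: "Y" },
  { _id: 5, firstTrackingId: "" }
];

// Step 1: distinct firstTrackingId values of documents that have a
// non-empty first id and also a second id (plain values, not documents).
const ids = [...new Set(
  testDocs
    .filter(d => d.firstTrackingId !== "" && "secondTrackingId" in d)
    .map(d => d.firstTrackingId)
)];

// Step 2: _id values of documents that share one of those first ids
// but have no second id.
const secondIds = testDocs
  .filter(d => ids.includes(d.firstTrackingId) && !("secondTrackingId" in d))
  .map(d => d._id);

// Step 3: the updateMany equivalent, applied in memory.
for (const d of testDocs) {
  if (secondIds.includes(d._id)) d.status = "VOID";
}

console.log(ids, secondIds);
```

The key point is that both intermediate results are arrays of plain values, which is the shape $in needs.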

MapReduce: aggregate in map function?

Suppose you have a DB where every document is a tweet from Twitter, and you want, with MapReduce, to generate another document that contains:
Number of tweets published on every country
List of words contained in those tweets, with a counter that counts the total hits of that word. This, for every country too.
My question: is it fine to aggregate and count the words on the map function, and then again on the reduce function? Doing it like this, the output of the map function represents the information of a single tweet, and the reduce function aggregates the info from several tweets, all from the same country, but I don't know if this is a good practice with the MapReduce algorithm...
Thank you in advance!
In MongoDB 3.4 you can do this with the aggregation framework.
For the first bullet, you just have to use the $group operator on the country field and count the tweets.
For the second bullet, you have to use the $split operator (new in 3.4) on the tweet text field, then use $unwind on the generated array, and finally use $group with word as _id or country + word as _id.
If you have an older version of MongoDB then you have to use map-reduce, but keep in mind that the aggregation framework is much faster than map-reduce in MongoDB.
$split: https://docs.mongodb.com/manual/reference/operator/aggregation/split/#exp._S_split
$unwind: https://docs.mongodb.com/manual/reference/operator/aggregation/unwind/
$group: https://docs.mongodb.com/manual/reference/operator/aggregation/group/
Building from the great answer above by Moi Syme, you ideally would run the following aggregate operation to get the desired result:
db.tweets.aggregate([
{ "$project": { "wordList": { "$split": [ "$text", " " ] }, "user.country": 1 } },
{ "$unwind": "$wordList" },
{
"$group": {
"_id": {
"country": "$user.country",
"word": "$wordList"
},
"count": { "$sum": 1 }
}
},
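{
"$group": {
"_id": "$_id.country",
// Note: each input document here is one (country, word) pair, so this
// sum counts distinct words per country rather than tweets
"numberOfTweets": { "$sum": 1 },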
"counts": {
"$push": {
"word": "$_id.word",
"count": "$count"
}
}
}
}
])
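For intuition, the split, unwind and group steps can be mirrored in plain JavaScript over an in-memory array. This is only a sketch with made-up tweets and a hypothetical helper name (summarize); it assumes each tweet has a text field and a user.country field, as the pipeline does, and it counts tweets per country directly from the tweet list.

```javascript
// Hypothetical tweets shaped like the documents the pipeline expects.
const tweets = [
  { text: "go team go", user: { country: "US" } },
  { text: "go home", user: { country: "US" } },
  { text: "hola mundo", user: { country: "ES" } }
];

function summarize(tweets) {
  const byCountry = new Map();
  for (const tweet of tweets) {
    const country = tweet.user.country;
    if (!byCountry.has(country)) {
      byCountry.set(country, { numberOfTweets: 0, counts: new Map() });
    }
    const summary = byCountry.get(country);
    summary.numberOfTweets += 1;                      // one per tweet
    for (const word of tweet.text.split(" ")) {       // like $split + $unwind
      summary.counts.set(word, (summary.counts.get(word) || 0) + 1);
    }
  }
  return byCountry;
}

const summary = summarize(tweets);
console.log(summary.get("US").numberOfTweets, [...summary.get("US").counts]);
```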

Finding documents based on the minimum value in an array

my document structure is something like :
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 12,
},
{
source: 'b',
value: 10,
},
...
]
},
{
_id: ...,
key1: ....
key2: ....
....
min_value: //should be the minimum of all the values in options
options: [
{
source: 'a',
value: 24,
},
{
source: 'b',
value: 36,
},
...
]
}
The values of the various sources in options will keep getting updated on a frequent basis (every few minutes or hours).
Assume the size of the options array doesn't change, i.e. no extra elements are added to the list.
My queries are of the following type:
- find all documents where the min_value of all the options falls between some limits.
I could first do an unwind on options (and then take the min) and then run comparison queries, but I am new to Mongo and not sure how performance is affected by the unwind operation. The number of documents of this type would be about a few million.
Or does anyone have any suggestions around changing the document structure which could help me simplify this query? (Apart from creating separate documents per source, which would involve a lot of data duplication.)
Thanks!
Using $unwind is indeed quite expensive, most notably so with larger arrays, but there is a cost in all cases of usage. There are a couple of ways to avoid needing $unwind here without real structural changes.
Pure Aggregation
In the basic case, as of the MongoDB 3.2.x release series the $min operator can work directly on an array of values in a "projection" sense in addition to its standard grouping accumulator role. This means that with the help of the related $map operator for processing elements of an array, you can then get the minimal value without using $unwind:
db.collection.aggregate([
// Still makes sense to use an index to select only possible documents
{ "$match": {
"options": {
"$elemMatch": {
"value": { "$gte": minValue, "$lt": maxValue }
}
}
}},
// Provides a logical filter to remove non-matching documents
{ "$redact": {
"$cond": {
"if": {
"$let": {
"vars": {
"min_value": {
"$min": {
"$map": {
"input": "$options",
"as": "option",
"in": "$$option.value"
}
}
}
},
"in": { "$and": [
{ "$gte": [ "$$min_value", minValue ] },
{ "$lt": [ "$$min_value", maxValue ] }
]}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
// Optionally return the min_value as a field
{ "$project": {
"min_value": {
"$min": {
"$map": {
"input": "$options",
"as": "option",
"in": "$$option.value"
}
}
}
}}
])
The basic case is to get the "minimum" value from the array. This is done inside of $let since we want to use the result "twice" in logical conditions, which helps us not repeat ourselves. The first step is to extract the "value" data from the "options" array, which is done using $map.
The output of $map is an array with just those values, so this is supplied as the argument to $min, which then returns the minimum value for that array.
Using $redact is sort of like a $match pipeline stage with the difference that rather than needing a field to be "present" in the document being examined, you instead just form a logical condition with calculations.
In this case the condition is $and where "both" the logical forms of $gte and $lt return true against the calculated value ( from $let as "$$min_value" ).
The $redact stage then has the special arguments to apply to $$KEEP the document when the condition is true or $$PRUNE the document from results when it is false.
It's all very much like doing $project and then $match to actually project the value into the document before filtering in another stage, but all done in one stage. Of course you might actually want to $project the resulting field in what you return, but it generally cuts the workload if you remove non-matched documents "first" using $redact instead.
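The $map, $min and $redact logic can be mirrored in plain JavaScript to check the expected outcome on a small sample. This is an in-memory sketch only; the two documents come from the question, while the bounds minValue = 20 and maxValue = 30 and the helper name minOfOptions are assumptions for the example.

```javascript
// The two example documents from the question, reduced to the relevant fields.
const documents = [
  { _id: 1, options: [{ source: "a", value: 12 }, { source: "b", value: 10 }] },
  { _id: 2, options: [{ source: "a", value: 24 }, { source: "b", value: 36 }] }
];

// Assumed query bounds for the example.
const minValue = 20;
const maxValue = 30;

// Like $min over $map: pull out each option's value, then take the smallest.
function minOfOptions(doc) {
  return Math.min(...doc.options.map(option => option.value));
}

// Like $redact with $$KEEP / $$PRUNE, followed by the optional $project.
const kept = documents
  .filter(doc => {
    const min = minOfOptions(doc);
    return min >= minValue && min < maxValue;
  })
  .map(doc => ({ _id: doc._id, min_value: minOfOptions(doc) }));

console.log(kept);
```

The first document is pruned because its minimum (10) falls below the lower bound, even though none of this is visible from a single array element.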
Updating Documents
Of course I think the best option is to actually keep the "min_value" field in the document rather than work it out at run-time. So this is a very simple thing to do when adding to or altering array items during update.
For this there is the $min "update" operator. Use it when appending with $push:
db.collection.update(
{ "_id": id },
{
"$push": { "options": { "source": "a", "value": 9 } },
"$min": { "min_value": 9 }
}
)
Or when updating a value of an element:
db.collection.update(
{ "_id": id, "options.source": "a" },
{
"$set": { "options.$.value": 9 },
"$min": { "min_value": 9 }
}
)
If the current "min_value" in the document is greater than the argument in $min, or the key does not yet exist, then the value given will be written. If it is less than or equal to the argument, the existing value stays in place since it is already the smaller value.
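The rule the $min update operator follows can be written down as a tiny plain-JavaScript function. This is a simplified sketch for numeric values only, with a hypothetical helper name (applyMin); it is not how the server implements the operator.

```javascript
// Write the candidate only if the field is missing or the candidate is smaller.
function applyMin(doc, field, candidate) {
  if (!(field in doc) || candidate < doc[field]) {
    doc[field] = candidate;
  }
  return doc;
}

console.log(applyMin({ min_value: 10 }, "min_value", 9)); // field lowered to 9
console.log(applyMin({ min_value: 5 }, "min_value", 9));  // field stays at 5
console.log(applyMin({}, "min_value", 9));                // field created as 9
```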
You can even set all your existing data with a simple "bulk" operations update:
var ops = [];
db.collection.find({ "min_value": { "$exists": false } }).forEach(function(doc) {
// Queue operations
ops.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": {
"$min": {
"min_value": Math.min.apply(
null,
doc.options.map(function(option) {
return option.value
})
)
}
}
}
});
// Write once in 1000 documents
if ( ops.length == 1000 ) {
db.collection.bulkWrite(ops);
ops = [];
}
});
// Clear any remaining operations
if ( ops.length > 0 )
db.collection.bulkWrite(ops);
Then with a field in place, it is just a simple range selection:
db.collection.find({
"min_value": {
"$gte": minValue, "$lt": maxValue
}
})
So it really should be in your best interests to keep a field ( or fields if you regularly need different conditions ) in the document since that provides the most efficient query.
Of course, the new functions of aggregation $min along with $map also make this viable to use without a field, if you prefer more dynamic conditions.

How do I use mongodb to count only collections that match two fields

I have some documents that have a newOne: true or false and that have an owner tag on them. I want to count all documents that have both newOne: true and the owner field equal to "MSlaton". How do I go about this in MongoDB?
Thank you!
You could use the count() method as
db.collection.count( { "newOne": true, "owner": "MSlaton" } )
which is equivalent to
db.collection.find( { "newOne": true, "owner": "MSlaton" } ).count()
Another route, albeit slower, would be via the aggregation framework where you run the following aggregation operation to get the count:
db.collection.aggregate([
{ "$match" : { "newOne": true, "owner": "MSlaton" } },
{ "$group": { "_id": null, "count": { "$sum": 1 } } }
]);
The aggregation operation is slower since it passes each matching document through the pipeline stages; it only comes within roughly the same order of magnitude as count() when run over a significantly large collection.