Maintaining an embedded array with top 3 elements - mongodb

I'm currently working on a mobile car racing game.
After a user finishes a track, a new document is added to the "Plays" collection.
Also, if the user finishes the track with the 3rd, 2nd, or 1st best time, the user id and time are added to the "best" array of this track (and the new 4th-place user is removed from the array).
Since two or more users can finish a track at the same time, I'll probably need to make this atomic, so I've used findAndModify.
So far I've managed to do it well if I only maintain the 1st position in the array. This is what I did:
db.collection('tracks').findAndModify(
{ $or: [ {_id: track_id, 'best': {$exists: false}}, {_id: track_id,'best.0.time': {$gt: _time}} ] },
[],
{$set : {'best.0' : {'user_id': _userId, 'time': _time} }},
(err, data) => {
if (err) return app_res.send(err);
app_res.send (data.value != null);
}
);
But my goal is to maintain the 3 best.
I've looked in the MongoDB documentation for array operators, but I can't understand how (and if) they can help me achieve my goal.
Is there any way I can do it?
EDIT: Just to make this more clear, the top 3 indicates the top 3 users and their top times. for example, if "best" array is:
1. user: a, time : 5.
2. user: b, time : 9.
3. user: c, time : 20.
and then user c finishes the track in 7 seconds, then "best" changes to:
1. user: a, time : 5.
2. user: c, time : 7.
3. user: b, time : 9.
My Schema:
Users:
{
"_id": {
"$oid": "123"
},
"name": "A name"
}
Tracks:
{
"_id": {
"$oid": "765"
},
"name": "A track name",
"length": 34.65,
"best": [{"user_id": 467,"time": 24},{"user_id": 532,"time": 47},{"user_id": 953,"time": 89}]
}
Plays:
{
"_id": {
"$oid": "1"
},
"time": 300000,
"date": {
"$date": "2018-08-15T14:05:47.872Z"
},
"user_id": {
"$oid": "123"
},
"track_id": {
"$oid": "765"
}
}

Here is how you'd do that - using some special modifiers that can be used with $push:
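db.tracks.update({ _id: track_id }, { // track_id as in the question; an empty filter {} would update the first document found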
$push: {
"best": {
$each: [ {"user_id": 123,"time": 1} ], // add a new item to the "best" array
$slice: 3, // keep only top three
$sort: { "time": 1 } // rank/sort based on "time" field
}
}
})
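To make the effect of the three modifiers easier to follow, here is a minimal plain-JavaScript model of what they do to the array: append, sort by time, then cut to three. It runs without a database. pushSortedAndSliced is a hypothetical helper written for this illustration, not a MongoDB API, and the sample data is taken from the question's EDIT.

```javascript
// In-memory model of $push with $each, $sort and $slice (no database involved).
// pushSortedAndSliced is a hypothetical helper, not part of any MongoDB driver.
function pushSortedAndSliced(best, newEntries, limit) {
  const combined = best.concat(newEntries);   // $each: append the new items
  combined.sort((a, b) => a.time - b.time);   // $sort: { time: 1 }
  return combined.slice(0, limit);            // $slice: keep the first `limit`
}

const best = [
  { user_id: "a", time: 5 },
  { user_id: "b", time: 9 },
  { user_id: "c", time: 20 },
];

// User c improves from 20 to 7: the old entry of c falls off the end.
const afterC = pushSortedAndSliced(best, [{ user_id: "c", time: 7 }], 3);
console.log(afterC.map((e) => e.user_id + ":" + e.time).join(" ")); // a:5 c:7 b:9

// User a posts a second fast time: a now occupies two of the three slots.
const afterA = pushSortedAndSliced(best, [{ user_id: "a", time: 6 }], 3);
console.log(afterA.map((e) => e.user_id + ":" + e.time).join(" ")); // a:5 a:6 b:9
```

The second case suggests that the update on its own keeps the three best times, not necessarily three distinct users. If one entry per user is required, some extra step would presumably be needed, for example removing the user's previous entry first.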

multi-stage aggregation pipeline matching data based on fields retrieved through $lookup

I'm trying to build a complex, nested aggregation pipeline in MongoDB (4.4.9 Community Edition, using the pymongo driver for Python 3.10).
There are relevant data points in different collections which I want to aggregate into one new view (ideally) or, if that doesn't work, a new collection.
The collections, and the relevant fields therein, follow a hierarchy. There is members, which contains the top-level key on which other data is to be merged, membershipNumber.
> members.find_one()
{'_id': ObjectId('61153299af6122XXXXXXXXXXXXX'), 'membershipNumber': 'N03XXXXXX'}
Then, there's a different collection, which contains membershipNumber, but also a different, linked field, an_user_id. an_user_id is used in other collections to denote records/fields in arrays that pertain to that particular user.
I 'join' members and an_users like so:
result = members.aggregate([
{
'$lookup': {
'from': 'an_users',
'localField': 'membershipNumber',
'foreignField': 'memref',
'as': 'an_users'
}
},
{ '$unwind' : '$an_users' },
{
'$project' : {
'_id' : 1,
'membershipNumber' : 1,
'an_user_id' : '$an_users.user_id'
}
}
]);
So far so good, this returns the desired, aggregated record:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX'}
Now, I have a third collection, which contains the an_user_id as a string in arrays, denoting wherever that user clicked a given email, whereby a record is an email (and the an_user_ids in the click array are the users that clicked a link in that email).
{'_id': ObjectId('blah'),
'email_id': '407XXX',
'actions_count': 17,
'administrative_title': 'test',
'bounce': ['3440XXXX'],
'click': ['38294CCC',
'418FFFF',
'48XXXXXX',
'38eGGGG']}
I want to count the number of occurrences of a given an_user_id (which I've obtained from aggregating) in arrays (e.g. clicks, bounces, opens) in the emails collection, and include it in the .aggregate call, to retrieve something like this:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX',
'n_email_clicks' : 412,
'n_email_bounces' : 12
}
Further, I might want to also attach counts of an_user_id in other collections in my DB.
Consider, e.g., this collection called events:
{
"_id": "617ffa96ee11844e143a63dd",
"id": "12345",
"administrative_title": "my_event",
"created_at": {
"$date": "2020-01-15T16:28:50.000Z"
},
"event_creator_id": "123456",
"event_title": "my_event",
"group_id": "123456",
"permalink": "event_id",
"rsvp_count": 54,
"rsvps": [{
"rsvp_id": "56789",
"display_name": "John Doe",
"rsvp_user_id": "48XXXXXX",
"rsvp_created_at": {
"$date": "2020-01-28T15:38:50.000Z"
},
"rsvp_updated_at": {
"$date": "2020-01-28T15:38:50.000Z"
},
"first_name": "John",
"last_name": "Doe"
}, {
"rsvp_id": "543895",
"display_name": "James Appleslice",
"rsvp_user_id": "N03XXXXXX",
"rsvp_created_at": {
"$date": "2020-02-05T13:15:14.000Z"
},
"rsvp_updated_at": {
"$date": "2020-02-05T13:15:14.000Z"
},
"first_name": "James",
"last_name": "Appleslice"}
]
}
So, the end-product would look something like this:
{'_id': ObjectId('61153253aBBBBBBBBBBBB'),
'membershipNumber': 'N0XXXXXXXX',
'an_user_id': '48XXXXXX',
'n_email_clicks' : 412,
'n_email_bounces' : 12,
'n_rsvps' : 12
}
My idea was to use the $lookup parameter -- however, I only know how to use this for matching on fields that I have in the parent collection that I'm performing the aggregation on, but not on fields that have been generated in the process of the aggregation.
Any help would be hugely appreciated!
You could use a $lookup with a pipeline. First you $lookup the user id, followed by a nested $lookup to check whether the user id exists in emails. Lastly, a few more stages collect the results and format them as you need. Furthermore, you can add an $out stage if you would like to write the results into another collection.
db.members.aggregate([{
$lookup: {
from: "an_users",
let: {
membershipNumber: "$membershipNumber"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$memref",
"$$membershipNumber"
]
},
}
},
{
"$lookup": {
"from": "emails",
"localField": "user_id",
"foreignField": "click",
"as": "clicks"
}
},
{
"$project": {
"_id": 1,
"membershipNumber": 1,
"an_user_id": "$user_id",
"n_email_clicks": {
$size: "$clicks"
}
}
}
],
as: "details"
}
},
{
$replaceRoot: {
newRoot: {
$mergeObjects: [
{
$arrayElemAt: [
"$details",
0
]
},
"$$ROOT"
]
}
}
},
{
$project: {
details: 0
}
}])
Working example - https://mongoplayground.net/p/yrFsNp44hpi
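As a way to see what the nested lookups compute, here is a small in-memory model in plain JavaScript (no database). countEmailActions and the sample records are hypothetical and written only for this illustration. The bounce count is an extension that the pipeline above does not include; in the pipeline it would presumably need a second nested $lookup on the bounce field.

```javascript
// In-memory model of the nested $lookup logic; not MongoDB code.
// countEmailActions is a hypothetical helper for illustration only.
function countEmailActions(members, anUsers, emails) {
  return members.map((member) => {
    // First $lookup: an_users.memref == members.membershipNumber (first match only)
    const user = anUsers.find((u) => u.memref === member.membershipNumber);
    if (!user) return { membershipNumber: member.membershipNumber };
    // Nested $lookup + $size: number of emails whose array contains the user id
    const countIn = (field) =>
      emails.filter((e) => (e[field] || []).includes(user.user_id)).length;
    return {
      membershipNumber: member.membershipNumber,
      an_user_id: user.user_id,
      n_email_clicks: countIn("click"),
      n_email_bounces: countIn("bounce"), // assumption: same logic as for clicks
    };
  });
}

const joined = countEmailActions(
  [{ membershipNumber: "N1" }],
  [{ memref: "N1", user_id: "48" }],
  [
    { click: ["48", "38"], bounce: ["34"] },
    { click: ["48"], bounce: ["48"] },
    { click: [] },
  ]
);
console.log(joined);
```

Note that a count obtained this way is the number of emails in which the id appears, not the number of times it appears inside a single array.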

MongoDB $setUnion on object ($setUnion but with additional information)

stackoverflow community,
I do not often work with big arrays of objects in MongoDB,
so I have no idea how to solve this problem:
1.
I am working within one file, so obviously it's an aggregate which first does a {$match:{"_id" : ObjectId("5c3f5cb04147b3082648278b") }},
2.
OK, now I have another step that uses $project + $filter to filter out some objects, but it is not important for this (I think).
I have an array of objects, similar to this:
{
"_id": ObjectId(".."),
"data":
[
{
id : 01,
groupId: 22,
noteId: 876543
},
{
id : 02,
groupId: 33,
noteId: 767676
},
{
id : 03,
groupId: 22,
noteId: 876543
},
{
id : 04,
groupId: 76,
noteId: 876543
}
]
}
but with thousands of entries and more values per object.
Every groupId can have any noteId, but entries with the same groupId always have the same noteId.
The problem: noteIds can be shared between groups.
I added this:
{ $project: {
"groupIds": {"$setUnion": "$data.groupId"}
}}
which gives me all the groupIds,
but it is very important that I also get all the related noteIds, because
a noteId is an arbitrary ID related to nothing else.
Is it possible to somehow union objects by a specified field?
Or is there another way to solve this? If I filter for objects with $in($data.groupId, $setUnion('union from above')), I still would not know how to extract only the 2 fields that I need.
Thanks for your help in advance
H.M.
You can use the aggregation below:
db.collection.aggregate([
{ "$unwind": "$data" },
{ "$group": {
"_id": {
"_id": "$_id",
"groupId": "$data.groupId"
},
"noteIds": {
"$push": {
"noteId": "$data.noteId",
}
}
}},
{ "$group": {
"_id": "$_id._id",
"data": {
"$push": {
"groupId": "$_id.groupId",
"noteIds": "$noteIds"
}
}
}}
])
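The grouping performed by the two $group stages can be sketched in plain JavaScript as follows. groupNotes is a hypothetical helper and the data is the sample from the question. One difference is deliberate: the sketch uses a Set, so each noteId is listed once per group, whereas $push in the pipeline keeps one entry per array element (replacing $push with $addToSet in the first $group should give the deduplicated form).

```javascript
// In-memory model of "one entry per groupId with its noteIds"; not MongoDB code.
// groupNotes is a hypothetical helper for illustration only.
function groupNotes(data) {
  const byGroup = new Map(); // groupId -> Set of noteIds
  for (const { groupId, noteId } of data) {
    if (!byGroup.has(groupId)) byGroup.set(groupId, new Set());
    byGroup.get(groupId).add(noteId);
  }
  return [...byGroup].map(([groupId, notes]) => ({ groupId, noteIds: [...notes] }));
}

const grouped = groupNotes([
  { id: 1, groupId: 22, noteId: 876543 },
  { id: 2, groupId: 33, noteId: 767676 },
  { id: 3, groupId: 22, noteId: 876543 },
  { id: 4, groupId: 76, noteId: 876543 },
]);
console.log(JSON.stringify(grouped));
// [{"groupId":22,"noteIds":[876543]},{"groupId":33,"noteIds":[767676]},{"groupId":76,"noteIds":[876543]}]
```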

Mongodb aggregate to find if a user is in any other user's follower list

I collected followers list and friends list for n number of users from twitter and stored them in mongodb.
Here is a sample document:
{
"_id": ObjectId("561d6f8986a0ea57e51ec95c"),
"status": "True",
"UserId": "1489245878",
"followers": [
"1566382441",
"1155774331"
],
"followersCount": 2,
"friendsCount": 5,
"friends": [
"1135511478",
"998082481",
"565321118",
"848123988",
"343334562"
]
}
I wanted to know whether, within my collection, there are any user ids that also appear in the followers list of some other document. Let's say we have user "a"; I would like to know if user "a" is in the followers list of any other document within the same collection. I'm not sure how to do this. If so, I would like to project the userid and the _id of the document that has the userid within its followers list.
I guess you can use an aggregation like the one below to get this result.
db.getCollection("your_collection").aggregate([
{
"$match": {
"followers": "1566382441"
}
},
{
"$project": {
"followers": 1
}
},
{
"$unwind": "$followers"
},
{
"$match": {
"followers": "1566382441"
}
},
{
"$group": {
"_id": "$followers",
"ids": {
"$addToSet": "$_id"
}
}
},
{
"$project": {
"userId": "$_id",
"ids": 1,
"_id": 0
}
}
])
I am using only a sample of your data. You can add the list of users you are trying to filter for in both "$match" stages (for example with "$in"). Just see if this helps.
P.S.: I know it's been a long time since you asked this question! But you know, it's never too late!
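The pipeline above checks a single hard-coded id. To show the result the question asks for across all users, here is a plain-JavaScript model that works on an in-memory array of documents. findFollowed and the shortened _id values are hypothetical; this is a sketch of the logic, not a replacement for the aggregation.

```javascript
// In-memory model: which UserIds appear in another document's followers list?
// findFollowed is a hypothetical helper for illustration only.
function findFollowed(docs) {
  const knownUsers = new Set(docs.map((d) => d.UserId));
  const result = new Map(); // userId -> _ids of documents listing that user as follower
  for (const doc of docs) {
    for (const follower of doc.followers) {
      if (knownUsers.has(follower) && follower !== doc.UserId) {
        if (!result.has(follower)) result.set(follower, []);
        result.get(follower).push(doc._id);
      }
    }
  }
  return result;
}

const followed = findFollowed([
  { _id: "d1", UserId: "1489245878", followers: ["1566382441", "1155774331"] },
  { _id: "d2", UserId: "1566382441", followers: ["1489245878"] },
  { _id: "d3", UserId: "999", followers: ["1566382441"] },
]);
console.log([...followed]);
// [ [ '1566382441', [ 'd1', 'd3' ] ], [ '1489245878', [ 'd2' ] ] ]
```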

Sub-query in MongoDB

I have two collections in MongoDB, one with users and one with actions. Users look roughly like:
{_id: ObjectId("xxxxx"), country: "UK",...}
and actions like
{_id: ObjectId("yyyyy"), createdAt: ISODate(), user: ObjectId("xxxxx"),...}
I am trying to count events and distinct users split by country. The first half of which is working fine; however, when I try to add in a sub-query to pull the country, I only get nulls out for country:
db.events.aggregate({
$match: {
createdAt: { $gte: ISODate("2013-01-01T00:00:00Z") },
user: { $exists: true }
}
},
{
$group: {
_id: {
year: { $year: "$createdAt" },
user_obj: "$user"
},
count: { $sum: 1 }
}
},
{
$group: {
_id: {
year: "$_id.year",
country: db.users.findOne({
_id: { $eq: "$_id.user_obj" },
country: { $exists: true }
}).country
},
total: { $sum: "$count" },
distinct: { $sum: 1 }
}
})
No Joins in here, just us bears
So MongoDB "does not do joins" (this was written before the $lookup aggregation stage, added in MongoDB 3.2, existed). You might have tried something like this in the shell, for example:
db.events.find().forEach(function(event) {
event.user = db.users.findOne({ "_id": event.user });
printjson(event)
})
But this does not do what you seem to think it does. It actually does exactly what it looks like: it runs a query on the "users" collection for every item that is returned from the "events" collection, with each query going "to and from" the "client" rather than being run on the server.
For the same reasons, your 'embedded' statement within an aggregation pipeline does not work like that. Unlike the above, the "whole pipeline" logic is sent to the server before execution. So if you did something like this to select "UK" users:
db.events.aggregate([
{ "$match": {
"user": {
"$in": db.users.distinct("_id",{ "country": "UK" })
}
}}
])
Then that .distinct() query is actually evaluated on the "client" and not the server, and therefore has no access to any document values in the aggregation pipeline. So the .distinct() runs first, returns its array as an argument, and then the whole pipeline is sent to the server. That is the order of execution.
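The order of execution described above is ordinary JavaScript argument evaluation, which can be shown without a database. In this minimal sketch, fakeDistinct and fakeAggregate are hypothetical stubs standing in for the shell methods; they only record when they are called.

```javascript
// Stubs that record call order; they are hypothetical stand-ins, not shell APIs.
const callLog = [];

function fakeDistinct() {
  callLog.push("distinct evaluated on the client");
  return ["u1", "u2"]; // pretend these are the _id values of UK users
}

function fakeAggregate(pipeline) {
  callLog.push("pipeline sent to the server");
  return pipeline; // what the server would receive
}

// The argument expression is fully evaluated before fakeAggregate is entered.
const sent = fakeAggregate([
  { $match: { user: { $in: fakeDistinct() } } },
]);

console.log(callLog);
console.log(JSON.stringify(sent[0].$match.user.$in)); // a plain array by the time it is sent
```

By the time the pipeline leaves the client, the nested call has already been replaced by its result, so it cannot refer to values such as "$_id.user_obj" from documents flowing through the pipeline.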
Correcting
You need at least some level of de-normalization for the sort of query you want to run to work. So you generally have two choices:
Embed your whole user object data within the event data.
At least embed "some" of the user object data within the event data. In this case "country" because you are going to use it.
So then if you follow the "second" case there and at least "extend" your existing data a little to include the "country" like this:
{
"_id": ObjectId("yyyyy"),
"createdAt": ISODate(),
"user": {
"_id": ObjectId("xxxxx"),
"country": "UK"
}
}
Then the "aggregation" process becomes simple:
db.events.aggregate([
{ "$match": {
"createdAt": { "$gte": ISODate("2013-01-01T00:00:00Z") },
"user": { "$exists": true }
}},
{ "$group": {
"_id": {
"year": { "$year": "$createdAt" },
"user_id": "$user._id",
"country": "$user.country"
},
"count": { "$sum": 1 }
}},
{ "$group": {
"_id": "$_id.country",
"total": { "$sum": "$count" },
"distinct": { "$sum": 1 }
}}
])
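To illustrate what the two $group stages produce on the denormalized documents, here is an in-memory model in plain JavaScript. summarize and the sample events are hypothetical; the year is taken in UTC, which is an assumption about how the dates should be bucketed.

```javascript
// In-memory model of the two $group stages; not MongoDB code.
// summarize is a hypothetical helper for illustration only.
function summarize(events) {
  // Stage 1: group by (year, user id, country) and count events
  const perUserYear = new Map();
  for (const e of events) {
    const key = [e.createdAt.getUTCFullYear(), e.user._id, e.user.country].join("|");
    const entry = perUserYear.get(key) || { country: e.user.country, count: 0 };
    entry.count += 1;
    perUserYear.set(key, entry);
  }
  // Stage 2: group by country; total sums the counts, distinct counts stage-1 groups
  const perCountry = {};
  for (const { country, count } of perUserYear.values()) {
    const c = perCountry[country] || (perCountry[country] = { total: 0, distinct: 0 });
    c.total += count;
    c.distinct += 1;
  }
  return perCountry;
}

const summary = summarize([
  { createdAt: new Date("2013-05-01T00:00:00Z"), user: { _id: "u1", country: "UK" } },
  { createdAt: new Date("2013-06-01T00:00:00Z"), user: { _id: "u1", country: "UK" } },
  { createdAt: new Date("2013-07-01T00:00:00Z"), user: { _id: "u2", country: "UK" } },
  { createdAt: new Date("2014-01-01T00:00:00Z"), user: { _id: "u3", country: "FR" } },
]);
console.log(summary); // { UK: { total: 3, distinct: 2 }, FR: { total: 1, distinct: 1 } }
```

Because the first stage also groups by year, "distinct" here counts (year, user) pairs, so a user active in two different years would be counted twice for their country.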
We're not normal
Fixing your data to include the information it needs on a single collection where we "do not do joins" is a relatively simple process. Just really a variant on the original query sample above:
var bulk = db.events.initializeUnorderedBulkOp(),
count = 0;
db.users.find().forEach(function(user) {
// update multiple events for user
bulk.find({ "user": user._id }).update({
"$set": { "user": { "_id": user._id, "country": user.country } }
});
count++;
// Send batch every 1000
if ( count % 1000 == 0 ) {
bulk.execute();
bulk = db.events.initializeUnorderedBulkOp();
}
});
// Clear any queued
if ( count % 1000 != 0 )
bulk.execute();
So that's what it's all about. Individual queries to a MongoDB server get "one collection" and "one collection only" to work with. Even the fantastic "Bulk Operations" as shown above can still only be "batched" on a single collection.
If you want to do things like "aggregate on related properties", then you "must" contain those properties in the collection you are aggregating data for. It is perfectly okay to live with having data sitting in separate collections, as for instance "users" would generally have more information attached to them than just an "_id" and a "country".
But the point here is if you need "country" for analysis of "event" data by "user", then include it in the data as well. The most efficient server join is a "pre-join", which is the theory in practice here in general.

Insert if not exists, else remove MongoDB

So I have a query in MongoDB (2.6.4) where I am trying to implement a simple upvote/downvote mechanism. When a user clicks upvote, I need to do the following:
If already upvoted by user, then remove upvote.
Else if not upvoted by user, then add upvote AND remove downvote if exists.
So far, my query formed (is incorrect) is:
db.collection.aggregate([
{
$project: {
"_id" : ObjectId("53e4d45c198d7811248cefca"),
"upvote": {
"$cond":
[
{"$in": ["$upvote",1] },
{"$pull": {"upvote" : 1}},
{"$addToSet": {"upvote" : 1}, "$pull": {"downvote": 1}}
]
}
}
}
])
where '1' is the user id who is trying to upvote.
Both upvote and downvote are arrays that contain userIds of those who have upvoted and downvoted, respectively.
For output of query, I just want a bool value: true if $cond evaluated to true, else false.
That's not a good way to implement upvotes and downvotes. The aggregation framework is not a mechanism for updating documents in any way; you seem to have gravitated towards it because of the logic you want to implement. But aggregate does not update.
What you want on your, well, let's call it a "question" schema is a structure like this:
{
"_id": ObjectId("53f51a844ffa9b02cf01c074"),
"upvoted": [],
"downvoted": [],
"upvoteCount": 0,
"downvoteCount": 0
}
That is something that can work well with atomic updates and actually give you some stateful information about the object at the same time.
For the "upvoted" and "downvoted" arrays, we are going to consider that the "users" voting have a similar unique ObjectId value. So what we are going to do is $push or $pull from either array and also "increment/decrement" the counter values along with each of those operations.
Here's how this works for an upvote:
db.questions.update(
{
"_id": ObjectId("53f51a844ffa9b02cf01c074"),
"upvoted": { "$ne": ObjectId("53f51c0a4ffa9b02cf01c075") },
"downvoted": ObjectId("53f51c0a4ffa9b02cf01c075")
},
{
"$push": { "upvoted": ObjectId("53f51c0a4ffa9b02cf01c075") },
"$inc": { "upvoteCount": 1, "downvoteCount": -1 },
"$pull": { "downvoted": ObjectId("53f51c0a4ffa9b02cf01c075") },
}
)
db.questions.update(
{
"_id": ObjectId("53f51a844ffa9b02cf01c074"),
"upvoted": { "$ne": ObjectId("53f51c0a4ffa9b02cf01c075") }
},
{
"$push": { "upvoted": ObjectId("53f51c0a4ffa9b02cf01c075") },
"$inc": { "upvoteCount": 1 },
}
)
Actually that's two operations, which you could do with the Bulk Operations API as well (probably the best way really), but there is a point to it. The first statement will only match a document where the current user has a "downvote" recorded in the array, that is, where we already "pushed" that user id value to the "downvoted" array. If it is not there then no update is made. When it matches, you both push and pull from the respective arrays and also "increment/decrement" the counter fields at the same time.
With the second statement, which will only match where the first did not, you can fairly assume that you don't need to touch "downvoted" and just handle the upvote fields. In both cases the safe thing to do is make sure that the main condition is that the current user id value is not present in the "upvoted" array.
For downvotes the fields are just reversed:
db.questions.update(
{
"_id": ObjectId("53f51a844ffa9b02cf01c074"),
"downvoted": { "$ne": ObjectId("53f51c0a4ffa9b02cf01c075") },
"upvoted": ObjectId("53f51c0a4ffa9b02cf01c075")
},
{
"$pull": { "upvoted": ObjectId("53f51c0a4ffa9b02cf01c075") },
"$inc": { "upvoteCount": -1, "downvoteCount": 1 },
"$push": { "downvoted": ObjectId("53f51c0a4ffa9b02cf01c075") },
}
)
db.questions.update(
{
"_id": ObjectId("53f51a844ffa9b02cf01c074"),
"downvoted": { "$ne": ObjectId("53f51c0a4ffa9b02cf01c075") }
},
{
"$push": { "downvoted": ObjectId("53f51c0a4ffa9b02cf01c075") },
"$inc": { "downvoteCount": 1 },
}
)
Naturally you can see the logical progression to simply cancelling any "upvote/downvote" for the user in question. Also you can be smart about it if you want and expose the information in your client to not only show if the current user has already "upvoted/downvoted" but also control click actions and eliminate unnecessary requests.
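The state transitions behind the upvote statements, including the cancel case just mentioned, can be modeled in plain JavaScript. toggleUpvote is a hypothetical helper operating on an in-memory object and is not atomic; on the server each branch would correspond to its own conditional update, and the cancel branch is an assumption about how the "logical progression" would look.

```javascript
// In-memory model of the upvote transitions; not MongoDB code and not atomic.
// toggleUpvote is a hypothetical helper for illustration only.
function toggleUpvote(doc, userId) {
  if (doc.upvoted.includes(userId)) {
    // Already upvoted: cancel the upvote ($pull + $inc -1)
    doc.upvoted = doc.upvoted.filter((id) => id !== userId);
    doc.upvoteCount -= 1;
    return "upvote removed";
  }
  if (doc.downvoted.includes(userId)) {
    // Existing downvote: remove it as part of the same transition
    doc.downvoted = doc.downvoted.filter((id) => id !== userId);
    doc.downvoteCount -= 1;
  }
  doc.upvoted.push(userId); // $push + $inc +1
  doc.upvoteCount += 1;
  return "upvote added";
}

const question = { upvoted: [], downvoted: ["u1"], upvoteCount: 0, downvoteCount: 1 };

const firstResult = toggleUpvote(question, "u1");
const afterFirst = JSON.stringify(question);
const secondResult = toggleUpvote(question, "u1");
const afterSecond = JSON.stringify(question);

console.log(firstResult, afterFirst);
console.log(secondResult, afterSecond);
```

Keeping the counters in step with the arrays in every branch is what lets the document report its totals without counting array elements.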