Select top N rows from each group

Select top N rows from each group - mongodb

I use mongodb for my blog platform, where users can create their own blogs. All entries from all blogs are in an entries collection. The document of an entry looks like:
{
'blog_id':xxx,
'timestamp':xxx,
'title':xxx,
'content':xxx
}
As the question says, is there any way to select, say, last 3 entries for each blog?

You need to first sort the documents in the collection by the blog_id and timestamp fields, then do an initial group which creates an array of the original documents in descending order. After that you can slice the array with the documents to return the first 3 elements.
The intuition can be followed in this example:
db.entries.aggregate([
{ '$sort': { 'blog_id': 1, 'timestamp': -1 } },
{
'$group': {
'_id': '$blog_id',
'docs': { '$push': '$$ROOT' },
}
},
{
'$project': {
'top_three': {
'$slice': ['$docs', 3]
}
}
}
])
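For readers who want to follow the logic outside the database, the same sort, group, and slice steps can be sketched in plain JavaScript on an in-memory array. This is only an illustration, assuming the entries fit in memory and that timestamp is numeric; the helper name topNPerGroup and the sample data are made up for the example.

```javascript
// Hypothetical in-memory illustration of the $sort / $group / $slice pipeline.
function topNPerGroup(entries, n) {
  // Equivalent of $sort: blog_id ascending, timestamp descending.
  const sorted = [...entries].sort((a, b) => {
    if (a.blog_id !== b.blog_id) return a.blog_id < b.blog_id ? -1 : 1;
    return b.timestamp - a.timestamp;
  });

  // Equivalent of $group with $push: collect the documents per blog_id.
  const groups = new Map();
  for (const entry of sorted) {
    if (!groups.has(entry.blog_id)) groups.set(entry.blog_id, []);
    groups.get(entry.blog_id).push(entry);
  }

  // Equivalent of $project with $slice: keep the first n of each group.
  return [...groups].map(([blogId, docs]) => ({
    _id: blogId,
    top: docs.slice(0, n),
  }));
}

// Made-up sample data (numeric timestamps for brevity).
const sampleEntries = [
  { blog_id: "a", timestamp: 2, title: "a2" },
  { blog_id: "b", timestamp: 5, title: "b1" },
  { blog_id: "a", timestamp: 4, title: "a4" },
  { blog_id: "a", timestamp: 1, title: "a1" },
  { blog_id: "a", timestamp: 3, title: "a3" },
];

const topThree = topNPerGroup(sampleEntries, 3);
```

Like the pipeline, the sketch collects every document of a group before slicing, which is why the aggregation can get expensive on large collections.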

This can be done in basic mongo only if you can live with two things:
An additional field in your entry document, let's call it "age"
A new blog entry taking an additional update
If so, here's how you do it :
Upon creating a new entry, do your normal insert and then execute this update to increase the age of all posts for this blog (including the one you just inserted):
db.entries.update({blog_id: BLOG_ID}, {$inc: {age: 1}}, false, true)
When querying, use the following query which will return the most recent 3 entries for each blog :
db.entries.find({age:{$lte:3}, timestamp:{$gte:STARTOFMONTH, $lt:ENDOFMONTH}}).sort({blog_id:1, age:1})
Note that this solution is actually concurrency safe (no entries with duplicate ages).
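The bookkeeping behind the "age" field can be modelled in plain JavaScript to see why the query returns the newest entries. This is a sketch, not MongoDB code: addEntry and latestPerBlog are hypothetical helpers, and the date filter from the query is left out for brevity.

```javascript
// Hypothetical in-memory model of the "age" bookkeeping; not MongoDB code.
function addEntry(entries, entry) {
  entries.push({ ...entry });
  // Mirrors the multi-update with $inc: every entry of this blog ages by one.
  // A missing age is treated as 0, as $inc does for a missing field.
  for (const e of entries) {
    if (e.blog_id === entry.blog_id) e.age = (e.age || 0) + 1;
  }
}

function latestPerBlog(entries, n) {
  // Mirrors find({ age: { $lte: n } }).sort({ blog_id: 1, age: 1 }).
  return entries
    .filter((e) => e.age <= n)
    .sort((a, b) => {
      if (a.blog_id !== b.blog_id) return a.blog_id < b.blog_id ? -1 : 1;
      return a.age - b.age;
    });
}

// Made-up sample data: four entries for blog "a", one for blog "b".
const agedEntries = [];
for (const title of ["a1", "a2", "a3", "a4"]) {
  addEntry(agedEntries, { blog_id: "a", title });
}
addEntry(agedEntries, { blog_id: "b", title: "b1" });

const newest = latestPerBlog(agedEntries, 3);
```

The newest entry of each blog always has age 1, so filtering on age up to 3 yields at most three entries per blog without any grouping.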

Starting in Mongo 5.2, it's a perfect use case for the new $topN aggregation accumulator:
// { blog_id: "a", title: "plop", content: "smthg" }
// { blog_id: "b", title: "hum", content: "meh" }
// { blog_id: "a", title: "hello", content: "world" }
// { blog_id: "a", title: "what", content: "ever" }
db.collection.aggregate([
{ $group: {
_id: "$blog_id",
messages: { $topN: { n: 2, sortBy: { _id: -1 }, output: "$$ROOT" } }
}}
])
// {
// _id: "a",
// messages: [
// { blog_id: "a", title: "what", content: "ever" },
// { blog_id: "a", title: "hello", content: "world" }
// ]
// }
// {
// _id: "b",
// messages: [
// { blog_id: "b", title: "hum", content: "meh" }
// ]
// }
This applies a $topN group accumulation that:
takes for each group the top 2 (n: 2) elements
top 2, as defined by sortBy: { _id: -1 }, which in this case means by reversed order of insertion
and for each record pushes the whole record in the group's list (output: "$$ROOT") since $$ROOT represents the whole document being processed.
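Since the question asks for the most recent entries, a variant that sorts on the timestamp field from the question's document shape (instead of _id) might look like the following. It is a sketch that assumes timestamp holds a sortable value such as a Date and that the server is 5.2 or later; the variable name latestPerBlogPipeline and the output field name latest are made up.

```javascript
// Sketch: the three most recent entries per blog, ordered by timestamp.
// The pipeline is a plain array, so it can be built and inspected anywhere.
const latestPerBlogPipeline = [
  {
    $group: {
      _id: "$blog_id",
      latest: {
        $topN: { n: 3, sortBy: { timestamp: -1 }, output: "$$ROOT" },
      },
    },
  },
];

// In mongosh, assuming the collection from the question:
// db.entries.aggregate(latestPerBlogPipeline)
```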

It's possible with group (aggregation), but this will create a full-table scan.
Do you really need exactly 3 or can you set a limit...e.g.: max 3 posts from the last week/month?

This answer using map reduce by drcosta, from another question ("In mongo, how do I use map reduce to get a group by ordered by most recent"), did the trick:
mapper = function () {
emit(this.category, {top:[this.score]});
}
reducer = function (key, values) {
var scores = [];
values.forEach(
function (obj) {
obj.top.forEach(
function (score) {
scores[scores.length] = score;
});
});
// Sort numerically in descending order (a bare sort() compares as strings)
scores.sort(function (a, b) { return b - a; });
return {top:scores.slice(0, 3)};
}
function find_top_scores(categories) {
var query = [];
db.top_foos.find({_id:{$in:categories}}).forEach(
function (topscores) {
query[query.length] = {
category:topscores._id,
score:{$in:topscores.value.top}
};
});
return db.foo.find({$or:query});
}

Related

Mongodb aggregation and conditional push to array

I have been trying to figure out the following for 2 days (or nights, should I say), so I would appreciate your help.
In mongodb I have a number of orders (I will simplify the documents for this case).
I want to group all documents by campRoundId and, where there are installments, push all of those objects to an installments field.
The problem I am facing is that when a document's installments array is empty, it gets pushed to the array too.
Initial documents (same campRoundId to group by; the first has no installments, the second has 2)
[
{
_id: ObjectId("62792d8a519af6ae8cdff779"),
campRoundId: ObjectId("620a790b2cbc52006c83115a"),
installments: [],
},
{
_id: ObjectId("62792d8a519af6ae8cdff77a"),
campRoundId: ObjectId("620a790b2cbc52006c83115a"),
installments: [
{
payment: 100,
paymentStatus: false,
},
{
payment: 20,
paymentStatus: false,
},
],
},
];
my aggregation
/**
* _id: The id of the group.
* fieldN: The first field name.
*/
{
_id : "$campRoundId",
installments: {
$push: {
$cond:[
{ $gte: ["$installments.length", 1] },
"$installments",
null
]
}
}
}
I want to get rid of the empty entry, so that nothing is pushed when a document has no installments.

Update a mongo field based on another collections data

I'm looking for a way to update a field based on the sum of the data of another collection.
I tried to bring all the meals and use forEach to call the Products collection for each meal, tested if it was working, but I got a time out.
meals.find().forEach(meal => {
var products = db.Products.find(
{ sku: { $in: meal.products } },
{ _id: 1, name: 1, sku: 1, nutritional_facts: 1 }
)
printjson(products)
})
My goal was to execute something like the code below to get the desired result, but I got "SyntaxError: invalid for/in left-hand side". Is it not possible to use for...in inside a mongo query?
db.Meals.find({}).forEach(meal => {
const nutri_facts = {};
db.Products.find({ sku: { $in: meal.products } },
{ _id: 1, name: 1, sku: 1, nutri_facts: 1 }).forEach(product => {
for (let nutriFact in product.nutri_facts) {
nutri_facts[nutriFact] =
parseFloat(nutri_facts[nutriFact]) +
parseFloat(product.nutri_facts[nutriFact]);
}
}
});
for (let nutriFact in nutri_facts) {
meal.nutri_facts[nutriFact] =
nutri_facts[nutriFact];
}
}
db.Meals.updateOne({ _id: meal._id }, meal)
});
I also had a hard time trying to figure out how to use aggregate and lookup in this case but was not successful.
Is it possible to do that?
Example - Meals Document
{
_id: ObjectId("..."),
products : ["P068","L021","L026"], //these SKUs are part of this meal
nutri_facts: {
total_fat: 5g,
calories: 100kcal
(...other properties)
}
}
For each meal I need to look for its products on 'Products' collections using 'sku' field.
Then I will sum the nutritional facts of all products to get the meal nutritional facts.
Example Products Document
{
_id: ObjectId("..."),
sku: 'A010'
nutri_facts: {
total_fat: 2g,
calories: 40kcal
(...other properties)
}
}
I know that mongo might not be the best option in this case, but the entire application is already built using it.

For each meal I need to look for its products on 'Products'
collections using 'sku' field. Then I will sum the nutritional facts
of all products to get the meal nutritional facts.
db.Meals.find( { } ).forEach( meal => {
// print(meal._id);
const nutri_facts_var = { };
db.Products.find( { sku: { $in: meal.products } }, { nutri_facts: 1 } ).forEach( product => {
// printjson(product.nutri_facts);
for ( let nutriFact in product.nutri_facts ) {
let units = (product.nutri_facts[nutriFact].split(/\d+/)).pop();
// print(units)
// Checks for the existence of the field and then adds or assigns
if ( nutri_facts_var[nutriFact] ) {
nutri_facts_var[nutriFact] = parseFloat( nutri_facts_var[nutriFact] ) + parseFloat( product.nutri_facts[nutriFact] );
}
else {
nutri_facts_var[nutriFact] = parseFloat( product.nutri_facts[nutriFact] );
}
nutri_facts_var[nutriFact] = nutri_facts_var[nutriFact] + units;
}
} );
// printjson(nutri_facts_var);
db.Meals.updateOne( { _id: meal._id }, { $set: { nutri_facts: nutri_facts_var } } );
} );
NOTES:
I use the variable nutri_facts_var name ( the var suffixed) so
that we can distinguish the user defined variable names easily from
the document fields names.
{ _id: 1, name: 1, sku: 1, nutri_facts: 1 } changed to {
nutri_facts: 1 }. The _id is included by default in a
projection. The fields name and sku are not needed.
db.Meals.updateOne({ _id: meal._id }, meal) is not correct
syntax. The update operations use Update
Operators.
The corrected code: db.Meals.updateOne( { _id: meal._id }, { $set:
{ nutri_facts: nutri_facts_var } } ). Note we are updating the
nutri_facts only, not all details.
Since the individual nutrifacts are stored as strings (e.g.,
"100kcal"), during arithmetic the string parts are stripped. So, we
capture the string units (e.g., "kcal") for each nutrifact and
append it later after the arithmetic. The following code strips and
stores the units part: let units =
(product.nutri_facts[nutriFact].split(/\d+/)).pop().
Use the mongo shell methods print and printjson to print the
contents of a variable or an object respectively - for debugging
purposes.
Note the query updates the Meal collection even if the nutri_facts field is not defined in it; the $set update operator creates new fields and sets the values in case the fields do not exist.
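The strip-the-units, add, and re-append step from the notes can also be sketched on its own, outside the shell. This is a minimal illustration, assuming every fact is a string made of a number followed by its unit (as in "40kcal") and that the same fact always uses the same unit; sumNutriFacts and the sample products are hypothetical.

```javascript
// Hypothetical helper: sums facts such as "2g" + "3g" into "5g".
function sumNutriFacts(products) {
  const totals = {}; // fact name -> { amount, units }
  for (const product of products) {
    for (const [name, raw] of Object.entries(product.nutri_facts || {})) {
      const amount = parseFloat(raw); // "40kcal" -> 40
      const units = String(raw).replace(/^[\d.\s]+/, ""); // "40kcal" -> "kcal"
      if (!totals[name]) totals[name] = { amount: 0, units };
      totals[name].amount += amount;
    }
  }
  // Re-attach the units only once, after all the arithmetic is done.
  const result = {};
  for (const [name, { amount, units }] of Object.entries(totals)) {
    result[name] = `${amount}${units}`;
  }
  return result;
}

// Made-up products in the shape used by the question.
const sampleProducts = [
  { sku: "A010", nutri_facts: { total_fat: "2g", calories: "40kcal" } },
  { sku: "A011", nutri_facts: { total_fat: "3g", calories: "60kcal" } },
];

const mealFacts = sumNutriFacts(sampleProducts);
```

Keeping the running totals numeric until the end avoids re-parsing the partially built strings on every iteration.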

MongoDB conditionally $addToSet sub-document in array by specific field

Is there a way to conditionally $addToSet based on a specific key field in a subdocument on an array?
Here's an example of what I mean - given the collection produced by the following sample bootstrap;
cls
db.so.remove();
db.so.insert({
"Name": "fruitBowl",
"pfms" : [
{
"n" : "apples"
}
]
});
n defines a unique document key. I only want one entry with the same n value in the array at any one time. So I want to be able to update the pfms array using n so that I end up with just this;
{
"Name": "fruitBowl",
"pfms" : [
{
"n" : "apples",
"mState": 1111234
}
]
}
Here's where I am at the moment;
db.so.update({
"Name": "fruitBowl",
},{
// not allowed to do this of course
// "$pull": {
// "pfms": { n: "apples" },
// },
"$addToSet": {
"pfms": {
"$each": [
{
"n": "apples",
"mState": 1111234
}
]
}
}
}
)
Unfortunately, this adds another array element;
db.so.find().toArray();
[
{
"Name" : "fruitBowl",
"_id" : ObjectId("53ecfef5baca2b1079b0f97c"),
"pfms" : [
{
"n" : "apples"
},
{
"n" : "apples",
"mState" : 1111234
}
]
}
]
I need to effectively upsert the apples document matching on n as the unique identifier and just set mState whether or not an entry already exists. It's a shame I can't do a $pull and $addToSet in the same document (I tried).
What I really need here is dictionary semantics, but that's not an option right now, nor is breaking out the document - can anyone come up with another way?
FWIW - the existing format is a result of language/driver serialization, I didn't choose it exactly.
further
I've gotten a little further in the case where I know the array element already exists I can do this;
db.so.update({
"Name": "fruitBowl",
"pfms.n": "apples",
},{
$set: {
"pfms.$.mState": 1111234,
},
}
)
But of course that only works;
for a single array element
as long as I know it exists
The first limitation isn't a disaster, but if I can't effectively upsert or combine $addToSet with the previous $set (which of course I can't), then the only workarounds I can think of for now mean two DB round-trips.

The $addToSet operator of course requires that the "whole" document being "added to the set" is in fact unique, so you cannot change "part" of the document or otherwise consider it to be a "partial match".
You stumbled on to your best approach using $pull to remove any element with the "key" field that would result in "duplicates", but of course you cannot modify the same path in different update operators like that.
So the closest thing you will get is issuing separate operations but also doing that with the "Bulk Operations API" which is introduced with MongoDB 2.6. This allows both to be sent to the server at the same time for the closest thing to a "contiguous" operations list you will get:
var bulk = db.so.initializeOrderedBulkOp();
bulk.find({ "Name": "fruitBowl", "pfms.n": "apples" }).updateOne({
"$pull": { "pfms": { "n": "apples" } }
});
bulk.find({ "Name": "fruitBowl" }).updateOne({
"$push": { "pfms": { "n": "apples", "mState": 1111234 } }
});
bulk.execute();
That pretty much is your best approach if it is not possible or practical to move the elements to another collection and rely on "upserts" and $set in order to have the same functionality but on a collection rather than array.
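What the two bulk operations do to the array can be illustrated in plain JavaScript. This is only a sketch of the "pull by key, then push" semantics, not driver code; upsertByKey is a hypothetical helper and the second call uses made-up data.

```javascript
// Hypothetical in-memory equivalent of "$pull by key, then $push".
function upsertByKey(array, keyField, doc) {
  // $pull: drop any element whose key matches the new document's key.
  const withoutOld = array.filter((item) => item[keyField] !== doc[keyField]);
  // $push: append the new version of the element.
  return [...withoutOld, doc];
}

const bowl = { Name: "fruitBowl", pfms: [{ n: "apples" }] };

// Existing key: the old "apples" element is replaced, not duplicated.
bowl.pfms = upsertByKey(bowl.pfms, "n", { n: "apples", mState: 1111234 });

// New key: the element is simply appended.
bowl.pfms = upsertByKey(bowl.pfms, "n", { n: "pears", mState: 1 });
```

Note that a replaced element moves to the end of the array, which is also what happens with the $pull and $push pair.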

I have faced the exact same scenario. I was inserting and removing likes from a post.
What I did was use the mongoose findOneAndUpdate function (which is similar to the update or findAndModify function in mongodb).
The key concept is
Insert when the field is not present
Delete when the field is present
The insert is
findOneAndUpdate({ _id: theId, 'likes.userId': { $ne: theUserId }},
{ $push: { likes: { userId: theUserId, createdAt: new Date() }}},
{ 'new': true }, function(err, post) {
// do the needful
});
The delete is
findOneAndUpdate({ _id: theId, 'likes.userId': theUserId},
{ $pull: { likes: { userId: theUserId }}},
{ 'new': true }, function(err, post) {
// do the needful
});
Each call is atomic, and there are no duplicates with respect to the userId field.
I hope this helps. If you have any questions, feel free to ask.

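As far as I know, MongoDB (from v4.2) allows aggregation pipelines to be used for updates. The command below does not need one, though: it relies on the filtered positional operator $[elem] with arrayFilters (available since v3.6).
A more or less elegant way to make it work for the case in the question where the element already exists looks like the following: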
db.runCommand({
update: "your-collection-name",
updates: [
{
q: {},
u: {
$set: {
"pfms.$[elem]": {
"n":"apples",
"mState": NumberInt(1111234)
}
}
},
arrayFilters: [
{
"elem.n": {
$eq: "apples"
}
}
],
multi: true
}
]
})

In my scenario, the data needs to be initialized when it does not exist and updated when it does, and it is never deleted. If your data behaves the same way, you might want to try the following method.
// Mongoose, but mostly the same as mongodb
// Update the user's tag, if one exists.
const user = await UserModel.findOneAndUpdate(
{
user: userId,
'tags.name': tag_name,
},
{
$set: {
'tags.$.description': tag_description,
},
}
)
.lean()
.exec();
// Add a default tag to user
if (user == null) {
await UserModel.findOneAndUpdate(
{
user: userId,
},
{
$push: {
tags: new Tag({
name: tag_name,
description: tag_description,
}),
},
}
);
}
This was the cleanest and fastest method for my scenario.

As a business analyst , I had the same problem and hopefully I have a solution to this after hours of investigation.
// The customer document:
{
"id" : "1212",
"customerCodes" : [
{
"code" : "I"
},
{
"code" : "YK"
}
]
}
// The problem : I want to insert dateField "01.01.2016" to customer documents where customerCodes subdocument has a document with code "YK" but does not have dateField. The final document must be as follows :
{
"id" : "1212",
"customerCodes" : [
{
"code" : "I"
},
{
"code" : "YK" ,
"dateField" : "01.01.2016"
}
]
}
// The solution : the solution code is in three steps :
// PART 1 - Find the customers with customerCodes "YK" but without dateField
// PART 2 - Find the index of the subdocument with "YK" in customerCodes list.
// PART 3 - Insert the value into the document
// Here is the code
// PART 1
var myCursor = db.customers.find({ customerCodes:{$elemMatch:{code:"YK", dateField:{ $exists:false} }}});
// PART 2
myCursor.forEach(function(customer){
if(customer.customerCodes != null )
{
var size = customer.customerCodes.length;
if( size > 0 )
{
var iFoundTheIndexOfSubDocument= -1;
var index = 0;
customer.customerCodes.forEach( function(clazz)
{
if( clazz.code == "YK" && clazz.dateField == null )
{
iFoundTheIndexOfSubDocument = index;
}
index++;
})
// PART 3
// What happens here: if I found the index of the
// "YK" subdocument, I create an "updates" document which
// corresponds to the new data to be inserted.
if( iFoundTheIndexOfSubDocument != -1 )
{
var toSet = "customerCodes."+ iFoundTheIndexOfSubDocument +".dateField";
var updates = {};
updates[toSet] = "01.01.2016";
db.customers.update({ "id" : customer.id } , { $set: updates });
// For the sample document this statement is interpreted like this:
// db.customers.update({ "id" : "1212" } ,{ $set: { "customerCodes.1.dateField" : "01.01.2016" } });
}
}
}
});
Have a nice day !

MongoDB 2.4's "Limit Number of Elements in an Array after an Update" using C# driver?

MongoDB 2.4 added a new "Limit Number of Elements in an Array after an Update" feature. This is how it can be used through the shell:
db.students.update(
{ _id: 1 },
{ $push:
{ scores:
{ $each :
[
{ attempt: 3, score: 7 },
{ attempt: 4, score: 4 }
],
$sort: { score: 1 },
$slice: -3
}
}
}
)
How can this be accomplished with the MongoDB's C#-driver?

Here is an example test that shows how to do this without using typed classes: https://github.com/mongodb/mongo-csharp-driver/blob/master/MongoDB.DriverUnitTests/Builders/UpdateBuilderTests.cs#L492
The relevant piece of code you are looking for is this:
var update = Update.PushEach(
"name",
new PushEachOptions { Slice = -3, Sort = SortBy.Descending("a") },
value1ToPush,
value2ToPush);
We also support this if you are using typed entities: https://github.com/mongodb/mongo-csharp-driver/blob/master/MongoDB.DriverUnitTests/Builders/UpdateBuilderTests.cs#L524
var update = Update<Test>.PushEach(
x => x.B,
args => args.SortDescending(x => x.C).Slice(-3),
new[] { new B { C = 0 }, new B { C = 1 } });
Finally, like everything else in the .NET driver, you can always build up a BsonDocument that looks exactly like your structure above and simply execute it.

Merge changeset documents in a query

I have recorded changes from an information system in a mongo database. Every time a set of values are set or changed, a record is saved in the mongo database.
The change collection is in the following form:
{ "user_id": 1, "timestamp": { "date" : "2010-09-22 09:28:02", "timezone_type" : 3, "timezone" : "Europe/Paris" }, "changes": { "fieldA": "valueA", "fieldB": "valueB", "fieldC": "valueC" } }
{ "user_id": 1, "timestamp": { "date" : "2010-09-24 19:01:52", "timezone_type" : 3, "timezone" : "Europe/Paris" }, "changes": { "fieldA": "new_valueA", "fieldB": null, "fieldD": "valueD" } }
{ "user_id": 1, "timestamp": { "date" : "2010-10-01 11:11:02", "timezone_type" : 3, "timezone" : "Europe/Paris" }, "changes": { "fieldD": "new_valueD" } }
Of course there are thousands of records per user with different attributes, which adds up to millions of records. What I want to do is see a user's status at a given time. For example, user_id 1 at 2010-09-30 would be
fieldA: new_valueA
fieldC: valueC
fieldD: valueD
This means I need to flatten all the changes prior to a given date for a given user into a single record. Can I do that directly in mongo ?
Edit: I am using the 2.0 version of mongodb hence cannot benefit from the aggregation framework.
Edit: It seems I have found the answer to my question.
var mapTimeAndChangesByUserId = function() {
var key = this.user_id;
var value = { timestamp: this.timestamp.date, changes: this.changes };
emit(key, value);
}
var reduceMergeChanges = function(user_id, changeset) {
var mergeFunction = function(a, b) { for (var attr in b) a[attr] = b[attr]; };
var result = {};
changeset.forEach(function(e) { mergeFunction(result, e.changes); });
return { timestamp: changeset.pop().timestamp, changes: result };
}
The reduce function merges the changes in the order they come and returns the result.
db.user_change.mapReduce(
mapTimeAndChangesByUserId,
reduceMergeChanges,
{
out: { inline: 1 },
query: { user_id: 1, "timestamp.date": { $lt: "2010-09-30" } },
sort: { "timestamp.date": 1 }
});
'results' : [
{
"_id": 1,
"value": {
"timestamp": "2010-09-24 19:01:52",
"changes": {
"fieldA": "new_valueA",
"fieldB": null,
"fieldC": "valueC",
"fieldD": "valueD"
}
}
}
]
Which is fine to me.
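The merge itself is easy to check on a handful of documents in plain JavaScript before running the map-reduce job. The following is a sketch under the assumption that the documents are already loaded into an array; stateAt is a hypothetical helper, and the sample data copies the three change documents from the question (timezone fields omitted).

```javascript
// Hypothetical in-memory version of the merge; field names follow the question.
function stateAt(changeDocs, userId, before) {
  return changeDocs
    .filter((d) => d.user_id === userId && d.timestamp.date < before)
    // "YYYY-MM-DD hh:mm:ss" strings sort chronologically as plain strings.
    .sort((a, b) => {
      if (a.timestamp.date === b.timestamp.date) return 0;
      return a.timestamp.date < b.timestamp.date ? -1 : 1;
    })
    // Later changes overwrite earlier values of the same field.
    .reduce((state, d) => Object.assign(state, d.changes), {});
}

const changeDocs = [
  { user_id: 1, timestamp: { date: "2010-10-01 11:11:02" }, changes: { fieldD: "new_valueD" } },
  { user_id: 1, timestamp: { date: "2010-09-22 09:28:02" }, changes: { fieldA: "valueA", fieldB: "valueB", fieldC: "valueC" } },
  { user_id: 1, timestamp: { date: "2010-09-24 19:01:52" }, changes: { fieldA: "new_valueA", fieldB: null, fieldD: "valueD" } },
];

const userState = stateAt(changeDocs, 1, "2010-09-30");
```

As in the map-reduce output, fieldB stays in the result with a null value; dropping null fields would need one extra filtering step.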

You could write a MR to do this.
Since the fields are a lot like tags you can modify a nice cookbook example of counting tags here: http://cookbook.mongodb.org/patterns/count_tags/ of course instead of counting you want the latest value applied (assumption since this is not clear in your question) for that field.
So lets get our map function:
map = function() {
if (!this.changes) {
// If there were not changes for some reason lets bail this record
return;
}
// We iterate the changes
for (var index in this.changes) {
emit(index /* We emit the field name */, this.changes[index] /* We emit the field value */);
}
}
And now for our reduce:
reduce = function(key, values){
// This part is dependent upon your input query. If you add a sort of
// date (ts) DESC then you will probably want the first index (0), not the last as
// gathered here by values.length - 1
return values[values.length - 1];
}
And this will output a single document per field change of the type:
{
_id: your_field_ie_fieldA,
value: whoop
}
You can then iterate over the (most likely inline) output and, bam, you have your changes.
This is of course one way of doing it, and it is not designed to be run completely inline with your app; however, that all depends on the size of the data you're working on, and it could run very close.
I am unsure whether group and distinct can run on this, but it looks like it might: http://docs.mongodb.org/manual/reference/method/db.collection.group/#db-collection-group. I should note that group is basically a MR wrapper, but you could do something like this (untested, just like the MR above):
db.col.group( {
key: { 'changes.fieldA': 1 /* , the rest of the fields */ },
cond: { 'timestamp.date': { $gt: new Date( '01/01/2012' ) } },
reduce: function ( curr, result ) { },
initial: { }
} )
But it does require you to define the keys instead of just iterating over them programmatically (maybe there is a better way).

We Keep Coding
