How to $addFields in MongoDB retrospectively

I have a schema with a like array. This array stores all the people who have liked my post.
I just added a likeCount field as well, but the likeCount field's default value is 0.
How can I use $addFields in MongoDB to update the likeCount with the length of the like array?
I am on a MERN stack.

I am assuming you have a data structure like this:
{
  postId: "post1",
  likes: [ "ID1", "ID2", "ID3" ]
}
There is almost no reason to add a likeCount field. You should take the length of the likes array itself. Some examples:
db.foo.insert([
  {post: "P1", likes: ["ID1", "ID2", "ID3"]},
  {post: "P2", likes: ["ID1", "ID2", "ID3"]},
  {post: "P3", likes: ["ID4", "ID2", "ID6", "ID7"]}
]);
// Which post has the most likes?
db.foo.aggregate([
  {$addFields: {N: {$size: "$likes"}}},
  {$sort: {N: -1}}
  //, {$limit: 2} // optionally limit to the top few
]);
// Is ID6 in likes?
// $match of a scalar to an input field ('likes') acts like
// $in for convenience:
db.foo.aggregate([ {$match: {'likes':'ID6'}} ]);
// Is ID6 OR ID3 in likes?
db.foo.aggregate([ {$match: {'likes':{$in:['ID6','ID3']}}} ]);
// Is ID2 AND ID7 in likes?
// This is a fancier way of doing set-to-set compares instead
// of a bunch of expression passed to $and:
var targets = ['ID7','ID2'];
db.foo.aggregate([
  {$project: {X: {$eq: [2, {$size: {$setIntersection: ["$likes", targets]}}]}}}
]);
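For intuition, the set-to-set comparison above can be mirrored in plain JavaScript. This is a sketch, not MongoDB code, using the same three sample posts inserted earlier:

```javascript
// A post "matches" when every target ID appears in its likes array,
// mirroring {$eq: [2, {$size: {$setIntersection: ["$likes", targets]}}]}.
const posts = [
  { post: "P1", likes: ["ID1", "ID2", "ID3"] },
  { post: "P2", likes: ["ID1", "ID2", "ID3"] },
  { post: "P3", likes: ["ID4", "ID2", "ID6", "ID7"] },
];
const targets = ["ID7", "ID2"];

const results = posts.map(p => ({
  post: p.post,
  // intersection size equals target count => both IDs are present
  X: p.likes.filter(id => targets.includes(id)).length === targets.length,
}));
```

Only P3 contains both ID7 and ID2, so only its X is true.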
// Who likes the most across all posts?
db.foo.aggregate([
  {$unwind: "$likes"},
  {$group: {_id: "$likes", N: {$sum: 1}}},
  {$sort: {N: -1}}
]);
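The unwind-and-group tally reads naturally as a plain-JavaScript reduction over the same sample data (a sketch for intuition only, not MongoDB code):

```javascript
// Who likes the most across all posts? Plain-JS equivalent of
// $unwind + $group {$sum: 1} + $sort {N: -1}.
const posts = [
  { post: "P1", likes: ["ID1", "ID2", "ID3"] },
  { post: "P2", likes: ["ID1", "ID2", "ID3"] },
  { post: "P3", likes: ["ID4", "ID2", "ID6", "ID7"] },
];

const counts = {};
for (const p of posts) {
  for (const id of p.likes) counts[id] = (counts[id] || 0) + 1; // $group/$sum
}
const ranked = Object.entries(counts)
  .map(([_id, N]) => ({ _id, N }))
  .sort((a, b) => b.N - a.N); // $sort: {N: -1}
```

With this data, ID2 appears in all three posts and ranks first.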

This is how to backfill all the existing documents with the respective likeCount values the first time:
db.collection.update(
  {},
  [
    {$addFields: {likeCount: {$size: "$like"}}}
  ],
  {multi: true}
)
Note that pipeline-style updates like this require MongoDB 4.2 or newer; on current versions you can write the same thing as db.collection.updateMany({}, [{$addFields: {likeCount: {$size: "$like"}}}]).
Every subsequent time one or more people are added to the like array, you can either set likeCount again with $size or increase it with an $inc operation.
Of course, as @Buzz pointed out below, it is best to compute the count from the array in the read code, since updating likeCount on every like is an expensive operation with performance implications under heavy load.
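If you do maintain likeCount on write, the invariant to preserve is likeCount === like.length. Here is a minimal plain-JavaScript sketch of that bookkeeping; the addLike helper and the in-memory post object are hypothetical, for illustration only:

```javascript
// Hypothetical helper: add a liker and keep likeCount in sync,
// mirroring $addToSet semantics plus a conditional $inc on the server.
function addLike(post, userId) {
  if (!post.like.includes(userId)) { // no duplicates, like $addToSet
    post.like.push(userId);
    post.likeCount += 1;             // like $inc: {likeCount: 1}
  }
  return post;
}

const post = { postId: "post1", like: ["ID1", "ID2"], likeCount: 2 };
addLike(post, "ID3");
addLike(post, "ID3"); // duplicate: no change either way
```

The invariant post.likeCount === post.like.length holds after every call.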

Related

Most frequent word in MongoDB collection

I have a MongoDB collection where each entry has a product field containing a string array. What I would like to do is find the most frequent word in the whole collection. Any ideas on how to do that?
Here is a sample object:
{
  "_id" : ObjectId("55e02d333b88f425f84191af"),
  "product" : [
    " x bla y "
  ],
  "hash_key" : "ecfe355b2f45dfbaf361cff4d314d4cc",
  "price" : [
    "z"
  ],
  "image" : "image_url"
}
Looking at the sample object, what I would like to do is count "x", "bla" and "y" singularly.
I recently had to do something similar. I had a collection of objects and each object had a list of keywords. To count the frequency of each keyword, I used the following aggregation pipeline, which uses the MongoDB version 4.4 $accumulator group operation.
db.collectionname.aggregate([
  {$match: {available: true}},        // Some criteria to filter the documents
  {$project: {_id: 0, keywords: 1}},  // Only keep keywords
  {$group:                            // Accumulate keywords into one array
    {_id: null, keywords:
      {$accumulator: {
        init: function() { return new Array(); },
        accumulate: function(state, value) { return state.concat(value); },
        accumulateArgs: ["$keywords"],
        merge: function(state1, state2) { return state1.concat(state2); },
        lang: "js"}}}},
  {$unwind: "$keywords"},                        // Split array into separate documents
  {$group: {_id: "$keywords", freq: {$sum: 1}}}, // Group keywords and count frequencies
  {$sort: {freq: -1}},                           // Sort in descending order
  {$limit: 5}                                    // Take the first five
])
I have no idea if this is the most efficient solution. However, it solved the problem for me.
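Note that the sample documents store whole phrases in product, so the strings still need splitting into words. The counting itself is easy to sanity-check in plain JavaScript; splitting on whitespace is an assumption about your data, and the sample docs here are invented:

```javascript
// Count word frequencies across all `product` strings, then take the top 5,
// mirroring $unwind + $group {$sum: 1} + $sort + $limit.
const docs = [
  { product: [" x bla y "] },
  { product: ["bla foo"] },
];

const freq = {};
for (const doc of docs) {
  for (const phrase of doc.product) {
    for (const word of phrase.trim().split(/\s+/)) {
      if (word) freq[word] = (freq[word] || 0) + 1;
    }
  }
}
const top5 = Object.entries(freq)
  .map(([_id, n]) => ({ _id, freq: n }))
  .sort((a, b) => b.freq - a.freq)
  .slice(0, 5);
```

With this data, "bla" appears twice and ranks first.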

MongoDB: calculate average value for the document & then do the same thing across entire collection

I have collection of documents (Offers) with subdocuments (Salary) like this:
{
  _id: ObjectId("zzz"),
  sphere: ObjectId("xxx"),
  region: ObjectId("yyy"),
  salary: {
    start: 10000,
    end: 50000
  }
}
And I want to calculate the average salary across some region & sphere for the entire collection. I created a query for this, and it works, but it only takes the salary start value into account.
db.offer.aggregate([
  {$match:
    {$and: [
      {"salary.start": {$gt: 0}},
      {region: ObjectId("xxx")},
      {sphere: ObjectId("yyy")}
    ]}
  },
  {$group: {_id: null, avg: {$avg: "$salary.start"}}}
])
But first I want to calculate the average salary (start & end) of each offer. How can I do this?
Update.
If the value of "salary.end" may be missing in your data, you need to add one additional "$project" stage to replace a missing "salary.end" with the existing "salary.start". Otherwise the average will be wrong, because documents lacking "salary.end" values are ignored.
db.offer.aggregate([
  {$match:
    {$and: [
      {"salary.start": {$gt: 0}},
      {"region": ObjectId("xxx")},
      {"sphere": ObjectId("yyy")}
    ]}
  },
  {$project: {"_id": 1,
    "sphere": 1,
    "region": 1,
    "salary.start": 1,
    "salary.end": {$ifNull: ["$salary.end", "$salary.start"]}
  }},
  {$project: {"_id": 1,
    "sphere": 1,
    "region": 1,
    "avg_salary": {$divide: [
      {$add: ["$salary.start", "$salary.end"]},
      2
    ]}}},
  {$group: {"_id": {"sphere": "$sphere", "region": "$region"},
    "avg": {$avg: "$avg_salary"}}}
])
The aggregation has to be modified as follows:
1. Match the required region, sphere, and offers where salary > 0.
2. Project an extra field for each offer, which holds the average of start and end.
3. Group together the records with the same region and sphere, and apply the $avg aggregation operator to avg_salary for each offer in that group, to get the average salary.
The Code:
db.offer.aggregate([
  {$match:
    {$and: [
      {"salary.start": {$gt: 0}},
      {"region": ObjectId("xxx")},
      {"sphere": ObjectId("yyy")}
    ]}
  },
  {$project: {"_id": 1,
    "sphere": 1,
    "region": 1,
    "avg_salary": {$divide: [
      {$add: ["$salary.start", "$salary.end"]},
      2
    ]}}},
  {$group: {"_id": {"sphere": "$sphere", "region": "$region"},
    "avg": {$avg: "$avg_salary"}}}
])
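The arithmetic of the pipeline, including the $ifNull fallback, is easy to sanity-check in plain JavaScript. This sketch uses invented sample offers, not real data:

```javascript
// Per-offer average of salary.start and salary.end, with salary.end
// falling back to salary.start when missing (the $ifNull step),
// then an overall average over the group ($avg).
const offers = [
  { salary: { start: 10000, end: 50000 } }, // per-offer avg: 30000
  { salary: { start: 20000 } },             // end missing -> avg: 20000
];

const perOffer = offers.map(o => {
  const end = o.salary.end != null ? o.salary.end : o.salary.start; // $ifNull
  return (o.salary.start + end) / 2;                               // $add/$divide
});
const avg = perOffer.reduce((s, v) => s + v, 0) / perOffer.length; // $avg
```

Without the fallback, the second offer would be dropped and the result would be 30000 instead of 25000.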

mongodb index. how to index a single object on a document, nested in an array

I have the following document:
{
  'date': date,
  '_id': ObjectId,
  'Log': [
    {
      'lat': float,
      'lng': float,
      'date': float,
      'speed': float,
      'heading': float,
      'fix': float
    }
  ]
}
For one document, the Log array can contain several hundred entries.
I need to query the first and last date element of Log on each document. I know how to query it, but I need to do it fast, so I would like to build an index for that. I don't want to index Log.date since it is too big... how can I index them?
In fact it's hard to advise without knowing how you work with the documents. One of the solutions could be to use a sparse index. You just need to add a new field to every first and last array element, let's call it shouldIndex. Then just create a sparse index which includes shouldIndex and date fields. Here's a short example:
Assume we have this document
{"Log":
[{'lat': 1, 'lng': 2, 'date': new Date(), shouldIndex : true},
{'lat': 3, 'lng': 4, 'date': new Date()},
{'lat': 5, 'lng': 6, 'date': new Date()},
{'lat': 7, 'lng': 8, 'date': new Date(), shouldIndex : true}]}
Please note the first element and the last one contain shouldIndex field.
db.testSparseIndex.ensureIndex( { "Log.shouldIndex": 1, "Log.date": 1 }, { sparse: true } )
This index should contain entries only for your first and last elements.
Alternatively, you may store the date fields of the first and last elements in a separate array.
For more info on sparse indexes please refer to this article.
Hope it helps!
So there was an answer about indexing that is fundamentally correct. As of writing though it seems a little unclear whether you are talking about indexing at all. It almost seems like what you want to do is get the first and last date from the elements in your array.
With that in mind there are a few approaches:
1. The elements in your array were naturally inserted in increasing date order
If all writes to this field are made with the $push operator over a period of time, and you never update these items (at least never change a date), then your items are already in date order.
What this means is you just get the first and last element from the array
db.collection.find({ _id: id },{ Log: {$slice: 1 }}); // gets the first element
db.collection.find({ _id: id },{ Log: {$slice: -1 }}); // gets the last element
Now of course that is two queries but it's a relatively simple operation and not costly.
2. For some reason your elements are not naturally ordered by date
If this is the case, or indeed if you just can't live with the two-query form, then you can get the first and last values in aggregation using the $min and $max operators:
db.collection.aggregate([
  // You might want to match first. Just doing one _id here. (commented)
  //{"$match": { "_id": id }},
  // Unwind the array
  {"$unwind": "$Log" },
  // Group on the document _id, taking the min and max dates
  {"$group": {
    "_id": "$_id",
    "firstDate": {"$min": "$Log.date" },
    "lastDate": {"$max": "$Log.date" }
  }}
])
So finally, if your use case here is getting the details of the documents that have the first and last date, we can do that as well, mirroring the initial two query form, somewhat. Using $first and $last :
db.collection.aggregate([
  // You might want to match first. Just doing one _id here. (commented)
  //{"$match": { "_id": id }},
  // Unwind the array
  {"$unwind": "$Log" },
  // Sort the results on the date
  {"$sort": { "_id": 1, "Log.date": 1 }},
  // Group using $first and $last
  {"$group": {
    "_id": "$_id",
    "firstLog": {"$first": "$Log" },
    "lastLog": {"$last": "$Log" }
  }}
])
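The $sort plus $first/$last combination corresponds to this plain-JavaScript sketch over one document's Log array (sample dates invented for illustration):

```javascript
// Sort one document's Log entries by date, then take the first and last,
// mirroring $unwind + $sort {Log.date: 1} + $group with $first/$last.
const doc = {
  _id: 1,
  Log: [
    { lat: 3, lng: 4, date: 20 },
    { lat: 1, lng: 2, date: 10 },
    { lat: 7, lng: 8, date: 30 },
  ],
};

const sorted = [...doc.Log].sort((a, b) => a.date - b.date);
const firstLog = sorted[0];                // $first
const lastLog = sorted[sorted.length - 1]; // $last
```

Unlike the $min/$max version, this keeps the whole first and last entries, not just their dates.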
Your mileage may vary, but those approaches may obviate the need to index if this would indeed be the only usage for that index.

mongodb/meteor: how to I get the value of one field corresponding to the $max value of another field?

I have a collection of messages with the following fields: _id, senderId, receiverId, dateSubmittedMs, message. For a given user, I want to return the latest message to him from each of the other users. So, for example, if there are users Alex, Barb, Chuck, Dora, I would like to return the most recent message between Alex and each of Barb, Chuck and Dora. What is the best way to do this? Can I do it in one step using aggregation?
The aggregation examples in the official online documentation (http://docs.mongodb.org/manual/reference/aggregation/min/) show how to find the lowest age over groups within a collection, but what I need is something analogous to finding the name of the youngest person over groups of people.
Here is my current approach:
Step 1: Find the highest value for dateSubmitted over all messages sent and received by Alex, grouping over the other users:
var M = Messages.aggregate(
  {$match:
    {$or: [{senderId: 'Alex'}, {receiverId: 'Alex'}]}
  },
  {$group: {_id: "$receiverId", lastestSubmitted: {$max: "$submitted"} }}
).fetch();
Step 2: Create an array of these highest values of dateSubmitted:
var MIds = _.pluck(M,'lastestSubmitted');
Step 3: Find these messages, by senderId, receiverId, and latestSubmitted:
return Messages.find(
  {submitted: {$in: MIds}, $or: [{senderId: 'Alex'}, {receiverId: 'Alex'}]},
  {sort: {submitted: 1}}
);
There are two problems with this:
Can it be done in one step instead of three? Perhaps through a mapReduce or Aggregate command?
Instead of grouping only over receiverId: 'Alex', is there a way to group over something like $or [{receiverId: 'Alex', senderId: 'Barb'}, {senderId: 'Alex', receiverId: 'Barb'}] (but for EACH of the other users)? This would allow me to get the latest message in a conversation between any two participants that Alex conversed with.
Any suggestions?
The only thing you have to change is the group _id in the grouping phase: you can use a document there, not just a single field. You can also apply the sort in the same pipeline. However, nothing changes if you group only on the receiverId.
var M = Messages.aggregate(
  {$match:
    {$or: [{senderId: 'Alex'}, {receiverId: 'Alex'}]}
  },
  {$group:
    {_id: {
      receiverId: "$receiverId",
      senderId: "$senderId"},
    lastestSubmitted: {$max: "$submitted"} }
  },
  {$sort: {lastestSubmitted: -1}},
  {$limit: 1}
).fetch();
The example above also lets you check for the second or third most recently pinged connection by raising $limit. If you only want the single most recent message, you do not even need to group; just run:
var M = Messages.aggregate(
  {$match:
    {$or: [{senderId: 'Alex'}, {receiverId: 'Alex'}]}
  },
  {$sort: {submitted: -1}},
  {$limit: 1}
).fetch();
The answer above gets the most recent message between a single user and all others. To get the most recent message between all pairs involving a user, read on.
Based on the comments, I misinterpreted the question a bit, but the part above is useful anyway. The correct solution to the problem is below. The difficulty is getting a key for the pair to group on that also identifies a switched pair (bob -> tom == tom -> bob). You can use $cond with an ordering comparison to normalise the swaps. (It is certainly a much more difficult question.) The code looks like this:
var M = Messages.aggregate(
  {$match:
    {$or: [{senderId: 'Alex'}, {receiverId: 'Alex'}]}
  },
  {$project:
    {'part1': {$cond: [{$gt: ['$senderId', '$receiverId']}, '$senderId', '$receiverId']},
     'part2': {$cond: [{$gt: ['$senderId', '$receiverId']}, '$receiverId', '$senderId']},
     'message': 1,
     'submitted': 1
    }
  },
  {$sort: {submitted: -1}},
  {$group:
    {_id: {
      part1: "$part1",
      part2: "$part2"},
    lastestSubmitted: {$first: "$submitted"},
    message: {$first: "$message"} }
  }
).fetch();
If you are not familiar with some of the operators used above, like $cond or $first, check the aggregation operator reference in the MongoDB documentation.
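The trick with the two $cond projections is just canonicalising the (sender, receiver) pair so that bob -> tom and tom -> bob land in the same group. The same idea in plain JavaScript, over invented sample messages:

```javascript
// Canonical pair key: order the two ids so A->B and B->A collide,
// then keep only the newest message per pair (mirrors $sort + $first).
const messages = [
  { senderId: "Alex", receiverId: "Barb", submitted: 10, message: "hi" },
  { senderId: "Barb", receiverId: "Alex", submitted: 20, message: "hello" },
  { senderId: "Alex", receiverId: "Chuck", submitted: 15, message: "yo" },
];

const latestByPair = {};
for (const m of messages) {
  const key = [m.senderId, m.receiverId].sort().join("|"); // the $cond trick
  if (!latestByPair[key] || m.submitted > latestByPair[key].submitted) {
    latestByPair[key] = m;
  }
}
```

Both directions of the Alex/Barb conversation collapse into one key, and the newer "hello" wins.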

mongoDB - average on array values

I'm trying to compute an average over each position of an array, across all the documents in my collection.
Document structure
{
  myVar: myValue,
  [...]
  myCoordinates: [
    myLng,
    myLat
  ]
}
So, I tried to compute the average of the myLng and myLat values of the myCoordinates array over the whole collection by querying it like this:
myColl.aggregate([{
  $group: {
    _id: 0,
    lngAvg: { $avg: "$myCoordinates.0" },
    latAvg: { $avg: "$myCoordinates.1" }
  }
}])
But unfortunately, it doesn't work and returns 0 for both the lngAvg and latAvg fields.
Have you some ideas? Is this feasible at least?
Positional notation in aggregation seems to still be unsupported, check out this ticket.
As @Sammaye says, you'd need to either unwind the array first, or replace your coordinates array with an embedded lng/lat document, which would make this trivial.
Given the array structure, you might unwind and project the lat/lng like this:
myColl.aggregate([
  // unwind the coordinates into separate docs
  {$unwind: "$myCoordinates"},
  // group back into single docs, projecting the first and last
  // coordinates as lng and lat, respectively
  {$group: {
    _id: "$_id",
    lng: {$first: "$myCoordinates"},
    lat: {$last: "$myCoordinates"}
  }},
  // then group as normal for the averaging
  {$group: {
    _id: 0,
    lngAvg: {$avg: "$lng"},
    latAvg: {$avg: "$lat"}
  }}
]);
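As an aside, on MongoDB 3.2+ you could likely avoid the $unwind entirely with $arrayElemAt, e.g. lngAvg: {$avg: {$arrayElemAt: ["$myCoordinates", 0]}}. The arithmetic itself is just a per-position mean, as this plain-JavaScript sketch over invented coordinates shows:

```javascript
// Average longitude and latitude over all documents' [lng, lat] pairs,
// mirroring the final $group with $avg in the pipeline above.
const docs = [
  { myCoordinates: [2.0, 48.0] },
  { myCoordinates: [4.0, 50.0] },
];

const lngAvg = docs.reduce((s, d) => s + d.myCoordinates[0], 0) / docs.length;
const latAvg = docs.reduce((s, d) => s + d.myCoordinates[1], 0) / docs.length;
```

For these two documents the averages are lng 3.0 and lat 49.0.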