MongoDB Social Network Adding Followers

I'm implementing a social network in MongoDB and I need to keep track of Followers and Following for each User. When I search for Users I want to display a list like Facebook with the User Name, Picture and number of Followers & Following. If I just wanted to display the User Name and Picture (info that doesn't change) it would be easy, but I also need to display the number of Followers & Following (which changes fairly regularly).
My current strategy is to embed the People a User follows into each User Document:
firstName: "Joe",
lastName: "Bloggs",
follows: [
{
_id: ObjectId("520534b81c9aac710d000002"),
profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
name: "Mark Rogers",
},
{
_id: ObjectId("51f26293a5c5ea4331cb786a"),
name: "The Palace Bar",
profilePictureUrl: "https://s3-eu-west-1.amazonaws.com/businesses/xxx.jpg",
}
]
The question is - What is the best strategy to keep track of the number of Followers & Following for each User?
If I include the number of Follows / Following as part of the embedded document i.e.
follows: [
{
_id: ObjectId("520534b81c9aac710d000002"),
profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
name: "Mark Rogers",
followers: 10,
following: 400
}
]
then every time a User follows someone, multiple updates are required across all the embedded documents.
Since the consistency of this data isn't really important (i.e. Showing someone I have 10 instead of 11 followers isn't the end of the world), I can queue this update. Is this approach ok or can anyone suggest a better approach ?

You're on the right track. Think about which operation is performed more often: determining the number of followers/following, or changing it? Even if you cache the output of the followers/following calculation, it will still be read one or two orders of magnitude more often than the number changes.
Also, think about the opposite. If you really need to display the number of followers/following for each of those users, you'll have to then do an aggregate on each load (or cache it somewhere, but you're still doing a lot of calcs).
Option 1: Cache the number of followers/following in the embedded document.
Upsides: Can display stats in O(1) time
Downsides: Requires O(N) time to follow/unfollow
Option 2: Count the number of followers/following on each page view (or cache invalidation)
Upsides: Can follow/unfollow in O(1) time
Downsides: Requires O(N) time to display
Add in the fact that follower/following stats can be eventually consistent whereas the counts have to be displayed on demand and I think it's a pretty easy decision to cache it.
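To make the trade-off concrete, here is a minimal in-memory sketch in plain JavaScript (no MongoDB involved). The `users` map and the helper names `refreshEmbeddedCopies`, `follow` and `countFollowers` are invented for illustration; they only mimic the shape of the documents in the question.

```javascript
// Hypothetical in-memory stand-in for the users collection.
const users = new Map();
for (const id of ['joe', 'mark', 'ann']) {
  users.set(id, { _id: id, followers: 0, following: 0, follows: [] });
}

// Option 1: refresh every embedded copy of a user's counts (O(N) work per follow).
function refreshEmbeddedCopies(userId) {
  const { followers, following } = users.get(userId);
  for (const user of users.values()) {
    for (const entry of user.follows) {
      if (entry._id === userId) {
        entry.followers = followers;
        entry.following = following;
      }
    }
  }
}

function follow(followerId, followeeId) {
  const follower = users.get(followerId);
  const followee = users.get(followeeId);
  if (follower.follows.some((e) => e._id === followeeId)) return; // already following
  follower.following += 1;
  followee.followers += 1;
  follower.follows.push({
    _id: followeeId,
    followers: followee.followers,
    following: followee.following,
  });
  refreshEmbeddedCopies(followerId);
  refreshEmbeddedCopies(followeeId);
}

// Option 2: no cached counts; scan all users on every read (O(N) work per display).
function countFollowers(userId) {
  let count = 0;
  for (const user of users.values()) {
    if (user.follows.some((e) => e._id === userId)) count += 1;
  }
  return count;
}

follow('joe', 'mark');
follow('ann', 'mark');
follow('mark', 'joe');
```

With Option 1 the embedded entry for Mark inside Joe's document can be rendered directly, while Option 2 has to call `countFollowers` for every row it displays.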

I've gone ahead and implemented the followers/following update based on the strategy recommended by Mason (Option 1). Here's my code in NodeJs and Mongoose, using the AsyncJs Waterfall pattern, in case anyone is interested or has any opinions. I haven't implemented queuing yet, but the plan is to farm most of this off to a queue.
async.waterfall([
function (callback) {
/** find & update the person we are following */
Model.User
.findByIdAndUpdate(id,{$inc:{followers:1}},{upsert:true,select:{fullName:1,profilePictureUrl:1,address:1,following:1,followers:1}})
.lean()
.exec(callback);
},
function (followee, callback) {
/** find & update the person doing the following */
var query = {
$inc:{following:1},
$addToSet: { follows: followee}
}
Model.User
.findByIdAndUpdate(credentials.username,query,{upsert:true,select:{fullName:1,profilePictureUrl:1,address:1,following:1,followers:1}})
.lean()
.exec(function(err,follower){
callback(err,follower,followee);
});
},
function(follower,followee,callback){
/** update the following count in every embedded copy of the follower (lean() results expose _id, not the id virtual) */
Model.User
.update({'follows._id':follower._id},{'follows.$.following':follower.following},{upsert:true,multi:true},function(err){
callback(err,followee);
});
},
function(followee,callback){
/** update the followers count in every embedded copy of the followee (lean() results expose _id, not the id virtual) */
Model.User
.update({'follows._id':followee._id},{'follows.$.followers':followee.followers},{upsert:true,multi:true},callback);
}
], function (err) {
if (err)
next(err);
else {
res.send(HTTPStatus.OK);
next();
}
});
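The queuing step mentioned above is not implemented in the code, so the following is only a rough sketch of the idea: do the cheap counter update in the request, and defer the fan-out to embedded copies. The in-memory `pendingJobs` array and the `enqueue`/`drain` helpers are hypothetical stand-ins; a real deployment would presumably use a durable queue and Mongoose updates instead of plain objects.

```javascript
// Hypothetical in-memory job queue (an assumption, not part of the original code).
const pendingJobs = [];

function enqueue(job) {
  pendingJobs.push(job);
}

// Run queued jobs in order; returns how many were processed.
function drain() {
  let processed = 0;
  while (pendingJobs.length > 0) {
    const job = pendingJobs.shift();
    job();
    processed += 1;
  }
  return processed;
}

// Stand-in data: Ann's document holds an embedded copy of Mark.
const mark = { _id: 'mark', followers: 10 };
const ann = { _id: 'ann', follows: [{ _id: 'mark', followers: 10 }] };

// The request handler updates the followee's own counter right away...
mark.followers += 1;
// ...and defers the fan-out to the embedded copies.
enqueue(() => {
  for (const entry of ann.follows) {
    if (entry._id === mark._id) entry.followers = mark.followers;
  }
});

const staleCount = ann.follows[0].followers; // embedded copy lags until the queue runs
const jobsRun = drain();                     // embedded copy catches up here
```

Between `enqueue` and `drain` the embedded count is stale, which is the eventual consistency the question says is acceptable.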

Related

MongoDB query: If two docs are referencing each other, eliminate one doc (Keep one combination only)

I have docs like these:
{
_id: ObjectId("61af43169dae3a9c3e133a90"),
name: "user1",
status: "RECOMMENDED",
recommendedId: ObjectId("61b708b8041895f4c68a3b3d")
}
{
_id: ObjectId("61b708b8041895f4c68a3b3d"),
name: "user2",
status: "RECOMMENDED",
recommendedId: ObjectId("61af43169dae3a9c3e133a90")
}
Both users are recommended to each other, so I don't want both documents to have recommendedId populated. I want only one of the two documents to have recommendedId populated (keep one combination only).
I would try to prevent this from happening at the time of setting the value of recommendedId in the first place.
So before trying to set the value, you could do something like this:
const idToRecommend = Types.ObjectId() // placeholder: use the _id of the user you are about to recommend
const recommenders = await Foo.find({
_id: idToRecommend,
recommendedId: user._id
})
if (recommenders.length > 0) {
// We don't want to make the change, we already have a relationship recorded.
}
Cleaning up a db already tainted with these duplicate relationships is a different question, but I would do that as a one-off task rather than a matter of process.
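For that one-off cleanup, one possible approach is to load the documents, detect reciprocal pairs, and clear recommendedId on one side of each pair. The sketch below works on plain objects with string ids; the function name `findRedundantRecommendations`, the `user3` sample document, and the tie-break rule (the larger _id loses its link) are all assumptions made for illustration.

```javascript
// Returns the _ids whose recommendedId should be cleared so that each
// reciprocal pair keeps exactly one direction.
function findRedundantRecommendations(docs) {
  const recommendedById = new Map(
    docs.map((doc) => [
      String(doc._id),
      doc.recommendedId ? String(doc.recommendedId) : null,
    ])
  );
  const toClear = [];
  for (const [id, recommendedId] of recommendedById) {
    const isReciprocal =
      recommendedId !== null && recommendedById.get(recommendedId) === id;
    // Arbitrary tie-break (an assumption): the larger _id of the pair loses its link.
    if (isReciprocal && id > recommendedId) toClear.push(id);
  }
  return toClear;
}

const docs = [
  { _id: '61af43169dae3a9c3e133a90', name: 'user1', recommendedId: '61b708b8041895f4c68a3b3d' },
  { _id: '61b708b8041895f4c68a3b3d', name: 'user2', recommendedId: '61af43169dae3a9c3e133a90' },
  // Made-up third document: a one-way recommendation that must be left alone.
  { _id: '61c000000000000000000001', name: 'user3', recommendedId: '61af43169dae3a9c3e133a90' },
];
const redundant = findRedundantRecommendations(docs);
```

The returned ids could then be passed to a single update that unsets recommendedId on those documents.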

Best way to store and organize data in MongoDB

I have users in MongoDB, and each user has an interface allowing them to set their current state of hunger as a combination of "hungry", "not hungry", "famished", "starving", or "full".
Each user can enter multiple options for any period of time. For example, one use case would be "in the morning, record how my hunger is", and the user can put "not hungry" and "full". They can record how their hunger is at any time in the day, and as many times as they want.
Should I store the data as single entries, and then group the data by a date in MongoDB later on when I need to show it in a UI? Or should I store the data as an array of the options the user selected along with a date?
It depends on your future queries, and you may want to do both. Disk space is cheaper than processing, so it's better to double your disk usage than to double your queries.
If you're only going to map by date then you'll want to group all users/states by date. If you're only going to map by user then you'll want to group all dates/states by user. If you're going to query by both, you should just make two Collections to minimize processing. Definitely use an array for the hunger state in either case.
Example structure for date grouping:
{ date: '1494288000',
'time-of-day': [
{ am: [
{ user: 'asdfas', 'hunger-state': ['hungry', 'full'] },
{ user: 'juhags', 'hunger-state': ['full'] }
],
pm: [
{ user: 'asdfas', 'hunger-state': ['hungry', 'full'] },
{ user: 'juhags', 'hunger-state': ['full'] }
]}]}
It depends on how you are going to access it. If you want to report on a user's last known state, then the array might be better:
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
hunger: ['HUNGRY','FAMISHED'],
}
The timestamps of multiple records might not align perfectly if you are passing in the output from new Date() (note the second record is 99 ms later):
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
hunger: 'HUNGRY',
}
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.099Z',
hunger: 'FAMISHED',
}
You should probably look at your data model though and try to get a more deterministic state model. Maybe:
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
isHungry: true,
hunger: 'FAMISHED',
}
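If you do store single entries and group them when the UI needs them, as the question suggests, the grouping step could look roughly like this. It is a sketch in plain JavaScript over already-fetched records; `groupByUserAndDay` is a hypothetical helper, days are cut on UTC boundaries (an assumption), and the third sample entry is made up.

```javascript
// Group single hunger entries by user and calendar day (UTC).
function groupByUserAndDay(entries) {
  const groups = new Map();
  for (const entry of entries) {
    const day = new Date(entry.timestamp).toISOString().slice(0, 10);
    const key = `${entry.user_id}|${day}`;
    if (!groups.has(key)) {
      groups.set(key, { user_id: entry.user_id, day, hunger: [] });
    }
    const group = groups.get(key);
    // Keep each state once per day, in the order it was first recorded.
    if (!group.hunger.includes(entry.hunger)) group.hunger.push(entry.hunger);
  }
  return [...groups.values()];
}

const entries = [
  { user_id: '5358e4249611f4a65e3068ab', timestamp: '2017-05-08T17:30:00.000Z', hunger: 'HUNGRY' },
  { user_id: '5358e4249611f4a65e3068ab', timestamp: '2017-05-08T17:30:00.099Z', hunger: 'FAMISHED' },
  // Made-up entry on the following day.
  { user_id: '5358e4249611f4a65e3068ab', timestamp: '2017-05-09T08:00:00.000Z', hunger: 'FULL' },
];
const grouped = groupByUserAndDay(entries);
```

Grouping by day sidesteps the 99 ms misalignment, since both records from the same evening land in the same bucket.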

Storing functions in MongoDB in different collections?

Say that I have a business that represents users who spend a certain amount of time to produce certain quantities of stuff. I want each user to be free to create their own algorithm, or formula, for determining the price that they charge for their work:
Users Collection, with possibly thousands of different users.
{
userId: 'sdf23d23dwew',
price: function(time, qty){
// some algorithm
}
},
{
userId: '23f5gf34f',
price: function(time, qty){
// another algorithm
}
},
{
userId: '7u76565',
price: function(time, qty){
// yet another algorithm
}
},
{
userId: 'w45y65yh4',
price: function(time, qty){
// something else
}
}
//and on and on and on...
Now, JSON doesn't support functions and neither does MongoDB. BUT this use-case of possibly thousands of users, each with the freedom to create their own unique method of determining their own prices, seems to me like being able to store functions inside of their user document would be ideal.
I certainly don't feel like it's a good idea to just store all these thousands of functions in a JS file on the server that somehow gets referenced by a userId when it's needed...
Is there a solution for this case?

Performance on denormalized field vs. use of separate collection

I'm looking at keeping track of scores/points per user. From examples I've seen, it seems normal to just keep track of a total count of something in a field. However, I'm concerned about being able to backtrack or keep track of the scores/points given to the user in case of cheating. Here's what I've got in mind:
Meteor.User Collection:
Meteor.user.profile: {
...
totalScore: 0
...
}
Scenario 1:
Just add total score and keep track of it per user:
updateScore() {
var currentUser = Meteor.user();
Meteor.users.update({ _id: currentUser._id }, { $inc: { 'profile.totalScore': 1 } });
}
Scenario 2:
Put score into separate Collection first to log it, before adding to total score of user:
Scores Collection:
Scores: {
playerId: ,
score: ,
...
}
updateScore() {
var currentUser = Meteor.user();
Scores.insert({ playerId: currentUser._id, score: 1, ...});
Meteor.users.update({ _id: currentUser._id }, { $inc: { 'profile.totalScore': 1 } });
// if not the above, then this:
// recalculate the total from the current user's Scores documents
var currentUserScore = Scores.find({ playerId: currentUser._id }).fetch()
.reduce(function (sum, entry) { return sum + entry.score; }, 0);
Meteor.users.update({ _id: currentUser._id }, { $set: { 'profile.totalScore': currentUserScore } });
}
So what I'd like to know is, does Scenario 2 make sense vs. Scenario 1? And if Scenario 2 makes sense, if I calculate for the total score via the variable currentUserScore then use that to update the user's totalScore profile field (this runs every time the score needs to be updated), will this be detrimental to the app's performance?
Based on our discussion, Scenario 2 makes the most sense to me, especially given that the score history may have value outside of auditing the total. Keep in mind it's always easier to remove data than it is to create it, so even if the history doesn't prove useful there is no harm in removing the collection sometime later.
I would implement an addScore method like this:
Meteor.methods({
addScore: function(score) {
check(score, Number);
Meteor.users.update(this.userId, {
$inc: {'profile.totalScore': score}
});
Scores.insert({
playerId: this.userId,
score: score,
createdAt: new Date()
});
}
});
Unless you can think of a compelling reason to do so, I suspect the db/computation overhead of aggregating the totalScore each time would not be worthwhile. Doing so only fixes the case where a user cheated by updating her profile directly. You can solve that by adding the following:
Meteor.users.deny({
update: function() {
return true;
}
});
I'd recommend adding the above regardless of the solution you go with as user profiles can be updated directly by the user even if insecure has been removed. See this section of the docs for more details.
Finally, if you want to audit the totalScore for each user you can aggregate the totals as part of a nightly process rather than each time a new score is added. You can do this on the server by fetching the Scores documents, or directly in mongodb with aggregation. Note the latter would require you to use a process outside of meteor (my understanding is that the aggregation packages for meteor don't currently work but you may want to research that yourself).
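As a rough illustration of that nightly audit, the comparison itself can be done in plain JavaScript once the user and Scores documents have been fetched. `auditTotals` is a hypothetical helper and the sample documents are made up; only the field names (`profile.totalScore`, `playerId`, `score`) come from the code above.

```javascript
// Compare each user's cached profile.totalScore with the sum of logged scores
// and return only the users whose totals disagree.
function auditTotals(users, scores) {
  const sums = new Map();
  for (const { playerId, score } of scores) {
    sums.set(playerId, (sums.get(playerId) || 0) + score);
  }
  return users
    .map((user) => ({
      userId: user._id,
      cached: user.profile.totalScore,
      logged: sums.get(user._id) || 0,
    }))
    .filter((row) => row.cached !== row.logged);
}

// Made-up sample data.
const sampleUsers = [
  { _id: 'u1', profile: { totalScore: 3 } },
  { _id: 'u2', profile: { totalScore: 50 } }, // does not match the log
];
const sampleScores = [
  { playerId: 'u1', score: 1 },
  { playerId: 'u1', score: 2 },
  { playerId: 'u2', score: 5 },
];
const mismatches = auditTotals(sampleUsers, sampleScores);
```

Any row returned here is a candidate for review or for resetting totalScore from the log.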

MongoDB - get 1 last message from each conversation?

I have a collection for conversations:
{_id: ..., from: userA, to: userB, message: "Hello!", datetime: ...}
I want to show a preview of a user's conversations: the last message from each conversation between the current user and any other user. When the user clicks on a "last message", they go to the next page, which shows all messages between them and that user.
How do I do that (get 1 last message from each conversation) without Map/Reduce?
1) use "distinct" command? (how?)
2) set "last" flag for last message? I think it's not very safe...
3) ..?
I was writing up a complicated answer to this question using cursors and a lot of advanced query features and stuff... it was painful and confusing. Then I realized, this is painful because it's not how mongodb expects you to do things really.
What I think you should do is just denormalize the data and solve this problem in one shot easily. Here's how:
Put a hash/object field on your User called most_recent_conversations
When you make a new conversation with another user, update it so that it looks like this:
previewUser.most_recent_conversations[userConversedWith._id] = newestConversation._id
Every time you make a new conversation, simply smash the value for the users involved in their hashes with the newer conversation id. Now we have a { uid: conversationId, ... } structure which basically is the preview data we need.
Now you can look up the most recent conversation (or N conversations if you make each value of the hash an array!) simply:
var previewUser = db.users.findOne({ _id: someId });
var recentIds = [];
for (var uid in previewUser.most_recent_conversations) {
recentIds.push( previewUser.most_recent_conversations[uid] );
}
var recentConversations = db.conversations.find({
_id: { $in: recentIds }
});
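The write side of this scheme, overwriting the hash entry for both participants whenever a message is created, might look like the sketch below. It uses plain objects instead of a database; `recordLatest`, `previewIds`, `usersById` and the message ids are all hypothetical names chosen for the example.

```javascript
// Record a new message as the most recent one for both participants.
function recordLatest(usersById, message) {
  const { _id, from, to } = message;
  usersById[from].most_recent_conversations[to] = _id;
  usersById[to].most_recent_conversations[from] = _id;
}

// Collect the message ids needed to render one user's preview list.
function previewIds(user) {
  return Object.values(user.most_recent_conversations);
}

// Made-up sample data following the { from, to, message } shape of the question.
const usersById = {
  userA: { most_recent_conversations: {} },
  userB: { most_recent_conversations: {} },
  userC: { most_recent_conversations: {} },
};

recordLatest(usersById, { _id: 'm1', from: 'userA', to: 'userB', message: 'Hello!' });
recordLatest(usersById, { _id: 'm2', from: 'userB', to: 'userA', message: 'Hi!' });
recordLatest(usersById, { _id: 'm3', from: 'userC', to: 'userA', message: 'Hey' });

const idsForA = previewIds(usersById.userA);
```

Against a real collection, the same overwrite would likely be a `$set` on a dotted path such as `most_recent_conversations.<otherUserId>`, one update per participant.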