Best way to store and organize data in MongoDB - mongodb

I have users in MongoDB, and each user has an interface allowing them to set their current state of hunger as a combination of "hungry", "not hungry", "famished", "starving", or "full".
Each user can enter multiple options for any period of time. For example, one use case would be "in the morning, record how my hunger is", and the user can put "not hungry" and "full". They can record how their hunger is at any time of the day, and as many times as they want.
Should I store the data as single entries, and then group the data by a date in MongoDB later on when I need to show it in a UI? Or should I store the data as an array of the options the user selected along with a date?

It depends on your future queries, and you may want to do both. Disk space is cheaper than processing, and it's always better to double your disk space than to double your queries.
If you're only going to map by date then you'll want to group all users/states by date. If you're only going to map by user then you'll want to group all dates/states by user. If you're going to query by both, you should just make two Collections to minimize processing. Definitely use an array for the hunger state in either case.
Example structure for date grouping:
{ date: '1494288000',
'time-of-day': [
{ am: [
{ user: 'asdfas', 'hunger-state': ['hungry', 'full'] },
{ user: 'juhags', 'hunger-state': ['full'] }
],
pm: [
{ user: 'asdfas', 'hunger-state': ['hungry', 'full'] },
{ user: 'juhags', 'hunger-state': ['full'] }
]}]}
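As a rough illustration of keeping both groupings, here is a minimal sketch in plain JavaScript that builds date-grouped and user-grouped documents from the same flat entries. The sample data, the `groupEntries` helper, and the camelCase field names (`timeOfDay`, `hungerState`) are assumptions made for the example, and the shape is slightly simplified compared with the structure above.

```javascript
// Flat "single entry" records, one per submission (hypothetical sample data).
const entries = [
  { user: 'asdfas', date: '2017-05-09', period: 'am', hungerState: ['hungry', 'full'] },
  { user: 'juhags', date: '2017-05-09', period: 'am', hungerState: ['full'] },
  { user: 'asdfas', date: '2017-05-09', period: 'pm', hungerState: ['famished'] },
];

// Group entries under the chosen key ('date' or 'user'), then by time of day.
function groupEntries(records, key) {
  const other = key === 'date' ? 'user' : 'date';
  const grouped = {};
  for (const r of records) {
    if (!grouped[r[key]]) {
      grouped[r[key]] = { [key]: r[key], timeOfDay: { am: [], pm: [] } };
    }
    grouped[r[key]].timeOfDay[r.period].push({
      [other]: r[other],
      hungerState: r.hungerState,
    });
  }
  return Object.values(grouped);
}

const byDate = groupEntries(entries, 'date'); // one document per date
const byUser = groupEntries(entries, 'user'); // one document per user
```

Each array could then be written to its own collection, so that a query by date and a query by user each read a single pre-grouped document.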

It depends on how you are going to access it. If you want to report on a user's last known state, then the array might be better:
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
hunger: ['HUNGRY','FAMISHED'],
}
The timestamps of multiple records might not align perfectly if you are passing in the output from new Date() (note the second record is 99 ms later):
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
hunger: 'HUNGRY',
}
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.099Z',
hunger: 'FAMISHED',
}
You should probably look at your data model though and try to get a more deterministic state model. Maybe:
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
isHungry: true,
hunger: 'FAMISHED',
}
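One way to enforce such a model is to build every record through a single function. The sketch below is only an assumption about how that could look: the `makeHungerRecord` helper, the list of allowed levels, and the rule for which levels count as "hungry" are all invented for illustration.

```javascript
// Assumed set of allowed levels, based on the options listed in the question.
const HUNGER_LEVELS = ['NOT_HUNGRY', 'FULL', 'HUNGRY', 'FAMISHED', 'STARVING'];
// Assumption: these levels count as "hungry"; adjust to your own definition.
const HUNGRY_LEVELS = new Set(['HUNGRY', 'FAMISHED', 'STARVING']);

// Build one record with a single hunger level and a derived isHungry flag.
function makeHungerRecord(userId, hunger, when = new Date()) {
  if (!HUNGER_LEVELS.includes(hunger)) {
    throw new RangeError('Unknown hunger level: ' + hunger);
  }
  return {
    user_id: userId,
    timestamp: when.toISOString(),
    isHungry: HUNGRY_LEVELS.has(hunger),
    hunger: hunger,
  };
}

const record = makeHungerRecord(
  '5358e4249611f4a65e3068ab',
  'FAMISHED',
  new Date('2017-05-08T17:30:00.000Z')
);
```

Because `isHungry` is derived from `hunger`, the two fields cannot contradict each other, which rules out combinations such as "not hungry" together with "famished".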

Related

Matching a master collection to a transactional status collection

My issue is rather specific so I'll try to explain my setup first.
I have a collection called clients, which is a master list of all clients. The model for it is:
{
id: String,
organizationId: Number,
networkId: String,
deviceSerial: String,
}
(irrelevant properties removed)
I also have a collection called clienttransactions, which is a list of when clients have gone online or offline. So each time a client comes online, it adds a record saying it came online (online: true), and vice-versa for when a client goes offline (online: false). The model for that looks like this:
{
clientId: String,
deviceSerial: String,
networkId: String,
organizationId: Number,
ts: Number,
online: Boolean
}
ts is a unix timestamp in seconds. Also if you're wondering why I need all those foreign keys on each record, it's because of the way the API where I get this data from works. So just ignore that.
issue:
Given a deviceSerial, networkId, and organizationId, I want to find all clients that were online at any point between a given time frame (given a start time and end time in epoch seconds).
Possible edge case: There could be times when a client came online before the given start time, and stayed online until after the given end time. In this case, there will be no transaction record within the time frame, but the client should still be seen as online.
Accounting for this case is what I'm having the most trouble with, since I can't simply just search for online transactions between the time frame. If there are no transactions for a client in the time frame, then I need to search outside the time frame to see if the last transaction made before the start time for that client was an online one.
I'm not super well-versed on the aggregation pipeline yet, so this is as far as I got:
const startTime = 1550599341;
const endTime = 1550601742;
ClientTransaction.aggregate([
{
$match: {
organizationId: 600381,
networkId: 'N_651896046061895525',
deviceSerial: 'Q2MN-3CUN-6GQM',
ts: {$lt: endTime}
}
},
{
$group: {
_id: '$clientId',
lastStatus: {
$max: '$ts'
},
online: {
$last: '$online'
}
}
}
]);
I think I'm halfway there with this. It finds all transactions for unique clients before the end time, but stops short of checking whether the client was actually online during the time frame.
You are looking for all clients whose latest activity up to the end time is either an online event that happened before the start time, or any online/offline event between the start and end times.
So something like this should work:
ClientTransaction.aggregate([
{ $match: {
organizationId: 600381,
networkId: 'N_651896046061895525',
deviceSerial: 'Q2MN-3CUN-6GQM',
ts: {$lte: endTime}
}
},
{ $sort:{"clientId":1, "ts":-1 } },
{ $group: {
_id: '$clientId',
latest: {
$first: '$$ROOT'
}
}},
{ $match:{
$or:[
{"latest.online":true,"latest.ts":{$lt:startTime}},
{"latest.ts":{$gte:startTime, $lte:endTime}}
]
}}
]);

Storing functions in MongoDB in different collections?

Say that I have a business that represents users who spend a certain amount of time to produce certain quantities of stuff. I want each user to be free to create their own algorithm, or formula, for determining the price that they charge for their work:
Users Collection, with possibly thousands of different users.
{
userId: 'sdf23d23dwew',
price: function(time, qty){
// some algorithm
}
},
{
userId: '23f5gf34f',
price: function(time, qty){
// another algorithm
}
},
{
userId: '7u76565',
price: function(time, qty){
// yet another algorithm
}
},
{
userId: 'w45y65yh4',
price: function(time, qty){
// something else
}
}
//and on and on and on...
Now, JSON doesn't support functions and neither does MongoDB. BUT this use-case of possibly thousands of users, each with the freedom to create their own unique method of determining their own prices, seems to me like being able to store functions inside of their user document would be ideal.
I certainly don't feel like it's a good idea to just store all these thousands of functions in a JS file on the server that somehow gets referenced by a userId when it's needed...
Is there a solution for this case?

Performance on denormalized field vs. use of separate collection

I'm looking at keeping track of scores/points per user. From examples I've seen, it seems normal to just keep track of a total count of something in a field. However, I'm concerned about being able to backtrack or keep track of the scores/points given to the user in case of cheating. Here's what I've got in mind:
Meteor.User Collection:
Meteor.user.profile: {
...
totalScore: 0
...
}
Scenario 1:
Just add total score and keep track of it per user:
updateScore() {
var currentUser = Meteor.user();
Meteor.users.update({ _id: this._id }, { $inc: { 'profile.totalScore': 1 } });
}
Scenario 2:
Put score into separate Collection first to log it, before adding to total score of user:
Scores Collection:
Scores: {
playerId: ,
score: ,
...
}
updateScore() {
var currentUser = Meteor.user();
Scores.insert({ playerId: this._id, score: 1, ...});
Meteor.users.update({ _id: this._id }, { $inc: { 'profile.totalScore': 1 } });
// if not the above, then this: recalculate the total from the Scores collection
var currentUserScore = Scores.find({ playerId: this._id }).fetch().reduce(function (sum, doc) { return sum + doc.score; }, 0);
Meteor.users.update({ _id: this._id }, { $set: { 'profile.totalScore': currentUserScore } });
}
So what I'd like to know is, does Scenario 2 make sense vs. Scenario 1? And if Scenario 2 makes sense, if I calculate for the total score via the variable currentUserScore then use that to update the user's totalScore profile field (this runs every time the score needs to be updated), will this be detrimental to the app's performance?
Based on our discussion, Scenario 2 makes the most sense to me, especially given that the score history may have value outside of auditing the total. Keep in mind it's always easier to remove data than it is to create it, so even if the history doesn't prove useful there is no harm in removing the collection sometime later.
I would implement an addScore method like this:
Meteor.methods({
addScore: function(score) {
check(score, Number);
Meteor.users.update(this.userId, {
$inc: {'profile.totalScore': score}
});
Scores.insert({
playerId: this.userId,
score: score,
createdAt: new Date()
});
}
});
Unless you can think of a compelling reason to do so, I suspect the db/computation overhead of aggregating the totalScore each time would not be worthwhile. Doing so only fixes the case where a user cheated by updating her profile directly. You can solve that by adding the following:
Meteor.users.deny({
update: function() {
return true;
}
});
I'd recommend adding the above regardless of the solution you go with as user profiles can be updated directly by the user even if insecure has been removed. See this section of the docs for more details.
Finally, if you want to audit the totalScore for each user you can aggregate the totals as part of a nightly process rather than each time a new score is added. You can do this on the server by fetching the Scores documents, or directly in mongodb with aggregation. Note the latter would require you to use a process outside of meteor (my understanding is that the aggregation packages for meteor don't currently work but you may want to research that yourself).
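The nightly audit could be sketched roughly as follows. This is plain JavaScript over arrays of documents, assuming they were fetched on the server beforehand (for example with `Scores.find().fetch()` and `Meteor.users.find().fetch()`); the `auditTotals` function and the sample data are hypothetical.

```javascript
// Sum the score documents per player and report cached totals that disagree.
function auditTotals(scores, users) {
  const sums = new Map();
  for (const s of scores) {
    sums.set(s.playerId, (sums.get(s.playerId) || 0) + s.score);
  }
  return users
    .map((u) => ({
      userId: u._id,
      cached: u.profile.totalScore,
      actual: sums.get(u._id) || 0,
    }))
    .filter((row) => row.cached !== row.actual);
}

// Hypothetical sample data: u2's cached total does not match the score history.
const scoreDocs = [
  { playerId: 'u1', score: 1 },
  { playerId: 'u1', score: 2 },
  { playerId: 'u2', score: 5 },
];
const userDocs = [
  { _id: 'u1', profile: { totalScore: 3 } },
  { _id: 'u2', profile: { totalScore: 50 } },
];

const mismatches = auditTotals(scoreDocs, userDocs);
```

Any rows returned point at users whose `totalScore` was changed without a matching entry in the score history.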

MongoDB Social Network Adding Followers

I'm implementing a social network in MongoDB and I need to keep track of Followers and Following for each User. When I search for Users I want to display a list like Facebook with the User Name, Picture and number of Followers & Following. If I just wanted to display the User Name and Picture (info that doesn't change) it would be easy, but I also need to display the number of Followers & Following (which changes fairly regularly).
My current strategy is to embed the People a User follows into each User Document:
firstName: "Joe",
lastName: "Bloggs",
follows: [
{
_id: ObjectId("520534b81c9aac710d000002"),
profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
name: "Mark Rogers",
},
{
_id: ObjectId("51f26293a5c5ea4331cb786a"),
name: "The Palace Bar",
profilePictureUrl: "https://s3-eu-west-1.amazonaws.com/businesses/xxx.jpg",
}
]
The question is - What is the best strategy to keep track of the number of Followers & Following for each User?
If I include the number of Follows / Following as part of the embedded document i.e.
follows: [
{
_id: ObjectId("520534b81c9aac710d000002"),
profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
name: "Mark Rogers",
followers: 10,
following: 400
}
]
then every time a User follows someone, multiple updates are required across all the embedded documents.
Since the consistency of this data isn't really important (i.e. Showing someone I have 10 instead of 11 followers isn't the end of the world), I can queue this update. Is this approach ok or can anyone suggest a better approach ?
You're on the right track. Think about which calculation is performed more - determining the number of followers/following or changing number of followers/following? Even if you're caching the output of the # of followers/following calculation it's still going to be performed one or two orders of magnitude more often than changing the number.
Also, think about the opposite. If you really need to display the number of followers/following for each of those users, you'll have to then do an aggregate on each load (or cache it somewhere, but you're still doing a lot of calcs).
Option 1: Cache the number of followers/following in the embedded document.
Upsides: Can display stats in O(1) time
Downsides: Requires O(N) time to follow/unfollow
Option 2: Count the number of followers/following on each page view (or cache invalidation)
Upsides: Can follow/unfollow in O(1) time
Downsides: Requires O(N) time to display
Add in the fact that follower/following stats can be eventually consistent whereas the counts have to be displayed on demand and I think it's a pretty easy decision to cache it.
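To make the trade-off of Option 1 concrete, here is a small in-memory sketch in plain JavaScript. It is not MongoDB code; the `follow` function and the `Map` of users are assumptions used only to show where the O(1) counter updates and the O(N) fan-out happen.

```javascript
// users: Map of id -> { followers, following, follows: [{ _id, followers, following }] }
function follow(users, followerId, followeeId) {
  const follower = users.get(followerId);
  const followee = users.get(followeeId);

  // O(1): bump the two cached counters and embed the followee.
  follower.following += 1;
  followee.followers += 1;
  follower.follows.push({
    _id: followeeId,
    followers: followee.followers,
    following: followee.following,
  });

  // O(N) fan-out: refresh every embedded copy of the two users whose counts changed.
  let touched = 0;
  for (const user of users.values()) {
    for (const entry of user.follows) {
      const source =
        entry._id === followerId ? follower :
        entry._id === followeeId ? followee : null;
      if (source) {
        entry.followers = source.followers;
        entry.following = source.following;
        touched += 1;
      }
    }
  }
  return touched; // number of embedded copies that had to be rewritten
}

// Hypothetical state: c already follows a.
const users = new Map([
  ['a', { followers: 1, following: 0, follows: [] }],
  ['b', { followers: 0, following: 0, follows: [] }],
  ['c', { followers: 0, following: 1, follows: [{ _id: 'a', followers: 1, following: 0 }] }],
]);

const touched = follow(users, 'a', 'b');
```

The fan-out loop is the part that can be pushed to a queue, since the embedded copies only need to be eventually consistent.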
I've gone ahead and implemented the followers/following update based on the strategy recommended by Mason (Option 1). Here's my code in Node.js and Mongoose, using the async.js waterfall pattern, in case anyone is interested or has any opinions. I haven't implemented queuing yet, but the plan is to farm most of this off to a queue.
async.waterfall([
function (callback) {
/** find & update the person we are following */
Model.User
.findByIdAndUpdate(id,{$inc:{followers:1}},{new:true,upsert:true,select:{fullName:1,profilePictureUrl:1,address:1,following:1,followers:1}}) // new:true returns the document after the increment
.lean()
.exec(callback);
},
function (followee, callback) {
/** find & update the person doing the following */
var query = {
$inc:{following:1},
$addToSet: { follows: followee}
}
Model.User
.findByIdAndUpdate(credentials.username,query,{new:true,upsert:true,select:{fullName:1,profilePictureUrl:1,address:1,following:1,followers:1}}) // new:true returns the document after the increment
.lean()
.exec(function(err,follower){
callback(err,follower,followee);
});
},
function(follower,followee,callback){
/** update the following count */
Model.User
.update({'follows._id':follower._id},{'follows.$.following':follower.following},{multi:true},function(err){ // lean() documents expose _id, not id; no upsert with the positional $ operator
callback(err,followee);
});
},
function(followee,callback){
/** update the followers count */
Model.User
.update({'follows._id':followee._id},{'follows.$.followers':followee.followers},{multi:true},callback); // lean() documents expose _id, not id; no upsert with the positional $ operator
}
], function (err) {
if (err)
next(err);
else {
res.send(HTTPStatus.OK);
next();
}
});

MongoDB - get 1 last message from each conversation?

I have a collection for conversations:
{_id: ..., from: userA, to: userB, message: "Hello!", datetime: ...}
I want to show a preview of the user's conversations: the last message from each conversation between the current user and any other user. When the user clicks on one of these "last messages", they go to the next page with all messages between them and that user.
How do I do that (get 1 last message from each conversation) without Map/Reduce?
1) use "distinct" command? (how?)
2) set "last" flag for last message? I think it's not very safe...
3) ..?
I was writing up a complicated answer to this question using cursors and a lot of advanced query features and stuff... it was painful and confusing. Then I realized, this is painful because it's not how mongodb expects you to do things really.
What I think you should do is just denormalize the data and solve this problem in one shot easily. Here's how:
Put a hash/object field on your User called most_recent_conversations
When you make a new conversation with another user, update it so that it looks like this:
previewUser.most_recent_conversations[userConversedWith._id] = newestConversation._id
Every time you make a new conversation, simply smash the value for the users involved in their hashes with the newer conversation id. Now we have a { uid: conversationId, ... } structure which basically is the preview data we need.
Now you can look up the most recent conversation (or N conversations if you make each value of the hash an array!) simply:
var previewUser = db.users.findOne({ _id: someId });
var recentIds = [];
for( var uid in previewUser.most_recent_conversations ) {
recentIds.push( previewUser.most_recent_conversations[uid] );
}
var recentConversations = db.conversations.find({
_id: { $in: recentIds }
});
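The write side described above (overwriting each user's hash entry whenever a new message is stored) could be sketched like this. The `previewUpdates` helper is hypothetical; it only builds the filter and update documents, assuming the user ids are strings that are safe to use as field names.

```javascript
// Build the two update specs for a newly stored message:
// one for the sender's hash and one for the recipient's hash.
function previewUpdates(message) {
  const specFor = (ownerId, otherId) => ({
    filter: { _id: ownerId },
    update: {
      $set: { ['most_recent_conversations.' + otherId]: message._id },
    },
  });
  return [
    specFor(message.from, message.to),
    specFor(message.to, message.from),
  ];
}

const updates = previewUpdates({ _id: 'm1', from: 'userA', to: 'userB', message: 'Hello!' });
```

Each spec can then be applied with something like `db.users.updateOne(spec.filter, spec.update)`, so both users always point at the newest message between them.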