Storing functions in MongoDB in different collections? - mongodb

Say that I have a business that represents users who spend a certain amount of time to produce certain quantities of stuff. I want each user to be free to create their own algorithm, or formula, for determining the price that they charge for their work:
Users Collection, with possibly thousands of different users.
{
userId: 'sdf23d23dwew',
price: function(time, qty){
// some algorithm
}
},
{
userId: '23f5gf34f',
price: function(time, qty){
// another algorithm
}
},
{
userId: '7u76565',
price: function(time, qty){
// yet another algorithm
}
},
{
userId: 'w45y65yh4',
price: function(time, qty){
// something else
}
}
//and on and on and on...
Now, JSON doesn't support functions and neither does MongoDB. But for this use case, with possibly thousands of users each free to create their own unique way of determining their prices, being able to store a function inside each user document seems ideal to me.
I certainly don't feel like it's a good idea to just store all these thousands of functions in a JS file on the server that somehow gets referenced by a userId when it's needed...
Is there a solution for this case?
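The point about JSON is easy to confirm in plain JavaScript. JSON.stringify silently drops function-valued properties, so a document like the ones above would lose its price field as soon as it was serialized. A minimal sketch; the formula inside price is made up for illustration:

```javascript
// A user document shaped like the ones above; the pricing formula is hypothetical.
const user = {
  userId: 'sdf23d23dwew',
  price: function (time, qty) {
    return time * 20 + qty * 5;
  },
};

// JSON.stringify omits properties whose values are functions...
const serialized = JSON.stringify(user);
console.log(serialized); // {"userId":"sdf23d23dwew"}

// ...so parsing the result back gives a document with no price function at all.
const restored = JSON.parse(serialized);
console.log(typeof restored.price); // undefined
```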

Related

Correct JSON structure to filter through data [closed]

What's the "best" JSON structure when you need to "filter" through data in Firebase (in Swift)?
I'm having users sort their questions into:
Business
Entertainment
Other
Is it better to have a separate child for each question genre? If so, how do I get all of the data (when I want it), and then filter it only by "business" when I want to?
In NoSQL databases you usually end up modeling your data structure for the use-cases you want to allow in your app.
It's a bit of a learning path, so I'll explain it below in four steps:
Tree by category: Storing the data in a tree by its category, as you seem to be most interested in already.
Flat list of questions, and querying: Storing the data in a flat list, and then using queries to filter.
Flat list and indexes: Combining the above two approaches, to make the result more scalable.
Duplicating data: By duplicating data on top of that, you can reduce code complexity and improve performance further.
Tree by category
If you only want to get the questions by their category, you're best off simply storing each question under its category. In a simple model that'd look like this:
questionsByCategory: {
Business: {
question1: { ... },
question4: { ... }
},
Entertainment: {
question2: { ... },
question5: { ... }
},
Other: {
question3: { ... },
question6: { ... }
}
}
With the above structure, loading a list of questions for a category is a simple, direct-access read for that category: firebase.database().ref("questionsByCategory").child("Business").once("value"....
But if you need a list of all questions, you'd have to read all categories and denest them client-side. That is not a real problem if you need all the questions anyway, but if you want to filter on some condition other than category, it may be wasteful.
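To make the denesting step concrete, here is a minimal in-memory sketch. The questionsByCategory object stands in for the database tree, the difficulty values are borrowed from the flat-list example in this answer, and allQuestions is a hypothetical helper, not a Firebase API:

```javascript
// In-memory stand-in for the questionsByCategory tree (question bodies are hypothetical).
const questionsByCategory = {
  Business: { question1: { difficulty: 1 }, question4: { difficulty: 2 } },
  Entertainment: { question2: { difficulty: 1 }, question5: { difficulty: 3 } },
  Other: { question3: { difficulty: 2 }, question6: { difficulty: 1 } },
};

// Reading one category is a direct lookup.
const business = Object.keys(questionsByCategory.Business);

// Listing all questions means denesting every category client-side.
function allQuestions(tree) {
  const result = [];
  for (const [category, questions] of Object.entries(tree)) {
    for (const [id, question] of Object.entries(questions)) {
      result.push({ id, category, ...question });
    }
  }
  return result;
}

// Filtering on anything other than category requires loading the whole tree first.
const easy = allQuestions(questionsByCategory).filter((q) => q.difficulty === 1);
console.log(easy.map((q) => q.id)); // question1, question2, question6
```

The cost shows up in the last step: to find the easy questions, every category had to be read, even though only three questions matched.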
Flat list of questions, and querying
An alternative is to create a flat list of all questions, and use queries to filter the data. In that case your JSON would look like this:
questions: {
question1: { category: "Business", difficulty: 1, ... },
question2: { category: "Entertainment", difficulty: 1, ... },
question3: { category: "Other", difficulty: 2, ... },
question4: { category: "Business", difficulty: 2, ... },
question5: { category: "Entertainment", difficulty: 3, ... },
question6: { category: "Other", difficulty: 1, ... }
}
Now, getting a list of all questions is easy, as you can just read them and loop over the results:
firebase.database().ref("questions").once("value").then(function(result) {
result.forEach(function(snapshot) {
console.log(snapshot.key+": "+snapshot.val().category);
})
})
If we want to get all questions for a specific category, we use a query instead of just the ref("questions"). So:
Get all Business questions:
firebase.database().ref("questions").orderByChild("category").equalTo("Business").once("value")...
Get all questions with difficulty 3:
firebase.database().ref("questions").orderByChild("difficulty").equalTo(3).once("value")...
This approach works quite well, unless you have huge numbers of questions.
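To show which questions the two queries above would return, here is a client-side stand-in for orderByChild(child).equalTo(value). It is a hypothetical helper that only mirrors the result set; the real query is evaluated by Firebase, not by scanning in the client like this:

```javascript
// Flat list from the example above (extra fields elided).
const questions = {
  question1: { category: 'Business', difficulty: 1 },
  question2: { category: 'Entertainment', difficulty: 1 },
  question3: { category: 'Other', difficulty: 2 },
  question4: { category: 'Business', difficulty: 2 },
  question5: { category: 'Entertainment', difficulty: 3 },
  question6: { category: 'Other', difficulty: 1 },
};

// Hypothetical stand-in for orderByChild(child).equalTo(value): returns matching IDs.
function equalTo(list, child, value) {
  return Object.entries(list)
    .filter(([, question]) => question[child] === value)
    .map(([id]) => id);
}

console.log(equalTo(questions, 'category', 'Business')); // question1, question4
console.log(equalTo(questions, 'difficulty', 3)); // question5
```

Note that in this sketch a misspelled child name such as 'difficult' does not raise an error; it simply matches nothing.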
Flat list and indexes
If you have millions of questions, Firebase database queries may not perform well enough anymore for you. In that case you may need to combine the two approaches above, using a flat list to store the questions, and so-called (self-made) secondary indexes to perform the filtered lookups.
If you think you'll ever reach this number of questions, I'd consider using Cloud Firestore, as that does not have the inherent scalability limits that the Realtime Database has. In fact, Cloud Firestore has the unique guarantee that retrieving a certain amount of data takes a fixed amount of time, no matter how much data there is in the database/collection.
In this scenario, your JSON would look like:
questions: {
question1: { category: "Business", difficulty: 1, ... },
question2: { category: "Entertainment", difficulty: 1, ... },
question3: { category: "Other", difficulty: 2, ... },
question4: { category: "Business", difficulty: 2, ... },
question5: { category: "Entertainment", difficulty: 3, ... },
question6: { category: "Other", difficulty: 1, ... }
},
questionsByCategory: {
Business: {
question1: true,
question4: true
},
Entertainment: {
question2: true,
question5: true
},
Other: {
question3: true,
question6: true
}
},
questionsByDifficulty: {
"1": {
question1: true,
question2: true,
question6: true
},
"2": {
question3: true,
question4: true
},
"3": {
question5: true
}
}
You see that we have a single flat list of the questions, and then separate lists for the different properties we want to filter on, holding the question IDs for each value. These secondary lists are often called (secondary) indexes, since they really serve as indexes on your data.
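Maintaining these indexes by hand is error-prone, so it can help to derive them from the flat list. A minimal in-memory sketch; buildIndex is a hypothetical helper, not part of Firebase, and the index keys are strings because JSON object keys always are:

```javascript
// The flat list from the example above (extra fields elided).
const questions = {
  question1: { category: 'Business', difficulty: 1 },
  question2: { category: 'Entertainment', difficulty: 1 },
  question3: { category: 'Other', difficulty: 2 },
  question4: { category: 'Business', difficulty: 2 },
  question5: { category: 'Entertainment', difficulty: 3 },
  question6: { category: 'Other', difficulty: 1 },
};

// Build a { value: { questionId: true } } index for one property.
function buildIndex(list, property) {
  const index = {};
  for (const [id, question] of Object.entries(list)) {
    const key = String(question[property]); // e.g. difficulty 1 becomes key "1"
    if (!index[key]) index[key] = {};
    index[key][id] = true;
  }
  return index;
}

const questionsByCategory = buildIndex(questions, 'category');
const questionsByDifficulty = buildIndex(questions, 'difficulty');
console.log(Object.keys(questionsByDifficulty['3'])); // question5
```

Deriving the indexes this way also makes it easy to check them: question5 is the only question with difficulty 3 in the flat list, so it is the only entry under "3".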
To load the hard questions in the above, we take a two-step approach:
Load the question IDs with a direct lookup.
Load each question by its ID.
In code:
firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
result.forEach(function(snapshot) {
firebase.database().ref("questions").child(snapshot.key).once("value").then(function(questionSnapshot) {
console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
});
})
})
If you need to wait for all questions before logging (or otherwise processing) them, you'd use Promise.all:
firebase.database().ref("questionsByDifficulty/3").once("value").then(function(result) {
var promises = [];
result.forEach(function(snapshot) {
promises.push(firebase.database().ref("questions").child(snapshot.key).once("value"));
})
Promise.all(promises).then(function(questionSnapshots) {
questionSnapshots.forEach(function(questionSnapshot) {
console.log(questionSnapshot.key+": "+questionSnapshot.val().category);
})
})
})
Many developers assume that this approach is slow, since it needs a separate call for each question. But it's actually quite fast, since Firebase pipelines the requests over its existing connection. For more on this, see "Speed up fetching posts for my social network app by using query instead of observing a single event repeatedly".
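The two-step load and the Promise.all variant can be tried without a Firebase project by swapping the database for an in-memory object. In this sketch db and readPath are hypothetical stand-ins for the database and once("value"); they model only the shape of the client-side join, not pipelining or network latency:

```javascript
// In-memory stand-in for the database; data mirrors the example above (extra fields elided).
const db = {
  questions: {
    question1: { category: 'Business', difficulty: 1 },
    question2: { category: 'Entertainment', difficulty: 1 },
    question5: { category: 'Entertainment', difficulty: 3 },
    question6: { category: 'Other', difficulty: 1 },
  },
  questionsByDifficulty: {
    '1': { question1: true, question2: true, question6: true },
    '3': { question5: true },
  },
};

// Hypothetical async read of a slash-separated path, playing the role of once("value").
function readPath(path) {
  const value = path
    .split('/')
    .reduce((node, key) => (node === undefined ? undefined : node[key]), db);
  return Promise.resolve(value);
}

// Client-side join: read the index entry, then fetch all referenced questions concurrently.
async function loadByDifficulty(level) {
  const index = (await readPath('questionsByDifficulty/' + level)) || {};
  const ids = Object.keys(index);
  const questions = await Promise.all(ids.map((id) => readPath('questions/' + id)));
  return ids.map((id, i) => ({ id, ...questions[i] }));
}

loadByDifficulty(1).then((result) => {
  result.forEach((q) => console.log(q.id + ': ' + q.category));
});
```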
Duplicating data
The code for the nested load/client-side join is a bit tricky to read at times. If you'd prefer only performing a single load, you could consider duplicating the data for each question into each secondary index too.
In this scenario, the secondary index would look like this:
questionsByCategory: {
Business: {
question1: { category: "Business", difficulty: 1, ... },
question4: { category: "Business", difficulty: 2, ... }
},
...
}
If you come from a background in relational data modeling, this may look quite unnatural, since we're now duplicating data between the main list and the secondary indexes.
To an experienced NoSQL data modeler however, this looks completely normal. We're trading off storing some extra data against the extra time/code it takes to load the data.
This trade-off is common in all areas of computer science, and in NoSQL data modeling you'll fairly often see folks choosing to sacrifice space (and thus store duplicate data) to get an easier and more scalable data model.
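The price of duplication is keeping every copy in sync on each write. One common way to do that with the Realtime Database is a multi-location update, where a single update() call on the root reference writes several paths together. The sketch below only builds that path-to-value map; fanOutQuestion and question7 are hypothetical, and the commented-out SDK call should be checked against the SDK version you use:

```javascript
// Hypothetical helper: one path -> value map that writes a question to the main list
// and duplicates it into both secondary indexes.
function fanOutQuestion(id, question) {
  return {
    ['questions/' + id]: question,
    ['questionsByCategory/' + question.category + '/' + id]: question,
    ['questionsByDifficulty/' + question.difficulty + '/' + id]: question,
  };
}

const updates = fanOutQuestion('question7', { category: 'Business', difficulty: 2 });
console.log(Object.keys(updates));

// With the Realtime Database JS SDK, the map could then be written in one call:
// firebase.database().ref().update(updates);
```

Building all paths in one place means adding a new index later only requires one more entry in fanOutQuestion.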

Best way to store and organize data in MongoDB

I have users in MongoDB, and each user has an interface that lets them set their current state of hunger to a combination of "hungry", "not hungry", "famished", "starving", or "full".
Each user can enter multiple options for any period of time. For example, one use case would be "in the morning, record how my hunger is", and the user can enter "not hungry" and "full". They can record their hunger at any time of day, as many times as they want.
Should I store the data as single entries, and then group the data by a date in MongoDB later on when I need to show it in a UI? Or should I store the data as an array of the options the user selected along with a date?
It depends on your future queries, and you may want to do both. Disk space is cheaper than processing, and it's generally better to double your disk space than to double your queries.
If you're only going to map by date then you'll want to group all users/states by date. If you're only going to map by user then you'll want to group all dates/states by user. If you're going to query by both, you should just make two Collections to minimize processing. Definitely use an array for the hunger state in either case.
Example structure for date grouping:
{ date: '1494288000',
  'time-of-day': [
    { am: [
        { user: 'asdfas', 'hunger-state': ['hungry', 'full'] },
        { user: 'juhags', 'hunger-state': ['full'] }
      ],
      pm: [
        { user: 'asdfas', 'hunger-state': ['hungry', 'full'] },
        { user: 'juhags', 'hunger-state': ['full'] }
      ]
    }
  ]
}
It depends on how you are going to access it. If you want to report on a user's last known state, then the array might be better:
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
hunger: ['HUNGRY','FAMISHED'],
}
The timestamps of multiple records might not align perfectly if you are passing in the output from new Date() (note the second record is 99 ms later):
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
hunger: 'HUNGRY',
}
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.099Z',
hunger: 'FAMISHED',
}
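One way to recombine records like the two above is to merge entries from the same user whose timestamps fall in the same one-second bucket. This is an assumption chosen for illustration; mergeBySecond is hypothetical, and records that straddle a second boundary (for example 17:29:59.990 and 17:30:00.010) would still land in different buckets, so a window-based merge would be needed if that matters:

```javascript
// The two single-entry records from the example above.
const records = [
  { user_id: '5358e4249611f4a65e3068ab', timestamp: '2017-05-08T17:30:00.000Z', hunger: 'HUNGRY' },
  { user_id: '5358e4249611f4a65e3068ab', timestamp: '2017-05-08T17:30:00.099Z', hunger: 'FAMISHED' },
];

// Merge records from the same user whose timestamps share the same whole second.
function mergeBySecond(docs) {
  const merged = new Map();
  for (const doc of docs) {
    const bucket = Math.floor(Date.parse(doc.timestamp) / 1000) * 1000;
    const key = doc.user_id + '|' + bucket;
    if (!merged.has(key)) {
      merged.set(key, {
        user_id: doc.user_id,
        timestamp: new Date(bucket).toISOString(),
        hunger: [],
      });
    }
    merged.get(key).hunger.push(doc.hunger);
  }
  return [...merged.values()];
}

console.log(mergeBySecond(records));
```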
You should probably look at your data model though and try to get a more deterministic state model. Maybe:
{
user_id: '5358e4249611f4a65e3068ab',
timestamp: '2017-05-08T17:30:00.000Z',
isHungry: true,
hunger: 'FAMISHED',
}
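With a single state per record, "last known state" becomes a question of finding the newest record for a user. In MongoDB that would typically be a find on user_id sorted by timestamp descending with limit(1). The in-memory sketch below does the same with hypothetical data; lastKnownState is a hypothetical helper, and parsing the timestamps avoids relying on string ordering:

```javascript
// Hypothetical records using the deterministic model above.
const states = [
  { user_id: 'u1', timestamp: '2017-05-08T08:00:00.000Z', isHungry: true, hunger: 'FAMISHED' },
  { user_id: 'u1', timestamp: '2017-05-08T17:30:00.000Z', isHungry: false, hunger: 'FULL' },
  { user_id: 'u2', timestamp: '2017-05-08T12:00:00.000Z', isHungry: true, hunger: 'HUNGRY' },
];

// Return the most recent record for a user, or null if there is none.
function lastKnownState(docs, userId) {
  let latest = null;
  for (const doc of docs) {
    if (doc.user_id !== userId) continue;
    if (latest === null || Date.parse(doc.timestamp) > Date.parse(latest.timestamp)) {
      latest = doc;
    }
  }
  return latest;
}

console.log(lastKnownState(states, 'u1').hunger); // FULL
```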

Performance on denormalized field vs. use of separate collection

I'm looking at keeping track of scores/points per user. From examples I've seen, it seems normal to just keep track of a total count of something in a field. However, I'm concerned about being able to backtrack or keep track of the scores/points given to the user in case of cheating. Here's what I've got in mind:
Meteor.User Collection:
Meteor.user.profile: {
...
totalScore: 0
...
}
Scenario 1:
Just add total score and keep track of it per user:
updateScore() {
  var currentUser = Meteor.user();
  // Increment the denormalized total stored on the user document.
  Meteor.users.update({ _id: currentUser._id }, { $inc: { 'profile.totalScore': 1 } });
}
Scenario 2:
Put the score into a separate collection first to log it, before adding it to the user's total score:
Scores Collection:
Scores: {
  playerId: String, // the user's _id
  score: Number,
  ...
}
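updateScore() {
  var currentUser = Meteor.user();
  // Log the individual score event first.
  Scores.insert({ playerId: currentUser._id, score: 1, ...});
  // Then increment the denormalized total...
  Meteor.users.update({ _id: currentUser._id }, { $inc: { 'profile.totalScore': 1 } });
  // ...or, instead of the $inc above, recompute the total from the Scores collection:
  var currentUserScore = Scores.find({ playerId: currentUser._id }).fetch()
    .reduce(function (sum, doc) { return sum + doc.score; }, 0);
  Meteor.users.update({ _id: currentUser._id }, { $set: { 'profile.totalScore': currentUserScore } });
}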
So what I'd like to know is: does Scenario 2 make sense compared with Scenario 1? And if it does, and I calculate the total score into currentUserScore and then use that to update the user's totalScore profile field (which runs every time the score needs to be updated), will that be detrimental to the app's performance?
Based on our discussion, Scenario 2 makes the most sense to me, especially given that the score history may have value outside of auditing the total. Keep in mind it's always easier to remove data than it is to create it, so even if the history doesn't prove useful there is no harm in removing the collection sometime later.
I would implement an addScore method like this:
Meteor.methods({
addScore: function(score) {
check(score, Number);
Meteor.users.update(this.userId, {
$inc: {'profile.totalScore': score}
});
Scores.insert({
playerId: this.userId,
score: score,
createdAt: new Date()
});
}
});
Unless you can think of a compelling reason to do so, I suspect the db/computation overhead of aggregating the totalScore each time would not be worthwhile. Doing so only fixes the case where a user cheated by updating her profile directly. You can solve that by adding the following:
Meteor.users.deny({
update: function() {
return true;
}
});
I'd recommend adding the above regardless of which solution you go with, since user profiles can be updated directly by the user even after the insecure package has been removed. See the relevant section of the Meteor docs for more details.
Finally, if you want to audit the totalScore for each user, you can aggregate the totals as part of a nightly process rather than every time a new score is added. You can do this on the server by fetching the Scores documents, or directly in MongoDB with aggregation. Note that the latter would require a process outside of Meteor (my understanding is that the aggregation packages for Meteor don't currently work, but you may want to research that yourself).
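A nightly audit boils down to "sum scores per player, then compare with the stored total". The sketch below does that in memory with hypothetical data; totalsByPlayer and findMismatches are hypothetical helpers, and the commented aggregation assumes the collection is named scores:

```javascript
// The same per-player totals in the mongo shell (collection name 'scores' is an assumption):
// db.scores.aggregate([{ $group: { _id: '$playerId', total: { $sum: '$score' } } }])

// Hypothetical data standing in for the Scores collection and user profiles.
const scores = [
  { playerId: 'a', score: 1 },
  { playerId: 'a', score: 2 },
  { playerId: 'b', score: 5 },
];
const users = [
  { _id: 'a', profile: { totalScore: 3 } },
  { _id: 'b', profile: { totalScore: 50 } }, // inflated total
];

// Sum the score history per player: playerId -> total.
function totalsByPlayer(docs) {
  const totals = new Map();
  for (const { playerId, score } of docs) {
    totals.set(playerId, (totals.get(playerId) || 0) + score);
  }
  return totals;
}

// Flag users whose stored total disagrees with their score history.
function findMismatches(userDocs, scoreDocs) {
  const totals = totalsByPlayer(scoreDocs);
  return userDocs
    .filter((u) => (totals.get(u._id) || 0) !== u.profile.totalScore)
    .map((u) => u._id);
}

console.log(findMismatches(users, scores)); // b
```

Running this once a night keeps the per-score write path to a single $inc, while still catching totals that were tampered with.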

MongoDB Mongoose dynamic fields

I'm developing a website in which each user has a number of balances for different currencies. Throughout the lifetime of the website I will regularly add new currencies.
I'm trying to figure out the best way to store the balances using Mongoose. I currently store the balances like this:
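var UserSchema = new Schema({
  ...
  balances: {
    // Inside a { type: ... } definition Mongoose treats extra keys as options,
    // so address must sit beside the amount rather than inside it.
    mck: {
      amount: { type: Number, default: 100.0 },
      address: String
    },
    btc: {
      amount: { type: Number, default: 10.0 },
      address: String
    }
  }
});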
But it doesn't seem like the best approach. Each time I want to add a new currency, the existing documents will not contain it. Are there disadvantages to allowing documents in the database that are out of sync with the schema?
I thought of making the schema more dynamic by using a subdocument to store currencies and their respective balances like this:
var BalanceSchema = new Schema({
currency: String,
amount: Number,
address: String
});
But there would be a painful number of callbacks to deal with when changing balances etc.
Which of these methods would be the best approach? Or is there another I have missed?
If you have the need to add currencies dynamically in the future, you should opt to have "balances" as an array.
balances: [
{
curr: "mck",
bal: 123.45
},
{
curr: "btc",
bal: 42
}
]
It makes future queries easier, and it also gives you a lot of flexibility with each document.
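The array layout also helps with the concern about callbacks: each balance change can be one atomic update on the user document, using the positional $ operator. A sketch of the filter and update documents follows; the user IDs are placeholders, the helper names are hypothetical, and passing them to User.updateOne(filter, update) assumes a reasonably recent Mongoose version:

```javascript
// Increment one currency's balance atomically via the positional $ operator.
function incBalanceQuery(userId, curr, delta) {
  return {
    filter: { _id: userId, 'balances.curr': curr },
    update: { $inc: { 'balances.$.bal': delta } },
  };
}

// Add a new currency with $push; the $ne guard stops the same currency being added twice.
function addCurrencyQuery(userId, curr, bal) {
  return {
    filter: { _id: userId, 'balances.curr': { $ne: curr } },
    update: { $push: { balances: { curr: curr, bal: bal } } },
  };
}

// In-memory mirror of the $inc, to show its effect on a document.
// Returns false when the currency is missing, like an update whose filter matches nothing.
function applyInc(user, curr, delta) {
  const entry = user.balances.find((b) => b.curr === curr);
  if (!entry) return false;
  entry.bal += delta;
  return true;
}

const user = { _id: 'user1', balances: [{ curr: 'mck', bal: 123.45 }, { curr: 'btc', bal: 42 }] };
applyInc(user, 'btc', 8);
console.log(user.balances[1].bal); // 50
```

Because adding a currency is just another array element, existing documents never fall out of sync with the schema when a new currency is introduced.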
Or why not go for a flat schema like:
{
user: "user1",
currency1balance: 54.76,
currency5balance:1024
}

MongoDB Social Network Adding Followers

I'm implementing a social network in MongoDB and I need to keep track of Followers and Following for each User. When I search for Users I want to display a list like Facebook with the User Name, Picture and number of Followers & Following. If I just wanted to display the User Name and Picture (info that doesn't change) it would be easy, but I also need to display the number of Followers & Following (which changes fairly regularly).
My current strategy is to embed the People a User follows into each User Document:
firstName: "Joe",
lastName: "Bloggs",
follows: [
{
_id: ObjectId("520534b81c9aac710d000002"),
profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
name: "Mark Rogers",
},
{
_id: ObjectId("51f26293a5c5ea4331cb786a"),
name: "The Palace Bar",
profilePictureUrl: "https://s3-eu-west-1.amazonaws.com/businesses/xxx.jpg",
}
]
The question is - What is the best strategy to keep track of the number of Followers & Following for each User?
If I include the number of Followers / Following as part of the embedded document, i.e.:
follows: [
  {
    _id: ObjectId("520534b81c9aac710d000002"),
    profilePictureUrl: "https://pipt.s3.amazonaws.com/users/xxx.jpg",
    name: "Mark Rogers",
    followers: 10,   // cached count
    following: 400   // cached count
  }
]
then every time a User follows someone, multiple updates are needed across all the embedded documents.
Since the consistency of this data isn't really important (i.e. Showing someone I have 10 instead of 11 followers isn't the end of the world), I can queue this update. Is this approach ok or can anyone suggest a better approach ?
You're on the right track. Think about which calculation is performed more - determining the number of followers/following or changing number of followers/following? Even if you're caching the output of the # of followers/following calculation it's still going to be performed one or two orders of magnitude more often than changing the number.
Also, think about the opposite. If you really need to display the number of followers/following for each of those users, you'll have to then do an aggregate on each load (or cache it somewhere, but you're still doing a lot of calcs).
Option 1: Cache the number of followers/following in the embedded document.
Upsides: Can display stats in O(1) time
Downsides: Requires O(N) time to follow/unfollow
Option 2: Count the number of followers/following on each page view (or cache invalidation)
Upsides: Can follow/unfollow in O(1) time
Downsides: Requires O(N) time to display
Add in the fact that follower/following stats can be eventually consistent whereas the counts have to be displayed on demand and I think it's a pretty easy decision to cache it.
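The trade-off between the two options can be seen in a small in-memory model. This sketch is hypothetical and keeps only top-level counters; copying the updated counts into every embedded follows entry, which is what makes Option 1 O(N) per follow, is left out:

```javascript
// Hypothetical users keyed by _id; each embeds the IDs it follows and caches its counters.
const users = {
  joe: { follows: ['mark', 'palace'], followers: 0, following: 2 },
  mark: { follows: ['palace'], followers: 1, following: 1 },
  palace: { follows: [], followers: 2, following: 0 },
};

// Option 2: count followers on demand by scanning every user (O(N) per display).
function countFollowers(all, userId) {
  return Object.values(all).filter((u) => u.follows.includes(userId)).length;
}

// Option 1: displays read the cached counter; the extra work moves into follow().
function follow(all, followerId, followeeId) {
  const follower = all[followerId];
  if (follower.follows.includes(followeeId)) return; // already following
  follower.follows.push(followeeId);
  follower.following += 1;
  all[followeeId].followers += 1;
}

follow(users, 'palace', 'joe');
console.log(users.joe.followers, countFollowers(users, 'joe')); // 1 1
```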
I've gone ahead and implemented the followers/following update based on the same strategy recommended by Mason (Option 1). Here's my code in NodeJs and Mongoose, using the AsyncJs waterfall pattern, in case anyone is interested or has any opinions. I haven't implemented queuing yet, but the plan would be to farm most of this off to a queue.
async.waterfall([
  function (callback) {
    /** find & update the person we are following (new: true returns the updated counts) */
    Model.User
      .findByIdAndUpdate(id, { $inc: { followers: 1 } }, { upsert: true, new: true, select: { fullName: 1, profilePictureUrl: 1, address: 1, following: 1, followers: 1 } })
      .lean()
      .exec(callback);
  },
  function (followee, callback) {
    /** find & update the person doing the following */
    var query = {
      $inc: { following: 1 },
      $addToSet: { follows: followee }
    };
    Model.User
      .findByIdAndUpdate(credentials.username, query, { upsert: true, new: true, select: { fullName: 1, profilePictureUrl: 1, address: 1, following: 1, followers: 1 } })
      .lean()
      .exec(function (err, follower) {
        callback(err, follower, followee);
      });
  },
  function (follower, followee, callback) {
    /** update the cached "following" count wherever the follower is embedded.
        lean() returns plain objects, so use _id (the id virtual is not available).
        No upsert here: only users that already embed this person should change. */
    Model.User
      .update({ 'follows._id': follower._id }, { $set: { 'follows.$.following': follower.following } }, { multi: true }, function (err) {
        callback(err, followee);
      });
  },
  function (followee, callback) {
    /** update the cached "followers" count wherever the followee is embedded */
    Model.User
      .update({ 'follows._id': followee._id }, { $set: { 'follows.$.followers': followee.followers } }, { multi: true }, callback);
  }
], function (err) {
  if (err)
    next(err);
  else {
    res.send(HTTPStatus.OK);
    next();
  }
});
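On the queuing plan: since the counts only need to be eventually consistent, a queue can also coalesce work, so several follows of the same user trigger one refresh of the embedded counts instead of one per follow. A minimal in-memory sketch; CountRefreshQueue is hypothetical, and a real deployment would use a persistent job queue rather than a Map held in process memory:

```javascript
// Hypothetical coalescing queue: a later count change for a user overwrites an earlier one.
class CountRefreshQueue {
  constructor() {
    this.pending = new Map(); // userId -> latest { followers, following }
  }

  enqueue(userId, counts) {
    this.pending.set(userId, counts);
  }

  // Drain the queue, calling applyFn(userId, counts) once per user.
  // In the app, applyFn would run the two embedded-count updates from the last waterfall steps.
  flush(applyFn) {
    const batch = [...this.pending];
    this.pending.clear();
    for (const [userId, counts] of batch) applyFn(userId, counts);
    return batch.length;
  }
}

const queue = new CountRefreshQueue();
queue.enqueue('mark', { followers: 10, following: 400 });
queue.enqueue('mark', { followers: 11, following: 400 });
queue.enqueue('palace', { followers: 3, following: 0 });

const applied = [];
const flushed = queue.flush((userId, counts) => applied.push([userId, counts.followers]));
console.log(flushed, applied);
```

Here three enqueued changes collapse into two refreshes, and mark's refresh uses only the latest count of 11.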