How would you model this in MongoDB?

There are products with a name and price.
Users log the products they have bought.
# option 1: embed logs
product = { id, name, price }
user = { id,
         name,
         logs : [{ product_id_1, quantity, datetime, comment },
                 { product_id_2, quantity, datetime, comment },
                 ... ,
                 { product_id_n, quantity, datetime, comment }]
       }
I like this. But if product ids are 12 bytes long, quantity and datetime are 32-bit (4-byte) integers, and comments are 100 bytes on average, then the size of one log is 12+4+4+100 = 120 bytes. The maximum size of a document is 4MB, so the maximum number of logs per user is 4MB/120 bytes = 33,333. If we assume that a user logs 10 purchases per day, the 4MB limit is reached in 33,333/10 = 3,333 days ~ 9 years. Well, 9 years is probably fine, but what if we needed to store even more data? What if the user logs 100 purchases per day?
What is the other option here? Do I have to normalize this fully?
# option 2: normalized
product = { id, name, price }
log = { id, user_id, product_id, quantity, datetime, comment }
user = { id, name }
Meh. We are back to relational.

If size is the main concern, you can go ahead with option 2 using Mongo DBRef.
logs : [{ product_id_1, quantity, datetime, comment },
{ product_id_2, quantity, datetime, comment },
... ,
{ product_id_n, quantity, datetime, comment }]
and embed these logs inside the user document using DBRef, something like
var log = { product_id: "xxx", quantity: "2", comment: "something" }
db.logs.save(log)
var user = { id: "xx", name: 'Joe', logs: [ new DBRef('logs', log._id) ] }
db.users.save(user)

Yes, option 2 is your best bet. Yes, you're back to a relational model, but then, your data is best modeled that way. I don't see a particular downside to option 2; it's your data that is requiring you to go that way, not a bad design process.
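As a rough illustration of option 2 (collection, index and variable names below are assumptions, not from the question), fetching one user's purchase history becomes a single indexed query, and document growth is no longer bounded by the per-document size limit (4MB at the time, 16MB in current MongoDB versions) because each log is its own document:
// Sketch only: a separate "logs" collection with a user_id field and an index on it
db.logs.createIndex({ user_id: 1, datetime: -1 })
// All purchases for one user, newest first (userId is a placeholder variable)
db.logs.find({ user_id: userId }).sort({ datetime: -1 })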


Faster way to get delta of 2 collections in Mongo

I have a complete list of catalog inventory data in Mongo
The basic schema is:
productSku (string)
inventory (number)
This collection consists of approximately 14 million records.
I have another list of actively sold products with a similar schema.
Right now I have it as a JSON file.
It consists of approximately 23,000 records.
Every 5 hours the 14 million records updates with the latest inventory data.
Once that happens I need to create a CSV of the 23,000 products' latest inventory.
I'm doing it like this:
const inventoryModel = require('../data/inventoryModel');
const activeProducts = require('./activeProducts.json');

const inventoryUpdate = [];
for (const product of activeProducts) {
  let latest = await inventoryModel.findOne({ productSku: product.sku }).exec();
  latest = latest ? latest._doc : null;

  // If there's no current inventory record for the product
  if (!latest) {
    // If there was previously an inventory greater than 0
    if (product.inventory) {
      // Set the latest inventory to zero
      inventoryUpdate.push({ sku: product.sku, inventory: 0 });
    }
  } else {
    // If there's a change in inventory
    if (latest.inventory != product.inventory) {
      inventoryUpdate.push({ sku: product.sku, inventory: latest.inventory });
    }
  }
}
This gives me an array inventoryUpdate that I can use to create a CSV for a mass update. This works fine, but it's very slow; it takes about an hour to complete!
I was thinking about adding activeProducts to Mongo as well, and if I could somehow keep the execution of the logic within Mongo this would be a lot faster, but that's beyond my current understanding and ability.
Does anyone have any suggestions?
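No answer is recorded here, but as a hedged sketch of the asker's own idea (loading activeProducts into Mongo and keeping the logic server-side), a single $lookup aggregation could replace the 23,000 round-trip findOne calls. Collection and field names below are assumptions:
// Sketch only: assumes activeProducts has been imported into an
// "activeproducts" collection ({ sku, inventory }) and that inventoryModel
// is backed by an "inventories" collection ({ productSku, inventory })
// with an index on productSku.
db.activeproducts.aggregate([
  { $lookup: {
      from: 'inventories',
      localField: 'sku',
      foreignField: 'productSku',
      as: 'latest'
  }},
  // No match in inventories => empty "latest" array => treat inventory as 0
  { $addFields: {
      latestInventory: { $ifNull: [{ $arrayElemAt: ['$latest.inventory', 0] }, 0] }
  }},
  // Keep only products whose inventory actually changed
  { $match: { $expr: { $ne: ['$latestInventory', '$inventory'] } } },
  { $project: { _id: 0, sku: 1, inventory: '$latestInventory' } }
])
Even without the aggregation, an index on productSku alone should shorten the hour-long loop considerably, since each findOne otherwise has to scan the 14 million documents.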

Google Play Services - Leaderboard max score 21,474,836.47$

I am using Ionic 3 to build my app QuickLife.
One of the things pushed to the leaderboard is how much money users have earned, and even if I push 50 million dollars to Google Play Services, the maximum number that appears is 21,474,836.47$.
googlePlaySubmitScore(data) {
  let age = data.age;
  // netWorth is the money value that is pushed; it is multiplied by 100 and
  // Google Play uses the last two digits as the decimal part
  let netWorth = data.netWorth * 100;
  let followers = data.numOfSocialFans;
  this.googlePlayGamesServices.isSignedIn()
    .then(() => {
      this.googlePlayGamesServices.submitScore({
        score: netWorth,
        leaderboardId: ID
      });
      this.googlePlayGamesServices.submitScore({
        score: age,
        leaderboardId: ID
      });
      this.googlePlayGamesServices.submitScore({
        score: followers,
        leaderboardId: ID
      });
    });
}
That's the maximum value of a signed 32-bit integer: 2^31 - 1 = 2,147,483,647, which after dividing by 100 for the two decimal places displays as 21,474,836.47.
https://dev.mysql.com/doc/refman/8.0/en/integer-types.html
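For illustration only (the clamp below is a defensive workaround sketch, not an official fix from the plugin or Google Play):
// Why the leaderboard tops out at 21,474,836.47$:
const MAX_INT32 = Math.pow(2, 31) - 1;   // 2147483647
console.log(MAX_INT32 / 100);            // 21474836.47

// Hedged workaround: clamp before submitting so the value never overflows a
// 32-bit path; alternatively, submit the value without the x100 multiplier.
function safeScore(netWorth) {
  return Math.min(Math.round(netWorth * 100), MAX_INT32);
}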

Meteor MongoDB pub/sub memory error

I'm doing data visualizations using Meteor, with React and D3 for the view. Today I decided to populate the MongoDB server with more documents (a total of 30 documents, at ~50k lines each). There was no issue before when running the database with 'only' 4 documents, but now I'm seeing
Exception while polling query {"collectionName":"Metrics","selector":{},"options":{"transform":null}}: MongoError: Query exceeded the maximum allowed memory usage of 40 MB. Please consider adding more filters to reduce the query response size.
This is my collections.js file, because autopublish is off.
if (Meteor.isServer) {
  const remoteCollectionString = "notForHumanConsumption";
  database = new MongoInternals.RemoteCollectionDriver(remoteCollectionString);
  Meteor.publish('metricsDB', function() {
    return Metrics.find({})
  });
}

Metrics = new Mongo.Collection("Metrics", { _driver: database });

if (Meteor.isClient) {
  Meteor.startup(function() {
    Session.set('data_loaded', false);
    console.log(Session.get('data_loaded'))
  });
  Meteor.subscribe('metricsDB', function(){
    // Set the reactive session as true to indicate that the data have been loaded
    Session.set('data_loaded', true);
    console.log(Session.get('data_loaded'))
  });
}
Adding any type of sort in the publish function seems to at least get the console to log true, but it does so pretty much instantly. The terminal shows no error, but I'm stuck and not getting any data into the app.
Update: I decided to remove 10 entries from the collection, limiting it to 20, and now the collection is ~28MB instead of ~42MB for the 30 items. The app is loading, albeit slowly.
The data structure looks pretty much like this:
{
  _id: 'BLAKSBFLUyiy79a6fs9Pjhadkh&SA86886Daksh',
  DateQueried: dateString,
  DataSet1: [
    {
      ID,
      Name,
      Properties: [
        {
          ID,
          Name,
          Nr,
          Status: [0, 1, 2, 3],
          LocationData: {
            City,
            CountryCode,
            PostalCode,
            Street,
            StreetNumber
          }
        }
      ]
    }
  ],
  DataSet2: [
    {
      ID,
      Name,
      Nr,
      Obj: {
        NrR,
        NrGS,
        LengthOfR
      }
    }
  ]
}
In each document DataSet1 is usually 9 items long. Properties in there can contain up to ~1900 (they average around 500) items. DataSet2 is normally around 49 items.
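No answer was recorded for this question, but the error message itself points at the usual fix: reduce the query response size. A hedged sketch of a trimmed publication (field names are taken from the structure above; the projection and limit values are assumptions):
// Sketch only: publish a narrower cursor instead of Metrics.find({}).
Meteor.publish('metricsDB', function () {
  return Metrics.find(
    {},                                          // add a real selector (e.g. a DateQueried range) if the UI allows it
    {
      fields: { DateQueried: 1, DataSet2: 1 },   // omit the heavy DataSet1.Properties arrays
      sort: { DateQueried: -1 },
      limit: 10                                  // only the most recent documents
    }
  );
});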

Atomic consistency

My lecturer in the database course I'm taking said an advantage of NoSQL databases is that they "support atomic consistency of a single aggregate". I have no idea what this means; can someone please explain it to me?
It means that by using aggregates you can keep your database from saving inconsistent data when a transaction fails.
In Domain-Driven Design, an aggregate is a collection of related objects that are treated as a unit.
For example, let's say you have a restaurant and you want to save the orders of each customer.
You could save your data with two aggregates like below:
var customerIdGenerated = newGuid();
var customer = { id: customerIdGenerated, name: 'Mateus Forgiarini' };
var orders = {
  id: 1,
  customerId: customerIdGenerated,
  orderedFoods: [{
    name: 'Sushi',
    price: 50
  },
  {
    name: 'Tacos',
    price: 12
  }]
};
Or you could treat orders and customers as a single aggregate:
var customerIdGenerated = newGuid();
var customerAndOrders = {
  customerId: customerIdGenerated,
  name: 'Mateus Forgiarini',
  orderId: 1,
  orderedFoods: [{
    name: 'Sushi',
    price: 50
  },
  {
    name: 'Tacos',
    price: 12
  }]
};
By modeling your orders and customer as a single aggregate you avoid this kind of transaction error. In the NoSQL world a transaction error can occur when you have to write related data to many nodes (a node is where you store your data; NoSQL databases that run on clusters can have many nodes).
So if you treat orders and customers as two aggregates, an error can occur while you are saving the customer even though the orders were already saved, and you end up with inconsistent data: orders that belong to no customer.
By using a single aggregate, however, you avoid that: if an error occurs you won't have inconsistent data, since you are saving all the related data together.
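In MongoDB terms (a minimal sketch, assuming collections named customers, orders and customerAndOrders, reusing the objects defined above), the difference is one write versus two writes that can fail independently; a single-document write is atomic:
// Two aggregates => two writes that can fail independently:
db.customers.insertOne(customer);
db.orders.insertOne(orders);               // if this fails, the customer above still exists

// One aggregate => one write, applied atomically at the document level:
db.customerAndOrders.insertOne(customerAndOrders);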

Best way to store and organize data in MongoDB

I have users in MongoDB, and each user has an interface allowing them to set their current state of hunger as a combination of "hungry", "not hungry", "famished", "starving", or "full".
Each user can enter multiple options for any period of time. For example, one use case would be "in the morning, record how my hunger is", and the user can put "not hungry" and "full". They can record their hunger at any time of day, and as many times as they want.
Should I store the data as single entries, and then group the data by a date in MongoDB later on when I need to show it in a UI? Or should I store the data as an array of the options the user selected along with a date?
It depends on your future queries, and you may want to do both. Disk space is cheaper than processing, and it's better to double your disk space than to double your queries.
If you're only going to map by date then you'll want to group all users/states by date. If you're only going to map by user then you'll want to group all dates/states by user. If you're going to query by both, you should just make two Collections to minimize processing. Definitely use an array for the hunger state in either case.
Example structure for date grouping:
{ date: '1494288000',
  'time-of-day': [
    { am: [
        { user: 'asdfas', 'hunger-state': ['hungry', 'full'] },
        { user: 'juhags', 'hunger-state': ['full'] }
      ],
      pm: [
        { user: 'asdfas', 'hunger-state': ['hungry', 'full'] },
        { user: 'juhags', 'hunger-state': ['full'] }
      ]
    }
  ]
}
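Alternatively, a hedged sketch of the question's other idea, storing single entries and grouping them by date only when the UI needs it (collection and field names are assumptions, and timestamp is assumed to be stored as a Date):
// Sketch only: one document per entry, e.g.
// { user_id, timestamp, hunger: ['hungry', 'full'] }
db.hungerEntries.aggregate([
  { $group: {
      _id: {
        date: { $dateToString: { format: '%Y-%m-%d', date: '$timestamp' } },
        user: '$user_id'
      },
      states: { $push: '$hunger' }       // array of arrays, one per entry that day
  }},
  { $sort: { '_id.date': -1 } }
])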
It depends on how you are going to access it. If you want to report on a user's last known state, then the array might be better:
{
  user_id: '5358e4249611f4a65e3068ab',
  timestamp: '2017-05-08T17:30:00.000Z',
  hunger: ['HUNGRY', 'FAMISHED'],
}
The timestamps of multiple records might not align perfectly if you are passing in the output from new Date() (note the second record is 99 ms later):
{
  user_id: '5358e4249611f4a65e3068ab',
  timestamp: '2017-05-08T17:30:00.000Z',
  hunger: 'HUNGRY',
}
{
  user_id: '5358e4249611f4a65e3068ab',
  timestamp: '2017-05-08T17:30:00.099Z',
  hunger: 'FAMISHED',
}
You should probably look at your data model though and try to get a more deterministic state model. Maybe:
{
  user_id: '5358e4249611f4a65e3068ab',
  timestamp: '2017-05-08T17:30:00.000Z',
  isHungry: true,
  hunger: 'FAMISHED',
}