Design Followers and Followees Schema in MongoDB?

I want to design a followers and followees (following) module for a social media application similar to Instagram.
I've implemented the following approach:
Users Schema
const mongoose = require('mongoose');
const { Schema } = mongoose;

module.exports = mongoose.model('users', new Schema({
  name: { type: String, default: null },
  gender: { type: String, default: null, enum: ['male', 'female', 'others', null] },
  email: { type: String, unique: true, sparse: true },
  isBlocked: { type: Boolean, default: false },
  isDeleted: { type: Boolean, default: false },
  profileImage: { type: String, default: null },
  isVerified: { type: Boolean, default: false },
}, {
  versionKey: false,
  timestamps: true
}));
Followers Schema
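const mongoose = require('mongoose');
const { Schema } = mongoose;
const { ObjectId } = Schema.Types;

module.exports = mongoose.model('followers', new Schema({
  followeeId: { type: ObjectId, required: true }, // the user being followed
  followerId: { type: ObjectId, required: true }  // the user who follows
}, {
  versionKey: false,
  timestamps: true
}));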
With this approach, if one user has 1 million followers, then 1 million records are created for that one user, and if that user follows all of them back, the count becomes 2 million.
So on average:
user#1 has 1 million followers/followees = 1 million records // total records: 1 Million
user#2 has 1 million followers/followees = 1 million records // total records: 2 Million
.
.
user#1000 has 1 million followers/followees = 1 million records // total records: 1 Billion
.
.
user#1,000,000 has 1 million followers/followees = 1 million records // total records: 1 Trillion
There would be a trillion or more records in the collection if I use this approach.
So is it okay to generate records like this?
Or please suggest a different approach to designing this schema.

Storing follower/following data in a list is such a wrong approach. If a user has 1 million followers, that user's record holds a list of 1 million entries, which makes fetching even a single user extremely cumbersome. Also, you can't do pagination in that case.
Also, as you mentioned-
The size of ObjectId is 12 bytes and the limit per document is 16 MB.
So, after calculation, we can store about 1.4 million ObjectIds
(700,000 followers and 700,000 followees) and followers can easily
surpass the 2 million mark.
This is another reason why you shouldn't be storing follower/followee information in lists.
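The arithmetic behind that quoted estimate can be checked with a few lines of JavaScript. This is only a rough upper bound: it assumes the whole 16 MB limit (taken here as 16 MiB) is spent on raw 12-byte ObjectIds and ignores any per-element array overhead, so the real ceiling is lower. The constant names are made up for illustration.

```javascript
// Rough upper bound on how many ObjectIds fit in one MongoDB document.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // 16 MiB document size limit
const OBJECT_ID_BYTES = 12;                  // size of one ObjectId

const maxObjectIds = Math.floor(MAX_DOCUMENT_BYTES / OBJECT_ID_BYTES);
const perList = Math.floor(maxObjectIds / 2); // split between followers and following

console.log(maxObjectIds); // 1398101, i.e. about 1.4 million
console.log(perList);      // 699050, i.e. about 700,000 per list
```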
My advice-
Make a relationship table like this:
Follower_UserId | Following_UserId | Timestamp
So if Mel follows Pushpit then the entry would be like-
Mel_Id | Pushpit_Id | May 31
More examples-
Ash_Id | Pushpit_Id | May 30
Mel_Id | Ash_Id | May 31
Table size should be the least of your concern in this approach.
Data storage is cheap. The per-row size of this relationship table would be tiny. There are so many optimization techniques to handle large tables with many small-sized rows.
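As a minimal sketch of why the relationship table paginates well, here is an in-memory stand-in using the three example rows above. The function name and the id-based paging cursor are assumptions for illustration; in a real database the filter and ordering would be served by an index on the followed user's id rather than by scanning an array.

```javascript
// In-memory stand-in for the relationship table (one row per "A follows B").
const follows = [
  { followerId: 'Mel', followingId: 'Pushpit' },
  { followerId: 'Ash', followingId: 'Pushpit' },
  { followerId: 'Mel', followingId: 'Ash' },
];

// Return one page of a user's followers, ordered by follower id.
// `afterId` is the last follower id of the previous page (null for the first page).
function followersPage(rows, userId, limit, afterId = null) {
  return rows
    .filter((row) => row.followingId === userId)
    .map((row) => row.followerId)
    .sort()
    .filter((id) => afterId === null || id > afterId)
    .slice(0, limit);
}

console.log(followersPage(follows, 'Pushpit', 1));        // [ 'Ash' ]
console.log(followersPage(follows, 'Pushpit', 1, 'Ash')); // [ 'Mel' ]
console.log(followersPage(follows, 'Ash', 10));           // [ 'Mel' ]
```

Each page only ever touches `limit` rows of one user's followers, no matter how many followers that user has in total.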

Good job finding the flaw in your own code. Your schema will create too many records, but there is another problem: finding the followers of a user requires a separate query against the database, and it would be a little slow.
So there must be another way. One more thing: it is a common convention to name the model with a capital letter.
Here's what I would do with the same problem.
module.exports = mongoose.model('User', new Schema({
  name: { type: String, default: null },
  gender: { type: String, default: null, enum: ['male', 'female', 'others', null] },
  email: { type: String, unique: true, sparse: true },
  isBlocked: { type: Boolean, default: false },
  isDeleted: { type: Boolean, default: false },
  profileImage: { type: String, default: null },
  isVerified: { type: Boolean, default: false },
  followers: [{type: ObjectId, ref: "User", required: true}],
  following: [{type: ObjectId, ref: "User", required: true}]
}, {
  versionKey: false,
  timestamps: true
}));
I would add 'followers' and 'following' fields, which contain arrays of ObjectIds of other users. So, each time someone follows a user, you would update the records of both users: add the followee to the following field of the follower, and add the follower to the followers field of the followee.
This approach requires two database update queries each time someone follows someone. But it would save tonnes of resources and query time later (you don't need a separate query to get the lists).
Please let me know if you find any mistakes in this approach too.
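A hedged sketch of those two updates, written as plain operation objects so it runs without a database. The helper name `buildFollowOps` and the placeholder ids are hypothetical; the operation shape is the one Mongoose's `Model.bulkWrite()` accepts, and `$addToSet` is used so that following the same user twice does not add a duplicate entry.

```javascript
// Build the two updates needed when `followerId` starts following `followeeId`.
function buildFollowOps(followerId, followeeId) {
  return [
    // 1. Add the followee to the follower's `following` array.
    { updateOne: { filter: { _id: followerId }, update: { $addToSet: { following: followeeId } } } },
    // 2. Add the follower to the followee's `followers` array.
    { updateOne: { filter: { _id: followeeId }, update: { $addToSet: { followers: followerId } } } },
  ];
}

// Placeholder ids for illustration; real code would pass ObjectIds.
const ops = buildFollowOps('melId', 'pushpitId');
console.log(JSON.stringify(ops, null, 2));

// With a real model this could be sent as: await User.bulkWrite(ops);
```

Note that the two updates touch two different documents, so they do not succeed or fail together unless they are wrapped in a transaction.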

Related

Is there a way to conditionally set 'unique' in Mongoose Schema?

I came across a problem while implementing user deletion functionality.
Suppose I have a model:
const UserSchema = new mongoose.Schema(
  {
    email: { type: String, required: true, unique: true },
    name: { type: String, required: true },
    password: { type: String, required: true },
    deleted: {type: Date, default: null}
  },
  { timestamps: true }
);
This clearly states that the email field has to be unique. However, I would like it to be unique only among the records where deleted is null, i.e. users that have not been deleted.
In other words, I would like to filter out the deleted users' records before checking whether the email is unique.
Are there any best practices regarding this?
Or should I just create a field called del-email and null the email field to avoid over-complication and preserve the data?
You can try a partial index with a unique constraint:
With a partial unique index you specify a filter expression, and the unique constraint is enforced only for documents that match it:
UserSchema.index(
  { email: 1 },
  { unique: true, partialFilterExpression: { deleted: { $eq: null } } }
);
Note:
As noted in the query coverage documentation for partial indexes:
To use the partial index, a query must contain the filter expression (or a modified filter expression that specifies a subset of the filter expression) as part of its query condition.
User.find({ email: "something@mail.com", deleted: null });
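To make the effect of the partial unique index concrete, here is a simplified in-memory model of the rule it enforces. It is a sketch under stated assumptions, not how MongoDB implements the index: the function name is hypothetical, and "matches the filter" is modelled as `deleted` being null or missing.

```javascript
// Simplified model of a unique index on `email` with
// partialFilterExpression { deleted: { $eq: null } }.
function violatesPartialUnique(existingUsers, candidate) {
  // Assumption of this sketch: a document is in the index only when
  // `deleted` is null or missing.
  const isIndexed = (user) => user.deleted == null;
  if (!isIndexed(candidate)) return false; // deleted users are never checked
  return existingUsers.some(
    (user) => isIndexed(user) && user.email === candidate.email
  );
}

const users = [
  { email: 'a@mail.com', deleted: new Date('2020-01-01') }, // soft-deleted user
  { email: 'b@mail.com', deleted: null },                   // active user
];

// The email of a deleted user can be reused by a new active user.
console.log(violatesPartialUnique(users, { email: 'a@mail.com', deleted: null })); // false
// The email of an active user cannot.
console.log(violatesPartialUnique(users, { email: 'b@mail.com', deleted: null })); // true
```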

MongoDB, sort big collection by a field frequently updated

I have this schema (using nodejs - mongoose):
const Post = new mongoose.Schema({
  title: {
    type: String,
    required: true,
    unique: true,
  },
  description: {
    type: String,
    required: true,
  },
  likes: {
    type: Number,
    required: true,
    default: 0,
    min: 0,
  }
}, {
  timestamps: true,
});
Let's say my collection has millions of these documents and I want to sort by 'likes'. Meanwhile 'likes' is something very frequently updated so I don't think I should use sorting index on it. If I serve that content in pagination using sort and limit, does this guarantee I have good performance on reading the data even if I don't use index? (I know mongo by default uses some algorithm to create in-memory buckets to sort data, when no index is provided)

Fastest way to query collection for ALL documents

My collection has approximately 500 documents, which will be double that in a few weeks:
How can I make getting all the documents faster? I'm currently using db.registrations.find(), so that I can have all the documents available for searching, sorting, and filtering data. Using skip/limit makes the query display quickly, but you can't search all the registrations for players, and that's necessary.
My schema:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var playerSchema = new mongoose.Schema({
  first: {
    type: String,
    required: true
  },
  last: {
    type: String,
    required: true
  },
  email: {
    type: String,
    required: true
  },
  phone: {
    type: String,
    required: true
  },
  address: {
    address: String,
    city: String,
    state: String,
    zip: String,
    country: {
      type: String,
      "default" : "USA"
    },
    adult: Boolean
  }
});
var registrationsSchema = new mongoose.Schema({
  event : {
    type: String,
    required: true
  },
  day : {
    type: String,
    required: true
  },
  group : {
    type: String,
    required: true
  },
  field : {
    type: String,
  },
  price : {
    type: Number,
    required: true
  },
  players : [playerSchema],
  net : Number,
  team : {
    type: Number,
    min: 0,
    max: 7,
    "default" : null
  },
  notes: String,
  paymentID : {
    type: String,
    required: true,
    "default": "VOLUNTEER"
  },
  paymentStatus : {
    type: String,
    "default" : "PAID"
  },
  paymentNote : String,
  // users : [userSchema],
  active : {
    type: Boolean,
    default: true
  },
  users: [{
    type: Schema.Types.ObjectId,
    ref: 'User'
  }],
  createdOn : {
    type : Date,
    "default" : Date.now
  },
  updatedLast : {
    type: Date
  }
});
mongoose.model('Registration', registrationsSchema);
Loading 1000 records from MongoDB with Mongoose is no big deal. I did it in the past (2-3k records) and it worked well as long as I respected these 2 rules:
1. Don't load all the Mongoose stuff: use a lean query.
This way it won't load all the Mongoose methods/attributes and it will get just the data of your objects. You can't use .save() or other methods, but it's way faster to load.
2. Use a stream to load your data.
Streams are a good way to load large datasets with Node.js/Mongoose. The data is read block by block from MongoDB and sent to your application for use. You avoid the typical case:
- I wait 2 seconds for my data while my server is idle.
- My server is at 100% CPU for 2 seconds to process the data I got while the db is idle.
With streams, in this example your total time will be ~2s instead of 2+2=4s.
To load data from a stream with Mongoose, use the .cursor() function to turn your request into a Node.js stream.
Here is an example to load all your players fast:
const allPlayers = [];
const cursor = Player.find({}).lean().cursor();
cursor.on('data', function(player) { allPlayers.push(player); });
cursor.on('end', function() { console.log('All players are loaded here'); });
You can achieve your objective using the following ways.
By default, a Mongoose query returns full Mongoose documents, with all of their attributes and the other metadata Mongoose needs (i.e., with lean = false). If you use the lean() function, it returns only plain JavaScript objects, without getters, setters, or other document methods, so you can get all the documents much faster. That is what lean() does in the background.
Another suggestion: as a rule of thumb, maintain proper indexes on each collection, according to your requirements, to get good performance while querying.
Hope this will help you!
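The lean-plus-cursor advice can also be written with `for await`, since a Mongoose query cursor can be consumed as an async iterable. Below is a minimal sketch; `collectAll` and `fakeCursor` are hypothetical names, and the generator only stands in for `Player.find({}).lean().cursor()` so the example runs without a database.

```javascript
// Collect every item from any async iterable (a Mongoose query cursor is one).
async function collectAll(cursor) {
  const items = [];
  for await (const item of cursor) {
    items.push(item); // each item arrives as soon as the source yields it
  }
  return items;
}

// Stand-in for Player.find({}).lean().cursor(); yields plain objects,
// which is also what lean() documents are.
async function* fakeCursor() {
  yield { first: 'Ada', last: 'Lovelace' };
  yield { first: 'Alan', last: 'Turing' };
}

const loaded = collectAll(fakeCursor());
loaded.then((players) => console.log(`${players.length} players loaded`));
```

With a real model, the call would look like `collectAll(Player.find({}).lean().cursor())`.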

How to model a pending trade in MongoDB?

I'm wondering what the "Mongo Way" is for modeling a pending trade of an item between two users.
I have a user collection and I have a book collection. In my app, the users will be able to propose trades to one another. Until the trade proposal is accepted, the trade needs to be stored as a pending trade in the database.
It seems to me that the best option is to have a 'trades' property on each book document modeled like this (using Mongoose):
const booksSchema = new Schema({
  title: { type: String, required: true },
  createdAt: { type: Date, 'default': Date.now },
  updatedAt: { type: Date, 'default': Date.now },
  author: { type: String, required: false},
  imageUrl: { type: String, required: false},
  ownerUser: { type: Schema.ObjectId, required: true },
  trades: [{
    fromUser: { type: Schema.ObjectId, required: true },
    bookOffered: { type: Schema.ObjectId, required: true }
  }]
});
The problem I see with this is that it will involve updating two documents when the trade is accepted. Assuming that the trade is accepted, the ownerUser on each document will need to be changed and the trades array will need to be cleared out.
It seems that to do this you'd want the changes to be in some sort of "Transaction" so that if one didn't update for some reason, then the other wouldn't either.
Is this a typical way to model this type of situation? What to do about the "Transaction" part of the situation?
Before version 4.0, there was no way to do a transaction spanning multiple documents in MongoDB (4.0 and later support multi-document transactions on replica sets).
You might consider a separate Trade collection with documents like:
{
  book: ...,
  ownerUser: ...,
  buyerUser: ...,
  status: 'pending',
  dateSold: null
}
When the trade is approved you can change this document first, then update any related documents next. Should something else fail, this document would decide whether the transaction had actually happened.
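A minimal in-memory sketch of that ordering, assuming plain objects in place of database documents. The function names and the 'accepted' status value are hypothetical; only 'pending' and the field names come from the document above. The trade document is changed first and acts as the source of truth, and the follow-up step can be safely repeated if something failed in between.

```javascript
// Step 1: flip the trade document. 'accepted' is a hypothetical status name.
function acceptTrade(trade, soldAt) {
  if (trade.status !== 'pending') {
    throw new Error(`trade is already ${trade.status}`);
  }
  return { ...trade, status: 'accepted', dateSold: soldAt };
}

// Step 2 (safe to repeat): make a book agree with what the trade document says.
function applyTradeToBook(book, trade) {
  const applies = trade.status === 'accepted' && trade.book === book._id;
  return applies ? { ...book, ownerUser: trade.buyerUser } : book;
}

const trade = { book: 'book1', ownerUser: 'alice', buyerUser: 'bob', status: 'pending', dateSold: null };
const book = { _id: 'book1', ownerUser: 'alice' };

console.log(applyTradeToBook(book, trade).ownerUser);    // alice (trade still pending)
const accepted = acceptTrade(trade, new Date());
console.log(applyTradeToBook(book, accepted).ownerUser); // bob
```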

Mongoose document schema and validation

I have a schema like so:
class Schemas
  constructor: ->
    @mongoose = require 'mongoose'
    @schema = @mongoose.Schema
    @EmployeeSchema = new @schema
      'firstname': { type: String, required: true },
      'lastname': { type: String, required: true },
      'email': { type: String, required: true, index: { unique: true }, validate: /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/ },
      'departmentId': { type: @schema.ObjectId, required: true }
      'enddate': String,
      'active': { type: Boolean, default: true }
    @EmployeeSchemaModel = @mongoose.model 'employees', @EmployeeSchema
    @DepartmentSchema = new @schema
      'name': { type: String, required: true, index: { unique: true } }
      'employees' : [ @EmployeeSchema ]
    @DepartmentSchemaModel = @mongoose.model 'departments', @DepartmentSchema
So that my employees live in an array of employee documents inside a department
I have several department documents that have a number of employee documents stored in the employees array.
I then added a new department but it contained no employees. If I then attempt to add another department without employees, Mongoose produces a Duplicate key error for the employee.email field which is a required field. The employee.email field is required and unique, and it needs to be.
Is there any way around this?
If you enable Mongoose debug logging with the coffeescript equivalent of mongoose.set('debug', true); you can see what's going on:
DEBUG: Mongoose: employees.ensureIndex({ email: 1 }) { safe: true, background: true, unique: true }
DEBUG: Mongoose: departments.ensureIndex({ name: 1 }) { safe: true, background: true, unique: true }
DEBUG: Mongoose: departments.ensureIndex({ 'employees.email': 1 }) { safe: true, background: true, unique: true }
By embedding the full EmployeeSchema in the employees array of DepartmentSchema (rather than just an ObjectId reference to it), you end up creating unique indexes on both employees.email and department.employees.email.
So when you create a new department without any employees, you are 'using up' the undefined email case in the department.employees.email index as far as uniqueness goes. When you try to do that a second time, that unique value is already taken and you get the Duplicate key error.
The best fix for this is probably to change DepartmentSchema.employees to an array of ObjectId references to employees instead of full objects. Then the index stays in the employees collection where it belongs and you're not duplicating data and creating opportunities for inconsistencies.
Check out these references:
http://docs.mongodb.org/manual/core/indexes/#sparse-indexes
mongoDB/mongoose: unique if not null (specifically JohnnyHK's answer)
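The short of it is that since Mongo 1.8, you can define what is called a sparse index, which leaves out documents that do not have the indexed field, so the unique check only applies to documents where the field is present (a field explicitly set to null still counts as present).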
In your case, you would want:
@EmployeeSchema = new @schema
  'firstname': { type: String, required: true },
  'lastname': { type: String, required: true },
  'email': { type: String, required: true, index: { unique: true, sparse: true }, validate: /\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}\b/ },
  'departmentId': { type: @schema.ObjectId, required: true }
  'enddate': String,
  'active': { type: Boolean, default: true }
Notice the sparse: true added to your index on EmployeeSchema's email attribute.
https://gist.github.com/juanpaco/5124144
It appears that you can't create a unique index on an individual field of a sub-document. Although the db.collection.ensureIndex function in the Mongo shell appears to let you do that, it tests the sub-document as a whole for its uniqueness and not the individual field.
You can create an index on an individual field of a sub-document, you just can't make it unique.