mongoose indexing? grouping? - mongodb

I'm fairly new to mongoose, and I'm not sure this is the right term.
I'm building a community site (like Reddit), and I have a schema like the one below:
const postSchema = new mongoose.Schema({
  content: {
    type: String,
    required: true,
  },
  title: {
    type: String,
    required: true,
  },
  userId: {
    type: mongoose.Schema.Types.ObjectId,
    required: true,
    ref: 'User',
  },
  board: {
    type: String,
    required: true,
    enum: ['board1','board2'],
  },
  created_at: {
    type: Date,
    default: Date.now,
  },
  updated_at: {
    type: Date,
  },
})
There are many kinds of 'board', and I'm not sure whether the field can be 'indexed'.
The purpose is to fetch posts faster, for example the equivalent of this SQL (assuming the board column is indexed):
--> select * from post where board = 'board1' ;
I'm confused about the terms and need some direction.

Short answer:
You need to create an index on the board field:
db.post.createIndex(
  { board: 1 },
  { name: "board index" }
)
Long answer:
Indexing in MongoDB uses extra memory in order to save query time.
Take an example: say your DB holds every word in English. You are reading a book, and from time to time you need to look up a word to check its meaning.
How would you do that? With a dictionary: sort the words alphabetically, and then you can easily find any word you want.
Indexing applies the same concept. When you create an index on the board field, MongoDB takes all of its values, sorts them, and saves them in a separate table, where each entry references the full document in your collection.
Now when you run the equivalent of select * from post where board = 'board1', it first uses that stored table of sorted boards, finds the entries equal to board1, and then follows the references to return the full documents. The MongoDB documentation on indexes covers this in more detail.
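To make the dictionary analogy concrete, here is a minimal in-memory sketch in plain JavaScript. It is only an illustration: real MongoDB indexes are B-tree structures managed by the server, and the sample posts and the helper names buildIndex, lowerBound and findByIndex are made up for this example.

```javascript
// Hypothetical in-memory stand-in for the posts collection.
const posts = [
  { _id: 1, title: 'Hello', board: 'board2' },
  { _id: 2, title: 'First!', board: 'board1' },
  { _id: 3, title: 'Question', board: 'board1' },
];

// Build the "index": one entry per document, sorted by the indexed field.
// Each entry keeps a reference (here: the array position) to the full document.
function buildIndex(docs, field) {
  return docs
    .map((doc, position) => ({ key: doc[field], position }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : a.position - b.position));
}

// Binary search for the first index entry whose key is >= value.
function lowerBound(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Jump to the first match, then walk the neighbouring entries with the same key.
function findByIndex(docs, index, value) {
  const result = [];
  for (let i = lowerBound(index, value); i < index.length && index[i].key === value; i++) {
    result.push(docs[index[i].position]);
  }
  return result;
}

const boardIndex = buildIndex(posts, 'board');
const board1Posts = findByIndex(posts, boardIndex, 'board1');
console.log(board1Posts.map((p) => p._id)); // the posts with _id 2 and 3
```

The lookup never touches the board2 post: it jumps to the first board1 entry and stops at the first entry with a different key. In Mongoose itself, an index like this is normally declared on the schema, for example with postSchema.index({ board: 1 }).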

Related

Database design - saving the entire object to a user or just the id of an object?

Database noob here, using MongoDB. In my program I have users, and the core of the program is the roadmaps that I display. Each user can create roadmaps, save other users' roadmaps, and so on. Each user has two fields, savedRoadmap and createdRoadmap, which should store the roadmaps. My question is: should I store just the roadmap _ids in the savedRoadmap and createdRoadmap fields, or the entire roadmap?
I am asking because saving just the _id of each roadmap feels like it saves storage, but it might be less convenient: I have to fetch the user first and then fetch each roadmap by the ID in the user's savedRoadmap/createdRoadmap field, versus just fetching the user and having the roadmaps already there in those fields.
And btw, if there is any sweet and brief read on database design out there, please point me to it!
For a user, I want a name, email, password, and description of course, plus savedRoadmap and createdRoadmap. A user can create unlimited roadmaps and save as many as he or she wants. For a roadmap, I want a name, category, time_completion, author, date, and a roadmap object containing the actual JSON string that I will display with d3. Here are my User and Roadmap schemas right now:
const RoadmapSchema = new Schema({
  author: {
    type: String,
    required: false
  },
  name: {
    type: String,
    required: true
  },
  category: {
    type: String,
    required: true
  },
  time_completion: {
    type: Number,
    required: true
  },
  date: {
    type: Date,
    default: Date.now
  },
  roadmap: {
    type: "object",
    required: true
  }
});
and User Schema:
const UserSchema = new Schema({
  name: {
    type: String,
    required: true
  },
  email: {
    type: String,
    required: true
  },
  password: {
    type: String,
    required: true
  },
  date: {
    type: Date,
    default: Date.now
  },
  savedRoadmap: {
    type: "object",
    default: []
  },
  createdRoadmap: {
    type: "object",
    default: []
  }
});
My question is, inside of the savedRoadmap and createdRoadmap fields of the User schema, should I include just the _id of a roadmap, or should I include the entire json string which represents the roadmap?
There are 3 different data-modeling techniques you can use to design your roadmaps system based on the cardinality of the relationship between users and roadmaps.
In general you need to de-normalize your data model based on the queries that are expected from your application:
One to Few: Embed the N side if the cardinality is one-to-few and there is no need to access the embedded object outside the context of the parent object
One to Many: Use an array of references to the N-side objects if the cardinality is one-to-many or if the N-side objects should stand alone for any reasons
One-to-Squillions: Use a reference to the One-side in the N-side objects if the cardinality is one-to-squillions
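As a rough illustration of the three shapes, here is a sketch using plain JavaScript objects in place of real collections. The sample data and the helpers resolveSaved and roadmapsByAuthor are hypothetical; in a real application the lookups would be queries (or Mongoose populate calls) rather than array operations.

```javascript
// Hypothetical roadmap documents, each stored once in its own collection.
const roadmaps = [
  { _id: 'r1', name: 'Learn SQL', author: 'u1' },
  { _id: 'r2', name: 'Learn D3', author: 'u1' },
  { _id: 'r3', name: 'Learn Go', author: 'u2' },
];

// One to Few: the roadmap objects themselves are embedded in the user.
const embeddedUser = {
  _id: 'u1',
  name: 'Ann',
  createdRoadmap: [{ name: 'Learn SQL' }, { name: 'Learn D3' }],
};

// One to Many: the user holds an array of roadmap _ids.
const referencingUser = { _id: 'u1', name: 'Ann', savedRoadmap: ['r2', 'r3'] };

// Resolving the references takes a second lookup after loading the user.
function resolveSaved(user, allRoadmaps) {
  const byId = new Map(allRoadmaps.map((r) => [r._id, r]));
  return user.savedRoadmap.map((id) => byId.get(id)).filter(Boolean);
}

// One-to-Squillions: the user stores nothing; each roadmap points back at its author.
function roadmapsByAuthor(authorId, allRoadmaps) {
  return allRoadmaps.filter((r) => r.author === authorId);
}

console.log(embeddedUser.createdRoadmap.length);
console.log(resolveSaved(referencingUser, roadmaps).map((r) => r.name));
console.log(roadmapsByAuthor('u1', roadmaps).map((r) => r._id));
```

For saved roadmaps, which belong to other users and have to stand alone, the array-of-references shape is probably the closest fit among the three.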
As for "is there any sweet and brief database design read out there, please direct me to some if you know any!":
Rules of Thumb for MongoDB Schema Design: Part 1

Which approach is better?

I want to create a collection for users' ratings, and I am undecided between two schema structures.
First schema:
var Rating = new mongoose.Schema({
  userID: {
    type: String,
    minlength: 1,
    required: true,
    trim: true
  },
  ratings: [{
    rate: {
      type: Number
    }
  }]
});
Second schema:
var Rating = new mongoose.Schema({
  userID: {
    type: String,
    required: true,
  },
  rating: {
    type: Number,
    required: true
  },
});
With the first schema, every rating is pushed into the ratings array; with the second, multiple documents with the same userID are inserted, each containing one rating.
I would like to know which approach is recommended: growing the array, or adding a document each time the user gets a rating.
It depends on the details of your project (there is no single universally good schema).
The first structure is closer to the MongoDB way of modeling data. But do not forget the document size limit (16MB, unless you are using GridFS). This structure is better if you do not have a large amount of information (items in the ratings field). Because all of a user's ratings are in one document, your indexes stay small (one user, one document).
The second schema is better when you have a large number of ratings (because of the document size limit).
You can also use two collections: one for aggregated data (final results after calculations, something like a cache) and another for the detailed information. As mentioned before, the best solution depends on the details of the project.
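The two-collection idea could look roughly like the sketch below, which uses an in-memory array and Map as stand-ins for the detailed and aggregated collections. The count and total fields and the helpers addRating and averageFor are assumptions made for illustration, not part of either schema in the question.

```javascript
// Detailed collection: one document per rating (the shape of the second schema).
const ratingDocs = [];

// Aggregated collection: one small summary document per user, keyed by userID.
const summaries = new Map();

// Record a rating in the detailed collection and update the cached summary.
function addRating(userID, rating) {
  ratingDocs.push({ userID, rating });
  const summary = summaries.get(userID) || { userID, count: 0, total: 0 };
  summary.count += 1;
  summary.total += rating;
  summaries.set(userID, summary);
}

// Read the average from the summary without scanning the detailed documents.
function averageFor(userID) {
  const summary = summaries.get(userID);
  return summary ? summary.total / summary.count : null;
}

addRating('u1', 4);
addRating('u1', 5);
addRating('u2', 3);
console.log(averageFor('u1')); // 4.5
console.log(ratingDocs.length); // 3
```

The summary document stays the same size no matter how many ratings a user receives, so it is not affected by the document size limit.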
I recommend reading this article: 6 Rules of Thumb for MongoDB Schema Design

Can't update or query embedded sub-documents using MongoDB? Now what?

I took the NoSQL plunge against all my RDBMS prejudices from my past. But I trusted. Now I find myself 3 months into a project, and the exact reasons we adhered to RDBMS principles seem to be biting me in the butt. I think I just discovered here on Stack Overflow that I can't work with twice-embedded arrays. I followed the NoSQL embedded-document approach like a good Kool-Aid drinker and feel like I've been betrayed. Before I swear off NoSQL and go back and refactor my entire code base to adhere to a new 'normalized' model, I'd like to hear from some NoSQL champions.
Here is my model using one big document with embedded docs and the works:
var mongoose = require('mongoose'),
    Schema = mongoose.Schema,
    User = mongoose.model('User');

var Entry = new Schema({
  text: String,
  ups: Number,
  downs: Number,
  rankScore: Number,
  posted: {
    type: Date,
    default: Date.now
  },
  postedBy: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User'
  }
});

var boardSchema = new Schema({
  theme: String,
  created: {
    type: Date,
    default: Date.now
  },
  owner: {
    type: mongoose.Schema.Types.ObjectId,
    ref: 'User'
  },
  entered: {
    type: Boolean,
    default: false
  },
  entries: [Entry],
  participants: [{
    user: { type: mongoose.Schema.Types.ObjectId, ref: 'User'},
    date: { type: Date, default: Date.now },
    topTen: [ { type: mongoose.Schema.Types.ObjectId, ref: 'Entry'} ]
  }]
});

mongoose.model('Board', boardSchema);
Basically, I want to query the document by Board._id and then, where participants.user == req.user.id, add to the topTen[] array. Note that participants[] is an array within the document and topTen is an array within participants[]. I've found other similar questions, but I was pointed to a Jira item, which doesn't look like it will be implemented, for allowing the $ positional operator in multiple embedded arrays. Is there no way to do this now? Or if anyone has a suggestion for how to model my document so that I don't have to do a full rewrite with a new normalized reference model... please help!
Here are some of my query attempts from what I could find online. Nothing worked for me.
Board.update(
  { _id: ObjectId('56910eed15c4d50e0998a2c9'), 'participants.user._id': ObjectId('56437f6a142974240273d862') },
  { $set: { 'participants.0.topTen.$.entry': ObjectId('5692eafc64601ceb0b64269b') } }
)
I read that you should avoid such 'nested' designs, but with the embedded model it's hard not to. Basically, this says to me: don't "embed", go "ref".
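For what it's worth, adding to topTen only requires locating one array element (the participant), because the inner array is appended to rather than searched. A single positional $ combined with $push should therefore be enough. The sketch below only builds the filter and update documents as plain objects; buildTopTenPush is a hypothetical helper, the ids are passed as hex strings for illustration, and it assumes Mongoose casts them to ObjectId according to the schema above.

```javascript
// Build the filter and update documents for pushing an entry id into the
// topTen array of one participant on one board.
function buildTopTenPush(boardId, userId, entryId) {
  return {
    // Match the board and the participant element; this match is what `$` refers to.
    // Note the path is 'participants.user' (an ObjectId), not 'participants.user._id'.
    filter: { _id: boardId, 'participants.user': userId },
    // `$` stands for the index of the matched participants element;
    // $push then appends to the topTen array inside that element.
    update: { $push: { 'participants.$.topTen': entryId } },
  };
}

const { filter, update } = buildTopTenPush(
  '56910eed15c4d50e0998a2c9',
  '56437f6a142974240273d862',
  '5692eafc64601ceb0b64269b'
);
console.log(JSON.stringify(filter));
console.log(JSON.stringify(update));
```

With the Board model from the question, these two objects would be passed on as, for example, Board.update(filter, update, callback). Changing an existing element inside topTen would need a second positional match, which is the case the Jira item is about.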

Is there a MongoDB maximum bson size work around?

The document I am working on is extremely large. It collects user input from an extremely long survey (like survey monkey) and stores the answers in a mongodb database.
I am unsurprisingly getting the following error
Error: Document exceeds maximal allowed bson size of 16777216 bytes
If I cannot change the fields in my document is there anything I can do? Is there some way to compress down the document, by removing white space or something like that?
Edit
Here is the structure of the document
Schema({
  id : { type: Number, required: true },
  created: { type: Date, default: Date.now },
  last_modified: { type: Date, default: Date.now },
  data : { type: Schema.Types.Mixed, required: true }
});
An example of the data field:
{
  id: 65,
  question: {
    test: "some questions",
    answers: [2,5,6]
  }
  // there could be thousands of these question objects
}
One thing you can do is build your own MongoDB :-). MongoDB is open source, and the limit on document size is rather arbitrary, there to encourage better schema design. You can modify the line in the source that sets the limit and build it yourself. Be careful with this.
The most straightforward idea is to store each small question in its own document, with a field that references its parent.
Another idea is to limit the number of elements in the parent. Let's say your limit is N elements; then the parent looks like this:
{
  _id : ObjectId(),
  id : { type: Number, required: true },
  created: { type: Date, default: Date.now }, // you can store it only for the first element
  last_modified: { type: Date, default: Date.now }, // the same here
  data : [
    {
      id: 65,
      question: {
        test: "some questions",
        answers: [2,5,6]
      }
    },
    // ... up to N such objects
  ]
}
This way, by adjusting N you can make sure each document stays within the 16 MB BSON limit. To read the whole survey, you can select
db.coll.find({id: the Id you need}) and then combine the whole survey at the application level. Also do not forget to create an index on id (ensureIndex, or createIndex in newer MongoDB versions).
Try different things, benchmark on your data, and see what works for you.
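The splitting and recombining described above might be sketched like this in plain JavaScript. The seq field, used to keep the buckets in order, and the helpers toBuckets and fromBuckets are assumptions added for illustration; a suitable N would have to come from measuring your own data.

```javascript
// Split a long list of question objects into bucket documents of at most n items.
// Every bucket carries the survey id so that one find({ id }) returns all of them.
function toBuckets(surveyId, questions, n) {
  const buckets = [];
  for (let i = 0; i < questions.length; i += n) {
    buckets.push({ id: surveyId, seq: buckets.length, data: questions.slice(i, i + n) });
  }
  return buckets;
}

// Reassemble the whole survey at the application level.
function fromBuckets(buckets) {
  return buckets
    .slice() // do not reorder the caller's array
    .sort((a, b) => a.seq - b.seq)
    .flatMap((bucket) => bucket.data);
}

// Seven sample questions, split into buckets of at most three.
const questions = Array.from({ length: 7 }, (_, i) => ({
  id: i,
  question: { test: 'some questions', answers: [2, 5, 6] },
}));
const buckets = toBuckets(65, questions, 3);
console.log(buckets.map((b) => b.data.length)); // bucket sizes 3, 3 and 1
console.log(fromBuckets(buckets).length); // 7
```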
You should be using GridFS. It allows you to store data in chunks. Here's the link: http://docs.mongodb.org/manual/reference/gridfs/

mongoose - sort by array length

I have this schema:
var PostSchema = new Schema({
    title: {type: String, trim: true}
  , description: {type: String, trim: true}
  , votes: [{ type: Schema.ObjectId, ref: 'User' }]
})
I want to sort posts based on votes, i.e. I need to sort by array length.
I tried the usual way, but it didn't work:
PostSchema.statics.trending = function (cb) {
  return this.find().sort('votes', -1).limit(5).exec(cb)
}
Any help?
version of mongoose I am using is 2.7.2
You can't do that directly. To be able to sort on array length, you have to maintain it in a separate field (votes_count, or whatever) and update it when you push/pull elements to/from votes.
Then you sort on that votes_count field. If you also index it, queries will be faster.
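A minimal sketch of keeping such a counter in step with the array is shown below. It only builds the filter and update documents as plain objects and uses an in-memory sort as a stand-in for the query; buildUpvote, buildUnvote and trending are hypothetical helper names, and votes_count is the field name suggested above.

```javascript
// Update that adds a vote and bumps the counter in the same operation.
function buildUpvote(postId, userId) {
  return {
    // Only match if this user has not voted yet, so the counter cannot drift.
    filter: { _id: postId, votes: { $ne: userId } },
    update: { $push: { votes: userId }, $inc: { votes_count: 1 } },
  };
}

// Update that removes a vote and decrements the counter.
function buildUnvote(postId, userId) {
  return {
    // Only match if the user's vote is actually present.
    filter: { _id: postId, votes: userId },
    update: { $pull: { votes: userId }, $inc: { votes_count: -1 } },
  };
}

// In-memory stand-in for sorting on the counter: highest first, top `limit` posts.
function trending(allPosts, limit) {
  return allPosts
    .slice()
    .sort((a, b) => b.votes_count - a.votes_count)
    .slice(0, limit);
}

const samplePosts = [
  { title: 'a', votes: ['u1'], votes_count: 1 },
  { title: 'b', votes: ['u1', 'u2', 'u3'], votes_count: 3 },
  { title: 'c', votes: [], votes_count: 0 },
];
console.log(trending(samplePosts, 2).map((p) => p.title)); // 'b' first, then 'a'
console.log(JSON.stringify(buildUpvote('p1', 'u9').update));
```

Against the database, the trending query would then sort on votes_count in descending order instead of on votes.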