Data structure design for fast queries MongoDB - mongodb

I have several collections: photos, photos_like, and photos_comment.
I have two data structure options:
Case 1: photos_like and photos_comment are array fields in the photos collection
Case 2: use references to connect the collections
Which option should I use to get the fastest performance?

You should go for case 2 - document reference.
With embedded arrays (case 1), once you start to get a lot of traffic you will end up with photo documents that hold hundreds or even thousands of comments and likes, and MongoDB caps a single document at 16 MB.
With references, you would have a data structure like:
collection: photos
{
"_id": ObjectId("5c53653451154c6da4623a79"),
"name": "ocean",
"path": "path/to/ocean.png"
}
collection: photo_comments
{
"_id": ObjectId("c73h42h3ch238c7238cyn34y"),
"comment": "it's actually a lake",
"photo": ObjectId("5c53653451154c6da4623a79"),
"user": ObjectId("sd686sd8ywh3rkjiusyyrk32")
}
collection: photo_likes
{
"_id": ObjectId("x267cb623yru2ru6c4r273bn"),
"photo": ObjectId("5c53653451154c6da4623a79"),
"user": ObjectId("sd686sd8ywh3rkjiusyyrk32")
}
You can get more details at Model One-to-Many Relationships with Document References.
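To make the access pattern concrete, here is a minimal sketch in plain JavaScript, with arrays standing in for the three collections. The shortened string ids, the extra sample comments, and the helper name `loadPhotoPage` are assumptions made for illustration. It shows what the referenced layout allows: the photo document stays small, and comments are fetched separately, one page at a time. In MongoDB itself you would most likely back this with an index on the `photo` field of `photo_comments` and `photo_likes`.

```javascript
// Plain arrays stand in for the collections; ids are shortened, made-up strings.
const photos = [{ _id: "p1", name: "ocean", path: "path/to/ocean.png" }];
const photoComments = [
  { _id: "c1", photo: "p1", user: "u1", comment: "it's actually a lake" },
  { _id: "c2", photo: "p1", user: "u2", comment: "nice shot" },
  { _id: "c3", photo: "p1", user: "u3", comment: "where is this?" },
];
const photoLikes = [
  { _id: "l1", photo: "p1", user: "u1" },
  { _id: "l2", photo: "p1", user: "u2" },
];

// Hypothetical helper: load a photo, one page of its comments, and its like count.
function loadPhotoPage(photoId, skip, limit) {
  const photo = photos.find((p) => p._id === photoId);
  if (!photo) return null;
  // Would be a find on photo_comments filtered by photo, with skip and limit.
  const comments = photoComments
    .filter((c) => c.photo === photoId)
    .slice(skip, skip + limit);
  // Would be a count on photo_likes filtered by photo.
  const likeCount = photoLikes.filter((l) => l.photo === photoId).length;
  return { photo, comments, likeCount };
}

const page = loadPhotoPage("p1", 0, 2);
console.log(page.photo.name, page.comments.length, page.likeCount);
```

The point of the design is that the cost of showing a photo no longer grows with the number of comments and likes it has collected.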

Related

which is the best way to implement: two separate array or array in array?

I need some advice!
I have an array of objects holding data for many students (more than 200). Now I want to add lessons for these students, so every day I will push an array of lesson data into every student's entry. Later I will work with all of the lesson data.
So my question is:
a) Is it best to push the lesson array into every student's array?
b) Or should I make another array with a unique _id, and later filter lessons by the student's _id?
I'm looking for performance and speed...
As I understand your architecture, a suitable option is to move lessons to a separate collection and store lesson._id in students[].lessons. You can do this with the ref property in your Mongoose schema.
Here's an example:
lessons collection data:
[
{
"_id": ObjectId("5a934e000102030405000001"),
"name": "First lesson"
},
{
"_id": ObjectId("5a934e000102030405000002"),
"name": "Second lesson"
}
]
groups collection data:
[
{
"_id": ObjectId("5a934e000102030405000003"),
"name": "Group 1",
"students": [
{
"_id": ObjectId("5a934e000102030405000004"),
"name": "John",
"lessons": [ObjectId("5a934e000102030405000001")]
},
{
"_id": ObjectId("5a934e000102030405000005"),
"name": "James",
"lessons": [ObjectId("5a934e000102030405000001"), ObjectId("5a934e000102030405000002")]
}
]
}
]
I would also move every student into a separate students collection if possible (that is, if you currently store students as an array field).
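As a rough illustration of how stored lesson ids turn back into lesson documents, here is a small sketch in plain JavaScript. The shortened ids and the helper name `resolveLessons` are made up for the example, and the arrays stand in for collections. It is conceptually similar to what populating a ref field gives you, but it is not Mongoose code.

```javascript
// Arrays stand in for the lessons and students collections (ids shortened).
const lessons = [
  { _id: "l1", name: "First lesson" },
  { _id: "l2", name: "Second lesson" },
];
const students = [
  { _id: "s1", name: "John", lessons: ["l1"] },
  { _id: "s2", name: "James", lessons: ["l1", "l2"] },
];

// Build the lookup table once so each reference resolves in constant time.
const lessonsById = new Map(lessons.map((l) => [l._id, l]));

// Hypothetical helper: replace each stored lesson id with the lesson document.
// Ids that no longer match a lesson are dropped.
function resolveLessons(student) {
  return {
    ...student,
    lessons: student.lessons
      .map((id) => lessonsById.get(id))
      .filter(Boolean),
  };
}

const james = resolveLessons(students[1]);
console.log(james.lessons.map((l) => l.name));
```

Because each lesson is stored once and only its id is repeated, renaming a lesson touches one document instead of every student.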

Querying the most recent posts in a MongoDB collection

Rather new to Mongodb/Mongoose/Node. Trying to make a query to retrieve the most recent posts (example being the 10 most recent posts) across all documents in a collection.
I tried querying this a few different ways.
MessageboardModel.find({"posts": {"time": {"$gte": ISODate("2014-07-02T00:00:00Z")}}} ...
I tried doing the above just to try getting to the proper nested time property, but everything I was trying throws an error. I'm definitely missing something here...
Here is an example document in the collection:
{
"_id": {
"$oid": "5c435d493dcf9281500cd177"
},
"movie": 433249,
"posts": [
{
"replies": [],
"_id": {
"$oid": "5c435d493dcf9281500cd142"
},
"username": "Username1",
"time": {
"$date": "2019-01-19T17:24:25.204Z"
},
"post": "This is a post title",
"content": "Content here."
},
{
"replies": [],
"_id": {
"$oid": "5c435d493dcf9281500cd123"
},
"username": "Username2",
"time": {
"$date": "2019-01-12T17:24:25.204Z"
},
"post": "This is another post made earlier",
"content": "Content here."
}
],
"__v": 0
}
There are many documents in the collection. I want to get, say the most recent 10 posts, across all of the documents in the entire collection.
Any help?
You can try an aggregation query.
Steps:
1> Match specific documents (optional).
2> Expand the posts array into one document per post using $unwind.
3> Sort by the time field of each post.
4> Project only the fields you need.
5> Limit the result to the number of posts you want.
<YOUR_MODEL>.aggregate([
{$match:{
"movie": 433249 // optional filter; remove the $match stage to search across all documents
}},
{$unwind:"$posts"}, // emit one document per element of the posts array
{$sort:{"posts.time":-1}}, // newest first; use 1 for oldest first
{$project:{posts:1}}, // keep only the posts field; remove or adjust if you need other fields
{$limit:10} // keep only the 10 most recent posts
]).exec();
Let me know if you still have any issues or get an error.
I would love to answer.
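If it helps to see what the unwind, sort, and limit stages do to the data, the same steps can be sketched in plain JavaScript over an in-memory array. The second board (movie 1, Username3) and the helper name `mostRecentPosts` are invented for the example; the first board mirrors the sample document from the question.

```javascript
// In-memory stand-in for the collection; the second document is made up.
const boards = [
  {
    movie: 433249,
    posts: [
      { username: "Username1", time: new Date("2019-01-19T17:24:25.204Z"), post: "This is a post title" },
      { username: "Username2", time: new Date("2019-01-12T17:24:25.204Z"), post: "This is another post made earlier" },
    ],
  },
  {
    movie: 1,
    posts: [
      { username: "Username3", time: new Date("2019-01-15T08:00:00.000Z"), post: "Post on another board" },
    ],
  },
];

// Hypothetical helper mirroring the pipeline stages.
function mostRecentPosts(docs, limit) {
  return docs
    // like $unwind: one entry per post, keeping the parent's movie id
    .flatMap((doc) => doc.posts.map((post) => ({ movie: doc.movie, ...post })))
    // like $sort on posts.time with -1: newest first
    .sort((a, b) => b.time - a.time)
    // like $limit
    .slice(0, limit);
}

const latest = mostRecentPosts(boards, 2);
console.log(latest.map((p) => p.username));
```

On a real collection the aggregation pipeline is the better tool, since it avoids pulling every document into the application.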

How to join two collection in mongo without lookup

I have two collections, named post and comment.
The document structure is shown below.
I want to query posts with an aggregation and sort them by the total number of likes on their comments. Currently I can compute that total for a single post with the query below.
My question is how to query post and join the comment collection in MongoDB 2.6. I know that versions from 3.2 onward have a $lookup function.
I want to query the post collection and sort by the like count of the related comments. Is there a good way to do this in MongoDB 2.6?
post
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
}
comment
/* 1 */
{
"_id": ObjectId("5a595d8c2703892c3d8b4567"),
"uid": "1",
"post_id": "5a39e22c27308912334b4567",
"comment": "hello world",
"like": [
"2"
]
}
/* 2 */
{
"_id": ObjectId("5a595d8c2703892c3d8b4512"),
"uid": "2",
"post_id": "5a39e22c27308912334b4567",
"comment": "hello stackoverflow",
"like": [
"1",
"2"
]
}
Query for the total likes on one post's comments:
db.getCollection('comment').aggregate([
{
"$match": {
post_id: "5a39e22c27308912334b4567"
}
},
{
"$project": {
"likeLength": {
"$size": "$like"
},
"post_id": "$post_id"
}
},
{
"$group": {
_id: "$post_id",
"likeLengthSum": {
"$sum": "$likeLength"
}
}
}
])
There is no "best" way to query, as it'll really depend on your specific needs, but... you cannot perform a single query across multiple collections (aside from the $lookup aggregation pipeline function in later versions, as you already are aware).
You'll need to make multiple queries: one to your post collection, and one to your comment collection.
If you must perform a single query, then consider storing both types of documents in a single collection (with some identifier property to let you filter on either posts or comments, within your query).
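A minimal sketch of the two-query approach, written in plain JavaScript with arrays standing in for the two collections. The second post and the helper names `likeTotals` and `postsByLikes` are assumptions for illustration. Note that the sample data stores `post_id` as a string while the post `_id` is an ObjectId, so a real implementation would need to compare them in the same form; the sketch simply uses strings on both sides.

```javascript
// Arrays stand in for the post and comment collections.
const posts = [
  { _id: "5a39e22c27308912334b4567", uid: "0", content: "what is hello world mean?" },
  { _id: "post-b", uid: "1", content: "a post with no comments" }, // made-up second post
];
const comments = [
  { post_id: "5a39e22c27308912334b4567", uid: "1", like: ["2"] },
  { post_id: "5a39e22c27308912334b4567", uid: "2", like: ["1", "2"] },
];

// Step 1 (would be the aggregation on comment): total likes per post_id.
function likeTotals(commentDocs) {
  const totals = new Map();
  for (const c of commentDocs) {
    totals.set(c.post_id, (totals.get(c.post_id) || 0) + c.like.length);
  }
  return totals;
}

// Step 2 (would be the find on post): attach the totals and sort, highest first.
function postsByLikes(postDocs, commentDocs) {
  const totals = likeTotals(commentDocs);
  return postDocs
    .map((p) => ({ ...p, likeLengthSum: totals.get(p._id) || 0 }))
    .sort((a, b) => b.likeLengthSum - a.likeLengthSum);
}

const ranked = postsByLikes(posts, comments);
console.log(ranked.map((p) => [p._id, p.likeLengthSum]));
```

The join happens in application memory, so this only stays cheap while the set of posts being ranked is small or already filtered.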
There is no other way to join collections in the current MongoDB v6 without $lookup.
I can think of two reasons that might be causing you issues:
The $lookup is slow and expensive - How to improve performance?
$lookup optimization:
Follow the guidelines provided in the documentation
Use indexes:
Index the fields of the referenced collection that you join or filter on. For your sample data, you can create an index on the post_id field, an index on the uid field, or a compound index on both, depending on your use case
You can read more about How to Improve Performance with Indexes and Document Filters
db.comment.createIndex({ "post_id": -1 });
db.comment.createIndex({ "uid": -1 });
// or
db.comment.createIndex({ "post_id": -1, "uid": -1 });
Document Filters:
Use the $match, $limit, and $skip stages to restrict the documents that enter the pipeline
You can refer to the documentation for more detailed examples
{ $skip: 0 },
{ $limit: 10 } // as per your use case
Limit the $lookup result:
Try to limit the result of the lookup with a $limit stage
Try to balance the query cost against what the UI or use case actually needs
You want to avoid $lookup - How to improve the collection schema to avoid $lookup?
Store the analytics/metrics:
If you need the total number of comments on a particular post, store that count in the post document and update it whenever the post gets a new comment
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
// new fields
"total_comments": 10
}
Store minimum reference data:
If you want to show the comments of a particular post, you can limit how many are shown, for example 5 comments per post
You can also store up to 5 of the latest comments in the post document to avoid the $lookup: whenever a new comment arrives, add it and remove the oldest of the 5
{
"_id": ObjectId("5a39e22c27308912334b4567"),
"uid": "0",
"content": "what is hello world mean?",
// new fields
"total_comments": 10,
"comments": [
{
"_id": ObjectId("5a595d8c2703892c3d8b4567"),
"uid": "1",
"comment": "hello world"
},
{
"_id": ObjectId("5a595d8c2703892c3d8b4512"),
"uid": "2",
"comment": "hello stackoverflow"
}
]
}
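One way to keep such a post document up to date is sketched below in plain JavaScript. The helper names `newCommentUpdate` and `applyNewComment` and the constant `MAX_EMBEDDED` are assumptions for illustration. The first function only builds an update document using the standard `$inc` and `$push` operators with the `$each` and `$slice` modifiers; the second applies the same logic to an in-memory object so the effect can be seen without a database. Check that these modifiers are available on your server version before relying on them.

```javascript
const MAX_EMBEDDED = 5; // illustrative cap, taken from the "5 comments" example

// Hypothetical helper: build the update document to apply to the post
// whenever a new comment is stored in the comment collection.
function newCommentUpdate(comment) {
  return {
    $inc: { total_comments: 1 },
    // A negative $slice keeps only the last N elements after the push.
    $push: { comments: { $each: [comment], $slice: -MAX_EMBEDDED } },
  };
}

// In-memory equivalent of applying that update, for illustration only.
function applyNewComment(post, comment) {
  return {
    ...post,
    total_comments: (post.total_comments || 0) + 1,
    comments: [...(post.comments || []), comment].slice(-MAX_EMBEDDED),
  };
}

let post = { _id: "5a39e22c27308912334b4567", uid: "0", total_comments: 0, comments: [] };
for (let i = 1; i <= 7; i++) {
  post = applyNewComment(post, { _id: "c" + i, uid: "1", comment: "comment " + i });
}
console.log(post.total_comments, post.comments.map((c) => c._id));
```

The counter and the embedded array are denormalized copies, so they must be updated on every code path that adds or deletes a comment.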
Must read about Reduce $lookup Operations
Must read about Improve Your Schema

Querying MongoDB (Using Edge Collection - The most efficient way?)

I've written the Users, Clubs and Followers collections below for the sake of an example.
I want to find all user documents in the Users collection that are following "A famous club". How can I find them, and which way is the fastest?
More info about 'what do I want to do - Edge collections'
Users collection
{
"_id": "1",
"fullname": "Jared",
"country": "USA"
}
Clubs collection
{
"_id": "12",
"name": "A famous club"
}
Followers collection
{
"_id": "159",
"user_id": "1",
"club_id": "12"
}
PS: I can get the documents using Mongoose as shown below. However, building the followers array takes about 8 seconds with 150.000 records, and the second find query, which uses the followers array, takes about 40 seconds. Is that normal?
Followers.find(
{ club_id: "12" },
'-_id user_id', // select only one field to better perf.
function(err, docs){
var followers = [];
docs.forEach(function(item){
followers.push(item.user_id)
})
Users.find(
{ _id:{ $in: followers } },
function(error, users) {
console.log(users) // RESULTS
})
})
There is no built-in way to handle a many-to-many join in MongoDB, so I combined the collections using embedded documents, as shown below. The most important task in this case is creating indexes. For instance, if you want to query by followingClubs, you should create an index like schema.index({ 'followingClubs._id':1 }) using Mongoose. And if you want to query by country and followingClubs, you should create another index like schema.index({ 'country':1, 'followingClubs._id':1 })
Pay attention when working with embedded documents: http://askasya.com/post/largeembeddedarrays
Then you can get your documents quickly. I tried counting 150.000 records this way and it took only 1 second, which is enough for me.
PS: we mustn't forget that in my tests the Users collection had never experienced any data fragmentation, so my queries may have shown better performance than they otherwise would, especially on the followingClubs array of embedded documents.
Users collection
{
"_id": "1",
"fullname": "Jared",
"country": "USA",
"followingClubs": [ {"_id": "12"} ]
}
Clubs collection
{
"_id": "12",
"name": "A famous club"
}
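With this layout the followers of a club can be found with a single query on Users, using a filter such as { "followingClubs._id": "12" }. The sketch below mimics that filter in plain JavaScript over an in-memory array; the second and third users and the helper name `followersOf` are made up for the example, and it says nothing about how MongoDB evaluates the query internally.

```javascript
// In-memory stand-in for the Users collection; only the first user is from the example.
const users = [
  { _id: "1", fullname: "Jared", country: "USA", followingClubs: [{ _id: "12" }] },
  { _id: "2", fullname: "Maria", country: "Spain", followingClubs: [{ _id: "12" }, { _id: "13" }] },
  { _id: "3", fullname: "Ken", country: "USA", followingClubs: [] },
];

// Hypothetical helper: users following a club, optionally restricted to a country.
// Mirrors the filter { "followingClubs._id": clubId, country: country }.
function followersOf(clubId, country) {
  return users.filter(
    (u) =>
      u.followingClubs.some((c) => c._id === clubId) &&
      (country === undefined || u.country === country)
  );
}

console.log(followersOf("12").map((u) => u.fullname));
console.log(followersOf("12", "USA").map((u) => u.fullname));
```

The two filter shapes correspond to the two indexes suggested above: one on the club id alone, and one on country plus club id.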

mongodb php - how to do "INNER JOIN"-like query

I'm using the Mongo PHP extension.
My data looks like:
users
{
"_id": "4ca30369fd0e910ecc000006",
"login": "user11",
"pass": "example_pass",
"date": "2010-09-29"
},
{
"_id": "4ca30373fd0e910ecc000007",
"login": "user22",
"pass": "example_pass",
"date": "2010-09-29"
}
news
{
"_id": "4ca305c2fd0e910ecc000003",
"name": "news 333",
"content": "news content 3333",
"user_id": "4ca30373fd0e910ecc000007",
"date": "2010-09-29"
},
{
"_id": "4ca305c2fd0e910ecc00000b",
"name": "news 222",
"content": "news content 2222",
"user_id": "4ca30373fd0e910ecc000007",
"date": "2010-09-29"
},
{
"_id": "4ca305b5fd0e910ecc00000a",
"name": "news 111",
"content": "news content",
"user_id": "4ca30369fd0e910ecc000006",
"date": "2010-09-29"
}
How can I run a query like this from PHP?
SELECT n.*, u.*
FROM news AS n
INNER JOIN users AS u ON n.user_id = u.id
MongoDB does not support joins. If you want to map users to the news, you can do the following
1) Do this at the application-layer. Get the list of users, and get the list of news and map them in your application. This method is very expensive if you need this often.
2) If you need to do the previous-step often, you should redesign your schema so that the news articles are stored as embedded documents along with the user documents.
{
"_id": "4ca30373fd0e910ecc000007",
"login": "user22",
"pass": "example_pass",
"date": "2010-09-29",
"news" : [{
"name": "news 333",
"content": "news content 3333",
"date": "2010-09-29"
},
{
"name": "news 222",
"content": "news content 2222",
"date": "2010-09-29"
}]
}
Once you have your data in this format, the query that you are trying to run is implicit. One thing to note, though, is that analytics queries become difficult on such a schema. You will need to use MapReduce for queries such as fetching the most recently added news articles.
In the end, the schema design and how much denormalization your application can handle depend on what kind of queries you expect your application to run.
You may find these links useful.
http://www.mongodb.org/display/DOCS/Schema+Design
http://www.blip.tv/file/3704083
I hope that was helpful.
Forget about joins.
Do a find on your news collection. Apply skip and limit to page the results.
$newscollection->find()->skip(20)->limit(10);
Then loop through the results and collect the user_id values; in this example you would have at most 10 items. Now query users for the user_id values you found.
// replace 1,2,3,4 with the array of user ids you found in the news collection.
$usercollection->find(array('_id' => array('$in' => array(1, 2, 3, 4))));
Then when you print out the news it can display user information from the user collection based on the user_id.
You did 2 queries to the database. No messing around with joins and figuring out field names etc. SIMPLE!!!
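The flow of those two queries can be sketched in plain JavaScript, with arrays standing in for the news and users collections. The helper name `newsPageWithUsers` is an assumption for illustration, and the documents are trimmed copies of the sample data from the question.

```javascript
// Arrays stand in for the news and users collections (fields trimmed).
const news = [
  { _id: "4ca305c2fd0e910ecc000003", name: "news 333", user_id: "4ca30373fd0e910ecc000007" },
  { _id: "4ca305c2fd0e910ecc00000b", name: "news 222", user_id: "4ca30373fd0e910ecc000007" },
  { _id: "4ca305b5fd0e910ecc00000a", name: "news 111", user_id: "4ca30369fd0e910ecc000006" },
];
const users = [
  { _id: "4ca30369fd0e910ecc000006", login: "user11" },
  { _id: "4ca30373fd0e910ecc000007", login: "user22" },
];

// Hypothetical helper: one page of news, each item paired with its author.
function newsPageWithUsers(skip, limit) {
  // Query 1: a find on news with skip and limit.
  const page = news.slice(skip, skip + limit);
  // Collect each user id once, so the second query asks for no duplicates.
  const ids = [...new Set(page.map((n) => n.user_id))];
  // Query 2: a find on users where _id is in the collected ids.
  const found = users.filter((u) => ids.includes(u._id));
  const byId = new Map(found.map((u) => [u._id, u]));
  // Stitch the two results together in the application.
  return page.map((n) => ({ ...n, user: byId.get(n.user_id) || null }));
}

const firstPage = newsPageWithUsers(0, 2);
console.log(firstPage.map((n) => [n.name, n.user.login]));
```

Because the page size bounds the number of ids, the second query stays small no matter how large the users collection grows.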
If you are using MongoDB 3.2 or later, you can get something similar with the $lookup operator.
The drawbacks with using this operator are that it is highly inefficient when run over large result sets and it only supports equality for the match where the equality has to be between a single key from each collection. The other limitation is that the right-collection should be an unsharded collection in the same database as the left-collection.
The following aggregation operation on the news collection joins the documents from news with the documents from the users collection using the fields user_id from the news collection and the _id field from the users collection:
db.news.aggregate([
{
"$lookup": {
"from": "users",
"localField": "user_id",
"foreignField": "_id",
"as": "user_docs"
}
}
])
The equivalent PHP example implementation:
<?php
$m = new MongoClient("localhost");
$c = $m->selectDB("test")->selectCollection("news");
$ops = array(
array(
'$lookup' => array( // single quotes, so PHP does not interpolate $lookup as a variable
"from" => "users",
"localField" => "user_id",
"foreignField" => "_id",
"as" => "user_docs"
)
)
);
$results = $c->aggregate($ops);
var_dump($results);
?>
You might be better off embedding the "news" within the users' documents.
You can't do that in MongoDB. And as of version 3, eval() is deprecated, so you shouldn't use stored procedures either.
The only way I know to achieve a server-side query involving multiple collections right now is to use Node.js or similar. If you try this method, I strongly recommend limiting the IP addresses allowed to access your machine, for security reasons.
Also, if your collections aren't too big, you can avoid inner joins by denormalizing them.