We have a social app where users can chat with each other and we’ve reached 350K messages!
We recently noticed that as the number of messages grows, find operations are getting slower. I believe the issue is that the Message collection is not indexed.
That’s what I want to do now! I found this piece of code at the MongoDB docs:
db.comments.createIndex({discussion_id: 1})
This is my Message collection:
{
chatRoom: <Pointer>,
user: <Pointer>,
text: <String>,
isSeen: <Bool>
}
So I guess this is all I have to do:
db.Message.createIndex({chatRoom: 1})
Is that just it? Run this command and I’m all set? All existing and future messages will be indexed after that?
Your index actually should depend on what your query looks like. Suppose your message query looks like this:
var query = new Parse.Query("Message");
query.equalTo("chatRoom", aChatRoom);
query.equalTo("user", someUser);
query.equalTo("isSeen", false);
query.descending("createdAt");
query.find().then(function(results){//whatever});
Then you would need to build an index on the Message collection specifically for this query. In this case:
db.Message.createIndex({_p_chatRoom:1, _p_user:1, isSeen: -1, _created_at: -1})
Alternatively, an index on just the chat room will still perform much better than no index at all:
db.Message.createIndex({_p_chatRoom:1})
To really understand which indexes to build, you'll need to do some reading on the Mongo docs https://docs.mongodb.com/manual/reference/method/db.collection.createIndex/#db.collection.createIndex
I personally use mLab for my Parse MongoDB because I'm not very knowledgeable about databases. They have a slow-query analyzer that recommends indexes based on the common queries in your application, so if you don't want to learn the finer points of MongoDB indexing, mLab is a great place to start.
Related
In an SQL database, if I wanted to access some sort of nested data, such as a list of tags or categories for each item in a table, I'd have to use some obscure form of joining in order to send the SQL query once and then only loop through the result cursor.
My question is, in a NoSQL database such as MongoDB, is it OK to query the database repeatedly such that I can do the previous task as follows:
cursor = query for all items
for each item in cursor do
tags = query for item's tags
I know that I can store the tags in an array in the item's document, but I'm assuming that it is somehow not possible to store everything inside the same document. If that is the case, would it be expensive to requery the database repeatedly or is it designed to be used that way?
No. Neither in Mongo nor in any other database should you query the database in a loop. One good reason is performance: in most web apps the database is the bottleneck, and developers try to make as few database calls as possible, whereas here you would be making as many as possible.
In mongo you can do what you want in many ways. Some of them are:
putting your tags inside the document {itemName : 'item', tags : [1, 2, 3]}
knowing the list of elements, you do not need a loop to find information about them. You can fetch all results in one query with $in : db.tags.find({ field: { $in: [<value1>, <value2>, ... <valueN> ] }})
You should always try to fulfill a request with as few queries as possible. Keep in mind that each query, even when the database can answer it entirely from cache, requires a network roundtrip between application server, database and back.
Even when you assume that both servers are in the same datacenter and only have a latency of microseconds, these latency times will add up when you query for a large number of documents.
Relational databases solve this issue with the JOIN command. But unfortunately MongoDB has no support for joins. For that reason you should try to build your documents in a way that the most common queries can be answered by a single document. That means you should denormalize your data. When you have a 1:n relation, consider embedding the referencing documents as an array in the main document. Redundancy in your data is usually far more acceptable in MongoDB than it is in relational databases.
When you still have good reasons to keep the child-documents as separate documents, you should use a query with the $in operator to query them all at once, as Salvador Dali suggested in his answer.
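To make the round-trip difference concrete, here is a minimal Node sketch with in-memory stand-ins for the collections (all names and data are hypothetical; against a real server the batched version would be a single db.tags.find({_id: {$in: ids}}) call):

```javascript
// Hypothetical in-memory stand-ins for a tags collection and an items collection.
const tagsCollection = new Map([[1, 'mongodb'], [2, 'nosql'], [3, 'indexing']]);
const items = [
  { name: 'item A', tagIds: [1, 2] },
  { name: 'item B', tagIds: [2, 3] },
];

// N+1 pattern: one "query" per item, as in the loop from the question.
function fetchTagsInLoop() {
  let trips = 0;
  const result = items.map(item => {
    trips += 1; // each iteration would be a separate network round trip
    return item.tagIds.map(id => tagsCollection.get(id));
  });
  return { result, trips };
}

// Batched pattern: gather all ids first, then a single $in-style lookup.
function fetchTagsWithIn() {
  const trips = 1; // one query: db.tags.find({ _id: { $in: allIds } })
  const allIds = [...new Set(items.flatMap(i => i.tagIds))];
  const found = new Map(allIds.map(id => [id, tagsCollection.get(id)]));
  const result = items.map(item => item.tagIds.map(id => found.get(id)));
  return { result, trips };
}
```

Both functions produce the same tag lists, but the loop version costs one round trip per item while the $in version always costs one.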
How does Meteor handle DB indexing? I've read that there are no indexes at this time, but I'm particularly concerned about very large data sets joined with multiple lookups, etc., which will really impact performance. Are these issues taken care of by Mongo and Meteor?
I am coming from a Rails/PostgreSQL background and am about 2 days into Meteor and Mongo.
Thanks.
Meteor does expose a method for creating indexes, which maps to the mongo method db.collection.ensureIndex.
You can access it on each Meteor.Collection instance, on the server. For example:
if (Meteor.isServer){
var myCollection = new Meteor.Collection("dummy");
// create an index on 'dummy', field1 & field2
myCollection._ensureIndex({field1: 1, field2: 1});
}
From a performance POV, create indexes based on what you publish, but avoid over-indexing.
With oplog tailing, the initial query runs only occasionally, and subsequent changes arrive via the oplog.
Without oplog tailing, Meteor re-runs the query every 10 seconds, so better indexes give a large gain.
Got a response from the Discover Meteor book folks:
Sacha Greif Mod − Actually, we are in the process of writing a new
sidebar to address migrations. You'll have access to it for free if
you're on the Full or Premium packages :)
Regarding indexes, I think we might address that in an upcoming blog
post :)
Thanks much for the reply. I'm looking forward to both.
I am using the free text search of mongo 2.4 with pymongo.
What I want is the number of documents containing some text. In the mongo shell, increasing the limit is a good workaround, but from Python it gets very slow since all the documents have to be sent over. As an indication, the query is ~50 times slower in pymongo than in the mongo shell.
I use a command similar to this:
>>>res=db.command('text','mytable',search='eden',limit=100000)
>>>numfound = res['stats']['nfound']
But as I said, since all the documents are returned, it is really slow. Is there a command to specify that you don't need the documents, just the stats?
And what is the list of all available options?
thx,
colin
I couldn't find a server ticket for this feature, so please add a feature request at jira.mongodb.org; then you'll get updates and feedback from the core server developers.
You can project when doing a text query, which reduces the amount sent over the wire, but it still sends some information, e.g.:
db.mytable.runCommand( "text", { search: "eden", project: {_id: 0, b: 1}})
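To illustrate what the projection buys you, here is a small sketch with plain objects standing in for the returned documents (field names and data are hypothetical; on a real server the projection is applied before the documents are sent over the wire):

```javascript
// Hypothetical matching documents, as the text command might return them.
const results = [
  { _id: 1, b: 'eden gardens', longBody: 'x'.repeat(1000) },
  { _id: 2, b: 'east of eden', longBody: 'y'.repeat(1000) },
];

// Apply a projection like {_id: 0, b: 1}: keep only the fields marked 1.
function project(docs, fields) {
  return docs.map(doc =>
    Object.fromEntries(Object.entries(doc).filter(([key]) => fields[key] === 1))
  );
}

// Same number of results, but each document shrinks to just the projected field.
const slim = project(results, { _id: 0, b: 1 });
```

The result count (and the stats) stays the same, while far fewer bytes cross the wire.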
I am really new to programming but I am studying it. I have a problem which I don't know how to solve.
I have a collection of docs in MongoDB and I'm using Elasticsearch to query the fields. The problem is that I want to store the output of the search back in MongoDB, but in a different DB. I know that I have to create a temporary DB which has to be updated with every search result. But how do I do this? Or point me to documentation I could read to learn it. I will really appreciate your help!
Mongo does not natively support "temp" collections.
A typical thing to do here is to not actually write the entire results output to another DB, since that would be utterly pointless: Elasticsearch does its own caching, so you don't need any layer over the top.
Also, due to IO concerns it is normally a bad idea to write, say, a result set of 10k records to Mongo or another DB.
There is a feature request for what you talk of: https://jira.mongodb.org/browse/SERVER-3215 but no planning as of yet.
Example
You could have a table of results.
Within this table you would have a doc that looks like:
{keywords: ['bok', 'mongodb']}
Each time you search, as you scroll through the result items, you would write a row to this table, populating the keywords field with keywords from that search result (per search result, per search result list, per search). It would probably be best to just stream each search result to MongoDB as it comes in. I have never programmed Python (though I wish to learn), so an example in pseudo:
var elastic_results = [/* documents returned by Elasticsearch */];
elastic_results.forEach(function (result) {
    // split down the phrases in this result and make a keywords array,
    // then lazily insert it; no need to batch or shrink the data, just stream it in
    db.results_collection.insert(array_formed_from_splitting_down_result);
});
So as you go along your results you basically just mass-insert as fast as possible, creating a sort of "stream" of input to MongoDB. It can handle this quite well.
This should then give you a shardable list of words and language verbs to run things like MapReduce jobs on, and to aggregate statistics about.
Without knowing more and more about your scenario this is pretty much my best answer.
This does not use the temp table concept but instead makes your data permanent which is fine by the sounds of it since you wish to use Mongo as a storage engine for further tasks.
Actually there is a MongoDB river plugin to work with Elasticsearch...
db.your_table.find().forEach(function(doc) { db.another_table.insert(doc); });
This question already has answers here:
How do I perform the SQL Join equivalent in MongoDB?
(19 answers)
Closed 6 years ago.
I'm sure MongoDB doesn't officially support "joins". What does this mean?
Does this mean "We cannot connect two collections(tables) together."?
I think if we put the value of _id from collection A into other_id in collection B, we can simply connect the two collections, can't we?
If my understanding is correct, MongoDB can connect two tables together, say, when we run a query. This is done by "Reference" written in http://www.mongodb.org/display/DOCS/Schema+Design.
Then what does "joins" really mean?
I'd love to know the answer because this is essential to learn MongoDB schema design. http://www.mongodb.org/display/DOCS/Schema+Design
It's not a join, since the relationship will only be evaluated when needed. A join (in a SQL database), on the other hand, resolves relationships and returns the rows as if they were a single table (you "join two tables into one").
You can read more about DBRef here:
http://docs.mongodb.org/manual/applications/database-references/
There are two possible solutions for resolving references. One is to do it manually, as you have almost described. Just save a document's _id in another document's other_id, then write your own function to resolve the relationship. The other solution is to use DBRefs as described on the manual page above, which will make MongoDB resolve the relationship client-side on demand. Which solution you choose does not matter so much because both methods will resolve the relationship client-side (note that a SQL database resolves joins on the server-side).
The database does not do joins -- or automatic "linking" between documents. However you can do it yourself client side. If you need to do 2, that is ok, but if you had to do 2000, the number of client/server turnarounds would make the operation slow.
In MongoDB a common pattern is embedding. In relational when normalizing things get broken into parts. Often in mongo these pieces end up being a single document, so no join is needed anyway. But when one is needed, one does it client-side.
Consider the classic ORDER, ORDER-LINEITEM example. One order and 8 line items are 9 rows in relational; in MongoDB we would typically just model this as a single BSON document which is an order with an array of embedded line items. So in that case, the join issue does not arise. However the order would have a CUSTOMER which probably is a separate collection - the client could read the cust_id from the order document, and then go fetch it as needed separately.
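A minimal sketch of that model, with in-memory stand-ins for the collections (names and fields are hypothetical): the line items come for free with the order document, and the customer is fetched client-side as a second lookup.

```javascript
// Hypothetical in-memory stand-in for a customers collection.
const customers = new Map([
  ['cust42', { _id: 'cust42', name: 'Acme Corp' }],
]);

// One order document with its line items embedded: no join needed to read them.
const order = {
  _id: 'order1',
  cust_id: 'cust42',           // reference into the separate customers collection
  lineItems: [                 // embedded 1:n data, all in one BSON document
    { sku: 'A-100', qty: 2, price: 9.99 },
    { sku: 'B-200', qty: 1, price: 24.5 },
  ],
};

// Reading the line items is a single-document fetch.
function getLineItems(o) {
  return o.lineItems;
}

// The "join" to customer happens client-side as a second lookup,
// e.g. db.customers.findOne({_id: order.cust_id}) against a real server.
function getCustomer(o) {
  return customers.get(o.cust_id);
}
```

The embedded array removes the join for the common case, and only the cross-collection reference (customer) costs an extra query.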
There are some videos and slides for schema design talks on the mongodb.org web site, I believe.
One kind of "join" of a query in MongoDB: ask one collection for the ids that match, put the ids in a list (idlist), and then do a find on another (or the same) collection using $in: idlist.
u = db.friends.find({"friends": something }).toArray()
idlist= []
u.forEach(function(myDoc) { idlist.push(myDoc.id ); } )
db.family.find({"id": {$in : idlist} } )
The first example you link to shows how MongoDB references behave much like lazy loading not like a join. There isn't a query there that's happening on both collections, rather you query one and then you lookup items from another collection by reference.
The fact that MongoDB is not relational has led some people to consider it useless.
I think that you should know what you are doing before designing a DB. If you choose to use a NoSQL DB such as MongoDB, you had better implement a schema. This will make your collections more or less resemble tables in SQL databases. Also, avoid denormalization (embedding) unless it is necessary for efficiency reasons.
If you want to design your own noSQL database, I suggest to have a look on Firebase documentation. If you understand how they organize the data for their service, you can easily design a similar pattern for yours.
As others pointed out, you will have to do the joins client-side, except with Meteor (a JavaScript framework), where you can do your joins server-side with this package (I don't know of another framework which enables you to do so). However, I suggest you read this article before deciding to go with this choice.
Edit 28.04.17:
Recently Firebase published this excellent series on designing NoSQL databases. In one of the episodes they also highlighted the reasons to avoid joins and how to get around such scenarios by denormalizing your database.
If you use Mongoose, you can just use the following (assuming you're using subdocuments and population):
Profile.findById profileId
.select 'friends'
.exec (err, profile) ->
if err or not profile
handleError err, profile, res
else
Status.find { profile: { $in: profile.friends } }, (err, statuses) ->
if err
handleErr err, statuses, res
else
res.json createJSON statuses
It retrieves the Statuses which belong to one of the Profile's (profileId) friends. Friends is an array of references to other Profiles. The Profile schema with friends defined:
schema = new mongoose.Schema
# ...
friends: [
type: mongoose.Schema.Types.ObjectId
ref: 'Profile'
unique: true
index: true
]
I came across a lot of posts searching for the same thing, "MongoDB joins", and alternatives or equivalents, so my answer should help many others like me. This is the answer I was looking for.
I am using Mongoose with the Express framework. There is a feature called population that takes the place of joins.
As mentioned in the Mongoose docs:
There are no joins in MongoDB but sometimes we still want references to documents in other collections. This is where population comes in.
This StackOverflow answer shows a simple example on how to use it.
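As a rough illustration of what population boils down to (in-memory stand-ins, hypothetical names): Mongoose runs a second query for the referenced ids and splices the matching documents back into the results, client-side.

```javascript
// Hypothetical in-memory stand-ins for a Profile and a Status collection.
const profiles = [
  { _id: 'p1', name: 'Ada' },
  { _id: 'p2', name: 'Grace' },
];
const statuses = [
  { _id: 's1', profile: 'p1', text: 'hello' },
  { _id: 's2', profile: 'p2', text: 'world' },
];

// What populate() effectively does: collect the referenced ids, fetch them
// in one query (Profile.find({_id: {$in: ids}}) on a real server), and
// replace each reference with the full document.
function populateProfiles(docs) {
  const ids = [...new Set(docs.map(d => d.profile))];
  const byId = new Map(
    profiles.filter(p => ids.includes(p._id)).map(p => [p._id, p])
  );
  return docs.map(d => ({ ...d, profile: byId.get(d.profile) }));
}
```

In real Mongoose this is simply Status.find().populate('profile'); the sketch just shows that it is two queries plus client-side splicing, not a server-side join.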