I'm currently evaluating MongoDB and comparing it to OracleDB. To compare them, I measure query performance on the same dataset in both database environments.
I tried to measure the performance of the count() function in MongoDB but couldn't get it to work.
This is what my MongoDB count query looks like at the moment:
db.test2.find({"Interpret": "Apashe"}).count();
It works fine, but how can I measure the time it took MongoDB to perform this? I tried the usual
explain("executionStats")
but it doesn't seem to work that way with count.
Any help would be appreciated.
Without a query filter, count is estimated and comes out of collection metadata, so benchmarking it against Oracle doesn't seem very useful.
You can try benchmarking countDocuments, which should provide a meaningful signal. Though I am also confused about why you decided to benchmark counts: a more sensible starting point would be finds, and once you understand how counts are implemented, you can benchmark counts and get a useful signal.
I think according to the documentation here:
https://docs.mongodb.com/manual/reference/method/cursor.explain/#mongodb-method-cursor.explain
db.collection.count() is equivalent to the db.collection.find(query).count() construct, so essentially you can measure the find query, e.g. with db.test2.find({"Interpret": "Apashe"}).explain("executionStats").
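A rough way to time the count from the client side is to wrap the call in a small timing function. The sketch below uses a hypothetical helper, timeIt (not a mongosh built-in), that runs a function several times and reports the median elapsed time. Note that it measures client-side wall-clock time, including the network round trip, whereas explain("executionStats") reports server-side execution time; the shell also accepts the explain-first form db.test2.explain("executionStats").count({"Interpret": "Apashe"}).

```javascript
// Hypothetical helper (not part of mongosh): run `fn` several times and
// report the median wall-clock time in milliseconds, so a single cold-cache
// run does not dominate. Date.now() is used because it is available in both
// Node.js and mongosh (1 ms resolution is enough for database calls).
function timeIt(fn, runs = 5) {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError("runs must be a positive integer");
  }
  const samples = [];
  let result;
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    result = fn(); // keep the last result so the caller can check it
    samples.push(Date.now() - start);
  }
  // Median of the sorted samples (mean of the two middle values if even).
  samples.sort((a, b) => a - b);
  const mid = Math.floor(runs / 2);
  const medianMs =
    runs % 2 === 1 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2;
  return { result, medianMs, samples };
}
```

In mongosh it could be called as timeIt(() => db.test2.countDocuments({ Interpret: "Apashe" }), 10). Taking the median of several runs, rather than trusting one run, reduces the influence of cold caches on the comparison.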
The official MongoDB driver offers a 'count' and an 'estimated document count' API. As far as I know, the former is highly memory-intensive, so it's recommended to use the latter where possible.
But how accurate is this estimated document count? Can the count be trusted in a production environment, or is the count API recommended when absolute accuracy is needed?
Comparing the two, to me it's very difficult to conjure up a scenario in which you'd want to use countDocuments() when estimatedDocumentCount() was an option.
That is, the equivalent form of estimatedDocumentCount() is countDocuments({}), i.e., an empty query filter. The cost of the first function is O(1); the second is O(N), and if N is very large, the cost will be prohibitive.
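The O(1) versus O(N) difference can be illustrated with a toy model. This is a hypothetical in-memory class, not MongoDB's actual implementation: it keeps a document counter in its "metadata" that is updated on every insert and delete, so the estimate is a single read, while the exact count has to visit every document.

```javascript
// Toy model (not MongoDB's real implementation): a collection that keeps a
// document counter in its metadata, updated on every insert and delete.
class ToyCollection {
  constructor() {
    this.docs = [];
    this.meta = { count: 0 };
  }
  insertOne(doc) {
    this.docs.push(doc);
    this.meta.count += 1;
  }
  deleteOne(predicate) {
    const i = this.docs.findIndex(predicate);
    if (i !== -1) {
      this.docs.splice(i, 1);
      this.meta.count -= 1;
    }
  }
  // O(1): read the stored counter; no documents are touched.
  estimatedDocumentCount() {
    return this.meta.count;
  }
  // O(N): visit every document and count the ones matching the predicate.
  countDocuments(predicate = () => true) {
    let n = 0;
    for (const d of this.docs) if (predicate(d)) n += 1;
    return n;
  }
}
```

The toy also shows why only the scanning method can take a filter: the stored counter knows the total, not how many documents match an arbitrary query.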
Both return a count which, in a live deployment, is likely to be ephemeral: it may be out of date the moment you have it, because the collection keeps changing.
Please review the MongoDB documentation for estimatedDocumentCount(). Specifically, they note that "After an unclean shutdown of a mongod using the WiredTiger storage engine, count statistics reported by db.collection.estimatedDocumentCount() may be inaccurate." This is because the count comes from metadata rather than from the documents themselves. The amount of drift depends on the writes performed since the last checkpoint, and checkpoints usually occur every 60 seconds or so; running validate on the collection restores the statistics.
In contrast, the MongoDB documentation for countDocuments() states that this method wraps an aggregation: a $match stage on your query followed by a $group stage that uses $sum to count the matching documents, so the count is exact.
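To make the wrapped aggregation concrete, here is a small sketch. The first function returns the pipeline shape the MongoDB documentation gives for countDocuments(query); the second is a hypothetical in-memory emulation of that pipeline, limited (as an assumption, for brevity) to top-level equality filters such as the {"Interpret": "Apashe"} filter from the first question.

```javascript
// The pipeline that countDocuments(query) wraps, as given in the MongoDB docs.
function countDocumentsPipeline(query) {
  return [{ $match: query }, { $group: { _id: null, n: { $sum: 1 } } }];
}

// Hypothetical in-memory emulation of that pipeline. Simplifying assumption:
// only top-level equality filters like { Interpret: "Apashe" } are supported,
// whereas a real $match accepts the full query language.
function runCountPipeline(docs, query) {
  // $match: keep documents whose fields equal every value in the filter.
  const matched = docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
  // $group with _id: null folds all matches into one document, adding 1 per
  // document ($sum: 1). With no input documents, $group emits nothing.
  const grouped =
    matched.length === 0
      ? []
      : [{ _id: null, n: matched.reduce((n) => n + 1, 0) }];
  // countDocuments returns n, or 0 when the pipeline produced no document.
  return grouped.length === 0 ? 0 : grouped[0].n;
}
```

Because every matching document passes through $match, the cost grows with the number of matches (or with the collection size when no index supports the filter), which is the O(N) behaviour described above.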
Thus, if absolute accuracy is essential, use countDocuments(). If all you need is a rough estimate, use estimatedDocumentCount(). The names are accurate to their purpose and should be used accordingly.
The main difference is filtering.
count_documents accepts a filter like a normal query, whereas estimated_document_count does not.
If filtering is not part of your use case, I would use estimated_document_count, since it is much faster.
I have a graph database with 1,000,000 vertices.
When I run this query:
It works well, but look at this:
The OR should not affect the performance but it does.
Why?
What is happening here is that the second query is doing a full scan and matching all the records against the two conditions.
Try executing an EXPLAIN of both queries and compare the results.
We are working hard to make this much better in v3.0.
I am re-asking this question because I thought it should be on a separate thread from this one: in-mongodb-know-index-of-array-element-matched-with-in-operator.
I am using MongoDB, and so far I have written all of my queries as simple queries (find, update, etc.), with no aggregations. Then I read many SO posts; see this one, for example: mongodb-aggregation-match-vs-find-speed. I wondered why I should increase computation time on my server, since computing more there increases its load, so I tried using aggregations and thought I was going in the right direction. But later, on my previous question, andreas-limoli advised me not to use aggregations because they are slow, and to use simple queries and do the computation on the server instead. Now I am literally in a dilemma about which to use. I have been working with MongoDB for a year, but I don't know how its performance behaves as data size increases, so I really don't know which approach to pick.
One more thing I couldn't find anywhere: if aggregation is slower, is it because of $lookup? $lookup is the main reason I considered aggregation, because otherwise I have to execute many queries serially and then compute on the server, which seems very poor compared to aggregation.
I also read about the 100 MB memory limit on aggregation pipeline stages. How do people handle that case efficiently? And if they enable disk use (allowDiskUse), which slows everything down, how do they cope with that?
I also took a sample collection of 30,000 documents and ran an aggregation with $match and a find query. The aggregation was a little faster: it took 180 ms to execute, whereas the find took 220 ms.
Please help me out; it would be really helpful.
Aggregation pipelines are costly queries. As your data grows, they can hurt performance because of their CPU and memory usage. If you can achieve the same result with a find query, go for it, because aggregation becomes costlier as the data in the database grows.
The aggregation framework in MongoDB is similar to join operations in SQL. Aggregation pipelines are generally resource-intensive operations, so if simple queries are enough for your work, use them in the first place.
However, if it is absolutely necessary, for example when you need to fetch data from multiple collections, you can use aggregation pipelines.
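For the multi-collection case, the point of $lookup is that the join happens inside the database in one request, instead of the application issuing one query per document. The sketch below is a hypothetical in-memory emulation of a { $lookup: { from, localField, foreignField, as } } stage, intended only to show its left-outer-join semantics; it assumes scalar field values (a real $lookup also matches elements of array fields and treats missing fields as null), and the hash map is a choice of this sketch, not a description of how the server executes the join.

```javascript
// Hypothetical in-memory sketch of a $lookup stage:
// { $lookup: { from, localField, foreignField, as } }
// Assumes scalar (non-array) field values; a real $lookup also matches array
// elements and treats missing fields as null.
function lookup(input, from, { localField, foreignField, as }) {
  // In this sketch, group the foreign documents by foreignField once,
  // so each input document is joined with a single map lookup.
  const index = new Map();
  for (const doc of from) {
    const key = doc[foreignField];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  // Left outer join: every input document is kept; unmatched ones get [].
  return input.map((doc) => ({
    ...doc,
    [as]: index.get(doc[localField]) || [],
  }));
}
```

Each output document carries an array of matches under the `as` field, which is why a $lookup is often followed by $unwind when one joined document per row is wanted.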
I'm using Mongoose in my project. As the number of documents in my collection grows, find+sort becomes slower, so I use aggregate+$sort instead. I just wonder why aggregate is faster.
Without seeing your data and your query, it is difficult to answer why aggregate+sort is faster than find+sort.
But the following points hold for both find and aggregate:
Well-indexed data (indexes that suit your query) will yield faster results for your find query.
The more stages and operations you use in your aggregation pipeline, the longer your aggregate query takes to execute.
With an aggregation pipeline you can compute new fields, such as sums and averages across documents, which is not possible with a find.
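As an illustration of fields that only an aggregation can compute, here is a hypothetical in-memory sketch of a $group stage such as { $group: { _id: "$shop", total: { $sum: "$amount" }, avg: { $avg: "$amount" } } }. The field names shop and amount are made up for the example; the sketch follows the documented behaviour that $sum and $avg ignore non-numeric values, and that $avg yields null when a group has no numeric values.

```javascript
// Hypothetical in-memory sketch of
// { $group: { _id: "$<keyField>", total: { $sum: "$<valueField>" },
//             avg: { $avg: "$<valueField>" } } }
// Like the real operators, non-numeric values are ignored.
function groupSumAvg(docs, keyField, valueField) {
  const acc = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    const entry = acc.get(key) || { _id: key, total: 0, count: 0 };
    const v = doc[valueField];
    if (typeof v === "number") {
      entry.total += v;
      entry.count += 1;
    }
    acc.set(key, entry);
  }
  // One output document per group; avg is null if no numeric values were seen.
  return [...acc.values()].map(({ _id, total, count }) => ({
    _id,
    total,
    avg: count === 0 ? null : total / count,
  }));
}
```

A find can only return (projections of) the stored documents one by one, so values that combine several documents, like these per-group totals, have to come from an aggregation or be computed in application code.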
See this thread for more info:
MongoDB {aggregation $match} vs {find} speed
I'm evaluating MongoDB, coming from Membase/memcached, because I want more flexibility.
Of course, Membase is excellent at doing fast (multi-)key lookups.
I like the additional options that MongoDB gives me, but is it also fast at multi-key lookups? I've seen the $or and $in operators, and I'm sure I can model it with those. I just want to know if it's as performant as (in the same league as) Membase.
Use case, e.g.: Lucene/Solr returns 20 product IDs; look up these product IDs in Couchdb to return the docs/appropriate fields.
Thanks,
Geert-Jan
For your use case, I'd say it is, from my experience: I hacked some analytics into a database of mine that made a lot of $in queries with thousands of IDs each. To my surprise, it worked rather well, in the low-millisecond range.
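One practical detail for the Lucene/Solr use case: MongoDB does not guarantee that the documents from a { _id: { $in: ids } } query come back in the order of the ids array, so the relevance ranking has to be restored on the client. A minimal sketch, assuming a hypothetical helper named orderByIds and a hypothetical products collection:

```javascript
// Hypothetical helper: reorder documents fetched with { _id: { $in: ids } }
// to match the order of `ids` (e.g. the ranking returned by Lucene/Solr).
// Ids with no matching document are skipped.
// String() gives ObjectId values a comparable key (a Map compares objects by
// reference); this assumes ids don't mix types such as 1 and "1".
function orderByIds(ids, docs, idField = "_id") {
  const byId = new Map(docs.map((d) => [String(d[idField]), d]));
  return ids
    .map((id) => byId.get(String(id)))
    .filter((d) => d !== undefined);
}
```

In mongosh this could be used as orderByIds(ids, db.products.find({ _id: { $in: ids } }).toArray()). Building the map once keeps the reordering linear in the number of IDs, which is negligible next to the query itself for 20 products.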
Of course, it's hard to compare this, and, as usual, theory is a bad companion when it comes to performance. I guess the best way to figure it out is to migrate some test data and send some queries to the system.
Use MongoDB's excellent built-in profiler, use $explain, keep the one-index-per-query rule in mind, take a look at the logs, keep an eye on mongostat, and do some benchmarks. This shouldn't take too long and should give you a definite answer. If your queries turn out to be slow, people here and on the newsgroup probably have some ideas on how to improve the exact query or the indexing.
One index per query. It's sometimes thought that queries on multiple keys can use multiple indexes; this is not the case with MongoDB. If you have a query that selects on multiple keys, and you want that query to use an index efficiently, then a compound-key index is necessary.
http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-Oneindexperquery.
There's more information on that page about indexes as well.
The bottom line is that Mongo will be great if your indexes fit in memory and you index the fields you want to query on, using compound keys. If your indexing is poor, your performance will suffer as a result. This is pretty much in line with most systems.