Performance difference when using find+projection vs select - mongodb

Is there a difference or performance penalty when I use db.collection.find('stuff', {projection}) versus db.collection.find('stuff').select({'keyWeWant'})?
I've struggled with the documentation for a while now and can't find anything useful.

I would suggest checking $explain; it shows you the actual query execution plan. I believe that internally they are the same:
https://docs.mongodb.org/manual/reference/operator/meta/explain/

Can you explain which language you are using? The syntax
db.collection.find('stuff').select({'keyWeWant'})
is something I don't recognize.
For your question:
Using a projection will only return the fields included in the projection, so you save network bandwidth. You can take this even further and create a so-called covered index, which holds both the query keys and the projection keys, so your data comes straight from the index. That is a serious performance gain.
Of course, adding too much data to your index will hurt performance on that side, so choose your battles wisely.
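A minimal shell sketch of the covered-query idea, with hypothetical collection and field names:

// Index that covers both the query key (name) and the projected key (email)
db.users.createIndex({ name: 1, email: 1 })

// _id must be excluded from the projection, or the document is fetched anyway
db.users.find({ name: "Alice" }, { email: 1, _id: 0 })

// explain("executionStats") should show an IXSCAN and totalDocsExamined: 0
db.users.find({ name: "Alice" }, { email: 1, _id: 0 }).explain("executionStats")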
Regards
Chris

Related

MongoDB Data Transformation for UI display

We've stumbled into a huge issue and need to figure out the right way to solve it.
We are using MongoDB via Mongoose and dump a lot of different data into it.
We need to build a big aggregation over a few collections, based on certain inputs.
Creating an aggregation doesn't give us good performance times.
We need to find the correct technical solution: in effect an "ETL" that isn't a real ETL, storing a real-time sample of the data so the UI layer can query it smoothly.
Say I have 5 collections and need a real-time display of 3 fields from each; a "join" using aggregation won't give a solution that performs well enough.
We might need a mediator. We considered dumping the data via ETL into Redshift, but that doesn't feel like the right solution.
It seems like a common problem, but we can't find the smooth, correct solution.
We don't mind deploying whatever is needed. A rough sketch of the mediator idea follows.
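To illustrate, the mediator could be a precomputed summary collection, along these lines (collection and field names are hypothetical; $out has been in MongoDB since 2.6):

// Periodically rebuild a UI-facing summary from one of the source collections
db.orders.aggregate([
  { $project: { customerId: 1, total: 1, status: 1 } },
  { $out: "ui_orders_summary" } // atomically replaces the summary collection
])

// The UI layer then reads the precomputed collection with plain indexed finds
db.ui_orders_summary.find({ status: "open" })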
Thanks for any advice.

What does the cutoff parameter stand for in the Sphinx SetLimits method?

What does the cutoff parameter stand for in the Sphinx SetLimits method?
How can I optimize my queries using it?
I understand that when you use the cutoff parameter, Sphinx stops the search once it finds the specified number of records.
But is it useful in standard queries with offset/limit?
Can I gain any efficiency by using it?
I see only one case: when I know the exact count of the sought-for records.
It can be used as such, and in some circumstances it can help performance.
In the general case, if you don't care about accuracy and instead want performance, it might be worth looking into.
But it's tricky to use well; it requires a deep understanding of your result sets and your specific requirements. It's not something that fits into a few sentences in an online forum reply.

Aggregate, Find, Group confusion?

I am building a web-based system for my organization using MongoDB. I have gone through the documentation provided by MongoDB and come to the following conclusions:
find: cannot pull data from a sub-array.
group: does not work in a sharded environment.
aggregate: best for sub-arrays, but has performance issues when the data set is large.
Map Reduce: too risky to write the map and reduce functions.
So, can someone help me out with the best approach for working with sub-array documents in a production environment with a sharded cluster?
Example:
{"testdata": {"studdet": [{"id": …, "name": "xxxx", "marks": 80}, …]}}
My "studdet" array is huge, with more than 1,000 entries per document.
So suppose my query is:
"Find all the names from studdet where marks is greater than 80."
This is definitely going to be an aggregate query, since find cannot do this and group will not work in a sharded environment. So if I go with aggregate, what will the performance impact be? I need to run this query most of the time.
Please have a look at:
http://docs.mongodb.org/manual/core/data-modeling/
and
http://docs.mongodb.org/manual/tutorial/model-embedded-one-to-many-relationships-between-documents/#data-modeling-example-one-to-many
These documents describe the decisions involved in creating a good document schema in MongoDB. That is one of the hardest things to do in MongoDB, and one of the most important, and it will affect your performance.
In your case, a student collection with an embedded array of grades looks to be the best bet:
{ _id: …, grades: [{ type: "test", grade: 80 }, …] }
In general, and given your sample data set, the aggregation framework is the best choice. The aggregation framework is faster than map reduce in most cases (certainly in execution speed; it is C++ for aggregation versus JavaScript for map reduce).
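As a hedged sketch against your sample document shape (collection name assumed), the query from your question could look like:

// Narrow to candidate documents first (can use an index on marks), then
// unwind the embedded array and filter the individual entries
db.students.aggregate([
  { $match: { "testdata.studdet.marks": { $gt: 80 } } },
  { $unwind: "$testdata.studdet" },
  { $match: { "testdata.studdet.marks": { $gt: 80 } } },
  { $project: { _id: 0, name: "$testdata.studdet.name" } }
])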
If your data's working set becomes so large that you have to shard, then aggregation, and everything else, will be slower. Not, however, slower than putting everything on a single machine that suffers a lot of page faults. Generally, you need a working set larger than the RAM available on a modern computer before sharding is the correct way to go, so that you can keep everything in RAM. (At that point a commercial support contract for Mongo will cost less than the hardware, and it includes extensive help with schema design.)
If you need anything else please don’t hesitate to ask.
Best,
Charlie

How to see sort information in system.profile (MongoDB)?

I enabled profiling level 2 (all operations). My idea is to write a script that goes through all the profiled queries, runs explain plans for them, and checks whether any of the queries fail to use indexes. However, that's impossible, as I don't see any sorting information in system.profile. Why is that?
Thanks
UPDATE:
Imagine you have a users collection and have created an index on it: user(name, createdAt). Now I would like to find some users sorted by time. The second part of the query (sorting/pagination) is not saved in system.profile, yet it's very important to know which sort operations were used, as that affects performance and index selection.
So my intention was to create a script that goes through all statements in system.profile, runs explain plans, and checks whether indexes are used. If not, I can automatically catch all new/unreliable queries while running integration tests.
Why did 10gen choose not to include sorting information in system.profile?
If you're interested in the feature, I'd suggest requesting it here (I couldn't find an existing suggestion for it). You might get more detail about why it hasn't been included.
I can see how it could be valuable sometimes, although the usual recommendation is to compare nscanned vs nreturned as a way of knowing whether indexes are being used (and, more importantly, making sure as few documents as possible are scanned).
So, while not perfect, that gives you a quick way of evaluating performance and detecting index usage.
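A hedged shell sketch of that check (nscanned/nreturned are the field names in older profiler output; newer servers report docsExamined instead of nscanned):

// Flag profiled queries that examined far more documents than they returned
db.system.profile.find({ op: "query" }).forEach(function (entry) {
  if (entry.nscanned > 10 * Math.max(entry.nreturned, 1)) {
    print(entry.ns + ": nscanned=" + entry.nscanned + ", nreturned=" + entry.nreturned);
  }
});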
You can use the aggregation framework to sort, e.g. slowest operations first:
db.system.profile.aggregate([{ $sort: { millis: -1 } }])
Sorry guys, I found the answer to my own question: Mongo does record it, as orderby inside the query element of the profile document. My bad :-)

Is MongoDB efficient in doing multi-key lookups?

I'm evaluating MongoDB, coming from Membase/memcached, because I want more flexibility.
Of course, Membase is excellent at fast (multi-)key lookups.
I like the additional options that MongoDB gives me, but is it also fast at multi-key lookups? I've seen the $or and $in operators, and I'm sure I can model the lookup with those. I just want to know whether it's performant, in the same league as Membase.
Use case: e.g., Lucene/Solr returns 20 product ids; look these product ids up in MongoDB to return the docs/appropriate fields.
Thanks,
Geert-Jan
For your use case, I'd say it is, from my experience: I hacked some analytics into a database of mine that made a lot of $in queries with thousands of ids, and to my surprise it worked rather well, in the lower-millisecond range.
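For a lookup like yours, the query itself is a single round trip; a minimal sketch, assuming a hypothetical products collection where the search engine's ids are the primary keys:

// e.g. the 20 product ids returned by Lucene/Solr
var ids = [101, 102, 103];

// One batched point lookup, served by the default _id index
db.products.find({ _id: { $in: ids } }, { name: 1, price: 1 })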
Of course, it's hard to compare, and, as usual, theory is a bad companion when it comes to performance. I guess the best way to figure it out is to migrate some test data and send some queries at the system.
Use MongoDB's excellent built-in profiler, use $explain, keep the one-index-per-query rule in mind, take a look at the logs, keep an eye on mongostat, and do some benchmarks. This shouldn't take too long and will give you a definite answer. If your queries turn out slow, people here and on the news group probably have ideas on how to improve the exact query, or the indexing.
One index per query. It's sometimes thought that queries on multiple keys can use multiple indexes; this is not the case with MongoDB. If you have a query that selects on multiple keys, and you want that query to use an index efficiently, then a compound-key index is necessary.
http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-Oneindexperquery
There's more information on that page as well with regard to Indexes.
The bottom line is that Mongo will be great if your indexes fit in memory and you index the fields you query on, using compound keys. With poor indexing your performance will suffer as a result, which is pretty much in line with most systems.
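A minimal sketch of that compound-key rule, with hypothetical collection and field names:

// A query selecting on two keys...
db.products.find({ category: "books", inStock: true })

// ...is served efficiently by one compound index, not by two single-key indexes
db.products.createIndex({ category: 1, inStock: 1 })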