What does the cutoff parameter stand for in Sphinx's SetLimits method?

How can I optimize my queries using the cutoff parameter?
I understand that when the cutoff parameter is set, Sphinx stops the search once it has found the specified number of records.
But is it useful in standard queries with offset/limit?
Can I gain anything in efficiency by using it?
I can see only one case where it clearly applies: when I know the exact count of sought-for records.

It can be used as such, and in some circumstances it can help performance.
In the general case, if you don't care about accuracy and want performance instead, it may be worth looking into.
But it's tricky to use well: it takes a deep understanding of your results and your specific requirements, more than can be covered in a few sentences in an online forum reply.
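To make it slightly more concrete, here is a rough sketch with the legacy Python sphinxapi client; the server details, index name, and the cutoff value of 1000 are illustrative assumptions, not recommendations:

    import sphinxapi

    cl = sphinxapi.SphinxClient()
    cl.SetServer("localhost", 9312)

    # SetLimits(offset, limit, max_matches, cutoff): cutoff=1000 tells
    # searchd to stop scanning the index once it has found 1000 matching
    # documents, trading completeness (and an exact total count) for speed.
    cl.SetLimits(0, 20, 1000, 1000)

    res = cl.Query("some keywords", "myindex")
    if res:
        print(res["total_found"], [m["id"] for m in res["matches"]])

Because searchd stops collecting matches early, total_found becomes approximate, and paging past the cutoff is impossible; that lost accuracy is exactly the trade-off described above.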

Related

MongoDB aggregate for Dashboard

I want to show data from MongoDB on a dashboard. I implemented it by applying aggregation.
I am constantly receiving the "Query Targeting: Scanned Objects / Returned has gone above 1000" alert. How do I solve this alert? The methods I thought of are as follows.
Remove the aggregation from the dashboard: if the aggregated data is needed, send a query at that moment to obtain it.
Split the aggregation and issue the queries from business logic: divide the data obtained at once through the aggregation into multiple queries and then combine the results.
If there is a better way, or a common approach to this, I would like to know it.
I am constantly receiving the "Query Targeting: Scanned Objects / Returned has gone above 1000" alert. How do I solve this alert?
What, specifically, are you trying to solve here?
The Query Targeting metric (and associated alert) provides general information regarding the efficiency of the workload against the cluster. It can help identify potential problems, most notably when relevant indexes are missing. Some more information about the metric, and the actions you can take in response, is described here.
That said, the metric itself is not perfect. The fact that the targeting ratio is high enough to trigger an alert does not necessarily mean that there is a problem or that any particular action needs to be taken. Particularly notable here is that aggregation operations can cause misleading targeting ratios depending on what types of transformations the pipeline is applying. So the existence of the alert indicates there may be some improvements that could be pursued, but it does not guarantee that there are. You can certainly take a look at the workload using the strategies described in that documentation to determine if any actions like index creation are needed in your specific situation.
The two approaches that you mention in the question could be considered, but they don't really address the alert itself. Certainly, if these are heavy aggregations that aren't needed for the application to function, then there may be good reason to reduce their frequency. But if they are needed by the application and are structured to be reasonably efficient, then I would not recommend making drastic adjustments just to avoid triggering the alert. Rather, it may be that the default query targeting alert threshold is too low for your particular use case and workload, and you might consider raising it instead.
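If you do want to inspect a specific dashboard aggregation, comparing documents examined against documents returned in its explain output is a reasonable first step. A minimal pymongo sketch, where the database, collection, pipeline, and field names are all hypothetical:

    from pymongo import MongoClient, ASCENDING

    client = MongoClient()      # assumes a default local connection
    db = client["dashboard"]    # hypothetical database name

    # executionStats for a hypothetical dashboard pipeline; compare
    # totalDocsExamined with nReturned in the output to see how targeted it is.
    plan = db.command(
        "explain",
        {"aggregate": "orders",
         "pipeline": [{"$match": {"status": "open"}},
                      {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}}],
         "cursor": {}},
        verbosity="executionStats",
    )

    # If the $match stage examines far more documents than it returns, an
    # index on the matched field usually brings the targeting ratio back down:
    db["orders"].create_index([("status", ASCENDING)])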

Measuring the performance of a count query in MongoDB

I'm currently evaluating MongoDB and comparing it to OracleDB. In order to compare them, I measure the performance of the same dataset in both database environments.
I tried to measure the performance of the count() function in MongoDB but couldn't seem to make it work.
This is what my MongoDB count query looks like at the moment:
db.test2.find({"Interpret": "Apashe"}).count();
It works fine, but how can I measure the time it took MongoDB to perform this? I tried the usual
explain("executionStats")
but it doesn't seem to work that way with count.
Any help would be appreciated
count is estimated and comes out of the collection stats. Benchmarking it against Oracle doesn't seem very useful.
You can try benchmarking countDocuments instead, which should provide a meaningful signal. Though I am also confused why you decided to benchmark counts; a more sensible starting point would be finds, and once you understand how counts are implemented, you can benchmark counts and get some useful signal.
According to the documentation here:
https://docs.mongodb.com/manual/reference/method/cursor.explain/#mongodb-method-cursor.explain
count() is equivalent to the db.collection.find(query).count() construct. So essentially you can measure the find query.
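If you want an actual timing rather than just the plan, here is a minimal sketch from a driver (pymongo here; the database name is an assumption, the collection and filter are from the question):

    import time
    from pymongo import MongoClient

    coll = MongoClient()["test"]["test2"]   # database name is an assumption

    # count_documents() runs a real counting aggregation on the server,
    # so wall-clock timing it is meaningful, unlike the metadata-based count.
    start = time.perf_counter()
    n = coll.count_documents({"Interpret": "Apashe"})
    print(n, "found in", (time.perf_counter() - start) * 1000, "ms")

    # The equivalent find can be explained for executionStats:
    plan = coll.database.command(
        "explain",
        {"find": "test2", "filter": {"Interpret": "Apashe"}},
        verbosity="executionStats",
    )
    print(plan["executionStats"]["executionTimeMillis"])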

Alternative to Boolean OR in Sphinx?

I have/had a MySQL query that was pretty fast using IN, e.g.
FieldA IN (X,Y,Z)
I've moved over to Sphinx, which is clearly much faster EXCEPT when using pipes in cases like this, e.g.
#(FieldA) (X|Y|Z)
where X|Y|Z are actually about 40 different values. The MySQL IN takes 0.3 seconds; the Sphinx query takes over a minute. Given how much faster Sphinx has proven to be, I am wondering if there is some 'IN' equivalent for Sphinx with multiple values, since | is clearly slowing it down.
Really it depends on a lot of things. For certain queries, changing to use an MVA (multi-value attribute) might be better than using keywords (then you do have an 'IN' function)
... particularly if you have other search keywords.
Sphinx's full-text indexing is optimized for answering short, user-entered queries. To answer a long 'OR'-style query, it has to load and merge each word list, and rank all of that; it's all overhead.
Attribute-based filtering, by contrast, is generally pretty quick, particularly if you have a highly selective keyword query that yields a relatively short list of potential matches.
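As a rough sketch of the attribute route with the legacy Python sphinxapi client, where the attribute name fielda (declared as an MVA or integer attribute in your index config) and the values are assumptions:

    import sphinxapi

    cl = sphinxapi.SphinxClient()
    cl.SetServer("localhost", 9312)

    # SetFilter() behaves like SQL's IN: match documents whose fielda
    # attribute holds any of the listed values, instead of OR-ing 40
    # keywords into the full-text query.
    cl.SetFilter("fielda", [3, 7, 19, 42])   # illustrative ids

    # Keep the keyword part short and selective; the attribute filter then
    # only runs over the already-small candidate list.
    res = cl.Query("selective keywords", "myindex")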

Performance difference when using find+projection vs select

Is there a difference or performance penalty when I use db.collection.find('stuff', {projection}) vs db.collection.find('stuff').select({'keyWeWant'})?
I've struggled with documentation for a while now and can't find anything useful.
I would suggest checking $explain; it will show you the actual query execution. I believe internally it will be the same.
https://docs.mongodb.org/manual/reference/operator/meta/explain/
Can you explain which language you are using? The syntax
db.collection.find('stuff').select({'keyWeWant'})
is something I don't know.
For your question:
Using a projection will only return the fields included in the projection, so you save network traffic. You can take this even further and create a so-called covered index that holds both the query keys and the projection keys; you then get your data directly from the index, which can be a serious performance gain.
Of course, adding too much data to your index will hurt performance on that side, so choose your battles wisely.
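A small sketch of both points with pymongo; the database, collection, field, and filter here are made up:

    from pymongo import MongoClient, ASCENDING

    coll = MongoClient()["test"]["stuff"]   # hypothetical names

    # Projection: only keyWeWant crosses the network (_id suppressed too).
    docs = coll.find({"category": "books"}, {"keyWeWant": 1, "_id": 0})

    # Covered query: an index holding both the query key and the projected
    # key lets the server answer from the index alone; explain() then shows
    # an IXSCAN with no FETCH stage.
    coll.create_index([("category", ASCENDING), ("keyWeWant", ASCENDING)])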
Regards
Chris

Is mongoDB efficient in doing multi-key lookups?

I'm evaluating MongoDB, coming from Membase/memcached, because I want more flexibility.
Of course Membase is excellent at doing fast (multi-)key lookups.
I like the additional options that MongoDB gives me, but is it also fast at doing multi-key lookups? I've seen the $or and $in operators, and I'm sure I can model it with those. I just want to know if it's performant (in the same league) as Membase.
Use case: e.g., Lucene/Solr returns 20 product ids; look up these product ids in MongoDB to return the docs/appropriate fields.
Thanks,
Geert-Jan
For your use case, I'd say it is, from my experience: I hacked some analytics into a database of mine that made a lot of $in queries with thousands of ids, and to my surprise it worked rather well, in the lower-millisecond range.
Of course, it's hard to compare this, and, as usual, theory is a bad companion when it comes to performance. I guess the best way to figure it out is to migrate some test data and send some queries at the system.
Use MongoDB's excellent built-in profiler, use $explain, keep the one-index-per-query rule in mind, take a look at the logs, keep an eye on mongostat, and do some benchmarks. This shouldn't take too long and should give you a definite answer. If your queries turn out to be slow, people here and on the newsgroup probably have some ideas on how to improve the exact query or the indexing.
"One index per query. It's sometimes thought that queries on multiple keys can use multiple indexes; this is not the case with MongoDB. If you have a query that selects on multiple keys, and you want that query to use an index efficiently, then a compound-key index is necessary."
http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-Oneindexperquery
There's more information on that page as well with regard to Indexes.
The bottom line is that Mongo will be great if your indexes fit in memory and you are indexing on the columns you want to query, using compound keys. If you have poor indexing, then your performance will suffer as a result. This is pretty much in line with most systems.
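To make that concrete for the Lucene/Solr use case in the question, a minimal pymongo sketch (collection, field names, and ids are illustrative):

    from pymongo import MongoClient, ASCENDING

    coll = MongoClient()["shop"]["products"]   # hypothetical names

    # A single index on the looked-up key is all a pure $in query needs.
    coll.create_index([("product_id", ASCENDING)])

    # Batch lookup of the ids the search engine returned, projecting only
    # the fields the caller needs.
    ids = [101, 102, 103]                      # e.g. the 20 ids from Solr
    docs = list(coll.find({"product_id": {"$in": ids}},
                          {"name": 1, "price": 1, "_id": 0}))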