I've started running SPARQL queries on an RDF dataset, and I'd like to know whether there is a difference in execution time between SELECT and ASK queries with the same constraints.
To be more precise: I don't need the results themselves; I only need to check whether any data satisfies my constraints.
So, performance-wise, is it better to use ASK or SELECT?
Since you only need to know whether any solution in the endpoint satisfies the query, ASK is the query form to use: it returns only true or false, so the engine is free to stop as soon as it finds one solution. There are various reasons why some SELECT queries might return (some) results faster than the equivalent ASK, but overall ASK will be faster.
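As a rough intuition for why ASK tends to win, here is a toy model in Python. It is an assumption for illustration only, not how any real SPARQL engine evaluates queries: the "store" is a plain list of triples scanned in order, `None` stands in for a variable, and the hypothetical `select` and `ask` functions report how many triples each one examined.

```python
from typing import Iterator, List, Optional, Tuple

# Toy model only (an assumption): a store is a list of (subject, predicate, object)
# tuples scanned in order; real engines use indexes and smarter evaluation.
Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    # None in a pattern position plays the role of a SPARQL variable.
    return all(p is None or p == t for p, t in zip(pattern, triple))


def solutions(store: List[Triple], pattern: Pattern, examined: List[int]) -> Iterator[Triple]:
    # Lazily yield matching triples, counting every triple that is looked at.
    for triple in store:
        examined[0] += 1
        if matches(triple, pattern):
            yield triple


def select(store: List[Triple], pattern: Pattern) -> Tuple[List[Triple], int]:
    # SELECT-like: materialise every solution.
    examined = [0]
    results = list(solutions(store, pattern, examined))
    return results, examined[0]


def ask(store: List[Triple], pattern: Pattern) -> Tuple[bool, int]:
    # ASK-like: only existence matters, so stop at the first solution.
    examined = [0]
    found = next(solutions(store, pattern, examined), None) is not None
    return found, examined[0]


store = [(f"ex:item{i}", "ex:type", "ex:Product") for i in range(1000)]
print(ask(store, (None, "ex:type", "ex:Product")))             # (True, 1)
print(len(select(store, (None, "ex:type", "ex:Product"))[0]))  # 1000
```

When nothing matches, both functions have to examine every triple, so in this model the saving comes only from stopping early once a solution exists.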
I'm wondering how MongoDB's $in operator works behind the scenes and what optimizations are applied. Does it scan the whole collection looking for the requested items, or does it know immediately where they are? Do indexes matter for these operations?
I'm trying to be as efficient as possible by making a single query that fetches all the documents I need in one go. But maybe querying by a single ID, which is guaranteed to be indexed, is faster and worth the extra queries.
I guess it depends on how many documents we're talking about; in my case it's only a few. I assume that with a lot of IDs it may be worth querying them in one go, but maybe not. I'm not very experienced with MongoDB.
Generally, it is better to reduce the number of network round trips to the database.
In your case, the $in operator is the better choice: if you send a separate request for each ID, you pay for one round trip per ID.
When you send a query, the database tries to build the most efficient execution plan for it, and if any indexes can make that plan more efficient, the database will use them.
MongoDB creates an index on the _id field of every collection by default, so an $in query on _id can use that index.
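A minimal sketch of the two approaches in Python with PyMongo. `Collection.find()` and `Collection.find_one()` are real PyMongo methods; the function names and the `CountingCollection` class are hypothetical, and the class is only an in-memory stand-in so the round-trip count can be shown without a running server.

```python
from typing import Any, Dict, Iterable, List


def fetch_one_by_one(collection, ids: Iterable[Any]) -> List[Dict[str, Any]]:
    # One find_one() call, and therefore one round trip, per ID.
    docs = []
    for _id in ids:
        doc = collection.find_one({"_id": _id})
        if doc is not None:
            docs.append(doc)
    return docs


def fetch_with_in(collection, ids: Iterable[Any]) -> List[Dict[str, Any]]:
    # A single find() with $in, which the server can answer from the default _id index.
    return list(collection.find({"_id": {"$in": list(ids)}}))


class CountingCollection:
    """Hypothetical in-memory stand-in that mimics find/find_one and counts calls."""

    def __init__(self, docs: List[Dict[str, Any]]) -> None:
        self._docs = {d["_id"]: d for d in docs}
        self.round_trips = 0

    def find_one(self, flt: Dict[str, Any]):
        self.round_trips += 1
        return self._docs.get(flt["_id"])

    def find(self, flt: Dict[str, Any]):
        self.round_trips += 1
        wanted = flt["_id"]["$in"]
        return [self._docs[i] for i in wanted if i in self._docs]


wanted = [3, 14, 15, 92]
a = CountingCollection([{"_id": i, "name": f"product {i}"} for i in range(100)])
fetch_one_by_one(a, wanted)
print(a.round_trips)  # 4
b = CountingCollection([{"_id": i, "name": f"product {i}"} for i in range(100)])
fetch_with_in(b, wanted)
print(b.round_trips)  # 1
```

Against a real server, the $in version may still need extra getMore round trips if the result spans several cursor batches, and MongoDB does not guarantee that documents come back in the order of the `ids` list, so reorder them client-side if order matters.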
I am writing a public-facing query and search API where users can specify multiple conditions. These conditions will be mapped to a complex SQLAlchemy query through various filters and conditions.
What would be the best way to limit the complexity and duration of the query, so that users cannot, by accident or on purpose, cause a denial of service by creating queries that take too long to process? The unsafe cases would be 1) returning too many items at once and 2) a resulting query so complex and slow that the PostgreSQL server takes forever to evaluate it. The database may return a handful or millions of items depending on the filter.
Is implementing pagination with a fixed-size LIMIT the only way to go, or are there more advanced ways to tackle this?
Can I add a condition on the evaluated SQL expression or SQLAlchemy session to time out after, e.g., 1 second, no matter what the query looks like?
Using PostgreSQL as the database.
I've already read the official documentation to get a basic idea of getPlanCache() and hint().
getPlanCache()
Displays the cached query plans for the specified query shape.
The query optimizer only caches the plans for those query shapes that can have more than one viable plan.
Official Documentation: https://docs.mongodb.com/manual/reference/method/PlanCache.getPlansByQuery/
hint()
The $hint operator forces the query optimizer to use a specific index to fulfill the query. Specify the index either by the index name or by document.
Official Documentation: https://docs.mongodb.com/manual/reference/operator/meta/hint/
My question
If I can make sure that the plan for a specific collection is cached, I don't need to use hint() to ensure optimal performance. Is that correct?
To be clear: getPlanCache() and hint() are troubleshooting aids for investigating query performance. The MongoDB query planner chooses the most efficient plan available based on a measure of "work" involved in executing a given query shape. If there is only a single viable plan, there is no need to cache the plan selection. If there are multiple query plans available for the same query shape, the query planner will periodically evaluate performance and update the cached plan selection if appropriate.
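The caching rule described above can be sketched as a deliberately simplified toy model in Python. Everything here is an assumption for illustration, not MongoDB's implementation: the `ToyPlanner` class is hypothetical, a query shape is reduced to its sorted field names, and each candidate plan is a name plus a hypothetical cost function returning its "works".

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: a plan is (name, cost_function); the cost function returns the
# number of work units the plan would need for a given query.
Plan = Tuple[str, Callable[[dict], int]]


class ToyPlanner:
    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}  # query shape -> winning plan name

    @staticmethod
    def shape(query: dict) -> str:
        # Toy "query shape": the sorted field names, ignoring the values.
        return ",".join(sorted(query))

    def choose(self, query: dict, candidates: List[Plan]) -> str:
        key = self.shape(query)
        if key in self.cache:
            return self.cache[key]
        if len(candidates) == 1:
            # Only one viable plan: nothing to choose between, so nothing is cached.
            return candidates[0][0]
        # Trial-run every candidate and keep the one that needs the least work.
        name, _ = min(candidates, key=lambda plan: plan[1](query))
        self.cache[key] = name
        return name

    def clear(self) -> None:
        # Similar in spirit to clearing the plan cache: forces re-evaluation.
        self.cache.clear()


planner = ToyPlanner()
plans = [("COLLSCAN", lambda q: 1000), ("IXSCAN a_1", lambda q: 10)]
print(planner.choose({"a": 5}, plans))  # IXSCAN a_1
print(planner.cache)                    # {'a': 'IXSCAN a_1'}
```

In this model, hint() would correspond to skipping `choose()` and naming a plan directly, which is why it can force a worse plan even when a cheaper one exists.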
The query plan cache methods allow you to inspect and clear information in the plan cache. Generally, you would only want to clear the plan cache while investigating issues in a development or staging environment, as this could have a noticeable effect on a busy deployment.
In general, you should avoid using hint() outside of testing query plans, as it bypasses the query planner and forces use of the hinted index even if there might be a more efficient index available.
If a specific query is not performing as expected, explain() output is the best starting point for insight into the query planning process. If you're not sure how to optimise a specific query, I'd suggest posting a question on DBA StackExchange including the output of explain(true) (verbose explain) and your MongoDB server version.
For a helpful presentation, see: Reading the .explain() Output - Charlie Swanson (June 2017).
Can someone explain the exact difference between MLT (More Like This) and a normal select query in Solr? I know that Solr uses an advanced form of TF-IDF to score documents for a select query on a textual field, but how does the scoring algorithm differ when MLT is used?
I'm not sure the question really makes sense: More Like This is used to find more documents similar to one you already have. This is different from entering a query and wanting to get something back; the two serve very different modes of operation.
Behind the scenes they're both queries in the sense of "look up something in the index based on input". For MLT, that input is the terms from the existing document instead of the query the user has entered.
You can see how the MLT query is built in MoreLikeThis.java. If I read the code correctly, a PriorityQueue is used to rank all the terms by score, and the top terms are then added as boosted term queries to one large boolean query, where each term is set to SHOULD occur. That way the terms are boosted based on MLT semantics, while ClassicSimilarity is still used behind the scenes.
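The query construction described above can be sketched in Python as follows. This is a simplified reading, not a port of MoreLikeThis.java: the function names are hypothetical, the TF-IDF formula is an assumption, filters the real class applies (such as minimum term and document frequencies) are omitted, and normalising each boost against the best term's score is an assumption about the boost option. `heapq.nlargest` plays the role of the PriorityQueue, and the space-separated clauses in the output are optional (SHOULD) clauses under Lucene's default OR operator.

```python
import heapq
import math
from collections import Counter
from typing import Dict, List, Tuple


def mlt_terms(doc_tokens: List[str], doc_freq: Dict[str, int], num_docs: int,
              max_query_terms: int = 5) -> List[Tuple[str, float]]:
    """Score every term of the source document and keep the top ones."""
    tf = Counter(doc_tokens)

    def score(term: str) -> float:
        # Simple TF-IDF (assumed formula): frequent in this doc, rare in the index.
        idf = 1.0 + math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))
        return tf[term] * idf

    # heapq.nlargest stands in for the PriorityQueue of scored terms.
    return heapq.nlargest(max_query_terms, ((t, score(t)) for t in tf),
                          key=lambda ts: ts[1])


def build_mlt_query(field: str, terms: List[Tuple[str, float]]) -> str:
    """Turn scored terms into boosted, optional clauses of one boolean query."""
    if not terms:
        return ""
    best = terms[0][1]  # nlargest returns terms sorted by descending score
    return " ".join(f"{field}:{term}^{s / best:.2f}" for term, s in terms)


tokens = "solr solr solr lucene index index the the".split()
doc_freq = {"solr": 2, "lucene": 3, "index": 50, "the": 99}
top = mlt_terms(tokens, doc_freq, num_docs=100, max_query_terms=3)
print(build_mlt_query("text", top))
```

Terms that are frequent in the source document but rare in the index get the highest boosts, which is what makes the resulting query favour documents that share the source document's distinctive vocabulary.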
But again, the use case for MLT is very different from when you'd use a regular query.
I'm evaluating MongoDB, coming from Membase/memcached, because I want more flexibility.
Of course, Membase is excellent at fast (multi-)key lookups.
I like the additional options that MongoDB gives me, but is it also fast at multi-key lookups? I've seen the $or and $in operators and I'm sure I can model it with those. I just want to know whether its performance is in the same league as Membase.
Use case, e.g.: Lucene/Solr returns 20 product IDs. Look up these product IDs in Couchdb to return the documents or the appropriate fields.
Thanks,
Geert-Jan
For your use case, I'd say it is, based on my experience: I once hacked some analytics into one of my databases that issued a lot of $in queries with thousands of IDs each. To my surprise, it worked rather well, with response times in the low milliseconds.
Of course, it's hard to compare this, and, as usual, theory is a bad companion when it comes to performance. I guess the best way to figure it out is to migrate some test data and send some queries to the system.
Use MongoDB's excellent built-in profiler, use $explain, keep the "one index per query" rule in mind, take a look at the logs, keep an eye on mongostat, and do some benchmarks. This shouldn't take too long and should give you a definitive answer. If your queries turn out to be slow, people here and on the newsgroup probably have some ideas on how to improve the specific query or the indexing.
One index per query. It's sometimes thought that queries on multiple keys can use multiple indexes; this is not the case with MongoDB. If you have a query that selects on multiple keys, and you want that query to use an index efficiently, then a compound-key index is necessary.
http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-Oneindexperquery.
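That page has more information about indexes as well. (MongoDB 2.6 and later can also use index intersection, but a compound index is usually still the better choice for queries on several fields.)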
The bottom line is that MongoDB will perform well if your indexes fit in memory and you index the fields you query on, using compound keys where a query selects on several fields. If your indexing is poor, your performance will suffer as a result. This is pretty much in line with most systems.
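For the "do some benchmarks" advice above, a small timing harness in Python might look like the following. The `benchmark` helper is hypothetical; the commented-out line assumes a PyMongo collection named `products` and a list `ids` of product IDs, neither of which comes from the original answer.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], runs: int = 50, warmup: int = 5) -> Dict[str, float]:
    """Time fn() repeatedly and summarise the latencies in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()  # warm caches and connections before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "median_ms": statistics.median(samples),
        # Approximate 95th percentile: the sample at 95% of the sorted range.
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "max_ms": samples[-1],
    }


# Hypothetical usage against a real PyMongo collection `products` and a list `ids`:
# print(benchmark(lambda: list(products.find({"_id": {"$in": ids}}))))
print(benchmark(lambda: sum(range(10_000)), runs=20))
```

Reporting the median and a high percentile rather than a single timing makes it easier to spot whether slow lookups are typical or occasional outliers.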