OrientDB : How do I implement pagination? - orientdb

I have gone through this documentation about pagination.
It says Skip-Limit is the slower way because OrientDB repeats the query and just skips the first X records from the result. I want to skip 100000 records and select the next 20.
Will this query die? Is there any other way I can achieve this, but without using RID-LIMIT?

Related

Couchbase N1QL Query getting distinct on the basis of particular fields

I have a document structure which looks something like this:
{
...
"groupedFieldKey": "groupedFieldVal",
"otherFieldKey": "otherFieldVal",
"filterFieldKey": "filterFieldVal"
...
}
I am trying to fetch all documents which are unique with respect to groupedFieldKey. I also want to fetch otherField from ANY of these documents. This otherFieldKey has minor changes from one document to another, but I am comfortable with getting ANY of these values.
SELECT DISTINCT groupedFieldKey, otherField
FROM bucket
WHERE filterFieldKey = "filterFieldVal";
This query fetches all the documents because of the minor variations.
SELECT groupedFieldKey, maxOtherFieldKey
FROM bucket
WHERE filterFieldKey = "filterFieldVal"
GROUP BY groupFieldKey
LETTING maxOtherFieldKey= MAX(otherFieldKey);
This query works as expected, but is taking a long time due to the GROUP BY step. As this query is used to show products in UI, this is not a desired behaviour. I have tried applying indexes, but it has not given fast results.
Actual details of the records:
Number of records = 100,000
Size per record = Approx 10 KB
Time taken to load the first 10 records: 3s
Is there a better way to do this? A way of getting DISTINCT only on particular fields will be good.
EDIT 1:
You can follow this discussion thread in Couchbase forum: https://forums.couchbase.com/t/getting-distinct-on-the-basis-of-a-field-with-other-fields/26458
GROUP must materialize all the documents. You can try covering index
CREATE INDEX ix1 ON bucket(filterFieldKey, groupFieldKey, otherFieldKey);

Why MongoDB find has same performance as count

I am running tests against my MongoDB and for some reason find has the same performance as count.
Stats:
orders collection size: ~20M,
orders with product_id 6: ~5K
product_id is indexed for improved performance.
Query: db.orders.find({product_id: 6}) vs db.orders.find({product_id: 6}).count()
result the orders for the product vs 5K after 0.08ms
Why count isn't dramatically faster? it can find the first and last elements position with the product_id index
As Mongo documentation for count states, calling count is same as calling find, but instead of returning the docs, it just counts them. In order to perform this count, it iterates over the cursor. It can't just read the index and determine the number of documents based on first and last value of some ID, especially since you can have index on some other field that's not ID (and Mongo IDs are not auto-incrementing). So basically find and count is the same operation, but instead of getting the documents, it just goes over them and sums their number and return it to you.
Also, if you want a faster result, you could use estimatedDocumentsCount (docs) which would go straight to collection's metadata. This results in loss of the ability to ask "What number of documents can I expect if I trigger this query?". If you need to find a count of docs for a query in a faster way, then you could use countDocuments (docs) which is a wrapper around an aggregate query. From my knowledge of Mongo, the provided query looks like a fastest way to count query results without calling count. I guess that this should be preferred way regarding performances for counting the docs from now on (since it's introduced in version 4.0.3).

Are Postgres WHERE clauses run sequentially?

I'm looking at using Postgres as a database to let our clients segment their customers.
The idea is that they can select a bunch of conditions in our front-end admin, and these conditions will get mapped to a SQL query. Right now, I'm thinking the best structure could be something like this:
SELECT DISTINCT id FROM users
WHERE id IN (
-- condition 1
)
AND id IN (
-- condition 2
)
AND id IN (
-- etc
)
Efficiency and query speed is super important to us, and I'm wondering if this is the best way of structuring things. When going through each of the WHERE clauses, will Postgres pass the id values from one to the next?
The ideal scenario would be, for a group of 1m users:
Query 1 filters down to 100k
Query 2 filters down from 100k to 10k
Query 3 filters down to 10k to 5k
As opposed to:
Query 1 filters from 1m to 100k
Query 2 filters down from 1m to 50k
Query 3 filters down from 1m to 80k
The intersection of all queries are mashed together, to 5k
Maybe I'm misunderstanding something here, I'd love to get your thoughts!
Thanks!
Postgres uses a query planner to figure out how to most efficiently apply your query. It may reorder things or change how certain query operations (such as joins) are implemented, based on statistical information periodically collected in the background.
To determine how the query planner will structure a given query, stick EXPLAIN in front of it:
EXPLAIN SELECT DISTINCT id FROM users ...;
This will output the query plan for that query. Note that an empty table may get a totally different query plan from a table with (say) 10,000 rows, so be sure to test on real(istic) data.
Database engines are much more sophisticated than that.
The specific order of the conditions should not matter. They will take your query as a whole and try to figure out the best way to get the data according to all the conditions you specified, the indexes that each table has, the amount of records each condition will filter out, etc.
If you want to get an idea of how your query will actually be solved you can ask the engine to "explain" it for you: http://www.postgresql.org/docs/current/static/sql-explain.html
However, please note that there is a lot of technical background on how DB engines actually work in order to understand what that explanation means.

Implementation of limit in mongodb

My collection name is trial and data size is 112mb
My query is,
db.trial.find()
and i have added limit up-to 10.
db.trial.find.limit(10).
but the limit is not working.the entire query is running.
Replace
db.trial.find.limit(10)
with
db.trial.find().limit(10)
Also you mention that the entire database is being queried? Run this
db.trial.find().limit(10).explain()
It will tell you how many documents it looked at before stopping the query (nscanned). You will see that nscanned will be 10.
The .limit() modifier on it's own will only "limit" the results of the query that is processed, so that works as designed to "limit" the results returned. In a raw form though with no query you should just have the n scanned as the limit you want:
db.trial.find().limit(10)
If your intent is to only operate on a set number of documents you can alter this with the $maxScan modifier:
db.trial.find({})._addSpecial( "$maxScan" , 11 )
Which causes the query engine to "give up" after the set number of documents have been scanned. But that should only really matter when there is something meaningful in the query.
If you are actually trying to do "paging" then you are better of using "range" queries with $gt and $lt and cousins to effectively change the range of selection that is done in your query.

MongoDB - Can I have multiple sort fields in a map reduce?

I have a map/reduce that I noticed was taking nearly 10 seconds to run, even though no results were being returned. Using the mongo shell, I was able to determine that my initial query was the culprit. I was able to add a 2nd sort field to the query, which sped it up drastically.
However, when I try to add that 2nd sort field to my map reduce, I get the error "could not create cursor over [collection] for query ...". Is there any way for me to add a 2nd sort field to the map reduce?
Edit: The goal of my query is to find the first record created by each user/day. So the key that I am emitting is the user's id + created on day, ignoring any time. That way I am grouping all records created by a user on a given day. In my reduce, I am then taking the record that was created first. I have actually ditched using map/reduce and am now doing essentially the same thing, but with a normal find() and then some javascript to group and reduce.