Query using "LIMIT" still warns about result set with more than 10000 records - orientdb

I have a simple model with a CLIENT class. A CLIENT node can have an OWS_MONEY_TO relationship to another CLIENT. That's it.
With this setup, I created 10 million CLIENT nodes and 50 million OWS_MONEY_TO relationships between randomly chosen CLIENTs.
When I run this query:
MATCH
{class:CLIENT, as:A}-OWS_MONEY_TO->{class:CLIENT, as:B}
RETURN A.name as Payer, B.name as Receiver limit 10
I hit this error:
Query 'SELECT FROM CLIENT' returned a result set with more than 10000 records.
Check if you really need all these records, or reduce the resultset by using a
LIMIT to improve both performance and used RAM
As you can see, I'm already using LIMIT, yet I haven't been able to get any result from this query.
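The warning names the inner query 'SELECT FROM CLIENT', which suggests that the CLIENT scan was evaluated as a complete result set before the outer LIMIT 10 could take effect. The toy Python model below illustrates the difference between materializing a result set and then limiting it, versus streaming rows and stopping early. It is only an illustration of those two evaluation strategies under that assumption, not OrientDB's actual MATCH implementation; all names (eager_scan, lazy_scan, limit) are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List

MAX_RESULT_SET = 10_000  # threshold quoted in the warning message


class ResultSetTooLarge(Exception):
    """Raised when a fully materialized result set exceeds MAX_RESULT_SET."""


def eager_scan(rows: Iterable[int]) -> List[int]:
    # Build the complete result set first, then check its size.
    # A LIMIT applied afterwards cannot prevent this check from failing.
    result = list(rows)
    if len(result) > MAX_RESULT_SET:
        raise ResultSetTooLarge(
            f"returned a result set with more than {MAX_RESULT_SET} records")
    return result


def lazy_scan(rows: Iterable[int]) -> Iterator[int]:
    # Hand rows to the consumer one at a time; nothing is buffered.
    yield from rows


def limit(rows: Iterable[int], n: int) -> List[int]:
    # Equivalent of LIMIT n: stop pulling rows after the first n.
    return list(islice(rows, n))


if __name__ == "__main__":
    clients = range(20_000)  # hypothetical stand-in for the CLIENT class
    print(limit(lazy_scan(clients), 10))
    try:
        limit(eager_scan(clients), 10)
    except ResultSetTooLarge as exc:
        print("eager scan failed:", exc)
```

With streaming, LIMIT 10 only ever touches ten rows; with eager evaluation, the size check fires before LIMIT is reached, which matches the symptom described above.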

Related

Couchbase N1QL Query getting distinct on the basis of particular fields

I have a document structure which looks something like this:
{
...
"groupedFieldKey": "groupedFieldVal",
"otherFieldKey": "otherFieldVal",
"filterFieldKey": "filterFieldVal"
...
}
I am trying to fetch one row per distinct groupedFieldKey value. I also want otherFieldKey from ANY one of the documents in each group. The value of otherFieldKey varies slightly from one document to another, but any of these values is fine for me.
SELECT DISTINCT groupedFieldKey, otherFieldKey
FROM bucket
WHERE filterFieldKey = "filterFieldVal";
This query returns practically every document, because the minor variations in otherFieldKey make almost every (groupedFieldKey, otherFieldKey) pair distinct.
SELECT groupedFieldKey, maxOtherFieldKey
FROM bucket
WHERE filterFieldKey = "filterFieldVal"
GROUP BY groupedFieldKey
LETTING maxOtherFieldKey = MAX(otherFieldKey);
This query works as expected, but it is slow because of the GROUP BY step. Since the query is used to show products in the UI, this latency is not acceptable. I have tried adding indexes, but they have not made it noticeably faster.
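To make the semantics of the GROUP BY ... LETTING query concrete, here is a minimal in-memory Python sketch of what it computes: one row per groupedFieldKey among the documents that pass the filter, carrying MAX(otherFieldKey). The function name distinct_by_group is hypothetical, and this is not how Couchbase executes the query; it only shows why every matching document has to be visited before any group's value is final.

```python
from typing import Dict, Iterable, List, Mapping


def distinct_by_group(docs: Iterable[Mapping[str, str]],
                      filter_val: str) -> List[Dict[str, str]]:
    """One row per groupedFieldKey, keeping MAX(otherFieldKey)."""
    best: Dict[str, str] = {}
    for doc in docs:
        # WHERE filterFieldKey = "filterFieldVal"
        if doc.get("filterFieldKey") != filter_val:
            continue
        key = doc["groupedFieldKey"]
        other = doc["otherFieldKey"]
        # MAX() keeps the largest value seen so far; a later document may
        # still replace it, so no group is final until the scan ends.
        if key not in best or other > best[key]:
            best[key] = other
    return [{"groupedFieldKey": k, "maxOtherFieldKey": v}
            for k, v in best.items()]
```

Because the question accepts ANY otherFieldKey value, MAX is just one convenient way to pick a single value per group.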
Actual details of the records:
Number of records = 100,000
Size per record = Approx 10 KB
Time taken to load the first 10 records: 3s
Is there a better way to do this? Ideally, I'd like a way to apply DISTINCT to only some of the selected fields.
EDIT 1:
You can follow this discussion thread in Couchbase forum: https://forums.couchbase.com/t/getting-distinct-on-the-basis-of-a-field-with-other-fields/26458
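GROUP BY must process all the matching documents before it can return any group. You can try a covering index, so that the query can be answered from the index without fetching the documents themselves:
CREATE INDEX ix1 ON bucket(filterFieldKey, groupedFieldKey, otherFieldKey);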

Getting output of 2 queries in one mongodb call from java morphia

I'm fairly new to MongoDB, but I was wondering whether there's a way to get two different results from the same MongoDB collection in one database call, using the MongoDB Java driver with Morphia.
I have an accounts collection, and I'm fetching data based on the key accountID. I need the following two results from this collection in one query:
The count of all documents where accountID is 'xyz'.
The list of the first N documents where accountID is 'xyz', sorted by a timestamp field.
For the second result I'm using:
..Query....limit(N).order("TimeField").field("TimeField").filter("accountID =", "xyz").asList();
This works as expected, but getting the total count (the first result) of all documents with accountID = 'xyz' requires another MongoDB call, which I want to avoid.
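As a plain illustration of the two results being requested, the Python sketch below computes both from a hypothetical in-memory list of account documents: the total count of matches, and the first N matches ordered by TimeField. It does not issue any MongoDB call and is not a Morphia API; the function name count_and_first_n is an assumption for illustration only.

```python
import heapq
from typing import Any, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]


def count_and_first_n(docs: Iterable[Doc], account_id: str,
                      n: int) -> Tuple[int, List[Doc]]:
    # filter("accountID =", "xyz")
    matching = [d for d in docs if d["accountID"] == account_id]
    # order("TimeField").limit(N): heapq.nsmallest is equivalent to
    # sorted(matching, key=...)[:n] but avoids sorting the full list.
    first_n = heapq.nsmallest(n, matching, key=lambda d: d["TimeField"])
    # Result 1 is the size of the full match set, not of the limited page.
    return len(matching), first_n
```

Note that the count must cover every matching document, while the list is capped at N, which is why the two are naturally separate operations.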
MongoDB doesn't support such batching on queries, unfortunately. You'll have to execute two separate calls.

Query one document per association from MongoDB

I'm investigating how well MongoDB would work for us. One of our most frequently used queries gets the latest measurement (or the latest as of a given time) for each station. There are thousands of stations, and each station has tens of thousands of measurements.
So we plan to have one collection for stations and another for measurements.
In SQL we would write the query as:
SELECT * FROM measurements
INNER JOIN (
SELECT station_id, max(meas_time) AS meas_time
FROM measurements
WHERE meas_time <= 'time_to_query'
GROUP BY station_id
) t2 ON t2.station_id = measurements.station_id
AND t2.meas_time = measurements.meas_time
This returns one measurement for each station: the newest one at or before 'time_to_query'.
What query should we use in MongoDB to produce the same result? We are actually using Rails and Mongoid, but that should not matter.
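The Python sketch below restates what the SQL query computes, over a hypothetical in-memory list of measurements: for each station, keep the measurement with the largest meas_time that is at or before time_to_query. Timestamps are modelled as integers and the Measurement type is an assumption for illustration; it is not a MongoDB or Mongoid query.

```python
from typing import Dict, Iterable, NamedTuple


class Measurement(NamedTuple):
    station_id: str
    meas_time: int  # hypothetical: integer timestamps for simplicity
    value: float


def latest_per_station(measurements: Iterable[Measurement],
                       time_to_query: int) -> Dict[str, Measurement]:
    latest: Dict[str, Measurement] = {}
    for m in measurements:
        # WHERE meas_time <= 'time_to_query'
        if m.meas_time > time_to_query:
            continue
        current = latest.get(m.station_id)
        # Keep the row whose meas_time equals the per-station maximum,
        # which is what the JOIN on (station_id, meas_time) selects.
        if current is None or m.meas_time > current.meas_time:
            latest[m.station_id] = m
    return latest
```

One difference worth noting: if a station has two rows with the same maximal meas_time, the SQL join returns both, while this sketch keeps the first one it sees. Stations with no measurement before the cutoff are absent from the result, as with the inner join.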
Update:
This question is not about how to perform a JOIN in MongoDB. The fact that getting the right data out of the table requires a join in SQL doesn't necessarily mean that we would also need a join in MongoDB. Only one table is used in the query.
We came up with this query:
db.measurements.aggregate([{$group:{ _id:{'station_id':"$station_id"}, time:{$max:'$meas_time'}}}]);
with this index:
db.measurements.createIndex({ station_id: 1, meas_time: -1 });
Even though it seems to return the right data, it is really slow: it takes roughly a minute to get a little over 3,000 documents from a collection of 65 million.
We just found that MongoDB is not using the index for this query, even though we are on version 3.2.
I guess the worst-case solution would be something like this (off the top of my head; time_to_query holds the cutoff time):
measures = []
Station.all.each do |station|
  # newest measurement at or before time_to_query for this station
  measurement = Measurement.where(station_id: station.id)
                           .lte(meas_time: time_to_query)
                           .order_by(meas_time: -1)
                           .first
  measures << [station.name, measurement.measure] if measurement # ...plus any other fields
end
It depends on how long the query is allowed to take. Either way, the data should be indexed by station_id and meas_time.
How long does the SQL query take?
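The per-station loop above relies on an index over station_id and meas_time so that each iteration is a seek rather than a scan. As a rough analogy only (MongoDB's index is a B-tree, not this structure), the Python sketch below models such an index as a per-station sorted list of times and answers "latest at or before time_to_query" with a binary search. The class StationIndex and its methods are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class StationIndex:
    """Toy stand-in for a compound index on (station_id, meas_time)."""

    def __init__(self, rows: List[Tuple[str, int]]):
        times: Dict[str, List[int]] = defaultdict(list)
        for station_id, meas_time in rows:
            times[station_id].append(meas_time)
        # Sorted times per station, like index entries grouped by station_id.
        self._times = {s: sorted(ts) for s, ts in times.items()}

    def station_ids(self) -> List[str]:
        return list(self._times)

    def latest_at_or_before(self, station_id: str,
                            time_to_query: int) -> Optional[int]:
        ts = self._times.get(station_id, [])
        # bisect_right gives the number of times <= time_to_query,
        # found in O(log n) instead of scanning every measurement.
        i = bisect.bisect_right(ts, time_to_query)
        return ts[i - 1] if i else None


def latest_for_all(index: StationIndex,
                   time_to_query: int) -> Dict[str, Optional[int]]:
    # Mirrors the per-station loop: one cheap lookup per station.
    return {s: index.latest_at_or_before(s, time_to_query)
            for s in index.station_ids()}
```

With thousands of stations, this amounts to thousands of logarithmic lookups, instead of one pass over all 65 million measurements.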

Are Postgres WHERE clauses run sequentially?

I'm looking at using Postgres as a database to let our clients segment their customers.
The idea is that they select a set of conditions in our front-end admin, and these conditions get mapped to a SQL query. Right now, I think the best structure might be something like this:
SELECT DISTINCT id FROM users
WHERE id IN (
-- condition 1
)
AND id IN (
-- condition 2
)
AND id IN (
-- etc
)
Efficiency and query speed are very important to us, and I'm wondering whether this is the best way to structure things. When evaluating the WHERE clauses, will Postgres pass the id values from one clause to the next?
For a group of 1m users, the ideal scenario would be:
Query 1 filters from 1m down to 100k
Query 2 filters from 100k down to 10k
Query 3 filters from 10k down to 5k
As opposed to:
Query 1 filters from 1m down to 100k
Query 2 filters from 1m down to 50k
Query 3 filters from 1m down to 80k
The results of all queries are intersected, giving 5k
Maybe I'm misunderstanding something here; I'd love to hear your thoughts!
Thanks!
Postgres uses a query planner to figure out how to most efficiently apply your query. It may reorder things or change how certain query operations (such as joins) are implemented, based on statistical information periodically collected in the background.
To determine how the query planner will structure a given query, stick EXPLAIN in front of it:
EXPLAIN SELECT DISTINCT id FROM users ...;
This will output the query plan for that query. Note that an empty table may get a totally different query plan from a table with (say) 10,000 rows, so be sure to test on real(istic) data.
Database engines are much more sophisticated than that.
The specific order of the conditions should not matter. They will take your query as a whole and try to figure out the best way to get the data according to all the conditions you specified, the indexes that each table has, the amount of records each condition will filter out, etc.
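A quick way to convince yourself that the order of the ANDed IN conditions does not change the result is to run the same query with the conditions swapped. The sketch below uses Python's built-in sqlite3 module as a stand-in, because it needs no server; SQLite's planner is not Postgres's, so use EXPLAIN on Postgres itself to see the real plan. The users columns (age, country), the sample data, and the helper names are hypothetical. ORDER BY id was added only to make the comparison deterministic.

```python
import sqlite3
from typing import List, Sequence

# Hypothetical segment conditions; each subquery returns user ids.
# These are fixed trusted strings here, not user input.
CONDITIONS = [
    "SELECT id FROM users WHERE age >= 40",
    "SELECT id FROM users WHERE country = 'SE'",
]


def build_sql(conditions: Sequence[str]) -> str:
    # Same shape as the question: id IN (...) AND id IN (...) AND ...
    where = " AND ".join(f"id IN ({c})" for c in conditions)
    return f"SELECT DISTINCT id FROM users WHERE {where} ORDER BY id"


def segment(conn: sqlite3.Connection, conditions: Sequence[str]) -> List[int]:
    return [row[0] for row in conn.execute(build_sql(conditions))]


def plan(conn: sqlite3.Connection, conditions: Sequence[str]) -> List[tuple]:
    # SQLite's counterpart to EXPLAIN; its output format is SQLite-specific.
    return conn.execute("EXPLAIN QUERY PLAN " + build_sql(conditions)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER, country TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(i, 18 + i % 50, "SE" if i % 3 == 0 else "US") for i in range(1, 1001)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(segment(conn, CONDITIONS) == segment(conn, CONDITIONS[::-1]))  # True
    for row in plan(conn, CONDITIONS):
        print(row)
```

The result is a set intersection, so it is the same in any order; what the planner decides is how much work it takes to compute it.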
If you want to get an idea of how your query will actually be solved you can ask the engine to "explain" it for you: http://www.postgresql.org/docs/current/static/sql-explain.html
However, note that understanding what that explanation means requires a fair amount of technical background on how DB engines actually work.

Does QBO v3 QueryService limit responses to 100 rows? Can I set a higher limit?

I am using QueryService to retrieve a list of customers. It seems to limit the number of returned rows to a maximum of 100.
Here is my code:
QueryService<Intuit.Ipp.Data.Customer> customerQueryService = new QueryService<Intuit.Ipp.Data.Customer>(serviceContext);
List<Intuit.Ipp.Data.Customer> customers = customerQueryService.Select(c => c).ToList();
How do I set a higher limit for the maximum number of returned rows?
https://developer.intuit.com/docs/0025_quickbooksapi/0050_data_services/020_key_concepts/00300_query_operations
Maximum Number of Entities in a Response
The maximum number of entities that can be returned in a response is 1000. If the result size is not specified, the default number is 100. If a query returns many entities, fetch the entities in chunks, as described in Pagination. To determine the number of entities that a particular query returns, probe by using the COUNT keyword in the query. See Count for details.
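The quoted documentation recommends fetching large result sets in chunks. The Python sketch below shows that loop, assuming the QBO query language's STARTPOSITION (1-based) and MAXRESULTS clauses as described on the Pagination page. The run_query callable stands in for whatever actually sends the query (for example the .NET SDK); it, fake_run_query, and the sample data are hypothetical.

```python
import re
from typing import Callable, Dict, List

MAX_RESULTS = 1000  # documented maximum number of entities per response


def fetch_all(run_query: Callable[[str], List[Dict]],
              entity: str = "Customer",
              page_size: int = MAX_RESULTS) -> List[Dict]:
    if not 1 <= page_size <= MAX_RESULTS:
        raise ValueError(f"page_size must be between 1 and {MAX_RESULTS}")
    results: List[Dict] = []
    start = 1  # STARTPOSITION is 1-based
    while True:
        page = run_query(
            f"SELECT * FROM {entity} STARTPOSITION {start} MAXRESULTS {page_size}")
        results.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        start += page_size
    return results


# Hypothetical backend holding 2,500 customers, used only to exercise the loop.
CUSTOMERS = [{"Id": str(i)} for i in range(1, 2501)]


def fake_run_query(query: str) -> List[Dict]:
    m = re.search(r"STARTPOSITION (\d+) MAXRESULTS (\d+)", query)
    start, size = int(m.group(1)), int(m.group(2))
    return CUSTOMERS[start - 1:start - 1 + size]
```

Running a COUNT query first, as the documentation suggests, tells you up front how many pages to expect.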