I'm seeing poor performance on a simple query.
I have a class SimpleVertex with a field named s.
I added 1,000,000 vertices, setting the field to "u"+i.
When I run this query:
select from SimpleVertex where s like 'u1000'
it takes 13.001 s to return the vertex.
If I run this query:
select from SimpleVertex where s like 'u1000%'
it takes 13.214 s to return, but the same queries on PostgreSQL run in 112 ms and 233 ms.
Why is the query so slow?
I have created a MongoDB database using mongolite, and I created an index on the _row key using the following command:
collection$index(add = '{"_row" : 1}')
When I query a document in Robo3T with the db.getCollection('collection').find({"_row": "ENSG00000197616"}) command, the index is used and the query takes less than a second.
Robo3T screen shot >>> pay attention to the query time
This is also the case when I query the data using pymongo package in python.
python screenshot >>> pay attention to query time
Surprisingly, when I perform the same query with mongolite, it takes more than 10 seconds to query data:
system.time(collection$find(query = '{"_row": "ENSG00000197616"}'))
user system elapsed
12.221 0.005 12.269
I think the slowdown can only come from the mongolite package; otherwise the query would be slow in the other programs as well.
Any input is highly appreciated!
I found the solution here:
https://github.com/jeroen/mongolite/issues/37
The time-consuming part is not the query itself but simplifying the result into a data frame.
I am running a mongo query like this:
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$exists:true, $ne:[]}}).sort({dateField:-1})
The collection has approx. 10^6 documents. I have indexes on the stringField and dateField (both ascending). This query takes ~3-4 seconds to run.
However, if I change my query to either of the below, it executes within 100ms
Remove $ne
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$exists:true}}).sort({dateField:-1})
Remove $exists
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$ne:[]}}).sort({dateField:-1})
Remove sort
db.getCollection('collection').find({stringField:"stringValue", arrayField:{$exists:true, $ne:[]}})
Use arrayField.0
db.getCollection('collection').find({stringField:"stringValue", "arrayField.0":{$exists:true}}).sort({dateField:-1})
The explain output for these queries does not give any insight into why the first query is so slow.
MongoDb version 3.4.18
What is the best way to improve the performance of the following distance query?
SELECT count(*) FROM place WHERE DISTANCE(lat, lng, 42.0697, -87.7878) < 10
On a large data set (around 80k records) the query always logs the following warning:
"fetched more than 50000 records: to speed up the execution, create an index or change the query to use an existent index"
I created the following index, but it is not used by that query:
place.distance NOTUNIQUE ["lat","lng"] SBTREE
You can use a spatial index.
See the documentation: http://orientdb.com/docs/2.1/Spatial-Index.html
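One likely reason the lat/lng SBTREE index is ignored is that the predicate wraps both columns in a function call, so there is no plain range on lat or lng for the index to serve. The sketch below is Python, not OrientDB code, and only illustrates the general idea behind spatial filtering: narrow the candidates with a cheap bounding-box range test, then apply the exact distance to those candidates only. It assumes the threshold 10 is in kilometres, uses the haversine formula with a 6371 km Earth radius, and ignores the poles and the antimeridian. The helper names and sample points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def bounding_box(lat, lng, radius_km):
    """Return (lat_min, lat_max, lng_min, lng_max) enclosing the search circle.

    Valid only while the circle stays away from the poles and the antimeridian.
    """
    r = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(r)
    dlng = math.degrees(math.asin(math.sin(r) / math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lng - dlng, lng + dlng


def count_within(places, lat, lng, radius_km):
    """Count (lat, lng) tuples closer than radius_km, using a box prefilter."""
    lat_min, lat_max, lng_min, lng_max = bounding_box(lat, lng, radius_km)
    # Cheap range test: this is the part an ordinary index could serve.
    candidates = [
        p for p in places
        if lat_min <= p[0] <= lat_max and lng_min <= p[1] <= lng_max
    ]
    # Exact test only on the surviving candidates.
    return sum(1 for p in candidates if haversine_km(lat, lng, p[0], p[1]) < radius_km)


# Hypothetical sample data around the point used in the question.
places = [
    (42.0697, -87.7878),  # the query point itself
    (42.1000, -87.8000),  # a few kilometres away
    (42.0697, -87.6000),  # roughly 15 km east
    (41.0000, -87.0000),  # far away
]
print(count_within(places, 42.0697, -87.7878, 10))
```

In SQL terms the box corresponds to plain range conditions on lat and lng combined with the original DISTANCE test. Whether OrientDB picks the composite index for such conditions is something to verify with EXPLAIN rather than assume.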
I'm investigating how MongoDB would work for us. One of our most used queries gets the latest measurement (or the latest one before a given time) for each station. There are thousands of stations, and each station has tens of thousands of measurements.
So we plan to have one collection for stations and another for measurements.
In SQL we would do the query with
SELECT * FROM measurements
INNER JOIN (
  SELECT station_id, max(meas_time) AS meas_time
  FROM measurements
  WHERE meas_time <= 'time_to_query'
  GROUP BY station_id
) t2 ON t2.station_id = measurements.station_id
AND t2.meas_time = measurements.meas_time
This returns one measurement for each station, and the measurement is the newest one before the 'time_to_query'.
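To make the intended result concrete, here is a minimal sketch that runs the same "newest row per station" query against a tiny in-memory table. It uses Python's built-in sqlite3 purely as a stand-in so it needs no server. The table and column names follow the query above, while the `value` column, the index name, and the sample rows are made up for illustration.

```python
import sqlite3


def latest_per_station(conn, time_to_query):
    """Return the newest measurement at or before time_to_query for each station."""
    sql = """
        SELECT m.station_id, m.meas_time, m.value
        FROM measurements AS m
        INNER JOIN (
            SELECT station_id, max(meas_time) AS meas_time
            FROM measurements
            WHERE meas_time <= ?
            GROUP BY station_id
        ) AS t2
          ON t2.station_id = m.station_id
         AND t2.meas_time = m.meas_time
        ORDER BY m.station_id
    """
    return conn.execute(sql, (time_to_query,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements (station_id INTEGER, meas_time TEXT, value REAL)"
)
# Composite index mirroring the (station_id, meas_time) idea from the question.
conn.execute("CREATE INDEX idx_station_time ON measurements (station_id, meas_time)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        # ISO-8601 strings sort chronologically, so text comparison is enough here.
        (1, "2016-01-01T10:00", 1.0),
        (1, "2016-01-01T11:00", 2.0),
        (1, "2016-01-01T12:00", 3.0),
        (2, "2016-01-01T09:30", 7.5),
        (2, "2016-01-01T11:30", 8.5),
    ],
)

rows = latest_per_station(conn, "2016-01-01T11:15")
for row in rows:
    print(row)
```

Each station contributes exactly one row: the newest measurement that is not later than the cut-off, which is the behaviour an equivalent MongoDB query has to reproduce.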
What query should be used in MongoDB to produce the same result? We are actually using Rails and Mongoid, but that should not matter.
update:
This question is not about how to perform a JOIN in MongoDB. The fact that getting the right data out of the table requires a join in SQL doesn't necessarily mean that we would also need a join in MongoDB. There is only one table used in the query.
We came up with this query
db.measurements.aggregate([{$group:{ _id:{'station_id':"$station_id"}, time:{$max:'$meas_time'}}}]);
with indexes
db.measurements.createIndex({ station_id: 1, meas_time: -1 });
Even though it seems to give the right data, it is really slow. It takes roughly a minute to get a bit over 3000 documents from a collection of 65 million.
Just found that MongoDB is not using the index in this query even though we are using the 3.2 version.
I guess the worst-case solution would be something like this (off the top of my head):
measures = []
StationId.all.each do |station|
  # newest measurement for this station at or before time_to_query
  measurement = Measurement.where(station_id: station.id, :meas_time.lte => time_to_query).order_by(meas_time: -1).first
  measures << [station.name, measurement.measure, ....]
end
It depends on how much time the query is allowed to take. In any case, the data should be indexed by station_id and meas_time.
How much time does the SQL query take?
I'm looking at using Postgres as a database to let our clients segment their customers.
The idea is that they can select a bunch of conditions in our front-end admin, and these conditions will get mapped to a SQL query. Right now, I'm thinking the best structure could be something like this:
SELECT DISTINCT id FROM users
WHERE id IN (
-- condition 1
)
AND id IN (
-- condition 2
)
AND id IN (
-- etc
)
Efficiency and query speed are super important to us, and I'm wondering if this is the best way of structuring things. When going through each of the WHERE clauses, will Postgres pass the id values from one to the next?
The ideal scenario would be, for a group of 1m users:
Query 1 filters down to 100k
Query 2 filters down from 100k to 10k
Query 3 filters down from 10k to 5k
As opposed to:
Query 1 filters from 1m to 100k
Query 2 filters down from 1m to 50k
Query 3 filters down from 1m to 80k
The results of all queries are then intersected, down to 5k
Maybe I'm misunderstanding something here, I'd love to get your thoughts!
Thanks!
Postgres uses a query planner to figure out how to most efficiently apply your query. It may reorder things or change how certain query operations (such as joins) are implemented, based on statistical information periodically collected in the background.
To determine how the query planner will structure a given query, stick EXPLAIN in front of it:
EXPLAIN SELECT DISTINCT id FROM users ...;
This will output the query plan for that query. Note that an empty table may get a totally different query plan from a table with (say) 10,000 rows, so be sure to test on real(istic) data.
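As a rough, runnable illustration of that workflow, the sketch below builds the segmentation query with its conditions in two different orders, checks that both return the same ids, and prints a query plan. It uses Python's built-in sqlite3 as a stand-in because it needs no server, so the plan comes from SQLite's EXPLAIN QUERY PLAN and will not look like PostgreSQL's EXPLAIN output. The `country`, `plan` and `age` columns, the index names, and the three conditions are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, plan TEXT, age INTEGER)"
)
# Hypothetical data: 1000 users with deterministic attributes.
conn.executemany(
    "INSERT INTO users (id, country, plan, age) VALUES (?, ?, ?, ?)",
    [
        (i, "US" if i % 2 == 0 else "DE", "pro" if i % 3 == 0 else "free", 18 + i % 50)
        for i in range(1, 1001)
    ],
)
conn.execute("CREATE INDEX idx_users_country ON users (country)")
conn.execute("CREATE INDEX idx_users_plan ON users (plan)")

# Each entry plays the role of one "condition" chosen in the admin front end.
CONDITIONS = [
    "SELECT id FROM users WHERE country = 'US'",
    "SELECT id FROM users WHERE plan = 'pro'",
    "SELECT id FROM users WHERE age >= 40",
]


def segment_query(conditions):
    """Combine the conditions as a chain of `id IN (...)` clauses."""
    clauses = " AND ".join(f"id IN ({c})" for c in conditions)
    return f"SELECT DISTINCT id FROM users WHERE {clauses} ORDER BY id"


forward = segment_query(CONDITIONS)
backward = segment_query(list(reversed(CONDITIONS)))

ids_forward = [row[0] for row in conn.execute(forward)]
ids_backward = [row[0] for row in conn.execute(backward)]
print(ids_forward == ids_backward)

# Ask the engine how it intends to run the query (SQLite syntax, not PostgreSQL's).
plan = conn.execute("EXPLAIN QUERY PLAN " + forward).fetchall()
for step in plan:
    print(step)
```

The order in which the conditions are written does not change the result set; how the work is actually ordered is the planner's decision, which is why it is worth reading the plan on realistic data in PostgreSQL itself.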
Database engines are much more sophisticated than that.
The specific order of the conditions should not matter. They will take your query as a whole and try to figure out the best way to get the data according to all the conditions you specified, the indexes that each table has, the amount of records each condition will filter out, etc.
If you want to get an idea of how your query will actually be solved you can ask the engine to "explain" it for you: http://www.postgresql.org/docs/current/static/sql-explain.html
However, please note that you need a fair amount of technical background on how DB engines actually work to understand what that explanation means.