Cannot modify PostgreSQL query, how to speed it up

I'm looking to speed up a query (PostgreSQL 9.5), but I cannot change it because it is executed by an application I cannot modify.
So I captured the query from the PostgreSQL logs; here it is:
SELECT Count(*)
FROM (SELECT ti.idturnosistemaexterno,
             ti.historiaclinica_hp,
             p.patientname,
             CASE
               WHEN ( ti.creationdate :: VARCHAR IS NOT NULL ) THEN
                 Date_trunc('SECOND', ti.creationdate) :: VARCHAR
               ELSE 'NO EXISTE' :: VARCHAR
             END AS creationdate,
             CASE
               WHEN ( st.idstudy :: VARCHAR IS NOT NULL ) THEN 'SI' :: VARCHAR
               ELSE 'NO' :: VARCHAR
             END AS idstudy,
             st.institutionname,
             CASE
               WHEN ( st.created_time :: VARCHAR IS NOT NULL ) THEN
                 Date_trunc('SECOND', st.created_time) :: VARCHAR
               ELSE 'NO EXISTE' :: VARCHAR
             END AS created_time,
             ti.enviado,
             st.accessionnumber,
             st.modality
      FROM study st
           right join turnointegracion ti
             ON st.accessionnumber = ti.idturnosistemaexterno
           left join patient p
             ON st.idpatient = p.idpatient
      ORDER BY ti.creationdate DESC) AS foo;
The EXPLAIN ANALYZE output is this:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=231136.16..231136.17 rows=1 width=0) (actual time=32765.883..32765.883 rows=1 loops=1)
   ->  Sort  (cost=230150.04..230314.39 rows=65741 width=8) (actual time=32754.992..32761.780 rows=64751 loops=1)
         Sort Key: ti.creationdate DESC
         Sort Method: external merge  Disk: 1648kB
         ->  Hash Right Join  (cost=219856.39..224889.28 rows=65741 width=8) (actual time=26653.007..32714.961 rows=64751 loops=1)
               Hash Cond: ((st.accessionnumber)::text = (ti.idturnosistemaexterno)::text)
               ->  Seq Scan on study st  (cost=0.00..4086.26 rows=77126 width=12) (actual time=12.983..6032.251 rows=77106 loops=1)
               ->  Hash  (cost=219048.95..219048.95 rows=64595 width=16) (actual time=26639.722..26639.722 rows=64601 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 3602kB
                     ->  Seq Scan on turnointegracion ti  (cost=0.00..219048.95 rows=64595 width=16) (actual time=17.259..26611.806 rows=64601 loops=1)
 Planning time: 25.519 ms
 Execution time: 32766.710 ms
(12 rows)
Here are the related table definitions:
Table "public.turnointegracion"
Column | Type | Modifiers
---------------------------+-----------------------------+--------------------------------------------------------------------
idturnosistemaexterno | character varying(50) |
historiaclinica_hp | integer |
matriculaprofrealiza | character varying(10) |
matriculaprofinforma | character varying(10) |
obrasocial | character varying(20) |
planobrasocial | character varying(20) |
nroafiliado | character varying(20) |
nroautorizacion | character varying(20) |
matriculaprofprescribe | character varying(10) |
codigonomenclador | character varying(10) |
cantidadcodigonomenclador | integer |
importeunitariohonorarios | money |
importeunitarioderechos | money |
nrodefacturacion | character varying(15) |
informe | bytea |
titulodelestudio | character varying(250) |
fechaprescripcion | timestamp without time zone |
fechahora | timestamp without time zone |
enviado | boolean | not null default false
enviadofechahora | timestamp without time zone |
procesado_hp | timestamp without time zone |
modalidad | character varying(6) |
orden | integer | not null default nextval('turnointegracion_orden_seq'::regclass)
idturno | integer | not null default nextval('seq_turnointegracion_idturno'::regclass)
creationdate | timestamp without time zone | default now()
informetxt | text |
informedisponible | timestamp without time zone |
informeprocesado | timestamp without time zone |
Indexes:
"turnointegracion_pkey" PRIMARY KEY, btree (idturno)
"idx_fechahora" btree (fechahora)
"idx_historiaclinicahp" btree (historiaclinica_hp)
"idx_idturnosistemaexterno" btree (idturnosistemaexterno)
"idx_informedisponible" btree (informedisponible)
"idx_turnointegracion_creationdate" btree (creationdate DESC)
"idx_turnointegracion_idturnosistext_text" btree ((idturnosistemaexterno::text))
Table "public.study"
Column | Type | Modifiers
------------------------------+-----------------------------+---------------------------------------------------------
idstudy | integer | not null default nextval('study_idstudy_seq'::regclass)
studydate | date |
studytime | time without time zone |
studyid | character varying(20) |
studydescription | character varying(255) |
modality | character varying(2) |
modalityaetitle | character varying(50) |
nameofphysiciansreadingstudy | character varying(255) |
accessionnumber | character varying(20) |
performingphysiciansname | character varying(255) |
referringphysiciansname | character varying(255) |
studyinstanceuid | character varying(255) |
status | status_ |
institutionname | character varying(100) |
idpatient | integer |
created_time | timestamp without time zone |
Indexes:
"study_pkey" PRIMARY KEY, btree (idstudy)
"study_studyinstanceuid_key" UNIQUE CONSTRAINT, btree (studyinstanceuid)
"idx_study_accession_text" btree ((accessionnumber::text))
"idx_study_accessionnumber" btree (accessionnumber)
"idx_study_idstudy" btree (idstudy)
Foreign-key constraints:
"study_idpatient_fkey" FOREIGN KEY (idpatient) REFERENCES patient(idpatient)
Referenced by:
TABLE "series" CONSTRAINT "series_idstudy_fkey" FOREIGN KEY (idstudy) REFERENCES study(idstudy)
As you can see, I've added indexes on the affected columns, but the planner is still doing sequential scans. Is there a way to improve this?

There is no WHERE condition. Because of this join:
right join turnointegracion ti
ON st.accessionnumber = ti.idturnosistemaexterno
you're reading all records from turnointegracion. An index on creationdate can speed up the sort, but all records are still returned. Only filtering by creationdate would reduce the final time.
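Since the query itself cannot be changed, one server-side knob worth testing: the plan shows the sort spilling to disk ("Sort Method: external merge Disk: 1648kB"). Raising work_mem for the application's role keeps that sort in memory (the value and role name below are assumptions, not measured recommendations):

```sql
-- The DESC index on creationdate already exists; because every row is returned,
-- the whole table must still be read, but an in-memory sort avoids the disk spill.
ALTER ROLE app_user SET work_mem = '8MB';  -- app_user is a placeholder role name
```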

PostgreSQL index is not used for slow queries (>30s)

PostgreSQL version: 10
Hardware: 4 workers / 16 GB RAM / 50% used
I'm not a PostgreSQL expert; I have just read a lot of documentation and run a lot of tests.
I have some PostgreSQL queries which take a long time (>30s) because of the 10 million rows in a table.
Column | Type | Collation | Nullable | Default
------------------------------+--------------------------+-----------+----------+----------------------------------------------------------
id | integer | | not null |
cveid | character varying(50) | | |
summary | text | | not null |
published | timestamp with time zone | | |
modified | timestamp with time zone | | |
assigner | character varying(128) | | |
vulnerable_products | character varying(250)[] | | |
cvss | double precision | | |
cvss_time | timestamp with time zone | | |
cvss_vector | character varying(250) | | |
access | jsonb | | not null |
impact | jsonb | | not null |
score | integer | | not null |
is_exploitable | boolean | | not null |
is_confirmed | boolean | | not null |
is_in_the_news | boolean | | not null |
is_in_the_wild | boolean | | not null |
reflinks | jsonb | | not null |
reflinkids | jsonb | | not null |
created_at | timestamp with time zone | | |
history_id | integer | | not null | nextval('vulns_historicalvuln_history_id_seq'::regclass)
history_date | timestamp with time zone | | not null |
history_change_reason | character varying(100) | | |
history_type | character varying(1) | | not null |
Indexes:
"vulns_historicalvuln_pkey" PRIMARY KEY, btree (history_id)
"btree_varchar" btree (history_type varchar_pattern_ops)
"vulns_historicalvuln_cve_id_850876bb" btree (cve_id)
"vulns_historicalvuln_cwe_id_2013d697" btree (cwe_id)
"vulns_historicalvuln_history_user_id_9e25ebf5" btree (history_user_id)
"vulns_historicalvuln_id_773f2af7" btree (id)
--- TRUNCATE
Foreign-key constraints:
"vulns_historicalvuln_history_user_id_9e25ebf5_fk_custusers" FOREIGN KEY (history_user_id) REFERENCES custusers_user(id) DEFERRABLE INITIALLY DEFERRED
Example of queries:
SELECT * FROM vulns_historicalvuln WHERE history_type <> '+' order by id desc fetch first 10000 rows only; -> 30s without cache
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..31878.33 rows=10000 width=1736) (actual time=0.173..32839.474 rows=10000 loops=1)
   ->  Index Scan Backward using vulns_historicalvuln_id_773f2af7 on vulns_historicalvuln  (cost=0.43..26346955.92 rows=8264960 width=1736) (actual time=0.172..32830.958 rows=10000 loops=1)
         Filter: ((history_type)::text <> '+'::text)
         Rows Removed by Filter: 296
 Planning time: 19.514 ms
 Execution time: 32845.015 ms
SELECT DISTINCT "vulns"."id", "vulns"."uuid", "vulns"."feedid", "vulns"."cve_id", "vulns"."cveid", "vulns"."summary", "vulns"."published", "vulns"."modified", "vulns"."assigner", "vulns"."cwe_id", "vulns"."vulnerable_packages_versions", "vulns"."vulnerable_products", "vulns"."vulnerable_product_versions", "vulns"."cvss", "vulns"."cvss_time", "vulns"."cvss_version", "vulns"."cvss_vector", "vulns"."cvss_metrics", "vulns"."access", "vulns"."impact", "vulns"."cvss3", "vulns"."cvss3_vector", "vulns"."cvss3_version", "vulns"."cvss3_metrics", "vulns"."score", "vulns"."is_exploitable", "vulns"."is_confirmed", "vulns"."is_in_the_news", "vulns"."is_in_the_wild", "vulns"."reflinks", "vulns"."reflinkids", "vulns"."created_at", "vulns"."updated_at", "vulns"."id" AS "exploit_count", false AS "monitored", '42' AS "org" FROM "vulns" WHERE ("vulns"."score" >= 0 AND "vulns"."score" <= 100) ORDER BY "vulns"."updated_at" DESC LIMIT 10
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=315191.32..315192.17 rows=10 width=1691) (actual time=3013.964..3013.990 rows=10 loops=1)
   ->  Unique  (cost=315191.32..329642.42 rows=170013 width=1691) (actual time=3013.962..3013.986 rows=10 loops=1)
         ->  Sort  (cost=315191.32..315616.35 rows=170013 width=1691) (actual time=3013.961..3013.970 rows=10 loops=1)
               Sort Key: updated_at DESC, id, uuid, feedid, cve_id, cveid, summary, published, modified, assigner, cwe_id, vulnerable_packages_versions, vulnerable_products, vulnerable_product_versions, cvss, cvss_time, cvss_version, cvss_vector, cvss_metrics, access, impact, cvss3, cvss3_vector, cvss3_version, cvss3_metrics, score, is_exploitable, is_confirmed, is_in_the_news, is_in_the_wild, reflinks, reflinkids, created_at
               Sort Method: external merge  Disk: 277648kB
               ->  Seq Scan on vulns  (cost=0.00..50542.19 rows=170013 width=1691) (actual time=0.044..836.597 rows=169846 loops=1)
                     Filter: ((score >= 0) AND (score <= 100))
 Planning time: 3.183 ms
 Execution time: 3070.346 ms
I have created a btree varchar index, "btree_varchar" btree (history_type varchar_pattern_ops), like this:
CREATE INDEX CONCURRENTLY btree_varchar ON vulns_historicalvuln (history_type varchar_pattern_ops);
I have also created an index on vulns.score for my second query:
CREATE INDEX CONCURRENTLY ON vulns (score);
I have read a lot of posts and documentation about slow queries and indexes. I was sure an index was the solution for my slow queries, but the PostgreSQL query planner doesn't use the indexes I created. It estimates that a sequential scan is faster than using the index...
SELECT relname, indexrelname, idx_scan FROM pg_catalog.pg_stat_user_indexes;
relname | indexrelname | idx_scan
-------------------------------------+-----------------------------------------------------------------+------------
vulns_historicalvuln | btree_varchar | 0
Could you tell me if my index is well designed? How can I debug this? Feel free to ask for more information if needed.
Thanks
After some research, I understand that an index is not the solution to my problem here.
The low cardinality (repeated values) of this field makes the index useless.
The query time here is normal, given the millions of rows matched.
I am closing this issue because there is no problem with the index here.
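One way to double-check the planner's claim that the sequential scan is cheaper, rather than guessing, is to discourage sequential scans inside a throwaway transaction and compare timings (a debugging sketch only, not a production setting):

```sql
BEGIN;
SET LOCAL enable_seqscan = off;  -- applies to this transaction only
EXPLAIN ANALYZE
SELECT * FROM vulns_historicalvuln
WHERE history_type <> '+'
ORDER BY id DESC
FETCH FIRST 10000 ROWS ONLY;
ROLLBACK;  -- the setting reverts automatically
```

If the forced index plan comes out slower, the planner was right to ignore the index.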

PostgreSQL - not using parallelism

I am executing a SELECT query using a FULL OUTER JOIN across 2 tables which are in 2 different databases. I'm using PostgreSQL 9.6.
The query does not run in parallel even though we set the following parameters:
work_mem=256MB,
max_worker_process=40,
force_parallel_mode=on,
max_parallel_workers_per_gather=4,
parallel_tuple_cost=0.1,
parallel_setup_cost=1000,
min_parallel_relation_size=8MB
This is the query:
SELECT mea.ocs_cdr_type,
       mea.ocs_time_stamp,
       mea.sum_ocs_call_cost,
       ctr.ctr_name
FROM mea_req_54 mea
     FULL OUTER JOIN country ctr ON mea.ocs_imei = ctr.ctr_name;
This is the definition of mea_req_54:
Table "public.mea_req_54"
Column | Type | Modifiers
----------------------------+-----------------------------+-----------
mer_id | numeric(19,0) | not null
mer_from_dttm | timestamp without time zone | not null
mer_to_dttm | timestamp without time zone | not null
fng_id | numeric(19,0) |
ocs_imsi_number_norm | character varying(255) |
ocs_account_number | character varying(255) |
ocs_charging_id | character varying(255) |
ocs_cdr_type | character varying(255) |
ocs_bit_description | character varying(255) |
ocs_time_stamp_raw | timestamp without time zone |
ocs_time_stamp | timestamp without time zone |
ocs_duration | numeric(10,0) |
ocs_duration_str | character varying(255) |
ocs_upload_volume | numeric(19,0) |
ocs_download_volume | numeric(19,0) |
sum_ocs_total_volume | numeric(19,0) |
sum_ocs_call_cost | numeric(19,0) |
ocs_plmn_identifier | character varying(255) |
ocs_imei | character varying(255) |
ocs_user_loc_info | character varying(255) |
ocs_bp_id | character varying(255) |
ocs_ref_spec_from_contract | character varying(255) |
ocs_subapp_in_contract_acc | character varying(255) |
ocs_baseline_date_bill | timestamp without time zone |
ocs_target_date_bill | timestamp without time zone |
ocs_date_of_origin_bill | timestamp without time zone |
ctr_id | numeric(10,0) |
ctr_iso_cd | character varying(255) |
ctr_name | character varying(255) |
dblink_run | numeric(10,0) |
Indexes:
"mea_req_54_pk" UNIQUE, btree (mer_id)
This is the definition of country:
Table "public.country"
Column | Type | Modifiers
---------------------+------------------------+-----------
ctr_id | numeric(10,0) | not null
ctr_iso_cd | character varying(255) | not null
ctr_name | character varying(255) | not null
system_generated_fl | character(1) |
ctr_delete_fl | character(1) | not null
ctr_dial_code | character varying(255) | not null
ctr_version_id | numeric(10,0) | not null
ptn_id | numeric(10,0) | not null
Indexes:
"country_ak" UNIQUE, btree (ctr_name)
"country_pk" UNIQUE, btree (ctr_id)
"country_ss1" UNIQUE, btree (ctr_iso_cd)
This is the execution plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Hash Full Join  (cost=482.50..14564568.50 rows=300000000 width=29) (actual time=8.810..305863.949 rows=300015000 loops=1)
   Hash Cond: ((mea.ocs_imei)::text = (ctr.ctr_name)::text)
   ->  Seq Scan on mea_req_54 mea  (cost=0.00..10439086.00 rows=300000000 width=19) (actual time=0.005..131927.791 rows=300000000 loops=1)
   ->  Hash  (cost=295.00..295.00 rows=15000 width=13) (actual time=8.784..8.784 rows=15000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 791kB
         ->  Seq Scan on country ctr  (cost=0.00..295.00 rows=15000 width=13) (actual time=0.008..4.138 rows=15000 loops=1)
 Planning time: 0.085 ms
 Execution time: 355065.791 ms
(8 rows)
The documentation is silent about that, but in backend/optimizer/path/joinpath.c, function hash_inner_and_outer, I find the following enlightening comment:
/*
* If the joinrel is parallel-safe, we may be able to consider a
* partial hash join. However, we can't handle JOIN_UNIQUE_OUTER,
* because the outer path will be partial, and therefore we won't be
* able to properly guarantee uniqueness. Similarly, we can't handle
* JOIN_FULL and JOIN_RIGHT, because they can produce false null
* extended rows. Also, the resulting path must not be parameterized.
*/
This makes sense – a parallel worker that scans part of mea_req_54 has no way to know if there is a row in country that does not match any of the rows in mea_req_54.
Now nested loop joins cannot be used for full outer joins, so all that remains is a parallel merge join.
I can't say if a merge join is an option here, but you may try and create an index on mea_req_54(ocs_imei) and see if that helps the optimizer choose a parallel plan.
Otherwise, you are probably out of luck.
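A sketch of that experiment (the index name is made up; CONCURRENTLY avoids blocking writes on a 300M-row table):

```sql
-- Gives the planner the option of a merge join on ocs_imei = ctr_name;
-- country.ctr_name is already covered by the unique index country_ak.
CREATE INDEX CONCURRENTLY mea_req_54_ocs_imei_idx ON mea_req_54 (ocs_imei);
```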
After reducing the parallel_tuple_cost and parallel_setup_cost parameters from their default values, the query ran with parallelism.
But I wanted to know about these parameters: what exactly are they?
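Both are planner cost constants rather than execution limits: parallel_setup_cost (default 1000) is the planner's estimate of the one-time cost of launching parallel worker processes, and parallel_tuple_cost (default 0.1) is its estimate of the cost of transferring one tuple from a worker to the leader. Lowering them makes parallel plans look cheaper, so the planner chooses them more readily (the values below are examples, not recommendations):

```sql
SET parallel_setup_cost = 100;   -- default 1000: one-time cost of starting workers
SET parallel_tuple_cost = 0.01;  -- default 0.1: per-tuple worker-to-leader transfer cost
```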

Large amount of data query

I have a partitioned table, currently with about 200 partitions, each partition holding about 12 million rows. SELECT is now very slow. Indexes on the relevant columns have been created, but queries are still very slow. Looking at the execution plan, I found a large amount of data being read from disk. How can I adjust and optimize this?
gjdd4=# \d t_bus_position_20160306_20160308
Table "public.t_bus_position_20160306_20160308"
Column | Type | Modifiers
------------------------+--------------------------------+--------------------
pos_uuid | character varying(20) | collate zh_CN.utf8
pos_line_uuid | character varying(20) |
pos_line_type | character varying(20) |
pos_bus_uuid | character varying(20) | collate zh_CN.utf8
pos_dev_uuid | character varying(20) |
pos_sta_uuid | character varying(20) |
pos_drv_ic_card | character varying(30) |
pos_lng | character varying(30) |
pos_lat | character varying(30) |
pos_bus_speed | character varying(20) |
pos_real_time_status | character varying(20) |
pos_gather_time | timestamp(6) without time zone |
pos_storage_time | timestamp(6) without time zone |
pos_is_offset | boolean |
pos_is_overspeed | character varying(1) |
pos_cursor_over_ground | character varying(20) |
pos_all_alarms | character varying(30) |
pos_is_in_station | character varying(1) |
pos_closed_alarms | character varying(30) |
pos_dis_to_pre_i | integer |
pos_odometer_i | bigint |
pos_relative_location | real |
pos_dis_to_pre | real |
pos_odometer | double precision |
pos_gather_time1 | bigint |
Indexes:
"idx_multi" btree (pos_bus_uuid, pos_gather_time DESC)
"idx_trgm" btree (replace(to_char(pos_gather_time, 'YYYYMMDDHH24'::text), ' '::text, ''::text))
"idx_trgm1" btree (to_char(pos_gather_time, 'YYYYMMDD'::text))
"tp_20160306_20160308_pos_dev_uuid_idx" btree (pos_dev_uuid)
Check constraints:
"t_bus_position_20160306_20160308_pos_gather_time_check" CHECK (pos_gather_time >= '2016-03-06 00:00:00'::timestamp without time zone AND
pos_gather_time < '2016-03-09 00:00:00'::timestamp without time zone)
The plan is like this.
gjdd4=# explain(costs,buffers,timing,analyze) select pos_bus_uuid from test2 group by pos_bus_uuid;
 HashAggregate  (cost=802989.75..802993.00 rows=325 width=21) (actual time=42721.528..42721.679 rows=354 loops=1)
   Group Key: pos_bus_uuid
   Buffers: shared hit=3560 read=567491
   I/O Timings: read=20231.511
   ->  Seq Scan on test2  (cost=0.00..756602.00 rows=18555100 width=21) (actual time=0.067..27749.533 rows=18555100 loops=1)
         Buffers: shared hit=3560 read=567491
         I/O Timings: read=20231.511
 Planning time: 0.116 ms
 Execution time: 42721.839 ms
(9 rows)
Time: 42722.629 ms
Your query does not do any real aggregation but merely a DISTINCT. If this is what you really want (all distinct pos_bus_uuid values), then you can use a technique called a loose index scan.
Here is the tailored query, assuming pos_bus_uuid has a NOT NULL constraint:
WITH RECURSIVE t AS (
(SELECT pos_bus_uuid FROM test2 ORDER BY pos_bus_uuid LIMIT 1) -- parentheses required
UNION ALL
SELECT (SELECT pos_bus_uuid FROM test2
WHERE pos_bus_uuid > t.pos_bus_uuid ORDER BY pos_bus_uuid LIMIT 1)
FROM t
WHERE t.pos_bus_uuid IS NOT NULL
)
SELECT pos_bus_uuid FROM t WHERE pos_bus_uuid IS NOT NULL;
Your index on pos_bus_uuid (the leading column of idx_multi) should be good enough for this query.
Markus Winand's answer is correct - you need to copy the full SQL; the syntax error is due to not including the last line: 'SELECT pos_bus_uuid FROM t WHERE pos_bus_uuid IS NOT NULL;'
I would have added this as a comment, but my reputation is too low to comment.

PostgreSQL very slow query on indexed column

I have a table with 50 million rows. One very important column named u_sphinx has the possible values 1, 2, 3. Right now all rows have value 3, but when I check for new rows (u_sphinx = 1) the query is very slow. What could be wrong? Maybe the index is broken? Server: Debian, 8 GB RAM, 4x Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz
Table structure:
base=> \d u_user
Table "public.u_user"
Column | Type | Modifiers
u_ip | character varying |
u_agent | text |
u_agent_js | text |
u_resolution_id | integer |
u_os | character varying |
u_os_id | smallint |
u_platform | character varying |
u_language | character varying |
u_language_id | smallint |
u_language_js | character varying |
u_cookie | smallint |
u_java | smallint |
u_color_depth | integer |
u_flash | character varying |
u_charset | character varying |
u_doctype | character varying |
u_compat_mode | character varying |
u_sex | character varying |
u_age | character varying |
u_theme | character varying |
u_behave | character varying |
u_targeting | character varying |
u_resolution | character varying |
u_user_hash | bigint |
u_tech_hash | character varying |
u_last_target_data_time | integer |
u_last_target_prof_time | integer |
u_id | bigint | not null default nextval('u_user_u_id_seq'::regclass)
u_sphinx | smallint | not null default 1::smallint
Indexes:
"u_user_u_id_pk" PRIMARY KEY, btree (u_id)
"u_user_hash_index" btree (u_user_hash)
"u_user_u_sphinx_ind" btree (u_sphinx)
Slow query:
base=> explain analyze SELECT u_id FROM u_user WHERE u_sphinx = 1 LIMIT 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.15 rows=1 width=8) (actual time=485146.252..485146.252 rows=0 loops=1)
   ->  Seq Scan on u_user  (cost=0.00..3023707.80 rows=19848860 width=8) (actual time=485146.249..485146.249 rows=0 loops=1)
         Filter: (u_sphinx = 1)
         Rows Removed by Filter: 23170476
 Total runtime: 485160.241 ms
(5 rows)
Solved:
After adding the partial index:
base=> explain analyze SELECT u_id FROM u_user WHERE u_sphinx = 1 LIMIT 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.27..4.28 rows=1 width=8) (actual time=0.063..0.063 rows=0 loops=1)
   ->  Index Scan using u_user_u_sphinx_index_1 on u_user  (cost=0.27..4.28 rows=1 width=8) (actual time=0.061..0.061 rows=0 loops=1)
         Index Cond: (u_sphinx = 1)
 Total runtime: 0.106 ms
Thanks to @Kouber Saparev.
Try making a partial index.
CREATE INDEX u_user_u_sphinx_idx ON u_user (u_sphinx) WHERE u_sphinx = 1;
Your query plan looks like the DB is treating the query as if 1 were so common that it would be better off digging into a disk page or two to identify a relevant row, instead of adding the overhead of plowing through an index and then finding the row in a random disk page.
This could be an indication that you forgot to ANALYZE the table so the planner has proper stats:
ANALYZE u_user;

PostgreSQL SELECT DISTINCT ON running very slow

This query ran instantly:
mydb=# SELECT reports.* FROM reports WHERE reports.id = 9988 ORDER BY time DESC LIMIT 1;
This query took 33 seconds to run (and I only selected the report with unit_id 9988 here; I will potentially have hundreds, if not thousands):
(UPDATE: these are the results of using EXPLAIN ANALYZE):
mydb=# EXPLAIN ANALYZE SELECT DISTINCT ON (unit_id) r.* FROM reports r WHERE r.unit_id IN (3007, 3011, 6193) ORDER BY unit_id, time DESC;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1377569.23..1381106.10 rows=11 width=155) (actual time=97175.381..97710.369 rows=3 loops=1)
   ->  Sort  (cost=1377569.23..1379337.66 rows=707375 width=155) (actual time=97175.379..97616.039 rows=764509 loops=1)
         Sort Key: unit_id, "time"
         Sort Method: external merge  Disk: 92336kB
         ->  Bitmap Heap Scan on reports r  (cost=20224.85..1142005.76 rows=707375 width=155) (actual time=12396.930..94097.890 rows=764509 loops=1)
               Recheck Cond: (unit_id = ANY ('{3007,3011,6193}'::integer[]))
               ->  Bitmap Index Scan on index_reports_on_unit_id  (cost=0.00..20048.01 rows=707375 width=0) (actual time=12382.176..12382.176 rows=764700 loops=1)
                     Index Cond: (unit_id = ANY ('{3007,3011,6193}'::integer[]))
 Total runtime: 97982.363 ms
(9 rows)
The schema of reports table is as follows:
mydb=# \d+ reports
Table "public.reports"
Column | Type | Modifiers | Storage | Description
----------------+-----------------------------+------------------------------------------------------+----------+-------------
id | integer | not null default nextval('reports_id_seq'::regclass) | plain |
unit_id | integer | not null | plain |
time_secs | integer | not null | plain |
time | timestamp without time zone | | plain |
latitude | numeric(15,10) | not null | main |
longitude | numeric(15,10) | not null | main |
speed | integer | | plain |
io | integer | | plain |
msg_type | integer | | plain |
msg_code | integer | | plain |
signal | integer | | plain |
cellid | integer | | plain |
lac | integer | | plain |
processed | boolean | default false | plain |
created_at | timestamp without time zone | | plain |
updated_at | timestamp without time zone | | plain |
street | character varying(255) | | extended |
county | character varying(255) | | extended |
state | character varying(255) | | extended |
postal_code | character varying(255) | | extended |
country | character varying(255) | | extended |
distance | numeric | | main |
gps_valid | boolean | default true | plain |
city | character varying(255) | | extended |
street_number | character varying(255) | | extended |
address_source | integer | | plain |
source | integer | default 0 | plain |
driver_id | integer | | plain |
Indexes:
"reports_pkey" PRIMARY KEY, btree (id)
"reports_uniqueness_index" UNIQUE, btree (unit_id, "time", latitude, longitude)
"index_reports_on_address_source" btree (address_source DESC)
"index_reports_on_driver_id" btree (driver_id)
"index_reports_on_time" btree ("time")
"index_reports_on_time_secs" btree (time_secs)
"index_reports_on_unit_id" btree (unit_id)
Foreign-key constraints:
"reports_driver_id_fkey" FOREIGN KEY (driver_id) REFERENCES drivers(id)
"reports_unit_id_fkey" FOREIGN KEY (unit_id) REFERENCES units(id)
Referenced by:
TABLE "alerts" CONSTRAINT "alerts_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports(id)
TABLE "pressure_transmitters" CONSTRAINT "pressure_transmitters_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports(id)
TABLE "thermoking" CONSTRAINT "thermoking_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports(id)
Has OIDs: no
Why is SELECT DISTINCT ON running so slow?
There is a relatively slow Bitmap Heap Scan with a recheck - please try increasing work_mem.
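A concrete sketch of that suggestion: the plan shows the sort spilling 92336kB to disk, so a work_mem comfortably above that should let both the sort and the bitmap stay in memory (the value is an assumption sized to this plan, not a general recommendation):

```sql
SET work_mem = '256MB';  -- session-level; the external merge used ~92MB on disk
```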