Improving performance of query - postgresql

We have a PostgreSQL query with multiple tables and left outer joins, and is running very slow.
It is completing in 25-40s, so we want to optimize it more and want to decrease run time to 1-2 sec.
select a.campaignid, b.campaign_name , case when b.message_type_id = 1 then 'Promotional'
when b.message_type_id = 2 then 'Transactional'
else 'Other' end as Campaign_type, c.username , aggregator_type,
e.cli_manager_id as senderID,
b.schedule_time as campaign_schedule_date,
count(a.mobile) as campaign_submitted_count, count(case when a.status = 'DELIVRD' then mobile end) as Delivered,
count(a.mobile) as Total_count,
count(case when a.status = 'FAILED' then mobile end) as failure_count,
count(case when a.status = 'DND_check_failed' then mobile end) as DND_count,
sum(credits_used) as credits_used
from tbl_cdr_test a left outer join tbl_campaign b
on a.campaignid = b.tbl_campaign_id left outer join tbl_users_master c
on b.user_id =c.user_master_id
left outer join tbl_cli_manager e on b.user_id = e.user_id
left outer join tbl_user_channel f on b.user_id =f.user_id
left outer join tbl_user_configurations g on b.user_id = g.user_id
where date(insert_datetime) between '2020-05-23' and '2020-06-23'
and c.username = coalesce(null, c.username)
and g.msg_cat_id = coalesce(null, g.msg_cat_id)
and a.campaignid = coalesce(null, a.campaignid)
and e.cli_manager_id = coalesce(null, e.cli_manager_id)
group by a.campaignid, b.campaign_name , b.message_type_id,c.username , b.schedule_time,
aggregator_type, e.cli_manager_id;
We have create appropriate indexes as well, but still it is taking time.
Moreover there is "external merge disk" sorting method in execution plan whereas to resolve same I have set work_mem = 50MB. Still it is using disk sort instead of memory.Please suggest
Below is execution plan:
GroupAggregate (cost=4872.01..4872.07 rows=1 width=543) (actual time=20564.239..27415.264 rows=8 loops=1)
Group Key: a.campaignid, b.campaign_name, b.message_type_id, c.username, b.schedule_time, f.aggregator_type, e.cli_manager_id
-> Sort (cost=4872.01..4872.01 rows=1 width=483) (actual time=19627.424..25020.702 rows=3206196 loops=1)
Sort Key: a.campaignid, b.campaign_name, b.message_type_id, c.username, b.schedule_time, f.aggregator_type, e.cli_manager_id
Sort Method: external merge Disk: 281456kB
-> Nested Loop (cost=22.03..4872.00 rows=1 width=483) (actual time=99.704..12086.244 rows=3206196 loops=1)
Join Filter: (b.user_id = g.user_id)
-> Nested Loop Left Join (cost=21.89..4871.79 rows=1 width=495) (actual time=99.688..4518.533 rows=3206196 loops=1)
-> Nested Loop (cost=21.75..4871.54 rows=1 width=77) (actual time=99.664..935.689 rows=356244 loops=1)
-> Nested Loop (cost=21.33..31.57 rows=1 width=65) (actual time=0.295..2.376 rows=588 loops=1)
Join Filter: (b.user_id = c.user_master_id)
-> Merge Join (cost=21.18..30.22 rows=6 width=46) (actual time=0.246..0.663 rows=588 loops=1)
Merge Cond: (e.user_id = b.user_id)
-> Index Scan using "idx_FK_7hc6agd_tbl_cli_ma_1592228110_32" on tbl_cli_manager e (cost=0.42..6281.84 rows=762 width=12) (actual time=0.014..0.035 rows=5 loops=1)
Filter: (cli_manager_id = COALESCE(cli_manager_id))
-> Sort (cost=20.76..21.13 rows=147 width=34) (actual time=0.225..0.333 rows=585 loops=1)
Sort Key: b.user_id
Sort Method: quicksort Memory: 36kB
-> Seq Scan on tbl_campaign b (cost=0.00..15.47 rows=147 width=34) (actual time=0.013..0.154 rows=147 loops=1)
-> Index Scan using ind_user_master_c_user on tbl_users_master c (cost=0.14..0.21 rows=1 width=19) (actual time=0.002..0.002 rows=1 loops=588)
Index Cond: (user_master_id = e.user_id)
Filter: ((username)::text = (COALESCE(username))::text)
-> Append (cost=0.42..4839.94 rows=3 width=20) (actual time=0.546..1.426 rows=606 loops=588)
-> Index Scan using testh11_campaignid_idx on testh11 a (cost=0.42..4253.99 rows=2 width=20) (actual time=0.543..0.543 rows=0 loops=588)
Index Cond: (campaignid = b.tbl_campaign_id)
Filter: ((campaignid = COALESCE(campaignid)) AND (date(insert_datetime) >= '2020-05-23'::date) AND (date(insert_datetime) <= '2020-06-23'::date))
Rows Removed by Filter: 656
-> Index Scan using testh21_campaignid_idx on testh21 a_1 (cost=0.42..585.94 rows=1 width=20) (actual time=0.002..0.796 rows=606 loops=588)
Index Cond: (campaignid = b.tbl_campaign_id)
Filter: ((campaignid = COALESCE(campaignid)) AND (date(insert_datetime) >= '2020-05-23'::date) AND (date(insert_datetime) <= '2020-06-23'::date))
-> Index Scan using idx_user_id_tbl_user_c_1592227657_19 on tbl_user_channel f (cost=0.14..0.24 rows=1 width=422) (actual time=0.002..0.004 rows=9 loops=356244)
Index Cond: (user_id = b.user_id)
-> Index Scan using "idx_FK_6958qvy_tbl_user_c_1592228774_151" on tbl_user_configurations g (cost=0.14..0.20 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=3206196)
Index Cond: (user_id = e.user_id)
Filter: (msg_cat_id = COALESCE(msg_cat_id))
Planning Time: 6.561 ms
Execution Time: 27477.860 ms

There is a gross underestimate of the result rows for the index scan on testh21. The consequence is that PostgreSQL chooses nested loop joins, which is where your time is spent.
Try the following:
New statistics:
ANALYZE testh21;
If that improves the estimate, make sure that autoanalyze treats the table more often.
Prevent bad estimates caused by correlation:
CREATE STATISTICS testh21_stat (dependencies)
ON campaignid, insert_datetime FROM testh21;
ANALYZE testh21;
Perhaps there is a correlation between the columns, and that improves the estimate.
More detailed statistics: try raising default_statistics_target before ANALYZE of the table.
If you cannot improve the estimates, take the hammer and set enable_nestloop = off for the duration of the query.

Related

Subquery is very slow when add another column query

SELECT a1.object_id,
(SELECT COALESCE(json_agg(b1), '[]')
FROM (
SELECT c1.root_id,
(SELECT COALESCE(json_agg(d1), '[]')
FROM (
SELECT e1.root_id,
(SELECT COALESCE(json_agg(f1), '[]')
FROM (
SELECT g1.root_id,
(SELECT SUM(h2.amount)
FROM table_5 AS h1
LEFT OUTER JOIN table_6 AS h2 on h1.hash_id = h2.hash_id
WHERE g1.object_id = h1.root_id
AND h2.mode = 'Real'
AND h2.type = 'Reject'
) AS amount
FROM table_4 AS g1
LEFT OUTER JOIN table_5 AS g2
ON g1.general_id = g2.object_id
LEFT OUTER JOIN table_6 AS g3
ON g1.properties_hash_id = g3.hash_id
WHERE e1.object_id = g1.root_id
) AS f1) AS tickets
FROM table_3 AS e1
WHERE c1.object_id = e1.root_id
) AS d1) AS blocks
FROM table_2 AS c1
WHERE a1.object_id = c1.root_id
) AS b1) AS sources
FROM table_1 AS a1
With AND h2.type = 'Reject' line the query takes ~5 Seconds
...
SubPlan 1
-> Aggregate (cost=22.77..22.78 rows=1 width=32) (actual time=0.296..0.296 rows=1 loops=16502)
-> Nested Loop (cost=10.72..22.76 rows=1 width=6) (actual time=0.184..0.295 rows=1 loops=16502)
-> Index Scan using t041_type_idx on table_6 h2 (cost=0.15..8.17 rows=1 width=10) (actual time=0.001..0.007 rows=20 loops=16502)
Index Cond: (type = 'Reject'::valid_optimized_segment_types)
Filter: (mode = 'Real'::valid_optimized_segment_modes)
Rows Removed by Filter: 20
-> Bitmap Heap Scan on table_5 h1 (cost=10.57..14.58 rows=1 width=4) (actual time=0.012..0.012 rows=0 loops=330040)
Recheck Cond: ((g1.object_id = root_id) AND (hash_id = h2.hash_id))
Heap Blocks: exact=12379
-> BitmapAnd (cost=10.57..10.57 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=330040)
-> Bitmap Index Scan on t031_root_id_fkey (cost=0.00..4.32 rows=3 width=0) (actual time=0.001..0.001 rows=2 loops=330040)
Index Cond: (root_id = g1.object_id)
-> Bitmap Index Scan on t031_md5_id_idx (cost=0.00..6.00 rows=228 width=0) (actual time=0.010..0.010 rows=619 loops=330040)
Index Cond: (hash_id = h2.hash_id)
Planning Time: 0.894 ms
Execution Time: 4925.776 ms
Without AND h2.type = 'Reject' line the query takes ~770ms
SubPlan 1
-> Aggregate (cost=25.77..25.78 rows=1 width=32) (actual time=0.045..0.046 rows=1 loops=16502)
-> Hash Join (cost=15.32..25.77 rows=2 width=6) (actual time=0.019..0.041 rows=1 loops=16502)
Hash Cond: (h2.hash_id = h1.hash_id)
-> Seq Scan on table_6 h2 (cost=0.00..8.56 rows=298 width=10) (actual time=0.001..0.026 rows=285 loops=16502)
Filter: (mode = 'Real'::valid_optimized_segment_modes)
Rows Removed by Filter: 80
-> Hash (cost=15.28..15.28 rows=3 width=4) (actual time=0.002..0.002 rows=2 loops=16502)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Scan using t031_root_id_fkey on table_5 h1 (cost=0.29..15.28 rows=3 width=4) (actual time=0.001..0.002 rows=2 loops=16502)
Index Cond: (root_id = g1.object_id)
Planning Time: 0.889 ms
Execution Time: 787.264 ms
Why is this single line causing so much differences (h2.type is
BTREE indexed)?
How else can I achieve the same result with better efficiency?
-> Index Scan using t041_type_idx on table_6 h2 (cost=0.15..8.17 rows=1 width=10) (actual time=0.001..0.007 rows=20 loops=16502)
It thinks there will be 1 row, and there are 20. Meaning the next node is executed 20 times more than it thought it would be. If it knew here would be 20, it might have chosen a different plan.
You could try to create cross-column statistics so it can get a better estimate for that row count.
Or you could just speed up the next node, by adding the multi column index on table_5 (root_id,hash_id). It would still get executed far more times than the planner thinks it will, but each execution will be faster.
Or you might just force it into a different plan by making one of those indexes unusable:
...JOIN table_6 AS h2 on h1.hash_id+0 = h2.hash_id

Analyze: Why a query taking could take so long, seems costs are low?

I am having these results for analyze for a simple query that does not return more than 150 records from tables less than 200 records most of them, as I have a table that stores latest value and the other fields are FK of the data.
Update: see the new results from same query some our later. The site is not public and/or there should be not users right now as it is in development.
explain analyze
SELECT lv.station_id,
s.name AS station_name,
s.latitude,
s.longitude,
s.elevation,
lv.element_id,
e.symbol AS element_symbol,
u.symbol,
e.name AS element_name,
lv.last_datetime AS datetime,
lv.last_value AS valor,
s.basin_id,
s.municipality_id
FROM (((element_station lv /*350 records*/
JOIN stations s ON ((lv.station_id = s.id))) /*40 records*/
JOIN elements e ON ((lv.element_id = e.id))) /*103 records*/
JOIN units u ON ((e.unit_id = u.id))) /* 32 records */
WHERE s.id = lv.station_id AND e.id = lv.element_id AND lv.interval_id = 6 and
lv.last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)
I have already tried VACUUM and after that some is saved, but again after some times it goes up. I have implemented an index on the fields.
Nested Loop (cost=0.29..2654.66 rows=1 width=92) (actual time=1219.390..35296.253 rows=157 loops=1)
Join Filter: (e.unit_id = u.id)
Rows Removed by Join Filter: 4867
-> Nested Loop (cost=0.29..2652.93 rows=1 width=92) (actual time=1219.383..35294.083 rows=157 loops=1)
Join Filter: (lv.element_id = e.id)
Rows Removed by Join Filter: 16014
-> Nested Loop (cost=0.29..2648.62 rows=1 width=61) (actual time=1219.301..35132.373 rows=157 loops=1)
-> Seq Scan on element_station lv (cost=0.00..2640.30 rows=1 width=20) (actual time=1219.248..1385.517 rows=157 loops=1)
Filter: ((interval_id = 6) AND (last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)))
Rows Removed by Filter: 168
-> Index Scan using stations_pkey on stations s (cost=0.29..8.31 rows=1 width=45) (actual time=3.471..214.941 rows=1 loops=157)
Index Cond: (id = lv.station_id)
-> Seq Scan on elements e (cost=0.00..3.03 rows=103 width=35) (actual time=0.003..0.999 rows=103 loops=157)
-> Seq Scan on units u (cost=0.00..1.32 rows=32 width=8) (actual time=0.002..0.005 rows=32 loops=157)
Planning time: 8.312 ms
Execution time: 35296.427 ms
update, same query running it tonight; no changes:
Sort (cost=601.74..601.88 rows=55 width=92) (actual time=1.822..1.841 rows=172 loops=1)
Sort Key: lv.last_datetime DESC
Sort Method: quicksort Memory: 52kB
-> Nested Loop (cost=11.60..600.15 rows=55 width=92) (actual time=0.287..1.680 rows=172 loops=1)
-> Hash Join (cost=11.31..248.15 rows=55 width=51) (actual time=0.263..0.616 rows=172 loops=1)
Hash Cond: (e.unit_id = u.id)
-> Hash Join (cost=9.59..245.60 rows=75 width=51) (actual time=0.225..0.528 rows=172 loops=1)
Hash Cond: (lv.element_id = e.id)
-> Bitmap Heap Scan on element_station lv (cost=5.27..240.25 rows=75 width=20) (actual time=0.150..0.359 rows=172 loops=1)
Recheck Cond: ((last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)) AND (interval_id = 6))
Heap Blocks: exact=22
-> Bitmap Index Scan on element_station_latest (cost=0.00..5.25 rows=75 width=0) (actual time=0.136..0.136 rows=226 loops=1)
Index Cond: ((last_datetime >= ((now() - '06:00:00'::interval) - '01:00:00'::interval)) AND (interval_id = 6))
-> Hash (cost=3.03..3.03 rows=103 width=35) (actual time=0.062..0.062 rows=103 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 15kB
-> Seq Scan on elements e (cost=0.00..3.03 rows=103 width=35) (actual time=0.006..0.031 rows=103 loops=1)
-> Hash (cost=1.32..1.32 rows=32 width=8) (actual time=0.019..0.019 rows=32 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> Seq Scan on units u (cost=0.00..1.32 rows=32 width=8) (actual time=0.003..0.005 rows=32 loops=1)
-> Index Scan using stations_pkey on stations s (cost=0.29..6.39 rows=1 width=45) (actual time=0.005..0.006 rows=1 loops=172)
Index Cond: (id = lv.station_id)
Planning time: 2.390 ms
Execution time: 2.009 ms
The problem is the misestimate of the number of rows in the sequential scan on element_station. Either autoanalyze has kicked in and calculated new statistics for the table or the data changed.
The problem is probably that PostgreSQL doesn't know the result of
((now() - '06:00:00'::interval) - '01:00:00'::interval)
at query planning time.
If that is possible for you, do it in two steps: First, calculate the expression above (either in PostgreSQL or on the client side). Then run the query with the result as a constant. That will make it easier for PostgreSQL to estimate the result count.

Optimizing PostgreSQL Query

I need help optimizing this PostgreSQL Query. I've already created some indexes but the only index being used is "cp_campaign_task_ap_index".
EXPLAIN ANALYZE SELECT * FROM campaign_phones
INNER JOIN campaigns ON campaigns.id = campaign_phones.campaign_id
INNER JOIN campaign_ports ON campaign_ports.campaign_id = campaigns.id
WHERE campaign_phones.task_id IS NULL
AND campaign_phones.assigned_port = 2
AND (campaigns.auto_start IS TRUE)
AND (campaigns.starts_at_date::date <= '2018-07-08'
AND campaigns.ends_at_date::date >= '2018-07-08')
AND (campaign_ports.gateway_port_id = 611)
AND (campaign_ports.op_mode != 1)
ORDER BY campaigns.last_sent_at ASC NULLS FIRST LIMIT 1;`
The output of the command is:
Limit (cost=26031.86..26031.87 rows=1 width=475) (actual time=2335.421..2335.421 rows=1 loops=1)
-> Sort (cost=26031.86..26047.26 rows=6158 width=475) (actual time=2335.419..2335.419 rows=1 loops=1)
Sort Key: campaigns.last_sent_at NULLS FIRST
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=136.10..26001.07 rows=6158 width=475) (actual time=1.176..1510.276 rows=36666 loops=1)
Join Filter: (campaigns.id = campaign_phones.campaign_id)
-> Nested Loop (cost=0.00..28.28 rows=2 width=218) (actual time=0.163..0.435 rows=4 loops=1)
Join Filter: (campaigns.id = campaign_ports.campaign_id)
Rows Removed by Join Filter: 113
-> Seq Scan on campaign_ports (cost=0.00..21.48 rows=9 width=55) (actual time=0.017..0.318 rows=9 loops=1)
Filter: ((op_mode <> 1) AND (gateway_port_id = 611))
Rows Removed by Filter: 823
-> Materialize (cost=0.00..5.74 rows=8 width=163) (actual time=0.001..0.008 rows=13 loops=9)
-> Seq Scan on campaigns (cost=0.00..5.70 rows=8 width=163) (actual time=0.011..0.050 rows=13 loops=1)
Filter: ((auto_start IS TRUE) AND ((starts_at_date)::date <= '2018-07-08'::date) AND ((ends_at_date)::date >= '2018-07-08'::date))
Rows Removed by Filter: 22
-> Bitmap Heap Scan on campaign_phones (cost=136.10..12931.82 rows=4366 width=249) (actual time=43.079..302.895 rows=9166 loops=4)
Recheck Cond: ((campaign_id = campaign_ports.campaign_id) AND (task_id IS NULL) AND (assigned_port = 2))
Heap Blocks: exact=6686
-> Bitmap Index Scan on cp_campaign_task_ap_index (cost=0.00..135.01 rows=4366 width=0) (actual time=8.884..8.884 rows=9167 loops=4)
Index Cond: ((campaign_id = campaign_ports.campaign_id) AND (task_id IS NULL) AND (assigned_port = 2))
Planning time: 1.115 ms
Execution time: 2335.563 ms
The "campaign_phones" relation could have many rows, perhaps a million.
I don't know where to start optimizing, perhaps creating indexes or changing query structure.
Thanks.

Postgres Query not selecting index for a column with OR condition

I have a query where the Postgres is performing a Hash join with sequence scan instead of an Index join with Nested loop, when I use an OR condition. This is causing the query to take 2 seconds instead of completing in < 100ms. I have run VACUUM ANALYZE and have rebuilt the index on the PATIENTCHARTNOTE table (which is about 5GB) but its still using hash join. Do you have any suggestions on how I can improve this?
explain analyze
SELECT Count (_pcn.id) AS total_open_note
FROM patientchartnote _pcn
INNER JOIN appointment _appt
ON _appt.id = _pcn.appointment_id
INNER JOIN patient _pt
ON _pt.id = _appt.patient_id
LEFT OUTER JOIN person _ps
ON _ps.id = _pt.appuser_id
WHERE _pcn.active = true
AND _pt.active = true
AND _appt.datecomplete IS NULL
AND _pcn.title IS NOT NULL
AND _pcn.title <> ''
AND ( _pt.assigned_to_user_id = '136964'
OR _pcn.createdby_id = '136964'
);
Aggregate (cost=237655.59..237655.60 rows=1 width=8) (actual time=1602.069..1602.069 rows=1 loops=1)
-> Hash Join (cost=83095.43..237645.30 rows=4117 width=4) (actual time=944.850..1602.014 rows=241 loops=1)
Hash Cond: (_appt.patient_id = _pt.id)
Join Filter: ((_pt.assigned_to_user_id = 136964) OR (_pcn.createdby_id = 136964))
Rows Removed by Join Filter: 94036
-> Hash Join (cost=46650.68..182243.64 rows=556034 width=12) (actual time=415.862..1163.812 rows=94457 loops=1)
Hash Cond: (_pcn.appointment_id = _appt.id)
-> Seq Scan on patientchartnote _pcn (cost=0.00..112794.20 rows=1073978 width=12) (actual time=0.016..423.262 rows=1
073618 loops=1)
Filter: (active AND (title IS NOT NULL) AND ((title)::text <> ''::text))
Rows Removed by Filter: 22488
-> Hash (cost=35223.61..35223.61 rows=696486 width=8) (actual time=414.749..414.749 rows=692839 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 2732kB
-> Seq Scan on appointment _appt (cost=0.00..35223.61 rows=696486 width=8) (actual time=0.010..271.208 rows=69
2839 loops=1)
Filter: (datecomplete IS NULL)
Rows Removed by Filter: 652426
-> Hash (cost=24698.57..24698.57 rows=675694 width=12) (actual time=351.566..351.566 rows=674929 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 2737kB
-> Seq Scan on patient _pt (cost=0.00..24698.57 rows=675694 width=12) (actual time=0.013..197.268 rows=674929 loops=
1)
Filter: active
Rows Removed by Filter: 17426
Planning time: 1.533 ms
Execution time: 1602.715 ms
When I replace "OR _pcn.createdby_id = '136964'" with "AND _pcn.createdby_id = '136964'", Postgres performs an index scan
Aggregate (cost=29167.56..29167.57 rows=1 width=8) (actual time=937.743..937.743 rows=1 loops=1)
-> Nested Loop (cost=1.28..29167.55 rows=7 width=4) (actual time=19.136..937.669 rows=37 loops=1)
-> Nested Loop (cost=0.85..27393.03 rows=1654 width=4) (actual time=2.154..910.250 rows=1649 loops=1)
-> Index Scan using patient_activeassigned_idx on patient _pt (cost=0.42..3075.00 rows=1644 width=8) (actual time=1.
599..11.820 rows=1627 loops=1)
Index Cond: ((active = true) AND (assigned_to_user_id = 136964))
Filter: active
-> Index Scan using appointment_datepatient_idx on appointment _appt (cost=0.43..14.75 rows=4 width=8) (actual time=
0.543..0.550 rows=1 loops=1627)
Index Cond: ((patient_id = _pt.id) AND (datecomplete IS NULL))
-> Index Scan using patientchartnote_activeappointment_idx on patientchartnote _pcn (cost=0.43..1.06 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1649)
Index Cond: ((active = true) AND (createdby_id = 136964) AND (appointment_id = _appt.id) AND (title IS NOT NULL))
Filter: (active AND ((title)::text <> ''::text))
Planning time: 1.489 ms
Execution time: 937.910 ms
(13 rows)
Using OR in SQL queries usually results in bad performance.
That is because – different from AND – it does not restrict, but extend the number of rows in the query result. With AND, you can use an index scan for one part of the condition and further restrict the result set with a filter on the second condition. That is not possible with OR.
So PostgreSQL does the only thing left: it computes the whole join and then filters out all rows that do not match the condition. Of course that is very inefficient when you are joining three tables (I didn't count the outer join).
Assuming that all columns called id are primary keys, you could rewrite the query as follows:
SELECT count(*) FROM
(SELECT _pcn.id
FROM patientchartnote _pcn
INNER JOIN appointment _appt
ON _appt.id = _pcn.appointment_id
INNER JOIN patient _pt
ON _pt.id = _appt.patient_id
LEFT OUTER JOIN person _ps
ON _ps.id = _pt.appuser_id
WHERE _pcn.active = true
AND _pt.active = true
AND _appt.datecomplete IS NULL
AND _pcn.title IS NOT NULL
AND _pcn.title <> ''
AND _pt.assigned_to_user_id = '136964'
UNION
SELECT _pcn.id
FROM patientchartnote _pcn
INNER JOIN appointment _appt
ON _appt.id = _pcn.appointment_id
INNER JOIN patient _pt
ON _pt.id = _appt.patient_id
LEFT OUTER JOIN person _ps
ON _ps.id = _pt.appuser_id
WHERE _pcn.active = true
AND _pt.active = true
AND _appt.datecomplete IS NULL
AND _pcn.title IS NOT NULL
AND _pcn.title <> ''
AND _pcn.createdby_id = '136964'
) q;
Even though this is running the query twice, indexes can be used to filter out all but a few rows early on, so this query should perform better.

How To speed up postgresql self join query

I am using join on the same table to get 50 rows at a time, but it takes 20 secs to get back 50 rows.
Select Distinct ON (S1.service) S1.account,S1.stid,S1.receiver,S1.identifier,
S1.binfo,S2.dlr_mask,S2.time,S1.msgdata,S1.msg_cost
from sql_sent_sms S1
left join sql_sent_sms S2
on S1.service = S2.service
where S2.time Between :senttime and :endtime
and S1.userid=:userid
Order By S1.service");
I have created index on time , service ,userid parameter.
But it didnt helped much.
what else should i do to speed up .
This is the result of analysing the query
EXPLAIN ANALYZE Select Distinct ON (S1.service) S1.account,S1.stid,S1.receiver,S1.identifier,S1.binfo,S2.dlr_mask,S2.time,S1.msgdata,S1.msg_cost
from sql_sent_sms S1
left join sql_sent_sms S2
on S1.service = S2.service
where S2.time Between '1459759193' and '1459849193'
and S1.userid='10412144'
Order By S1.service , S2.dlr_mask asc limit 50;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=56635220.85..56938614.70 rows=50 width=308) (actual time=1.177..1.177 rows=0 loops=1)
-> Unique (cost=56635220.85..57035700.74 rows=66 width=308) (actual time=1.176..1.176 rows=0 loops=1)
-> Sort (cost=56635220.85..56835460.79 rows=80095977 width=308) (actual time=1.176..1.176 rows=0 loops=1)
Sort Key: s1.service, s2.dlr_mask
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=1.24..127912.78 rows=80095977 width=308) (actual time=1.166..1.166 rows=0 loops=1)
-> Index Scan using idx_sentmsgbydate on sql_sent_sms s1 (cost=0.43..21247.62 rows=5765 width=292) (actual time=0.035..0.091 rows=34 loops=1)
Index Cond: ((userid)::text = '10412144'::text)
-> Index Only Scan using idx_sentmsgandstauts on sql_sent_sms s2 (cost=0.81..18.45 rows=5 width=126) (actual time=0.030..0.030 rows=0 loops=34)
Index Cond: ((service = (s1.service)::text) AND ("time" >= 1459759193::bigint) AND ("time" <= 1459849193::bigint))
Heap Fetches: 0
Planning time: 2.471 ms
Execution time: 1.269 ms
(13 rows)