I have two tables, t1 and t2, each with a geography column called pts_geog and a column id that identifies the unit. I want to add a column to t1 that counts how many units from t2 lie within 1000 m of each point in t1. Both tables are reasonably large, with about 150,000 rows each. Computing the distance from every point in t1 to every point in t2 is very expensive, so I am looking for guidance on whether my approach has any hope. I have never been able to complete this operation because it runs out of memory. I could split the operation somehow (with a WHERE on another dimension of t1), but I need more help. Here is the SELECT that I would like to use:
select
count(nullif(
ST_DWithin(
g1.pts_geog,
g2.pts_geog,
1000,
false),
false)) as close_1000
from
t1 as g1,
t2 as g2
where
g1.pts_geog IS NOT NULL
and
g2.pts_geog IS NOT NULL
GROUP BY g1.id
Suggested answer and its EXPLAIN ANALYZE output:
airbnb=> EXPLAIN ANALYZE
airbnb-> SELECT t1.listing_id, count(*)
airbnb-> FROM paris as t1
airbnb-> JOIN airdna_property as t2
airbnb-> ON ST_DWithin( t1.pts_geog, t2.pts_geog,1000 )
airbnb-> WHERE t2.city='Paris'
airbnb-> group by t1.listing_id;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1030317.33..1030386.39 rows=6906 width=8) (actual time=2802071.616..2802084.109 rows=54400 loops=1)
Group Key: t1.listing_id
-> Nested Loop (cost=0.41..1030282.80 rows=6906 width=8) (actual time=0.827..2604319.421 rows=785571807 loops=1)
-> Seq Scan on airdna_property t2 (cost=0.00..74893.44 rows=141004 width=56) (actual time=0.131..738.133 rows=141506 loops=1)
Filter: (city = 'Paris'::text)
Rows Removed by Filter: 400052
-> Index Scan using paris_pts_geog_idx on paris t1 (cost=0.41..6.77 rows=1 width=64) (actual time=0.133..17.865 rows=5552 loops=141506)
Index Cond: (pts_geog && _st_expand(t2.pts_geog, '1000'::double precision))
Filter: ((t2.pts_geog && _st_expand(pts_geog, '1000'::double precision)) AND _st_dwithin(pts_geog, t2.pts_geog, '1000'::double precision, true))
Rows Removed by Filter: 3260
Planning time: 0.197 ms
Execution time: 2802086.005 ms
output of version:
version | postgis_version
----------------------------------------------------------------------------------------------------------+---------------------------------------
PostgreSQL 9.5.7 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005, 64-bit | 2.2 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
Update 2
This is after creating the indexes as suggested. Notice that the number of rows increased slightly because I added new data, but the problem is still the same size. It takes 52 minutes. The plan still shows a Seq Scan filtering on city, and I don't understand why it doesn't do an index scan there, given that I created an index.
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=904989.83..905049.21 rows=5938 width=8) (actual time=3118569.759..3118581.444 rows=54400 loops=1)
Group Key: t1.listing_id
-> Nested Loop (cost=0.41..904960.14 rows=5938 width=8) (actual time=2.624..2881694.755 rows=837837851 loops=1)
-> Seq Scan on airdna_property t2 (cost=0.00..74842.84 rows=121245 width=56) (actual time=2.263..949.073 rows=151018 loops=1)
Filter: (city = 'Paris'::text)
Rows Removed by Filter: 435564
-> Index Scan using paris_pts_geog_idx on paris t1 (cost=0.41..6.84 rows=1 width=64) (actual time=0.139..18.555 rows=5548 loops=151018)
Index Cond: (pts_geog && _st_expand(t2.pts_geog, '1000'::double precision))
Filter: ((t2.pts_geog && _st_expand(pts_geog, '1000'::double precision)) AND _st_dwithin(pts_geog, t2.pts_geog, '1000'::double precision, true))
Rows Removed by Filter: 3257
Planning time: 0.377 ms
Execution time: 3118583.203 ms
(12 rows)
All you're doing is selecting the count, so just move the condition out of the select list and into the join clause to trim down the join.
SELECT t1.id, count(*)
FROM t1
JOIN t2
ON ST_DWithin( t1.pts_geog, t2.pts_geog, 1000 )
GROUP BY t1.id;
If you need an index that ST_DWithin can use, run this:
CREATE INDEX ON t1 USING gist (pts_geog);
CREATE INDEX ON t2 USING gist (pts_geog);
VACUUM ANALYZE t1;
VACUUM ANALYZE t2;
Now run the SELECT query above.
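The question also mentions storing the count as a column on t1 and splitting the work into pieces. A minimal sketch of one way to do both, assuming t1.id is an integer; the column name close_1000 is borrowed from the alias in the question and the batch bounds are purely illustrative. A correlated subquery counts the neighbours of each t1 row, and a WHERE on id restricts each run to one batch. Unlike the inner join, this also yields 0 for points that have no neighbour within 1000 m.

```sql
-- Assumption: close_1000 does not exist yet; the name comes from the alias in the question.
ALTER TABLE t1 ADD COLUMN close_1000 bigint;

-- Fill one batch of t1 rows, then repeat with the next id range.
-- The bounds 1 and 10000 are illustrative only.
UPDATE t1
SET close_1000 = (
    SELECT count(*)
    FROM t2
    WHERE ST_DWithin(t1.pts_geog, t2.pts_geog, 1000)
)
WHERE t1.id BETWEEN 1 AND 10000
  AND t1.pts_geog IS NOT NULL;
```

Each batch is its own statement, so the work can be spread over several runs instead of one large join.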
Update 2
Your plan shows a seq scan filtering on city, so create an index on city and then we'll see what more we can do.
CREATE INDEX ON airdna_property (city);
ANALYZE airdna_property;
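Whether the planner uses the new index on city depends on how selective the filter is. In the second plan in the question the filter keeps 151018 rows and removes 435564, so roughly a quarter of airdna_property matches 'Paris', and a sequential scan can be the cheaper choice for that many rows. A quick way to check the share, as a sketch (the FILTER clause needs PostgreSQL 9.4 or later):

```sql
-- Share of rows that pass the city filter.
SELECT count(*) FILTER (WHERE city = 'Paris') AS paris_rows,
       count(*)                               AS total_rows,
       round(100.0 * count(*) FILTER (WHERE city = 'Paris') / count(*), 1) AS paris_pct
FROM airdna_property;
```

The same plan also shows the sequential scan finishing in under a second, while the nested loop produces more than 800 million joined rows, so most of the runtime lies in the join rather than in the city filter.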
When joining on a table and then limiting the result (LIMIT 30, for instance), Postgres applies the JOIN to all rows, even if the columns from the joined rows are only used in the returned columns and not in a filtering predicate.
This would be understandable for an INNER JOIN (PG has to know if the row will be returned or not) or for a LEFT JOIN without a unique constraint (PG has to know if more than one row will be returned or not), but for a LEFT JOIN on a UNIQUE column, this seems wasteful: if the query matches 10k rows, then 10k joins will be performed, and then only 30 will be returned.
It would seem more efficient to "delay", or defer, the join, as much as possible, and this is something that I've seen happen on some other queries.
Splitting this into a subquery (SELECT * FROM (SELECT * FROM main WHERE x LIMIT 30) LEFT JOIN secondary) works, by ensuring that only 30 items are returned from the main table before joining them, but it feels like I'm missing something, and the "standard" form of the query should also apply the same optimization.
Looking at the EXPLAIN plans, however, I can see that the number of rows joined is always the total number of rows, without "early bailing out" as you could see when, for instance, running a Seq Scan with a LIMIT 5.
Example schema, with a main table and a secondary one: secondary columns will only be returned, never filtered on.
drop table if exists secondary;
drop table if exists main;
create table main(id int primary key not null, main_column int);
create index main_column on main(main_column);
insert into main(id, main_column) SELECT i, i % 3000 from generate_series( 1, 1000000, 1) i;
create table secondary(id serial primary key not null, main_id int references main(id) not null, secondary_column int);
create unique index secondary_main_id on secondary(main_id);
insert into secondary(main_id, secondary_column) SELECT i, (i + 17) % 113 from generate_series( 1, 1000000, 1) i;
analyze main;
analyze secondary;
Example query:
explain analyze verbose select main.id, main_column, secondary_column
from main
left join secondary on main.id = secondary.main_id
where main_column = 5
order by main.id
limit 50;
This is the most "obvious" way of writing the query; it takes around 5 ms on average on my computer.
Explain:
Limit (cost=3742.93..3743.05 rows=50 width=12) (actual time=5.010..5.322 rows=50 loops=1)
Output: main.id, main.main_column, secondary.secondary_column
-> Sort (cost=3742.93..3743.76 rows=332 width=12) (actual time=5.006..5.094 rows=50 loops=1)
Output: main.id, main.main_column, secondary.secondary_column
Sort Key: main.id
Sort Method: top-N heapsort Memory: 27kB
-> Nested Loop Left Join (cost=11.42..3731.90 rows=332 width=12) (actual time=0.123..4.446 rows=334 loops=1)
Output: main.id, main.main_column, secondary.secondary_column
Inner Unique: true
-> Bitmap Heap Scan on public.main (cost=11.00..1036.99 rows=332 width=8) (actual time=0.106..1.021 rows=334 loops=1)
Output: main.id, main.main_column
Recheck Cond: (main.main_column = 5)
Heap Blocks: exact=334
-> Bitmap Index Scan on main_column (cost=0.00..10.92 rows=332 width=0) (actual time=0.056..0.057 rows=334 loops=1)
Index Cond: (main.main_column = 5)
-> Index Scan using secondary_main_id on public.secondary (cost=0.42..8.12 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=334)
Output: secondary.id, secondary.main_id, secondary.secondary_column
Index Cond: (secondary.main_id = main.id)
Planning Time: 0.761 ms
Execution Time: 5.423 ms
explain analyze verbose select m.id, main_column, secondary_column
from (
select main.id, main_column
from main
where main_column = 5
order by main.id
limit 50
) m
left join secondary on m.id = secondary.main_id
where main_column = 5
order by m.id
limit 50
This returns the same results, in 2ms.
The total EXPLAIN cost is also about three times lower (1057 versus 3743), in line with the performance gain we're seeing.
Limit (cost=1048.44..1057.21 rows=1 width=12) (actual time=1.219..2.027 rows=50 loops=1)
Output: m.id, m.main_column, secondary.secondary_column
-> Nested Loop Left Join (cost=1048.44..1057.21 rows=1 width=12) (actual time=1.216..1.900 rows=50 loops=1)
Output: m.id, m.main_column, secondary.secondary_column
Inner Unique: true
-> Subquery Scan on m (cost=1048.02..1048.77 rows=1 width=8) (actual time=1.201..1.515 rows=50 loops=1)
Output: m.id, m.main_column
Filter: (m.main_column = 5)
-> Limit (cost=1048.02..1048.14 rows=50 width=8) (actual time=1.196..1.384 rows=50 loops=1)
Output: main.id, main.main_column
-> Sort (cost=1048.02..1048.85 rows=332 width=8) (actual time=1.194..1.260 rows=50 loops=1)
Output: main.id, main.main_column
Sort Key: main.id
Sort Method: top-N heapsort Memory: 27kB
-> Bitmap Heap Scan on public.main (cost=11.00..1036.99 rows=332 width=8) (actual time=0.054..0.753 rows=334 loops=1)
Output: main.id, main.main_column
Recheck Cond: (main.main_column = 5)
Heap Blocks: exact=334
-> Bitmap Index Scan on main_column (cost=0.00..10.92 rows=332 width=0) (actual time=0.029..0.030 rows=334 loops=1)
Index Cond: (main.main_column = 5)
-> Index Scan using secondary_main_id on public.secondary (cost=0.42..8.44 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=50)
Output: secondary.id, secondary.main_id, secondary.secondary_column
Index Cond: (secondary.main_id = m.id)
Planning Time: 0.161 ms
Execution Time: 2.115 ms
This is a toy dataset here, but on a real DB, the IO difference is significant (no need to fetch 1000 rows when 30 are enough), and the timing difference also quickly adds up (up to an order of magnitude slower).
So my question: is there any way to get the planner to understand that the JOIN can be applied much later in the process?
It seems like something that could be applied automatically to gain a sizeable performance boost.
Deferred joins are good. It's usually helpful to run the limit operation on a subquery that yields only the id values. That way the ORDER BY ... LIMIT operation sorts less data before discarding most of it.
select main.id, main.main_column, secondary.secondary_column
from main
join (
select id
from main
where main_column = 5
order by id
limit 50
) selection on main.id = selection.id
left join secondary on main.id = secondary.main_id
order by main.id
limit 50
It's also possible adding id to your main_column index will help. With a BTREE index the query planner knows it can get the id values in ascending order from the index, so it may be able to skip the sort step entirely and just scan the first 50 values.
-- the name main_column is already taken by the existing single-column index, so use a new name
create index main_column_id on main(main_column, id);
Edit: In a large table, the heavy lifting of your query will be the selection of the 50 main.id values to process. To get those 50 id values as cheaply as possible, you can combine a scan of the covering index I proposed with the subquery I proposed. Once you've got your 50 id values, looking up 50 rows' worth of details from your various tables by main.id and secondary.main_id is trivial: you have the correct indexes in place, and with so few rows it won't take much time.
It looks like your table sizes are too small for various optimizations to have much effect, though. Query plans change a lot when tables are larger.
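Putting the two suggestions together, a sketch of how the composite index and the deferred join could be checked side by side. The index name main_column_id is hypothetical, and whether the planner actually skips the sort depends on table size and statistics, so the plan needs to be inspected rather than assumed.

```sql
-- Hypothetical index name; IF NOT EXISTS needs PostgreSQL 9.5 or later.
CREATE INDEX IF NOT EXISTS main_column_id ON main (main_column, id);
ANALYZE main;

-- Limit first, join afterwards: only the 50 selected rows reach the join.
EXPLAIN (ANALYZE, BUFFERS)
SELECT m.id, m.main_column, s.secondary_column
FROM (
    SELECT id, main_column
    FROM main
    WHERE main_column = 5
    ORDER BY id
    LIMIT 50
) AS m
LEFT JOIN secondary AS s ON s.main_id = m.id
ORDER BY m.id;
```

If the index helps, the inner part of the plan should show an index scan feeding the Limit directly, without a separate Sort node.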
Alternative query, using row_number() instead of LIMIT (I think you could even omit LIMIT here):
-- prepare q3 AS
select m.id, main_column, secondary_column
from (
select id, main_column
, row_number() OVER (ORDER BY id, main_column) AS rn
from main
where main_column = 5
) m
left join secondary on m.id = secondary.main_id
WHERE m.rn <= 50
ORDER BY m.id
LIMIT 50
;
Putting the subsetting into a CTE can prevent it from being merged into the main query:
PREPARE q6 AS
WITH
-- MATERIALIZED -- not needed before version 12
xxx AS (
SELECT DISTINCT x.id
FROM main x
WHERE x.main_column = 5
ORDER BY x.id
LIMIT 50
)
select m.id, m.main_column, s.secondary_column
from main m
left join secondary s on m.id = s.main_id
WHERE EXISTS (
SELECT *
FROM xxx x WHERE x.id = m.id
)
order by m.id
-- limit 50
;
I'm having trouble with the performance of a "quite simple" request:
DB schema:
CREATE TABLE bigdata3.data_1_2021
(
p_value float8 NOT NULL,
p_timestamp tsrange NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_data_1_2021_ts ON bigdata3.data_1_2021 USING gist (p_timestamp);
CREATE INDEX IF NOT EXISTS idx_data_1_2021_ts2 ON bigdata3.data_1_2021 USING btree (p_timestamp);
FYI, I'm using the btree_gist extension:
CREATE EXTENSION IF NOT EXISTS btree_gist;
Also, there are 19037 rows in my table. So now, the request:
WITH data_1 AS
(
SELECT t1.p_value AS value,
t1.p_timestamp AS TS
FROM "bigdata3".data_1_2021 AS t1
WHERE TSRANGE( '2021-02-01 00:00:00.000'::TIMESTAMP,'2021-02-17 09:51:54.000'::TIMESTAMP) && t1.p_timestamp
)
SELECT t1.ts AS ts,
t2.ts AS ts,
t1.value,
t2.value
FROM data_1 as t1
INNER JOIN data_1 as t2 ON t1.ts && t2.ts
This request takes 1 minute.
When I run EXPLAIN, many things seem strange to me:
QUERY PLAN
Nested Loop (cost=508.96..8108195.71 rows=1801582 width=80)
Join Filter: (t1.ts && t2.ts)
CTE data_1
-> Seq Scan on data_1_2021 t1_1 (cost=0.00..508.96 rows=18982 width=29)
Filter: ('["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange && p_timestamp)
-> CTE Scan on data_1 t1 (cost=0.00..379.64 rows=18982 width=40)
-> CTE Scan on data_1 t2 (cost=0.00..379.64 rows=18982 width=40)
1) I expect the sequence scan on the ts range to use the "idx_data_1_2021_ts" index
2) I expect the join to use the very same index for a hash or merge join
The stranger thing comes now:
WITH data_1 AS
(
SELECT t1.p_value AS value,
t1.p_timestamp AS TS
FROM "bigdata3".data_1_2021 AS t1
WHERE TSRANGE( '2021-02-01 00:00:00.000'::TIMESTAMP,'2021-02-17 09:51:54.000'::TIMESTAMP) && t1.p_timestamp
),
data_2 AS
(
SELECT t1.p_value AS value,
t1.p_timestamp AS TS
FROM "bigdata3".data_1_2021 AS t1
WHERE TSRANGE( '2021-02-01 00:00:00.000'::TIMESTAMP,'2021-02-17 09:51:54.000'::TIMESTAMP) && t1.p_timestamp
)
SELECT t1.ts AS ts,
t2.ts AS ts,
t1.value,
t2.value
FROM data_1 as t1
INNER JOIN data_2 as t2 ON t1.ts && t2.ts
I only duplicated data_1 as data_2 and changed the join to join data_1 with data_2:
Nested Loop (cost=0.28..116154.41 rows=1801582 width=58)
-> Seq Scan on data_1_2021 t1 (cost=0.00..508.96 rows=18982 width=29)
Filter: ('["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange && p_timestamp)
-> Index Scan using idx_data_1_2021_ts on data_1_2021 t1_1 (cost=0.28..4.19 rows=190 width=29)
Index Cond: ((p_timestamp && t1.p_timestamp) AND (p_timestamp && '["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange))
The request takes 1 second and now uses the index!
But ... it's still not perfect because of the seq scan and the nested loop.
Another piece of info: switching to = operator on the join makes the first case faster, but the second case slower ...
Does anybody have an explanation for why the index is not properly used when joining the very same table? I would also welcome any advice on making this request faster.
Many thanks,
Clément
PS: I know this request can look stupid, I made my real case simple to point out my issue.
Edit 1: As requested, the analyze+buffer explain of the first request:
QUERY PLAN
Nested Loop (cost=509.04..8122335.52 rows=1802721 width=40) (actual time=0.025..216996.205 rows=19680 loops=1)
Join Filter: (t1.ts && t2.ts)
Rows Removed by Join Filter: 359841220
Buffers: shared hit=271
CTE data_1
-> Seq Scan on data_1_2021 t1_1 (cost=0.00..509.04 rows=18988 width=29) (actual time=0.013..38.263 rows=18970 loops=1)
Filter: ('["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange && p_timestamp)
Rows Removed by Filter: 73
Buffers: shared hit=271
-> CTE Scan on data_1 t1 (cost=0.00..379.76 rows=18988 width=40) (actual time=0.016..8.083 rows=18970 loops=1)
Buffers: shared hit=1
-> CTE Scan on data_1 t2 (cost=0.00..379.76 rows=18988 width=40) (actual time=0.000..4.723 rows=18970 loops=18970)
Buffers: shared hit=270
Planning Time: 0.176 ms
Execution Time: 217208.300 ms
And the second:
QUERY PLAN
Nested Loop (cost=0.28..116190.34 rows=1802721 width=58) (actual time=280.133..817.611 rows=19680 loops=1)
Buffers: shared hit=76361
-> Seq Scan on data_1_2021 t1 (cost=0.00..509.04 rows=18988 width=29) (actual time=0.030..7.909 rows=18970 loops=1)
Filter: ('["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange && p_timestamp)
Rows Removed by Filter: 73
Buffers: shared hit=271
-> Index Scan using idx_data_1_2021_ts on data_1_2021 t1_1 (cost=0.28..4.19 rows=190 width=29) (actual time=0.041..0.042 rows=1 loops=18970)
Index Cond: ((p_timestamp && t1.p_timestamp) AND (p_timestamp && '["2021-02-01 00:00:00","2021-02-17 09:51:54")'::tsrange))
Buffers: shared hit=76090
Planning Time: 709.820 ms
Execution Time: 981.659 ms
There are too many questions here, I'll answer the first two:
The index is not used, because the query fetches almost all the rows from the table anyway.
Hash or merge joins can only be used with join conditions that use the = operator. This is quite obvious: a hash can only be probed for equality, and a merge join requires sorting and total order.
Because your CTE is referenced twice in the query, the planner automatically materializes it. Once materialized, it can't use the index on the underlying table anymore. (That is, it can't use it for the highly selective condition t1.ts && t2.ts. It could still use it for the "first half of February" condition, as that is applied before the materialization, but since that condition is so non-selective, it chooses not to.)
You can force it not to materialize it:
WITH data_1 AS NOT MATERIALIZED (...
In my hands, doing this produces the same execution plan as writing two separate CTEs, each of which is referenced only once.
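Spelled out against the query from the question, the NOT MATERIALIZED variant might look like the sketch below. It requires PostgreSQL 12 or later; the output aliases ts1, ts2, value1 and value2 are my own additions to keep the column names distinct.

```sql
WITH data_1 AS NOT MATERIALIZED
(
    SELECT t.p_value     AS value,
           t.p_timestamp AS ts
    FROM bigdata3.data_1_2021 AS t
    WHERE tsrange('2021-02-01 00:00:00'::timestamp,
                  '2021-02-17 09:51:54'::timestamp) && t.p_timestamp
)
SELECT t1.ts    AS ts1,
       t2.ts    AS ts2,
       t1.value AS value1,
       t2.value AS value2
FROM data_1 AS t1
INNER JOIN data_1 AS t2 ON t1.ts && t2.ts;
```

Because the CTE is inlined at both references, the join condition is applied to data_1_2021 itself, which is what allows the GiST index on p_timestamp to be considered for the inner side.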
I am having a really slow query (~100mins). I have omitted a lot of the inner child nodes by denoting it with a suffix ...
HashAggregate (cost=6449635645.84..6449635742.59 rows=1290 width=112) (actual time=5853093.882..5853095.159 rows=785 loops=1)
Group Key: p.processid
-> Nested Loop (cost=10851145.36..6449523319.09 rows=832050 width=112) (actual time=166573.289..5853043.076 rows=3904 loops=1)
Join Filter: (SubPlan 2)
Rows Removed by Join Filter: 617040
-> Merge Left Join (cost=5425572.68..5439530.95 rows=1290 width=799) (actual time=80092.782..80114.828 rows=788 loops=1) ...
-> Materialize (cost=5425572.68..5439550.30 rows=1290 width=112) (actual time=109.689..109.934 rows=788 loops=788) ...
SubPlan 2
-> Limit (cost=3869.12..3869.13 rows=5 width=8) (actual time=9.155..9.156 rows=5 loops=620944) ...
Planning time: 1796.764 ms
Execution time: 5853316.418 ms
(2836 rows)
The plan above is for a query executed against the view; the (simplified) schema is below:
create or replace view foo_bar_view(processid, column_1, run_count) as
SELECT
q.processid,
q.column_1,
q.run_count
FROM
(
SELECT
r.processid,
avg(h.some_column) AS column_1,
-- many more aggregate function on many more columns
count(1) AS run_count
FROM
foo_bar_table r,
foo_bar_table h
WHERE (h.processid IN (SELECT p.processid
FROM process p
LEFT JOIN bar i ON p.barid = i.id
LEFT JOIN foo ii ON i.fooid = ii.fooid
JOIN foofoobar pt ON p.typeid = pt.typeid AND pt.displayname ~~
((SELECT ('%'::text || property.value) || '%'::text
FROM property
WHERE property.name = 'something'::text))
WHERE p.processid < r.processid
AND (ii.name = r.foo_name OR ii.name IS NULL AND r.foo_name IS NULL)
ORDER BY p.processid DESC
LIMIT 5))
GROUP BY r.processid
) q;
I would just like to understand: does this mean that most of the time is spent performing the GROUP BY processid?
If not, what is causing the issue? I can't think of a reason why this query is so slow.
The aggregate functions used are avg, min, max, stddev.
A total of 52 of them were used, 4 on each of the 13 columns.
Update: Expanding on the child node of SubPlan 2. We can see that the Bitmap Index Scan on process_pkey part is the bottleneck.
-> Bitmap Heap Scan on process p_30 (cost=1825.89..3786.00 rows=715 width=24) (actual time=8.642..8.833 rows=394 loops=620944)
Recheck Cond: ((typeid = pt_30.typeid) AND (processid < p.processid))
Heap Blocks: exact=185476288
-> BitmapAnd (cost=1825.89..1825.89 rows=715 width=0) (actual time=8.611..8.611 rows=0 loops=620944)
-> Bitmap Index Scan on ix_process_typeid (cost=0.00..40.50 rows=2144 width=0) (actual time=0.077..0.077 rows=788 loops=620944)
Index Cond: (typeid = pt_30.typeid)
-> Bitmap Index Scan on process_pkey (cost=0.00..1761.20 rows=95037 width=0) (actual time=8.481..8.481 rows=145093 loops=620944)
Index Cond: (processid < p.processid)
What I am unable to figure out is why it is using a Bitmap Index Scan and not an Index Scan. From what I can see, only 788 rows need to be compared. Wouldn't that be faster? If not, how can I optimise this query?
processid is of bigint type and has an index.
The complete execution plan is here.
You conveniently left out the names of the tables in the execution plan, but I assume that the nested loop join is between foo_bar_table r and foo_bar_table h, and the subplan is the IN condition.
The high execution time is caused by the subplan, which is executed for each potential join result, that is 788 * 788 = 620944 times. 620944 * 9.156 accounts for 5685363 milliseconds.
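The arithmetic can be reproduced directly from the numbers in the plan (788 rows on each side of the nested loop, 9.156 ms per subplan execution, 5853316.418 ms total execution time):

```sql
SELECT 788 * 788                AS subplan_executions,  -- 620944
       round(788 * 788 * 9.156) AS subplan_ms,          -- 5685363
       round(100.0 * 788 * 788 * 9.156 / 5853316.418, 1) AS pct_of_total;  -- about 97
```

So the subplan alone accounts for nearly all of the reported execution time, which is why the GROUP BY is not the place to look.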
Create this index:
CREATE INDEX ON process (typeid, processid, installationid);
And run VACUUM:
VACUUM process;
That should give you a fast index-only scan.
My table1 has 25,000+ rows and table2 has only 1 row. Both of them have almost 30 columns. I need to add all the columns in table2 (which has only one row) to the columns in table1 so I can do further calculations. One way to do it is
select * from table1 cross join table2
It gives the desired results but the performance is not good.
I am wondering if there is a better or faster way to get the combined table. I am using PostgreSQL.
Here is the output from
explain analyze select * from table1 cross join table2
Nested Loop (cost=0.00..195264.90 rows=15533650 width=336) (actual time=0.013..46.189 rows=25465 loops=1)
-> Seq Scan on table1 (cost=0.00..1076.65 rows=25465 width=232) (actual time=0.007..6.912 rows=25465 loops=1)
-> Materialize (cost=0.00..19.15 rows=610 width=104) (actual time=0.000..0.000 rows=1 loops=25465)
-> Seq Scan on table2 (cost=0.00..16.10 rows=610 width=104) (actual time=0.001..0.002 rows=1 loops=1)
Planning time: 0.153 ms
Execution time: 50.868 ms
Thanks.
I am using PostgreSQL 9.1.
Here is the query:
select a.id,b.flag
from a, b
where b.starting_date>='2002-01-01'::date
and a.zip_code= b.zip_code
and abs( a.date1-b.date1 ) <=60
and abs( a.balance-b.balance ) <=1000
and abs( a.value-b.value ) <=3000
Tables a and b have every single field indexed with its own single-column btree index (no multi-column indexes).
Here is explain analyze verbose results:
Merge Join (cost=0.00..448431434.24 rows=440104742 width=16) (actual time=384.268..15151912.652 rows=672144 loops=1)
Output: a.id, b.flag
Merge Cond: (a.zip_code = b.zip_code)
Join Filter: ((abs((a.date1 - b.date1)) <= 60) AND (abs((a.balance - b.balance)) <= 1000) AND (abs((a.value - b.value)) <= 3000))
-> Index Scan using indx_a_zip_code on a (cost=0.00..950851.26 rows=6800857 width=32) (actual time=0.028..22292.274 rows=2080440 loops=1)
Output: a.id, a.zip_code, a.date1, a.balance, a.value
-> Materialize (cost=0.00..1906889.40 rows=19744024 width=28) (actual time=0.032..6148075.701 rows=6472114362 loops=1)
Output: b.balance, b.date1, b.zip_code, b.starting_date, b.value,
-> Index Scan using indx_zip_code on b (cost=0.00..1857529.34 rows=19744024 width=28) (actual time=0.025..76893.104 rows=19078422 loops=1)
Output:b.balance, b.date1, b.zip_code, b.starting_date
Filter: (b.starting_date >= '2002-01-01'::date)
Total runtime: 15155983.643 ms
It appears to me that the query planner is using the index for zip_code but not for the other fields. I have 8 million rows in table a and 20 million rows in table b, and the query takes 3 hours to finish. (The machine has 64 GB of RAM, and all critical PostgreSQL server settings have been tuned according to the page below.)
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
Any advice/suggestion is appreciated. Thanks a million!
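One thing the plan makes visible: conditions of the form abs(a.x - b.x) <= k are expressions over both tables, so a plain btree index on b.x cannot serve them and they end up in the Join Filter. A sketch of an equivalent formulation with explicit ranges, assuming date1, balance and value are date or numeric columns (so that adding and subtracting a constant is valid); the index name is hypothetical, and whether the planner switches to an index-driven nested loop depends on the data and statistics.

```sql
-- Hypothetical index name; one multicolumn index instead of separate single-column ones.
CREATE INDEX b_zip_date1_idx ON b (zip_code, date1);

-- abs(a.x - b.x) <= k is rewritten as b.x BETWEEN a.x - k AND a.x + k,
-- which is a range condition on b that a btree index can use.
SELECT a.id, b.flag
FROM a
JOIN b
  ON  b.zip_code = a.zip_code
  AND b.date1   BETWEEN a.date1 - 60     AND a.date1 + 60
  AND b.balance BETWEEN a.balance - 1000 AND a.balance + 1000
  AND b.value   BETWEEN a.value - 3000   AND a.value + 3000
WHERE b.starting_date >= '2002-01-01'::date;
```

With the range form, each row of a can in principle probe the index for its zip_code and a 120-day window of date1, instead of being compared against every b row with the same zip_code.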