Postgres FDW: remote-only JOIN causes all data to be fetched - postgresql

I'm trying to fetch data from a remote server with a JOIN that involves only remote tables, but it is very slow because the planner decides to fetch all the data from both tables and join them locally.
When I add a WHERE clause the problem goes away and the JOIN is executed entirely on the remote server.
The problem is reproducible with a small example:
-- remote server
create table test
(
    id serial
        constraint test_pk
            primary key,
    name text
);
create table test2
(
    test_id int
        constraint test2_test_id_fk
            references test (id),
    info text
);
SELECT Query:
SELECT "test".id FROM "test" JOIN "test2" ON "test"."id"="test2".test_id;
Output from EXPLAIN VERBOSE (on empty tables!):
Merge Join  (cost=732.29..1388.59 rows=42778 width=4)
  Output: test.id
  Merge Cond: (test.id = test2.test_id)
  ->  Sort  (cost=366.15..373.46 rows=2925 width=4)
        Output: test.id
        Sort Key: test.id
        ->  Foreign Scan on public.test  (cost=100.00..197.75 rows=2925 width=4)
              Output: test.id
              Remote SQL: SELECT id FROM public.test
  ->  Sort  (cost=366.15..373.46 rows=2925 width=4)
        Output: test2.test_id
        Sort Key: test2.test_id
        ->  Foreign Scan on public.test2  (cost=100.00..197.75 rows=2925 width=4)
              Output: test2.test_id
              Remote SQL: SELECT test_id FROM public.test2
After adding WHERE test.id = 1:
Foreign Scan  (cost=100.00..198.75 rows=225 width=4)
  Output: test.id
  Relations: (public.test) INNER JOIN (public.test2)
  Remote SQL: SELECT r1.id FROM (public.test r1 INNER JOIN public.test2 r2 ON (((r2.test_id = 1)) AND ((r1.id = 1))))
I'm using AWS RDS Postgres v10.18 on both sides.
What is going on? How can I force execution on the remote server?
I couldn't find anything about this problem.
Thanks for any help.

PostgreSQL has no idea how much data it will find in those tables, and its completely arbitrary guess is not very good.
You can help it out by doing this (here fdw is the name of your foreign server):
alter server fdw options (add use_remote_estimate 'on');
Planning will take substantially longer because it needs to make multiple roundtrips to the foreign server to do the planning, but IME that is usually well worth it.
You can instead ANALYZE the foreign tables on the local side, so that it stores the stats locally. Planning time should not suffer as much as with use_remote_estimate. You would need to repeat occasionally, as they will not be recomputed automatically. I've had poor experiences with this, but that was several releases ago so maybe it has improved.
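For the tables in your reproducer, that would look something like this (a minimal sketch, assuming the local foreign tables are also named test and test2):
ANALYZE test;
ANALYZE test2;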
Either one fixes your reproducer case for me.

PostgreSQL estimates that the join result will consist of 42778 rows, so it thinks it is more efficient to join the tables locally rather than transfer the big result set.
If that estimate is not correct, ANALYZE both foreign tables to get accurate statistics, then try again. Remember that foreign tables are not analyzed automatically.
In general, when asking performance questions, always include EXPLAIN (ANALYZE, BUFFERS) output.

Related

Speed up PostgreSQL query that joins a small table with a big table

I am trying to find the fastest way to JOIN a small table with a big table in PostgreSQL.
Please have a look at the following minimal example that creates a small table with 5,000 rows and a big table with 3,000,000 rows:
-- Create small table
CREATE TABLE small_table (
    id INTEGER,
    text VARCHAR(100)
);
-- Create big table
CREATE TABLE big_table (
    id INTEGER
);
-- Insert random data into small table (5,000 rows)
INSERT INTO
    small_table (id, text)
SELECT
    generate_series(1, 5000) AS id,
    md5(random()::text) AS text;
-- Insert random data into big table (3,000,000 rows)
INSERT INTO
    big_table (id)
SELECT id FROM
(
    SELECT
        generate_series(1, 3000000),
        floor(random() * 5000) AS id
) random;
Now join the tables using the INTEGER ids to get the corresponding text from the small table for every entry in the big table:
-- Join small table with big table
SELECT big_table.id, small_table.text
FROM big_table
INNER JOIN small_table
ON big_table.id = small_table.id;
The JOIN takes between 2 and 4 seconds to run.
Here is the execution plan of the query:
Hash Inner Join  (cost=154.5..84679.5 rows=3000000 width=37) (rows=2999394 loops=1)
  Hash Cond: (big_table.id = small_table.id)
  ->  Seq Scan on public.big_table as big_table  (cost=0..43275 rows=3000000 width=4) (rows=3000000 loops=1)
  ->  Hash  (cost=92..92 rows=5000 width=37) (rows=5000 loops=1)
        Buckets: 8192  Batches: 1  Memory Usage: 416 kB
        ->  Seq Scan on public.small_table as small_table  (cost=0..92 rows=5000 width=37) (rows=5000 loops=1)
When I add the timings to EXPLAIN ANALYZE in PgAdmin, the query takes way longer. Here is the result including the timings:
Here are the statistics per node type:
And here are the statistics per relation:
As we can see:
A Hash Inner Join is used which according to this should already be the best strategy.
A Seq Scan is used for both tables. From my understanding that is okay because all rows are read anyway. Creating an index on id in both tables still leads to a seq scan.
Only 416 kB of memory are used so changing work_mem should not have an effect.
Please correct me if any of my assumptions and conclusions are wrong.
So my question is: What else can be done to speed up the query? Is it even possible or am I hitting the limits of PostgreSQL or an on-disc SQL database in general?
EDIT:
Here is the result of explain (analyze, buffers, verbose, timing). And here the same with a forced merge_join.
That is already running as efficiently as possible, and there is no possible improvement. The way your data are, you need all rows from both tables, so the sequential scans required by the hash join are the most efficient way to do it.
The only possible way to improve the speed is to make sure that the sequential scans don't need to access the disk. That is, you need to have enough RAM to keep both tables cached. Then PostgreSQL will effectively act as an in-memory database.
Processing may be slower initially, until all the data are in the cache. You can improve on that by pre-warming the cache: the pg_prewarm standard extension allows you to load both tables into the cache.
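A minimal sketch of that for the tables in the example (pg_prewarm's default mode loads the relation into shared buffers):
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('big_table');
SELECT pg_prewarm('small_table');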

Postgres does a Seq Scan instead of an Index Only Scan for a very simple SELECT x ... ORDER BY x

I have a table union_events with 17M rows and 102 columns in Postgres 10. I run the commands:
CREATE INDEX union_events_index ON temp_schema_to_delete.union_events(id)
ANALYZE temp_schema_to_delete.union_events
EXPLAIN SELECT id FROM temp_schema_to_delete.union_events ORDER BY id
and get the following results:
Sort  (cost=3614290.72..3658708.19 rows=17766988 width=4)
  Sort Key: id
  ->  Seq Scan on union_events  (cost=0.00..1474905.88 rows=17766988 width=4)
id is some non-null and non-unique integer field.
I expected my index to be used so that the table doesn't have to be sorted again.
I made a quick test:
SELECT s INTO temp_schema_to_delete.test FROM generate_series(0, 10000000) AS s
CREATE INDEX test_index ON temp_schema_to_delete.test(s)
ANALYZE temp_schema_to_delete.test
EXPLAIN SELECT s FROM temp_schema_to_delete.test ORDER BY s
It gets:
Index Only Scan using test_index on test (cost=0.43..303940.15 rows=10000048 width=4)
It seems okay.
What is wrong with my first table or query? Why is the index on id not used?
As @joop recommended, I ran VACUUM on the table.
It resulted in a better plan that uses my index.
It's still not clear to me how VACUUM could help, given that I created this table using SELECT ... INTO TABLE and haven't done any UPDATEs or DELETEs.
It would be great if somebody could explain this in another answer; maybe there is a more effective solution than VACUUM.

Postgres doing a sort on simple join

I have two tables in my database (address and person_address). address has a PK on address_id. person_address has a PK on (address_id, person_id, usage_code).
When joining these two tables on address_id, my expectation is that the PK index is used in both cases. However, Postgres is adding sort and materialize steps to the plan, which slows down the execution of the query. I have tried dropping indexes (person_address had an index on address_id) and analyzing stats, without success.
I would appreciate any help on how to isolate this situation, since those queries run slower than expected in our production environment.
This is the query:
select *
from person_addresses pa
join address a
on pa.address_id = a.address_id
This is the plan :
Merge Join  (cost=1506935.96..2416648.39 rows=16033774 width=338)
  Merge Cond: (pa.address_id = ((a.address_id)::numeric))
  ->  Index Scan using person_addresses_pkey on person_addresses pa  (cost=0.43..592822.76 rows=5256374 width=104)
  ->  Materialize  (cost=1506935.53..1526969.90 rows=4006874 width=234)
        ->  Sort  (cost=1506935.53..1516952.71 rows=4006874 width=234)
              Sort Key: ((a.address_id)::numeric)
              ->  Seq Scan on address a  (cost=0.00..163604.74 rows=4006874 width=234)
Thanks.
Edit 1: After the comment, I checked the data types and found a discrepancy. Fixing the data type changed the plan to the following:
Hash Join  (cost=343467.18..881125.47 rows=5256374 width=348)
  Hash Cond: (pa.address_id = a.address_id)
  ->  Seq Scan on person_addresses pa  (cost=0.00..147477.74 rows=5256374 width=104)
  ->  Hash  (cost=159113.97..159113.97 rows=4033697 width=244)
        ->  Seq Scan on address_normalization a  (cost=0.00..159113.97 rows=4033697 width=244)
The performance improvement is evident in the plan, but I am wondering whether the sequential scans are expected without any filters.
So there are two questions here:
Why did Postgres choose the (expensive) "Merge Join" in the first query?
The reason for this is that it could not use the more efficient "Hash Join" because the hash values of integer and numeric values would be different. But the Merge join requires that the values are sorted, and that's where the "Sort" step comes from in the first execution plan. Given the number of rows a "Nested Loop" would have been even more expensive.
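The data type fix the asker mentions could look something like this (a hypothetical sketch, assuming person_addresses.address_id was declared numeric while address.address_id is integer, which is what the cast in the first plan suggests):
ALTER TABLE person_addresses
    ALTER COLUMN address_id TYPE integer USING address_id::integer;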
The second question is:
I am wondering if the sequential scans are expected without any filters
Yes they are expected. The query retrieves all matching rows from both tables and that is done most efficiently by scanning all rows. An index scan requires about 2-3 I/O operations per row that has to be retrieved. A sequential scan usually requires less than one I/O operation as one block (which is the smallest unit the database reads from the disk) contains multiple rows.
You can run EXPLAIN (ANALYZE, BUFFERS) to see how many "logical reads" each step performs.
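For the query from the question, that would be, for example:
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM person_addresses pa
JOIN address a ON pa.address_id = a.address_id;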

Query simplification based on selected columns

I'm trying to understand how PostgreSQL simplifies a query: let's say I have 2 tables ("tb_thing" and "tb_thing_template"), where each thing points to a template, and that I run a query like this:
EXPLAIN SELECT
tb_thing.id
FROM
tb_thing,
tb_thing_template
WHERE
tb_thing_template.id = tb_thing.template_id
;
This is the result:
QUERY PLAN
---------------------------------------------------------------------------------
Hash Join  (cost=34.75..64.47 rows=788 width=4)
  Hash Cond: (tb_thing.template_id = tb_thing_template.id)
  ->  Seq Scan on tb_thing  (cost=0.00..18.88 rows=788 width=8)
  ->  Hash  (cost=21.00..21.00 rows=1100 width=4)
        ->  Seq Scan on tb_thing_template  (cost=0.00..21.00 rows=1100 width=4)
The planner is joining the two tables even though I'm only selecting one field from "tb_thing" and nothing from "tb_thing_template". I was hoping the planner was smart enough to figure out it didn't need to actually join the "tb_thing_template" table because I'm not selecting anything from it.
Why does it do the join anyway? Why isn't the column selection taken into account when the query is planned?
Thanks!
Semantically your query and a simple SELECT tb_thing.id FROM tb_thing are not the same.
Assume, for instance, that table tb_thing_template has 4 rows with an identical id value that is also a tb_thing.template_id. The result of your query will then have 4 rows with the same tb_thing.id. Conversely, if a tb_thing.template_id is not present in tb_thing_template.id then that row will not be output.
Only when tb_thing_template.id is a PRIMARY KEY (so unique) and tb_thing.template_id is a FOREIGN KEY to that id with just a single row for each PRIMARY KEY, so a 1:1 relationship, are both queries semantically the same. Even a 1:N relationship, which is more typical in a PK-FK relationship, would require the join in a semantic sense. But the planner has no way of knowing if the relationship is 1:1, so you get the join.
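A minimal illustration of the difference (assuming tb_thing has columns id and template_id, tb_thing_template has a plain non-unique id column, and both tables start out empty):
INSERT INTO tb_thing_template (id) VALUES (1), (1), (1), (1);
INSERT INTO tb_thing (id, template_id) VALUES (42, 1);
SELECT tb_thing.id FROM tb_thing;                   -- returns 1 row
SELECT tb_thing.id
FROM tb_thing, tb_thing_template
WHERE tb_thing_template.id = tb_thing.template_id;  -- returns 4 rows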
But you should not try to spoof the query planner; it is smart, but it only knows what the schema and statistics tell it.

Optimize query in PostgreSQL

SELECT count(*)
FROM contacts_lists
JOIN plain_contacts
ON contacts_lists.contact_id = plain_contacts.contact_id
JOIN contacts
ON contacts.id = plain_contacts.contact_id
WHERE plain_contacts.has_email
AND NOT contacts.email_bad
AND NOT contacts.email_unsub
AND contacts_lists.list_id = 67339
How can I optimize this query? Could you please explain?
Reformatting your query plan for clarity:
QUERY PLAN
Aggregate  (cost=126377.96..126377.97 rows=1 width=0)
  ->  Hash Join  (cost=6014.51..126225.38 rows=61033 width=0)
        Hash Cond: (contacts_lists.contact_id = plain_contacts.contact_id)
        ->  Hash Join  (cost=3067.30..121828.63 rows=61033 width=8)
              Hash Cond: (contacts_lists.contact_id = contacts.id)
              ->  Index Scan using index_contacts_lists_on_list_id_and_contact_id on contacts_lists  (cost=0.00..116909.97 rows=61033 width=4)
                    Index Cond: (list_id = 66996)
              ->  Hash  (cost=1721.41..1721.41 rows=84551 width=4)
                    ->  Seq Scan on contacts  (cost=0.00..1721.41 rows=84551 width=4)
                          Filter: ((NOT email_bad) AND (NOT email_unsub))
        ->  Hash  (cost=2474.97..2474.97 rows=37779 width=4)
              ->  Seq Scan on plain_contacts  (cost=0.00..2474.97 rows=37779 width=4)
                    Filter: has_email
Two partial indexes might eliminate seq scans depending on your data distribution:
-- if many contacts have bad emails or are unsubscribed:
CREATE INDEX contacts_valid_email_idx ON contacts (id)
WHERE (NOT email_bad AND NOT email_unsub);
-- if many contacts have no email:
CREATE INDEX plain_contacts_valid_email_idx ON plain_contacts (id)
WHERE (has_email);
You might be missing an index on a foreign key:
CREATE INDEX plain_contacts_contact_id_idx ON plain_contacts (contact_id);
Last but not least, if you've never analyzed your data, you need to run:
VACUUM ANALYZE;
If it's still slow once all that is done, there isn't much you can do short of merging your plain_contacts and your contacts tables: getting the above query plan in spite of the above indexes means most/all of your subscribers are subscribed to that particular list -- in which case the above query plan is the fastest you'll get.
This is already a very simple query that the database will run in the most efficient way, provided that statistics are up to date.
So in terms of the query itself there's not much to do.
In terms of database administration you can add indexes - there should be indexes in the database for all the join conditions and also for the most selective part of the where clause (list_id, contact_id as FK in plain_contacts and contacts_lists). This is the most significant opportunity to improve performance of this query (orders of magnitude). Still as SpliFF notes, you probably already have those indexes, so check.
Also, Postgres has a good EXPLAIN command that you should learn and use. It will help with optimizing queries.
Since you only want to include rows that have certain flags set in the joined tables, I would move those conditions into the join clauses:
SELECT count(*)
FROM contacts_lists
JOIN plain_contacts
    ON contacts_lists.contact_id = plain_contacts.contact_id
    AND plain_contacts.has_email
JOIN contacts
    ON contacts.id = plain_contacts.contact_id
    AND NOT contacts.email_unsub
    AND NOT contacts.email_bad
WHERE contacts_lists.list_id = 67339
I'm not sure if this would have a great impact on performance, but it is worth a try. You should probably have indexes on the joined tables as well for optimal performance, like this:
plain_contacts: contact_id, has_email
contacts: id, email_unsub, email_bad
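A sketch of what those indexes might look like (the index names are illustrative):
CREATE INDEX plain_contacts_contact_id_has_email_idx ON plain_contacts (contact_id, has_email);
CREATE INDEX contacts_id_email_flags_idx ON contacts (id, email_unsub, email_bad);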
Have you run ANALYZE on the database recently? Do the row counts in the EXPLAIN plan look like they make sense? (Looks like you ran only EXPLAIN. EXPLAIN ANALYZE gives both estimated and actual timings.)
You can use SELECT count(1) ... but other than that I'd say it looks fine. You could always cache some parts of the query using views or put indexes on contact_id and list_id if you're really struggling (I assume you have one on id already).
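If you do go the caching route, note that a plain view does not cache results; a materialized view does. A sketch, with an illustrative name, that precomputes the count per list:
CREATE MATERIALIZED VIEW list_email_counts AS
SELECT contacts_lists.list_id, count(*) AS n
FROM contacts_lists
JOIN plain_contacts ON contacts_lists.contact_id = plain_contacts.contact_id
JOIN contacts ON contacts.id = plain_contacts.contact_id
WHERE plain_contacts.has_email
  AND NOT contacts.email_bad
  AND NOT contacts.email_unsub
GROUP BY contacts_lists.list_id;
-- refresh when the underlying data changes
REFRESH MATERIALIZED VIEW list_email_counts;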