I am trying to optimize ip address geolocation queries on the following raw schema (dataset is from an https://ipinfo.io/ free trial):
CREATE TABLE raw_geolocations (
start_ip INET NOT NULL,
end_ip INET NOT NULL,
join_key CIDR NOT NULL,
city TEXT NOT NULL,
region TEXT,
country TEXT NOT NULL,
lat NUMERIC NOT NULL,
lng NUMERIC NOT NULL,
postal TEXT,
timezone TEXT NOT NULL
);
Just querying with:
select *
from unnest(array[
inet '<ip_address_string>'
]) ip_address
left join raw_geolocations g on ip_address between start_ip and end_ip;
was terribly slow (between 50s and 120s for single ip address queries).
So I followed a "guide" here and migrated raw_geolocations to a new table called ip_geolocations with a gist index created on the ip_segment field:
create type iprange as range (subtype=inet);
create table ip_geolocations(
id bigserial primary key not null,
ip_segment iprange not null,
join_key cidr not null,
city TEXT NOT NULL,
region TEXT,
country TEXT NOT NULL,
lat NUMERIC NOT NULL,
lng NUMERIC NOT NULL,
postal TEXT,
timezone TEXT NOT NULL
);
insert into ip_geolocations(ip_segment, join_key, city, region, country, lat, lng, postal, timezone)
select iprange(start_ip, end_ip, '[]'), join_key, city, region, country, lat, lng, postal, timezone
from raw_geolocations;
create index gist_idx_ip_geolocations_ip_segment on ip_geolocations USING gist (ip_segment);
This was great for single ip address queries, bringing execution time down to between 20ms and 200ms; however, it was fairly slow for bulk queries, taking around 1.5s to 3s for 100 ip addresses at once.
Sample query:
select *
from unnest(array[
inet '<ip_address_string>'
]) ip_address
left join ip_geolocations g on g.ip_segment @> ip_address;
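The bulk variant is the same shape, just with more elements in the array (the addresses below are placeholders from the documentation ranges):
select *
from unnest(array[
inet '203.0.113.7',
inet '198.51.100.23',
inet '192.0.2.146'
]) ip_address
left join ip_geolocations g on g.ip_segment @> ip_address;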
A few problems came up as I dove into other optimizations:
I then decided to look into gin indexes, but hit the following postgres error when running this command:
create index test_idx_geolocations on ip_geolocations using gin (ip_segment);
ERROR: operator class "inet_ops" does not exist for access method "gin"
Time: 2.173 ms
I am struggling to understand the instructions for how to use inet_ops as an operator class, whether for the iprange type I defined or even for the inet and cidr types. Additionally, I tried to define my own operator class, to no avail:
create operator class gin_iprange_ops default for type iprange
using gin as
OPERATOR 7 #> (iprange, inet),
OPERATOR 8 <# (inet, iprange);
ERROR: operator does not exist: iprange #> inet
This error doesn't make much sense to me, since range containment clearly works in the sample query above (via the built-in @> operator).
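For what it's worth, as far as I can tell from the docs, inet_ops only ships as a GiST (and SP-GiST) operator class for the inet/cidr types themselves, not for GIN and not for range types, so the only variant I could get to work is something like:
create index gist_idx_ip_geolocations_join_key on ip_geolocations using gist (join_key inet_ops);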
While blocked on that, I decided to see if just playing around with postgres configuration settings would improve query times, so I made the following changes:
alter table ip_geolocations set (parallel_workers = 4);
set max_parallel_maintenance_workers to 4;
set maintenance_work_mem to '1 GB';
This seemed to non-linearly improve index creation time, but didn't have any noticeable effect on query execution time.
Lastly, there are other ip address tables of a similar form:
CREATE TABLE raw_<other_table> (
start_ip INET NOT NULL,
end_ip INET NOT NULL,
join_key CIDR NOT NULL,
... <other_fields>
);
that I want to be able to join on. I followed a similar set of steps for each of these other tables (3 of them) and migrated them to ip_<other_table> tables with iprange fields and gist indices.
Unfortunately, joining these tables on the join_key was still incredibly slow.
Any help is much appreciated. I am very new to the world of IP address queries (range containment queries), custom operator classes, and index types (gist and gin), so any pointers or documentation that might help me would be great.
Edits
Query plan where I just check whether the ip_address falls within the ip_segment iprange:
explain analyze select *
from unnest(array[
inet '98.237.137.99'
]) ip_address
left join ip_geolocations g on g.ip_segment @> ip_address
left join ip_companies c on c.ip_segment @> ip_address
left join ip_carriers cr on cr.ip_segment @> ip_address
left join ip_privacy_detections pd on pd.ip_segment @> ip_address;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=1930.48..7972624532924773.00 rows=931561551811655 width=412) (actual time=562.324..589.083 rows=1 loops=1)
-> Nested Loop Left Join (cost=23.78..1160195946065.69 rows=116337334725 width=356) (actual time=287.962..314.719 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.84..30592828311.99 rows=1694915933 width=293) (actual time=282.013..308.768 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.42..515233.65 rows=27965 width=179) (actual time=270.532..297.172 rows=1 loops=1)
-> Function Scan on unnest ip_address (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.008 rows=1 loops=1)
-> Index Scan using idx_ip_companies_ip_segment on ip_companies c (cost=0.42..513835.38 rows=139826 width=147) (actual time=270.519..297.157 rows=1 loops=1)
Index Cond: (ip_segment @> ip_address.ip_address)
-> Index Scan using idx_ip_geolocations_ip_segment on ip_geolocations g (cost=0.42..1090919.64 rows=303041 width=114) (actual time=11.474..11.588 rows=1 loops=1)
Index Cond: (ip_segment @> ip_address.ip_address)
-> Bitmap Heap Scan on ip_carriers cr (cost=22.94..663.04 rows=343 width=63) (actual time=5.939..5.939 rows=0 loops=1)
Recheck Cond: (ip_segment @> ip_address.ip_address)
-> Bitmap Index Scan on test_idx_cr (cost=0.00..22.85 rows=343 width=0) (actual time=5.935..5.935 rows=0 loops=1)
Index Cond: (ip_segment @> ip_address.ip_address)
-> Bitmap Heap Scan on ip_privacy_detections pd (cost=1906.70..68119.89 rows=40037 width=56) (actual time=274.352..274.353 rows=0 loops=1)
Recheck Cond: (ip_segment @> ip_address.ip_address)
-> Bitmap Index Scan on idx_ip_privacy_detections_ip_segment (cost=0.00..1896.69 rows=40037 width=0) (actual time=274.349..274.350 rows=0 loops=1)
Index Cond: (ip_segment @> ip_address.ip_address)
Planning Time: 0.739 ms
Execution Time: 589.823 ms
(19 rows)
Note that for each of the 4 tables being queried, I built a gist index on their ip_segment fields.
Each of these left joins could honestly be separate queries that are associated together in the application layer, but in an ideal world I can get the RDS instance to handle the associations (joins) for me.
Running the above query for just 1 ip address takes 26.376 ms. For around 100 random ip addresses it takes around 11309.123 ms (11 seconds) (and improves to around 1.8 seconds after running the same query a few times).
Query plan where I use the provided cidr join_key field for table joins:
explain analyze select *
from unnest(array[
inet '98.237.137.99'
]) ip_address
left join ip_geolocations g on g.ip_segment @> ip_address
left join ip_companies c on c.join_key = g.join_key
left join ip_carriers cr on cr.join_key = g.join_key
left join ip_privacy_detections pd on pd.join_key = g.join_key;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join (cost=13413427.68..762295480.26 rows=49873343109 width=412) (actual time=53725.909..53726.119 rows=121 loops=1)
Merge Cond: ((c.join_key)::inet = (g.join_key)::inet)
-> Sort (cost=10678179.38..10748092.42 rows=27965218 width=147) (actual time=45444.278..46291.865 rows=4939342 loops=1)
Sort Key: c.join_key
Sort Method: external merge Disk: 4415120kB
-> Seq Scan on ip_companies c (cost=0.00..910715.18 rows=27965218 width=147) (actual time=0.014..4597.313 rows=27965218 loops=1)
-> Materialize (cost=2735248.30..3478041.50 rows=41149314 width=265) (actual time=6963.430..6963.598 rows=121 loops=1)
-> Merge Left Join (cost=2735248.30..3375168.21 rows=41149314 width=265) (actual time=6963.426..6963.502 rows=121 loops=1)
Merge Cond: ((g.join_key)::inet = (pd.join_key)::inet)
-> Merge Left Join (cost=1112932.44..1115281.67 rows=124973 width=209) (actual time=80.721..80.725 rows=1 loops=1)
Merge Cond: ((g.join_key)::inet = (cr.join_key)::inet)
-> Sort (cost=1103325.02..1103476.54 rows=60608 width=146) (actual time=28.093..28.094 rows=1 loops=1)
Sort Key: g.join_key
Sort Method: quicksort Memory: 25kB
-> Nested Loop Left Join (cost=0.42..1093950.06 rows=60608 width=146) (actual time=27.717..28.086 rows=1 loops=1)
-> Function Scan on unnest ip_address (cost=0.00..0.01 rows=1 width=32) (actual time=0.016..0.017 rows=1 loops=1)
-> Index Scan using idx_ip_geolocations_ip_segment on ip_geolocations g (cost=0.42..1090919.64 rows=303041 width=114) (actual time=27.695..28.062 rows=1 loops=1)
Index Cond: (ip_segment @> ip_address.ip_address)
-> Materialize (cost=9607.42..9950.61 rows=68639 width=63) (actual time=44.198..50.257 rows=20609 loops=1)
-> Sort (cost=9607.42..9779.01 rows=68639 width=63) (actual time=44.195..47.308 rows=20609 loops=1)
Sort Key: cr.join_key
Sort Method: external merge Disk: 5120kB
-> Seq Scan on ip_carriers cr (cost=0.00..1510.39 rows=68639 width=63) (actual time=0.486..12.759 rows=68639 loops=1)
-> Materialize (cost=1622315.86..1662352.94 rows=8007417 width=56) (actual time=5143.369..6417.708 rows=4015488 loops=1)
-> Sort (cost=1622315.86..1642334.40 rows=8007417 width=56) (actual time=5143.366..5843.105 rows=4015488 loops=1)
Sort Key: pd.join_key
Sort Method: external merge Disk: 439120kB
-> Seq Scan on ip_privacy_detections pd (cost=0.00..156763.17 rows=8007417 width=56) (actual time=0.802..845.180 rows=8007417 loops=1)
Planning Time: 12.618 ms
Execution Time: 54800.482 ms
(30 rows)
Note that for each of the 4 tables being queried, I built a gist index on the pair of fields (ip_segment range_ops, join_key inet_ops).
This query took around 59 seconds to execute for even a single ip address, which is abysmal. I wonder if it has to do with using range_ops instead of inet_ops for the ip_segment gist index field.
What confuses me is why the gist indexes I built on join_key and ip_segment weren't even used for the 3 tables that were not ip_geolocations.
One thing to note is that I am comparing this to ipinfo's actual API, which for the same set of ip addresses comes in at under 650 ms per full request (100 ip addresses per batch), meaning there is a lot of room for improvement.
I think the main issue is that you are searching for ip address values between string values.
I would try storing the Big Integer value of the ip address along with your other information inside of the geolocations tables.
After the IP address is converted to a Big Integer, you would check for the IPint being between the start IPint and end IPint.
Here's an example of what parsing the data could look like:
on 1860329659 between 1449189901 and 4278870277
Instead of
on '110.226.96.187' between '86.96.226.13' and '255.10.97.5'
This is way better than string value comparisons.
I've included temporary tables to test with sample data.
Here's how to convert the ip into a Big Integer:
Declare #TestData TABLE
(
IP varchar(20)
);
INSERT INTO #TestData (IP) select '255.10.97.5'
INSERT INTO #TestData (IP) select '110.226.96.187'
INSERT INTO #TestData (IP) select '86.96.226.13'
Declare #NewData TABLE
(
IP varchar(20),
IPint bigint
);
INSERT INTO #NewData (IP, IPint)
select t1.IP,
(( convert(bigint,parsename(t1.ip, 4))*power(256,3)) + (parsename(t1.ip, 3) * power(256,2) ) + (parsename(t1.ip, 2) * power(256,1)) + (parsename(t1.ip, 1) * power(256,0) )) as IPint
from #TestData as t1
select n1.* from #NewData as n1
The result is:
IP             | IPint
---------------+------------
255.10.97.5    | 4278870277
110.226.96.187 | 1860329659
86.96.226.13   | 1449189901
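For what it's worth, in Postgres itself there is no need to parse the string by hand: subtracting two inet values already yields the distance as a bigint, so a sketch of the same conversion is simply:
select inet '110.226.96.187' - inet '0.0.0.0' as ipint;
-- 1860329659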
Turns out the best approach was actually to create a BTREE index on just the start_ip field after converting ip columns to INET. Then the query was a simple check of <= or BETWEEN.
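A sketch of that approach, assuming the ranges are non-overlapping so the row with the greatest start_ip at or below the address is the right one:
create index btree_idx_raw_geolocations_start_ip on raw_geolocations (start_ip);

select g.*
from raw_geolocations g
where g.start_ip <= inet '98.237.137.99'
order by g.start_ip desc
limit 1;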
Related
I want to index my tables for the following query:
select
t.*
from main_transaction t
left join main_profile profile on profile.id = t.profile_id
left join main_customer customer on (customer.id = profile.user_id)
where
(upper(t.request_no) like upper('%'||#requestNumber||'%') or upper(customer.phone) like upper('%'||#phoneNumber||'%'))
and t.service_type = 'SERVICE_1'
and t.status = 'SUCCESS'
and t.mode = 'AUTO'
and t.transaction_type = 'WITHDRAW'
and customer.client = 'corp'
and t.pub_date>='2018-09-05' and t.pub_date<='2018-11-05'
order by t.pub_date desc, t.id asc
LIMIT 1000;
This is how I tried to index my tables:
CREATE INDEX main_transaction_pr_id ON main_transaction (profile_id);
CREATE INDEX main_profile_user_id ON main_profile (user_id);
CREATE INDEX main_customer_client ON main_customer (client);
CREATE INDEX main_transaction_gin_req_no ON main_transaction USING gin (upper(request_no) gin_trgm_ops);
CREATE INDEX main_customer_gin_phone ON main_customer USING gin (upper(phone) gin_trgm_ops);
CREATE INDEX main_transaction_general ON main_transaction (service_type, status, mode, transaction_type); --> don't know if this one is true!!
Even after indexing as above, my query spends over 4.5 seconds just selecting 1000 rows!
I am selecting from the following table, which has 34 columns (including 3 FOREIGN KEYs) and over 3 million rows:
CREATE TABLE main_transaction (
id integer NOT NULL DEFAULT nextval('main_transaction_id_seq'::regclass),
description character varying(255) NOT NULL,
request_no character varying(18),
account character varying(50),
service_type character varying(50),
pub_date" timestamptz(6) NOT NULL,
"service_id" varchar(50) COLLATE "pg_catalog"."default",
....
);
I am also joining two tables (main_profile, main_customer) for searching customer.phone and for selecting customer.client. To get to the main_customer table from the main_transaction table, I can only go through main_profile.
My question is: how can I index my tables to increase performance for the above query?
Please do not suggest using UNION to replace the OR condition (upper(t.request_no) like upper('%'||#requestNumber||'%') or upper(customer.phone) like upper('%'||#phoneNumber||'%')) in this case; can we use a CASE WHEN condition instead? I have to convert my PostgreSQL query into Hibernate JPA, and I don't know how to express a UNION except through Hibernate native SQL, which I am not allowed to use.
Explain:
Limit (cost=411601.73..411601.82 rows=38 width=1906) (actual time=3885.380..3885.381 rows=1 loops=1)
-> Sort (cost=411601.73..411601.82 rows=38 width=1906) (actual time=3885.380..3885.380 rows=1 loops=1)
Sort Key: t.pub_date DESC, t.id
Sort Method: quicksort Memory: 27kB
-> Hash Join (cost=20817.10..411600.73 rows=38 width=1906) (actual time=3214.473..3885.369 rows=1 loops=1)
Hash Cond: (t.profile_id = profile.id)
Join Filter: ((upper((t.request_no)::text) ~~ '%20181104-2158-2723948%'::text) OR (upper((customer.phone)::text) ~~ '%20181104-2158-2723948%'::text))
Rows Removed by Join Filter: 593118
-> Seq Scan on main_transaction t (cost=0.00..288212.28 rows=205572 width=1906) (actual time=0.068..1527.677 rows=593119 loops=1)
Filter: ((pub_date >= '2016-09-05 00:00:00+05'::timestamp with time zone) AND (pub_date <= '2018-11-05 00:00:00+05'::timestamp with time zone) AND ((service_type)::text = 'SERVICE_1'::text) AND ((status)::text = 'SUCCESS'::text) AND ((mode)::text = 'AUTO'::text) AND ((transaction_type)::text = 'WITHDRAW'::text))
Rows Removed by Filter: 2132732
-> Hash (cost=17670.80..17670.80 rows=180984 width=16) (actual time=211.211..211.211 rows=181516 loops=1)
Buckets: 131072 Batches: 4 Memory Usage: 3166kB
-> Hash Join (cost=6936.09..17670.80 rows=180984 width=16) (actual time=46.846..183.689 rows=181516 loops=1)
Hash Cond: (customer.id = profile.user_id)
-> Seq Scan on main_customer customer (cost=0.00..5699.73 rows=181106 width=16) (actual time=0.013..40.866 rows=181618 loops=1)
Filter: ((client)::text = 'corp'::text)
Rows Removed by Filter: 16920
-> Hash (cost=3680.04..3680.04 rows=198404 width=8) (actual time=46.087..46.087 rows=198404 loops=1)
Buckets: 131072 Batches: 4 Memory Usage: 2966kB
-> Seq Scan on main_profile profile (cost=0.00..3680.04 rows=198404 width=8) (actual time=0.008..20.099 rows=198404 loops=1)
Planning time: 0.757 ms
Execution time: 3885.680 ms
With the restriction to not use UNION, you won't get a good plan.
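For reference, the rejected rewrite would split the OR into two branches so that each half can use its own trigram index; roughly like this (#requestNumber and #phoneNumber are the question's placeholders):
select *
from (
    select t.*
    from main_transaction t
    join main_profile profile on profile.id = t.profile_id
    join main_customer customer on customer.id = profile.user_id
    where upper(t.request_no) like upper('%'||#requestNumber||'%')
      and t.service_type = 'SERVICE_1' and t.status = 'SUCCESS'
      and t.mode = 'AUTO' and t.transaction_type = 'WITHDRAW'
      and customer.client = 'corp'
      and t.pub_date >= '2018-09-05' and t.pub_date <= '2018-11-05'
    union
    select t.*
    from main_transaction t
    join main_profile profile on profile.id = t.profile_id
    join main_customer customer on customer.id = profile.user_id
    where upper(customer.phone) like upper('%'||#phoneNumber||'%')
      and t.service_type = 'SERVICE_1' and t.status = 'SUCCESS'
      and t.mode = 'AUTO' and t.transaction_type = 'WITHDRAW'
      and customer.client = 'corp'
      and t.pub_date >= '2018-09-05' and t.pub_date <= '2018-11-05'
) u
order by pub_date desc, id asc
limit 1000;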
You can slightly speed up processing with the following indexes:
main_transaction ((service_type::text), (status::text), (mode::text),
(transaction_type::text), pub_date)
main_customer ((client::text))
These should at least get rid of the sequential scans, but the hash join that takes the lion's share of the processing time will remain.
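Spelled out as DDL, that would be something like (index names are arbitrary):
CREATE INDEX main_transaction_filter_idx ON main_transaction
    ((service_type::text), (status::text), (mode::text), (transaction_type::text), pub_date);
CREATE INDEX main_customer_client_text_idx ON main_customer ((client::text));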
I have two tables:
table_1 with ~1 million lines, with columns id_t1: integer, c1_t1: varchar, etc.
table_2 with ~50 million lines, with columns id_t2: integer, ref_id_t1: integer, c1_t2: varchar, etc.
ref_id_t1 is filled with id_t1 values; however, they are not linked by a foreign key, as table_2 doesn't know about table_1.
I need to do a request on both table like the following:
SELECT * FROM table_1 t1 WHERE t1.c1_t1= 'A' AND t1.id_t1 IN
(SELECT t2.ref_id_t1 FROM table_2 t2 WHERE t2.c1_t2 LIKE '%abc%');
Without any change, or with basic indexes, the request takes about a minute to complete, as a sequential scan is performed on table_2. To prevent this I created a GIN index with the gin_trgm_ops option:
CREATE EXTENSION pg_trgm;
CREATE INDEX c1_t2_gin_index ON table_2 USING gin (c1_t2 gin_trgm_ops);
However this does not solve the problem as the inner request still takes a very long time.
EXPLAIN ANALYSE SELECT t2.ref_id_t1 FROM table_2 t2 WHERE t2.c1_t2 LIKE '%abc%'
Gives the following
Bitmap Heap Scan on table_2 t2 (cost=664.20..189671.00 rows=65058 width=4) (actual time=5101.286..22854.838 rows=69631 loops=1)
Recheck Cond: ((c1_t2 )::text ~~ '%1.1%'::text)
Rows Removed by Index Recheck: 49069703
Heap Blocks: exact=611548
-> Bitmap Index Scan on gin_trg (cost=0.00..647.94 rows=65058 width=0) (actual time=4911.125..4911.125 rows=49139334 loops=1)
Index Cond: ((c1_t2)::text ~~ '%1.1%'::text)
Planning time: 0.529 ms
Execution time: 22863.017 ms
The Bitmap Index Scan is fast, but since we need t2.ref_id_t1, PostgreSQL has to perform a Bitmap Heap Scan, which is not quick on 65000 rows of data.
The solution to avoid the Bitmap Heap Scan would be to perform an Index Only Scan. This is possible using multi-column btree indexes; see https://www.postgresql.org/docs/9.6/static/indexes-index-only-scans.html
If I change the request to search the beginning of c1_t2 instead, and create a btree index on c1_t2 and ref_id_t1, the request takes just over a second, even with the inner query returning 90000 rows.
CREATE INDEX c1_t2_ref_id_t1_index
ON table_2 USING btree
(c1_t2 varchar_pattern_ops ASC NULLS LAST, ref_id_t1 ASC NULLS LAST)
EXPLAIN ANALYSE SELECT * FROM table_1 t1 WHERE t1.c1_t1= 'A' AND t1.id_t1 IN
(SELECT t2.ref_id_t1 FROM table_2 t2 WHERE t2.c1_t2 LIKE 'aaa%');
Hash Join (cost=56561.99..105233.96 rows=1 width=2522) (actual time=953.647..1068.488 rows=36 loops=1)
Hash Cond: (t1.id_t1 = t2.ref_id_t1)
-> Seq Scan on table_1 t1 (cost=0.00..48669.65 rows=615 width=2522) (actual time=0.088..667.576 rows=790 loops=1)
Filter: (c1_t1 = 'A')
Rows Removed by Filter: 1083798
-> Hash (cost=56553.74..56553.74 rows=660 width=4) (actual time=400.657..400.657 rows=69632 loops=1)
Buckets: 131072 (originally 1024) Batches: 1 (originally 1) Memory Usage: 3472kB
-> HashAggregate (cost=56547.14..56553.74 rows=660 width=4) (actual time=380.280..391.871 rows=69632 loops=1)
Group Key: t2.ref_id_t1
-> Index Only Scan using c1_t2_ref_id_t1_index on table_2 t2 (cost=0.56..53907.28 rows=1055943 width=4) (actual time=0.014..202.034 rows=974737 loops=1)
Index Cond: ((c1_t2 ~>=~ 'aaa'::text) AND (c1_t2 ~<~ 'chb'::text))
Filter: ((c1_t2 )::text ~~ 'aaa%'::text)
Heap Fetches: 0
Planning time: 1.512 ms
Execution time: 1069.712 ms
However this is not possible with gin indexes, as these indexes don't store all data in the key.
Is there a way to use a pg_trgm-like extension with a btree index so we can have index-only scans for LIKE '%abc%' requests?
I'm trying to understand such a huge difference in performance of two queries.
Let's assume I have two tables.
First one contains A records for some set of domains:
Table "public.dns_a"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
name | character varying(125) | | extended | |
a | inet | | main | |
Indexes:
"dns_a_a_idx" btree (a)
"dns_a_name_idx" btree (name varchar_pattern_ops)
Second table handles CNAME records:
Table "public.dns_cname"
Column | Type | Modifiers | Storage | Stats target | Description
--------+------------------------+-----------+----------+--------------+-------------
name | character varying(256) | | extended | |
cname | character varying(256) | | extended | |
Indexes:
"dns_cname_cname_idx" btree (cname varchar_pattern_ops)
"dns_cname_name_idx" btree (name varchar_pattern_ops)
Now I'm trying to solve a "simple" problem: getting all the domains pointing to the same IP address, including CNAMEs.
The first attempt to use CTE works kind of fine:
EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
(
SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
)
UNION ALL
SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on names_traverse (cost=3051757.20..4337044.86 rows=64264383 width=1064) (actual time=0.037..1697.444 rows=199 loops=1)
CTE names_traverse
-> Recursive Union (cost=0.57..3051757.20 rows=64264383 width=45) (actual time=0.036..1697.395 rows=199 loops=1)
-> Index Scan using dns_a_a_idx on dns_a (cost=0.57..1988.89 rows=1953 width=24) (actual time=0.035..0.064 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Merge Join (cost=4377.00..176448.06 rows=6426243 width=45) (actual time=498.101..848.648 rows=92 loops=2)
Merge Cond: ((c.cname)::text = (nt.name)::text)
-> Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2)
-> Materialize (cost=4376.44..4474.09 rows=19530 width=516) (actual time=0.039..0.084 rows=187 loops=2)
-> Sort (cost=4376.44..4425.27 rows=19530 width=516) (actual time=0.037..0.053 rows=100 loops=2)
Sort Key: nt.name USING ~<~
Sort Method: quicksort Memory: 33kB
-> WorkTable Scan on names_traverse nt (cost=0.00..390.60 rows=19530 width=516) (actual time=0.001..0.007 rows=100 loops=2)
Planning time: 0.130 ms
Execution time: 1697.477 ms
(15 rows)
There are two loops in the example above, so if I make a simple outer join query, I get much better results:
EXPLAIN ANALYZE
SELECT *
FROM dns_a a
LEFT JOIN dns_cname c1 ON (c1.cname=a.name)
LEFT JOIN dns_cname c2 ON (c2.cname=c1.name)
WHERE a.a='118.145.5.20';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=1.68..65674.19 rows=1953 width=114) (actual time=1.086..12.992 rows=189 loops=1)
-> Nested Loop Left Join (cost=1.12..46889.57 rows=1953 width=69) (actual time=1.085..2.154 rows=189 loops=1)
-> Index Scan using dns_a_a_idx on dns_a a (cost=0.57..1988.89 rows=1953 width=24) (actual time=0.022..0.055 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Index Scan using dns_cname_cname_idx on dns_cname c1 (cost=0.56..19.70 rows=329 width=45) (actual time=0.137..0.148 rows=13 loops=14)
Index Cond: ((cname)::text = (a.name)::text)
-> Index Scan using dns_cname_cname_idx on dns_cname c2 (cost=0.56..6.33 rows=329 width=45) (actual time=0.057..0.057 rows=0 loops=189)
Index Cond: ((cname)::text = (c1.name)::text)
Planning time: 0.452 ms
Execution time: 13.012 ms
(10 rows)
Time: 13.787 ms
So, the performance difference is about 100 times and that's the thing that worries me.
I like the convenience of recursive CTE and prefer to use it instead of doing dirty tricks on application side, but I don't get why the cost of Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2) is so high.
Am I missing something important regarding CTE or the issue is with something else?
Thanks!
Update: A friend of mine spotted the row count I had missed in Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..69958.06 rows=2268434 width=45) (actual time=4.732..688.456 rows=2219973 loops=2): it equals the total number of rows in the table. If I understand correctly, that means it performs a full index scan without any condition, and I don't get where the condition went.
Result: After applying SET LOCAL enable_mergejoin TO false; execution time is much, much better.
EXPLAIN ANALYZE WITH RECURSIVE names_traverse AS (
(
SELECT name::varchar(256), NULL::varchar(256) as cname, a FROM dns_a WHERE a = '118.145.5.20'
)
UNION ALL
SELECT c.name, c.cname, NULL::inet as a FROM names_traverse nt, dns_cname c WHERE c.cname=nt.name
)
SELECT * FROM names_traverse;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on names_traverse (cost=4746432.42..6527720.02 rows=89064380 width=1064) (actual time=0.718..45.656 rows=199 loops=1)
CTE names_traverse
-> Recursive Union (cost=0.57..4746432.42 rows=89064380 width=45) (actual time=0.717..45.597 rows=199 loops=1)
-> Index Scan using dns_a_a_idx on dns_a (cost=0.57..74.82 rows=2700 width=24) (actual time=0.716..0.717 rows=14 loops=1)
Index Cond: (a = '118.145.5.20'::inet)
-> Nested Loop (cost=0.56..296507.00 rows=8906168 width=45) (actual time=11.276..22.418 rows=92 loops=2)
-> WorkTable Scan on names_traverse nt (cost=0.00..540.00 rows=27000 width=516) (actual time=0.000..0.013 rows=100 loops=2)
-> Index Scan using dns_cname_cname_idx on dns_cname c (cost=0.56..7.66 rows=330 width=45) (actual time=0.125..0.225 rows=1 loops=199)
Index Cond: ((cname)::text = (nt.name)::text)
Planning time: 0.253 ms
Execution time: 45.697 ms
(11 rows)
The first query is slow because of the index scan, as you noted.
The plan has to scan the complete index in order to get dns_cname sorted by cname, which is needed for the merge join. A merge join requires that both input tables are sorted by the join key, which can either be done with an index scan over the complete table (as in this case) or by a sequential scan followed by an explicit sort.
You will notice that the planner grossly overestimates all row counts for the CTE evaluation, which is probably the root of the problem. For fewer rows, PostgreSQL might choose a nested loop join which would not have to scan the whole table dns_cname.
That may be fixable or not. One thing that I can see immediately is that the estimate for the initial value '118.145.5.20' is too high by a factor of 139.5, which is pretty bad. You might fix that by running ANALYZE on dns_a, perhaps after increasing the statistics target for the column:
ALTER TABLE dns_a ALTER a SET STATISTICS 1000;
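followed by a fresh ANALYZE so the new target actually takes effect:
ANALYZE dns_a;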
See if that makes a difference.
If that doesn't do the trick, you can manually set enable_mergejoin and enable_hashjoin to off and see if a plan with a nested loop join is really better or not. If you can get away with changing these parameters for this one statement only (probably with SET LOCAL) and get a better result that way, that is another option you have.
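A sketch of scoping that to a single statement, reusing the recursive query from above:
BEGIN;
SET LOCAL enable_mergejoin = off;  -- reverts automatically at COMMIT/ROLLBACK
WITH RECURSIVE names_traverse AS (
    (SELECT name::varchar(256), NULL::varchar(256) AS cname, a FROM dns_a WHERE a = '118.145.5.20')
    UNION ALL
    SELECT c.name, c.cname, NULL::inet AS a FROM names_traverse nt, dns_cname c WHERE c.cname = nt.name
)
SELECT * FROM names_traverse;
COMMIT;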
In short: DISTINCT, MIN and MAX on the left-hand side of a LEFT JOIN should be answerable without doing the join.
I'm using a SQL array type (on Postgres 9.3) to condense several rows of data into a single row, and then a view to return the unnested, normalized form. I do this to save on index costs, as well as to get Postgres to compress the data in the array.
Things work pretty well, but some queries that could be answered without unnesting and materializing/exploding the view are quite expensive because they are deferred till after the view is materialized. Is there any way to solve this?
Here is the basic table:
CREATE TABLE mt_count_by_day
(
run_id integer NOT NULL,
type character varying(64) NOT NULL,
start_day date NOT NULL,
end_day date NOT NULL,
counts bigint[] NOT NULL,
CONSTRAINT mt_count_by_day_pkey PRIMARY KEY (run_id, type)
)
An index on 'type' just for good measure:
CREATE INDEX runinfo_mt_count_by_day_type_idx on runinfo.mt_count_by_day (type);
Here is the view that uses generate_series and unnest
CREATE OR REPLACE VIEW runinfo.v_mt_count_by_day AS
SELECT mt_count_by_day.run_id,
mt_count_by_day.type,
mt_count_by_day.brand,
generate_series(mt_count_by_day.start_day::timestamp without time zone, mt_count_by_day.end_day - '1 day'::interval, '1 day'::interval) AS row_date,
unnest(mt_count_by_day.counts) AS row_count
FROM runinfo.mt_count_by_day;
What if I want to do distinct on the 'type' column?
explain analyze select distinct(type) from mt_count_by_day;
"HashAggregate (cost=9566.81..9577.28 rows=1047 width=19) (actual time=171.653..172.019 rows=1221 loops=1)"
" -> Seq Scan on mt_count_by_day (cost=0.00..9318.25 rows=99425 width=19) (actual time=0.089..99.110 rows=99425 loops=1)"
"Total runtime: 172.338 ms"
Now what happens if I do the same on the view?
explain analyze select distinct(type) from v_mt_count_by_day;
"HashAggregate (cost=1749752.88..1749763.34 rows=1047 width=19) (actual time=58586.934..58587.191 rows=1221 loops=1)"
" -> Subquery Scan on v_mt_count_by_day (cost=0.00..1501190.38 rows=99425000 width=19) (actual time=0.114..37134.349 rows=68299959 loops=1)"
" -> Seq Scan on mt_count_by_day (cost=0.00..506940.38 rows=99425000 width=597) (actual time=0.113..24907.147 rows=68299959 loops=1)"
"Total runtime: 58587.474 ms"
Is there a way to get postgres to recognize that it can solve this without first exploding the view?
Here, for comparison, we count the number of rows matching the criteria in the table vs. the view. Everything works as expected: Postgres filters down the rows before materializing the view. Not quite the same query as the distinct, but this property is what makes our data more manageable.
explain analyze select count(*) from mt_count_by_day where type = 'SOCIAL_GOOGLE'
"Aggregate (cost=157.01..157.02 rows=1 width=0) (actual time=0.538..0.538 rows=1 loops=1)"
" -> Bitmap Heap Scan on mt_count_by_day (cost=4.73..156.91 rows=40 width=0) (actual time=0.139..0.509 rows=122 loops=1)"
" Recheck Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
" -> Bitmap Index Scan on runinfo_mt_count_by_day_type_idx (cost=0.00..4.72 rows=40 width=0) (actual time=0.098..0.098 rows=122 loops=1)"
" Index Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
"Total runtime: 0.625 ms"
explain analyze select count(*) from v_mt_count_by_day where type = 'SOCIAL_GOOGLE'
"Aggregate (cost=857.11..857.12 rows=1 width=0) (actual time=6.827..6.827 rows=1 loops=1)"
" -> Bitmap Heap Scan on mt_count_by_day (cost=4.73..357.11 rows=40000 width=597) (actual time=0.124..5.294 rows=15916 loops=1)"
" Recheck Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
" -> Bitmap Index Scan on runinfo_mt_count_by_day_type_idx (cost=0.00..4.72 rows=40 width=0) (actual time=0.082..0.082 rows=122 loops=1)"
" Index Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
"Total runtime: 6.885 ms"
Here is the code required to reproduce this:
CREATE TABLE base_table
(
run_id integer NOT NULL,
type integer NOT NULL,
start_day date NOT NULL,
end_day date NOT NULL,
counts bigint[] NOT NULL,
CONSTRAINT match_check CHECK (end_day > start_day AND (end_day - start_day) = array_length(counts, 1)),
CONSTRAINT base_table_pkey PRIMARY KEY (run_id, type)
);
--Just because...
CREATE INDEX base_type_idx on base_table (type);
CREATE OR REPLACE VIEW v_foo AS
SELECT m.run_id,
m.type,
t.row_date::date,
t.row_count
FROM base_table m
LEFT JOIN LATERAL ROWS FROM (
unnest(m.counts),
generate_series(m.start_day, m.end_day-1, interval '1d')
) t(row_count, row_date) ON true;
insert into base_table
select a.run_id, a.type, '20120101'::date as start_day, '20120401'::date as end_day, b.counts
from (
    SELECT N AS run_id, L AS type
    FROM generate_series(1, 10000) N
    CROSS JOIN generate_series(1, 7) L
    ORDER BY N, L
) a,
(SELECT array_agg(generate_series)::bigint[] AS counts FROM generate_series(1, 91)) b;
And the results on 9.4.1:
explain analyze select distinct type from base_table;
"HashAggregate (cost=6750.00..6750.03 rows=3 width=4) (actual time=51.939..51.940 rows=3 loops=1)"
" Group Key: type"
" -> Seq Scan on base_table (cost=0.00..6600.00 rows=60000 width=4) (actual time=0.030..33.655 rows=60000 loops=1)"
"Planning time: 0.086 ms"
"Execution time: 51.975 ms"
explain analyze select distinct type from v_foo;
"HashAggregate (cost=1356600.01..1356600.04 rows=3 width=4) (actual time=9215.630..9215.630 rows=3 loops=1)"
" Group Key: m.type"
" -> Nested Loop Left Join (cost=0.01..1206600.01 rows=60000000 width=4) (actual time=0.112..7834.094 rows=5460000 loops=1)"
" -> Seq Scan on base_table m (cost=0.00..6600.00 rows=60000 width=764) (actual time=0.009..42.694 rows=60000 loops=1)"
" -> Function Scan on t (cost=0.01..10.01 rows=1000 width=0) (actual time=0.091..0.111 rows=91 loops=60000)"
"Planning time: 0.132 ms"
"Execution time: 9215.686 ms"
Generally, the Postgres query planner does "inline" views to optimize the whole query. Per documentation:
One application of the rewrite system is in the realization of views.
Whenever a query against a view (i.e., a virtual table) is made, the
rewrite system rewrites the user's query to a query that accesses the
base tables given in the view definition instead.
But I don't think Postgres is smart enough to conclude that it can reach the same result from the base table without exploding rows.
You can try this alternative query with a LATERAL join. It's cleaner:
CREATE OR REPLACE VIEW runinfo.v_mt_count_by_day AS
SELECT m.run_id, m.type, m.brand
, m.start_day + c.rn - 1 AS row_date
, c.row_count
FROM runinfo.mt_count_by_day m
LEFT JOIN LATERAL unnest(m.counts) WITH ORDINALITY c(row_count, rn) ON true;
It also makes clear that one of (end_day, start_day) is redundant.
Using LEFT JOIN because that might allow the query planner to ignore the join from your query:
SELECT DISTINCT type FROM v_mt_count_by_day;
Else (with a CROSS JOIN or INNER JOIN) it must evaluate the join to see whether rows from the first table are eliminated.
BTW, it's:
SELECT DISTINCT type ...
not:
SELECT DISTINCT(type) ...
Note that this returns a date instead of the timestamp in your original. Easier, and I guess it's what you want anyway?
Requires Postgres 9.3+. Details:
PostgreSQL unnest() with element number
ROWS FROM in Postgres 9.4+
To explode both columns in parallel safely:
CREATE OR REPLACE VIEW runinfo.v_mt_count_by_day AS
SELECT m.run_id, m.type, m.brand
, t.row_date::date, t.row_count
FROM runinfo.mt_count_by_day m
LEFT JOIN LATERAL ROWS FROM (
unnest(m.counts)
, generate_series(m.start_day, m.end_day - 1, interval '1d')
) t(row_count, row_date) ON true;
The main benefit: this would not derail into a Cartesian product if the two SRFs don't return the same number of rows; instead, NULL values would be padded.
Again, I can't say whether this would help the query planner with a faster plan for DISTINCT type without testing.
I am using postgres 9.2.4.
We have a background job that imports user's emails into our system and stores them in a postgres database table.
Below is the table:
CREATE TABLE emails
(
id serial NOT NULL,
subject text,
body text,
personal boolean,
sent_at timestamp without time zone NOT NULL,
created_at timestamp without time zone,
updated_at timestamp without time zone,
account_id integer NOT NULL,
sender_user_id integer,
sender_contact_id integer,
html text,
folder text,
draft boolean DEFAULT false,
check_for_response timestamp without time zone,
send_time timestamp without time zone,
CONSTRAINT emails_pkey PRIMARY KEY (id),
CONSTRAINT emails_account_id_fkey FOREIGN KEY (account_id)
REFERENCES accounts (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT emails_sender_contact_id_fkey FOREIGN KEY (sender_contact_id)
REFERENCES contacts (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (
OIDS=FALSE
);
ALTER TABLE emails
OWNER TO paulcowan;
-- Index: emails_account_id_index
-- DROP INDEX emails_account_id_index;
CREATE INDEX emails_account_id_index
ON emails
USING btree
(account_id);
-- Index: emails_sender_contact_id_index
-- DROP INDEX emails_sender_contact_id_index;
CREATE INDEX emails_sender_contact_id_index
ON emails
USING btree
(sender_contact_id);
-- Index: emails_sender_user_id_index
-- DROP INDEX emails_sender_user_id_index;
CREATE INDEX emails_sender_user_id_index
ON emails
USING btree
(sender_user_id);
The query is further complicated because I have a view on this table where I pull in other data:
CREATE OR REPLACE VIEW email_graphs AS
SELECT emails.id, emails.subject, emails.body, emails.folder, emails.html,
emails.personal, emails.draft, emails.created_at, emails.updated_at,
emails.sent_at, emails.sender_contact_id, emails.sender_user_id,
emails.addresses, emails.read_by, emails.check_for_response,
emails.send_time, ts.ids AS todo_ids, cs.ids AS call_ids,
ds.ids AS deal_ids, ms.ids AS meeting_ids, c.comments, p.people,
atts.ids AS attachment_ids
FROM emails
LEFT JOIN ( SELECT todos.reference_email_id AS email_id,
array_to_json(array_agg(todos.id)) AS ids
FROM todos
GROUP BY todos.reference_email_id) ts ON ts.email_id = emails.id
LEFT JOIN ( SELECT calls.reference_email_id AS email_id,
array_to_json(array_agg(calls.id)) AS ids
FROM calls
GROUP BY calls.reference_email_id) cs ON cs.email_id = emails.id
LEFT JOIN ( SELECT deals.reference_email_id AS email_id,
array_to_json(array_agg(deals.id)) AS ids
FROM deals
GROUP BY deals.reference_email_id) ds ON ds.email_id = emails.id
LEFT JOIN ( SELECT meetings.reference_email_id AS email_id,
array_to_json(array_agg(meetings.id)) AS ids
FROM meetings
GROUP BY meetings.reference_email_id) ms ON ms.email_id = emails.id
LEFT JOIN ( SELECT comments.email_id,
array_to_json(array_agg(( SELECT row_to_json(r.*) AS row_to_json
FROM ( VALUES (comments.id,comments.text,comments.author_id,comments.created_at,comments.updated_at)) r(id, text, author_id, created_at, updated_at)))) AS comments
FROM comments
WHERE comments.email_id IS NOT NULL
GROUP BY comments.email_id) c ON c.email_id = emails.id
LEFT JOIN ( SELECT email_participants.email_id,
array_to_json(array_agg(( SELECT row_to_json(r.*) AS row_to_json
FROM ( VALUES (email_participants.user_id,email_participants.contact_id,email_participants.kind)) r(user_id, contact_id, kind)))) AS people
FROM email_participants
GROUP BY email_participants.email_id) p ON p.email_id = emails.id
LEFT JOIN ( SELECT attachments.reference_email_id AS email_id,
array_to_json(array_agg(attachments.id)) AS ids
FROM attachments
GROUP BY attachments.reference_email_id) atts ON atts.email_id = emails.id;
ALTER TABLE email_graphs
OWNER TO paulcowan;
We then run paginated queries against this view e.g.
SELECT "email_graphs".* FROM "email_graphs" INNER JOIN "email_participants" ON ("email_participants"."email_id" = "email_graphs"."id") WHERE (("user_id" = 75) AND ("folder" = 'INBOX')) ORDER BY "sent_at" DESC LIMIT 5 OFFSET 0
As the table has grown, queries on this table have dramatically slowed down.
If I run the paginated query with EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT "email_graphs".* FROM "email_graphs" INNER JOIN "email_participants" ON ("email_participants"."email_id" = "email_graphs"."id") WHERE (("user_id" = 75) AND ("folder" = 'INBOX')) ORDER BY "sent_at" DESC LIMIT 5 OFFSET 0;
I get this result
-> Seq Scan on deals (cost=0.00..9.11 rows=36 width=8) (actual time=0.003..0.044 rows=34 loops=1)
-> Sort (cost=5.36..5.43 rows=131 width=36) (actual time=0.416..0.416 rows=1 loops=1)
Sort Key: ms.email_id
Sort Method: quicksort Memory: 26kB
-> Subquery Scan on ms (cost=3.52..4.44 rows=131 width=36) (actual time=0.408..0.411 rows=1 loops=1)
-> HashAggregate (cost=3.52..4.05 rows=131 width=8) (actual time=0.406..0.408 rows=1 loops=1)
-> Seq Scan on meetings (cost=0.00..3.39 rows=131 width=8) (actual time=0.006..0.163 rows=161 loops=1)
-> Sort (cost=18.81..18.91 rows=199 width=36) (actual time=0.012..0.012 rows=0 loops=1)
Sort Key: c.email_id
Sort Method: quicksort Memory: 25kB
-> Subquery Scan on c (cost=15.90..17.29 rows=199 width=36) (actual time=0.007..0.007 rows=0 loops=1)
-> HashAggregate (cost=15.90..16.70 rows=199 width=60) (actual time=0.006..0.006 rows=0 loops=1)
-> Seq Scan on comments (cost=0.00..12.22 rows=736 width=60) (actual time=0.004..0.004 rows=0 loops=1)
Filter: (email_id IS NOT NULL)
Rows Removed by Filter: 2
SubPlan 1
-> Values Scan on "*VALUES*" (cost=0.00..0.00 rows=1 width=56) (never executed)
-> Materialize (cost=4220.14..4883.55 rows=27275 width=36) (actual time=247.720..1189.545 rows=29516 loops=1)
-> GroupAggregate (cost=4220.14..4788.09 rows=27275 width=15) (actual time=247.715..1131.787 rows=29516 loops=1)
-> Sort (cost=4220.14..4261.86 rows=83426 width=15) (actual time=247.634..339.376 rows=82632 loops=1)
Sort Key: public.email_participants.email_id
Sort Method: external sort Disk: 1760kB
-> Seq Scan on email_participants (cost=0.00..2856.28 rows=83426 width=15) (actual time=0.009..88.938 rows=82720 loops=1)
SubPlan 2
-> Values Scan on "*VALUES*" (cost=0.00..0.00 rows=1 width=40) (actual time=0.004..0.005 rows=1 loops=82631)
-> Sort (cost=2.01..2.01 rows=1 width=36) (actual time=0.074..0.077 rows=3 loops=1)
Sort Key: atts.email_id
Sort Method: quicksort Memory: 25kB
-> Subquery Scan on atts (cost=2.00..2.01 rows=1 width=36) (actual time=0.048..0.060 rows=3 loops=1)
-> HashAggregate (cost=2.00..2.01 rows=1 width=8) (actual time=0.045..0.051 rows=3 loops=1)
-> Seq Scan on attachments (cost=0.00..2.00 rows=1 width=8) (actual time=0.013..0.021 rows=5 loops=1)
-> Index Only Scan using email_participants_email_id_user_id_index on email_participants (cost=0.00..990.04 rows=269 width=4) (actual time=1.357..2.886 rows=43 loops=1)
Index Cond: (user_id = 75)
Heap Fetches: 43
Total runtime: 1642.157 ms
(75 rows)
I am definitely not looking for fixes :) or a refactored query. Any sort of high-level advice would be most welcome.
Per my comment, the gist of the issue lies in aggregates that get joined with one another. This prevents the use of indexes, and yields a bunch of merge joins (and a materialize) in your query plan…
Put another way, think of it as a plan so convoluted that Postgres proceeds by materializing temporary tables in memory, and then sorting them repeatedly until they're all merge-joined as appropriate. From where I'm standing, the full hogwash seems to amount to selecting all rows from all tables, and all of their possible relationships. Once it has been figured out and conjured up into existence, Postgres proceeds to sorting the mess in order to extract the top-n rows.
Anyway, you want to rewrite the query so it can actually use indexes to begin with.
Part of this is simple. This, for instance, is a big no-no:
select …,
ts.ids AS todo_ids, cs.ids AS call_ids,
ds.ids AS deal_ids, ms.ids AS meeting_ids, c.comments, p.people,
atts.ids AS attachment_ids
Fetch the emails in one query. Fetch the related objects in separate queries with a bite-sized email_id in (…) clause. Merely doing that should speed things up quite a bit.
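A rough sketch of that split, reusing the filters from your example (the literal ids in the second query stand in for whatever the first one returned):
-- 1. fetch just the page of emails
SELECT e.*
FROM emails e
JOIN email_participants ep ON ep.email_id = e.id
WHERE ep.user_id = 75
  AND e.folder = 'INBOX'
ORDER BY e.sent_at DESC
LIMIT 5 OFFSET 0;

-- 2. fetch each kind of related object for those ids, e.g. todos
SELECT reference_email_id, array_to_json(array_agg(id)) AS ids
FROM todos
WHERE reference_email_id IN (101, 102, 103, 104, 105)
GROUP BY reference_email_id;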
For the rest, it may or may not be simple or involve some re-engineering of your schema. I only scanned through the incomprehensible monster and its gruesome query plan, so I cannot comment for sure.
I think the big view is not likely to ever perform well and that you should break it into more manageable components, but still here are two specific bits of advice that come to mind:
Schema change
Move the text and html bodies out of the main table.
Although large contents get automatically stored in TOAST space, mail parts will often be smaller than the TOAST threshold (~2000 bytes), especially for plain text, so it won't happen systematically.
Each non-TOASTED content inflates the table in a way that is detrimental to I/O performance and caching, if you consider that the primary purpose of the table is to contain the header fields like sender, recipients, date, subject...
I can test this with contents I happen to have in a mail database. On a sample of the 55k mails in my inbox:
average text/plain size: 1511 bytes.
average text/html size: 11895 bytes (but 42395 messages have no html at all)
Size of the mail table without the bodies: 14Mb (no TOAST)
If adding the bodies as 2 more TEXT columns like you have: 59Mb in main storage, 61Mb in TOAST.
Despite TOAST, the main storage appears to be 4 times larger. So when scanning the table without need for the TEXT columns, 80% of the I/Os are wasted. Future row updates are likely to make this worse with the fragmentation effect.
The effect in terms of block reads can be spotted through the pg_statio_all_tables view (compare heap_blks_read + heap_blks_hit before and after the query)
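For instance:
SELECT heap_blks_read, heap_blks_hit
FROM pg_statio_all_tables
WHERE relname = 'emails';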
Tuning
This part of the EXPLAIN
Sort Method: external sort Disk: 1760kB
suggests that your work_mem is too small. You don't want to hit the disk for such small sorts. Make it at least 10MB unless you're low on free memory. While you're at it, set shared_buffers to a reasonable value if it's still the default. See http://wiki.postgresql.org/wiki/Performance_Optimization for more.
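For example, at the session level:
SET work_mem = '10MB';
-- shared_buffers, by contrast, can only be set in postgresql.conf and needs a restart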