I have two tables: the news table has 7M records and the news_publish table has 70M records.
When I execute this query, it takes an enormous amount of time and is very slow.
I added three indexes for tuning, but the query is still slow.
When I googled this problem, I found a suggestion to raise the statistics target to 1000. I changed it, but the problem remains:
alter table khb_news alter submitteddate set statistics 1000;
SELECT n.id as newsid ,n.title,p.submitteddate as publishdate,
n.summary ,n.smallImageid ,
n.classification ,n.submitteddate as newsdate,
p.toorganizationid
from khb_news n
join khb_news_publish p
on n.id=p.newsid
left join dataitem b on b.id=n.classification
where
n.classification in (1) and n.newstype=60
AND n.submitteddate >= '2014/06/01'::timestamp AND n.submitteddate <'2014/08/01'::timestamp and p.toorganizationid=123123
order by p.id desc
limit 10 offset 0
The indexes are:
CREATE INDEX "p.id"
ON khb_news_publish
USING btree
(id DESC);
CREATE INDEX idx_toorganization
ON khb_news_publish
USING btree
(toorganizationid);
CREATE INDEX "idx_n.classification_n.newstype_n.submitteddate"
ON khb_news
USING btree
(classification, newstype, submitteddate);
After adding these indexes and running EXPLAIN ANALYZE, I get this plan:
"Limit (cost=0.99..10100.13 rows=10 width=284) (actual time=24711.831..24712.849 rows=10 loops=1)"
" -> Nested Loop (cost=0.99..5946373.12 rows=5888 width=284) (actual time=24711.827..24712.837 rows=10 loops=1)"
" -> Index Scan using "p.id" on khb_news_publish p (cost=0.56..4748906.31 rows=380294 width=32) (actual time=2.068..23338.731 rows=194209 loops=1)"
" Filter: (toorganizationid = 95607)"
" Rows Removed by Filter: 36333074"
" -> Index Scan using khb_news_pkey on khb_news n (cost=0.43..3.14 rows=1 width=260) (actual time=0.006..0.006 rows=0 loops=194209)"
" Index Cond: (id = p.newsid)"
" Filter: ((submitteddate >= '2014-06-01 00:00:00'::timestamp without time zone) AND (submitteddate < '2014-08-01 00:00:00'::timestamp without time zone) AND (newstype = 60) AND (classification = ANY ('{19,20,21}'::bigint[])))"
" Rows Removed by Filter: 1"
"Planning time: 3.871 ms"
"Execution time: 24712.982 ms"
I have also posted the plan at https://explain.depesz.com/s/Gym
How can I change the query to make it faster?
You should start by creating an index on khb_news_publish(toorganizationid, id):
CREATE INDEX idx_toorganization_id
ON khb_news_publish
USING btree
(toorganizationid, id);
This should fix the problem, but you might also need this index:
CREATE INDEX idx_id_classification_newstype_submitteddate
ON khb_news
USING btree
(classification, newstype, submitteddate, id);
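Why the first index helps: with (toorganizationid, id), the filter on toorganizationid becomes an index condition and the matching entries are already ordered by id, so ORDER BY p.id DESC LIMIT 10 can stop after the first ten matches instead of filtering ~36 million rows as in your plan. Since khb_news_publish holds ~70M rows, you may also want to build the index without blocking writes; a minimal sketch, assuming stock PostgreSQL (note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block):
-- Hypothetical non-blocking build of the same index on the 70M-row table
CREATE INDEX CONCURRENTLY idx_toorganization_id
ON khb_news_publish
USING btree
(toorganizationid, id);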
I have over 6,000,000 rows in this table; after filtering by the WHERE condition, about 120,000 rows remain.
Currently I have created two multi-column indexes:
CREATE INDEX "IDX_module_method_height" ON "events" ("module", "method", "block_height")
CREATE INDEX "IDX_module_method" ON "events" ("module", "method")
When I run the following SQL, it is really fast.
The fast one:
explain analyze Select block_height from events where (module='amm' and method in ('Traded', 'LiquidityAdded')) order by block_height desc limit 500 offset 200;
"Limit (cost=2748.32..2749.57 rows=500 width=4) (actual time=51.207..51.288 rows=500 loops=1)"
" -> Sort (cost=2747.82..2757.85 rows=4010 width=4) (actual time=51.183..51.236 rows=700 loops=1)"
" Sort Key: block_height DESC"
" Sort Method: top-N heapsort Memory: 81kB"
" -> Index Only Scan using ""IDX_module_method_height"" on events (cost=0.56..2538.28 rows=4010 width=4) (actual time=0.061..35.880 rows=128860 loops=1)"
" Index Cond: ((method = ANY ('{Traded,LiquidityAdded}'::text[])) AND (module = 'amm'::text))"
" Heap Fetches: 17403"
"Planning Time: 0.212 ms"
"Execution Time: 51.344 ms"
But when I add one more column to the SELECT (e.g. data), it is really slow, and I really do need the data column.
The slow one (the only change is adding the data column to the SELECT):
explain analyze Select block_height, data from events where (module='amm' and method in ('Traded', 'LiquidityAdded')) order by block_height desc limit 500 offset 200;
"Limit (cost=14459.53..14460.78 rows=500 width=133) (actual time=12061.968..12062.068 rows=500 loops=1)"
" -> Sort (cost=14459.03..14469.06 rows=4011 width=133) (actual time=12061.935..12062.012 rows=700 loops=1)"
" Sort Key: block_height DESC"
" Sort Method: top-N heapsort Memory: 371kB"
" -> Index Scan using "IDX_module_method" on events (cost=0.43..14249.43 rows=4011 width=133) (actual time=1.302..12014.625 rows=128860 loops=1)"
" Index Cond: (((module)::text = 'amm'::text) AND ((method)::text = ANY ('{Traded,LiquidityAdded}'::text[])))"
"Planning Time: 0.144 ms"
"Execution Time: 12063.364 ms"
Why do the selected columns affect which index is chosen? I need the data column, so how should I create my indexes to make this SQL efficient?
The reason is that it needs to do a table lookup. Your WHERE clause can be satisfied by either index, and the smaller one is faster. With a single selected column it could do an Index Only Scan; now it has to do an Index Scan.
You can try to make your index fit the query better:
CREATE INDEX "IDX_module_method_height_desc" ON "events" ("module", "method", "block_height" DESC)
If your table has a primary key, you can keep the table lookups small by doing the LIMIT first over an index-only scan and then joining back for the wide columns (for a true index-only scan, every column the inner SELECT uses must be in the index):
SELECT block_height, data
FROM events
JOIN (
    SELECT id
    FROM events
    WHERE ...
    ORDER BY block_height DESC
    LIMIT 500 OFFSET 200) x USING (id)
ORDER BY block_height DESC;  -- re-apply the ordering; the join does not preserve it
Otherwise, as others have suggested, you can extend your index with INCLUDE if you are on Postgres 11 or newer:
CREATE INDEX "IDX_module_method_height_desc" ON "events" ("module", "method", "block_height" DESC) INCLUDE ("data")
I have a table inside my PostgreSQL database called consumer_actions. It contains all the actions done by consumers registered in my app. At the moment, this table has ~500 million records. What I'm trying to do is get the maximum id, based on the system that the action came from.
The definition of the table is:
CREATE TABLE public.consumer_actions (
id int4 NOT NULL,
system_id int4 NOT NULL,
consumer_id int4 NOT NULL,
action_id int4 NOT NULL,
payload_json jsonb NULL,
external_system_date timestamptz NULL,
local_system_date timestamptz NULL,
CONSTRAINT consumer_actions_pkey PRIMARY KEY (id, system_id)
);
CREATE INDEX consumer_actions_ext_date ON public.consumer_actions USING btree (external_system_date);
CREATE INDEX consumer_actions_system_consumer_id ON public.consumer_actions USING btree (system_id, consumer_id);
When I run
select max(id) from consumer_actions where system_id = 1
it takes less than one second, but if I try to use the same index (consumer_actions_system_consumer_id) to get max(id) for system_id = 2, it takes more than an hour:
select max(id) from consumer_actions where system_id = 2
I have also checked the query plans; they look structurally similar for both queries. I also reran VACUUM ANALYZE on the table and a REINDEX. Neither helped. Any idea what I can do to improve the second query's time?
Here are the query plans for both queries, and the current size of the table:
explain analyze
select max(id) from consumer_actions where system_id = 1;
Result (cost=1.49..1.50 rows=1 width=4) (actual time=0.062..0.063 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.57..1.49 rows=1 width=4) (actual time=0.057..0.057 rows=1 loops=1)
-> Index Only Scan Backward using consumer_actions_pkey on consumer_actions ca (cost=0.57..524024735.49 rows=572451344 width=4) (actual time=0.055..0.055 rows=1 loops=1)
Index Cond: ((id IS NOT NULL) AND (system_id = 1))
Heap Fetches: 1
Planning Time: 0.173 ms
Execution Time: 0.092 ms
explain analyze
select max(id) from consumer_actions where system_id = 2;
Result (cost=6.46..6.47 rows=1 width=4) (actual time=7099484.855..7099484.858 rows=1 loops=1)
InitPlan 1 (returns $0)
-> Limit (cost=0.57..6.46 rows=1 width=4) (actual time=7099484.839..7099484.841 rows=1 loops=1)
-> Index Only Scan Backward using consumer_actions_pkey on consumer_actions ca (cost=0.57..20205843.58 rows=3436129 width=4) (actual time=7099484.833..7099484.834 rows=1 loops=1)
Index Cond: ((id IS NOT NULL) AND (system_id = 2))
Heap Fetches: 1
Planning Time: 3.078 ms
Execution Time: 7099484.992 ms
(8 rows)
select count(*) from consumer_actions; --result is 577408504
Instead of using an aggregate function like max(), which potentially has to scan and aggregate a large number of rows in a table like yours, you could get the same result with a query designed to return the fewest rows possible:
SELECT id FROM consumer_actions WHERE system_id = ? ORDER BY id DESC LIMIT 1;
This should still benefit significantly from the existing indexes.
I think you should create an index like this one:
CREATE INDEX consumer_actions_system_system_id_id ON public.consumer_actions USING btree (system_id, id);
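For reference, assuming the index above has been created, either form of the query should then be able to descend the index directly to the largest id for the given system_id instead of walking the primary key backward:
-- Both of these should now need only a handful of index page reads
SELECT max(id) FROM consumer_actions WHERE system_id = 2;

SELECT id FROM consumer_actions
WHERE system_id = 2
ORDER BY id DESC
LIMIT 1;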
I want to index my tables for the following query:
select
t.*
from main_transaction t
left join main_profile profile on profile.id = t.profile_id
left join main_customer customer on (customer.id = profile.user_id)
where
(upper(t.request_no) LIKE upper('%'||#requestNumber||'%') OR upper(customer.phone) LIKE upper('%'||#phoneNumber||'%'))
and t.service_type = 'SERVICE_1'
and t.status = 'SUCCESS'
and t.mode = 'AUTO'
and t.transaction_type = 'WITHDRAW'
and customer.client = 'corp'
and t.pub_date>='2018-09-05' and t.pub_date<='2018-11-05'
order by t.pub_date desc, t.id asc
LIMIT 1000;
This is how I tried to index my tables:
CREATE INDEX main_transaction_pr_id ON main_transaction (profile_id);
CREATE INDEX main_profile_user_id ON main_profile (user_id);
CREATE INDEX main_customer_client ON main_customer (client);
CREATE INDEX main_transaction_gin_req_no ON main_transaction USING gin (upper(request_no) gin_trgm_ops);
CREATE INDEX main_customer_gin_phone ON main_customer USING gin (upper(phone) gin_trgm_ops);
CREATE INDEX main_transaction_general ON main_transaction (service_type, status, mode, transaction_type); -- don't know if this one is correct!
After indexing as above, my query still spends over 4.5 seconds just to select 1000 rows!
I am selecting from the following table, which has 34 columns including 3 FOREIGN KEYs, and it has over 3 million rows:
CREATE TABLE main_transaction (
id integer NOT NULL DEFAULT nextval('main_transaction_id_seq'::regclass),
description character varying(255) NOT NULL,
request_no character varying(18),
account character varying(50),
service_type character varying(50),
pub_date" timestamptz(6) NOT NULL,
"service_id" varchar(50) COLLATE "pg_catalog"."default",
....
);
I am also joining two tables (main_profile, main_customer) in order to search on customer.phone and to select customer.client. To get from the main_transaction table to the main_customer table, I can only go through main_profile.
My question is: how can I index my tables to increase performance for the above query?
Please do not suggest replacing the OR with a UNION for the condition (upper(t.request_no) LIKE upper('%'||#requestNumber||'%') OR upper(customer.phone) LIKE upper('%'||#phoneNumber||'%')); could we use a CASE WHEN condition instead? I have to convert my PostgreSQL query into Hibernate JPA, and I don't know how to express a UNION there except with Hibernate native SQL, which I am not allowed to use.
Explain:
Limit (cost=411601.73..411601.82 rows=38 width=1906) (actual time=3885.380..3885.381 rows=1 loops=1)
-> Sort (cost=411601.73..411601.82 rows=38 width=1906) (actual time=3885.380..3885.380 rows=1 loops=1)
Sort Key: t.pub_date DESC, t.id
Sort Method: quicksort Memory: 27kB
-> Hash Join (cost=20817.10..411600.73 rows=38 width=1906) (actual time=3214.473..3885.369 rows=1 loops=1)
Hash Cond: (t.profile_id = profile.id)
Join Filter: ((upper((t.request_no)::text) ~~ '%20181104-2158-2723948%'::text) OR (upper((customer.phone)::text) ~~ '%20181104-2158-2723948%'::text))
Rows Removed by Join Filter: 593118
-> Seq Scan on main_transaction t (cost=0.00..288212.28 rows=205572 width=1906) (actual time=0.068..1527.677 rows=593119 loops=1)
Filter: ((pub_date >= '2016-09-05 00:00:00+05'::timestamp with time zone) AND (pub_date <= '2018-11-05 00:00:00+05'::timestamp with time zone) AND ((service_type)::text = 'SERVICE_1'::text) AND ((status)::text = 'SUCCESS'::text) AND ((mode)::text = 'AUTO'::text) AND ((transaction_type)::text = 'WITHDRAW'::text))
Rows Removed by Filter: 2132732
-> Hash (cost=17670.80..17670.80 rows=180984 width=16) (actual time=211.211..211.211 rows=181516 loops=1)
Buckets: 131072 Batches: 4 Memory Usage: 3166kB
-> Hash Join (cost=6936.09..17670.80 rows=180984 width=16) (actual time=46.846..183.689 rows=181516 loops=1)
Hash Cond: (customer.id = profile.user_id)
-> Seq Scan on main_customer customer (cost=0.00..5699.73 rows=181106 width=16) (actual time=0.013..40.866 rows=181618 loops=1)
Filter: ((client)::text = 'corp'::text)
Rows Removed by Filter: 16920
-> Hash (cost=3680.04..3680.04 rows=198404 width=8) (actual time=46.087..46.087 rows=198404 loops=1)
Buckets: 131072 Batches: 4 Memory Usage: 2966kB
-> Seq Scan on main_profile profile (cost=0.00..3680.04 rows=198404 width=8) (actual time=0.008..20.099 rows=198404 loops=1)
Planning time: 0.757 ms
Execution time: 3885.680 ms
With the restriction to not use UNION, you won't get a good plan.
You can slightly speed up processing with the following indexes:
main_transaction ((service_type::text), (status::text), (mode::text),
(transaction_type::text), pub_date)
main_customer ((client::text))
These should at least get rid of the sequential scans, but the hash join that takes the lion's share of the processing time will remain.
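Spelled out as DDL, those suggestions would look roughly like this (index names are hypothetical):
CREATE INDEX main_transaction_filter_idx
ON main_transaction ((service_type::text), (status::text), (mode::text),
                     (transaction_type::text), pub_date);

CREATE INDEX main_customer_client_text_idx
ON main_customer ((client::text));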
I have a simple table in my web app with more than 12 million rows, and it is growing all the time.
+-----+-----+------+-------+--------+
| id  | dtt | cus  | event | server |
+-----+-----+------+-------+--------+
I'm getting the count of today's events by customer using this query:
SELECT COUNT(*) FROM events
WHERE dtt AT TIME ZONE 'America/Santiago' >=date(now() AT TIME ZONE 'America/Santiago') + interval '1s'
AND cus=2
And the performance is very bad for my web app: 22702 ms.
"Aggregate (cost=685814.54..685814.55 rows=1 width=0) (actual time=21773.451..21773.452 rows=1 loops=1)"
" -> Seq Scan on events (cost=0.00..675644.52 rows=4068008 width=0) (actual time=10277.508..21732.548 rows=409808 loops=1)"
" Filter: ((cus = 2) AND (timezone('America/Santiago'::text, dtt) >= (date(timezone('America/Santiago'::text, now())) + '00:00:01'::interval)))"
" Rows Removed by Filter: 12077798"
"Planning time: 0.127 ms"
"Execution time: 21773.509 ms"
I have the following indexes created:
CREATE INDEX events_dtt_idx
ON events
USING btree
(dtt);
CREATE INDEX events_id_desc
ON events
USING btree
(id DESC NULLS LAST);
CREATE INDEX events_cus_idx
ON events
USING btree
(cus);
CREATE INDEX events_id_idx
ON events
USING btree
(id);
Using Postgresql 9.4, Linux x64
How can I improve that? Thanks in advance.
something like:
CREATE INDEX dtt_tz_idx ON events (DATE(dtt AT TIME ZONE 'America/Santiago'));
then query
SELECT COUNT(*) FROM events
WHERE DATE(TIMEZONE('America/Santiago'::text, dtt)) >=date(now() AT TIME ZONE 'America/Santiago') + interval '1s'
AND cus=2
If it doesn't work, run "\d dtt_tz_idx" in psql and try to match the data types in your query with the index.
Finally, I was able to fix the problem with this index:
CREATE INDEX dtt_tz_idx ON events (TIMEZONE('America/Santiago'::text, dtt));
Thanks sivan & vyegorov for your guidance; now the plan is:
"Aggregate (cost=567240.43..567240.44 rows=1 width=0) (actual time=238.440..238.440 rows=1 loops=1)"
" -> Bitmap Heap Scan on events (cost=82620.28..556463.97 rows=4310584 width=0) (actual time=41.445..208.870 rows=344453 loops=1)"
" Recheck Cond: (timezone('America/Santiago'::text, dtt) >= (date(timezone('America/Santiago'::text, now())) + '00:00:01'::interval))"
" Filter: (cus = 2)"
" Rows Removed by Filter: 9433"
" Heap Blocks: exact=9426"
" -> Bitmap Index Scan on dtt_tz_idx (cost=0.00..81542.63 rows=4415225 width=0) (actual time=38.866..38.866 rows=353886 loops=1)"
" Index Cond: (timezone('America/Santiago'::text, dtt) >= (date(timezone('America/Santiago'::text, now())) + '00:00:01'::interval))"
"Planning time: 0.221 ms"
"Execution time: 238.509 ms"
In short: DISTINCT, MIN, and MAX on the left-hand side of a LEFT JOIN should be answerable without doing the join.
I'm using a SQL array type (on Postgres 9.3) to condense several rows of data into a single row, and then a view to return the unnested, normalized view. I do this to save on index costs, as well as to get Postgres to compress the data in the array.
Things work pretty well, but some queries that could be answered without unnesting and materializing/exploding the view are quite expensive because that work is deferred until after the view is materialized. Is there any way to solve this?
Here is the basic table:
CREATE TABLE mt_count_by_day
(
run_id integer NOT NULL,
type character varying(64) NOT NULL,
start_day date NOT NULL,
end_day date NOT NULL,
counts bigint[] NOT NULL,
CONSTRAINT mt_count_by_day_pkey PRIMARY KEY (run_id, type)
)
An index on ‘type’ just for good measure:
CREATE INDEX runinfo_mt_count_by_day_type_idx on runinfo.mt_count_by_day (type);
Here is the view that uses generate_series and unnest
CREATE OR REPLACE VIEW runinfo.v_mt_count_by_day AS
SELECT mt_count_by_day.run_id,
mt_count_by_day.type,
mt_count_by_day.brand,
generate_series(mt_count_by_day.start_day::timestamp without time zone, mt_count_by_day.end_day - '1 day'::interval, '1 day'::interval) AS row_date,
unnest(mt_count_by_day.counts) AS row_count
FROM runinfo.mt_count_by_day;
What if I want to do DISTINCT on the 'type' column?
explain analyze select distinct(type) from mt_count_by_day;
"HashAggregate (cost=9566.81..9577.28 rows=1047 width=19) (actual time=171.653..172.019 rows=1221 loops=1)"
" -> Seq Scan on mt_count_by_day (cost=0.00..9318.25 rows=99425 width=19) (actual time=0.089..99.110 rows=99425 loops=1)"
"Total runtime: 172.338 ms"
Now what happens if I do the same on the view?
explain analyze select distinct(type) from v_mt_count_by_day;
"HashAggregate (cost=1749752.88..1749763.34 rows=1047 width=19) (actual time=58586.934..58587.191 rows=1221 loops=1)"
" -> Subquery Scan on v_mt_count_by_day (cost=0.00..1501190.38 rows=99425000 width=19) (actual time=0.114..37134.349 rows=68299959 loops=1)"
" -> Seq Scan on mt_count_by_day (cost=0.00..506940.38 rows=99425000 width=597) (actual time=0.113..24907.147 rows=68299959 loops=1)"
"Total runtime: 58587.474 ms"
Is there a way to get postgres to recognize that it can solve this without first exploding the view?
Here, for comparison, we count the number of rows matching criteria in the table vs. the view. Everything works as expected: Postgres filters the rows down before materializing the view. It's not quite the same query, but this property is what makes our data more manageable.
explain analyze select count(*) from mt_count_by_day where type = 'SOCIAL_GOOGLE'
"Aggregate (cost=157.01..157.02 rows=1 width=0) (actual time=0.538..0.538 rows=1 loops=1)"
" -> Bitmap Heap Scan on mt_count_by_day (cost=4.73..156.91 rows=40 width=0) (actual time=0.139..0.509 rows=122 loops=1)"
" Recheck Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
" -> Bitmap Index Scan on runinfo_mt_count_by_day_type_idx (cost=0.00..4.72 rows=40 width=0) (actual time=0.098..0.098 rows=122 loops=1)"
" Index Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
"Total runtime: 0.625 ms"
explain analyze select count(*) from v_mt_count_by_day where type = 'SOCIAL_GOOGLE'
"Aggregate (cost=857.11..857.12 rows=1 width=0) (actual time=6.827..6.827 rows=1 loops=1)"
" -> Bitmap Heap Scan on mt_count_by_day (cost=4.73..357.11 rows=40000 width=597) (actual time=0.124..5.294 rows=15916 loops=1)"
" Recheck Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
" -> Bitmap Index Scan on runinfo_mt_count_by_day_type_idx (cost=0.00..4.72 rows=40 width=0) (actual time=0.082..0.082 rows=122 loops=1)"
" Index Cond: ((type)::text = 'SOCIAL_GOOGLE'::text)"
"Total runtime: 6.885 ms"
Here is the code required to reproduce this:
CREATE TABLE base_table
(
run_id integer NOT NULL,
type integer NOT NULL,
start_day date NOT NULL,
end_day date NOT NULL,
counts bigint[] NOT NULL,
CONSTRAINT match_check CHECK (end_day > start_day AND (end_day - start_day) = array_length(counts, 1)),
CONSTRAINT base_table_pkey PRIMARY KEY (run_id, type)
);
--Just because...
CREATE INDEX base_type_idx on base_table (type);
CREATE OR REPLACE VIEW v_foo AS
SELECT m.run_id,
m.type,
t.row_date::date,
t.row_count
FROM base_table m
LEFT JOIN LATERAL ROWS FROM (
unnest(m.counts),
generate_series(m.start_day, m.end_day-1, interval '1d')
) t(row_count, row_date) ON true;
insert into base_table
select a.run_id, a.type, '20120101'::date as start_day, '20120401'::date as end_day, b.counts
from (
    SELECT N AS run_id, L AS type
    FROM generate_series(1, 10000) N
    CROSS JOIN generate_series(1, 7) L
    ORDER BY N, L
) a,
(
    SELECT array_agg(generate_series)::bigint[] AS counts
    FROM generate_series(1, 91)
) b;
And the results on 9.4.1:
explain analyze select distinct type from base_table;
"HashAggregate (cost=6750.00..6750.03 rows=3 width=4) (actual time=51.939..51.940 rows=3 loops=1)"
" Group Key: type"
" -> Seq Scan on base_table (cost=0.00..6600.00 rows=60000 width=4) (actual time=0.030..33.655 rows=60000 loops=1)"
"Planning time: 0.086 ms"
"Execution time: 51.975 ms"
explain analyze select distinct type from v_foo;
"HashAggregate (cost=1356600.01..1356600.04 rows=3 width=4) (actual time=9215.630..9215.630 rows=3 loops=1)"
" Group Key: m.type"
" -> Nested Loop Left Join (cost=0.01..1206600.01 rows=60000000 width=4) (actual time=0.112..7834.094 rows=5460000 loops=1)"
" -> Seq Scan on base_table m (cost=0.00..6600.00 rows=60000 width=764) (actual time=0.009..42.694 rows=60000 loops=1)"
" -> Function Scan on t (cost=0.01..10.01 rows=1000 width=0) (actual time=0.091..0.111 rows=91 loops=60000)"
"Planning time: 0.132 ms"
"Execution time: 9215.686 ms"
Generally, the Postgres query planner does "inline" views to optimize the whole query. Per documentation:
One application of the rewrite system is in the realization of views.
Whenever a query against a view (i.e., a virtual table) is made, the
rewrite system rewrites the user's query to a query that accesses the
base tables given in the view definition instead.
But I don't think Postgres is smart enough to conclude that it can reach the same result from the base table without exploding rows.
You can try this alternative query with a LATERAL join. It's cleaner:
CREATE OR REPLACE VIEW runinfo.v_mt_count_by_day AS
SELECT m.run_id, m.type, m.brand
, m.start_day + c.rn - 1 AS row_date
, c.row_count
FROM runinfo.mt_count_by_day m
LEFT JOIN LATERAL unnest(m.counts) WITH ORDINALITY c(row_count, rn) ON true;
It also makes clear that one of (end_day, start_day) is redundant.
Using LEFT JOIN because that might allow the query planner to ignore the join from your query:
SELECT DISTINCT type FROM v_mt_count_by_day;
Else (with a CROSS JOIN or INNER JOIN) it must evaluate the join to see whether rows from the first table are eliminated.
BTW, it's:
SELECT DISTINCT type ...
not:
SELECT DISTINCT(type) ...
Note that this returns a date instead of the timestamp in your original. Easier, and I guess it's what you want anyway?
Requires Postgres 9.3+. Details:
PostgreSQL unnest() with element number
ROWS FROM in Postgres 9.4+
To explode both columns in parallel safely:
CREATE OR REPLACE VIEW runinfo.v_mt_count_by_day AS
SELECT m.run_id, m.type, m.brand
, t.row_date::date, t.row_count
FROM runinfo.mt_count_by_day m
LEFT JOIN LATERAL ROWS FROM (
unnest(m.counts)
, generate_series(m.start_day, m.end_day, interval '1d')
) t(row_count, row_date) ON true;
The main benefit: this would not derail into a Cartesian product if the two SRFs don't return the same number of rows. Instead, NULL values are padded.
Again, I can't say whether this would help the query planner with a faster plan for DISTINCT type without testing.
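If the planner still cannot skip the join, a pragmatic workaround (assuming these aggregate questions are infrequent enough to special-case) is to point them at the base table directly, since type is stored there unchanged and is covered by its own index:
-- Answer DISTINCT/MIN/MAX questions about type from the base table, bypassing the view
SELECT DISTINCT type FROM runinfo.mt_count_by_day;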