Partial gin index does not work with WHERE - postgresql

I have the following table:
CREATE TABLE m2m_entries_n_elements(
entry_id UUID,
element_id UUID,
value JSONB
);
Value is a jsonb object in the following format: {<type>: <value>}
I want to create a GIN index only for number values:
CREATE INDEX IF NOT EXISTS idx_element_value_number
ON m2m_entries_n_elements
USING GIN (element_id, CAST(value ->> 'number' AS INT))
WHERE value ? 'number';
But when I run EXPLAIN ANALYZE, I see that the index is not used:
EXPLAIN ANALYZE
SELECT *
FROM m2m_entries_n_elements WHERE CAST(value ->> 'number' AS INT) = 2;
Seq Scan on m2m_entries_n_elements (cost=0.00..349.02 rows=50 width=89) (actual time=0.013..2.087 rows=1663 loops=1)
Filter: (((value ->> 'number'::text))::integer = 2)
Rows Removed by Filter: 8338
Planning Time: 0.042 ms
Execution Time: 2.150 ms
But if I remove WHERE value ? 'number' from the index definition, the index is used:
Bitmap Heap Scan on m2m_entries_n_elements (cost=6.39..70.29 rows=50 width=89) (actual time=0.284..0.819 rows=1663 loops=1)
Recheck Cond: (((value ->> 'number'::text))::integer = 2)
Heap Blocks: exact=149
-> Bitmap Index Scan on idx_elements (cost=0.00..6.38 rows=50 width=0) (actual time=0.257..0.258 rows=1663 loops=1)
Index Cond: (((value ->> 'number'::text))::integer = 2)
Planning Time: 0.207 ms
Execution Time: 0.922 ms

PostgreSQL does not have a general theorem prover. Maybe you intuit that value ->> 'number' being defined implies that value ? 'number' is true, but PostgreSQL doesn't know that. You would need to explicitly include the ? condition in your query to get use of the index.
But PostgreSQL is smart enough to know that CAST(value ->> 'number' AS INT) = 2 does imply that the LHS can't be null, so if you create the partial index WHERE value ->> 'number' IS NOT NULL then it will get used with no change to your query.
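A minimal sketch of the two options described in this answer, reusing the table and expressions from the question. The index name idx_element_value_number_nn is hypothetical, and the sketch assumes, as the question's own index does, that GIN operator classes for uuid and integer are available (typically through the btree_gin extension).

```sql
-- Option 1: keep the original partial index and repeat its predicate
-- in the query, so the planner can match it without any inference.
EXPLAIN ANALYZE
SELECT *
FROM m2m_entries_n_elements
WHERE value ? 'number'
  AND CAST(value ->> 'number' AS INT) = 2;

-- Option 2: define the partial index with an IS NOT NULL predicate.
-- (idx_element_value_number_nn is a made-up name for illustration.)
CREATE INDEX IF NOT EXISTS idx_element_value_number_nn
ON m2m_entries_n_elements
USING GIN (element_id, CAST(value ->> 'number' AS INT))
WHERE value ->> 'number' IS NOT NULL;

-- The original query can then stay unchanged.
EXPLAIN ANALYZE
SELECT *
FROM m2m_entries_n_elements
WHERE CAST(value ->> 'number' AS INT) = 2;
```

Whether the planner actually picks either index still depends on its cost estimates, so comparing the EXPLAIN ANALYZE output for both variants is the safest check.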

Related

Optimizing a postgres request with arithmetical operators

I have a simple request like this, on a very large table :
(select "table_a"."id",
"table_a"."b_id",
"table_a"."timestamp"
from "table_a"
left outer join "table_b"
on "table_b"."b_id" = "table_a"."b_id"
where ((cast("table_b"."total" ->> 'bar' as int) - coalesce(
(cast("table_b"."ok" ->> 'bar' as int) +
cast("table_b"."ko" ->> 'bar' as int)), 0)) > 0 and coalesce(
(cast("table_b"."ok" ->> 'bar' as int) +
cast("table_b"."ko" ->> 'bar' as int)),
0) > 0)
order by "table_a"."timestamp" desc fetch next 25 rows only)
The problem is that it takes quite some time:
Limit (cost=0.84..160.44 rows=25 width=41) (actual time=2267.067..2267.069 rows=0 loops=1)
-> Nested Loop (cost=0.84..124849.43 rows=19557 width=41) (actual time=2267.065..2267.066 rows=0 loops=1)
-> Index Scan using table_a_timestamp_index on table_a (cost=0.42..10523.32 rows=188976 width=33) (actual time=0.011..57.550 rows=188976 loops=1)
-> Index Scan using table_b_b_id_key on table_b (cost=0.42..0.60 rows=1 width=103) (actual time=0.011..0.011 rows=0 loops=188976)
Index Cond: ((b_id)::text = (table_a.b_id)::text)
Filter: ((COALESCE((((ok ->> 'bar'::text))::integer + ((ko ->> 'bar'::text))::integer), 0) > 0) AND ((((total ->> 'bar'::text))::integer - COALESCE((((ok ->> 'bar'::text))::integer + ((ko ->> 'bar'::text))::integer), 0)) > 0))
Rows Removed by Filter: 1
Planning Time: 0.411 ms
Execution Time: 2267.135 ms
I tried adding indexes:
create index table_b_bar_total ON "table_b" using BTREE (coalesce(
(cast("table_b"."ok" ->> 'bar' as int) +
cast("table_b"."ko" ->> 'bar' as int)),
0));
create index table_b_bar_remaining ON "table_b" using BTREE
((cast("table_b"."total" ->> 'bar' as int) - coalesce(
(cast("table_b"."ok" ->> 'bar' as int) +
cast("table_b"."ko" ->> 'bar' as int)), 0)));
But it doesn't change anything. How can I make this request run faster?
Ordinary column indexes don't have their own statistics, as the table's statistics are sufficient for the index to be assessed for planning. Expressional indexes, however, have their own statistics collected (on the expression results) whenever the table is analyzed. The problem is that creating an expressional index does not trigger an autoanalyze on the table, so the needed statistics can stay uncollected for a long time. It is therefore a good idea to run ANALYZE manually after creating an expressional index.
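As a rough illustration of that advice, the statements below analyze the table and then check whether statistics exist for the two indexes from the question. It assumes the indexes were created under the names table_b_bar_total and table_b_bar_remaining, and relies on pg_stats listing statistics for an expression index under the index name rather than the table name.

```sql
-- Collect statistics for the new expression indexes right away
-- instead of waiting for autoanalyze to get around to the table.
ANALYZE table_b;

-- Statistics gathered on the indexed expressions show up in pg_stats
-- with the index name in the tablename column.
SELECT tablename, attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename IN ('table_b_bar_total', 'table_b_bar_remaining');
```

If the second query returns no rows, the expression statistics have not been collected yet and the planner is working from default estimates.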
Since your expressions are always compared to zero, it might be better to create one index on the larger expression (including the >0 comparisons and the ANDing of them as part of the indexed expression), rather than two indexes which need to be bitmap-ANDed. Since that expression is a boolean, it might be tempting to create a partial index with it, but I think that that would be a mistake. Unlike expressional indexes, partial indexes do not have statistics collected, and so do not help inform the planner on how many rows will be found.

PostgreSQL 10.4 - how to index jsonb for sql functions not operators?

I have a table named "k3_order" with jsonb column "json_delivery".
Example content of that column is:
{
"delivery_cost": "11.99",
"packageNumbers": [
"0000000596034Q"
]
}
I've created an index on json_delivery->'packageNumbers':
CREATE INDEX test_idx ON k3_order USING gin(json_delivery->'packageNumbers');
Now I use these two SQL queries:
SELECT id, delivery_method_id
FROM k3_order
WHERE jsonb_exists (json_delivery->'packageNumbers', '0000000596034Q');
SELECT id, delivery_method_id
FROM k3_order
WHERE json_delivery->'packageNumbers' ? '0000000596034Q';
The second is faster and uses the index, but the first doesn't.
Is there any way to create index in PostgreSQL 10.4 in order for query 1) to use it?
Is this even possible in PostgreSQL 10.4 or newer versions?
EXPLAIN ANALYZE SELECT id, delivery_method_id
FROM k3_order
WHERE jsonb_exists (json_delivery->'packageNumbers', '0000000596034Q');
produces:
Seq Scan on k3_order (cost=0.00..117058.10 rows=216847 width=8) (actual time=162.001..569.863 rows=1 loops=1)
Filter: jsonb_exists((json_delivery -> 'packageNumbers'::text), '0000000596034Q'::text)
Rows Removed by Filter: 650539
Planning time: 0.748 ms
Execution time: 569.886 ms
EXPLAIN ANALYZE SELECT id, delivery_method_id
FROM k3_order
WHERE json_delivery->'packageNumbers' ? '0000000596034Q';
produces:
Bitmap Heap Scan on k3_order (cost=21.04..2479.03 rows=651 width=8) (actual time=0.022..0.022 rows=1 loops=1)
Recheck Cond: ((json_delivery -> 'packageNumbers'::text) ? '0000000596034Q'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on test_idx (cost=0.00..20.88 rows=651 width=0) (actual time=0.016..0.016 rows=1 loops=1)
Index Cond: ((json_delivery -> 'packageNumbers'::text) ? '0000000596034Q'::text)
Planning time: 0.182 ms
Execution time: 0.050 ms
Indexes can only be used by queries in the following cases:
- the WHERE condition contains an expression of the form <indexed expression> <operator> <constant>, where
  - an index has been created on <indexed expression>
  - <operator> is an operator in the operator family of the operator class of the index
  - <constant> is an expression that stays constant for the duration of the index scan
- the ORDER BY clause has the same or the exact opposite ordering as the index definition, and the index access method supports sorting (from v13 on, an index can also be used if it contains the starting columns of the ORDER BY clause)
- the PostgreSQL version is v12 or higher, and the WHERE condition contains an expression of the form bool_func(...), where the function returns boolean and has a planner support function.
Now json_delivery->'packageNumbers' ? '0000000596034Q' satisfies the first condition, so an index scan can be used.
jsonb_exists(json_delivery->'packageNumbers', '0000000596034Q') could only use an index if there were a planner support function for jsonb_exists, but there is none:
SELECT prosupport FROM pg_proc
WHERE proname = 'jsonb_exists';
prosupport
════════════
-
(1 row)
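One way to see how the two queries relate is to look up which operator is implemented by jsonb_exists in the system catalog. The sketch below only uses the standard pg_operator catalog; on a stock installation it should list the ? operator, though that is worth confirming on the server in question.

```sql
-- Find the operator(s) whose underlying function is jsonb_exists.
SELECT oprname, oprleft::regtype, oprright::regtype
FROM pg_operator
WHERE oprcode = 'jsonb_exists'::regproc;

-- Writing the condition with the operator instead of the function call
-- gives the planner the <indexed expression> <operator> <constant> form.
SELECT id, delivery_method_id
FROM k3_order
WHERE json_delivery->'packageNumbers' ? '0000000596034Q';
```

So the practical route on 10.4 is to use the operator form in the query rather than to look for an index that matches the function call.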

Is Postgres smart and using my is not null index in this query?

I have an index like this on the first_name column of my candidates table:
CREATE INDEX ix_public_candidates_first_name_not_null
ON public.candidates (first_name)
WHERE first_name IS NOT NULL;
Is Postgres smart enough to know that an equal operator means it can't be null or am I just lucky that my "is not null" index is used in this query?
select *
from public.candidates
where first_name = 'Erik'
Analyze output:
Bitmap Heap Scan on candidates (cost=57.46..8096.88 rows=2714 width=352) (actual time=1.481..18.847 rows=2460 loops=1)
Recheck Cond: (first_name = 'Erik'::citext)
Heap Blocks: exact=2256
-> Bitmap Index Scan on ix_public_candidates_first_name_not_null (cost=0.00..56.78 rows=2714 width=0) (actual time=1.204..1.204 rows=2460 loops=1)
Index Cond: (first_name = 'Erik'::citext)
Planning time: 0.785 ms
Execution time: 19.340 ms
The PostgreSQL optimizer is not based on lucky guesses.
It can indeed infer that anything that matches an equality condition cannot be NULL; the proof is the execution plan you show.
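A small sketch of what that inference covers, using the table and index from the question. The assumption here is that the comparison operators on the column are strict (they never return true for a NULL input), which holds for the usual equality and ordering operators.

```sql
-- A strict comparison cannot be true for a NULL first_name, so these
-- conditions imply first_name IS NOT NULL and the partial index is eligible.
EXPLAIN SELECT * FROM public.candidates WHERE first_name = 'Erik';
EXPLAIN SELECT * FROM public.candidates WHERE first_name > 'Erik';

-- This condition contradicts the index predicate, so the partial index
-- cannot be used for it.
EXPLAIN SELECT * FROM public.candidates WHERE first_name IS NULL;
```

Eligibility is not the same as being chosen: the planner may still prefer another plan if it estimates it to be cheaper.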

How can I get an hstore query that searches multiple terms to use indexes?

I have a query that is underperforming, data is an hstore column:
SELECT "vouchers".* FROM "vouchers" WHERE "vouchers"."type" IN ('VoucherType') AND ((data -> 'giver_id')::integer = 1) AND ((data -> 'recipient_email') is NULL)
I've tried adding the following indexes:
CREATE INDEX free_boxes_recipient ON vouchers USING gin ((data->'recipient_email')) WHERE ((data->'recipient_email') IS NULL);
CREATE INDEX voucher_type_giver ON vouchers USING gin ((data->'giver_id')::int)
As well as an overall index: CREATE INDEX voucher_type_data ON vouchers USING gin (data)
Here's the current query plan:
Seq Scan on vouchers (cost=0.00..15158.70 rows=5 width=125) (actual time=122.818..122.818 rows=0 loops=1)
Filter: (((data -> 'recipient_email'::text) IS NULL) AND ((type)::text = 'VoucherType'::text) AND (((data -> 'giver_id'::text))::integer = 1))
Rows Removed by Filter: 335148
Planning time: 0.196 ms
Execution time: 122.860 ms
How can I index this hstore column to get it down to a more reasonable query?
From the documentation:
hstore has GiST and GIN index support for the @>, ?, ?& and ?| operators.
You are searching for an integer value, so you can use a simple btree index like this:
CREATE INDEX ON vouchers (((data->'giver_id')::int));
EXPLAIN ANALYSE
SELECT *
FROM vouchers
WHERE vtype IN ('VoucherType')
AND (data -> 'giver_id')::integer = 1
AND (data -> 'recipient_email') is NULL;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on vouchers (cost=4.66..82.19 rows=1 width=34) (actual time=0.750..0.858 rows=95 loops=1)
Recheck Cond: (((data -> 'giver_id'::text))::integer = 1)
Filter: (((data -> 'recipient_email'::text) IS NULL) AND (vtype = 'VoucherType'::text))
Heap Blocks: exact=62
-> Bitmap Index Scan on vouchers_int4_idx (cost=0.00..4.66 rows=50 width=0) (actual time=0.018..0.018 rows=95 loops=1)
Index Cond: (((data -> 'giver_id'::text))::integer = 1)
Planning time: 2.115 ms
Execution time: 0.896 ms
(8 rows)
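If most rows have a recipient_email, a possible variation is to make the btree index partial, so that it only covers rows matching the query's IS NULL condition. This is a sketch rather than something taken from the plan above, and the index name vouchers_giver_unclaimed_idx is made up for illustration.

```sql
-- Hypothetical variation: index giver_id only for rows that have
-- no recipient_email, matching the IS NULL condition in the query.
CREATE INDEX vouchers_giver_unclaimed_idx
ON vouchers (((data -> 'giver_id')::int))
WHERE (data -> 'recipient_email') IS NULL;

-- Refresh statistics so the planner can assess the new index.
ANALYZE vouchers;
```

Because the query repeats the index predicate verbatim, the planner can match it without having to infer anything.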

Postgresql COALESCE performance problem

I have this table in Postgresql:
CREATE TABLE my_table
(
id bigint NOT NULL,
value bigint,
CONSTRAINT my_table_pkey PRIMARY KEY (id)
);
There are ~50000 rows in my_table.
The question is, why the query:
SELECT * FROM my_table WHERE id = COALESCE(null, id) and value = ?
is slower than this one:
SELECT * FROM my_table WHERE value = ?
Is there any solution, other than optimizing the query string in app-layer?
EDIT: Practically, the question is how to rewrite the query select * from my_table where id=coalesce(?, id) and value=? so that its worst-case performance is no worse than that of select * from my_table where value=? in PostgreSQL 9.0.
Try rewriting the query in the form
SELECT *
FROM my_table
WHERE value = ?
AND (? IS NULL OR id = ?)
From my own quick tests:
INSERT INTO my_table select generate_series(1,50000),1;
UPDATE my_table SET value = id%17;
CREATE INDEX validx ON my_table(value);
VACUUM ANALYZE my_table;
\set idval 17
\set pval 0
explain analyze
SELECT *
FROM my_table
WHERE value = :pval
AND (:idval IS NULL OR id = :idval);
Index Scan using my_table_pkey on my_table (cost=0.00..8.29 rows=1 width=16) (actual time=0.034..0.035 rows=1 loops=1)
Index Cond: (id = 17)
Filter: (value = 0)
Total runtime: 0.064 ms
\set idval null
explain analyze
SELECT *
FROM my_table
WHERE value = :pval
AND (:idval IS NULL OR id = :idval);
Bitmap Heap Scan on my_table (cost=58.59..635.62 rows=2882 width=16) (actual time=0.373..1.594 rows=2941 loops=1)
Recheck Cond: (value = 0)
-> Bitmap Index Scan on validx (cost=0.00..57.87 rows=2882 width=0) (actual time=0.324..0.324 rows=2941 loops=1)
Index Cond: (value = 0)
Total runtime: 1.811 ms
From creating a similar table, populating it, updating statistics, and finally looking at the output of EXPLAIN ANALYZE, the only difference I see is that the first query filters like this:
Filter: ((id = COALESCE(id)) AND (value = 3))
and the second one filters like this:
Filter: (value = 3)
I see substantially different performance and execution plans when there's an index on the column "value". In the first case
Bitmap Heap Scan on my_table (cost=19.52..552.60 rows=5 width=16) (actual time=19.311..20.679 rows=1000 loops=1)
Recheck Cond: (value = 3)
Filter: (id = COALESCE(id))
-> Bitmap Index Scan on t2 (cost=0.00..19.52 rows=968 width=0) (actual time=19.260..19.260 rows=1000 loops=1)
Index Cond: (value = 3)
Total runtime: 22.138 ms
and in the second
Bitmap Heap Scan on my_table (cost=19.76..550.42 rows=968 width=16) (actual time=0.302..1.293 rows=1000 loops=1)
Recheck Cond: (value = 3)
-> Bitmap Index Scan on t2 (cost=0.00..19.52 rows=968 width=0) (actual time=0.276..0.276 rows=1000 loops=1)
Index Cond: (value = 3)
Total runtime: 2.174 ms
So I'd say it's slower because the db engine a) evaluates the COALESCE() expression rather than optimizing it away, and b) evaluating it involves an additional filter condition.