Table Indexing on PostgreSQL for performance

I am solving performance issues on PostgreSQL and I have the following table:
CREATE TABLE main_transaction (
id integer NOT NULL DEFAULT nextval('main_transaction_id_seq'::regclass),
description character varying(255) NOT NULL,
request_no character varying(18),
account character varying(50),
....
)
The table above has 34 columns, including 3 foreign keys, and holds over 1 million rows. I have the following conditional SELECT query:
SELECT * FROM main_transaction
WHERE upper(request_no) LIKE upper(concat('%','20080417-0258-0697','%'))
The query takes over 2 seconds to return its result. I want to reduce that time by indexing the table. So far I have tried a btree index, but I didn't notice any improvement. My question is: how can I improve performance for the above query?

Your only chance to search for a pattern that begins with % is a trigram index:
CREATE EXTENSION pg_trgm;
CREATE INDEX ON main_transaction
USING gin (upper(request_no) gin_trgm_ops);
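To verify the index is being picked up, a quick check (table and search value taken from the question; with enough rows the plan should show a bitmap index scan on the trigram index):
EXPLAIN ANALYZE
SELECT * FROM main_transaction
WHERE upper(request_no) LIKE upper('%20080417-0258-0697%');
Note that the WHERE clause must use the exact indexed expression, upper(request_no), for the index to be considered.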

Related

Add index and partitioning for Postgres table

I have this table in a PostgreSQL database with 6 million rows.
CREATE TABLE IF NOT EXISTS public.processed
(
id bigint NOT NULL DEFAULT nextval('processed_id_seq'::regclass),
created_at timestamp without time zone,
word character varying(200) COLLATE pg_catalog."default",
score double precision,
updated_at timestamp without time zone,
is_domain_available boolean,
CONSTRAINT processed_pkey PRIMARY KEY (id),
CONSTRAINT uk_tb03fca6mojpw7wogvaqvwprw UNIQUE (word)
)
I want to optimize it for performance, for example by adding indexes and partitioning.
Should I add an index only on the word column, or would it be better to index several columns?
What is the recommended way to partition this table?
Are there other recommended optimizations, such as adding compression?
First, there is no compression, nor are there columnar indexes, in PostgreSQL, unlike other RDBMSs that have those features (as an example, Microsoft SQL Server has four ways to compress data without needing to decompress it to read or seek, and can use columnstore indexes). For columnar indexes you have to go with the Fujitsu PostgreSQL version, which costs a lot...
https://www.postgresql.fastware.com/in-memory-columnar-index-brochure
So the only ways you have to accelerate seeks on the "word" column are:
storing a hash of the word in an additional column and using that column for searches after indexing it (as sketched below)
using partitioning with a balanced split, like Sanborn cutter tables
And finally, combining the two options.
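A minimal sketch of the hash-column idea, using the table from the question (hashtext is an internal, undocumented PostgreSQL hash function returning an integer; the column and index names here are illustrative):
ALTER TABLE public.processed ADD COLUMN word_hash integer;
UPDATE public.processed SET word_hash = hashtext(word);
CREATE INDEX processed_word_hash_idx ON public.processed (word_hash);
-- search by hash first, then recheck the actual word to rule out hash collisions
SELECT * FROM public.processed
WHERE word_hash = hashtext('example') AND word = 'example';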

Convert PostgreSQL JSONB column results for use in condition with IN

I have a table with a JSONB column that is used to store multiple integer tags that have been applied to a task, e.g. '[123, 456, 789]'.
ALTER TABLE "public"."task" ADD COLUMN "tags" jsonb;
I also have a table dedicated to storing all the tags that can be used, and the primary key of each record is used in my JSONB column of the task table.
CREATE TABLE public.tag (
tag_id serial NOT NULL,
label varchar(50) NOT NULL,
CONSTRAINT tag_pkey PRIMARY KEY (tag_id)
);
In this table (tag) I have an index on the tag ID (its primary key), and I want to use that index in a query that returns the labels of the tags that were used in a task.
SELECT * FROM task, tag WHERE task.tags #> to_jsonb(tag.tag_id)
Using to_jsonb is really bad as it doesn't use my table's index, but if I change the SQL to something like the example below, the index is used and SQL performance is much better.
SELECT * FROM tag WHERE tag.tag_id IN (123, 456, 789)
How do I convert the jsonb column (task table) to a set of integer values that can be used with the IN condition, as in the example below?
SELECT * FROM task, tag WHERE tag.tag_id IN (task.tags);
You can use the PostgreSQL jsonb_array_elements function, which converts JSON array elements into rows. For example:
SELECT * FROM task, tag WHERE tag.tag_id in (
select jsonb_array_elements('[200, 100, 789]'::jsonb)::int4 as json_data
);
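To apply the same idea to the actual column instead of a literal, one option is a lateral expansion of task.tags (a sketch; it assumes the task table has an id column, which is not shown in the question):
SELECT t.id, g.label
FROM task t
CROSS JOIN LATERAL jsonb_array_elements_text(t.tags) AS e(tag_id)
JOIN tag g ON g.tag_id = e.tag_id::int;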
But for the best performance, when the JSON data comes from a table column, you should index that column, and not with the standard btree index type. For JSON types PostgreSQL offers a different index type: GIN. This index type gives the best performance; I use it on a table with a million records, with very good results. Example of creating a GIN index:
CREATE INDEX tag_table_json_index ON tag_table USING gin (json_field_name jsonb_path_ops);
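Note that the jsonb_path_ops operator class supports containment (@>), so the query has to be phrased as a containment test to benefit. A sketch, assuming the index is created on task.tags from the question (the index name and tag value are illustrative):
CREATE INDEX task_tags_gin_idx ON task USING gin (tags jsonb_path_ops);
-- can use the GIN index: does the tags array contain the value 123?
SELECT * FROM task WHERE tags @> '[123]'::jsonb;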

Postgres UUID array index possible with ANY?

I have a table like:
CREATE TABLE IF NOT EXISTS my_table (
id uuid NOT NULL PRIMARY KEY,
duplicate_ids uuid[] DEFAULT NULL,
);
And my query is:
SELECT * FROM my_table WHERE 'some-uuid'=ANY(duplicate_ids)
Using EXPLAIN ANALYZE and trying lots of different indexes, I am unable to get the above to use an index.
Here's what I've tried (Postgres 12):
CREATE INDEX duplicate_ids_idx ON my_table USING GIN (duplicate_ids);
CREATE INDEX duplicate_ids_idx ON my_table USING GIN (duplicate_ids array_ops);
CREATE INDEX duplicate_ids_idx ON my_table USING BTREE (duplicate_ids);
CREATE INDEX duplicate_ids_idx ON my_table USING BTREE (duplicate_ids array_ops);
I've also run SET enable_seqscan TO off; before these tests to force index usage.
Questions I've read:
Can PostgreSQL index array columns?
Doesn't seem to apply to single values
https://dba.stackexchange.com/questions/125413/index-not-used-with-any-but-used-with-in
Seems to be talking about indexing multiple columns and using IN
Thank you very much for your time.
The question was answered by @a_horse_with_no_name.
The solution appears to be to use something like:
SELECT * FROM my_table WHERE duplicate_ids && array['some_uuid']::uuid[]
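For context, && (array overlap) is one of the operators covered by GIN's array_ops support, so the first index from the attempts above works once the query is phrased this way. A quick check (the UUID is an arbitrary example value):
CREATE INDEX duplicate_ids_idx ON my_table USING gin (duplicate_ids);
EXPLAIN ANALYZE
SELECT * FROM my_table
WHERE duplicate_ids && ARRAY['123e4567-e89b-12d3-a456-426614174000']::uuid[];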

Should this PostgreSQL query use the indexes?

I have two tables:
CREATE TABLE soils (
sample_id TEXT PRIMARY KEY,
project_id TEXT,
technician_id TEXT
);
CREATE INDEX soils_idx
ON soils
USING btree
(sample_id COLLATE pg_catalog."default");
CREATE TABLE assays (
sample_id TEXT PRIMARY KEY,
mo_ppm NUMERIC
);
CREATE INDEX assays_idx
ON assays
USING btree
(sample_id COLLATE pg_catalog."default");
Each table contains about half a million records and, in reality, about 20 additional TEXT columns each (omitted from the DDL above for brevity).
When I perform the query:
EXPLAIN SELECT
s.sample_id, s.project_id, s.technician_id, a.mo_ppm
FROM
soils AS s INNER JOIN assays AS a ON s.sample_id = a.sample_id
I get two sequential scans rather than index lookups. Is that expected behaviour?
Since you have no WHERE conditions, you effectively read the whole table. It's cheaper to run sequential scans and not involve any indexes at all.
Try:
EXPLAIN
SELECT s.sample_id, s.project_id, s.technician_id, a.mo_ppm
FROM soils s
JOIN assays a USING (sample_id)
WHERE <some condition that returns few rows>;
... and an index matching the WHERE condition should be used.
You don't need to define an index on a PRIMARY KEY column. A PK constraint is implemented with a unique index automatically. Your additional index is redundant and of no use.
An index on a foreign key column would be a good idea, but there isn't one in your example, which looks odd, as if the two tables could be combined into one. Probably just an over-simplification for the test case.
Finally, for big tables, I would consider using a simple integer primary key instead of text, possibly a serial column. That's typically faster.
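One possible reading of that last suggestion, as a sketch (the original text sample IDs could be kept in a separate UNIQUE column if they are still needed):
CREATE TABLE soils (
sample_id serial PRIMARY KEY, -- integer key; the PK's implicit index covers the join
project_id text,
technician_id text
);
CREATE TABLE assays (
sample_id integer PRIMARY KEY REFERENCES soils, -- doubles as the foreign key
mo_ppm numeric
);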
Yes, that's expected behaviour, although it depends on your random_page_cost, seq_page_cost and effective_cache_size settings. Your query has no WHERE clause, so it can be faster to read everything sequentially. You can try to penalise sequential scans:
set enable_seqscan = off;
explain analyse <your query>;
and then compare the plan, cost, and I/O waits (enable_seqscan = off does not actually disable sequential scans; it just assigns them a prohibitively high cost, about 1e10).
If you have an SSD and a WHERE clause in your query, you can lower random_page_cost to 1.5–2.5 to encourage PostgreSQL to use indexes.
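A minimal session-level experiment along those lines (4.0 is the default random_page_cost; the WHERE condition is illustrative):
SET random_page_cost = 1.5; -- session-local; favours index scans over sequential scans
EXPLAIN ANALYZE
SELECT s.sample_id, s.project_id, s.technician_id, a.mo_ppm
FROM soils s JOIN assays a USING (sample_id)
WHERE s.project_id = 'P-001';
RESET random_page_cost;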

Can you create an index in the CREATE TABLE definition?

I want to add indexes to some of the columns in a table on creation. Is there are way to add them to the CREATE TABLE definition or do I have to add them afterward with another query?
CREATE INDEX reply_user_id ON reply USING btree (user_id);
There doesn't seem to be any way of specifying an index in the CREATE TABLE syntax. PostgreSQL does however create an index for unique constraints and primary keys by default, as described in this note:
PostgreSQL automatically creates an index for each unique constraint and primary key constraint to enforce uniqueness.
Other than that, if you want a non-unique index, you will need to create it yourself in a separate CREATE INDEX query.
No.
You can, however, create unique indexes within CREATE TABLE, but only because they are classed as constraints. You can't create a "general" index that way.
Peter Krauss is looking for a canonical answer:
There is a modern syntax (year 2020), so please explain and show examples compatible with postgresql.org/docs/current/sql-createtable.html
You are searching for an inline index definition, which is not available in PostgreSQL up to the current version 12, except for UNIQUE/PRIMARY KEY constraints, which create an underlying index for you. From the CREATE TABLE syntax:
CREATE TABLE
[ CONSTRAINT constraint_name ]
{ CHECK ( expression ) [ NO INHERIT ] |
UNIQUE ( column_name [, ... ] ) index_parameters |
PRIMARY KEY ( column_name [, ... ] ) index_parameters |
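In PostgreSQL terms, then, the closest thing to an inline index is a constraint-backed one; anything else needs its own statement. A sketch reusing the reply table from the question (the token column is illustrative):
CREATE TABLE reply (
id serial PRIMARY KEY, -- backed by an implicit unique index
user_id integer,
token text UNIQUE -- also backed by an implicit unique index
);
-- a plain, non-unique index still requires a separate statement:
CREATE INDEX reply_user_id ON reply USING btree (user_id);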
Sample syntax for an inline index definition (here SQL Server):
CREATE TABLE tab(
id INT PRIMARY KEY, -- constraint
c INT INDEX filtered (c) WHERE c > 10, -- filtered index
b VARCHAR(10) NOT NULL INDEX idx_tab_b, -- index on column
d VARCHAR(20) NOT NULL,
INDEX my_index NONCLUSTERED(d) -- index on column as separate entry
);
db<>fiddle demo
The rationale behind introducing them is quite interesting: see "What are Inline Indexes?" by Phil Factor.