Is it possible to mix btree and gist in a Postgres index? - postgresql

I have a table built like this:
create table pois (
id varchar(32) primary key,
location geography(Point,4326),
category varchar(32),
entity_class varchar(1),
hide boolean
);
on which most queries look like this:
SELECT * from pois
WHERE ( (
ST_contains(st_buffer(ST_SetSRID(ST_LineFromEncodedPolyline('ifp_Ik_vpAfj~FrehHhtxAhdaDpxbCnoa#~g|Ay_['), 4326)::geography, 5000)::geometry, location::geometry)
AND ST_Distance(location, ST_LineFromEncodedPolyline('ifp_Ik_vpAfj~FrehHhtxAhdaDpxbCnoa#~g|Ay_[')) < 5000
AND hide = false
AND entity_class in ('A', 'B')
) );
currently I have two indexes. one on location "pois_location_idx" gist (location) and one on hide and entity_class: "pois_select_idx" btree (hide, entity_class)
Performance is acceptable but I am wondering if there is a better indexing strategy, and specifically if it is possible and makes sense to have mixed btree + gist indexes.

You can use the operator classes from the btree_gist extension to create a multi-column GiST index:
CREATE EXTENSION btree_gist;
CREATE INDEX ON pois USING gist (location, category);
In your special case I would doubt the usefulness of that, because hide is a boolean and a GiST index cannot support an IN clause.
Perhaps it would be better to create a partial index:
CREATE INDEX ON pois USING gist (location) WHERE NOT hide AND entity_class in ('A', 'B');
Such an index can only be used for queries whose WHERE clause matches that of the index, so it is less universally useful.

Related

Postgres UUID array index possible with ANY?

I have a table like:
CREATE TABLE IF NOT EXISTS my_table (
id uuid NOT NULL PRIMARY KEY,
duplicate_ids uuid[] DEFAULT NULL,
);
And my query is:
SELECT * FROM my_table WHERE 'some-uuid'=ANY(duplicate_ids)
Using EXPLAIN ANALYZE and trying lots of different indexes, I am unable to get the above to use an index.
Here's what I've tried (Postgres 12):
CREATE INDEX duplicate_ids_idx ON my_table USING GIN (duplicate_ids);
CREATE INDEX duplicate_ids_idx ON my_table USING GIN (duplicate_ids array_ops);
CREATE INDEX duplicate_ids_idx ON my_table USING BTREE (duplicate_ids);
CREATE INDEX duplicate_ids_idx ON my_table USING BTREE (duplicate_ids array_ops);
I've also ran SET enable_seqscan TO off; before these tests to enforce index usage.
Questions I've read:
Can PostgreSQL index array columns?
Doesn't seem to apply for single values
https://dba.stackexchange.com/questions/125413/index-not-used-with-any-but-used-with-in
Seems to be talking about indexing multiple columns and using IN
Thank you very much for your time.
Question was answered by #a_horse_with_no_name
The solution appears to be to use something like:
SELECT * FROM my_table WHERE duplicate_ids && array['some_uuid']::uuid[]

Use index to speed up query using values from different tables

I have a table products, a table orders and a table orderProducts.
Products have a name as a PK (apple, banana, mango) and a price .
orders have a created_at date and an id as a PK.
orderProducts connects orders and products, so they have a product_name and an order_id. Now I would like to show all orders for a given product that happened in the last 24 hours.
I use the following query:
SELECT
orders.id,
orders.created_at,
products.name,
products.price
FROM
orderProducts
JOIN products ON
products.name=orderProducts.product
JOIN orders ON
orders.id=orderProducts.order
WHERE
products.name='banana'
AND
orders.created_at BETWEEN NOW() - INTERVAL '24 HOURS' AND NOW()
ORDER BY
orders.created_at
This works, but I would like to optimize this query with an index. This index would need to first be ordered by
the product name, so it can be filtered
then the created_at of the order in descending order, so it can select only the ones from 24 hours ago
The problem is, that from what I have seen, indexes can only be created on a single table, without the possibility of joining another tables values to it. Since two individual index do not solve this problem either, I was wondering if there was an alternative way to optimize this particular query.
Here are the table scripts:
CREATE TABLE products
(
name text PRIMARY KEY,
price integer,
)
CREATE TABLE orders
(
id SERIAL PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW(),
)
CREATE TABLE orderProducts
(
product text REFERENCES products(name),
"order" integer REFERENCES orders(id),
)
First of all. Please do not put indices everywhere - that lead to slower changing operations...
As proposed by #Laurenz Albe - do not guess - check.
Other than that. Note that you know product name, price is repeated - so you can query that once. Question if in your case two queries are going to be faster then single one... Check that.
Please read docs. I would try this index:
create index orders_id_created_at on orders(created_at desc, id)
Normally id should go first, since that is unique, however here system should be able to filter out on both predicates - where/join. Just guessing here.
orderProducts I would like to see index on both columns, however for this query only one should be needed. In practice you are going from products to orders, or other way - both paths are possible, that is why I've wrote about indexing both columns. I would use two separate indexes:
create index orderproducts_product_id on orderproducts (product_id) include (order_id);
create index orderproducts_order_id on orderproducts (order_id) include (product_id);
Probably that is not changing much, but... idea is to use only index, but not the table itself.
These rules are important in terms of performance:
Integer index faster than string index, therefore, you should try to make the primary keys always be an integer. Because join the tables uses primary keys too.
If when in where clauses always use two fields then we must create an index for both fields.
Foreign-Keys are not indexed, you must create an index for foreign-key fields manually.
So, recommended table scripts will be are that:
CREATE TABLE products
(
id serial primary key,
name text,
price integer
);
CREATE UNIQUE INDEX products_name_idx ON products USING btree (name);
CREATE TABLE orders
(
id SERIAL PRIMARY KEY,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX orders_created_at_idx ON orders USING btree (created_at);
CREATE TABLE orderProducts
(
product_id integer REFERENCES products(id),
order_id integer REFERENCES orders(id)
);
CREATE INDEX orderproducts_product_id_idx ON orderproducts USING btree (product_id, order_id);
---- OR ----
CREATE INDEX orderproducts_product_id ON orderproducts (product_id);
CREATE INDEX orderproducts_order_id ON orderproducts (order_id);

How to create a multicolumn GiST index in Postgresql

The postgresql documentation specifies that a GiST index can have multiple columns, but does not provide an example of what this might look like.
I have a table that tracks assets owned by different customers.
CREATE TABLE asset (
id serial PRIMARY KEY,
description text NOT NULL,
customer_id uuid NOT NULL
);
I'm writing a query that allows a customer to search for an asset based on words in it's description.
SELECT *
FROM asset
WHERE to_tsvector('english', asset.description) ## plainto_tsvector('english', ?)
AND asset.customer_id = ?;
Were this a non-tsvector query, I'd build a simple multicolumn index
CREATE INDEX idx_name ON asset(customer_id, description);
I can create an index only on the tsvector:
CREATE INDEX idx_name ON asset USING gist(to_tsvector('english', asset.description));
However, the query optimizer doesn't use the gist index, because it seems to want to do customer_id filtering first. Is there a way that I can include the non-tsvector field customer_id in the gist index, somehow, or am I out of luck?
You can create a multi column GIN or GiST index in Postrgres using the following:
CREATE INDEX index_name
ON table_name
USING gist (field_one gist_trgm_ops, field_two gist_trgm_ops);

Should this PostgreSQL query use the indexes?

I have two tables:
CREATE TABLE soils (
sample_id TEXT PRIMARY KEY,
project_id TEXT,
technician_id TEXT
);
CREATE INDEX soils_idx
ON soils
USING btree
(sample_id COLLATE pg_catalog."default");
CREATE TABLE assays (
sample_id TEXT PRIMARY KEY,
mo_ppm NUMERIC
);
CREATE INDEX assays_idx
ON assays
USING btree
(sample_id COLLATE pg_catalog."default");
Each table contains about a half million records, and, in reality, about 20 additional columns each, of type TEXT (omitted in the DDL posted above to save time here).
When I perform the query:
EXPLAIN SELECT
s.sample_id, s.project_id, s.technician_id, a.mo_ppm
FROM
soils AS s INNER JOIN assays AS a ON s.sample_id = a.sample_id
I get 2 SEQ SCANs, rather than a lookup to the index. Is that expected behaviour?
Since you have no WHERE conditions, you effectively read the whole table. It's cheaper to run sequential scans and not involve any indexes at all.
Try:
EXPLAIN
SELECT s.sample_id, s.project_id, s.technician_id, a.mo_ppm
FROM soils s
JOIN assays a USING (sample_id)
WHERE <some condition that returns few rows>;
... and an index matching the WHERE condition should be used.
You don't need to define an index on a PRIMARY KEY column. A PK constraint is implemented with a unique index automatically. Your additional index is redundant and of no use.
An index on a foreign key column would be a good idea, but there isn't one in your example, which looks odd. Like the two tables could be combined into one. Probably just over-simplification for the test case.
Finally, for big tables, I would consider using a simple integer primary key instead of text, possibly a serial column. That's typically faster.
Yes, that's expected behaviour. On the other hand it depends on your random_page_cost, seq_page_cost and effective_cache_size settings. Your query doesn't have WHERE clause hence it might be faster to read everything sequentially. You can try to penalise sequential scan:
set enable_seqscan = off;
explain analyse <your query>;
and then compare plan/cost/IO wait (it is not possible to disable seq-scan but it gets very high cost -- ~1e7 (or 1e8)).
If you have SSD and WHERE clause in your query then you can lower random_page_cost to 1.5..2.5 and encourage PG to use index.

Can you create an index in the CREATE TABLE definition?

I want to add indexes to some of the columns in a table on creation. Is there are way to add them to the CREATE TABLE definition or do I have to add them afterward with another query?
CREATE INDEX reply_user_id ON reply USING btree (user_id);
There doesn't seem to be any way of specifying an index in the CREATE TABLE syntax. PostgreSQL does however create an index for unique constraints and primary keys by default, as described in this note:
PostgreSQL automatically creates an index for each unique constraint and primary key constraint to enforce uniqueness.
Other than that, if you want a non-unique index, you will need to create it yourself in a separate CREATE INDEX query.
No.
However, you can create unique indexes in the create, but that's because they are classed as constraints. You can't create a "general" index.
Peter Krauss is looking for a canonical answer:
There are a MODERN SYNTAX (year 2020), so please explain and show examples, compatible with postgresql.org/docs/current/sql-createtable.html
You are searching for inline index definition, which is not available for PostgreSQL up to current version 12. Except UNIQUE/PRIMARY KEY constraint, that creates underlying index for you.
CREATE TABLE
[ CONSTRAINT constraint_name ]
{ CHECK ( expression ) [ NO INHERIT ] |
UNIQUE ( column_name [, ... ] ) index_parameters |
PRIMARY KEY ( column_name [, ... ] ) index_parameters |
The sample syntax of inline column definition(here SQL Server):
CREATE TABLE tab(
id INT PRIMARY KEY, -- constraint
c INT INDEX filtered (c) WHERE c > 10, -- filtered index
b VARCHAR(10) NOT NULL INDEX idx_tab_b, -- index on column
d VARCHAR(20) NOT NULL,
INDEX my_index NONCLUSTERED(d) -- index on column as separate entry
);
db<>fiddle demo
The rationale behind introducing them is quite interesting What are Inline Indexes? by Phil Factor