Query on a single-column table much faster than on a multi-column table - postgresql-performance

I have a bunch of strings for which I'd like to find the best match in a table column. The table contains about 400,000 rows, and the strings are no longer than 100 characters.
When I run the query against the whole table (8 columns in total), it takes about 32 seconds. All text columns have a trigram GIN index.
with my_phrases(phrase) as (
values
('ABC'),
('123'),
('XYZ'),
('MNO'),
('KLM'),
('FOO'),
('AYE'),
('OPS')
)
select my_phrases.phrase, best_match.phrase, best_match.similarity
from my_phrases,
lateral (
select opm.phrase, similarity(opm.phrase, my_phrases.phrase) similarity
from my_table opm
where opm.phrase % my_phrases.phrase
order by opm.phrase <-> my_phrases.phrase
limit 1
) best_match
order by my_phrases.phrase
;
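For reference, these are the pg_trgm pieces the query relies on: `similarity()` returns a value between 0 and 1, `%` is true when that similarity is above `pg_trgm.similarity_threshold` (0.3 by default), and `<->` returns the distance, i.e. one minus the similarity. A minimal sketch, assuming you are allowed to install the pg_trgm extension in the current database:

```sql
-- Assumes permission to install the extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- The trigrams pg_trgm extracts (each word is padded with spaces).
SELECT show_trgm('cat');

-- 'cat' and 'cap' share "  c" and " ca" out of 6 distinct trigrams in total,
-- so their similarity is 2/6.
SELECT similarity('cat', 'cap')     AS sim,
       'cat'::text %   'cap'::text  AS is_similar,  -- compared with the threshold
       'cat'::text <-> 'cap'::text  AS distance;    -- 1 - sim

-- The threshold used by % can be changed per session (PostgreSQL 9.6+).
SET pg_trgm.similarity_threshold = 0.5;
SELECT 'cat'::text % 'cap'::text AS is_similar_at_0_5;
RESET pg_trgm.similarity_threshold;
```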
Now, when I copy the phrase column into a separate table and add a GIN index to it, the same query takes about 400 ms.
with my_phrases(phrase) as (
values
('ABC'),
('123'),
('XYZ'),
('MNO'),
('KLM'),
('FOO'),
('AYE'),
('OPS')
)
select my_phrases.phrase, best_match.phrase, best_match.similarity
from my_phrases,
lateral (
select phrase, similarity(phrase, my_phrases.phrase) similarity
from my_one_column_table
where phrase % my_phrases.phrase
order by phrase <-> my_phrases.phrase
limit 1
) best_match
order by my_phrases.phrase
;
Here is more info on the server:
select version(); ->
PostgreSQL 14.3 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), 64-bit
The multi-column table was created from a csv as follows:
create table my_table
(
id serial primary key,
p_id uuid,
p_name varchar,
b_id uuid,
b_name varchar,
pt_id uuid,
pt_name varchar,
phrase varchar
);
\copy my_table(p_id, p_name, b_id, b_name, pt_id, pt_name, phrase) from 'XXX.csv' csv header;
create index my_table__p_name__gin on my_table using gin(p_name gin_trgm_ops);
create index my_table__b_name__gin on my_table using gin(b_name gin_trgm_ops);
create index my_table__pt_name__gin on my_table using gin(pt_name gin_trgm_ops);
create index my_table__phrase__gin on my_table using gin(phrase gin_trgm_ops);
\d+ my_table
Column │ Type │ Collation │ Nullable │ Default │ Storage │ Compression │ Stats target │ Description
═══════════════════╪═══════════════════╪═══════════╪══════════╪═══════════════════════════════════════════════════════╪══════════╪═════════════╪══════════════╪═════════════
id │ integer │ │ not null │ nextval('my_table_id_seq'::regclass) │ plain │ │ │
p_id │ uuid │ │ │ │ plain │ │ │
p_name │ character varying │ │ │ │ extended │ │ │
b_id │ uuid │ │ │ │ plain │ │ │
b_name │ character varying │ │ │ │ extended │ │ │
pt_id │ uuid │ │ │ │ plain │ │ │
pt_name │ character varying │ │ │ │ extended │ │ │
phrase │ character varying │ │ │ │ extended │ │ │
Indexes:
"my_table_pkey" PRIMARY KEY, btree (id)
"my_table__b_name__gin" gin (b_name gin_trgm_ops)
"my_table__phrase__gin" gin (phrase gin_trgm_ops)
"my_table__phrase__idx" btree (phrase)
"my_table__p_name__gin" gin (p_name gin_trgm_ops)
"my_table__pt_name__gin" gin (pt_name gin_trgm_ops)
Access method: heap
For the slow query, the query plan is here
For the fast query, the query plan is here
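One variant that may be worth measuring (an assumption, not something the post establishes as the cause of the 32 s vs 400 ms difference): according to the pg_trgm documentation, the GiST operator class `gist_trgm_ops` can return rows in `<->` distance order (a nearest-neighbor index scan), whereas `gin_trgm_ops` cannot, so with a GIN index the `ORDER BY ... <-> ... LIMIT 1` has to sort every candidate row. The sketch below uses a hypothetical table `demo_phrases` filled with md5 strings; whether the planner picks the GiST index on the real tables should be checked with the same `EXPLAIN (ANALYZE, BUFFERS)` call.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical stand-in for my_table / my_one_column_table.
CREATE TABLE demo_phrases (
    id     serial PRIMARY KEY,
    phrase varchar
);
INSERT INTO demo_phrases (phrase)
SELECT md5(g::text) FROM generate_series(1, 10000) AS g;

-- GiST trigram index: supports % and ordering by <-> (KNN search).
CREATE INDEX demo_phrases__phrase__gist
    ON demo_phrases USING gist (phrase gist_trgm_ops);
ANALYZE demo_phrases;

-- Same LATERAL shape as the question; look for an index scan on
-- demo_phrases__phrase__gist with an "Order By" line in the plan.
EXPLAIN (ANALYZE, BUFFERS)
WITH my_phrases(phrase) AS (
    VALUES (left(md5('42'), 28)), (left(md5('7'), 28))
)
SELECT my_phrases.phrase, best_match.phrase, best_match.similarity
FROM my_phrases
CROSS JOIN LATERAL (
    SELECT d.phrase, similarity(d.phrase, my_phrases.phrase) AS similarity
    FROM demo_phrases AS d
    WHERE d.phrase % my_phrases.phrase
    ORDER BY d.phrase <-> my_phrases.phrase
    LIMIT 1
) AS best_match
ORDER BY my_phrases.phrase;
```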

Related

Copy data from one table to another with different column names and column ordering in timescaledb

I have two TimescaleDB tables with a different column order, and some of the columns even have different names. Is it possible to copy data from one table to the other?
Essentially, the new table is a hypertable, but the old one is not.
I have looked at https://docs.timescale.com/timescaledb/latest/how-to-guides/migrate-data/same-db/
However, it only seems to say that I need the same column names, and even the same column order, in the CREATE TABLE statement. Can you help? I am new to TimescaleDB.
For example:
**table1**
id
price
datetime_string
**table2**
id
time
price
I created a minimal example here:
CREATE TABLE old_table ( id bigserial, time_string text NOT NULL, price decimal);
CREATE TABLE new_table ( time TIMESTAMP NOT NULL, price decimal);
SELECT create_hypertable('new_table', 'time');
INSERT INTO old_table (time_string, price) VALUES
('2021-08-26 10:09:00.01', 10.1),
('2021-08-26 10:09:00.08', 10.0),
('2021-08-26 10:09:00.23', 10.2),
('2021-08-26 10:09:00.40', 10.3);
INSERT INTO new_table SELECT time_string::timestamp as time, price from old_table;
Results:
playground=# \i move_data.sql
CREATE TABLE
CREATE TABLE
┌─────────────────────────┐
│ create_hypertable │
├─────────────────────────┤
│ (19,public,new_table,t) │
└─────────────────────────┘
(1 row)
INSERT 0 4
INSERT 0 4
playground=# table new_table;
┌────────────────────────┬───────┐
│ time │ price │
├────────────────────────┼───────┤
│ 2021-08-26 10:09:00.01 │ 10.1 │
│ 2021-08-26 10:09:00.08 │ 10.0 │
│ 2021-08-26 10:09:00.23 │ 10.2 │
│ 2021-08-26 10:09:00.4 │ 10.3 │
└────────────────────────┴───────┘
(4 rows)
Try running your SELECT on its own first and check that its output columns match the structure of the target table.
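The point behind that advice is that `INSERT ... SELECT` pairs columns by position, not by name, so naming the target columns in the `INSERT` makes the mapping independent of the column order in either table. A minimal sketch with hypothetical plain tables `old_prices` and `new_prices` (no hypertable, so it runs without TimescaleDB; a hypertable should accept the same column-list insert):

```sql
-- Hypothetical tables whose column order and names differ.
CREATE TABLE old_prices (
    id          bigserial,
    price       decimal,
    time_string text NOT NULL
);
CREATE TABLE new_prices (
    time  timestamp NOT NULL,
    price decimal
);

INSERT INTO old_prices (price, time_string) VALUES
    (10.1, '2021-08-26 10:09:00.01'),
    (10.0, '2021-08-26 10:09:00.08');

-- Name the target columns and pair each one with a source expression;
-- the physical column order of either table no longer matters.
INSERT INTO new_prices (price, time)
SELECT price, time_string::timestamp
FROM old_prices;
```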

Postgresql pattern match a selection

I'm trying to select rows where the start of a column matches a column in another table in Postgres. I'm looking to do something along these lines:
Return all records in table1 where table1.name starts with any of the labels in table2.labels.
SELECT
name
FROM table1
WHERE
name LIKE (SELECT distinct label FROM table2);
You should append a % sign to each label to use it with LIKE. Also, use ANY(), since the subquery may return more than one row.
select name
from table1
where name like any(select distinct concat(label, '%') from table2);
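A self-contained run of that answer, with hypothetical sample rows. One assumption worth flagging: if a label can itself contain `%`, `_` or `\`, those characters act as wildcards or escapes inside `LIKE`; on PostgreSQL 11 or later, `starts_with(name, label)` compares the prefix literally instead.

```sql
CREATE TABLE table1 (name text);
CREATE TABLE table2 (label text);
INSERT INTO table1 (name) VALUES ('apple pie'), ('banana split'), ('cherry tart');
INSERT INTO table2 (label) VALUES ('app'), ('cher'), ('app');

-- Prefix match via LIKE: append the % wildcard to every label.
SELECT name
FROM table1
WHERE name LIKE ANY (SELECT DISTINCT concat(label, '%') FROM table2);

-- Literal prefix match without wildcard semantics (PostgreSQL 11+).
SELECT t1.name
FROM table1 AS t1
WHERE EXISTS (
    SELECT 1 FROM table2 AS t2 WHERE starts_with(t1.name, t2.label)
);
```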
You can join tables on any operator you want, including, for example, the regular-expression match operator ~.
begin;
create table so.a(f1 text);
create table so.b(f2 text);
insert into so.a(f1)
select md5(x::text)
from generate_series(1, 300) t(x);
insert into so.b(f2)
select substring(md5(x::text) from (20*random())::int for 4)
from generate_series(1, 20) t(x);
select f2, f1
from so.a
join so.b
on a.f1 ~ b.f2
order by f2;
rollback;
Which gives:
pgloader# \i /Users/dim/dev/temp/stackoverflow/45693581.sql
BEGIN
CREATE TABLE
CREATE TABLE
INSERT 0 300
INSERT 0 20
f2 │ f1
══════╪══════════════════════════════════
12bd │ 6512bd43d9caa6e02c990b0a82652dca
3708 │ 98f13708210194c475687be6106a3b84
4d76 │ c20ad4d76fe97759aa27a0c99bff6710
5d77 │ e4da3b7fbbce2345d7772b0674a318d5
5f74 │ 1f0e3dad99908345f7439f8ffabdffc4
5fce │ 8f14e45fceea167a5a36dedd4bea2543
6790 │ 1679091c5a880faf6fb5e6087eb1b2dc
6802 │ d3d9446802a44259755d38e6d163e820
6816 │ 6f4922f45568161a8cdf4ad2299f6d23
74d9 │ c74d97b01eae257e44aa9d5bade97baf
7ff0 │ 9bf31c7ff062936a96d3c8bd1f8f2ff3
820d │ c4ca4238a0b923820dcc509a6f75849b
87e4 │ eccbc87e4b5ce2fe28308fd9f2a7baf3
95fb │ c9f0f895fb98ab9159f51fd0297e236d
aab │ 32bb90e8976aab5298d5da10fe66f21d
aab │ aab3238922bcc25a6f606eb525ffdc56
aab │ d395771085aab05244a4fb8fd91bf4ee
c51c │ 45c48cce2e2d7fbdea1afc51c7c6ad26
c51c │ c51ce410c124a10e0db5e4b97fc2af39
ce2e │ 45c48cce2e2d7fbdea1afc51c7c6ad26
d918 │ a87ff679a2f3e71d9181a67b7542122c
e728 │ c81e728d9d4c2f636f067f89cc14862c
fdf2 │ 70efdf2ec9b086079795c442636b55fb
(23 rows)
ROLLBACK
Granted, the dataset isn't very interesting. You can speed this kind of query up by using the pg_trgm extension: https://www.postgresql.org/docs/current/static/pgtrgm.html
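A sketch of that suggestion: a trigram GIN index on the searched column can serve `~`, `~*`, `LIKE` and `ILIKE` conditions, which is what a join on `f1 ~ f2` needs. The table and index names are hypothetical, the fragments use a fixed offset instead of `random()` so the data is reproducible, and whether the planner actually uses the index depends on table size and statistics, so check the `EXPLAIN` output.

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE hashes (f1 text);
CREATE TABLE fragments (f2 text);

INSERT INTO hashes (f1)
SELECT md5(x::text) FROM generate_series(1, 300) AS t(x);

-- 4-character fragments cut from the first 20 hashes at a fixed offset.
INSERT INTO fragments (f2)
SELECT substring(md5(x::text) FROM 5 FOR 4) FROM generate_series(1, 20) AS t(x);

-- Trigram GIN index usable for regex and LIKE matches on hashes.f1.
CREATE INDEX hashes__f1__trgm ON hashes USING gin (f1 gin_trgm_ops);
ANALYZE hashes;

EXPLAIN
SELECT f2, f1
FROM hashes
JOIN fragments ON hashes.f1 ~ fragments.f2
ORDER BY f2;
```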

comparing two fields that may be null

Is there a comparison operator for which a.unitnum = b.unitnum would be true if both a.unitnum and b.unitnum are null? It seems that a.unitnum IS b.unitnum is invalid.
Yes, there are: IS DISTINCT FROM and IS NOT DISTINCT FROM.
postgres=# \pset null ****
Null display is "****".
postgres=# select null = null;
┌──────────┐
│ ?column? │
╞══════════╡
│ **** │
└──────────┘
(1 row)
postgres=# select null is not distinct from null;
┌──────────┐
│ ?column? │
╞══════════╡
│ t │
└──────────┘
(1 row)
postgres=# select 10 = null;
┌──────────┐
│ ?column? │
╞══════════╡
│ **** │
└──────────┘
(1 row)
postgres=# select 10 is distinct from null;
┌──────────┐
│ ?column? │
╞══════════╡
│ t │
└──────────┘
(1 row)
postgres=# select 10 is not distinct from null;
┌──────────┐
│ ?column? │
╞══════════╡
│ f │
└──────────┘
(1 row)
postgres=# select 10 is not distinct from 20;
┌──────────┐
│ ?column? │
╞══════════╡
│ f │
└──────────┘
(1 row)
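Applied to the original question, the null-safe comparison goes straight into the join (or WHERE) condition. A minimal sketch with hypothetical tables `units_a` and `units_b` standing in for `a` and `b`:

```sql
CREATE TABLE units_a (id int, unitnum int);
CREATE TABLE units_b (id int, unitnum int);
INSERT INTO units_a VALUES (1, 10), (2, NULL), (3, 30);
INSERT INTO units_b VALUES (1, 10), (2, NULL), (3, 31);

-- With =, the NULL/NULL pair (id 2) would be dropped;
-- IS NOT DISTINCT FROM keeps it, so ids 1 and 2 match.
SELECT a.id, a.unitnum AS a_unitnum, b.unitnum AS b_unitnum
FROM units_a AS a
JOIN units_b AS b
  ON a.id = b.id
 AND a.unitnum IS NOT DISTINCT FROM b.unitnum;
```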
Yes, there is a setting for that (transform_null_equals), but using it is not recommended. Here is a sample:
t=# select null = null;
?column?
----------
(1 row)
t=# set transform_null_equals = on;
SET
t=# select null = null;
?column?
----------
t
(1 row)
UPDATE: apparently this works only for comparisons of the form column = NULL, not column = column:
t=# with s as (select null::int a, null::int b) select a = b from s;
?column?
----------
(1 row)
so the shortest comparison would be coalesce:
t=# with s as (select null::int a, null::int b) select coalesce(a,b,0) = 0 from s;
?column?
----------
t
(1 row)
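A caution on the coalesce shortcut above: `coalesce(a, b, 0) = 0` only tests whether the first non-NULL value is 0. It is true when both values are NULL, but also whenever `a` is 0 (whatever `b` is), and it is false when `a` and `b` are equal non-zero values, so it is not a substitute for null-safe equality. Putting the three tests side by side makes the difference visible:

```sql
-- Compare plain =, the coalesce shortcut and IS NOT DISTINCT FROM on the same pairs.
SELECT a, b,
       a = b                    AS plain_equals,
       coalesce(a, b, 0) = 0    AS coalesce_test,
       a IS NOT DISTINCT FROM b AS null_safe_equals
FROM (VALUES (NULL::int, NULL::int),
             (NULL, 1),
             (1, NULL),
             (1, 1),
             (1, 2),
             (0, 5)) AS t(a, b);
```

For the pairs (1, 1) and (0, 5) the coalesce test disagrees with IS NOT DISTINCT FROM.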
-- PL/pgSQL only (inside a function or DO block)
IF (a.unitnum IS NULL AND b.unitnum IS NULL)
THEN
RAISE NOTICE 'unitnum field is null in both a and b tables';
ELSE
RAISE NOTICE 'unitnum field is not null in at least one of the a or b tables';
END IF;
No, but you can use a.unitnum = b.unitnum or (a.unitnum is null and b.unitnum is null).
If you need to handle all cases:
a.unitnum is null b.unitnum is null
a.unitnum is null b.unitnum is not null
a.unitnum is not null b.unitnum is null
a.unitnum is not null b.unitnum is not null
Then you may want to use this expression:
select *
from a, b
where
((a.unitnum is not null) and (b.unitnum is not null) and (a.unitnum = b.unitnum)) or
((a.unitnum is null) and (b.unitnum is null));
Here you can test how it works:
SELECT
((a is not null) and (b is not null) and (a = b)) or
((a is null) and (b is null))
FROM (VALUES (null,null)
, (null,1)
, (1,null)
, (1,1)
, (1,2)
) t1 (a, b);
P.S.
Just use IS NOT DISTINCT FROM from the accepted answer. It works the same way but is shorter.

Postgres: Create duplicates of existing rows, changing one value?

I am working in Postgres 9.4. I have a table that looks like this:
Column │ Type │ Modifiers
─────────────────┼──────────────────────┼───────────────────────
id │ integer │ not null default
total_list_size │ integer │ not null
date │ date │ not null
pct_id │ character varying(3) │
I want to take all rows where date = '2015-09-01' and create identical new rows with the date 2015-10-01.
How can I best do this?
I can get the list of values to copy with SELECT * from mytable WHERE date='2015-09-01', but I'm not sure what to do after that.
If the id column is serial, then:
INSERT INTO mytable (total_list_size, date, pct_id)
SELECT total_list_size, '2015-10-01', pct_id
FROM mytable
WHERE date = '2015-09-01';
Otherwise, if you want the ids to be duplicated (this only succeeds if id has no primary key or unique constraint):
INSERT INTO mytable (id, total_list_size, date, pct_id)
SELECT id, total_list_size, '2015-10-01', pct_id
FROM mytable
WHERE date = '2015-09-01';
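A self-contained run of the serial-id variant, using a hypothetical copy of the question's table (`mytable_demo`) with made-up rows. The date literals are written as `DATE '...'` to make their type explicit, and `RETURNING` lists the newly created rows with the ids the sequence assigned:

```sql
CREATE TABLE mytable_demo (
    id              serial PRIMARY KEY,
    total_list_size integer NOT NULL,
    date            date NOT NULL,
    pct_id          varchar(3)
);
INSERT INTO mytable_demo (total_list_size, date, pct_id) VALUES
    (100, DATE '2015-09-01', 'A01'),
    (250, DATE '2015-09-01', 'B02'),
    (300, DATE '2015-08-01', 'C03');

-- Copy every 2015-09-01 row to 2015-10-01; id is filled from the sequence.
INSERT INTO mytable_demo (total_list_size, date, pct_id)
SELECT total_list_size, DATE '2015-10-01', pct_id
FROM mytable_demo
WHERE date = DATE '2015-09-01'
RETURNING id, total_list_size, date, pct_id;
```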

How to check table UNLOGGED with postgresql?

CREATE UNLOGGED TABLE IF NOT EXISTS <tablename>
How can I first check if the desired table is created UNLOGGED, and if not alter the table accordingly?
postgres 9.4
You can check the relpersistence column of the pg_class catalog:
postgres=# select relpersistence, relname from pg_class where relname like 'foo%';
┌────────────────┬─────────┐
│ relpersistence │ relname │
╞════════════════╪═════════╡
│ p │ foo │
│ p │ foo1 │
│ u │ foo2 │
└────────────────┴─────────┘
(3 rows)
foo2 is an unlogged table (relpersistence = 'u'; 'p' means a permanent table and 't' a temporary one).
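The second half of the question (alter the table if it is not unlogged) can be sketched as a DO block that reads relpersistence and switches the table only when it is a permanent one. Two assumptions: the table name `maybe_unlogged` is hypothetical, and `ALTER TABLE ... SET UNLOGGED` requires PostgreSQL 9.5 or later, so on 9.4 you would instead have to create a new unlogged table and copy the data into it. Casting the name to `regclass` resolves the table through the search_path instead of matching `relname` across all schemas.

```sql
-- Hypothetical table name; replace with the real one.
CREATE TABLE IF NOT EXISTS maybe_unlogged (id int);

DO $$
DECLARE
    persistence "char";
BEGIN
    SELECT c.relpersistence INTO persistence
    FROM pg_class AS c
    WHERE c.oid = 'maybe_unlogged'::regclass;

    -- 'p' = permanent (logged); temporary tables ('t') are left alone.
    IF persistence = 'p' THEN
        ALTER TABLE maybe_unlogged SET UNLOGGED;  -- PostgreSQL 9.5+
    END IF;
END
$$;
```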