Liquibase insert select where not exists - postgresql

I want to insert multiple rows from table2 into table1. The problem is that table1 and table2 share a field of the same name, and I don't want to insert a row if table1 already has a record with the same value in that field. Right now I have something like this:
insert into table1 (id, sameField, constantField, superFied)
select gen_random_uuid(), "sameField", 'constant', "anotherField"
from table2;
And I assume I need to do something like this:
insert into table1 (id, sameField, constantField, superFied)
select gen_random_uuid(), "sameField", 'constant', "anotherField"
from table2
where not exists ... ?
What do I need to write instead of ? if I want this logic: when selecting sameField from table2, check whether table1 already contains the same value in sameField. The DBMS is Postgres.

You can use a sub-query to see whether the record exists. You will need to define the column(s) which should be unique.
create table table2(
id varchar(100),
sameField varchar(25),
constant varchar(25),
superField varchar(25)
);
insert into table2 values
(gen_random_uuid(),'same1','constant1','super1'),
(gen_random_uuid(),'same2','constant2','super2');
2 rows affected
create table table1(
id varchar(100),
sameField varchar(25),
constant varchar(25),
superField varchar(25)
);
insert into table1 values
(gen_random_uuid(),'same1','constant1','super1');
1 row affected
insert into table1 (id, sameField, constant, superField)
select uuid_in(md5(random()::text || clock_timestamp()::text)::cstring),
t2.sameField, 'constant', t2.superField
from table2 t2
where sameField not in (select sameField from table1);
1 row affected
select * from table1;
id                                   | samefield | constant  | superfield
:----------------------------------- | :-------- | :-------- | :---------
4cf10b1c-7a3f-4323-9a16-cce681fcd6d8 | same1     | constant1 | super1
d8cf27a0-3f55-da50-c274-c4a76c697b84 | same2     | constant  | super2
select * from table2;
id                                   | samefield | constant  | superfield
:----------------------------------- | :-------- | :-------- | :---------
c8a83804-9f0b-4d97-8049-51c2c8c54665 | same1     | constant1 | super1
3a9cf8b5-8488-4278-a06a-fd75fa74e206 | same2     | constant2 | super2
db<>fiddle here
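Two side notes. First, NOT IN inserts nothing at all if table1.sameField contains a NULL, whereas the NOT EXISTS form the question asked about is null-safe. Second, since this is for Liquibase, the statement can live in a formatted-SQL changelog. A sketch combining both (the changeset author and id are placeholders):
--liquibase formatted sql
--changeset your-name:insert-missing-rows-from-table2
insert into table1 (id, sameField, constant, superField)
select gen_random_uuid(), t2.sameField, 'constant', t2.superField
from table2 t2
where not exists (
    -- skip rows whose sameField is already present in table1
    select 1
    from table1 t1
    where t1.sameField = t2.sameField
);
Note that gen_random_uuid() is built in from PostgreSQL 13 onward; on older versions it needs the pgcrypto extension, hence the md5() workaround in the fiddle above.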

Related

Optimizing Postgres Count For Aggregated Select

I have a query that is intended to retrieve the counts of each grouped product, like so:
SELECT
product_name,
product_color,
(array_agg("product_distributor"))[1] AS "product_distributor",
(array_agg("product_release"))[1] AS "product_release",
COUNT(*) AS "count"
FROM
product
WHERE
product.id IN (
SELECT
id
FROM
product
WHERE
(product_name ilike '%red%'
OR product_color ilike '%red%')
AND product_type = 1)
GROUP BY
product_name, product_color
LIMIT
1000
OFFSET
0
This query is run on the following table
Column | Type | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+---------
product_type | integer | | not null |
id | integer | | not null |
product_name | citext | | not null |
product_color | character varying(255) | | |
product_distributor | integer | | |
product_release | timestamp with time zone | | |
created_at | timestamp with time zone | | not null |
updated_at | timestamp with time zone | | not null |
Indexes:
"product_pkey" PRIMARY KEY, btree (id)
"product_distributer_index" btree (product_distributor)
"product_product_type_name_color" UNIQUE, btree (product_type, name, color)
"product_product_type_index" btree (product_type)
"product_name_color_index" btree (name, color)
Foreign-key constraints:
"product_product_type_fkey" FOREIGN KEY (product_type) REFERENCES product_type(id) ON UPDATE CASCADE ON DELETE CASCADE
"product_product_distributor_id" FOREIGN KEY (product_distributor) REFERENCES product_distributor(id)
How can I improve the performance of this query, specifically the COUNT(*) portion? Removing it speeds the query up considerably, but the count is required.
You may try using an INNER JOIN in place of a WHERE ... IN clause.
WITH selected_products AS (
SELECT id
FROM product
WHERE (product_name ilike '%red%' OR product_color ilike '%red%')
AND product_type = 1
)
SELECT product_name,
product_color,
(ARRAY_AGG("product_distributor"))[1] AS "product_distributor",
(ARRAY_AGG("product_release"))[1] AS "product_release",
COUNT(*) AS "count"
FROM product p
INNER JOIN selected_products sp
ON p.id = sp.id
GROUP BY product_name,
product_color
LIMIT 1000
OFFSET 0
Then create an index on the "product.id" field as follows:
CREATE INDEX product_ids_idx ON product USING HASH (id);
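Whether the rewrite actually helps depends on the plan, so it is worth comparing EXPLAIN ANALYZE output before and after. A sketch, using the filter values from the question:
EXPLAIN (ANALYZE, BUFFERS)
SELECT product_name,
       product_color,
       COUNT(*) AS "count"
FROM product
WHERE (product_name ILIKE '%red%' OR product_color ILIKE '%red%')
  AND product_type = 1
GROUP BY product_name, product_color;
Note also that a leading-wildcard ILIKE ('%red%') cannot use the existing btree indexes; if the filter itself turns out to be the bottleneck, a trigram (pg_trgm) GIN index on product_name and product_color is the usual remedy.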

PostgreSQL add SERIAL column to existing table with values based on ORDER BY

I have a large table (6+ million rows) to which I'd like to add an auto-incrementing integer column sid, where sid is set on existing rows based on ORDER BY inserted_at ASC. In other words, the oldest record based on inserted_at would get sid = 1 and the latest record would get the total record count. Any tips on how I might approach this?
Add a sid column and UPDATE SET ... FROM ... WHERE:
UPDATE test
SET sid = t.rownum
FROM (SELECT id, row_number() OVER (ORDER BY inserted_at ASC) as rownum
FROM test) t
WHERE test.id = t.id;
Note that this relies on there being a primary key, id.
(If your table did not already have a primary key, you would have to make one first.)
For example,
-- create test table
DROP TABLE IF EXISTS test;
CREATE TABLE test (
id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
, foo text
, inserted_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO test (foo, inserted_at) VALUES
('XYZ', '2019-02-14 00:00:00-00')
, ('DEF', '2010-02-14 00:00:00-00')
, ('ABC', '2000-02-14 00:00:00-00');
-- +----+-----+------------------------+
-- | id | foo | inserted_at |
-- +----+-----+------------------------+
-- | 1 | XYZ | 2019-02-13 19:00:00-05 |
-- | 2 | DEF | 2010-02-13 19:00:00-05 |
-- | 3 | ABC | 2000-02-13 19:00:00-05 |
-- +----+-----+------------------------+
ALTER TABLE test ADD COLUMN sid INT;
UPDATE test
SET sid = t.rownum
FROM (SELECT id, row_number() OVER (ORDER BY inserted_at ASC) as rownum
FROM test) t
WHERE test.id = t.id;
yields
+----+-----+------------------------+-----+
| id | foo | inserted_at | sid |
+----+-----+------------------------+-----+
| 3 | ABC | 2000-02-13 19:00:00-05 | 1 |
| 2 | DEF | 2010-02-13 19:00:00-05 | 2 |
| 1 | XYZ | 2019-02-13 19:00:00-05 | 3 |
+----+-----+------------------------+-----+
Finally, make sid SERIAL (or, better, an IDENTITY column):
ALTER TABLE test ALTER COLUMN sid SET NOT NULL;
-- IDENTITY avoids certain issues which may arise with SERIAL
ALTER TABLE test ALTER COLUMN sid ADD GENERATED BY DEFAULT AS IDENTITY;
-- (note: there is no ALTER COLUMN ... SERIAL; SERIAL is only shorthand at table-creation time)
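One step the above leaves implicit: the new identity sequence starts at 1, so the very next insert would collide with the backfilled sid values. A sketch of bumping the sequence past the existing maximum (pg_get_serial_sequence works for identity columns despite its name):
-- move the sequence past the backfilled sids; the next generated value is max(sid) + 1
SELECT setval(pg_get_serial_sequence('test', 'sid'),
              (SELECT max(sid) FROM test));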

Select row position in filtered and ordered row list PostgreSQL

I got this query,
SELECT s.pos
FROM (SELECT t.guild_id, t.user_id,
ROW_NUMBER() OVER(ORDER BY t.reputation DESC) AS pos
FROM users t) s
WHERE (s.guild_id, s.user_id) = ($2, $3)
that gets a user's "rank" in a guild, but I want to filter the results by entries that are in an array of t.user_id values (like {'1', '64', '83'}) and have this affect the resulting pos value. I found FILTER and WITHIN GROUP, but I'm not sure how to fit one of those into this query. How would I do that?
Here's the full table if that helps at all:
Table "public.users"
Column | Type | Collation | Nullable | Default
------------+-----------------------+-----------+----------+---------
guild_id | character varying(20) | | not null |
user_id | character varying(20) | | not null |
reputation | real | | not null | 0
Indexes:
"users_pkey" PRIMARY KEY, btree (guild_id, user_id)
Why not select on those first?
WITH UsersWeCareAbout AS (
SELECT * FROM users u WHERE u.user_id = ANY(subgroup_array)
), RepUsers AS (
SELECT t.guild_id, t.user_id, ROW_NUMBER() OVER(ORDER BY t.reputation DESC) AS pos
FROM UsersWeCareAbout t
) SELECT s.pos FROM RepUsers s WHERE (s.guild_id, s.user_id) = ($2, $3)
(untested if only because I didn't really have enough context to test with)
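For a concrete invocation, subgroup_array would be bound as an array parameter; for example, with the literal from the question (a sketch of just the first CTE's filter):
-- filter to the question's example subgroup {'1', '64', '83'}
SELECT * FROM users u WHERE u.user_id = ANY(ARRAY['1', '64', '83']);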

PostgreSQL querying through schemas

I want a query that lists all Customers whose status is "active". My problem is that I am lost on querying tables that reference other tables. Here is my schema.
CREATE TABLE Customer (
ID BIGSERIAL PRIMARY KEY NOT NULL,
fNAME TEXT NOT NULL,
lNAME TEXT NOT NULL,
create_date DATE NOT NULL DEFAULT NOW()
);
CREATE TABLE CustomerStatus (
recordID BIGSERIAL NOT NULL,
ID BIGSERIAL REFERENCES Customer NOT NULL,
status TEXT NOT NULL,
create_date DATE NOT NULL DEFAULT NOW()
);
INSERT INTO Customer (fNAME, lNAME) VALUES ('MARK', 'JOHNSON'), ('ERICK', 'DAWN'), ('MAY', 'ERICKSON'), ('JESS', 'MARTIN');
INSERT INTO CustomerStatus (ID, status) VALUES (1, 'pending'), (1, 'active');
INSERT INTO CustomerStatus (ID, status) VALUES (2, 'pending'), (2, 'active'), (2, 'cancelled');
INSERT INTO CustomerStatus (ID, status) VALUES (3, 'pending'), (3, 'active');
INSERT INTO CustomerStatus (ID, status) VALUES (4, 'pending');
I ventured to assume that recordID is serial, so the latest status per customer is the row with the highest recordid, which produces this query:
t=# with a as (
select *, max(recordid) over (partition by cs.id)
from Customer c
join CustomerStatus cs on cs.id = c.id
)
select *
from a
where recordid=max and status = 'active';
id | fname | lname | create_date | recordid | id | status | create_date | max
----+-------+----------+-------------+----------+----+--------+-------------+-----
1 | MARK | JOHNSON | 2017-04-27 | 2 | 1 | active | 2017-04-27 | 2
3 | MAY | ERICKSON | 2017-04-27 | 7 | 3 | active | 2017-04-27 | 7
(2 rows)
Time: 0.450 ms
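For what it's worth, the same "latest status per customer" logic is often written with Postgres's DISTINCT ON, which avoids the window function. A sketch against the schema above:
SELECT c.id, c.fNAME, c.lNAME, cs.status
FROM Customer c
JOIN (
    -- keep only the newest status row per customer
    SELECT DISTINCT ON (ID) ID, status
    FROM CustomerStatus
    ORDER BY ID, recordID DESC
) cs ON cs.ID = c.ID
WHERE cs.status = 'active';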

PostgreSQL: duplicate key value violates unique constraint on UPDATE command

When doing an UPDATE query, we got the following error message:
ERROR: duplicate key value violates unique constraint "tableA_pkey"
DETAIL: Key (id)=(47470) already exists.
However, our UPDATE query does not affect the primary key. Here is a simplified version:
UPDATE tableA AS a
SET
items = (
SELECT array_to_string(
array(
SELECT b.value
FROM tableB b
WHERE b.a_id = a.id
GROUP BY b.name
),
','
)
)
WHERE
a.end_at BETWEEN now() - interval '1 day' AND now();
We ensured the primary key sequence was already synced:
\d tableA_id_seq
Which produces:
Column | Type | Value
---------------+---------+--------------------------
sequence_name | name | tableA_id_seq
last_value | bigint | 50364
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 0
is_cycled | boolean | f
is_called | boolean | t
Looking up the maximum id in the table:
select max(id) from tableA;
We got a lower value:
max
-------
50363
(1 row)
Do you have any idea why this behaves this way? If we exclude the problematic id, it works.
Another strange point is that replacing the previous UPDATE with:
UPDATE tableA AS a
SET
items = (
SELECT array_to_string(
array(
SELECT b.value
FROM tableB b
WHERE b.a_id = a.id
GROUP BY b.name
),
','
)
)
WHERE a.id = 47470;
It works well. Are we missing something?
EDIT: triggers
I have no user-defined triggers on this table:
SELECT t.tgname, c.relname
FROM pg_trigger t
JOIN pg_class c ON t.tgrelid = c.oid
WHERE
c.relname = 'tableA'
AND
t.tgisinternal = false
;
Which returns no row.
Note: I am using PostgreSQL 9.3.4.
Not really sure what the cause was. However, deleting the two (non-vital) records that corresponded to already existing ids solved the issue.
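For anyone hitting the same thing: if two rows with the same id were visible at all, the unique index itself was likely corrupted (the heap disagreed with the index). A quick check, and the usual repair once the duplicates are removed (a sketch, using the table from the question):
-- list ids that appear more than once despite the primary key;
-- if the index is corrupted, a sequential scan may be needed to see them
SELECT id, count(*)
FROM tableA
GROUP BY id
HAVING count(*) > 1;
-- after deleting the duplicate rows, rebuild the index
REINDEX INDEX "tableA_pkey";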