Liquibase insert select where not exists - postgresql

I want to insert multiple rows from table2 into table1. The problem is that table1 and table2 share a field of the same name, and I don't want to insert a row if table1 already has a record with the same value in that field. Right now I have something like this:
insert into table1 (id, sameField, constantField, superFied)
select gen_random_uuid(), "sameField", 'constant', "anotherField"
from table2;
And I assume I need to do something like this:
insert into table1 (id, sameField, constantField, superFied)
select gen_random_uuid(), "sameField", 'constant', "anotherField"
from table2
where not exists ... ?
What do I need to write instead of ? if I want this logic: when selecting sameField from table2, check whether table1 already contains the same value in sameField. The DBMS is Postgres.

You can use a sub-query to see whether the record exists. You will need to define the column(s) which should be unique.
create table table2(
id varchar(100),
sameField varchar(25),
constant varchar(25),
superField varchar(25)
);
insert into table2 values
(gen_random_uuid(),'same1','constant1','super1'),
(gen_random_uuid(),'same2','constant2','super2');
2 rows affected
create table table1(
id varchar(100),
sameField varchar(25),
constant varchar(25),
superField varchar(25)
);
insert into table1 values
(gen_random_uuid(),'same1','constant1','super1');
1 row affected
insert into table1 (id, sameField, constant, superField)
select uuid_in(md5(random()::text || clock_timestamp()::text)::cstring),
t2.sameField, 'constant', t2.superField
from table2 t2
where sameField not in (select sameField from table1);
1 row affected
select * from table1;
id                                   | samefield | constant  | superfield
:----------------------------------- | :-------- | :-------- | :---------
4cf10b1c-7a3f-4323-9a16-cce681fcd6d8 | same1     | constant1 | super1
d8cf27a0-3f55-da50-c274-c4a76c697b84 | same2     | constant  | super2
select * from table2;
id                                   | samefield | constant  | superfield
:----------------------------------- | :-------- | :-------- | :---------
c8a83804-9f0b-4d97-8049-51c2c8c54665 | same1     | constant1 | super1
3a9cf8b5-8488-4278-a06a-fd75fa74e206 | same2     | constant2 | super2
db<>fiddle here
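Two side notes. First, NOT IN inserts nothing at all if table1.sameField contains a NULL, whereas the NOT EXISTS form the question asked about is null-safe. Second, since this is for Liquibase, the statement can live in a formatted-SQL changelog. A sketch combining both (the changeset author and id are placeholders):
--liquibase formatted sql
--changeset your-name:insert-missing-rows-from-table2
insert into table1 (id, sameField, constant, superField)
select gen_random_uuid(), t2.sameField, 'constant', t2.superField
from table2 t2
where not exists (
    -- skip rows whose sameField is already present in table1
    select 1
    from table1 t1
    where t1.sameField = t2.sameField
);
Note that gen_random_uuid() is built in from PostgreSQL 13 onward; on older versions it needs the pgcrypto extension, hence the md5() workaround in the fiddle above.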

Related

Optimizing Postgres Count For Aggregated Select

I have a query that is intended to retrieve the counts of each grouped product, like so:
SELECT
product_name,
product_color,
(array_agg("product_distributor"))[1] AS "product_distributor",
(array_agg("product_release"))[1] AS "product_release",
COUNT(*) AS "count"
FROM
product
WHERE
product.id IN (
SELECT
id
FROM
product
WHERE
(product_name ilike '%red%'
OR product_color ilike '%red%')
AND product_type = 1)
GROUP BY
product_name, product_color
LIMIT
1000
OFFSET
0
This query is run on the following table
Column | Type | Collation | Nullable | Default
---------------------+--------------------------+-----------+----------+---------
product_type | integer | | not null |
id | integer | | not null |
product_name | citext | | not null |
product_color | character varying(255) | | |
product_distributor | integer | | |
product_release | timestamp with time zone | | |
created_at | timestamp with time zone | | not null |
updated_at | timestamp with time zone | | not null |
Indexes:
"product_pkey" PRIMARY KEY, btree (id)
"product_distributer_index" btree (product_distributor)
"product_product_type_name_color" UNIQUE, btree (product_type, name, color)
"product_product_type_index" btree (product_type)
"product_name_color_index" btree (name, color)
Foreign-key constraints:
"product_product_type_fkey" FOREIGN KEY (product_type) REFERENCES product_type(id) ON UPDATE CASCADE ON DELETE CASCADE
"product_product_distributor_id" FOREIGN KEY (product_distributor) REFERENCES product_distributor(id)
How can I improve the performance of this query, specifically the COUNT(*) portion? Removing it speeds the query up considerably, but the count is required.
You may try using an INNER JOIN in place of a WHERE ... IN clause.
WITH selected_products AS (
SELECT id
FROM product
WHERE (product_name ilike '%red%' OR product_color ilike '%red%')
AND product_type = 1
)
SELECT product_name,
product_color,
(ARRAY_AGG("product_distributor"))[1] AS "product_distributor",
(ARRAY_AGG("product_release"))[1] AS "product_release",
COUNT(*) AS "count"
FROM product p
INNER JOIN selected_products sp
ON p.id = sp.id
GROUP BY product_name,
product_color
LIMIT 1000
OFFSET 0
Then create an index on the "product.id" field as follows:
CREATE INDEX product_ids_idx ON product USING HASH (id);
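Whether the rewrite actually helps depends on the plan, so it is worth comparing EXPLAIN ANALYZE output before and after. A sketch, using the filter values from the question:
EXPLAIN (ANALYZE, BUFFERS)
SELECT product_name,
       product_color,
       COUNT(*) AS "count"
FROM product
WHERE (product_name ILIKE '%red%' OR product_color ILIKE '%red%')
  AND product_type = 1
GROUP BY product_name, product_color;
Note also that a leading-wildcard ILIKE ('%red%') cannot use the existing btree indexes; if the filter itself turns out to be the bottleneck, a trigram (pg_trgm) GIN index on product_name and product_color is the usual remedy.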

PostgreSQL add SERIAL column to existing table with values based on ORDER BY

I have a large table (6+ million rows) to which I'd like to add an auto-incrementing integer column sid, where sid is set on existing rows based on ORDER BY inserted_at ASC. In other words, the oldest record based on inserted_at would get sid = 1 and the latest record would get the total record count. Any tips on how I might approach this?
Add a sid column and UPDATE SET ... FROM ... WHERE:
UPDATE test
SET sid = t.rownum
FROM (SELECT id, row_number() OVER (ORDER BY inserted_at ASC) as rownum
FROM test) t
WHERE test.id = t.id;
Note that this relies on there being a primary key, id.
(If your table did not already have a primary key, you would have to make one first.)
For example,
-- create test table
DROP TABLE IF EXISTS test;
CREATE TABLE test (
id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
, foo text
, inserted_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO test (foo, inserted_at) VALUES
('XYZ', '2019-02-14 00:00:00-00')
, ('DEF', '2010-02-14 00:00:00-00')
, ('ABC', '2000-02-14 00:00:00-00');
-- +----+-----+------------------------+
-- | id | foo | inserted_at |
-- +----+-----+------------------------+
-- | 1 | XYZ | 2019-02-13 19:00:00-05 |
-- | 2 | DEF | 2010-02-13 19:00:00-05 |
-- | 3 | ABC | 2000-02-13 19:00:00-05 |
-- +----+-----+------------------------+
ALTER TABLE test ADD COLUMN sid INT;
UPDATE test
SET sid = t.rownum
FROM (SELECT id, row_number() OVER (ORDER BY inserted_at ASC) as rownum
FROM test) t
WHERE test.id = t.id;
yields
+----+-----+------------------------+-----+
| id | foo | inserted_at | sid |
+----+-----+------------------------+-----+
| 3 | ABC | 2000-02-13 19:00:00-05 | 1 |
| 2 | DEF | 2010-02-13 19:00:00-05 | 2 |
| 1 | XYZ | 2019-02-13 19:00:00-05 | 3 |
+----+-----+------------------------+-----+
Finally, make sid SERIAL (or, better, an IDENTITY column):
ALTER TABLE test ALTER COLUMN sid SET NOT NULL;
-- IDENTITY avoids certain issues which may arise with SERIAL
ALTER TABLE test ALTER COLUMN sid ADD GENERATED BY DEFAULT AS IDENTITY;
-- (note: there is no ALTER COLUMN ... SERIAL; SERIAL is only shorthand at table-creation time)
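One step the above leaves implicit: the new identity sequence starts at 1, so the very next insert would collide with the backfilled sid values. A sketch of bumping the sequence past the existing maximum (pg_get_serial_sequence works for identity columns despite its name):
-- move the sequence past the backfilled sids; the next generated value is max(sid) + 1
SELECT setval(pg_get_serial_sequence('test', 'sid'),
              (SELECT max(sid) FROM test));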

Select row position in filtered and ordered row list PostgreSQL

I got this query,
SELECT s.pos
FROM (SELECT t.guild_id, t.user_id,
ROW_NUMBER() OVER(ORDER BY t.reputation DESC) AS pos
FROM users t) s
WHERE (s.guild_id, s.user_id) = ($2, $3)
that gets a user's "rank" in a guild, but I want to filter the results by entries that are in an array of t.user_id values (like {'1', '64', '83'}) and have this affect the resulting pos value. I found FILTER and WITHIN GROUP, but I'm not sure how to fit one of those into this query. How would I do that?
Here's the full table if that helps at all:
Table "public.users"
Column | Type | Collation | Nullable | Default
------------+-----------------------+-----------+----------+---------
guild_id | character varying(20) | | not null |
user_id | character varying(20) | | not null |
reputation | real | | not null | 0
Indexes:
"users_pkey" PRIMARY KEY, btree (guild_id, user_id)
Why not select on those first?
WITH UsersWeCareAbout AS (
SELECT * FROM users u WHERE u.user_id = ANY(subgroup_array)
), RepUsers AS (
SELECT t.guild_id, t.user_id, ROW_NUMBER() OVER(ORDER BY t.reputation DESC) AS pos
FROM UsersWeCareAbout t
) SELECT s.pos FROM RepUsers s WHERE (s.guild_id, s.user_id) = ($2, $3)
(untested if only because I didn't really have enough context to test with)
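For a concrete invocation, subgroup_array would be bound as an array parameter; for example, with the literal from the question (a sketch of just the first CTE's filter):
-- filter to the question's example subgroup {'1', '64', '83'}
SELECT * FROM users u WHERE u.user_id = ANY(ARRAY['1', '64', '83']);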

PostgreSQL querying through schemas

I want a query that lists all Customers whose status is "active". My problem is that I am lost on querying tables that reference other tables. Here is my schema.
CREATE TABLE Customer (
ID BIGSERIAL PRIMARY KEY NOT NULL,
fNAME TEXT NOT NULL,
lNAME TEXT NOT NULL,
create_date DATE NOT NULL DEFAULT NOW()
);
CREATE TABLE CustomerStatus (
recordID BIGSERIAL NOT NULL,
ID BIGSERIAL REFERENCES Customer NOT NULL,
status TEXT NOT NULL,
create_date DATE NOT NULL DEFAULT NOW()
);
INSERT INTO Customer (fNAME, lNAME) VALUES ('MARK', 'JOHNSON'), ('ERICK', 'DAWN'), ('MAY', 'ERICKSON'), ('JESS', 'MARTIN');
INSERT INTO CustomerStatus (ID, status) VALUES (1, 'pending'), (1, 'active');
INSERT INTO CustomerStatus (ID, status) VALUES (2, 'pending'), (2, 'active'), (2, 'cancelled');
INSERT INTO CustomerStatus (ID, status) VALUES (3, 'pending'), (3, 'active');
INSERT INTO CustomerStatus (ID, status) VALUES (4, 'pending');
I ventured to assume that recordID is serial, so the latest status per customer is the row with the highest recordid, which produces this query:
t=# with a as (
select *, max(recordid) over (partition by cs.id)
from Customer c
join CustomerStatus cs on cs.id = c.id
)
select *
from a
where recordid=max and status = 'active';
id | fname | lname | create_date | recordid | id | status | create_date | max
----+-------+----------+-------------+----------+----+--------+-------------+-----
1 | MARK | JOHNSON | 2017-04-27 | 2 | 1 | active | 2017-04-27 | 2
3 | MAY | ERICKSON | 2017-04-27 | 7 | 3 | active | 2017-04-27 | 7
(2 rows)
Time: 0.450 ms
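For what it's worth, the same "latest status per customer" logic is often written with Postgres's DISTINCT ON, which avoids the window function. A sketch against the schema above:
SELECT c.id, c.fNAME, c.lNAME, cs.status
FROM Customer c
JOIN (
    -- keep only the newest status row per customer
    SELECT DISTINCT ON (ID) ID, status
    FROM CustomerStatus
    ORDER BY ID, recordID DESC
) cs ON cs.ID = c.ID
WHERE cs.status = 'active';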

PostgreSQL: duplicate key value violates unique constraint on UPDATE command

When doing an UPDATE query, we got the following error message:
ERROR: duplicate key value violates unique constraint "tableA_pkey"
DETAIL: Key (id)=(47470) already exists.
However, our UPDATE query does not affect the primary key. Here is a simplified version:
UPDATE tableA AS a
SET
items = (
SELECT array_to_string(
array(
SELECT b.value
FROM tableB b
WHERE b.a_id = a.id
GROUP BY b.name
),
','
)
)
WHERE
a.end_at BETWEEN now() - interval '1 day' AND now();
We ensured the primary key sequence was already synced:
\d tableA_id_seq
Which produces:
Column | Type | Value
---------------+---------+--------------------------
sequence_name | name | tableA_id_seq
last_value | bigint | 50364
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 0
is_cycled | boolean | f
is_called | boolean | t
Looking up the maximum id in the table:
select max(id) from tableA;
We got a lower value:
max
-------
50363
(1 row)
Do you have any idea why this behaves this way? If we exclude the problematic id, it works.
Another strange point is that replacing the previous UPDATE with:
UPDATE tableA AS a
SET
items = (
SELECT array_to_string(
array(
SELECT b.value
FROM tableB b
WHERE b.a_id = a.id
GROUP BY b.name
),
','
)
)
WHERE a.id = 47470;
It works well. Are we missing something?
EDIT: triggers
I have no user-defined triggers on this table:
SELECT t.tgname, c.relname
FROM pg_trigger t
JOIN pg_class c ON t.tgrelid = c.oid
WHERE
c.relname = 'tableA'
AND
t.tgisinternal = false
;
Which returns no row.
Note: I am using PostgreSQL 9.3.4.
Not really sure what the cause was. However, deleting the two (non-vital) records that corresponded to already existing ids solved the issue.
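For anyone hitting the same thing: if two rows with the same id were visible at all, the unique index itself was likely corrupted (the heap disagreed with the index). A quick check, and the usual repair once the duplicates are removed (a sketch, using the table from the question):
-- list ids that appear more than once despite the primary key;
-- if the index is corrupted, a sequential scan may be needed to see them
SELECT id, count(*)
FROM tableA
GROUP BY id
HAVING count(*) > 1;
-- after deleting the duplicate rows, rebuild the index
REINDEX INDEX "tableA_pkey";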