PostgreSQL Upsert (On Conflict) with same values in Insert and Update - postgresql

Can I simplify the syntax when I use the same values in an INSERT ... ON CONFLICT statement?
INSERT INTO cars
(car_id, car_type, car_model)
values
(1, 'tesla', 'model s')
ON CONFLICT (car_id) DO UPDATE SET
car_type = 'tesla',
car_model = 'model s';
There are many more statements of this kind because they are part of a script that gets run on every application update.
Basically, I am looking for a way to avoid specifying the same values twice.

Use the excluded pseudo-table, which holds the row that was proposed for insertion:
INSERT INTO cars
(car_id, car_type, car_model)
values
(1, 'tesla', 'model s')
ON CONFLICT (car_id) DO UPDATE SET
car_type = excluded.car_type,
car_model = excluded.car_model;
This also works correctly with multiple rows, e.g.:
INSERT INTO cars
(car_id, car_type, car_model)
values
(1, 'tesla', 'model s'),
(2, 'toyota', 'prius')
ON CONFLICT (car_id) DO UPDATE SET
car_type = excluded.car_type,
car_model = excluded.car_model;
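Note that ON CONFLICT (car_id) only works if car_id is covered by a unique constraint or unique index. A minimal table definition for the examples above might look like this (assumed here, since the question does not show the DDL):
CREATE TABLE IF NOT EXISTS cars (
car_id integer PRIMARY KEY, -- the conflict target must be unique
car_type text,
car_model text
);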

Related

Optimize a delete from a big table

How can I optimize deleting data from a PostgreSQL table?
I have a table like this:
CREATE TABLE IF NOT EXISTS test (
"group" varchar(255),
id varchar(255),
type varchar(255)
);
INSERT INTO test
("group", id, type)
VALUES
('1', 'qw', 'START'),
('1', 'er', 'PROCESS'),
('1', 'ty', 'FINISH');
INSERT INTO test
("group", id, type)
VALUES
('2', 'as', 'START'),
('2', 'df', 'PROCESS'),
('2', 'fg', 'ERROR');
INSERT INTO test
("group", id, type)
VALUES
('3', 'zx', 'START'),
('3', 'cv', 'PROCESS'),
('3', 'ty', 'ERROR');
INSERT INTO test
("group", id, type)
VALUES
('4', 'df', 'START'),
('4', 'gh', 'PROCESS'),
('4', 'fg', 'ERROR'),
('4', 'ty', 'FINISH');
 group | id | type
-------+----+---------
 1     | qw | START
 1     | er | PROCESS
 1     | ty | FINISH
 2     | as | START
 2     | df | PROCESS
 2     | fg | ERROR
 3     | zx | START
 3     | cv | PROCESS
 3     | ty | ERROR
 4     | df | START
 4     | gh | PROCESS
 4     | fg | ERROR
 4     | ty | FINISH
Each chain of operations shares the same value in the group column.
Not every chain reaches the end: some have no row with type FINISH, but do have a row with type ERROR, like the chains with group 2 and 3.
The table is about 1 terabyte in size.
I want to delete all chains of operations that did not end with the FINISH status. What is the best way to optimize this?
My code looks like this:
delete from TEST for_delete
where
for_delete."group" in (
select "group" from TEST error
where
error.type='ERROR'
and
error."group" NOT IN (select "group" from TEST where type='FINISH')
);
But for a table of this size I think it will be terribly slow. Can I somehow improve my query?
Very often EXISTS conditions are faster than IN conditions, and NOT EXISTS is almost always faster than NOT IN, so you could try something like this:
delete from test t1
where exists (select *
from test t2
where t2."group" = t1."group"
and t2."type" = 'ERROR'
and not exists (select
from test t3
where t3."group" = t2."group"
and t3."type" = 'FINISH'));
Typically, in a case like this, you could use a materialized view, or a helper table that plays the same role.
You can create a table in which you save all the ids that need to be deleted, and keep it in sync using triggers. For example:
CREATE TABLE IF NOT EXISTS test_MV (
id VARCHAR(255) PRIMARY KEY
);
Since you know the system and the data you are working with, you could also decide to keep the table in sync with a scheduled event instead of triggers.
With such a helper table, you can delete the rows in an easier and faster way:
delete from TEST for_delete
where
for_delete.id in (
select id from test_MV
);
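For illustration, here is a minimal sketch of what the trigger-based sync could look like. The function and trigger names are made up, and it assumes id is unique table-wide (as the PRIMARY KEY on test_MV implies) and that a group's ERROR/FINISH row arrives after its other rows:
CREATE OR REPLACE FUNCTION test_mv_sync() RETURNS trigger AS $$
BEGIN
    IF NEW."type" = 'ERROR' THEN
        -- remember every row of a group that has hit ERROR
        INSERT INTO test_MV (id)
        SELECT t.id FROM test t WHERE t."group" = NEW."group"
        ON CONFLICT (id) DO NOTHING;
    ELSIF NEW."type" = 'FINISH' THEN
        -- the group did finish after all: forget its rows again
        DELETE FROM test_MV m
        USING test t
        WHERE m.id = t.id AND t."group" = NEW."group";
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- PostgreSQL 11+; use EXECUTE PROCEDURE on older versions
CREATE TRIGGER test_mv_sync_trg
AFTER INSERT ON test
FOR EACH ROW EXECUTE FUNCTION test_mv_sync();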

How to avoid unnecessary updates when using on conflict with Postgres?

My use case involves syncing a table with an upstream source on a recurring schedule.
Each row has a unique identifier and other columns, and I want to make sure I'm inserting any new upstream rows, and updating any changed upstream rows. And there could be thousands of rows to sync.
But I'd like to avoid unnecessary updates where the row in the database doesn't differ from what's upstream.
Currently I'm using ON CONFLICT UPDATE like so:
INSERT INTO symbols (id, name, status)
VALUES
(1, 'one', 'online'),
(2, 'two', 'offline'),
...
ON CONFLICT (id) DO
UPDATE SET (id, name, status) = (excluded.id, excluded.name, excluded.status)
RETURNING *
But this will write the updates even when nothing has changed. How should I tweak the UPDATE so it efficiently applies only to the rows that actually need it?
You can add a WHERE clause so that only the rows whose values actually differ are updated.
INSERT INTO symbols (id, name, status)
VALUES
(1, 'one', 'online'),
(2, 'two', 'offline'),
...
ON CONFLICT (id) DO
UPDATE SET (id, name, status) = (excluded.id, excluded.name, excluded.status)
WHERE (symbols.id, symbols.name, symbols.status) IS DISTINCT FROM (excluded.id, excluded.name, excluded.status)
RETURNING *
However, this will only return the rows that are actually updated, which may impact how you use the returning clause.
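If you do need every input row back, changed or not, one possible workaround (a sketch, not part of the original answer) is to wrap the upsert in a data-modifying CTE and combine its RETURNING rows with the rows that were skipped:
WITH upserted AS (
    INSERT INTO symbols (id, name, status)
    VALUES
        (1, 'one', 'online'),
        (2, 'two', 'offline')
    ON CONFLICT (id) DO
    UPDATE SET (id, name, status) = (excluded.id, excluded.name, excluded.status)
    WHERE (symbols.id, symbols.name, symbols.status) IS DISTINCT FROM (excluded.id, excluded.name, excluded.status)
    RETURNING *
)
SELECT * FROM upserted
UNION ALL
-- rows skipped by the WHERE clause: they were not modified, so the
-- pre-statement snapshot the outer query sees is still their current state
SELECT s.*
FROM symbols s
WHERE s.id IN (1, 2)  -- the same ids as in the VALUES list
  AND NOT EXISTS (SELECT 1 FROM upserted u WHERE u.id = s.id);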

Is it possible to bulk update specific values in postgresql efficiently?

I have created a pipeline which needs to update a large number of rows in Postgres, where each row should be updated with different values.
After looking up I found that this could be done using postgres UPDATE.. FROM.. syntax (https://www.postgresql.org/docs/current/sql-update.html) and I came up with the following query that works perfectly fine:
update grades
set course_id = data_table.course_id,
student_id = data_table.student_id,
grade = data_table.grade
from
(select unnest(array[1,2]) as id, unnest(array['Math', 'Math']) as course_id, unnest(array[1000, 1001]) as student_id, unnest(array[95, 100]) as grade) as data_table
where grades.id = data_table.id;
There's also another way to do it with WITH syntax like this:
update grades
set course_id = data_table.course_id,
student_id = data_table.student_id,
grade = data_table.grade
from
(WITH vals (id, course_id, student_id, grade) as (VALUES (1, 'Math', 1000, 95), (2, 'Math', 1001, 100)) SELECT * from vals) as data_table
where grades.id = data_table.id;
My problem is that sometimes I want to update a field in some rows and sometimes not. When I don't want to update it, I just want to keep the value that is currently in the table. In that case, I would want to do something like:
update grades g
set course_id = data_table.course_id,
student_id = data_table.student_id,
grade = data_table.grade
from
(select unnest(array[1,2]) as id, unnest(array[g.course_id, 'Math2']) as course_id, unnest(array[1000, 1001]) as student_id, unnest(array[95, g.grade]) as grade) as data_table
where grades.id = data_table.id;
However, this is not possible, and I get the error HINT: There is an entry for table "g", but it cannot be referenced from this part of the query.
The PostgreSQL documentation also notes this in the description of FROM:
Note that the target table must not appear in the from_list,
unless you intend a self-join (in which case it must appear with an alias in the from_list).
Does anyone know if there's a way to perform such a bulk update?
I've tried to use JOINs in the inner query, but with no luck.
Choose a value that can never be a valid value, e.g. '-1' for the course name and -1 for a grade, use that for your generated values, and then use a CASE in the UPDATE to decide whether to keep the current value or not:
update grades g
set course_id = case when data_table.course_id = '-1' then g.course_id else data_table.course_id end,
student_id = data_table.student_id,
grade = case when data_table.grade = -1 then g.grade else data_table.grade end
from (
select
unnest(array[1,2]) as id,
unnest(array['-1', 'Math2']) as course_id, -- use '-1' instead of g.course_id
unnest(array[1000, 1001]) as student_id,
unnest(array[95, -1]) as grade -- use -1 instead of g.grade
) as data_table
where g.id = data_table.id;
Pick whatever values you like for the impossible value.
If NULLs were not allowed as real data, it would be more straightforward and less code: use NULL as the impossible value and coalesce() for the update expression.
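For illustration, a sketch of that NULL/coalesce() variant (only valid when NULL can never be a value you actually want to write):
update grades g
set course_id = coalesce(data_table.course_id, g.course_id),
student_id = data_table.student_id,
grade = coalesce(data_table.grade, g.grade)
from (
select
unnest(array[1, 2]) as id,
unnest(array[null, 'Math2']) as course_id, -- null means "keep the current value"
unnest(array[1000, 1001]) as student_id,
unnest(array[95, null]) as grade -- null means "keep the current value"
) as data_table
where g.id = data_table.id;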

How to insert into a table using multiple rows returned from a subquery?

How exactly do I get this to work? I am trying to form a query that takes the primary keys generated by the first query and inserts them into the second table along with a static second value (33). I am getting a "more than one row returned by a subquery used as an expression" error. I've googled my eyeballs out and can't figure out this issue. Maybe there's a better way to do what I am trying to do.
I am using Postgresql 9.5 if that matters.
WITH x AS (INSERT INTO OPTIONS (manufacturer_id, category, name, description)
VALUES (
UNNEST(ARRAY['10', '22', '33']),
'ExtColor',
UNNEST(ARRAY['EC', 'IC', 'IO']),
UNNEST(ARRAY['a', 'b', 'c'])
)
RETURNING option_id)
INSERT INTO opt_car_data (car_id, option_id) VALUES ((SELECT option_id FROM x), 33);
Select from the CTE instead of using it as a scalar subquery, so each returned option_id becomes its own row in the second INSERT:
WITH x AS (
INSERT INTO options (manufacturer_id, category, name, description)
VALUES (
UNNEST(ARRAY['10', '22', '33']),
'ExtColor',
UNNEST(ARRAY['EC', 'IC', 'IO']),
UNNEST(ARRAY['a', 'b', 'c'])
)
RETURNING option_id
)
INSERT INTO opt_car_data (car_id, option_id)
SELECT option_id, 33
FROM x;

Most efficient way to do a bulk UPDATE with pairs of input

Suppose I want to do a bulk update, setting a different new value for each row, given a collection of (id, value) pairs. This can easily be done with a sequence of UPDATE queries:
UPDATE foo SET value='foo' WHERE id=1
UPDATE foo SET value='bar' WHERE id=2
UPDATE foo SET value='baz' WHERE id=3
But now I suppose I want to do this in bulk. I have a two dimensional array containing the ids and new values:
[ [ 1, 'foo' ]
[ 2, 'bar' ]
[ 3, 'baz' ] ]
Is there an efficient way to do these three UPDATEs in a single SQL query?
Some solutions I have considered:
A temporary table
CREATE TABLE temp ...;
INSERT INTO temp (id,value) VALUES (....);
UPDATE foo SET ... FROM temp ...;
But this really just moves the problem. Although it may be easier (or at least less ugly) to do a bulk INSERT, there are still a minimum of three queries.
Denormalize the input by passing the data pairs as SQL arrays. This makes the query incredibly ugly, though
UPDATE foo
SET value = d.value
FROM (
SELECT
split_part(x,',',1)::INT AS id,
split_part(x,',',2)::VARCHAR AS value
FROM (
SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
) AS x
) AS d
WHERE foo.id = d.id;
This makes it possible to use a single query, but makes that query ugly, and inefficient (especially for mixed and/or complex data types).
Is there a better solution? Or should I resort to multiple UPDATE queries?
Normally you want to batch-update from a table with a suitable index to make the merge easy:
CREATE TEMP TABLE updates_table
( id integer not null primary key
, val varchar
);
INSERT into updates_table(id, val) VALUES
( 1, 'foo' ) ,( 2, 'bar' ) ,( 3, 'baz' )
;
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
;
So you should probably populate your updates_table with something like:
INSERT into updates_table(id, val)
SELECT
split_part(x,',',1)::INT AS id,
split_part(x,',',2)::VARCHAR AS value
FROM (
SELECT UNNEST(ARRAY['1,foo','2,bar','3,baz']) AS x
) AS x
;
Remember: an index (or the primary key) on the id field in the updates_table is important (but for small sets like this one, a hash join will probably be chosen by the optimiser).
In addition: for updates, it is important to avoid updating rows with values that have not changed, since such updates create extra row versions, plus the resulting VACUUM activity after the update is committed:
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
AND (t.value IS NULL OR t.value <> u.val)
;
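The same guard can be written with IS DISTINCT FROM, which also handles a NULL on the updates_table side (a shorter equivalent of the condition above):
UPDATE target_table t
SET value = u.val
FROM updates_table u
WHERE t.id = u.id
AND t.value IS DISTINCT FROM u.val
;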
You can use a CASE conditional expression:
UPDATE foo
SET "value" = CASE id
WHEN 1 THEN 'foo'
WHEN 2 THEN 'bar'
WHEN 3 THEN 'baz'
END
WHERE id IN (1, 2, 3); -- without this WHERE clause, every other row's value would be set to NULL