How to avoid unnecessary updates when using ON CONFLICT with Postgres?

My use case involves syncing a table with an upstream source on a recurring schedule.
Each row has a unique identifier and other columns, and I want to make sure I'm inserting any new upstream rows, and updating any changed upstream rows. And there could be thousands of rows to sync.
But I'd like to avoid unnecessary updates where the row in the database doesn't differ from what's upstream.
Currently I'm using ON CONFLICT DO UPDATE like so:
INSERT INTO symbols (id, name, status)
VALUES
(1, 'one', 'online'),
(2, 'two', 'offline'),
...
ON CONFLICT (id) DO
UPDATE SET (id, name, status) = (excluded.id, excluded.name, excluded.status)
RETURNING *
But this will write the updates even when nothing has changed. How should I tweak the UPDATE so it only checks and touches the rows that actually need it?

You can add a where clause to only update those rows that are different.
INSERT INTO symbols (id, name, status)
VALUES
(1, 'one', 'online'),
(2, 'two', 'offline'),
...
ON CONFLICT (id) DO
UPDATE SET (id, name, status) = (excluded.id, excluded.name, excluded.status)
WHERE (symbols.id, symbols.name, symbols.status) IS DISTINCT FROM (excluded.id, excluded.name, excluded.status)
RETURNING *
However, this will only return the rows that are actually updated, which may impact how you use the returning clause.
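If you also need to tell which of the returned rows were freshly inserted versus updated, one common trick is the system column xmax, which is 0 for rows created by the current statement. This is a sketch relying on an implementation detail of Postgres row versioning, so treat it as a diagnostic aid rather than a guaranteed API; note that rows skipped by the WHERE clause are still not returned:
INSERT INTO symbols (id, name, status)
VALUES
(1, 'one', 'online'),
(2, 'two', 'offline')
ON CONFLICT (id) DO
UPDATE SET (id, name, status) = (excluded.id, excluded.name, excluded.status)
WHERE (symbols.id, symbols.name, symbols.status) IS DISTINCT FROM (excluded.id, excluded.name, excluded.status)
-- xmax = 0 marks rows inserted by this statement; anything else was updated
RETURNING *, (xmax = 0) AS inserted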

Related

Trivial change in SQL query causes increased network receive throughput and commit count

Postgres 11.12 on Amazon RDS
4 billion rows in table
1000 new rows a second, inserted in batches of 100
Some time ago I added a new column
ALTER TABLE my_table
ADD COLUMN my_new_column BOOLEAN NULL;
This column was completely unused; nothing was ever explicitly inserted into it. However, when I changed the INSERT query to explicitly set the value of this column to null:
inserting new rows in batches from the client becomes 5x slower
network receive throughput grows around 5 times
commit count grows around 3 times
Any idea what could be causing this? Example below:
Before:
INSERT INTO my_table (column1, column2, column3)
VALUES ('value1', 'value2', 'value3')
ON CONFLICT (column1, column2)
DO UPDATE
SET column3 = excluded.column3
After:
INSERT INTO my_table (column1, column2, column3, my_new_column)
VALUES ('value1', 'value2', 'value3', null)
ON CONFLICT (column1, column2)
DO UPDATE
SET column3 = excluded.column3
Thanks to the tip from #jjanes I managed to fix the issue. The bottleneck was somewhere around the driver (Spring JDBC). It looks like the type of that null value could not be resolved, which caused all this overhead. Once I explicitly declared the type, the issue was fixed. Example below:
Before:
val queryParams = arrayOf(mapOf<String, Any?>("myNewColumn" to null)) // untyped null: the driver must resolve its SQL type
jdbcTemplate.batchUpdate(query, queryParams)
After:
import java.sql.Types
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource

val queryParams = arrayOf(MapSqlParameterSource().apply {
    // Declaring the SQL type up front avoids per-statement type resolution
    addValue("myNewColumn", null, Types.BOOLEAN)
})
jdbcTemplate.batchUpdate(query, queryParams)
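An alternative that keeps the driver code untouched is to give the null a type on the SQL side. This is a minimal sketch, assuming the named-parameter placeholder style from the Spring example above (with plain JDBC the placeholders would be ?); the cast makes the parameter's type explicit so the driver no longer has to resolve it:
-- CAST gives the otherwise untyped null a concrete type (boolean)
INSERT INTO my_table (column1, column2, column3, my_new_column)
VALUES (:column1, :column2, :column3, CAST(:myNewColumn AS boolean))
ON CONFLICT (column1, column2)
DO UPDATE
SET column3 = excluded.column3;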

MyBatis: insert with manually assigned PK

I am trying to insert a single row into a table, assigning the PK manually.
XML file
<insert id = "insertStd" parameterType = "com.org.springboot.dao.StudentEntity" useGeneratedKeys = "false" keyProperty = "insertStd.id" keyColumn = "id">
INSERT INTO STUDENT (ID, NAME, BRANCH, PERCENTAGE, PHONE, EMAIL )
VALUES (ID=#{insertStd.id}, NAME=#{insertStd.name}, BRANCH=#{insertStd.branch}, PERCENTAGE=#{insertStd.percentage}, PHONE=#{insertStd.phone}, EMAIL =#{insertStd.email});
</insert>
Service call method
public boolean saveStudent(Student student) {
    LOGGER.info("Student object save");
    int savedId = studentMapper.insertStd(student); // returns the affected row count
    return savedId == 1;
}
Log file
org.springframework.jdbc.BadSqlGrammarException
### Error updating database. Cause: org.postgresql.util.PSQLException: ERROR: column "id" does not exist
HINT: There is a column named "id" in the table "student", but it cannot be referenced from this part of the query.
Position: 200
### The error may exist in file [c:\.....\StudentMapper.xml]
### The error may involve com.org.springboot.dao.StudentMapper.insertStd-Inline
### The error occurred while setting parameters
### SQL: INSERT INTO STUDENT (ID, NAME, BRANCH, PERCENTAGE, PHONE, EMAIL )
VALUES (ID=?, NAME=?, BRANCH=?, PERCENTAGE=?, PHONE=?, EMAIL=?);
### Cause: org.postgresql.util.PSQLException: ERROR: column "id" does not exist
(It did work with JPA when the id was assigned manually.)
The INSERT statement is malformed: the VALUES clause should not include the column names.
Also, since there is no primary key auto-generation, you can remove all the other attributes; just leave the mapper id.
Note: if you want to manually assign the PK value, you need to make sure the column is not declared with a GENERATED ALWAYS clause. In Postgres, an identity column defined that way will reject an explicitly supplied value unless the insert says OVERRIDING SYSTEM VALUE (see the sketch at the end of this answer).
Use:
<insert id="insertStd">
INSERT INTO STUDENT (ID, NAME, BRANCH, PERCENTAGE, PHONE, EMAIL)
VALUES (
#{insertStd.id}, #{insertStd.name}, #{insertStd.branch},
#{insertStd.percentage}, #{insertStd.phone}, #{insertStd.email}
);
</insert>
Your error is easily reproducible:
create table t (a int, b varchar(10));
insert into t (a, b) values (123, 'ABC'); -- succeeds
insert into t (a, b) values (a=123, b='ABC'); -- fails!
error: column "a" does not exist
Inside a VALUES list, a=123 is parsed as an equality comparison against a column named a; column references are not allowed there, hence the error.
See the Fiddle.
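Regarding the GENERATED ALWAYS note above, here is a minimal sketch (using a hypothetical demo table, not the original STUDENT schema) of how Postgres reacts to an explicit value for such a column:
CREATE TABLE student_demo (
    id   int GENERATED ALWAYS AS IDENTITY,
    name text
);

-- Rejected: the column is GENERATED ALWAYS, so Postgres refuses the explicit id
INSERT INTO student_demo (id, name) VALUES (1, 'Alice');

-- Accepted: OVERRIDING SYSTEM VALUE forces the supplied id to be used
INSERT INTO student_demo (id, name) OVERRIDING SYSTEM VALUE VALUES (1, 'Alice');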

Batch INSERT on multiple queries is throwing foreign key violation

I am following this to do a batch INSERT with two queries. The first query inserts into <tableone> and the second query inserts into <tabletwo>.
The second table has a foreign key constraint that references <tableone>.
The following code is how I am handling the batch inserts
batchQuery.push(
insertTableOne,
insertTableTwo
);
const query = pgp.helpers.concat(batchQuery);
db.none(query)
insertTableOne looks like
INSERT INTO tableone (id, att2, att3) VALUES
(1, 'a', 'b'), (2, 'c', 'd'), (3, 'e', 'f'), ...
insertTableTwo looks like
INSERT INTO tabletwo (id, tableone_id) VALUES
(10, 1), (20, 2), (30, 3), ...
with a constraint on <tabletwo>
CONSTRAINT fk_tabletwo_tableone_id
FOREIGN KEY (tableone_id)
REFERENCES Tableone (id)
Upon db.none(query) I am getting a violates foreign key constraint "fk_tabletwo_tableone_id" error.
Does the above query not execute in sequence? First insert into table one, then insert into table two?
Is this an issue with how the query is being committed? I have also tried using a transaction, as shown by the example in the linked page above.
Any thoughts?
If you read through to the documentation for the spex.batch() method (which is what the pgp.helpers.concat() method from your linked example uses), it says of the values argument:
Array of mixed values (it can be empty), to be resolved
asynchronously, in no particular order.
See http://vitaly-t.github.io/spex/global.html#batch
You probably need to look at another method rather than using batch().
I'd suggest chaining the dependent query with a .then() after the first insert has completed, i.e. something like db.none(insertTableOne).then(() => db.none(insertTableTwo)).
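If you would rather make the ordering explicit in SQL, the equivalent plain-SQL flow is to run both statements inside one transaction. A minimal sketch, using the table and column names from the question:
-- Inside a single transaction the statements run strictly in order, so the
-- parent rows in tableone are visible to, and committed atomically with,
-- the child insert into tabletwo.
BEGIN;

INSERT INTO tableone (id, att2, att3) VALUES
(1, 'a', 'b'), (2, 'c', 'd'), (3, 'e', 'f');

INSERT INTO tabletwo (id, tableone_id) VALUES
(10, 1), (20, 2), (30, 3);

COMMIT;
pg-promise's db.tx() gives you the same guarantee from JavaScript, wrapping the queries issued inside its callback in a single BEGIN/COMMIT.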

How do I add a constraint with a where clause in PostgreSQL?

I have a table with reservations. A reservation is made of a date range and a time range. Reservations also belong to a couple of other models. I would like to add a constraint that makes it impossible for a reservation to happen for overlapping times.
I have this:
CREATE TABLE reservations (
id integer NOT NULL,
dates daterange,
times timerange,
desk_id integer NOT NULL,
space_id integer
);
ALTER TABLE reservations ADD EXCLUDE USING gist (dates WITH &&, times WITH &&);
It works well. But I want this constraint to be scoped to desk_id and space_id.
It should be possible to save a record for overlapping times/dates when this record is about different desk_id or space_id.
How can I do this?
You can use the exact same mechanism you were already using, but with desk_id and space_id added to the exclusion. For those two columns, instead of the && operator (meaning overlaps), use the = operator:
ALTER TABLE reservations
ADD EXCLUDE
USING gist (desk_id WITH =, space_id WITH =, dates WITH &&, times WITH &&) ;
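Two pieces of setup are assumed by both the question and this answer, so here they are as a sketch: timerange is not a built-in type and must be created, and plain integer columns need the btree_gist extension before they can appear in a GiST exclusion constraint with the = operator.
-- Required so that desk_id and space_id (plain integers) can use = inside
-- a GiST exclusion constraint
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- timerange is not built in; define it as a custom range type over time
CREATE TYPE timerange AS RANGE (
    subtype = time
);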
These inserts will work, because they involve two different desk_id values:
INSERT INTO
reservations
(id, dates, times, desk_id, space_id)
VALUES
(1, '[20170101,20170101]'::daterange, '[10:00,11:00]'::timerange, 10, 10),
(2, '[20170101,20170101]'::daterange, '[10:30,11:00]'::timerange, 20, 10) ;
This insert will fail, because it has a time-range overlap with the first row, plus the same desk_id and space_id:
INSERT INTO
reservations
(id, dates, times, desk_id, space_id)
VALUES
(3, '[20170101,20170101]'::daterange, '[10:00,11:00]'::timerange, 10, 10) ;

PostgreSQL Upsert (On Conflict) with same values in Insert and Update

Can I simplify the syntax when I use the same values in an INSERT ... ON CONFLICT statement?
INSERT INTO cars
(car_id, car_type, car_model)
values
(1, 'tesla', 'model s')
ON CONFLICT (car_id) DO UPDATE SET
car_type = 'tesla',
car_model = 'model s';
There are many more statements of this kind because they are part of a script that gets run on every application update.
Basically I am looking for a way to not specify the same values twice.
Use the excluded keyword:
INSERT INTO cars
(car_id, car_type, car_model)
values
(1, 'tesla', 'model s')
ON CONFLICT (car_id) DO UPDATE SET
car_type = excluded.car_type,
car_model = excluded.car_model;
This also works correctly with multiple rows, e.g.:
INSERT INTO cars
(car_id, car_type, car_model)
values
(1, 'tesla', 'model s'),
(2, 'toyota', 'prius')
ON CONFLICT (car_id) DO UPDATE SET
car_type = excluded.car_type,
car_model = excluded.car_model;
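Tying this back to the first question above: excluded also combines naturally with a WHERE clause when you want to skip no-op updates. A sketch merging both techniques:
-- excluded avoids repeating the values, and the WHERE clause skips rows
-- whose stored values already match the incoming ones
INSERT INTO cars
(car_id, car_type, car_model)
values
(1, 'tesla', 'model s'),
(2, 'toyota', 'prius')
ON CONFLICT (car_id) DO UPDATE SET
car_type = excluded.car_type,
car_model = excluded.car_model
WHERE (cars.car_type, cars.car_model)
IS DISTINCT FROM (excluded.car_type, excluded.car_model);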