NiFi PutDataBaseRecord to Postgres with multi column Unique constraint

I have a Postgres database table with a unique constraint based on two columns. The name of the unique constraint is _text_uc.
How can I use the PutDataBaseRecord processor to do an UPSERT into the table? Here is the config for the processor:
The problem is that the resulting SQL fails. The error log shows that the processor generated the following INSERT statement:
INSERT INTO text(customer_id, document_id, creationdate, text, dataset_id)
VALUES ('Johndoe123', '1234', '2022-08-26 02:18:48.917+00', ' some random text', 1)
ON CONFLICT (ON CONSTRAINT _text_uc)
DO UPDATE SET (customer_id, document_id, creationdate, text, dataset_id) =
(EXCLUDED.customer_id, EXCLUDED.document_id, EXCLUDED.creationdate, EXCLUDED.text, EXCLUDED.dataset_id)
It should not have put brackets around the ON CONSTRAINT clause; the statement should have been as below (based on https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-upsert/):
INSERT INTO text(customer_id, document_id, creationdate, text, dataset_id)
VALUES ('Johndoe123', '1234', '2022-08-26 02:18:48.917+00', ' some random text', 1)
ON CONFLICT ON CONSTRAINT _text_uc
DO UPDATE SET (customer_id, document_id, creationdate, text, dataset_id) =
(EXCLUDED.customer_id, EXCLUDED.document_id, EXCLUDED.creationdate, EXCLUDED.text, EXCLUDED.dataset_id)
Any ideas on how I can solve this problem?
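One possible direction, offered as a sketch rather than a confirmed fix: the generated statement suggests that the processor wraps whatever it receives as the conflict target in parentheses, so it expects column names there, not an ON CONSTRAINT clause. Assuming _text_uc covers customer_id and document_id (the question does not say which two columns; substitute the real ones), listing those columns as the key columns, for example in the processor's Update Keys property, should produce a statement of the following shape. PostgreSQL accepts it because the column list matches the unique constraint.

```sql
-- Assumption: _text_uc is UNIQUE (customer_id, document_id).
-- The conflict target names the constrained columns instead of the constraint.
INSERT INTO text (customer_id, document_id, creationdate, text, dataset_id)
VALUES ('Johndoe123', '1234', '2022-08-26 02:18:48.917+00', ' some random text', 1)
ON CONFLICT (customer_id, document_id)
DO UPDATE SET (customer_id, document_id, creationdate, text, dataset_id) =
  (EXCLUDED.customer_id, EXCLUDED.document_id, EXCLUDED.creationdate,
   EXCLUDED.text, EXCLUDED.dataset_id);
```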

Related

Postgres upsert with composite unique key to allow only single null value

As part of an ETL process, the table continuous_trips receives a continuous flow of incoming records.
New records are aggregated and inserted into a temporary table called trips_agg every 5 minutes.
CREATE TABLE IF NOT EXISTS trips_agg AS (
SELECT start_time, station_id, from_station, to_station, from_terminus, end_terminus, previous_station, next_station,
AVG(wait_span) AS wait_span,
AVG(walk_span) AS walk_span,
AVG(delay_span) AS delay_span,
SUM(passengers_requests) AS passengers_requests
FROM continuous_trips
GROUP BY start_time, station_id, from_station, to_station, from_terminus, end_terminus, previous_station, next_station
)
The table trips_agg is dropped after all its records have been inserted into the table daily_trips, and is recreated during the next cycle.
The tables daily_trips and trips_agg have the same columns.
CREATE TABLE IF NOT EXISTS daily_trips (
start_time timestamp without time zone NOT NULL,
station_id text NOT NULL,
from_station text NOT NULL,
to_station text NOT NULL,
from_terminus text NOT NULL,
end_terminus text NOT NULL,
previous_station text,
next_station text,
wait_span interval NOT NULL,
walk_span interval NOT NULL,
delay_span interval NOT NULL,
passengers_requests numeric NOT NULL
)
Note: the columns 'previous_station' and 'next_station' allow NULL.
A composite unique key is added as follows:
ALTER TABLE daily_trips ADD CONSTRAINT daily_trips_unique_row UNIQUE
(start_time, station_id, from_station, to_station, from_terminus, end_terminus, previous_station, next_station);
If the unique key is violated on insertion, the record should be updated, so I used an upsert strategy:
INSERT INTO daily_trips SELECT * FROM trips_agg
ON CONFLICT (start_time, station_id, from_station, to_station, from_terminus, end_terminus,
previous_station, next_station) DO UPDATE
set wait_span = (daily_trips.wait_span + EXCLUDED.wait_span)/2,
walk_span = (daily_trips.walk_span + EXCLUDED.walk_span)/2 ,
delay_span = (daily_trips.delay_span + EXCLUDED.delay_span)/2,
passengers_requests =(daily_trips.passengers_requests + EXCLUDED.passengers_requests);
When values for all columns are present, this setup works perfectly, but that is not the case when any of the nullable columns has a NULL value.
Since Postgres does not treat NULL values as equal when checking a unique constraint, a new row is inserted instead of an update whenever any of the nullable columns is NULL. This results in multiple rows for the same unique key.
To overcome this, I added an index on the table daily_trips after referring to this article.
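The behaviour can be reproduced in isolation. The following is a minimal sketch using a hypothetical two-column table, null_demo, that is not part of the question; it assumes the default NULLS DISTINCT behaviour of PostgreSQL unique constraints.

```sql
-- NULL = NULL does not evaluate to true, so the unique check
-- never sees two NULLs as the same key.
SELECT NULL = NULL AS equal;                    -- NULL, not true
SELECT NULL IS NOT DISTINCT FROM NULL AS same;  -- true

-- Hypothetical table, for illustration only.
CREATE TABLE null_demo (a text NOT NULL, b text, UNIQUE (a, b));
INSERT INTO null_demo VALUES ('x', NULL);

-- No conflict is detected, so DO NOTHING is not triggered and the row is inserted.
INSERT INTO null_demo VALUES ('x', NULL) ON CONFLICT (a, b) DO NOTHING;
SELECT count(*) FROM null_demo;                 -- 2
```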
create unique index daily_trips_unique_trip_idx ON daily_trips
(start_time, station_id, from_station, to_station, from_terminus, end_terminus,
(previous_station IS NULL), (next_station IS NULL))
where previous_station IS NULL or next_station IS NULL;
However, with this index only one row with a NULL value in a nullable column can be added.
For the next row with a NULL value in a nullable column, no update happens; instead I get the following error:
ERROR: duplicate key value violates unique constraint "daily_trips_unique_trip_idx"
What is needed?
The unique constraint should be respected, and an update should happen when there is a NULL value in either of the nullable columns 'previous_station' or 'next_station'.
Any help is appreciated.

The solution is to translate NULL to some other value, more specifically the 0-length string (''). The coalesce function does precisely that when used as coalesce(column_name, ''). The problem is that creating a unique constraint with that expression generates a syntax error, so you cannot create the constraint that way. However, there is a workaround, although not an easy one. Postgres enforces unique constraints through a unique index, so just create the index directly.
create unique index daily_trips_unique_row on daily_trips
( start_time
, station_id
, from_station
, to_station
, from_terminus
, end_terminus
, coalesce(previous_station , '')
, coalesce(next_station, '')
);
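However, while the index above handles the nullable columns, INSERT ... ON CONFLICT with the plain column list no longer matches it, because the index is defined on expressions rather than on the columns themselves (See example here). The conflict target would have to repeat the same coalesce(...) expressions; otherwise you will need a function/procedure to handle the exception, or Select ... if exists then Update else Insert logic.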
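As a sketch of how ON CONFLICT can still be combined with such an expression index: the conflict target accepts index expressions as well as column names, so repeating the coalesce(...) expressions lets PostgreSQL infer the index. The second statement is an alternative that assumes PostgreSQL 15 or later, where a constraint can be declared NULLS NOT DISTINCT; its name, daily_trips_unique_row_nnd, is hypothetical.

```sql
-- Conflict target repeats the expressions of the unique index,
-- so the index can be inferred as the arbiter.
INSERT INTO daily_trips
SELECT * FROM trips_agg
ON CONFLICT (start_time, station_id, from_station, to_station,
             from_terminus, end_terminus,
             (coalesce(previous_station, '')), (coalesce(next_station, '')))
DO UPDATE
SET wait_span  = (daily_trips.wait_span  + EXCLUDED.wait_span)  / 2,
    walk_span  = (daily_trips.walk_span  + EXCLUDED.walk_span)  / 2,
    delay_span = (daily_trips.delay_span + EXCLUDED.delay_span) / 2,
    passengers_requests = daily_trips.passengers_requests + EXCLUDED.passengers_requests;

-- Alternative, PostgreSQL 15+ only: treat NULLs as equal in the constraint itself,
-- after which ON CONFLICT with the plain column list matches it.
ALTER TABLE daily_trips ADD CONSTRAINT daily_trips_unique_row_nnd
  UNIQUE NULLS NOT DISTINCT
  (start_time, station_id, from_station, to_station,
   from_terminus, end_terminus, previous_station, next_station);
```

The first form works with the index exactly as created above; the second changes the table definition and does not need the coalesce index at all.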

Syntax error on Upsert PostgreSql while using an insert into with on conflict [duplicate]

I'm getting the following error when doing the following type of insert:
Query:
INSERT INTO accounts (type, person_id) VALUES ('PersonAccount', 1) ON
CONFLICT (type, person_id) WHERE type = 'PersonAccount' DO UPDATE SET
updated_at = EXCLUDED.updated_at RETURNING *
Error:
SQL execution failed (Reason: ERROR: there is no unique or exclusion
constraint matching the ON CONFLICT specification)
I also have a unique INDEX:
CREATE UNIQUE INDEX uniq_person_accounts ON accounts USING btree (type,
person_id) WHERE ((type)::text = 'PersonAccount'::text);
The thing is that sometimes it works, but not every time. I randomly get
that exception, which is really strange. It seems that it can't access that
INDEX or it doesn't know it exists.
Any suggestion?
I'm using PostgreSQL 9.5.5.
Example while executing the code that tries to find or create an account:
INSERT INTO accounts (type, person_id, created_at, updated_at) VALUES ('PersonAccount', 69559, '2017-02-03 12:09:27.259', '2017-02-03 12:09:27.259') ON CONFLICT (type, person_id) WHERE type = 'PersonAccount' DO UPDATE SET updated_at = EXCLUDED.updated_at RETURNING *
SQL execution failed (Reason: ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification)
In this case, I'm sure that the account does not exist. Furthermore, it never outputs the error when the person already has an account. The problem is that, in some cases, it also works when there is no account yet. The query is exactly the same.

Per the docs,
All table_name unique indexes that, without regard to order, contain exactly the
conflict_target-specified columns/expressions are inferred (chosen) as arbiter
indexes. If an index_predicate is specified, it must, as a further requirement
for inference, satisfy arbiter indexes.
The docs go on to say,
[index_predicate are u]sed to allow inference of partial unique indexes
In an understated way, the docs are saying that when using a partial index and
upserting with ON CONFLICT, the index_predicate must be specified. It is not
inferred for you. I learned this
here, and the following example demonstrates this.
CREATE TABLE test.accounts (
id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
type text,
person_id int);
CREATE UNIQUE INDEX accounts_note_idx on accounts (type, person_id) WHERE ((type)::text = 'PersonAccount'::text);
INSERT INTO test.accounts (type, person_id) VALUES ('PersonAccount', 10);
so that we have:
unutbu=# select * from test.accounts;
+----+---------------+-----------+
| id | type          | person_id |
+----+---------------+-----------+
|  1 | PersonAccount |        10 |
+----+---------------+-----------+
(1 row)
Without index_predicate we get an error:
INSERT INTO test.accounts (type, person_id) VALUES ('PersonAccount', 10) ON CONFLICT (type, person_id) DO NOTHING;
-- ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
But if instead you include the index_predicate, WHERE ((type)::text = 'PersonAccount'::text):
INSERT INTO test.accounts (type, person_id) VALUES ('PersonAccount', 10)
ON CONFLICT (type, person_id)
WHERE ((type)::text = 'PersonAccount'::text) DO NOTHING;
then there is no error and DO NOTHING is honored.

A simple solution of this error
First of all let's see the cause of error with a simple example. Here is the table mapping products to categories.
create table if not exists product_categories (
product_id uuid references products(product_id) not null,
category_id uuid references categories(category_id) not null,
whitelist boolean default false
);
If we use this query:
INSERT INTO product_categories (product_id, category_id, whitelist)
VALUES ('123...', '456...', TRUE)
ON CONFLICT (product_id, category_id)
DO UPDATE SET whitelist=EXCLUDED.whitelist;
This will give you the error "there is no unique or exclusion constraint matching the ON CONFLICT specification", because there is no unique constraint on product_id and category_id. There could be multiple rows having the same combination of product and category id (so there can never be a conflict on them).
Solution:
Use unique constraint on both product_id and category_id like this:
create table if not exists product_categories (
product_id uuid references products(product_id) not null,
category_id uuid references categories(category_id) not null,
whitelist boolean default false,
primary key(product_id, category_id) -- This will solve the problem
-- unique(product_id, category_id) -- OR this if you already have a primary key
);
Now you can use ON CONFLICT (product_id, category_id) for both columns without any error.
In short: Whatever column(s) you use with on conflict, they should have unique constraint.
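If the table already exists, the same constraint can be added without recreating it. A minimal sketch, assuming PostgreSQL; the constraint name product_categories_product_category_uq is hypothetical:

```sql
-- Hypothetical constraint name. The statement fails if duplicate
-- (product_id, category_id) pairs already exist in the table.
ALTER TABLE product_categories
  ADD CONSTRAINT product_categories_product_category_uq
  UNIQUE (product_id, category_id);
```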

The easy way to fix it is by setting the conflicting column as UNIQUE

I did not have a chance to play with UPSERT, but I think you have a case from
docs:
Note that this means a non-partial unique index (a unique index
without a predicate) will be inferred (and thus used by ON CONFLICT)
if such an index satisfying every other criteria is available. If an
attempt at inference is unsuccessful, an error is raised.

I solved the same issue by creating one UNIQUE INDEX for ALL columns you want to include in the ON CONFLICT clause, not one UNIQUE INDEX for each of the columns.
CREATE TABLE table_name (
element_id UUID NOT NULL DEFAULT gen_random_uuid(),
timestamp TIMESTAMP NOT NULL DEFAULT now():::TIMESTAMP,
col1 UUID NOT NULL,
col2 STRING NOT NULL ,
col3 STRING NOT NULL ,
CONSTRAINT "primary" PRIMARY KEY (element_id ASC),
UNIQUE (col1 asc, col2 asc, col3 asc)
);
This allows a query like:
INSERT INTO table_name (timestamp, col1, col2, col3) VALUES ('timestamp', 'uuid', 'string', 'string')
ON CONFLICT (col1, col2, col3)
DO UPDATE SET timestamp = EXCLUDED.timestamp, col1 = EXCLUDED.col1, col2 = EXCLUDED.col2, col3 = EXCLUDED.col3;

How can I bulk insert rows only if a compound primary key doesn't already exist? [AWS Redshift]

In Amazon Redshift I am trying to bulk insert values into a table from a temp table.
However, I only want to insert the rows whose compound key (the primary key) does not already exist in the table, to avoid adding duplicates.
Below is the DDL of the table.
• clusters_typologies table (the table I want to insert data into)
create table if not exists clusters.clusters_typologies
(
cluster_id BIGINT,
typology_id BIGINT,
semantic_id BIGINT,
primary key (cluster_id, typology_id, semantic_id)
);
The temp table is created with the query below, and after that all fields are inserted correctly.
CREATE TEMPORARY TABLE temporary (
cluster_id bigint,
typology_name varchar(100),
typology_id bigint,
semantic_name varchar(100),
semantic_id bigint
);
Now, when I try to insert with this query:
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id,semantic_id)
(SELECT temp.cluster_id, temp.typology_id, temp.semantic_id
FROM temporary temp
WHERE NOT EXISTS(SELECT 1
FROM clusters_typologies
where cluster_id = temp.cluster_id
and typology_id = temp.typology_id
and semantic_id = temp.semantic_id));
I get this error and cannot figure out how to make it work:
Invalid operation: This type of correlated subquery pattern is not supported due to internal error;
Does anyone know how to fix this, or what the best way is to insert into a table with a compound key while avoiding duplicates?
Thanks.

To upsert, follow this guide:
https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html
Note that certain types of correlated subquery are not supported in Redshift; that is the cause of your error. See:
https://docs.aws.amazon.com/redshift/latest/dg/r_correlated_subqueries.html

After some attempts I figured out how to do an insert from a temp table while checking a compound primary key to avoid duplicates.
Basically, from the AWS documentation that @Jon Scott sent, I understand that referencing the outer table in the inner select is not supported by Redshift.
I solved it using a left join and checking whether the joined columns are null.
Below is the query I use now.
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
(SELECT temp.cluster_id, temp.typology_id, temp.semantic_id
FROM aaaa temp
LEFT JOIN clusters.clusters_typologies clu_typ ON temp.cluster_id = clu_typ.cluster_id AND
temp.typology_id = clu_typ.typology_id AND
temp.semantic_id = clu_typ.semantic_id
WHERE clu_typ.cluster_id IS NULL
AND clu_typ.typology_id IS NULL
AND clu_typ.semantic_id IS NULL);

PostgreSQL 9.5 UPSERT in rule

I have an INSERT rule in an updatable view system, for which I would like to implement an UPSERT, such as:
CREATE OR REPLACE RULE _insert AS
ON INSERT TO vue_pays_gex.bals
DO INSTEAD (
INSERT INTO geo_pays_gex.voie(name, code, district) VALUES (new.name, new.code, new.district)
ON CONFLICT DO NOTHING;
);
But since there can be many different combinations of these three columns, I don't think I can set a CONSTRAINT including them all (although I may be missing a point of understanding in the SQL logic), which would make the ON CONFLICT DO NOTHING part useless.
The ideal solution would seem to be the use of an EXCEPT, but it only works in an INSERT INTO SELECT statement. Is there a way to use an INSERT INTO SELECT statement referring to the newly inserted row? Something like FROM new.bals (in my case)?
If not, I could imagine a WHERE NOT EXISTS condition, but the same problem as before arises.
I'm guessing it is a rather common SQL need, but I cannot find how to solve it. Any idea?
EDIT :
As requested, here is the table definition :
CREATE TABLE geo_pays_gex.voie
(
id_voie serial NOT NULL,
name character varying(50),
code character varying(15),
district character varying(50),
CONSTRAINT prk_constraint_voie PRIMARY KEY (id_voie),
CONSTRAINT voie_unique_key UNIQUE (name, code, district)
);

How do you define uniqueness? If it is the combination of name + code + district, then just add a constraint UNIQUE(name, code, district) on the table geo_pays_gex.voie. The three together must be unique, but the same name, code, or district can each appear several times.
See it at http://rextester.com/EWR73154
EDIT:
Since you can have NULLs and want to treat them as a unique value, you can replace the constraint with a unique index that replaces the NULLs:
CREATE UNIQUE INDEX
voie_uniq ON voie
(COALESCE(name,''), code, COALESCE(district,''));

In addition to @JGH's answer:
An INSERT inside a rule for INSERT on the same table leads to infinite recursion (Postgres 9.6).
Full (NOT) runnable example:
CREATE SCHEMA ttest;
CREATE TABLE ttest.table_1 (
id bigserial
CONSTRAINT pk_table_1 PRIMARY KEY,
col_1 text,
col_2 text
);
CREATE OR REPLACE RULE table_1_always_upsert AS
ON INSERT TO ttest.table_1
DO INSTEAD (
INSERT INTO ttest.table_1(id, col_1, col_2)
VALUES (new.id, new.col_1, new.col_2)
ON CONFLICT ON CONSTRAINT pk_table_1
DO UPDATE
SET col_1 = new.col_1,
col_2 = new.col_2
);
INSERT INTO ttest.table_1(id, col_1, col_2) -- results in an error: infinite recursion in rules
VALUES (1, 'One', 'A'),
(2, 'Two', 'B');
INSERT INTO ttest.table_1(id, col_1, col_2)
VALUES (1, 'One_updated', 'A_updated'),
(2, 'Two_updated', 'B_updated'),
(3, 'Three_inserted', 'C_inserted');
SELECT *
FROM ttest.table_1;

insert into and select

How would I write the query for the following?
Update the field total_horas with the hours worked on each project
I have:
insert into proyecto(total_horas)
select trabaja.nhoras
from trabaja;
But it's trying to insert into the first field of "proyecto" instead of the field "total_horas".
My table:
CREATE TABLE proyecto (
cdpro CHAR(3) NOT NULL PRIMARY KEY,
nombre VARCHAR(30),
coddep CHAR(2),
FOREIGN KEY (coddep)
REFERENCES departamento(cddep)
ON DELETE CASCADE
);
It was also altered with: alter table proyecto ADD total_horas char ;

You have to put a WHERE condition in the SELECT statement. Also, please elaborate on your question. trabaja.nhoras is the column name, and you are selecting it from the table trabaja.
Example:
INSERT INTO proyecto
(total_horas)
SELECT trabaja.nhoras
FROM trabaja
WHERE 'condition' = 'some condition';
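Note that INSERT always adds new rows, so it cannot fill total_horas on projects that already exist; the task as stated ("update the field total_horas") reads more like an UPDATE. The following is a sketch under two assumptions that the question does not confirm: that trabaja has a cdpro column identifying the project, and that total_horas has a numeric type rather than char.

```sql
-- Assumptions: trabaja.cdpro references proyecto.cdpro,
-- and total_horas is numeric (the question declares it as char).
UPDATE proyecto
SET total_horas = (
    SELECT COALESCE(SUM(t.nhoras), 0)   -- 0 for projects with no recorded hours
    FROM trabaja t
    WHERE t.cdpro = proyecto.cdpro
);
```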
