As part of the ETL, the table continuous_trips receives a continuous flow of incoming records.
Every 5 minutes, new records are aggregated and inserted into a temporary table called trips_agg.
CREATE TABLE IF NOT EXISTS trips_agg AS (
SELECT start_time, station_id, from_station, to_station, from_terminus, end_terminus, previous_station, next_station,
AVG(wait_span) AS wait_span,
AVG(walk_span) AS walk_span,
AVG(delay_span) AS delay_span,
SUM(passengers_requests) AS passengers_requests
FROM continuous_trips
GROUP BY start_time, station_id, from_station, to_station, from_terminus, end_terminus, previous_station, next_station
);
The table trips_agg is dropped after all its records have been inserted into the table daily_trips, and is recreated during the next cycle.
The tables daily_trips and trips_agg have the same columns.
CREATE TABLE IF NOT EXISTS daily_trips (
start_time timestamp without time zone NOT NULL,
station_id text NOT NULL,
from_station text NOT NULL,
to_station text NOT NULL,
from_terminus text NOT NULL,
end_terminus text NOT NULL,
previous_station text,
next_station text,
wait_span interval NOT NULL,
walk_span interval NOT NULL,
delay_span interval NOT NULL,
passengers_requests numeric NOT NULL
);
Note: the columns 'previous_station' and 'next_station' allow NULL.
A composite unique key is added as follows:
ALTER TABLE daily_trips ADD CONSTRAINT daily_trips_unique_row UNIQUE
(start_time, station_id, from_station, to_station, from_terminus, end_terminus, previous_station, next_station);
If the unique key is violated upon insertion, the existing record should be updated instead, so an upsert strategy is used:
INSERT INTO daily_trips SELECT * FROM trips_agg
ON CONFLICT (start_time, station_id, from_station, to_station, from_terminus, end_terminus,
previous_station, next_station) DO UPDATE
SET wait_span = (daily_trips.wait_span + EXCLUDED.wait_span)/2,
    walk_span = (daily_trips.walk_span + EXCLUDED.walk_span)/2,
    delay_span = (daily_trips.delay_span + EXCLUDED.delay_span)/2,
    passengers_requests = (daily_trips.passengers_requests + EXCLUDED.passengers_requests);
When values are present for all columns, this setup works perfectly, but that is not the case when any of the nullable columns holds a NULL value.
Since Postgres treats NULL values as distinct when enforcing a unique constraint, whenever any of the nullable columns is NULL a new row is inserted instead of the existing one being updated. This results in multiple rows for the same logical unique key.
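A minimal illustration of that behavior (a throwaway example, separate from the production tables):
-- NULLs count as distinct for unique constraints, so both inserts succeed
CREATE TEMP TABLE null_demo (a int, b int, UNIQUE (a, b));
INSERT INTO null_demo VALUES (1, NULL);
INSERT INTO null_demo VALUES (1, NULL);  -- no unique_violation: duplicate row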
To overcome this, an index was added on the table daily_trips after referring to this article.
create unique index daily_trips_unique_trip_idx ON daily_trips
(start_time, station_id, from_station, to_station, from_terminus, end_terminus,
(previous_station IS NULL), (next_station IS NULL))
where previous_station IS NULL or next_station IS NULL;
However, only one row could be added with a NULL value in any nullable column.
For the next row with a NULL value in any nullable column, the update does not happen; instead, the following error is raised:
ERROR: duplicate key value violates unique constraint "daily_trips_unique_trip_idx"
What is needed?
The unique constraint should be respected, and the update should happen when there is a NULL value in either of the nullable columns 'previous_station' or 'next_station'.
Any help is appreciated.
The solution is to translate NULL to some other value, more specifically the zero-length string (''). The coalesce function does precisely that when used as coalesce(column_name, ''). The problem is that creating a unique constraint with that expression generates a syntax error, so you cannot create such a constraint. However, there is a workaround, although not an easy one. Postgres enforces unique constraints through a unique index, so just create the index directly.
create unique index daily_trips_unique_row on daily_trips
( start_time
, station_id
, from_station
, to_station
, from_terminus
, end_terminus
, coalesce(previous_station , '')
, coalesce(next_station, '')
);
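With that index in place, a second aggregate row that differs only in its NULL station columns now raises a duplicate-key error instead of silently duplicating. A quick check, assuming the daily_trips definition above:
insert into daily_trips values
('2021-01-01 00:00', 's1', 'a', 'b', 't1', 't2', null, null,
 '1 minute', '1 minute', '0 minutes', 1);
-- running the identical insert a second time now fails, as intended:
-- ERROR: duplicate key value violates unique constraint "daily_trips_unique_row"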
However, while the above respects the null-ability of the index columns, it is no longer recognized by INSERT ... ON CONFLICT (see example here). You will either need a function/procedure to handle the exception, or use SELECT ... then, if it exists, UPDATE, else INSERT logic.
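For the first option, here is a minimal sketch, assuming the daily_trips schema and the index above (the function name upsert_daily_trip is hypothetical; the loop handles the race where a concurrent transaction inserts the same key between the UPDATE and the INSERT):
create or replace function upsert_daily_trip(r daily_trips) returns void as $$
begin
    loop
        -- try the update first, matching NULLs with the same coalesce trick
        update daily_trips d
           set wait_span = (d.wait_span + r.wait_span) / 2,
               walk_span = (d.walk_span + r.walk_span) / 2,
               delay_span = (d.delay_span + r.delay_span) / 2,
               passengers_requests = d.passengers_requests + r.passengers_requests
         where d.start_time = r.start_time
           and d.station_id = r.station_id
           and d.from_station = r.from_station
           and d.to_station = r.to_station
           and d.from_terminus = r.from_terminus
           and d.end_terminus = r.end_terminus
           and coalesce(d.previous_station, '') = coalesce(r.previous_station, '')
           and coalesce(d.next_station, '') = coalesce(r.next_station, '');
        if found then
            return;
        end if;
        -- no matching row yet: try to insert, retrying the loop on a lost race
        begin
            insert into daily_trips select (r).*;
            return;
        exception when unique_violation then
            null;  -- a concurrent transaction inserted the same key; loop and update
        end;
    end loop;
end;
$$ language plpgsql;
The 5-minute cycle could then call select upsert_daily_trip(t) from trips_agg t; instead of the plain INSERT.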
I have some Postgres tables:
CREATE TABLE source_redshift.staticprompts (
id INT,
projectid BIGINT,
scriptid INT,
promptnum INT,
prompttype VARCHAR(20),
inputs VARCHAR(2000),
attributes VARCHAR(2000),
text VARCHAR(2000),
corpuscode VARCHAR(2000),
comment VARCHAR(2000),
created TIMESTAMP,
modified TIMESTAMP
);
and
CREATE TABLE target_redshift.user_input_conf (
collect_project_id BIGINT NOT NULL UNIQUE,
prompt_type VARCHAR(20),
prompt_input_desc VARCHAR(300),
prompt_input_name VARCHAR(100),
no_of_prompt_count BIGINT,
prompt_input_value VARCHAR(100) UNIQUE,
prompt_input_value_id BIGSERIAL PRIMARY KEY,
script_id BIGINT,
corpuscode VARCHAR(20),
min_recordings VARCHAR(2000),
max_recordings VARCHAR(2000),
recordings_count VARCHAR(2000),
lease_duration VARCHAR(2000),
date_created TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT NOW(),
date_updated TIMESTAMP WITHOUT TIME ZONE,
CONSTRAINT must_be_different UNIQUE (prompt_input_value,collect_project_id)
);
I need to copy data from staticprompts to user_input_conf with these rules:
Primary Key : prompt_input_value_id
Unique Values : collect_project_id, prompt_input_value
Data Load Logic:
Insert only when a new prompt input value is found for a given collect project from the source. The inputs column stores the values in JSON format in the staticprompts table.
Insert:
Generate a unique sequence number for each new prompt input value for a collect project id from the source and store it in prompt_input_value_id.
Update:
If the prompt value already exists for a collect project and there are any value changes on prompt_input_desc, prompt_input_name, or prompt_input_value, then update only those columns.
prompt_input_value_id - generate a unique sequence number for the combination of each prompt_input_value and collect_project_id.
prompt_input_value - Inputs.value is stored in the inputs column as JSON text. Create a unique record for each inputs.value. Look at the example below this table.
I tried to use this query:
INSERT INTO target_redshift.user_input_conf AS t (
collect_project_id,
prompt_type,
prompt_input_desc,
prompt_input_name,
prompt_input_value,
script_id,
corpuscode)
SELECT
s.projectid,
s.prompttype,
s.inputs::jsonb#>>'{inputs,0,desc}' AS desc,
s.inputs::jsonb#>>'{inputs,0,name}' AS name,
s.inputs::jsonb#>>'{inputs,0,values}' AS values,
s.scriptid,
s.corpuscode
FROM source_redshift.staticprompts AS s
ON CONFLICT (collect_project_id, prompt_input_value)
DO UPDATE SET
(prompt_input_desc, prompt_input_name, prompt_input_value, date_updated) =
(EXCLUDED.prompt_input_desc, EXCLUDED.prompt_input_name, EXCLUDED.prompt_input_value, NOW())
WHERE t.prompt_input_desc != EXCLUDED.prompt_input_desc
OR t.prompt_input_name != EXCLUDED.prompt_input_name
OR t.prompt_input_value != EXCLUDED.prompt_input_value;
""")
But I get an error:
psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "user_input_conf_collect_project_id_key"
DETAIL: Key (collect_project_id)=(1) already exists.
I think there is a misunderstanding. A unique constraint over two columns does not mean that each of the columns is unique, but that the combination of the two columns is unique.
So your must_be_different is different (and weaker) than the unique constraints on prompt_input_value and collect_project_id. For example, if you have the three rows
 collect_project_id | prompt_input_value
--------------------+--------------------
                  1 | a
                  1 | b
                  2 | b
they will create a conflict with both single-column unique constraints, but not with must_be_different.
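This is easy to verify in isolation (a throwaway sketch with just the two columns):
-- with only the composite constraint, all three rows insert cleanly
create temp table demo (
    collect_project_id bigint,
    prompt_input_value varchar(100),
    constraint must_be_different unique (prompt_input_value, collect_project_id)
);
insert into demo values (1, 'a'), (1, 'b'), (2, 'b');  -- succeeds
-- a single-column UNIQUE on either column would reject this data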
I guess that the underlying problem is that you want to use INSERT ... ON CONFLICT with multiple unique constraints. That cannot be done; see this question for a discussion and potential solutions.
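If the composite key is really what you intend, one option (a sketch; the first constraint name appears verbatim in your error message, the second is assumed from Postgres's default naming) is to drop the two single-column constraints so that only must_be_different remains as the arbiter:
alter table target_redshift.user_input_conf
    drop constraint user_input_conf_collect_project_id_key,
    drop constraint user_input_conf_prompt_input_value_key;
-- now ON CONFLICT (collect_project_id, prompt_input_value) has exactly one
-- matching unique index to infer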
I'm getting the following error when doing the following type of insert:
Query:
INSERT INTO accounts (type, person_id) VALUES ('PersonAccount', 1)
ON CONFLICT (type, person_id) WHERE type = 'PersonAccount'
DO UPDATE SET updated_at = EXCLUDED.updated_at RETURNING *
Error:
SQL execution failed (Reason: ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification)
I also have a unique INDEX:
CREATE UNIQUE INDEX uniq_person_accounts ON accounts USING btree (type, person_id) WHERE ((type)::text = 'PersonAccount'::text);
The thing is that sometimes it works, but not every time. I randomly get
that exception, which is really strange. It seems that it can't access that
INDEX or it doesn't know it exists.
Any suggestion?
I'm using PostgreSQL 9.5.5.
Example while executing the code that tries to find or create an account:
INSERT INTO accounts (type, person_id, created_at, updated_at) VALUES ('PersonAccount', 69559, '2017-02-03 12:09:27.259', '2017-02-03 12:09:27.259') ON CONFLICT (type, person_id) WHERE type = 'PersonAccount' DO UPDATE SET updated_at = EXCLUDED.updated_at RETURNING *
SQL execution failed (Reason: ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification)
In this case, I'm sure that the account does not exist. Furthermore, it never outputs the error when the person already has an account. The problem is that, in some cases, it also works when there is no account yet. The query is exactly the same.
Per the docs,
All table_name unique indexes that, without regard to order, contain exactly the
conflict_target-specified columns/expressions are inferred (chosen) as arbiter
indexes. If an index_predicate is specified, it must, as a further requirement
for inference, satisfy arbiter indexes.
The docs go on to say,
[index_predicate is u]sed to allow inference of partial unique indexes
In an understated way, the docs are saying that when using a partial index and
upserting with ON CONFLICT, the index_predicate must be specified. It is not
inferred for you. I learned this
here, and the following example demonstrates this.
CREATE TABLE test.accounts (
id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
type text,
person_id int);
CREATE UNIQUE INDEX accounts_note_idx ON test.accounts (type, person_id) WHERE ((type)::text = 'PersonAccount'::text);
INSERT INTO test.accounts (type, person_id) VALUES ('PersonAccount', 10);
so that we have:
unutbu=# select * from test.accounts;
+----+---------------+-----------+
| id | type | person_id |
+----+---------------+-----------+
| 1 | PersonAccount | 10 |
+----+---------------+-----------+
(1 row)
Without index_predicate we get an error:
INSERT INTO test.accounts (type, person_id) VALUES ('PersonAccount', 10) ON CONFLICT (type, person_id) DO NOTHING;
-- ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
But if instead you include the index_predicate, WHERE ((type)::text = 'PersonAccount'::text):
INSERT INTO test.accounts (type, person_id) VALUES ('PersonAccount', 10)
ON CONFLICT (type, person_id)
WHERE ((type)::text = 'PersonAccount'::text) DO NOTHING;
then there is no error and DO NOTHING is honored.
A simple solution to this error
First of all, let's see the cause of the error with a simple example. Here is a table mapping products to categories.
create table if not exists product_categories (
product_id uuid references products(product_id) not null,
category_id uuid references categories(category_id) not null,
whitelist boolean default false
);
If we use this query:
INSERT INTO product_categories (product_id, category_id, whitelist)
VALUES ('123...', '456...', TRUE)
ON CONFLICT (product_id, category_id)
DO UPDATE SET whitelist=EXCLUDED.whitelist;
This will give you the error no unique or exclusion constraint matching the ON CONFLICT specification, because there is no unique constraint on (product_id, category_id). There could be multiple rows having the same combination of product and category id, so there can never be a conflict on them.
Solution:
Add a unique constraint over both product_id and category_id, like this:
create table if not exists product_categories (
product_id uuid references products(product_id) not null,
category_id uuid references categories(category_id) not null,
whitelist boolean default false,
primary key(product_id, category_id) -- This will solve the problem
-- unique(product_id, category_id) -- OR this if you already have a primary key
);
Now you can use ON CONFLICT (product_id, category_id) for both columns without any error.
In short: whatever column(s) you use with ON CONFLICT must be covered by a unique constraint.
The easy way to fix it is by setting the conflicting column as UNIQUE
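For instance, on the product_categories example above (the constraint name here is hypothetical):
ALTER TABLE product_categories
    ADD CONSTRAINT product_categories_product_id_category_id_key
    UNIQUE (product_id, category_id);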
I did not have a chance to play with UPSERT, but I think your case is covered in the docs:
Note that this means a non-partial unique index (a unique index
without a predicate) will be inferred (and thus used by ON CONFLICT)
if such an index satisfying every other criteria is available. If an
attempt at inference is unsuccessful, an error is raised.
I solved the same issue by creating one UNIQUE INDEX for ALL columns you want to include in the ON CONFLICT clause, not one UNIQUE INDEX for each of the columns.
CREATE TABLE table_name (
    element_id UUID NOT NULL DEFAULT gen_random_uuid(),
    timestamp TIMESTAMP NOT NULL DEFAULT now(),
    col1 UUID NOT NULL,
    col2 TEXT NOT NULL,
    col3 TEXT NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (element_id),
    UNIQUE (col1, col2, col3)
);
This will allow a query like:
INSERT INTO table_name (timestamp, col1, col2, col3) VALUES ('timestamp', 'uuid', 'string', 'string')
ON CONFLICT (col1, col2, col3)
DO UPDATE SET timestamp = EXCLUDED.timestamp, col1 = EXCLUDED.col1, col2 = EXCLUDED.col2, col3 = EXCLUDED.col3;
I have a table in postgresql named 'views', containing information about users viewing a classified ad.
CREATE TABLE views (
    view_id uuid DEFAULT gen_random_uuid() NOT NULL,
    user_id uuid NOT NULL,
    ad_id uuid NOT NULL,
    timestamp timestamp with time zone DEFAULT now() NOT NULL
);
I want to be able to insert a row for a specific user/ad ONLY when there is no other row 'younger' than 5 minutes. So I want to check whether there is already a row with that user ID and ad ID whose timestamp is less than 5 minutes old. If so, I want to do something like INSERT ... ON CONFLICT DO NOTHING.
Is this possible with a UNIQUE constraint? Or do I need a CHECK constraint, or do I have to run a separate query first every time I insert?
You have to do a lookup first, but you can do the lookup and the insert in one statement using something like this:
with invars (user_id, ad_id) as (
values (?, ?) -- Pass your two ids in
)
insert into views (user_id, ad_id)
select user_id, ad_id
from invars i
where not exists (select 1
from views
where (user_id, ad_id) = (i.user_id, i.ad_id)
and "timestamp" >= now() - interval '5 minutes');
CREATE TABLE instances(
ser_name VARCHAR(20) NOT NULL,
id INTEGER NOT NULL ,
ser_ip VARCHAR(16) NOT NULL,
status VARCHAR(10) NOT NULL,
creation_ts TIMESTAMP,
CONSTRAINT instance_id PRIMARY KEY(id)
);
CREATE TABLE characters(
nickname VARCHAR(15) NOT NULL,
type VARCHAR(10) NOT NULL,
c_level INTEGER NOT NULL,
game_data VARCHAR(40) NOT NULL,
start_ts TIMESTAMP ,
end_ts TIMESTAMP NULL ,
player_ip VARCHAR(16) NOT NULL,
instance_id INTEGER NOT NULL,
player_username VARCHAR(15),
CONSTRAINT chara_nick PRIMARY KEY(nickname)
);
ALTER TABLE instances ADD CONSTRAINT ins_ser_name FOREIGN KEY(ser_name) REFERENCES servers(name);
ALTER TABLE instances ADD CONSTRAINT ins_ser_ip FOREIGN KEY(ser_ip) REFERENCES servers(ip);
ALTER TABLE characters ADD CONSTRAINT chara_inst_id FOREIGN KEY(instance_id) REFERENCES instances(id);
ALTER TABLE characters ADD CONSTRAINT chara_player_username FOREIGN KEY(player_username) REFERENCES players(username);
insert into instances values
('serverA','1','138.201.233.18','active','2020-10-20'),
('serverB','2','138.201.233.19','active','2020-10-20'),
('serverE','3','138.201.233.14','active','2020-10-20');
insert into characters values
('characterA','typeA','1','Game data of characterA','2020-07-18 02:12:12','2020-07-18 02:32:30','192.188.11.1','1','nabin123'),
('characterB','typeB','3','Game data of characterB','2020-07-19 02:10:12',null,'192.180.12.1','2','rabin123'),
('characterC','typeC','1','Game data of characterC','2020-07-18 02:12:12',null,'192.189.10.1','3','sabin123'),
('characterD','typeA','1','Game data of characterD','2020-07-18 02:12:12','2020-07-18 02:32:30','192.178.11.1','2','nabin123'),
('characterE','typeB','3','Game data of characterE','2020-07-19 02:10:12',null,'192.190.12.1','1','rabin123'),
('characterF','typeC','1','Game data of characterF','2020-07-18 02:12:12',null,'192.188.10.1','3','sabin123'),
('characterG','typeD','1','Game data of characterG','2020-07-18 02:12:12',null,'192.188.13.1','1','nabin123'),
('characterH','typeD','3','Game data of characterH','2020-07-19 02:10:12',null,'192.180.17.1','2','bipin123'),
('characterI','typeD','1','Game data of characterI','2020-07-18 02:12:12','2020-07-18 02:32:30','192.189.18.1','3','dhiraj123'),
('characterJ','typeD','3','Game data of characterJ','2020-07-18 02:12:12',null,'192.178.19.1','2','prabin123'),
('characterK','typeB','4','Game data of characterK','2020-07-19 02:10:12','2020-07-19 02:11:30','192.190.20.1','1','rabin123'),
('characterL','typeC','2','Game data of characterL','2020-07-18 02:12:12',null,'192.192.11.1','3','sabin123'),
('characterM','typeC','3','Game data of characterM','2020-07-18 02:12:12',null,'192.192.11.1','2','sabin123');
Here I need a view that shows the name of the server, the id of the instance, and the number of active sessions (a session is active if the end timestamp is null). Is my code wrong, or is it something else? I am starting to learn, so I am hoping for helpful answers.
My view:
create view active_sessions as
select i.ser_name, i.id, count(end_ts) as active
from instances i, characters c
where i.id=c.instance_id and c.end_ts = null
group by i.ser_name, i.id;
This does not do what you want:
where i.id = c.instance_id and c.end_ts = null
Nothing is equal to null. You need is null to check a value against null.
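A quick illustration in psql:
SELECT NULL = NULL;   -- yields NULL, which is not true
SELECT NULL IS NULL;  -- yields true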
Also, count(end_ts) will always produce 0: we already know that end_ts is null in the matched rows, and count() does not count null values.
Finally, I would highly recommend using a standard join (with the on keyword) rather than an implicit join (with a comma in the from clause): this old syntax from decades ago should not be used in new code. I think that a left join is closer to what you want (it would also take into account instances that have no characters at all).
So:
create view active_sessions as
select i.ser_name, i.id, count(c.nickname) as active
from instances i
left join characters c on i.id = c.instance_id and c.end_ts is null
group by i.ser_name, i.id;
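With the sample data above, instance 1 has two characters with a null end_ts (E and G), instance 2 has four (B, H, J, M), and instance 3 has three (C, F, L), so the view should return:
 ser_name | id | active
----------+----+--------
 serverA  |  1 |      2
 serverB  |  2 |      4
 serverE  |  3 |      3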
I have Table A. Table A owns a sequence.
I create Table B, inheriting from Table A.
Tables A and B now use the same default value for their primary key column.
For a simplified example, Table A is "person", and B is "bulk_upload_person".
CREATE TABLE "testing"."person" (
"person_id" serial, --Resulting DDL: int4 NOT NULL DEFAULT nextval('person_person_id_seq'::regclass)
"public" bool NOT NULL DEFAULT false
);
--SQL Ran
CREATE TABLE "testing"."bulk_upload_person" (
"upload_id" int4 NOT NULL
)
INHERITS ("testing"."person");
--Resulting DDL
CREATE TABLE "testing"."bulk_upload_person" (
"person_id" int4 NOT NULL DEFAULT nextval('person_person_id_seq'::regclass),
"public" bool NOT NULL DEFAULT false,
"upload_id" int4 NOT NULL
)
INHERITS ("testing"."person");
For table A, I can get the sequence by using pg_get_serial_sequence.
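For example:
SELECT pg_get_serial_sequence('testing.person', 'person_id');
-- returns: testing.person_person_id_seq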
How can I get and then set the next value of the sequence if I only know about Table B? I want to add n to the value.
I need to do this in order to populate multiple related objects at once, while being able to know what primary IDs they will have, rather than having to query the tables I've just populated to determine the IDs.
By populate, I mean inserting multiple rows in one statement.
insert into "testing"."bulk_upload_person" ( "person_id", "public", "upload_id") values ( '1', 'f', '1'), ( '2', 't', '1'); --etc
I think our situation is similar to https://stackoverflow.com/a/8007835/89211 but we don't want to keep the lock on the table beyond getting and setting the next value of the serial for each table.
Currently we are doing this by regexing the sequence name out of the default value of Table B's primary key column, but it feels like there is probably a better way that we have not realised.
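One alternative worth noting (a sketch, assuming the sequence name has already been resolved, e.g. via pg_get_serial_sequence on the parent table): nextval is atomic, so a block of n ids can be reserved in a single statement without holding a table lock, and the returned rows are exactly the ids the subsequent bulk insert can use.
-- reserve 100 ids in one statement; each returned row is a reserved id
SELECT nextval('testing.person_person_id_seq') AS reserved_id
FROM generate_series(1, 100);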