Unique partial constraint in Postgres? - postgresql

I have a table and want to update some of the rows like this:
CREATE TABLE abc(i INTEGER, deleted BOOLEAN);
CREATE UNIQUE INDEX myidx ON abc(i) WHERE NOT deleted;
INSERT INTO abc VALUES (4, false), (5, false);
UPDATE abc SET i = i - 1;
That works, thanks to the order in which the UPDATE happens to process the rows, but when the UPDATE is attempted like this, it fails:
UPDATE abc SET i = i + 1;
ERROR: 23505: duplicate key value violates unique constraint "myidx"
DETAIL: Key (i)=(4) already exists.
SCHEMA NAME: public
TABLE NAME: abc
CONSTRAINT NAME: myidx
LOCATION: _bt_check_unique, nbtinsert.c:534
Time: 0.472 ms
The reason for the error is that, in the middle of the update, two rows would have had the value i = 4, even though at the end of the update all rows would have had unique values.
So I thought of changing the index into a deferred constraint, but according to the docs, this is not possible as my index is partial (so it only enforces uniqueness on some rows):
A uniqueness restriction covering only some rows cannot be written as a unique constraint, but it is possible to enforce such a restriction by creating a unique partial index.
The docs say to use partial indexes, but those can't be deferred, so I go back to the original problem.
So far my solution would be to set i = NULL whenever I mark deleted = true so it's not considered duplicated by my constraint anymore.
Is there a better solution to this? Maybe a way to make the UPDATE go always in the direction I want?
Please note:
I cannot DELETE the row, that's why the deleted column is there. The actual delete is done after some human validation happens.
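For illustration, here is a minimal sketch that combines the deferred-constraint idea with the i = NULL workaround mentioned above (the constraint name abc_i_unique is my own choice): drop the partial index and use a full, deferrable unique constraint instead, relying on NULL for soft-deleted rows so they never collide.
-- Replace the partial index with a deferrable unique constraint;
-- NULLs never conflict, so soft-deleted rows are effectively ignored.
DROP INDEX myidx;
ALTER TABLE abc
    ADD CONSTRAINT abc_i_unique UNIQUE (i)
    DEFERRABLE INITIALLY DEFERRED;
-- Soft delete: mark the row and clear its sequence value.
UPDATE abc SET deleted = TRUE, i = NULL WHERE i = 2;
-- The shift may briefly create duplicates mid-statement, but the
-- constraint is only checked at commit time, so it succeeds.
BEGIN;
UPDATE abc SET i = i + 1 WHERE NOT deleted;
COMMIT;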
Update:
The reason I'm bulk-updating that unique column is that the table contains a sequence used in the UI for sorting the records (users drag and drop the records as they wish). They can also delete records, in which case I shift the sequence values of the elements that come after the deleted one.
The actual columns look more like this (name TEXT, description TEXT, ..., sequence NUMBER).
That sequence column is what I called i in the simplified case. So say I have 3 records with (name, sequence):
("Laptop", 1)
("Mobile", 2)
("Desktop", 3)
And if the user deletes the middle one, I want to end up with:
("Laptop", 1)
("Desktop", 2) // <--- updated here

Related

How can a relational database with foreign key constraints ingest data that may be in the wrong order?

The database is ingesting data from a stream, and all the rows needed to satisfy a foreign key constraint may be late or never arrive.
This can likely be accomplished by using another datastore, one without foreign key constraints, and then, when all the needed data is available, loading it into the database that has the FK constraints. However, this adds complexity and I'd like to avoid it.
We're working on a solution that creates "placeholder" rows to point the foreign key to. When the real data comes in, the placeholder is replaced with real values. Again, this adds complexity, but it's the best solution we've found so far.
How do people typically solve this problem?
Edit: Some sample data which might help explain the problem:
Let's say we have these tables:
CREATE TABLE "order" (
    id INTEGER NOT NULL,
    order_number INTEGER,
    PRIMARY KEY (id),
    UNIQUE (order_number)
);
CREATE TABLE line_item (
    id INTEGER NOT NULL,
    order_number INTEGER REFERENCES "order" (order_number),
    PRIMARY KEY (id)
);
If I insert an order first, not a problem! But let's say I try:
INSERT INTO line_item (order_number) values (123) before order 123 was inserted. This will fail the fk constraint of course. But this might be the order I get the data, since it's reading from a stream that is collecting this data from multiple sources.
Also, to address @philpxy's question, I didn't really find much on this. One thing that was mentioned was deferred constraints. This is a mechanism that waits to check FK constraints until the end of a transaction. I don't think it's possible to do that in my case, however, since these insert statements will be run at random times, whenever the data is received.
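(As an aside, a deferrable FK would look roughly like the sketch below, assuming the inline REFERENCES above is replaced by a named, deferrable constraint; the constraint name is illustrative. It only helps when both inserts happen inside the same transaction, which is why it doesn't fit the streaming case.)
-- The FK is only validated at COMMIT, so within one transaction the
-- line item can be inserted before its order.
ALTER TABLE line_item
    ADD CONSTRAINT line_item_order_number_fkey
    FOREIGN KEY (order_number) REFERENCES "order" (order_number)
    DEFERRABLE INITIALLY DEFERRED;
BEGIN;
INSERT INTO line_item (id, order_number) VALUES (1, 123);  -- order 123 not present yet
INSERT INTO "order" (id, order_number) VALUES (42, 123);   -- arrives later in the same transaction
COMMIT;  -- the foreign key is checked here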
You have a business workflow problem, because line items of individual orders are coming in before the orders themselves have come in. One workaround, perhaps not ideal, would be to create a before insert trigger which checks, for every incoming insert to the line_item table, whether that order already exists in the order table. If not, then it will first insert the order record before trying the insert on line_item.
CREATE OR REPLACE FUNCTION "public"."fn_insert_order" () RETURNS trigger AS $$
BEGIN
    INSERT INTO "order" (order_number)
    SELECT NEW.order_number
    WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
    RETURN NEW;
END
$$
LANGUAGE plpgsql;
-- trigger
CREATE TRIGGER "trigger_insert_order"
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_insert_order();
Note: I am assuming that the id column of the order table is in fact auto-increment, in which case Postgres will automatically assign a value to it when inserting as above. Most likely this is what you want, as having two id columns that both need to be manually assigned does not make much sense.
You could accomplish that with a BEFORE INSERT trigger on line_item.
In that trigger, you query order to check whether a matching row exists, and if not, you insert a dummy row.
That will allow the INSERT to succeed, at the cost of some performance.
To insert rows into order, use
INSERT INTO "order" ...
ON CONFLICT (order_number) DO UPDATE
SET id = EXCLUDED.id;
Updating a primary key is problematic and may lead to conflicts. One way you could get around that is if you use negative ids for artificially generated orders (assuming that the real ids are positive). If you have any references to that primary key, you'd have to define the constraint with ON UPDATE CASCADE.
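A rough sketch of that idea (the id values are illustrative; negative ids mark placeholder orders created by the trigger):
-- Placeholder row created before the real order arrives.
INSERT INTO "order" (id, order_number) VALUES (-1, 123);
-- When the real order arrives, upsert it: the placeholder's artificial id
-- is overwritten with the real (positive) one.
INSERT INTO "order" (id, order_number) VALUES (42, 123)
ON CONFLICT (order_number) DO UPDATE
SET id = EXCLUDED.id;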

postgresql unique constraint for any integer from two columns (or from array)

How to guarantee a uniqueness of any integer from the two columns / array?
Example: I create a table and insert one row in it:
CREATE TABLE mytest(a integer NOT NULL, b integer NOT NULL);
INSERT INTO mytest values (1,2);
What UNIQUE INDEX should I create to disallow adding any of the following values?
INSERT INTO mytest values (1,3); -- because 1 is already there
INSERT INTO mytest values (3,1); -- because 1 is already there
INSERT INTO mytest values (2,3); -- because 2 is already there
INSERT INTO mytest values (3,2); -- because 2 is already there
I can have array of two elements instead of two columns if it helps somehow.
Surely I can invent some workaround; the following come to mind:
create a separate table for all the numbers, put a unique index there, and add values to both tables in one transaction; if a number is not unique, it won't be added to the second table and the transaction fails
add two rows instead of one, with an additional field for the id of the pair.
But I want to have one table and I need one row with two elements in it. Is that possible?
You can use an exclusion constraint on the table along with the intarray extension to quickly search for overlapping arrays:
CREATE EXTENSION intarray;
CREATE TABLE test (
a int[],
EXCLUDE USING gist (a gist__int_ops WITH &&)
);
INSERT INTO test values('{1,2}');
INSERT INTO test values('{2,3}');
>> ERROR: conflicting key value violates exclusion constraint "test_a_excl"
>> DETAIL: Key (a)=({2,3}) conflicts with existing key (a)=({1,2}).
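If each row must still hold exactly two values, the same exclusion constraint can be combined with a CHECK constraint. A sketch, assuming intarray is installed (the table name mytest_pairs is illustrative):
CREATE TABLE mytest_pairs (
    a int[] NOT NULL,
    -- exactly two distinct elements per row
    CHECK (array_length(a, 1) = 2 AND a[1] <> a[2]),
    -- no element may appear in more than one row
    EXCLUDE USING gist (a gist__int_ops WITH &&)
);
INSERT INTO mytest_pairs VALUES ('{1,2}');  -- ok
INSERT INTO mytest_pairs VALUES ('{3,1}');  -- rejected: 1 is already there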

incorrect data update on Sybase trigger execution

I have a table test_123 with these columns:
int_1 (int),
datetime_1 (datetime),
tinyint_1 (tinyint),
datetime_2 (datetime)
When column datetime_1 is updated and the value of column tinyint_1 is 1, I have to update column datetime_2 with the value of datetime_1.
I created the trigger below for this, but it updates the datetime_2 value of every row where tinyint_1 = 1, whereas I only want to update the particular row whose datetime_1 value was actually updated (i.e. changed).
Below is the trigger:
CREATE TRIGGER test_trigger_upd
ON test_123
FOR UPDATE
AS
FOR EACH STATEMENT
IF UPDATE(datetime_1)
BEGIN
UPDATE test_123
SET test_123.datetime_2 = inserted.datetime_1
WHERE test_123.tinyint_1 = 1
END
ROW-level triggers are not supported in ASE. There are only after-statement triggers.
As commented earlier, the problem you're facing is that you need to be able to link the rows in the 'inserted' pseudo-table to the base table itself. You can only do that if there is a key -- meaning: a column that uniquely identifies a row, or a combination of columns that does so. Without that, you simply cannot identify the row that needs to be updated, since there may be multiple rows with identical column values if uniqueness is not guaranteed.
(and on a side note: not having a key in a table is bad design practice -- and this problem is one of the many reasons why).
A simple solution is to add an identity column to the table, e.g.
ALTER TABLE test_123 ADD idcol INT IDENTITY NOT NULL
You can then add a predicate 'test_123.idcol = inserted.idcol' to the trigger join.
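A sketch of what the corrected trigger could look like with that predicate (assuming the idcol identity column added above, and after dropping the original trigger):
CREATE TRIGGER test_trigger_upd
ON test_123
FOR UPDATE
AS
IF UPDATE(datetime_1)
BEGIN
    -- join the base table to the inserted pseudo-table so that only the
    -- rows that were actually updated (and have tinyint_1 = 1) are touched
    UPDATE test_123
    SET    datetime_2 = inserted.datetime_1
    FROM   test_123, inserted
    WHERE  test_123.idcol = inserted.idcol
    AND    test_123.tinyint_1 = 1
END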

PostgreSQL multi-column unique constraint causes errors

I am new to PostgreSQL and have a question about a multi-column unique constraint.
I got this error when I tried to add rows to the table:
ERROR: duplicate key value violates unique constraint "i_rb_on"
DETAIL: Key (a_fk, b_fk)=(296, 16) already exists.
I used this code (short version):
INSERT INTO rb_on (a_fk, b_fk)
SELECT a.pk, b.pk FROM A, B
WHERE NOT EXISTS (SELECT * FROM rb_on WHERE a_fk = a.pk AND b_fk = b.pk);
i_rb_on is a unique constraint on the columns (a_fk, b_fk).
It seems that my WHERE NOT EXISTS doesn't provide a protection against the duplicate key error for this kind of unique key.
UPDATE:
INSERT INTO tabA (mark_done, log_time, tabB_fk, tabC_fk)
SELECT FALSE, '2003-09-02 04:05:06', tabB.pk, tabC.pk FROM tabB, tabC, tabD, tabE, tabF
WHERE (tabC.sf_id='SUMMER' AND tabC.sf_status IN(0,1)
AND tabE.inventory_status=0)
AND tabF.tabD_fk=tabD.pk
AND tabD.tabE_fk=tabE.pk
AND tabE.tabB_fk=tabB.pk
AND tabF.tabC_fk=tabC.pk
AND NOT EXISTS (SELECT *
FROM tabA
WHERE tabB_fk=tabB.pk AND tabC_fk=tabC.pk);
The unique index on tabA:
CREATE UNIQUE INDEX i_tabA
ON tabA
USING btree
(tabB_fk, tabC_fk);
Only one row (out of the many the SELECT produces) must be inserted into tabA.
Your WHERE NOT EXISTS never provides proper protection against a unique violation. It only seems to most of the time. The WHERE NOT EXISTS can run concurrently with another insert, so the row is still inserted multiple times and all but one of the inserts causes a unique violation.
For that reason it's often better to just run the insert and let the violation happen if the row already exists.
I can't help you with the exact problem described unless you show the data (as SQL CREATE TABLE and INSERTs) and the real query.
BTW, please don't use old style A, B joins. Use A INNER JOIN B ON (...). It makes it easier to tell which join conditions are supposed to apply to which parts of the query, and harder to forget a join condition. You seem to have done that; you're attempting to insert a cartesian product. I suspect it's just an editing mistake in the query.
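For illustration, the same insert rewritten with explicit joins (the join conditions are taken directly from the original WHERE clause), which makes a missing condition much easier to spot:
INSERT INTO tabA (mark_done, log_time, tabB_fk, tabC_fk)
SELECT FALSE, '2003-09-02 04:05:06', tabB.pk, tabC.pk
FROM tabF
JOIN tabD ON tabF.tabD_fk = tabD.pk
JOIN tabE ON tabD.tabE_fk = tabE.pk
JOIN tabB ON tabE.tabB_fk = tabB.pk
JOIN tabC ON tabF.tabC_fk = tabC.pk
WHERE tabC.sf_id = 'SUMMER'
  AND tabC.sf_status IN (0, 1)
  AND tabE.inventory_status = 0
  AND NOT EXISTS (SELECT 1 FROM tabA
                  WHERE tabA.tabB_fk = tabB.pk
                    AND tabA.tabC_fk = tabC.pk);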
I added LIMIT 1 to the end: ...WHERE tabB_fk=tabB.pk AND tabC_fk=tabC.pk) LIMIT 1;
and it did the trick.
I created a function with LIMIT 1 and ...EXCEPTION WHEN unique_violation THEN ... and it also worked.
But when LIMIT 1 and "NOT EXISTS" are used, I think, it is not necessary to use unique_violation error handling.
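For reference, a minimal sketch of that kind of exception-handling function (the function name and parameters are made up, not part of the original schema):
CREATE OR REPLACE FUNCTION insert_taba_row(p_tabb_fk integer, p_tabc_fk integer)
RETURNS void AS $$
BEGIN
    INSERT INTO tabA (mark_done, log_time, tabB_fk, tabC_fk)
    VALUES (FALSE, '2003-09-02 04:05:06', p_tabb_fk, p_tabc_fk);
EXCEPTION
    WHEN unique_violation THEN
        NULL;  -- the pair already exists, ignore the duplicate
END;
$$ LANGUAGE plpgsql;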

Prevent dupe records in SQL Server

So I have this stored procedure to insert a message into my database. I wanted to prevent users from posting duplicate messages within a short period of time, whether on accident or on purpose (either a laggy connection or a spammer).
This is what the insert statement looks like:
IF NOT EXISTS (SELECT * FROM tblMessages
               WHERE message = @message AND ip = @ip
                 AND datediff(minute, timestamp, getdate()) < 10)
BEGIN
    INSERT INTO tblMessages (ip, message, votes, latitude, longitude, location, timestamp, flags, deleted, username, parentId)
    VALUES (@ip, @message, 0, @latitude, @longitude, @location, GETDATE(), 0, 0, @username, @parentId)
END
You can see I check to see if the same user has posted the same message within 10 minutes, and if not, I post it. I still saw one dupe come through yesterday. When I checked the timestamp of both messages in the database, they were exactly the same, down to the second, so I'm guessing when this 'exists' check ran on each insert, both came back empty, so they both inserted fine (at basically the same exact time).
What's a way I can reliably prevent this from happening?
I reckon you need a trigger
A unique constraint/index isn't clever enough to deal with the 10 minute gap between posts for a given message and ip.
CREATE TRIGGER TRG_tblMessages_I ON tblMessages FOR INSERT
AS
SET NOCOUNT ON;
IF EXISTS (SELECT *
           FROM tblMessages M
           JOIN INSERTED I ON M.message = I.message AND M.ip = I.ip
           WHERE datediff(minute, M.timestamp, I.timestamp) < 10)
BEGIN
    RAISERROR ('blah', 16, 1)
    ROLLBACK TRAN
END
Edit: you need an extra condition to ignore the same row you have just inserted (eg using surrogate key)
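Something along those lines, assuming tblMessages has a surrogate key column (here called messageId, an illustrative name), the EXISTS check becomes:
IF EXISTS (SELECT *
           FROM tblMessages M
           JOIN INSERTED I ON M.message = I.message AND M.ip = I.ip
           WHERE M.messageId <> I.messageId   -- skip the row just inserted
             AND datediff(minute, M.timestamp, I.timestamp) < 10)
BEGIN
    RAISERROR ('Duplicate message within 10 minutes', 16, 1)
    ROLLBACK TRAN
END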
Actually, Derek Kromm isn't far off; essentially you do want a unique constraint, you just want a range for one of the columns.
You can express this as a filtered index which enforces uniqueness on the columns you want, but with a filter that matches timestamps within a 10 minute range.
CREATE UNIQUE NONCLUSTERED INDEX IX_UNC_tblMessages
ON tblMessages (message, ip, timestamp)
WHERE datediff(minute, timestamp, getdate()) < 10
On the difference between a unique constraint and a filtered index which maintains uniqueness (MSDN):
There are no significant differences between creating a UNIQUE constraint and creating a unique index independent of a constraint. Data validation occurs in the same manner and the query optimizer does not differentiate between a unique index created by a constraint or manually created. However, you should create a UNIQUE or PRIMARY KEY constraint on the column when data integrity is the objective. By doing this the objective of the index will be clear.
The only aspect of this I'm not sure about is the use of getdate(). I'm not sure what effect that will have on the index and performance- this you will want to test for yourself.
Add a unique constraint to the table to absolutely prevent it from happening
ALTER TABLE tblMessages ADD CONSTRAINT uq_tblMessages UNIQUE (message,ip,timestamp)
I think the easiest way is to use a trigger to check the sender and body of the message against the existing records in the table.
or, as Derek said, you can use the constraint, but with another condition:
ALTER TABLE tblMessages ADD CONSTRAINT uq_tblMessages UNIQUE (message,ip,username, parentId)
but the constraint will generate an exception (and you will need to handle it).