PostgreSQL two-table constaint not working properly - postgresql

I run the following code:
-- Table describing messages
CREATE TABLE messages
(
id serial PRIMARY KEY NOT NULL,
text TEXT -- Message can have or not have text
);
-- Table describing media attached to messages
CREATE TABLE messages_attachments
(
message_id integer NOT NULL REFERENCES messages,
-- Messages can have any number of attachments, including 0
attachment_id TEXT NOT NULL
);
-- Messages must have either text or at least one attachment
CREATE FUNCTION message_has_text_or_attachments(integer) RETURNS bool STABLE
AS
$$
SELECT
EXISTS(SELECT 1 FROM messages_attachments WHERE message_id = $1)
OR
(SELECT text IS NOT NULL FROM messages WHERE id = $1);
$$ LANGUAGE SQL;
ALTER TABLE messages ADD CONSTRAINT nonempty_message CHECK ( message_has_text_or_attachments(id) );
-- Insert a message with no text and no attachments. Should fail, but it does not
INSERT INTO messages(text) VALUES (NULL);
SELECT *, message_has_text_or_attachments(id) FROM messages;
I expected it to fail on the INSERT line because the row being inserted violates the check constraint (we are inserting a message which's text is NULL and there are no attachments for that message), but it runs successfully and the next query returns (1, NULL, false) (here is an example with slightly modified function definition (apostrophes instead of dollar symbols because of the database version).
One more interesting thing is that if I change the order of the commands and INSERT the row before adding the CONSTRAINT, then PostgreSQL fails to ALTER the table, because "check constraint "nonempty_message" is violated by some row".
Why does PostgreSQL allow inserting the row, which violates the constraint? Am I mistaken somewhere in the function definition? Is there some limitation on how constraints can be applied and which tables can they depend on? Is it a PostgreSQL bug?

From the docs:
PostgreSQL does not support CHECK constraints that reference table data other than the new or updated row being checked. While a CHECK constraint that violates this rule may appear to work in simple tests, it cannot guarantee that the database will not reach a state in which the constraint condition is false (due to subsequent changes of the other row(s) involved).

Related

Problems with creating a postgresql trigger function to create a modified entry for every insert statement

So my first question here on SO,
let me describe the setup:
I have a postgressql database (version 12) with a table guilds (containing an internal guild_id and a few other informations). The guild_id is used as foreign key for many other tables like a teams table. Now if a team is inserted in teams for another guild then the guild with the guild_id = 1, I want a trigger function to create the same team entry, but now with a modified guild_id (should be now 1).
Definition of the relevant things I have atm:
create table if not exists bot.guilds
(
guild_id bigserial not null
constraint guilds_pk
primary key,
guild_dc_id bigint not null,
);
create table if not exists bot.teams
(
team_id bigserial not null
constraint teams_pk
primary key,
guild_id bigserial not null
constraint teams_guilds_guild_id_fk
references bot.guilds
on delete cascade,
team_name varchar(20) not null,
team_nickname varchar(10) not null
);
alter table bot.teams
owner to postgres;
create unique index if not exists teams_guild_id_team_name_uindex
on bot.teams (guild_id, team_name);
create unique index if not exists teams_guild_id_team_nickname_uindex
on bot.teams (guild_id, team_nickname);
create function duplicate_teams() returns trigger
language plpgsql
as
$$
BEGIN
INSERT INTO bot.teams VALUES(1,NEW."team_name",NEW."team_nickname");
RETURN NEW;
END;
$$;
create trigger duplicate_team
after insert
on bot.teams
for each row
execute procedure bot.duplicate_teams();
If I try now to insert a new row in teams (INSERT INTO bot.teams ("guild_id", "team_name", "team_nickname")VALUES (14, 'test2', 'test2');), I get the following error message (orginial german, translated by me to english):
[42804] ERROR: Column »guild_id« has type integer, but the expression has the type character varying
HINT: You have to rewrite the expression or cast the value.
WITH: PL/pgSQL-function duplicate_teams() row 3 in SQL-expressions
After execution the orgininal insert statement isn't in the table neither the copy.
I tried to cast the values for the guild id to serial, integer, bigserial.. but everytime the same error. I'm confused by the error message part with "has the type character varying".
So my questions are:
Is my understanding correct, that the error is caused by the trigger? and due to the error in the trigger the original insert statement doesnt work too?
Why is the type varing even with a cast?
Where is the error in the code?
I tried to search for the problem, but found nothing helpfull. Any hints are welcome. Thank you for your help!
EDIT:
The answer from #Lukas Thaler works, but now I get a new error:
[23505] ERROR: doubled key value violates unique-constraint »teams_guild_id_team_name_uindex«
Detail: Key»(guild_id, team_name)=(1, test3)« exists already.
WHERE: SQL-Statement»INSERT INTO bot.teams(guild_id, team_name, team_nickname) VALUES(1,NEW."team_name",NEW."team_nickname")«
PL/pgSQL-Function duplicate_teams() row 3 in SQL-Statement
SQL-Statment »INSERT INTO bot.teams(guild_id, team_name, team_nickname) VALUES(1,NEW."team_name",NEW."team_nickname")«
PL/pgSQL-Function duplicate_teams() row 3 in SQL-Statement
But the table only contains only "3,11,TeamUtils,TU"...
bot.teams has four columns: team_id, guild_id (both numerical data types), team_name and team_nickname (both varchars). In your INSERT statement in the function definition, you only provide three values and no association to particular columns. The default is to insert them in order, which assigns 1 to team_id and (crucially) NEW."team_name" to guild_id, hence the insert fails with a type mismatch error.
Specifying
INSERT INTO bot.teams(guild_id, team_name, team_nickname) VALUES(1,NEW."team_name",NEW."team_nickname");
in your function should resolve your problem
To answer your other questions:
The INSERT statement is being executed inside a transaction, and a failure in the trigger will cause the entire transaction to be aborted and rolled back, hence you don't see the original row inserted to the table, either
The type is not deviating from the cast, it was the wrong value being inserted that caused the data type mismatch

Is it possible to use a trigger to modify a value before it's inserted or updated?

We have several apps putting empty strings into a non-nullable database field which is causing issues with other applications that are expecting a value. What I'd like to do for now is turn the empty string into a null before an insert or update so that Teradata throws Column 'whatever' is NOT NULL which would result in an exception being thrown in the application.
Edit
I removed my old coneptual SQL which was incorrect and replaced it with new SQL that actually works, but only partially.
Replace trigger mydb.inserttest
before insert ON mydb.test
referencing new row as newrow
for each row
(
set newrow.name = case when newrow.name = '' then null else newrow.name end;
);
This appears to replace the empty string with a null before the insert. However, it also seems to not throw an exception when a UPI is violated. For example, I have the following table:
create table mydb.test
(name varchar(20) not null)
unique primary index (name);
I can execute this statement successfully the first time:
insert into mydb.mytable ('joe');
It tells me INSERT completed. 1 rows processed. However, if I run it again, it simply tells me INSERT completed. 0 rows processed. What you'd normally expect is a Duplicate unique prime key error, but the trigger seems to somehow suppress the exception which causes the calling .NET application to die silently when the UPI constraint is violated.
Why not a check constraint?
create table mydb.test
(name varchar(20) not null, check (name <> ''))
unique primary index (name);

PostgreSQL: How to revalidate CHECKs

My database has the following structure:
CREATE TYPE instrument_type AS ENUM (
'Stock',
...
'Currency',
...
);
CREATE FUNCTION get_instrument_type(instrument_id bigint) RETURNS instrument_type
LANGUAGE plpgsql STABLE RETURNS NULL ON NULL INPUT
AS $$
BEGIN
RETURN (SELECT instr_type FROM instruments WHERE id = instrument_id);
END
$$;
CREATE TABLE instruments (
id bigserial PRIMARY KEY,
instr_type instrument_type NOT NULL,
...
);
CREATE TABLE countries_currencies (
...
curr bigint NOT NULL
REFERENCES instruments (id)
ON UPDATE CASCADE ON DELETE CASCADE
CHECK (get_instrument_type(curr) = 'Currency'),
...
);
As you can see, I use one common table for instruments. There are a lot of foreign keys referencing to that table. But some tables like countries_currencies require that referenced item is 'Currency'. Since I can't use subqueries in CHECK constraints, I have to use function.
One day it could happen that one bad man will change instrument_type from 'Currency' to something else. If there is a row in table countries_currencies, referencing to modified instrument, CHECK will become invalid for this row. But CHECK will be applied to new rows, not for already existing.
Is there any standard way to revalidate CHECKs? I want to run such procedure as a part of general data integrity test.
P.S. I know, I could write trigger on table instruments and forbid change if something could become broken. But it requires assurance that I check all referencing tables and their constraints, so it is error prone anyway.
You could simply update all rows in place to trigger the CHECK:
UPDATE countries_currencies SET curr = curr;

postgres autoincrement not updated on explicit id inserts

I have the following table in postgres:
CREATE TABLE "test" (
"id" serial NOT NULL PRIMARY KEY,
"value" text
)
I am doing following insertions:
insert into test (id, value) values (1, 'alpha')
insert into test (id, value) values (2, 'beta')
insert into test (value) values ('gamma')
In the first 2 inserts I am explicitly mentioning the id. However the table's auto increment pointer is not updated in this case. Hence in the 3rd insert I get the error:
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (id)=(1) already exists.
I never faced this problem in Mysql in both MyISAM and INNODB engines. Explicit or not, mysql always update autoincrement pointer based on the max row id.
What is the workaround for this problem in postgres? I need it because I want a tighter control for some ids in my table.
UPDATE:
I need it because for some values I need to have a fixed id. For other new entries I dont mind creating new ones.
I think it may be possible by manually incrementing the nextval pointer to max(id) + 1 whenever I am explicitly inserting the ids. But I am not sure how to do that.
That's how it's supposed to work - next_val('test_id_seq') is only called when the system needs a value for this column and you have not provided one. If you provide value no such call is performed and consequently the sequence is not "updated".
You could work around this by manually setting the value of the sequence after your last insert with explicitly provided values:
SELECT setval('test_id_seq', (SELECT MAX(id) from "test"));
The name of the sequence is autogenerated and is always tablename_columnname_seq.
In the recent version of Django, this topic is discussed in the documentation:
Django uses PostgreSQL’s SERIAL data type to store auto-incrementing
primary keys. A SERIAL column is populated with values from a sequence
that keeps track of the next available value. Manually assigning a
value to an auto-incrementing field doesn’t update the field’s
sequence, which might later cause a conflict.
Ref: https://docs.djangoproject.com/en/dev/ref/databases/#manually-specified-autoincrement-pk
There is also management command manage.py sqlsequencereset app_label ... that is able to generate SQL statements for resetting sequences for the given app name(s)
Ref: https://docs.djangoproject.com/en/dev/ref/django-admin/#django-admin-sqlsequencereset
For example these SQL statements were generated by manage.py sqlsequencereset my_app_in_my_project:
BEGIN;
SELECT setval(pg_get_serial_sequence('"my_project_aaa"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_aaa";
SELECT setval(pg_get_serial_sequence('"my_project_bbb"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_bbb";
SELECT setval(pg_get_serial_sequence('"my_project_ccc"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_ccc";
COMMIT;
It can be done automatically using a trigger. This way you are sure that the largest value is always used as the next default value.
CREATE OR REPLACE FUNCTION set_serial_id_seq()
RETURNS trigger AS
$BODY$
BEGIN
EXECUTE (FORMAT('SELECT setval(''%s_%s_seq'', (SELECT MAX(%s) from %s));',
TG_TABLE_NAME,
TG_ARGV[0],
TG_ARGV[0],
TG_TABLE_NAME));
RETURN OLD;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER set_mytable_id_seq
AFTER INSERT OR UPDATE OR DELETE
ON mytable
FOR EACH STATEMENT
EXECUTE PROCEDURE set_serial_id_seq('mytable_id');
The function can be reused for multiple tables. Change "mytable" to the table of interest.
For more info regarding triggers:
https://www.postgresql.org/docs/9.1/plpgsql-trigger.html
https://www.postgresql.org/docs/9.1/sql-createtrigger.html

Manual inserts on a postgres table with a primary key sequence

I'm converting a MySQL table to PostgreSQL for the first time in my life and running into the traditional newbie problem of having no auto_increment.
Now I've found out that the postgres solution is to use a sequence and then request the nextval() of this sequence as the default value every time you insert. I've also read that the SERIAL type creates a sequence and a primary key automatically, and that nextval() increments the counter even when called inside transactions to avoid locking the sequence.
What I can't find addressed is the issue of what happens when you manually insert values into a field with a UNIQUE or PRIMARY constraint and a nextval() of a sequence as default. As far as I can see, this causes the INSERT to fail when the sequence reaches that value.
Is there a simple (or common) way to fix this ?
A clear explanation would be very much appreciated.
Update: If you feel I shouldn't do this, will never be able to fix this or am making some flawed assumptions, please feel free to point them out in your answers. Above all, please tell me what to do instead to offer programmers a stable and robust database that can't be corrupted with a simple insert (preferably without hiding everything behind stored procedures)
If you're migrating your data then I would drop the sequence constraint on the column, perform all of your inserts, use setval() to set the sequence to the maximum value of your data and then reinstate your column sequence nextval() default.
You can create a trigger which will check if currval('id_sequence_name')>=NEW.id.
If your transaction did not use default value or nextval('id_sequence_name'), then a currval function will throw an error, as it works only when sequence was updated in current session. If you use nextval and then try to insert bigger primary key then it will throw another error. A transaction will be then aborted.
This would prevent inserting any bad primary keys which would break serial.
Example code:
create table test (id serial primary key, value text);
create or replace function test_id_check() returns trigger language plpgsql as
$$ begin
if ( currval('test_id_seq')<NEW.id ) then
raise exception 'currval(test_id_seq)<id';
end if;
return NEW;
end; $$;
create trigger test_id_seq_check before insert or update of id on test
for each row execute procedure test_id_check();
Then inserting with default primary key will work fine:
insert into test(value) values ('a'),('b'),('c'),('d');
But inserting too big primary key will error out and abort:
insert into test(id, value) values (10,'z');
To expand on Tometzky's great answer, here is a more general version:
CREATE OR REPLACE FUNCTION check_serial() RETURNS trigger AS $$
BEGIN
IF currval(TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME || '_' || TG_ARGV[0] || '_seq') <
(row_to_json(NEW)->>TG_ARGV[0])::bigint
THEN RAISE SQLSTATE '55000'; -- same as currval() of uninitialized sequence
END IF;
RETURN NULL;
EXCEPTION
WHEN SQLSTATE '55000'
THEN RAISE 'manual entry of serial field %.%.% disallowed',
TG_TABLE_SCHEMA, TG_TABLE_NAME, TG_ARGV[0]
USING HINT = 'use DEFAULT instead of specifying value manually',
SCHEMA = TG_TABLE_SCHEMA, TABLE = TG_TABLE_NAME, COLUMN = TG_ARGV[0];
END;
$$ LANGUAGE plpgsql;
Which you can apply to any column, say test.id, thusly:
CREATE CONSTRAINT TRIGGER test_id_check
AFTER INSERT OR UPDATE OF id ON test
FOR EACH ROW EXECUTE PROCEDURE check_serial(id);
I don't exactly understand you question, but if your goal is just to do the insert, and have a valid field (e.g. an id), then insert the values without the id field, that's what "default" stands for. It will work.
E.g. havin a id serial NOT NULL and a CONSTRAINT table_pkey PRIMARY KEY(id) in the table definition will auto-set the id and auto-increment a sequence table_id_seq.
What about using a CHECK?
CREATE SEQUENCE pk_test
INCREMENT 1
MINVALUE 1
MAXVALUE 9223372036854775807
START 1
CACHE 1;
CREATE TABLE test (
id INT PRIMARY KEY CHECK (id=currval('pk_test')) DEFAULT nextval('pk_test'),
num int not null
);
ALTER SEQUENCE pk_test OWNED BY test.id;
-- Testing:
INSERT INTO test (num) VALUES (3) RETURNING id, num;
1,3 -- OK
2,3 -- OK
INSERT INTO test (id, num) values (30,3) RETURNING id, num;
/*
ERROR: new row for relation "test" violates check constraint "test_id_check"
DETAIL: Failing row contains (30, 3).
********** Error **********
ERROR: new row for relation "test" violates check constraint "test_id_check"
SQL state: 23514
Detail: Failing row contains (30, 3).
*/
DROP TABLE test;