I have an application that I'm using to insert some data into the db. This application has a field where I can put an SQL statement like:
INSERT INTO public.test_table ("message") VALUES (%s::text) -- %s will be used as a parameter in each iteration
What I want to check is how this application behaves in case of a deadlock. So, my question is: how can I deadlock this INSERT query? What should I run to make this happen?
I'm using this table:
CREATE TABLE public."test_table" (
"number" integer NOT NULL GENERATED ALWAYS AS IDENTITY,
"date" time with time zone NOT NULL DEFAULT NOW(),
"message" text,
PRIMARY KEY ("number"));
While I was using MariaDB I managed to create a lock timeout using:
START TRANSACTION;
UPDATE test_table SET message = 'foo';
INSERT INTO test_table (message) VALUES ('test');
DO SLEEP(60);
COMMIT;
But in PostgreSQL this doesn't even create a lock timeout.
EDIT:
Let's say I add this in the application; is it possible to get a deadlock using this:
BEGIN;
INSERT INTO public.test_table ("message") VALUES (%s::text);
I don't think you can force a deadlock with INSERTs given the table definition you have, as the primary key value is generated automatically. But if you use a manually assigned PK value (or any other unique constraint), you can get a deadlock when inserting the same unique values in different transactions:
create table test_table
(
id integer primary key,
code varchar(10) not null unique
);
With such a table, a deadlock is possible following the usual approach for deadlocks: interleaving locks on multiple resources in different orders.
The following will result in a deadlock in step #4: PostgreSQL's deadlock detector will abort one of the two transactions with ERROR: deadlock detected, allowing the other to complete.
Step | Transaction 1 | Transaction 2
----------|-----------------------------|----------------------------------
#1 | insert into test_table |
| (id, code) |
| values |
| (1, 'one'), |
| (2, 'two'); |
----------|-----------------------------|----------------------------------
#2 | | insert into test_table
| | (id, code)
| | values
| | (3, 'three');
----------|-----------------------------|----------------------------------
| -- this waits |
#3 | insert into test_table |
| (id, code) |
| values |
| (3, 'three'); |
----------|-----------------------------|----------------------------------
#4 | | -- this results in a deadlock
| | insert into test_table
| | (id, code)
| | values
| | (2, 'two');
There are an infinite number of ways you could change it to create a deadlock, but most of those would essentially throw it away and start over with something else entirely. If you want to make as few changes as possible, then I suppose it would look something like putting a unique index on "message", then doing:
BEGIN;
INSERT INTO public.test_table ("message") VALUES ('a');
INSERT INTO public.test_table ("message") VALUES ('b');
but you would have to run these in two different sessions at the same time, with the order of 'a' and 'b' reversed in one of them.
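Concretely, the two sessions could look something like this (a sketch, assuming the unique index on "message" has been created):
-- once, in either session:
CREATE UNIQUE INDEX ON public.test_table ("message");
-- session 1:
BEGIN;
INSERT INTO public.test_table ("message") VALUES ('a');
-- session 2:
BEGIN;
INSERT INTO public.test_table ("message") VALUES ('b');
INSERT INTO public.test_table ("message") VALUES ('a'); -- blocks, waiting for session 1
-- session 1 again:
INSERT INTO public.test_table ("message") VALUES ('b'); -- deadlock detected; one transaction is aborted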
Related
I have a table named players which has the following data
+------+------------+
| id | username |
|------+------------|
| 1 | mike93 |
| 2 | james_op |
| 3 | will_sniff |
+------+------------+
desired result:
+------+------------+------------+
| id | username | uniqueId |
|------+------------+------------|
| 1 | mike93 | PvS3T5 |
| 2 | james_op | PqWN7C |
| 3 | will_sniff | PHtPrW |
+------+------------+------------+
I need to create a new column called uniqueId. This value is different from the default serial numeric value. uniqueId is a unique, NOT NULL, 6-character text with the prefix "P".
In my migration, here's the code I have so far:
ALTER TABLE players ADD COLUMN uniqueId varchar(6) UNIQUE;
(loop comes here)
ALTER TABLE players ALTER COLUMN uniqueId SET NOT NULL;
and here's the SQL code I use to generate these unique IDs
SELECT CONCAT('P', string_agg (substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', ceil (random() * 62)::integer, 1), ''))
FROM generate_series(1, 5);
So, in other words, I need to create the new column without the NOT NULL constraint, loop over every already existing row, fill the NULL value with a valid ID and eventually add the NOT NULL constraint.
In theory it should be enough to run:
update players
set unique_id = (SELECT CONCAT('P', string_agg ...))
;
However, Postgres will not re-evaluate the expression in the SELECT for every row, so this generates a unique constraint violation. One workaround is to create a function (which you might want to do anyway) that generates these fake IDs:
create function generate_fake_id()
returns text
as
$$
SELECT CONCAT('P', string_agg (substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', ceil (random() * 62)::integer, 1), ''))
FROM generate_series(1, 5)
$$
language sql
volatile;
Then you can update your table using:
update players
set unique_id = generate_fake_id()
;
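Putting it all together, the migration from the question becomes (a sketch; note this answer's snake_case unique_id where the question used uniqueId):
ALTER TABLE players ADD COLUMN unique_id varchar(6) UNIQUE;
UPDATE players SET unique_id = generate_fake_id();
ALTER TABLE players ALTER COLUMN unique_id SET NOT NULL;
-- Note: the generated IDs are random, so collisions are possible in principle;
-- the UNIQUE constraint would then make the UPDATE fail, and it can simply be re-run.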
In the context of data warehousing, an ETL process must have a strategy for error handling. For that, Oracle has a great DML error logging feature that lets you insert/merge/update a million records without failing or rolling back when a constraint violation occurs for one or more rows; those rows can be logged in a dedicated error table. Afterwards you can investigate what is wrong with each row and correct the errors before repeating the insert/merge/update.
Is there any way to implement this feature in PostgreSQL?
Since there is nothing built in, and no useful extension exists, I searched for a solution based on a PL/pgSQL procedure and eventually found one. It works well in my use case, where some CSV files must be loaded once a month into a staging db using foreign tables.
In the following test, some records are inserted into the destination table, while the records that break an integrity constraint are inserted into an error table along with the error info.
test=# create table t1(c1 int primary key);
create table t2( f1 int ,f2 int, f3 numeric);
insert into t1 values(2),(11),(5),(12);
insert into t2 values(100,2,234),(57,11,25),(5,5,1231),(2,2,173),(2,12,240),(11,22,101),(3,12,99);
create table t3 as select * from t2 where 1+1=11; -- always false: copies t2's structure, no rows
alter table t3 add constraint t3_pk primary key(f1),add foreign key (f2) references t1(c1),add constraint f3_ck check(f3>100);
create table t3$err(f1 int,f2 int,f3 numeric, error_code varchar, error_message varchar, constraint_name varchar);
test=# do
$$
declare
rec Record;
v_err_code text;
v_err_message text;
v_constraint text;
begin
for rec in
select f1,
f2,
f3
from t2 --in my use case this is the foreign table reading a csv file
loop
begin
insert
into t3
values (rec.f1,
rec.f2,
rec.f3);
exception
when others then
get stacked diagnostics
v_err_code= returned_sqlstate,
v_err_message= MESSAGE_TEXT,
v_constraint= CONSTRAINT_NAME;
if left(v_err_code, 2) = '23' then --exception Class 23 — Integrity Constraint Violation
insert
into t3$err
values (rec.f1,
rec.f2,
rec.f3,
v_err_code,
v_err_message,
v_constraint);
raise notice 'record % inserted in error table',rec;
end if;
end;
end loop;
exception
when others then --outer exceptions different from constraint violations
get stacked diagnostics
v_err_code= returned_sqlstate;
raise notice 'sqlstate: %', v_err_code;
end;
$$;
NOTICE: record (57,11,25) inserted in error table
NOTICE: record (2,12,240) inserted in error table
NOTICE: record (11,22,101) inserted in error table
NOTICE: record (3,12,99) inserted in error table
test=# select * from t3;
f1 | f2 | f3
-----+----+------
100 | 2 | 234
5 | 5 | 1231
2 | 2 | 173
(3 rows)
test=# select * from t3$err;
f1 | f2 | f3 | error_code | error_message | constraint_name
----+----+-----+------------+-----------------------------------------------------------------------------+-----------------
57 | 11 | 25 | 23514 | new row for relation "t3" violates check constraint "f3_ck" | f3_ck
2 | 12 | 240 | 23505 | duplicate key value violates unique constraint "t3_pk" | t3_pk
11 | 22 | 101 | 23503 | insert or update on table "t3" violates foreign key constraint "t3_f2_fkey" | t3_f2_fkey
3 | 12 | 99 | 23514 | new row for relation "t3" violates check constraint "f3_ck" | f3_ck
(4 rows)
All the magic is done within the nested BEGIN..END block, where each row that passes the constraints is inserted into the target table, and each row that does not is inserted into the error table.
The above solution has many limitations, such as:
the Oracle feature mentioned in the question is fully integrated with SQL (apart from the PL/SQL preliminaries needed to create the error table), while here a PL/pgSQL procedure is needed;
iterating over all the records is not exactly the most efficient way of loading data compared with a bulk load (a set-based alternative for the duplicate-key case is sketched after this list);
moreover, the loop carries overhead from the context switches between the procedural environment and the SQL environment;
the error handling is not generic but must address specific errors;
when a record has more than one error, only the last one is inserted in the error table (there could be a solution for this point).
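For the narrower case where duplicate keys are the only expected problem, a set-based statement avoids the loop entirely. This is just a sketch using the tables above, not a replacement for the procedure: ON CONFLICT DO NOTHING silently skips duplicates instead of logging them, and a foreign key or check violation would still abort the whole statement.
insert into t3
select f1, f2, f3
from t2
on conflict on constraint t3_pk do nothing;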
I have 2 permanent tables in my PostgreSQL 12 database with a one-to-many relationship (thing and thing_identifier). The second, thing_identifier, has a column referencing thing, such that thing_identifier can hold multiple external identifiers for a given thing:
CREATE TABLE IF NOT EXISTS thing
(
thing_id SERIAL PRIMARY KEY,
thing_name TEXT, --this is not necessarily unique
thing_attribute TEXT --also not unique
);
CREATE TABLE IF NOT EXISTS thing_identifier
(
id SERIAL PRIMARY KEY,
thing_id integer references thing (thing_id),
identifier text
);
I need to insert some new data into thing and thing_identifier, both of which come from a table I created by using COPY to pull the contents of a large CSV file into the database, something like:
CREATE TABLE IF NOT EXISTS things_to_add
(
id SERIAL PRIMARY KEY,
guid TEXT, --a unique identifier used by the supplier
thing_name TEXT, --not unique
thing_attribute TEXT --also not unique
);
Sample data:
INSERT INTO things_to_add (guid, thing_name, thing_attribute) VALUES
('[111-22-ABC]','Thing-a-ma-jig','pretty thing'),
('[999-88-XYZ]','Herk-a-ma-fob','blue thing');
The goal is to have each row in things_to_add result in one new row, each, in thing and thing_identifier, as in the following:
thing:
| thing_id | thing_name          | thing_attribute   |
|----------|---------------------|-------------------|
| 1        | thing-a-ma-jig      | pretty thing      |
| 2        | herk-a-ma-fob       | blue thing        |
thing_identifier:
| id | thing_id | identifier |
|----|----------|------------------|
| 8 | 1 | '[111-22-ABC]' |
| 9 | 2 | '[999-88-XYZ]' |
I could use a CTE INSERT statement (with RETURNING thing_id) to get the thing_id that results from the INSERT on thing, but I can't figure out how to get both that thing_id from the INSERT on thing and the original guid from things_to_add, which needs to go into thing_identifier.identifier.
Just to be clear, the only guaranteed unique column in thing is thing_id, and the only guaranteed unique columns in things_to_add are id (which we don't want to store) and guid (which is what we want in thing_identifier.identifier), so there isn't any way to join thing and things_to_add after the INSERT on thing.
You can retrieve things_to_add.guid with a JOIN:
WITH list AS
(
INSERT INTO thing (thing_name)
SELECT thing_name
FROM things_to_add
RETURNING thing_id, thing_name
)
INSERT INTO thing_identifier (thing_id, identifier)
SELECT l.thing_id, t.guid
FROM list AS l
INNER JOIN things_to_add AS t
ON l.thing_name = t.thing_name;
Then, if thing.thing_name is not unique, the problem is trickier. Updating both tables thing and thing_identifier from the same trigger on things_to_add may solve the issue:
CREATE OR REPLACE FUNCTION after_insert_things_to_add ()
RETURNS TRIGGER LANGUAGE plpgsql AS -- trigger functions cannot be written in plain SQL
$$
BEGIN
    -- insert the new thing, feeding its generated thing_id to thing_identifier
    WITH list AS
    (
        INSERT INTO thing (thing_name)
        VALUES (NEW.thing_name)
        RETURNING thing_id
    )
    INSERT INTO thing_identifier (thing_id, identifier)
    SELECT l.thing_id, NEW.guid
    FROM list AS l ;
    RETURN NEW;
END ;
$$ ;
DROP TRIGGER IF EXISTS after_insert ON things_to_add ;
CREATE TRIGGER after_insert
AFTER INSERT
ON things_to_add
FOR EACH ROW
EXECUTE PROCEDURE after_insert_things_to_add ();
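With the trigger in place, loading the staging rows populates both tables. A quick check (a sketch, reusing the sample data from the question):
INSERT INTO things_to_add (guid, thing_name) VALUES
('[111-22-ABC]', 'Thing-a-ma-jig'),
('[999-88-XYZ]', 'Herk-a-ma-fob');
SELECT t.thing_id, t.thing_name, i.identifier
FROM thing AS t
JOIN thing_identifier AS i USING (thing_id);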
I want to create a function that updates a varchar column to a preferred string referenced in a column of another table, to help me clean this column more iteratively.
CREATE TABLE big_table (
mn_uid NUMERIC PRIMARY KEY,
user_name VARCHAR
);
INSERT INTO big_table VALUES
(1, 'DAVE'),
(2, 'Dave'),
(3, 'david'),
(4, 'Jak'),
(5, 'jack'),
(6, 'Jack'),
(7, 'Grant');
CREATE TABLE nameKey_table (
nk_uid NUMERIC PRIMARY KEY,
correct VARCHAR,
wrong VARCHAR
);
INSERT INTO nameKey_table VALUES
(1, 'David', 'Dave_DAVE_dave_DAVID_david'),
(2, 'Jack', 'JACK_jack_Jak_jak');
I want to perform the following procedure:
UPDATE big_table
SET user_name = (SELECT correct
FROM nameKey_table
WHERE wrong
LIKE '%DAVE%')
WHERE user_name = 'DAVE';
but looped over each user_name in big_table so that I have a function that can do something like this:
UPDATE big_table SET user_name = corrected_name_fn();
Here is my attempt to do something like this but I can't seem to get it to work:
CREATE FUNCTION corrected_name_fn() RETURNS VARCHAR AS $$
DECLARE entry RECORD;
DECLARE correct_name VARCHAR;
BEGIN
FOR entry IN SELECT DISTINCT user_name FROM big_table LOOP
EXECUTE 'SELECT correct
FROM nameKey_table
WHERE wrong
LIKE ''%$1%'''
INTO correct_name
USING entry;
RETURN correct_name;
END LOOP;
END;
$$ LANGUAGE plpgsql;
I want the final output in big_table to be:
| mn_uid | user_name |
| 1 | 'David' |
| 2 | 'David' |
| 3 | 'David' |
| 4 | 'Jack' |
| 5 | 'Jack' |
| 6 | 'Jack' |
| 7 | 'Grant' |
I realize rows 6 and 7 provide two unique cases that I want to build into the function with IF ELSE statements.
If user_name is in nameKey_table.correct, go to the next row.
If user_name is not in nameKey_table.correct, or does not match a string in nameKey_table.wrong, leave it as is.
Thanks for any help on this!!
It sounds like you want a trigger on the table. Here is my suggestion:
CREATE OR REPLACE FUNCTION tf_fix_name() RETURNS TRIGGER AS
$$
DECLARE
corrected_name TEXT;
BEGIN
SELECT correct INTO corrected_name FROM nameKey_table WHERE expression ~* NEW.user_name;
IF FOUND THEN
NEW.user_name := corrected_name;
END IF;
RETURN NEW;
END;
$$
LANGUAGE plpgsql;
CREATE TEMP TABLE big_table (
mn_uid INT PRIMARY KEY,
user_name TEXT NOT NULL
);
CREATE TRIGGER trigger_fix_name
BEFORE INSERT
ON big_table
FOR EACH ROW
EXECUTE PROCEDURE tf_fix_name();
CREATE TEMP TABLE nameKey_table (
nk_uid INT PRIMARY KEY,
correct TEXT NOT NULL,
expression TEXT NOT NULL
);
INSERT INTO nameKey_table VALUES
(1, 'David', '(dave|david)'),
(2, 'Jack', '(jack|jak)');
INSERT INTO big_table VALUES
(1, 'DAVE'),
(2, 'Dave'),
(3, 'david'),
(4, 'Jak'),
(5, 'jack'),
(6, 'Jack'),
(7, 'Grant');
SELECT * FROM big_table;
+--------+-----------+
| mn_uid | user_name |
+--------+-----------+
| 1 | David |
| 2 | David |
| 3 | David |
| 4 | Jack |
| 5 | Jack |
| 6 | Jack |
| 7 | Grant |
+--------+-----------+
(7 rows)
Note: I think you can do what you want a lot more easily with a case-insensitive regular expression. I also changed your primary keys to INTs; not sure why they were numerics, but it doesn't really change the solution. My solution was developed and tested on PostgreSQL 9.6.
You don't need a function; you can just update one table from the contents of another table:
UPDATE big_table dst
SET user_name = src.correct
FROM nameKey_table src
WHERE src.wrong LIKE '%' || dst.user_name || '%'
AND dst.user_name <> src.correct -- avoid idempotent updates
;
And if you need performance, don't rely on the LIKE operator; it cannot use indexes for patterns with a leading %. Instead, use a lookup table with one entry per row:
CREATE TABLE bad_spell (
correct VARCHAR,
wrong VARCHAR PRIMARY KEY -- This will cause an unique index to be created.
);
INSERT INTO bad_spell VALUES
('David', 'Dave')
,('David', 'DAVE')
,('David', 'dave')
,('David', 'DAVID')
,('David', 'david')
,('Jack', 'JACK')
,('Jack', 'jack')
,('Jack', 'Jak')
,('Jack', 'jak')
;
-- This index could be temporary
CREATE INDEX ON big_table(user_name);
-- EXPLAIN
UPDATE big_table dst
SET user_name = src.correct
FROM bad_spell src
WHERE dst.user_name = src.wrong
AND dst.user_name <> src.correct -- avoid idempotent updates
;
SELECT * FROM big_table;
I'm switching from MySQL to PostgreSQL and I was wondering how can I have an INT column with AUTO INCREMENT. I saw in the PostgreSQL docs a datatype called SERIAL, but I get syntax errors when using it.
Yes, SERIAL is the closest equivalent.
CREATE TABLE foo (
id SERIAL,
bar varchar
);
INSERT INTO foo (bar) VALUES ('blah');
INSERT INTO foo (bar) VALUES ('blah');
SELECT * FROM foo;
+----+------+
| id | bar  |
+----+------+
|  1 | blah |
|  2 | blah |
+----+------+
SERIAL is just a create-table-time macro around sequences. You cannot alter SERIAL onto an existing column.
You can use any other integer data type, such as smallint.
Example:
CREATE SEQUENCE user_id_seq;
CREATE TABLE "user" ( -- user is a reserved word, so the table name must be quoted
    user_id smallint NOT NULL DEFAULT nextval('user_id_seq')
);
ALTER SEQUENCE user_id_seq OWNED BY "user".user_id;
Better to use your own data type rather than the serial data type.
If you want to add a sequence to an id column in a table that already exists, you can use:
CREATE SEQUENCE user_id_seq;
ALTER TABLE "user" ALTER user_id SET DEFAULT NEXTVAL('user_id_seq');
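If the table already contains rows, you may also want to position the sequence past the current maximum, e.g. (a sketch, assuming the "user" table from above):
SELECT setval('user_id_seq', COALESCE((SELECT MAX(user_id) FROM "user"), 0) + 1, false);
-- with the third argument false, the next nextval() returns exactly this value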
Starting with Postgres 10, identity columns as defined by the SQL standard are also supported:
create table foo
(
id integer generated always as identity
);
creates an identity column that can't be overridden unless explicitly asked for. The following insert will fail with a column defined as generated always:
insert into foo (id)
values (1);
This can however be overruled:
insert into foo (id) overriding system value
values (1);
When using the option generated by default this is essentially the same behaviour as the existing serial implementation:
create table foo
(
id integer generated by default as identity
);
When a value is supplied manually, the underlying sequence needs to be adjusted manually as well, the same as with a serial column.
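For example, a minimal sketch of that adjustment, assuming the generated by default table foo from above:
insert into foo (id) values (42); -- manual value; the sequence is not advanced
-- realign the implicit sequence with the current maximum:
select setval(pg_get_serial_sequence('foo', 'id'), (select max(id) from foo));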
An identity column is not a primary key by default (just like a serial column). If it should be one, a primary key constraint needs to be defined manually.
Whilst it looks like sequences are the equivalent of MySQL's auto_increment, there are some subtle but important differences:
1. Failed Queries Increment The Sequence/Serial
The serial column gets incremented on failed queries. This leads to gaps from failed queries, not just from row deletions. For example, run the following queries on your PostgreSQL database:
CREATE TABLE table1 (
uid serial NOT NULL PRIMARY KEY,
col_b integer NOT NULL,
CHECK (col_b>=0)
);
INSERT INTO table1 (col_b) VALUES(1);
INSERT INTO table1 (col_b) VALUES(-1);
INSERT INTO table1 (col_b) VALUES(2);
SELECT * FROM table1;
You should get the following output:
uid | col_b
-----+-------
1 | 1
3 | 2
(2 rows)
Notice how uid goes from 1 to 3 instead of 1 to 2.
This still occurs if you were to manually create your own sequence with:
CREATE SEQUENCE table1_seq;
CREATE TABLE table1 (
col_a smallint NOT NULL DEFAULT nextval('table1_seq'),
col_b integer NOT NULL,
CHECK (col_b>=0)
);
ALTER SEQUENCE table1_seq OWNED BY table1.col_a;
If you wish to test how MySQL is different, run the following on a MySQL database:
CREATE TABLE table1 (
uid int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
col_b int unsigned NOT NULL
);
INSERT INTO table1 (col_b) VALUES(1);
INSERT INTO table1 (col_b) VALUES(-1);
INSERT INTO table1 (col_b) VALUES(2);
You should get the following, with no gaps:
+-----+-------+
| uid | col_b |
+-----+-------+
| 1 | 1 |
| 2 | 2 |
+-----+-------+
2 rows in set (0.00 sec)
2. Manually Setting the Serial Column Value Can Cause Future Queries to Fail.
This was pointed out by @trev in a previous answer.
To simulate this, manually set the uid to 5, which will "clash" later.
INSERT INTO table1 (uid, col_b) VALUES(5, 5);
Table data:
uid | col_b
-----+-------
1 | 1
3 | 2
5 | 5
(3 rows)
Run another insert:
INSERT INTO table1 (col_b) VALUES(6);
Table data:
uid | col_b
-----+-------
1 | 1
3 | 2
5 | 5
4 | 6
Now if you run another insert:
INSERT INTO table1 (col_b) VALUES(7);
It will fail with the following error message:
ERROR: duplicate key value violates unique constraint "table1_pkey"
DETAIL: Key (uid)=(5) already exists.
In contrast, MySQL will handle this gracefully as shown below:
INSERT INTO table1 (uid, col_b) VALUES(4, 4);
Now insert another row without setting uid:
INSERT INTO table1 (col_b) VALUES(3);
The query doesn't fail; uid just jumps to 5:
+-----+-------+
| uid | col_b |
+-----+-------+
| 1 | 1 |
| 2 | 2 |
| 4 | 4 |
| 5 | 3 |
+-----+-------+
Testing was performed on MySQL 5.6.33 for Linux (x86_64), and on PostgreSQL 9.4.9.
Sorry to rehash an old question, but this was the first Stack Overflow question/answer that popped up on Google.
This post (which came up first on Google) describes the newer syntax for PostgreSQL 10:
https://blog.2ndquadrant.com/postgresql-10-identity-columns/
which happens to be:
CREATE TABLE test_new (
    id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
);
Hope that helps :)
You have to be careful not to insert directly into your SERIAL or sequence field; otherwise, your write will fail when the sequence reaches the inserted value:
-- Table: "test"
-- DROP TABLE test;
CREATE TABLE test
(
"ID" SERIAL,
"Rank" integer NOT NULL,
"GermanHeadword" "text" [] NOT NULL,
"PartOfSpeech" "text" NOT NULL,
"ExampleSentence" "text" NOT NULL,
"EnglishGloss" "text"[] NOT NULL,
CONSTRAINT "PKey" PRIMARY KEY ("ID", "Rank")
)
WITH (
OIDS=FALSE
);
-- ALTER TABLE test OWNER TO postgres;
INSERT INTO test("Rank", "GermanHeadword", "PartOfSpeech", "ExampleSentence", "EnglishGloss")
VALUES (1, '{"der", "die", "das", "den", "dem", "des"}', 'art', 'Der Mann küsst die Frau und das Kind schaut zu', '{"the", "of the" }');
INSERT INTO test("ID", "Rank", "GermanHeadword", "PartOfSpeech", "ExampleSentence", "EnglishGloss")
VALUES (2, 1, '{"der", "die", "das"}', 'pron', 'Das ist mein Fahrrad', '{"that", "those"}');
INSERT INTO test("Rank", "GermanHeadword", "PartOfSpeech", "ExampleSentence", "EnglishGloss")
VALUES (1, '{"der", "die", "das"}', 'pron', 'Die Frau, die nebenen wohnt, heißt Renate', '{"that", "who"}');
SELECT * from test;
In the context of the asked question, and in reply to the comment by @sereja1c: creating SERIAL implicitly creates sequences, so for the above example:
CREATE TABLE foo (id SERIAL,bar varchar);
CREATE TABLE would implicitly create the sequence foo_id_seq for the serial column foo.id. Hence, SERIAL (4 bytes) is good for its ease of use unless you need a specific datatype for your id.
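You can also look up the name of the implicitly created sequence if you need it (a sketch):
SELECT pg_get_serial_sequence('foo', 'id'); -- returns public.foo_id_seq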
Since PostgreSQL 10:
CREATE TABLE test_new (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
payload text
);
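A quick usage check (sketch):
INSERT INTO test_new (payload) VALUES ('a row'), ('another row');
SELECT * FROM test_new; -- the two ids are generated automatically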
This way will work for sure; I hope it helps:
CREATE TABLE fruits(
id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL
);
INSERT INTO fruits(id,name) VALUES(DEFAULT,'apple');
or
INSERT INTO fruits VALUES(DEFAULT,'apple');
You can check the details at the following link:
http://www.postgresqltutorial.com/postgresql-serial/
Create a sequence:
CREATE SEQUENCE user_role_id_seq
INCREMENT 1
MINVALUE 1
MAXVALUE 9223372036854775807
START 3
CACHE 1;
ALTER TABLE user_role_id_seq
OWNER TO postgres;
and alter the table:
ALTER TABLE user_roles ALTER COLUMN user_role_id SET DEFAULT nextval('user_role_id_seq'::regclass);