Prevent empty strings in CHARACTER VARYING field - postgresql

I am using PostgreSQL and would like to prevent certain required CHARACTER VARYING (VARCHAR) fields from allowing empty string inputs.
These fields would also need to contain unique values, so I am already using a unique constraint; however, this does not prevent an original (unique) empty value.
Basic example, where username needs to be unique and not empty
| id | username | password |
+----+----------+----------+
| 1 | User1 | pw1 | #Allowed
| 2 | User2 | pw1 | #Allowed
| 3 | User2 | pw2 | #Already prevented by constraint
| 4 | '' | pw2 | #Currently allowed, but needs to be prevented

Use a check constraint:
CREATE TABLE foobar(
x TEXT NOT NULL UNIQUE,
CHECK (x <> '')
);
INSERT INTO foobar(x) VALUES('');

You can use the standard SQL 'CONSTRAINT...CHECK' clause when defining table fields:
CREATE TABLE test
(
nonempty VARCHAR NOT NULL UNIQUE CONSTRAINT non_empty CHECK(length(nonempty)>0)
)

As a special kind of constraint, you can put the datatype+constraint into a DOMAIN:
-- set search_path='tmp';
DROP DOMAIN birthdate CASCADE;
CREATE DOMAIN birthdate AS date DEFAULT NULL
CHECK (value >= '1900-01-01' AND value <= now())
;
DROP DOMAIN username CASCADE;
CREATE DOMAIN username AS VARCHAR NOT NULL
CHECK (length(value) > 0)
;
DROP TABLE employee CASCADE;
CREATE TABLE employee
( empno INTEGER NOT NULL PRIMARY KEY
, dob birthdate
, zname username
, UNIQUE (zname)
);
INSERT INTO employee(empno,dob,zname)
VALUES (1,'1980-02-02', 'John Doe' ), (2,'1980-02-02', 'Jon Doeh' );
INSERT INTO employee(empno,dob,zname)
VALUES (3,'1980-02-02', '' ), (4,'1980-01-01', 'Joan Doh' );
This will allow you to reuse the domain again and again, without having to copy the constraint every time.
-- UPDATE 2021-03-25 (Thanks to #AlexanderPavlov)
There appears to be a serious flaw in Postgres's implementation: it is possible to insert NULLs from the results of an empty scalar subquery.
The (nonsensical) COALESCE() below "fixes" this behaviour.
This allows us to put the database into a forbidden state.
\echo literal NULL
INSERT INTO employee(empno,dob,zname) VALUES (5,'2021-02-02', NULL );
\echo empty (scalar) set
INSERT INTO employee(empno,dob,zname) VALUES (6,'2021-02-02', (select zname from employee where 1=0) );
\echo empty COALESCE((scalar, NULL) ) set
INSERT INTO employee(empno,dob,zname) VALUES (7,'2021-02-02', (select COALESCE(zname,NULL) from employee where 1=0) );
\echo empty set#2
INSERT INTO employee(empno,dob,zname) (select 8,'2021-03-03', zname from employee where 1=0 );
\echo duplicate the complete table
INSERT INTO employee(empno,dob,zname) (select 100+empno,dob+'1mon':: interval, upper(zname) from employee );
select * from employee;
Extra Results:
literal NULL
ERROR: domain username does not allow null values
empty (scalar) set
INSERT 0 1
empty COALESCE((scalar, NULL) ) set
ERROR: domain username does not allow null values
empty set#2
INSERT 0 0
duplicate the complete table
ERROR: domain username does not allow null values
empno | dob | zname
-------+------------+----------
1 | 1980-02-02 | John Doe
2 | 1980-02-02 | Jon Doeh
6 | 2021-02-02 |
(3 rows)

Related

PostgreSQL - Loop Over Rows to Fill NULL Values

I have a table named players which has the following data
+------+------------+
| id | username |
|------+------------|
| 1 | mike93 |
| 2 | james_op |
| 3 | will_sniff |
+------+------------+
desired result:
+------+------------+------------+
| id | username | uniqueId |
|------+------------+------------|
| 1 | mike93 | PvS3T5 |
| 2 | james_op | PqWN7C |
| 3 | will_sniff | PHtPrW |
+------+------------+------------+
I need to create a new column called uniqueId. This value is different than the default serial numeric value. uniqueId is a unique, NOT NULL, 6 characters long text with the prefix "P".
In my migration, here's the code I have so far:
ALTER TABLE players ADD COLUMN uniqueId varchar(6) UNIQUE;
(loop comes here)
ALTER TABLE players ALTER COLUMN uniqueId SET NOT NULL;
and here's the SQL code I use to generate these unique IDs
SELECT CONCAT('P', string_agg (substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', ceil (random() * 62)::integer, 1), ''))
FROM generate_series(1, 5);
So, in other words, I need to create the new column without the NOT NULL constraint, loop over every already existing row, fill the NULL value with a valid ID and eventually add the NOT NULL constraint.
In theory it should be enough to run:
update players
set unique_id = (SELECT CONCAT('P', string_agg ...))
;
However, Postgres will not re-evaluate the expression in the SELECT for every row, so this generates a unique constraint violation. One workaround is to create a function (which you might want to do anyway) that generates these fake IDs
create function generate_fake_id()
returns text
as
$$
SELECT CONCAT('P', string_agg (substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', ceil (random() * 62)::integer, 1), ''))
FROM generate_series(1, 5)
$$
language sql
volatile;
Then you can update your table using:
update players
set unique_id = generate_fake_id()
;
Online example

Postgres lowercase column and delete duplicates

I have the following table:
Customers
---------
name text
object_id integer
created_time timestamp with time zone
Indexes:
"my_index" UNIQUE CONSTRAINT, btree (name, object_id, created_time)
The unique index works fine but then I ended up with duplicate data like:
Name | object_id | created_time
------------------------------------
john | 1 | 2018-02-28 15:42:14.30573+00
JOHN | 1 | 2018-02-28 15:42:14.30573+00
So I tried to lowercase all my data in the name column with:
UPDATE customers SET name=lower(name) WHERE name != LOWER(name);
But this procedure generated an error because now I would be violating the index:
ERROR: duplicate key value violates unique constraint "my_index"
DETAIL: Key (name, object_id, created_time)=(john, 1, 2018-02-28 15:42:14.30573+00) already exists.
What kind of procedure could I use to delete rows that after casting to lowercase generate an index violation ?
If you have 'JOHN' and 'John' in you table but not 'john' it gets messy. here's one solution.
insert into customers
select distinct lower("name") ,object_id,created_time from customers
where name <> lower(name)
and not (lower("name") ,object_id,created_time)
in (select * from customers);
delete from customers where name <> lower(name);
after that consider:
alter table customers alter column name type citext;

How use nextval in sequence using node-postgres

my table postgres database has an auto increment key.
In convencional postgres sql I can:
INSERT INTO CITY(ID,NAME) VALUES(nextval('city_sequence_name'),'my first city');
How can I do this using node-postgres ? Until now, I have tried:
myConnection.query('INSERT INTO city(id,name) VALUES ($1,$2)', ['nextval("city_sequence_name")','my first city'],callback);
But the error rules:
erro { error: sintaxe de entrada é inválida para integer: "nextval("city_sequence_name")"
So, I was able to identify the solution to this case:
connection.query("INSERT INTO city(id,name) VALUES(nextval('sequence_name'),$1)", ['first city'],callback);
This is a bad practice. Instead, just set the column to DEFAULT nextval() or use the serial type.
# CREATE TABLE city ( id serial PRIMARY KEY, name text );
CREATE TABLE
# \d city
Table "public.city"
Column | Type | Modifiers
--------+---------+---------------------------------------------------
id | integer | not null default nextval('city_id_seq'::regclass)
name | text |
Indexes:
"city_pkey" PRIMARY KEY, btree (id)
# INSERT INTO city (name) VALUES ('Houston'), ('Austin');
INSERT 0 2
test=# TABLE city;
id | name
----+---------
1 | Houston
2 | Austin
(2 rows)

How to insert values from another table in PostgreSQL?

I have a table which references other tables:
CREATE TABLE scratch
(
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
rep_id INT NOT NULL REFERENCES reps,
term_id INT REFERENCES terms
);
CREATE TABLE reps (
id SERIAL PRIMARY KEY,
rep TEXT NOT NULL UNIQUE
);
CREATE TABLE terms (
id SERIAL PRIMARY KEY,
terms TEXT NOT NULL UNIQUE
);
I wish to add a new record to scratch given the name, the rep and the terms values, i.e. I have neither corresponding rep_id nor term_id.
Right now the only idea that I have is:
insert into scratch (name, rep_id, term_id)
values ('aaa', (select id from reps where rep='Dracula' limit 1), (select id from terms where terms='prepaid' limit 1));
My problem is this. I am trying to use the parameterized query API (from node using the node-postgres package), where an insert query looks like this:
insert into scratch (name, rep_id, term_id) values ($1, $2, $3);
and then an array of values for $1, $2 and $3 is passed as a separate argument. At the end, when I am comfortable with the parameterized queries the idea is to promote them to prepared statements to utilize the most efficient and safest way to query the database.
However, I am puzzled how can I do this with my example, where different tables have to be subqueried.
P.S. I am using PostgreSQL 9.2 and have no problem with a PostgreSQL specific solution.
EDIT 1
C:\Users\markk>psql -U postgres
psql (9.2.4)
WARNING: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
Type "help" for help.
postgres=# \c dummy
WARNING: Console code page (437) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
You are now connected to database "dummy" as user "postgres".
dummy=# DROP TABLE scratch;
DROP TABLE
dummy=# CREATE TABLE scratch
dummy-# (
dummy(# id SERIAL NOT NULL PRIMARY KEY,
dummy(# name text NOT NULL UNIQUE,
dummy(# rep_id integer NOT NULL,
dummy(# term_id integer
dummy(# );
NOTICE: CREATE TABLE will create implicit sequence "scratch_id_seq" for serial column "scratch.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "scratch_pkey" for table "scratch"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "scratch_name_key" for table "scratch"
CREATE TABLE
dummy=# DEALLOCATE insert_scratch;
ERROR: prepared statement "insert_scratch" does not exist
dummy=# PREPARE insert_scratch (text, text, text) AS
dummy-# INSERT INTO scratch (name, rep_id, term_id)
dummy-# SELECT $1, r.id, t.id
dummy-# FROM reps r, terms t
dummy-# WHERE r.rep = $2 AND t.terms = $3
dummy-# RETURNING id, name, $2 rep, $3 terms;
PREPARE
dummy=# DEALLOCATE insert_scratch2;
ERROR: prepared statement "insert_scratch2" does not exist
dummy=# PREPARE insert_scratch2 (text, text, text) AS
dummy-# INSERT INTO scratch (name, rep_id, term_id)
dummy-# VALUES ($1, (SELECT id FROM reps WHERE rep=$2 LIMIT 1), (SELECT id FROM terms WHERE terms=$3 LIMIT 1))
dummy-# RETURNING id, name, $2 rep, $3 terms;
PREPARE
dummy=# EXECUTE insert_scratch ('abc', 'Snowhite', '');
id | name | rep | terms
----+------+-----+-------
(0 rows)
INSERT 0 0
dummy=# EXECUTE insert_scratch2 ('abc', 'Snowhite', '');
id | name | rep | terms
----+------+----------+-------
1 | abc | Snowhite |
(1 row)
INSERT 0 1
dummy=# EXECUTE insert_scratch ('abcd', 'Snowhite', '30 days');
id | name | rep | terms
----+------+----------+---------
2 | abcd | Snowhite | 30 days
(1 row)
INSERT 0 1
dummy=# EXECUTE insert_scratch2 ('abcd2', 'Snowhite', '30 days');
id | name | rep | terms
----+-------+----------+---------
3 | abcd2 | Snowhite | 30 days
(1 row)
INSERT 0 1
dummy=#
EDIT 2
We can utilize the fact that rep_id is required, even though terms_id is optional and use the following version of INSERT-SELECT:
PREPARE insert_scratch (text, text, text) AS
INSERT INTO scratch (name, rep_id, term_id)
SELECT $1, r.id, t.id
FROM reps r
LEFT JOIN terms t ON t.terms = $3
WHERE r.rep = $2
RETURNING id, name, $2 rep, $3 terms;
This version, however, has two problems:
No distinction is made between a missing terms value (i.e. '') and an invalid terms value (i.e. a non empty value missing from the terms table entirely). Both are treated as missing terms. (But the INSERT with two subqueries suffers from the same problem)
The version depends on the fact that the rep is required. But what if rep_id was optional too?
EDIT 3
Found the solution for the item 2 - eliminating dependency on rep being required. Plus using the WHERE statement has the problem that the sql does not fail if the rep is invalid - it just inserts 0 rows, whereas I want to fail explicitly in this case. My solution is simply using a dummy one row CTE:
PREPARE insert_scratch (text, text, text) AS
WITH stub(x) AS (VALUES (0))
INSERT INTO scratch (name, rep_id, term_id)
SELECT $1, r.id, t.id
FROM stub
LEFT JOIN terms t ON t.terms = $3
LEFT JOIN reps r ON r.rep = $2
RETURNING id, name, rep_id, term_id;
If rep is missing or invalid, this sql will try to insert NULL into the rep_id field and since the field is NOT NULL an error would be raised - precisely what I need. And if further I decide to make rep optional - no problem, the same SQL works for that too.
INSERT into scratch (name, rep_id, term_id)
SELECT 'aaa'
, r.id
, t.id
FROM reps r , terms t -- essentially a cross join
WHERE r.rep = 'Dracula'
AND t.terms = 'prepaid'
;
Notes:
You don't need the ugly LIMITs, since r.rep and t.terms are unique (candidate keys)
you could replace the FROM a, b by a FROM a CROSS JOIN b
the scratch table will probably need an UNIQUE constraint on (rep_id, term_it) (the nullability of term_id is questionable)
UPDATE: the same as prepared query as found in the Documentation
PREPARE hoppa (text, text,text) AS
INSERT into scratch (name, rep_id, term_id)
SELECT $1 , r.id , t.id
FROM reps r , terms t -- essentially a cross join
WHERE r.rep = $2
AND t.terms = $3
;
EXECUTE hoppa ('bbb', 'Dracula' , 'prepaid' );
SELECT * FROM scratch;
UPDATE2: test data
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE reps ( id SERIAL PRIMARY KEY, rep TEXT NOT NULL UNIQUE);
CREATE TABLE terms ( id SERIAL PRIMARY KEY, terms TEXT NOT NULL UNIQUE);
CREATE TABLE scratch ( id SERIAL PRIMARY KEY, name TEXT NOT NULL, rep_id INT NOT NULL REFERENCES reps, term_id INT REFERENCES terms);
INSERT INTO reps(rep) VALUES( 'Dracula' );
INSERT INTO terms(terms) VALUES( 'prepaid' );
Results:
NOTICE: drop cascades to 3 other objects
DETAIL: drop cascades to table tmp.reps
drop cascades to table tmp.terms
drop cascades to table tmp.scratch
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
INSERT 0 1
PREPARE
INSERT 0 1
id | name | rep_id | term_id
----+------+--------+---------
1 | aaa | 1 | 1
2 | bbb | 1 | 1
(2 rows)

What's the PostgreSQL datatype equivalent to MySQL AUTO INCREMENT?

I'm switching from MySQL to PostgreSQL and I was wondering how can I have an INT column with AUTO INCREMENT. I saw in the PostgreSQL docs a datatype called SERIAL, but I get syntax errors when using it.
Yes, SERIAL is the equivalent function.
CREATE TABLE foo (
id SERIAL,
bar varchar
);
INSERT INTO foo (bar) VALUES ('blah');
INSERT INTO foo (bar) VALUES ('blah');
SELECT * FROM foo;
+----------+
| 1 | blah |
+----------+
| 2 | blah |
+----------+
SERIAL is just a create table time macro around sequences. You can not alter SERIAL onto an existing column.
You can use any other integer data type, such as smallint.
Example :
CREATE SEQUENCE user_id_seq;
CREATE TABLE user (
user_id smallint NOT NULL DEFAULT nextval('user_id_seq')
);
ALTER SEQUENCE user_id_seq OWNED BY user.user_id;
Better to use your own data type, rather than user serial data type.
If you want to add sequence to id in the table which already exist you can use:
CREATE SEQUENCE user_id_seq;
ALTER TABLE user ALTER user_id SET DEFAULT NEXTVAL('user_id_seq');
Starting with Postgres 10, identity columns as defined by the SQL standard are also supported:
create table foo
(
id integer generated always as identity
);
creates an identity column that can't be overridden unless explicitly asked for. The following insert will fail with a column defined as generated always:
insert into foo (id)
values (1);
This can however be overruled:
insert into foo (id) overriding system value
values (1);
When using the option generated by default this is essentially the same behaviour as the existing serial implementation:
create table foo
(
id integer generated by default as identity
);
When a value is supplied manually, the underlying sequence needs to be adjusted manually as well - the same as with a serial column.
An identity column is not a primary key by default (just like a serial column). If it should be one, a primary key constraint needs to be defined manually.
Whilst it looks like sequences are the equivalent to MySQL auto_increment, there are some subtle but important differences:
1. Failed Queries Increment The Sequence/Serial
The serial column gets incremented on failed queries. This leads to fragmentation from failed queries, not just row deletions. For example, run the following queries on your PostgreSQL database:
CREATE TABLE table1 (
uid serial NOT NULL PRIMARY KEY,
col_b integer NOT NULL,
CHECK (col_b>=0)
);
INSERT INTO table1 (col_b) VALUES(1);
INSERT INTO table1 (col_b) VALUES(-1);
INSERT INTO table1 (col_b) VALUES(2);
SELECT * FROM table1;
You should get the following output:
uid | col_b
-----+-------
1 | 1
3 | 2
(2 rows)
Notice how uid goes from 1 to 3 instead of 1 to 2.
This still occurs if you were to manually create your own sequence with:
CREATE SEQUENCE table1_seq;
CREATE TABLE table1 (
col_a smallint NOT NULL DEFAULT nextval('table1_seq'),
col_b integer NOT NULL,
CHECK (col_b>=0)
);
ALTER SEQUENCE table1_seq OWNED BY table1.col_a;
If you wish to test how MySQL is different, run the following on a MySQL database:
CREATE TABLE table1 (
uid int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
col_b int unsigned NOT NULL
);
INSERT INTO table1 (col_b) VALUES(1);
INSERT INTO table1 (col_b) VALUES(-1);
INSERT INTO table1 (col_b) VALUES(2);
You should get the following with no fragementation:
+-----+-------+
| uid | col_b |
+-----+-------+
| 1 | 1 |
| 2 | 2 |
+-----+-------+
2 rows in set (0.00 sec)
2. Manually Setting the Serial Column Value Can Cause Future Queries to Fail.
This was pointed out by #trev in a previous answer.
To simulate this manually set the uid to 4 which will "clash" later.
INSERT INTO table1 (uid, col_b) VALUES(5, 5);
Table data:
uid | col_b
-----+-------
1 | 1
3 | 2
5 | 5
(3 rows)
Run another insert:
INSERT INTO table1 (col_b) VALUES(6);
Table data:
uid | col_b
-----+-------
1 | 1
3 | 2
5 | 5
4 | 6
Now if you run another insert:
INSERT INTO table1 (col_b) VALUES(7);
It will fail with the following error message:
ERROR: duplicate key value violates unique constraint "table1_pkey"
DETAIL: Key (uid)=(5) already exists.
In contrast, MySQL will handle this gracefully as shown below:
INSERT INTO table1 (uid, col_b) VALUES(4, 4);
Now insert another row without setting uid
INSERT INTO table1 (col_b) VALUES(3);
The query doesn't fail, uid just jumps to 5:
+-----+-------+
| uid | col_b |
+-----+-------+
| 1 | 1 |
| 2 | 2 |
| 4 | 4 |
| 5 | 3 |
+-----+-------+
Testing was performed on MySQL 5.6.33, for Linux (x86_64) and PostgreSQL 9.4.9
Sorry, to rehash an old question, but this was the first Stack Overflow question/answer that popped up on Google.
This post (which came up first on Google) talks about using the more updated syntax for PostgreSQL 10:
https://blog.2ndquadrant.com/postgresql-10-identity-columns/
which happens to be:
CREATE TABLE test_new (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
);
Hope that helps :)
You have to be careful not to insert directly into your SERIAL or sequence field, otherwise your write will fail when the sequence reaches the inserted value:
-- Table: "test"
-- DROP TABLE test;
CREATE TABLE test
(
"ID" SERIAL,
"Rank" integer NOT NULL,
"GermanHeadword" "text" [] NOT NULL,
"PartOfSpeech" "text" NOT NULL,
"ExampleSentence" "text" NOT NULL,
"EnglishGloss" "text"[] NOT NULL,
CONSTRAINT "PKey" PRIMARY KEY ("ID", "Rank")
)
WITH (
OIDS=FALSE
);
-- ALTER TABLE test OWNER TO postgres;
INSERT INTO test("Rank", "GermanHeadword", "PartOfSpeech", "ExampleSentence", "EnglishGloss")
VALUES (1, '{"der", "die", "das", "den", "dem", "des"}', 'art', 'Der Mann küsst die Frau und das Kind schaut zu', '{"the", "of the" }');
INSERT INTO test("ID", "Rank", "GermanHeadword", "PartOfSpeech", "ExampleSentence", "EnglishGloss")
VALUES (2, 1, '{"der", "die", "das"}', 'pron', 'Das ist mein Fahrrad', '{"that", "those"}');
INSERT INTO test("Rank", "GermanHeadword", "PartOfSpeech", "ExampleSentence", "EnglishGloss")
VALUES (1, '{"der", "die", "das"}', 'pron', 'Die Frau, die nebenen wohnt, heißt Renate', '{"that", "who"}');
SELECT * from test;
In the context of the asked question and in reply to the comment by #sereja1c, creating SERIAL implicitly creates sequences, so for the above example-
CREATE TABLE foo (id SERIAL,bar varchar);
CREATE TABLE would implicitly create sequence foo_id_seq for serial column foo.id. Hence, SERIAL [4 Bytes] is good for its ease of use unless you need a specific datatype for your id.
Since PostgreSQL 10
CREATE TABLE test_new (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
payload text
);
This way will work for sure, I hope it helps:
CREATE TABLE fruits(
id SERIAL PRIMARY KEY,
name VARCHAR NOT NULL
);
INSERT INTO fruits(id,name) VALUES(DEFAULT,'apple');
or
INSERT INTO fruits VALUES(DEFAULT,'apple');
You can check this the details in the next link:
http://www.postgresqltutorial.com/postgresql-serial/
Create Sequence.
CREATE SEQUENCE user_role_id_seq
INCREMENT 1
MINVALUE 1
MAXVALUE 9223372036854775807
START 3
CACHE 1;
ALTER TABLE user_role_id_seq
OWNER TO postgres;
and alter table
ALTER TABLE user_roles ALTER COLUMN user_role_id SET DEFAULT nextval('user_role_id_seq'::regclass);