Data validation/constraint in Postgres DB - postgresql

I have a (likely) simple question about data validation in a Postgres DB.
I have the following table:
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+--------------+-------------
id_number | integer | | not null | | plain | |
last_name | character varying(50) | | not null | | extended | |
first_name | character varying(50) | | not null | | extended | |
school | character varying(50) | | not null | | extended | |
district | character varying(50) | | not null | | extended | |
Code to create the table
CREATE TABLE students (
id_number INTEGER PRIMARY KEY NOT NULL,
last_name VARCHAR(50) NOT NULL,
first_name VARCHAR(50) NOT NULL,
school VARCHAR(50) NOT NULL,
district VARCHAR(50) NOT NULL);
I want to create a list of valid input strings (text) for a column and reject any other input.
For example: for the "district" column, I want the only allowed input to be "district a", "district b", or "district c".
I've read over the constraints documentation but don't see anything about text constraints or using "or."
Is this possible? If so, how would I do it?
Thanks

Right at the top of the linked documentation it discusses CHECK constraints; that's what you want here:
CREATE TABLE students (
...
district VARCHAR(50) NOT NULL CHECK (district IN ('district a', 'district b', 'district c'))
);
Alternatively, you could add a separate table with the districts and then use a FOREIGN KEY constraint to restrict the districts to only those in the districts table.
For this you'd have something like:
create table districts (
id integer not null primary key,
name varchar not null
)
and then:
CREATE TABLE students (
id_number INTEGER PRIMARY KEY NOT NULL,
last_name VARCHAR(50) NOT NULL,
first_name VARCHAR(50) NOT NULL,
school VARCHAR(50) NOT NULL,
district_id integer not null references districts(id)
)
and you'd JOIN to the districts table to get the district names.
Using a separate table would make it easier to get a list of possible districts, add new ones, remove old ones, and change district names. This is also a more normalized approach; it might be a little more work at the beginning, but it is a big win later on.
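To see the CHECK constraint from the first approach in action, here's a short sketch (table and values from the question; the exact error text may vary by Postgres version):

```sql
-- Accepted: 'district a' is in the allowed list
INSERT INTO students VALUES (1, 'Doe', 'Jane', 'school x', 'district a');

-- Rejected: 'district z' is not in the list, so Postgres raises
-- "new row for relation "students" violates check constraint"
INSERT INTO students VALUES (2, 'Doe', 'John', 'school x', 'district z');
```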

Related

How to add a row in the postgres table when it is showing duplicate id error even though I haven't passed an id? [duplicate]

This question already has answers here:
How to reset Postgres' primary key sequence when it falls out of sync?
(33 answers)
Why do SQL id sequences go out of sync (specifically using Postgres)?
(2 answers)
Closed 3 days ago.
So, I generated a table called person from Mockaroo with about 1000 rows.
Column | Type | Collation | Nullable | Default
------------------+-----------------------+-----------+----------+------------------------------------
id | bigint | | not null | nextval('person_id_seq'::regclass)
first_name | character varying(100) | | not null |
last_name | character varying(100) | | not null |
gender | character varying(7) | | not null |
email | character varying(100) | | |
date_of_birth | date | | not null |
country_of_birth | character varying(100) | | not null |
Indexes:
"person_pkey" PRIMARY KEY, btree (id)
"person_email_key" UNIQUE CONSTRAINT, btree (email)
Above are the table details.
I am trying to insert a row into the table. Since I gave id the BIGSERIAL datatype, it's supposed to auto-increment the id for me every time I insert a row.
But now, as I try to insert a new row, it shows a duplicate id error.
test=# INSERT INTO person (first_name, last_name, gender, email, date_of_birth, country_of_birth) VALUES ('Sean', 'Paul','Male', 'paul@gmail.com','2001-03-02','India');
ERROR: duplicate key value violates unique constraint "person_pkey"
DETAIL: Key (id)=(2) already exists.
The problem can be one of the following:
somebody ran ALTER SEQUENCE or called the setval function to reset the sequence counter
somebody INSERTed a row with an explicit value of 2 for id, so that the default value was overridden rather than using a sequence value
You can reduce the danger of the latter by using an identity column (GENERATED ALWAYS AS IDENTITY), which rejects explicitly supplied id values unless you specify OVERRIDING SYSTEM VALUE.
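If the sequence was simply reset or bypassed, the usual fix (covered in detail in the linked duplicate) is to resync it to the current maximum id; this assumes the sequence is named person_id_seq, as shown in the \d output:

```sql
-- Move the sequence past the highest existing id so new inserts don't collide
SELECT setval('person_id_seq', COALESCE((SELECT max(id) FROM person), 1));
```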

postgresql add constraint Check on 3 columns

I have a table personnes
jouer=# create table personnes (
g_name VARCHAR ( 50 ),
f_name VARCHAR ( 50 ) UNIQUE,
company BOOLEAN NOT NULL DEFAULT false)
;
resulting in:
Colonne | Type | Collationnement | NULL-able | Par défaut
---------+-----------------------+-----------------+-----------+------------
g_name | character varying(50) | | |
f_name | character varying(50) | | |
company | boolean | | not null | false
I want to add a constraint so that:
if company is true, g_name and f_name must be null, and
if company is false, g_name and f_name are both required to be not null.
I have tried 2 things, but neither gives the right result.
jouer=# ALTER TABLE personnes ADD CONSTRAINT personnes_company_check CHECK (
company is false
and g_name is not null
and f_name is not null)
;
ALTER TABLE
and
jouer=# ALTER TABLE personnes ADD CONSTRAINT personnes_company_check CHECK (
company is true
and g_name is null
and f_name is null)
;
There are various ways to write this. A literal translation of your requirement would be a conditional expression:
ALTER TABLE personnes ADD CONSTRAINT personnes_company_check CHECK (
CASE WHEN company
THEN g_name IS NULL AND f_name IS NULL
ELSE g_name IS NOT NULL AND f_name IS NOT NULL
END
);
but I'd prefer
ALTER TABLE personnes ADD CONSTRAINT personnes_company_check CHECK (
company = (g_name IS NULL) AND
company = (f_name IS NULL)
);
which you could also split into two separate constraints.
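A quick sanity check of the second constraint, with hypothetical sample rows:

```sql
-- Accepted: a company row with both names null
INSERT INTO personnes (g_name, f_name, company) VALUES (NULL, NULL, true);

-- Accepted: a person row with both names set
INSERT INTO personnes (g_name, f_name, company) VALUES ('Ada', 'Lovelace', false);

-- Rejected: company is false but f_name is null,
-- so company = (f_name IS NULL) evaluates to false
INSERT INTO personnes (g_name, f_name, company) VALUES ('Grace', NULL, false);
```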

List Partitioning in Postgres 12

CREATE TABLE countrymeasurements
(
countrycode int NOT NULL,
countryname character varying(30) NOT NULL,
languagename character varying (30) NOT NULL,
daysofoperation character varying(30) NOT NULL,
salesparts bigint,
replaceparts bigint
)
PARTITION BY LIST(countrycode)
(
partition india values(1),
partition japan values(2),
partition china values(3),
partition malaysia values(4)
);
I am getting ERROR: syntax error at or near "(". What am I missing here? I am using Postgres 12.
I don't know where you found that syntax; it's obviously not in the manual. As you can see there, partitions are created with CREATE TABLE ... PARTITION OF in Postgres:
Define the table:
CREATE TABLE countrymeasurements
(
countrycode int NOT NULL,
countryname character varying(30) NOT NULL,
languagename character varying (30) NOT NULL,
daysofoperation character varying(30) NOT NULL,
salesparts bigint,
replaceparts bigint
)
PARTITION BY LIST(countrycode);
Define the partitions:
create table india
partition of countrymeasurements
for values in (1);
create table japan
partition of countrymeasurements
for values in (2);
create table china
partition of countrymeasurements
for values in (3);
create table malaysia
partition of countrymeasurements
for values in (4);
Welcome to Stack Overflow! Please note that asking questions here without showing prior research may turn away people who otherwise might love to help.
In this case I checked and found no official example for list partitioning. But if you just shorten your statement, it will create a table partitioned by the values in the countrycode column:
CREATE TABLE countrymeasurements
(
countrycode int NOT NULL,
countryname character varying(30) NOT NULL,
languagename character varying (30) NOT NULL,
daysofoperation character varying(30) NOT NULL,
salesparts bigint,
replaceparts bigint
)
PARTITION BY LIST(countrycode)
;
The psql describe table command shows the partitioning is as requested:
psql=# \d countrymeasurements
Table "public.countrymeasurements"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------+-----------+----------+---------
countrycode | integer | | not null |
countryname | character varying(30) | | not null |
languagename | character varying(30) | | not null |
daysofoperation | character varying(30) | | not null |
salesparts | bigint | | |
replaceparts | bigint | | |
Partition key: LIST (countrycode)
Then you can define the partitions as in the answer from @a_horse_with_no_name. But some notes on using such a strategy may be in order.
Notes:
When you allow just 4 explicit partitions via a list (as you tried), what happens when value 5 comes along?
The PostgreSQL 12 documentation on table partitioning suggests considering hash partitioning instead of list partitioning: you choose the number of partitions up front instead of relying on your column values, whose distribution may be very unbalanced.
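If you want to keep list partitioning but still survive an unexpected value 5, one option (available since Postgres 11, so fine on 12) is a DEFAULT partition that catches everything not covered by the lists:

```sql
-- Rows whose countrycode matches no list partition land here instead of erroring
CREATE TABLE othercountries
    PARTITION OF countrymeasurements DEFAULT;
```

The hash alternative the docs mention would instead use PARTITION BY HASH (countrycode) with a fixed partition count, e.g. FOR VALUES WITH (MODULUS 4, REMAINDER 0) for the first partition.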

How do I update 1.3 billion rows in this table more efficiently?

I have 1.3 billion rows in a PostgreSQL table sku_comparison that looks like this:
id1 (INTEGER) | id2 (INTEGER) | (10 SMALLINT columns) | length1 (SMALLINT) | length2 (SMALLINT) | length_difference (SMALLINT)
The id1 and id2 columns reference a table called sku, which contains about 300,000 rows; each sku row has an associated varchar(25) value in a column, code.
There is a btree index built on id1 and id2, and a compound index of id1 and id2 in sku_comparison. There is a btree index on the id column of sku, as well.
My goal is to update the length1 and length2 columns with the lengths of the corresponding code column from the sku table. However, I ran the following code for over 20 hours, and it did not complete the update:
UPDATE sku_comparison SET length1=length(sku.code) FROM sku
WHERE sku_comparison.id1=sku.id;
All of the data is stored on a single hard disk on a local computer, and the processor is fairly modern. Constructing this table, which required much more complicated string comparisons in Python, only took about 30 hours or so, so I am not sure why something like this would take as long.
edit: here are formatted table definitions:
Table "public.sku"
Column | Type | Modifiers
------------+-----------------------+--------------------------------------------------
id | integer | not null default nextval('sku_id_seq'::regclass)
sku | character varying(25) |
pattern | character varying(25) |
pattern_an | character varying(25) |
firsttwo | character(2) | default ' '::bpchar
reference | character varying(25) |
Indexes:
"sku_pkey" PRIMARY KEY, btree (id)
"sku_sku_idx" UNIQUE, btree (sku)
"sku_firstwo_idx" btree (firsttwo)
Referenced by:
TABLE "sku_comparison" CONSTRAINT "sku_comparison_id1_fkey" FOREIGN KEY (id1) REFERENCES sku(id)
TABLE "sku_comparison" CONSTRAINT "sku_comparison_id2_fkey" FOREIGN KEY (id2) REFERENCES sku(id)
Table "public.sku_comparison"
Column | Type | Modifiers
---------------------------+----------+-------------------------
id1 | integer | not null
id2 | integer | not null
consec_charmatch | smallint |
consec_groupmatch | smallint |
consec_fieldtypematch | smallint |
consec_groupmatch_an | smallint |
consec_fieldtypematch_an | smallint |
general_charmatch | smallint |
general_groupmatch | smallint |
general_fieldtypematch | smallint |
general_groupmatch_an | smallint |
general_fieldtypematch_an | smallint |
length1 | smallint | default 0
length2 | smallint | default 0
length_difference | smallint | default '-999'::integer
Indexes:
"sku_comparison_pkey" PRIMARY KEY, btree (id1, id2)
"ssd_id1_idx" btree (id1)
"ssd_id2_idx" btree (id2)
Foreign-key constraints:
"sku_comparison_id1_fkey" FOREIGN KEY (id1) REFERENCES sku(id)
"sku_comparison_id2_fkey" FOREIGN KEY (id2) REFERENCES sku(id)
Would you consider using an anonymous code block?
The pseudo code made concrete as PL/pgSQL would look something like:
DO $$
DECLARE
    v_skuid integer;
    v_skulength smallint;
BEGIN
    FOR v_skuid, v_skulength IN
        SELECT id, length(code) FROM sku
    LOOP
        UPDATE sku_comparison
        SET length1 = v_skulength
        WHERE id1 = v_skuid;
    END LOOP;
END $$;
This breaks the work into many small per-sku updates, and you will not be evaluating length(sku.code) for every one of the 1.3 billion comparison rows. (Note that a single DO block still runs as one transaction; for genuinely smaller transactions, run the batches from a client and commit between them.)
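A set-based alternative that still keeps each transaction small is to update one slice of id1 at a time from the client. The range bounds here are hypothetical, and note the \d output names the text column sku while the question calls it code:

```sql
-- Repeat with the next id range until the whole table is covered
UPDATE sku_comparison sc
SET    length1 = length(s.sku)
FROM   sku s
WHERE  sc.id1 = s.id
  AND  sc.id1 BETWEEN 1 AND 100000;
```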

Column <tablename>_id referenced in foreign key not found

I'm going through 7 Databases in 7 Weeks.
In PostgreSQL, I created a venues table that has a SERIAL venue_id column.
output of \d venues
Table "public.venues"
Column | Type | Modifiers
----------------+------------------------+-----------------------------------------------------------
venue_id | integer | not null default nextval('venues_venue_id_seq'::regclass)
name | character varying(255) |
street_address | text |
type | character(7) | default 'public'::bpchar
postal_code | character varying(9) |
country_code | character(2) |
Indexes:
"venues_pkey" PRIMARY KEY, btree (venue_id)
Check constraints:
"venues_type_check" CHECK (type = ANY (ARRAY['public'::bpchar, 'private'::bpchar]))
Foreign-key constraints:
"venues_country_code_fkey" FOREIGN KEY (country_code, postal_code) REFERENCES cities(country_code, postal_code) MATCH FULL
The next step is to create an event table that references venue_id with a foreign key.
I'm trying this:
CREATE TABLE events (
event_id SERIAL PRIMARY KEY,
title text,
starts timestamp,
ends timestamp,
FOREIGN KEY (venue_id) REFERENCES venues (venue_id));
And I get this error:
ERROR: column "venue_id" referenced in foreign key not found
What's wrong?
You need to declare the foreign key column too; see http://www.postgresql.org/docs/current/static/ddl-constraints.html#DDL-CONSTRAINTS-FK
Source & credit: @mu is too short
I'm going through the second edition of this book, so things might have changed slightly.
To create the table, you explicitly have to declare venue_id as a column, just like the rest of your columns:
CREATE TABLE events (
event_id SERIAL PRIMARY KEY,
title text,
starts timestamp,
ends timestamp,
venue_id integer, -- this is the line you're missing!
FOREIGN KEY (venue_id)
REFERENCES venues (venue_id) MATCH FULL
);
Once you have executed that, the table is created:
7dbs=# \dt
List of relations
Schema | Name | Type | Owner
--------+-----------+-------+----------
public | cities | table | postgres
public | countries | table | postgres
public | events | table | postgres
public | venues | table | postgres
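To confirm the foreign key is enforced, a quick sketch (assuming a freshly created venues table, so the first generated venue_id is 1):

```sql
INSERT INTO venues (name) VALUES ('Town Hall');

-- Accepted: venue 1 exists
INSERT INTO events (title, venue_id) VALUES ('Concert', 1);

-- Rejected: there is no venue 999, so the insert violates the foreign key
INSERT INTO events (title, venue_id) VALUES ('Ghost gig', 999);
```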