Insert data into strongly normalized DB and maintain the integrity (Postgres) - postgresql

I'm trying to develop a simple database for the phonebook. This is what I wrote:
CREATE TABLE phone
(
phone_id SERIAL PRIMARY KEY,
phone CHAR(15),
sub_id INT, -- subscriber id --
cat_id INT -- category id --
);
CREATE TABLE category
(
cat_id SERIAL PRIMARY KEY, -- category id --
cat_name CHAR(15) -- category name --
);
CREATE TABLE subscriber
(
sub_id SERIAL PRIMARY KEY,
name CHAR(20),
fname CHAR(20), -- first name --
lname CHAR(20) -- last name --
);
CREATE TABLE address
(
addr_id SERIAL PRIMARY KEY,
country CHAR(20),
city CHAR(20),
street CHAR(20),
house_num INT,
apartment_num INT
);
-- many-to-many relation --
CREATE TABLE sub_link
(
sub_id INT REFERENCES subscriber(sub_id),
addr_id INT
);
I created a link table for the many-to-many relation because several people can live at the same address and one person can live in different locations at different times.
But I cannot figure out how to add data to a strongly normalized DB like this while maintaining the integrity of the data.
The first improvement was to add a unique key on the address table, because this table should not contain duplicate data:
CREATE TABLE address
(
addr_id SERIAL PRIMARY KEY,
country CHAR(20),
city CHAR(20),
street CHAR(20),
house_num INT,
apartment_num INT,
UNIQUE (country, city, street, house_num, apartment_num)
);
Now the problem is how to add a new record about a person to the DB. I think I should use the following order of actions:
1) Insert a record into the subscriber table, because the sub_link and phone tables must use the id of the new subscriber.
2) Insert a record into the address table, because addr_id must exist before adding a record to sub_link.
3) Link the new records from subscriber and address in the sub_link table. But at this step I have a new problem: how can I get sub_id and addr_id from steps 1) and 2) in PostgreSQL efficiently?
4) Insert a record into the phone table. As in step 3), I don't know how to get sub_id from the previous queries efficiently.
I read about the WITH block in Postgres but I cannot figure out how to use it in my case.
UPDATE
I did what ASL suggested:
-- First record --
WITH t0 AS (
WITH t1 AS (
INSERT INTO subscriber
VALUES(DEFAULT, 'Twilight Sparkle', NULL, NULL)
RETURNING sub_id
),
t2 AS (
INSERT INTO address
VALUES(DEFAULT, 'Equestria', 'Ponyville', NULL, NULL, NULL)
RETURNING addr_id
)
INSERT INTO sub_link
VALUES((SELECT sub_id FROM t1), (SELECT addr_id FROM t2))
)
INSERT INTO phone
VALUES (DEFAULT, '000000', (SELECT sub_id FROM t1), 1);
But I get an error: WITH clause containing a data-modifying statement must be at the top level
LINE 2: WITH t1 AS (INSERT INTO subscriber VALUES(DEFAULT,

You can do it all in one query using a WITH block with a RETURNING clause. See PostgreSQL docs on INSERT. For example:
WITH t1 AS (INSERT INTO subscriber VALUES ... RETURNING sub_id),
t2 AS (INSERT INTO address VALUES ... RETURNING addr_id)
INSERT INTO sub_link VALUES ((SELECT sub_id FROM t1), (SELECT addr_id FROM t2))
Note that this simple form will only work when inserting a single row into each table.
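The error quoted in the question's update comes from nesting one WITH inside another: data-modifying CTEs have to sit in the top-level WITH list. A minimal sketch of the same four inserts in one flat statement follows. The CTE name t3 is only illustrative, explicit column lists are used instead of positional VALUES, and cat_id = 1 is taken from the question on the assumption that such a category exists.

```sql
-- All data-modifying CTEs live in a single top-level WITH list.
WITH t1 AS (
    INSERT INTO subscriber (name)
    VALUES ('Twilight Sparkle')
    RETURNING sub_id
),
t2 AS (
    INSERT INTO address (country, city)
    VALUES ('Equestria', 'Ponyville')
    RETURNING addr_id
),
t3 AS (
    -- Runs even though nothing reads from t3: data-modifying CTEs
    -- are always executed to completion.
    INSERT INTO sub_link (sub_id, addr_id)
    SELECT t1.sub_id, t2.addr_id
    FROM t1, t2
)
INSERT INTO phone (phone, sub_id, cat_id)
SELECT '000000', t1.sub_id, 1
FROM t1;
```

Because everything is one statement, either all four rows are written or none are.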
This is somewhat off the topic of your question, but I suggest you also consider making sub_id and cat_id columns in the phone table foreign keys (use REFERENCES).
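A sketch of what adding those references could look like on the existing tables. The sub_link.addr_id reference is an extra assumption, added here because that column has no REFERENCES in the question either.

```sql
-- Add the missing foreign keys to the tables from the question.
ALTER TABLE phone
    ADD FOREIGN KEY (sub_id) REFERENCES subscriber (sub_id),
    ADD FOREIGN KEY (cat_id) REFERENCES category (cat_id);

ALTER TABLE sub_link
    ADD FOREIGN KEY (addr_id) REFERENCES address (addr_id);
```

Each statement fails if existing rows point at ids that are not present in the referenced table, so any orphaned rows would need to be cleaned up first.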

You've got the idea. Insert data into the topmost tables first, so that you have their IDs before inserting references to them.
In PostgreSQL you can use the INSERT/UPDATE ... RETURNING id construct. This may be useful if you are not using an ORM that does it automatically.
The only catch is that in step 2 you probably want to check whether the address already exists before inserting:
SELECT addr_id FROM address WHERE country = ? AND city = ? ...
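On PostgreSQL 9.5 or later, the check and the insert can be folded into one statement by leaning on the UNIQUE constraint from the question. This is a sketch, not part of the original answer: the literal values are placeholders, and the CTE name ins is illustrative. One caveat to be aware of: under a plain UNIQUE constraint, rows containing NULL in any of the listed columns never conflict with each other, so this only deduplicates fully specified addresses.

```sql
-- Insert the address if it is new; either way, return its addr_id.
WITH ins AS (
    INSERT INTO address (country, city, street, house_num, apartment_num)
    VALUES ('Equestria', 'Ponyville', 'Main', 1, 1)
    ON CONFLICT (country, city, street, house_num, apartment_num) DO NOTHING
    RETURNING addr_id
)
SELECT addr_id FROM ins          -- row was just inserted
UNION ALL
SELECT addr_id FROM address      -- row already existed
WHERE country = 'Equestria'
  AND city = 'Ponyville'
  AND street = 'Main'
  AND house_num = 1
  AND apartment_num = 1
LIMIT 1;
```

Under concurrent inserts of the same address this can still return no row in rare cases, so the caller may need to retry.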

Related

PostgreSQL ForeignKeyViolation

I am attempting to insert some data into my database via a lambda function. I am getting the following error: ForeignKeyViolation: insert or update on table "address" violates foreign key constraint "address_id_fkey"
I understand that this is because my address table has a foreign key linking it to the clients table, and the keys do not match.
Is there a way to structure my tables so that I can insert my client data and address data together? Or will I need to insert the client data first, then retrieve the id and use it to insert the address data?
Currently I am running the following two functions.
postgres_insert_query = "INSERT INTO clients (name, phone, contact) VALUES ('{0}','{1}','{2}')".format(data['name'], data['phone'], data['contact'])
postgres_insert_query = "INSERT INTO address (line1, city, state, zip) VALUES ('{0}','{1}','{2}', {3})".format(address['line1'], address['city'], address['state'], address['zip'])
Even if no address data is present I would still like to create a row for it (with the correct foreign key).
Use a DEFERRABLE foreign key constraint, then wrap your inserts in a transaction.
CREATE temp TABLE pktable (
id INT4 PRIMARY KEY,
other INT4
);
CREATE temp TABLE fktable (
id INT4 PRIMARY KEY,
fk INT4 REFERENCES pktable DEFERRABLE INITIALLY DEFERRED
);
BEGIN;
INSERT INTO fktable VALUES (100, 200);
INSERT INTO pktable VALUES (200, 500);
COMMIT;
Postgres allows DML operations within a CTE. Doing so will allow you to insert into both tables in a single statement while allowing auto-generation of both ids. The following is a Postgres implementation. See demo.
with thedata(name, phone, contact, line1, city, state, zip) as
( values ('client 1', 'ev4 4213', 'andy','614 a', 'some city;','that state','11111'))
, theinsert (cli_id) as
( insert into clients(name, phone, contact)
select name, phone, contact
from thedata
returning cli_id
)
insert into addresses(cli_id, line1, city, state, zip)
select cli_id, line1, city, state, zip
from theinsert
cross join thedata;
Unfortunately I do not know your obfuscation (ORM) language, but perhaps something like:
# Note: str.format splices values directly into the SQL text and is open to SQL injection;
# prefer your driver's parameter binding where available.
pg_query = """with thedata(name, phone, contact, line1, city, state, zip) as
( values ('{0}', '{1}', '{2}', '{3}', '{4}', '{5}', '{6}'))
, theinsert (cli_id) as
( insert into clients(name, phone, contact)
select name, phone, contact
from thedata
returning cli_id
)
insert into addresses(cli_id, line1, city, state, zip)
select cli_id, line1, city, state, zip
from theinsert
cross join thedata""".format(data['name'], data['phone'], data['contact'],
    address['line1'], address['city'], address['state'], address['zip'])

Is there a pattern for a union table over different items?

I'd like to have a column constraint based on a combination of two columns. I can't find a way to use a foreign key here, because it would have to be a conditional FK. I hope this basic SQL shows the problem:
CREATE TABLE performer_type (
id serial primary key,
type varchar
);
INSERT INTO performer_type ( id, type ) VALUES (1, 'singer'), ( 2, 'band');
CREATE TABLE singer (
id serial primary key,
name varchar
);
INSERT INTO singer ( id, name ) VALUES (1, 'Robert');
CREATE TABLE band (
id serial primary key,
name varchar
);
INSERT INTO band ( id, name ) VALUES (1, 'Animates'), ( 2, 'Zed Leppelin');
CREATE TABLE gig (
id serial primary key,
performer_type_id int default null, /* FK, no problem */
performer_id int default null /* want FK based on previous FK, no good solution so far */
);
INSERT INTO gig ( performer_type_id, performer_id ) VALUES ( 1,1 ), (2,1), (2,2), (1,2), (2,3);
Now, the last INSERT works, but I'd like it to fail for the last two value pairs, because there is no singer with ID 2 and no band with ID 3. How can I set such a constraint?
I already asked a similar question in a MySQL context and the only solution was to use a trigger. The problem with the trigger was that you can't have a dynamic list of types and tables. I'd like to add types (and related tables) on the fly.
I also found a very promising pattern, but it is upside down for my purposes and I could not figure out how to make it work in my case.
What I am looking for seems like such a useful pattern that I think there must be some common way to do it. Is there?
Edit.
It seems I chose bad items for my examples, so let me clarify: the different performer tables (singer and band) have NO relation between them. The gig table just has to list tasks for different performers, without setting any relations between them.
Another example would be items in stock: I may have an item_type table, which defines hundreds of item types with related tables (for example, orange and house), and there should be a stock table which lists all occurrences of items.
I use PostgreSQL 9.6.
Based on @Laurenz Albe's answer I formed a solution for the example above. Main difference: there is a parent table performer, whose PK is the FK/PK for the specific performer tables and is also referenced from the gig table.
CREATE TABLE performer_type (
id serial primary key,
type varchar
);
INSERT INTO performer_type ( id, type ) VALUES (1, 'singer' ), ( 2, 'band' );
CREATE TABLE performer (
id serial primary key,
performer_type_id int REFERENCES performer_type(id)
);
CREATE TABLE singer (
id int primary key REFERENCES performer(id),
name varchar
);
INSERT INTO performer ( performer_type_id ) VALUES (1); -- get PK 1 for next statement
INSERT INTO singer ( id, name ) VALUES (1, 'Robert');
CREATE TABLE band (
id int primary key REFERENCES performer(id),
name varchar
);
INSERT INTO performer ( performer_type_id ) VALUES (2); -- get PK 2 for next statement
INSERT INTO band ( id, name ) VALUES (2, 'Animates');
INSERT INTO performer ( performer_type_id ) VALUES (2); -- get PK 3 for next statement
INSERT INTO band ( id, name ) VALUES (3, 'Zed Leppelin');
CREATE TABLE gig (
id serial primary key,
performer_id int REFERENCES performer(id)
);
INSERT INTO gig ( performer_id ) VALUES (1), (2), (3), (4);
And the last INSERT fails, as expected:
ERROR: insert or update on table "gig" violates foreign key constraint "gig_performer_id_fkey"
DETAIL: Key (performer_id)=(4) is not present in table "performer".
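If the schema should also guarantee that a singer row can only belong to a performer row of type singer, one commonly used variation is a composite foreign key that carries the type along. The following is a sketch of alternative definitions for performer and singer, not part of the solution above. The value 1 in the DEFAULT and CHECK assumes 'singer' has id 1 in performer_type, as in the INSERT shown earlier.

```sql
CREATE TABLE performer (
    id serial PRIMARY KEY,
    performer_type_id int NOT NULL REFERENCES performer_type(id),
    -- Needed so the pair (id, performer_type_id) can be an FK target.
    UNIQUE (id, performer_type_id)
);

CREATE TABLE singer (
    id int PRIMARY KEY,
    -- Pinned to the 'singer' type; the CHECK rejects any other value.
    performer_type_id int NOT NULL DEFAULT 1 CHECK (performer_type_id = 1),
    name varchar,
    FOREIGN KEY (id, performer_type_id)
        REFERENCES performer (id, performer_type_id)
);
```

With this in place, inserting a singer whose id belongs to a performer of type band fails the foreign key check.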
But
For me there is an annoying problem: I have no good way to tell which ID is for a singer and which is for a band, etc. (in the original example I had performer_type_id in the gig table for that), because any performer_id may belong to any performer. So I'd like every performer type to have its own ID range, so I create a dummy table for every sequence:
CREATE TABLE band_id (
id int primary key,
dummy boolean default null
);
CREATE SEQUENCE band_id_seq START 1;
ALTER TABLE band_id ALTER COLUMN id SET DEFAULT nextval('band_id_seq');
CREATE TABLE singer_id (
id int primary key,
dummy boolean default null
);
CREATE SEQUENCE singer_id_seq START 2000000;
ALTER TABLE singer_id ALTER COLUMN id SET DEFAULT nextval('singer_id_seq');
Now, to insert a new row into a specific performer table I have to get the next ID for it:
INSERT INTO band_id (dummy) VALUES (NULL);
I'm trying to figure out whether this process can be solved at the DB level, or whether something has to be done at the app level. It would be nice if inserting into the band table could:
have a before trigger insert into band_id to generate the specific ID
have a before trigger insert this new ID into the performer table
include this new ID in the INSERT into band
The first 2 points are easy, but the last point is not clear for now.
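As a possible alternative to separate ID ranges, the type of any performer_id can be looked up by joining through performer, and the parent row and the band row can be created together with a data-modifying CTE. This is a sketch against the performer, band, gig and performer_type tables defined above; the CTE name p is illustrative.

```sql
-- Create the performer row and the matching band row in one statement.
WITH p AS (
    INSERT INTO performer (performer_type_id)
    VALUES (2)                    -- 2 = 'band' in performer_type
    RETURNING id
)
INSERT INTO band (id, name)
SELECT id, 'Animates'
FROM p;

-- Find out which type each gig's performer has.
SELECT g.id AS gig_id, g.performer_id, t.type
FROM gig g
JOIN performer p ON p.id = g.performer_id
JOIN performer_type t ON t.id = p.performer_type_id;
```

The generated id never has to be known to the application: it flows from the first INSERT to the second through RETURNING.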

Finding distinct values of non Primary Key column in CQL Cassandra

I use the following code to create a table:
CREATE KEYSPACE mykeyspace
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE mykeyspace;
CREATE TABLE users (
user_id int PRIMARY KEY,
fname text,
lname text
);
INSERT INTO users (user_id, fname, lname)
VALUES (1745, 'john', 'smith');
INSERT INTO users (user_id, fname, lname)
VALUES (1744, 'john', 'doe');
INSERT INTO users (user_id, fname, lname)
VALUES (1746, 'john', 'smith');
I would like to find the distinct value of lname column (that is not a PRIMARY KEY). I would like to get the following result:
lname
-------
smith
By using SELECT DISTINCT lname FROM users;
However since lname is not a PRIMARY KEY I get the following error:
InvalidRequest: code=2200 [Invalid query] message="SELECT DISTINCT queries must
only request partition key columns and/or static columns (not lname)"
cqlsh:mykeyspace> SELECT DISTINCT lname FROM users;
How can I get the distinct values from lname?
User - Undefined_variable - makes two good points:
In Cassandra, you need to build your data model to match your query patterns. This sometimes means duplicating your data into additional tables, to attain the desired level of query flexibility.
DISTINCT only works on partition keys.
So, one way to get this to work, would be to build a specific table to support that query:
CREATE TABLE users_by_lname (
lname text,
fname text,
user_id int,
PRIMARY KEY (lname, fname, user_id)
);
Now after I run your INSERTs against this new query table, this works:
aploetz@cqlsh:stackoverflow> SELECT DISTINCT lname FROM users_by_lname ;
lname
-------
smith
doe
(2 rows)
Notes: In this table, all rows with the same partition key (lname) will be sorted by fname, as fname is a clustering key. I added user_id as an additional clustering key, just to ensure uniqueness.
There is no such functionality in Cassandra. DISTINCT is possible on the partition key only.
You should design your data model based on your requirements.
Otherwise you have to process the data in application logic (Spark may be useful).

How can I generate big data sample for Postgresql using generate_series and random?

I want to generate a big data sample (almost 1 million records) for studying tuplesort.c's polyphase merge in PostgreSQL, and I'd like the schema to be as follows:
CREATE TABLE Departments (code VARCHAR(4), UNIQUE (code));
CREATE TABLE Towns (
id SERIAL UNIQUE NOT NULL,
code VARCHAR(10) NOT NULL, -- not unique
article TEXT,
name TEXT NOT NULL, -- not unique
department VARCHAR(4) NOT NULL REFERENCES Departments (code),
UNIQUE (code, department)
);
How do I use generate_series and random to do it? Thanks a lot!
To insert one million rows into Towns (the department values must already exist in Departments, since that column references it):
insert into towns (
code, article, name, department
)
select
left(md5(i::text), 10),
md5(random()::text),
md5(random()::text),
left(md5(random()::text), 4)
from generate_series(1, 1000000) s(i)
Since id is a serial it is not necessary to include it.
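Because Towns.department references Departments(code), random department strings are rejected unless matching rows exist. A sketch that fills Departments first and then picks one of those codes at random for each town follows. The count of 100 departments, the zero-padded code format and the 'T' prefix are arbitrary choices for illustration. The town code is derived from the series value so that UNIQUE (code, department) can never be violated.

```sql
-- 100 department codes: '0001' .. '0100'.
INSERT INTO Departments (code)
SELECT lpad(d::text, 4, '0')
FROM generate_series(1, 100) AS g(d);

-- One million towns, each pointing at a randomly chosen existing department.
INSERT INTO Towns (code, article, name, department)
SELECT
    'T' || i::text,                                        -- unique per row
    md5(random()::text),                                   -- random article
    md5(random()::text),                                   -- random name
    lpad((1 + floor(random() * 100))::int::text, 4, '0')   -- '0001' .. '0100'
FROM generate_series(1, 1000000) AS s(i);
```

The name and article columns stay random, so sorting on them still gives the sort code unordered input to work on.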

How to implicitly insert SERIAL ID via view over more than one table

I have two tables, connected in the E/R model by an is-a relation. One represents the "mother table"
CREATE TABLE PERSONS(
id SERIAL NOT NULL,
name character varying NOT NULL,
address character varying NOT NULL,
day_of_creation timestamp NOT NULL DEFAULT current_timestamp,
PRIMARY KEY (id)
)
the other representing the "child table"
CREATE TABLE EMPLOYEES (
id integer NOT NULL,
store character varying NOT NULL,
paychecksize integer NOT NULL,
FOREIGN KEY (id)
REFERENCES PERSONS(id),
PRIMARY KEY (id)
)
Now those two tables are joined in a view
CREATE VIEW EMPLOYEES_VIEW AS
SELECT
P.id,name,address,store,paychecksize,day_of_creation
FROM
PERSONS AS P
JOIN
EMPLOYEES AS E ON P.id = E.id
I want to write either a rule or a trigger to enable a DB user to insert into that view, sparing them the nasty details of the columns being split across different tables.
But I also want to make it convenient: as the id is a SERIAL and day_of_creation has a default value, there is no actual need for the user to provide those. Therefore a statement like
INSERT INTO EMPLOYEES_VIEW (name, address, store, paychecksize)
VALUES ('bob', 'top secret', 'drugstore', 42)
should be enough to result in
PERSONS
id|name|address |day_of_creation
-------------------------------
1 |bob |top secret| 2013-08-13 15:32:42
EMPLOYEES
id| store |paychecksize
---------------------
1 |drugstore|42
A basic rule such as
CREATE RULE EMPLOYEE_VIEW_INSERT AS ON INSERT TO EMPLOYEES_VIEW
DO INSTEAD (
INSERT INTO PERSONS
VALUES (NEW.id,NEW.name,NEW.address,NEW.day_of_creation);
INSERT INTO EMPLOYEES
VALUES (NEW.id,NEW.store,NEW.paychecksize)
)
should be sufficient. But this is not convenient, as the user has to provide the id and timestamp, even though that is not actually necessary.
How can I rewrite/extend that code base to match my criteria of convenience?
Something like:
CREATE RULE EMPLOYEE_VIEW_INSERT AS ON INSERT TO EMPLOYEES_VIEW
DO INSTEAD
(
INSERT INTO PERSONS (id, name, address, day_of_creation)
VALUES (default,NEW.name,NEW.address,default);
INSERT INTO EMPLOYEES (id, store, paychecksize)
VALUES (currval('persons_id_seq'),NEW.store,NEW.paychecksize)
);
That way persons.id and persons.day_of_creation are filled with their default values. Another option would be to simply leave those columns out of the insert:
INSERT INTO PERSONS (name, address)
VALUES (NEW.name,NEW.address);
Once the rule is defined, the following insert should work:
insert into employees_view (name, address, store, paychecksize)
values ('Arthur Dent', 'Some Street', 'Some Store', 42);
Btw: with a current Postgres version, an INSTEAD OF trigger is the preferred way to make a view updatable.
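A sketch of what such an INSTEAD OF trigger could look like for this view. The function and trigger names are made up for illustration, and the tables and view are assumed to be defined as in the question.

```sql
CREATE FUNCTION employees_view_insert() RETURNS trigger AS $$
BEGIN
    -- Let persons fill in id and day_of_creation from their defaults,
    -- and copy the generated values back into the row being inserted.
    INSERT INTO persons (name, address)
    VALUES (NEW.name, NEW.address)
    RETURNING id, day_of_creation INTO NEW.id, NEW.day_of_creation;

    INSERT INTO employees (id, store, paychecksize)
    VALUES (NEW.id, NEW.store, NEW.paychecksize);

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER employees_view_insert_trg
INSTEAD OF INSERT ON employees_view
FOR EACH ROW
EXECUTE PROCEDURE employees_view_insert();
```

Unlike the rule, this does not rely on currval(), and returning NEW makes the generated values visible to INSERT ... RETURNING on the view.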