PostgreSQL database: Get rid of redundant transitive relation (maybe 3NF is failed)

PostgreSQL database: Get rid of redundant transitive relation (maybe 3NF is failed) - postgresql

I'm creating a hybrid between "X-Com Enemy Unknown" and "The Sims". I maintain game state in a database--PostgreSQL--but my question is structural, not engine-specific.
As in X-Com, there are some bases in different locations, so I create a table named Base with ID autoincrement identity as primary key.
Every base has some facilities in its territory, so I create a table named Facility with a foreign key Facility.Base_ID, referring to Base.ID.
Every base has some landing crafts in its hangars, so I create a table named Craft with a foreign key Craft.Base_ID, referring to Base.ID.
Every base has some troopers in its barracks, so I create a table named Trooper with a foreign key Trooper.Base_ID, referring to Base.ID.
Just to this point, everything seems to be ok, doesn't it? However...
I want to have some sort of staff instruction. Like in the X-Com game, every trooper can be assigned to some craft for offense action, or can be unassigned. In addition, every trooper can be assigned to some facility (or can be unassigned) for defense action. So, I have to add nullable foreign keys Trooper.Craft_ID and Trooper.Facility_ID, referring to Craft.ID and Facility.ID respectively.
That database has a redundancy. If some trooper is assigned to a craft or to a facility (or both), it has two (or even three) relations to the base--one direct relation through its Base_ID and some indirect relations as Facility(Trooper.Facility_ID).Base_ID and Craft(Trooper.Craft_ID).Base_ID. Even if I get rid of Trooper.Base_ID (e.g. I can make both assignment mandatory and create a mock craft and a mock facility in every base), I can't get rid of both trooper-facility-base and trooper-craft-base relations.
In addition to this redundancy, there is a worse problem--in case of a mistake, some trooper can be assigned to a craft from one base and to a facility from another base, that's a really nasty situation. I can prohibit it in the application business logic tier, but it's still allowed by the database.
There can be some constraints to apply, but is there any structural modification to the schema that can get rid of the redundancy and potential inconsistency as a result of a good structure, not as a result of constraints?
CREATE TABLE base (
id int PRIMARY KEY
);
CREATE TABLE facility (
id int PRIMARY KEY,
base_id int REFERENCES base
);
CREATE TABLE craft (
id int PRIMARY KEY,
base_id int REFERENCES base
);
CREATE TABLE trooper (
id int PRIMARY KEY,
assigned_facility_id int REFERENCES facility,
assigned_craft_id int REFERENCES craft,
base_id int REFERENCES base
);
Now I want to get some sort of constraints on a trooper t so that
facilities.get(t.assigned_facility_id).base_id IS NULL OR EQUAL TO t.base_id
crafts.get(t.assigned_craft_id).base_id IS NULL OR EQUAL TO t.base_id
This hypothetical constraint has to be applied to table trooper, because it applies in boundaries of each trooper row separately. Constraints on one table have to check equality between fields of two other tables.
I would like to create a database schema where there is exactly one way, having a trooper.id, to find its referenced base.id. How do I normalise my schema?

Related

One-way multi column UNIQUE constraint

I have a system in which I am trying to describe event-based interactions between two targets. In our system, an event (interaction) has a "source" and a "target" (basically who [did what] to whom):
-- tried to remove some of the "noise" from this for the sake of the post:
CREATE TABLE interaction_relationship (
id integer CONSTRAINT interaction_pk PRIMARY KEY,
source_id integer CONSTRAINT source_fk NOT NULL REFERENCES entity(id),
target_id integer CONSTRAINT target_fk NOT NULL REFERENCES entity(id),
-- CONSTRAINT(s)
CREATE CONSTRAINT interaction_relationship_deduplication UNIQUE(source_id, target_id)
);
Constraint interaction_relationship_deduplication is the source of my question:
In our system, a single source can interact with a single target multiple times, but that relationship can only exist once, i.e. if I am a mechanic working on a car, I may see that car multiple times in my shop, but I have only one relationship with that single car:
id
source_id
target_id
a
123abc
456def
b
123abc
789ghi
Ideally, this table also represents a unidirectional relationship. source_id is always the "owner" of the interaction, i.e. if the car 456def ran over the mechanic 123abc there would be another entry in the interaction_relationship table:
id
source_id
target_id
1
123abc
456def
2
123abc
789ghi
3
456def
123abc
So, my question: does UNIQUE on multiple columns take value order into consideration? Or would the above cause a failure?

does UNIQUE on multiple columns take value order into consideration?
Yes. The tuple (123abc, 456def) is different from the tuple (456def, 123abc), they may both exist in the table at the same time.
That said, you might want to remove the surrogate id from the relationship table, there's hardly any use for it. A relation table (as opposed to an entity table, and even there) would do totally fine with a multi-column primary key, which is naturally the combination of source and target.

understanding an inheritance in Postgres; why key "fails" in insert/update command

(One image, tousands of words)
I'd made few tables that are inherited between themselves. (persons)
And then assign child table (address), and relate it only to "base" table (person).
When try to insert in child table, and record is related to inherited table, insert statement fail because there is no key in master table.
And as I insert records in descendant tables, records are salo available in base table (so, IMHO, should be visible/accessible in inherited tables).
Please take a look on attached image. Obviously do someting wrong or didn't get some point....
Thank You in advanced!

Sorry, that's how Postgres table inheritance works. 5.10.1 Caveats explains.
A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children. This is true on both the referencing and referenced sides of a foreign key constraint. Thus, in the terms of the above example:
Specifying that another table's column REFERENCES cities(name) would allow the other table to contain city names, but not capital names. There is no good workaround for this case.
In their example, capitals inherits from cities as organization_employees inherits from person. If person_address REFERENCES person(idt_person) it will not see entries in organization_employees.
Inheritance is not as useful as it seems, and it's not a way to avoid joins. This can be better done with a join table with some extra columns. It's unclear why an organization would inherit from a person.
person
id bigserial primary key
name text not null
verified boolean not null default false
vat_nr text
foto bytea
# An organization is not a person
organization
id bigserial not null
name text not null
# Joins a person with an organization
# Stores information about that relationship
organization_employee
person_id bigint not null references person(id)
organization_id bigint not null references organization(id)
usr text
pwd text
# Get each employee, their name, and their org's name.
select
person.name
organization.name
from
organization_employee
join person on person_id = person.id
join organization on organization_id = organization.id
Use bigserial (bigint) for primary keys, 2 billion comes faster than you think
Don't enshrine arbitrary business rules in the schema, like how long a name can be. You're not saving any space by limiting it, and every time the business rule changes you have to alter your schema. Use the text type. Enforce arbitrary limits in the application or as constraints.
idt_table_name primary keys makes for long, inconsistent column names hard to guess. Why is the primary key of person_address not idt_person_address? Why is the primary key of organization_employee idt_person? You can't tell, at a glance, which is the primary key and which is a foreign key. You still need to prepend the column name to disambiguate; for example, if you join person with person_address you need person.idt_person and person_address.idt_person. Confusing and redundant. id (or idt if you prefer) makes it obvious what the primary key is and clearly differentiates it from table_id (or idt_table) foreign keys. SQL already has the means to resolve ambiguities: person.id.

Database design with composite types in postgresql

How to refer another table built from composite type in my table.
I am trying to setup a database to understand postgresql and its Object oriented features.
The statement is as follows : There are multiple companies which can have board members.
Each company can own another company or a person can own that company too.
This is the type of database design I am looking for.
create type companyType(
name: VARCHAR,
boardMembers : personType[],
owns: companyType[]
)
create type personType(
name: VARCHAR,
owns: companyType[]
)
Create table company as companyType
Create table person as personType
I understand that I cannot self reference the companyType so I will probably move this another table.
My question is, when I am trying to insert into say company type, how do i insert list of person table objects as foreign key ?
Would making a column 'id' in each table and giving it type SERIAL work to use it as a foreign key?

That is not a relational database design, and you won't get happy with it.
Map each object to a table. The table columns are the attributes of the object. Add an artificial primary key (id bigint GENERATED ALWAYS AS IDENTITY). Don't use composite types or arrays.
Relationships are expressed like this:
If the relationship is one-to-many, add a foreign key to the "many' side.
If the relationship is many-to-many, add a "junction table" that has foreign keys to both tables. The primary key is the union of these foreign keys.
Normalize the resulting data model to remove redundancies.
Sprinkle with unique and check constraints as appropriate.
That way your queries will become simple, and you can use your database's features to make your life easier.

Shared Primary key versus Foreign Key

I have a laboratory analysis database and I'm working on the bast data layout. I've seen some suggestions based on similar requirements for using a "Shared Primary Key", but I don't see the advantages over just foreign keys. I'm using PostgreSQL:tables listed below
Sample
___________
sample_id (PK)
sample_type (where in the process the sample came from)
sample_timestamp (when was the sample taken)
Analysis
___________
analysis_id (PK)
sample_id (FK references sample)
method (what analytical method was performed)
analysis_timestamp (when did the analysis take place)
analysis_notes
gc
____________
analysis_id (shared Primary key)
gc_concentration_meoh (methanol concentration)
gc_concentration_benzene (benzene concentration)
spectrophotometer
_____________
analysis_id
spectro_nm (wavelength used)
spectro_abs (absorbance measured)
I could use this design, or I could move the fields from the analysis table into both the gc and spectrophotometer tables, and just use foreign keys between sample, gc, and spectrophotometer tables. The only advantage I see of this design is in cases where I would just want information on how many or what types of analyses were performed, without having to join in the actual results. However, the additional rules to ensure referential integrity between the shared primary keys, and managing extra joins and triggers (on delete cascade, etc) appears to make it more of a headache than the minor advantages. I'm not a DBA, but a scientist, so please let me know what I'm missing.
UPDATE:
A shared primary key (as I understand it) is like a one-to-one foreign key with the additional constraint that each value in the parent tables(analysis) must appear in one of the child tables once, and no more than once.

I've seen some suggestions based on similar requirements for using a
"Shared Primary Key", but I don't see the advantages over just foreign
keys.
If I've understood your comments above, the advantage is that only the first implements the requirement that each row in the parent match a row in one child, and only in one child. Here's one way to do that.
create table analytical_methods (
method_id integer primary key,
method_name varchar(25) not null unique
);
insert into analytical_methods values
(1, 'gc'),(2, 'spec'), (3, 'Atomic Absorption'), (4, 'pH probe');
create table analysis (
analysis_id integer primary key,
sample_id integer not null, --references samples, not shown
method_id integer not null references analytical_methods (method_id),
analysis_timestamp timestamp not null,
analysis_notes varchar(255),
-- This unique constraint lets the pair of columns be the target of
-- foreign key constraints from other tables.
unique (analysis_id, method_id)
);
-- The combination of a) the default value and the check() constraint on
-- method_id, and b) the foreign key constraint on the paired columns
-- analysis_id and method_id guarantee that rows in this table match a
-- gc row in the analysis table.
--
-- If all the child tables have similar constraints, a row in analysis
-- can match a row in one and only one child table.
create table gc (
analysis_id integer primary key,
method_id integer not null
default 1
check (method_id = 1),
foreign key (analysis_id, method_id)
references analysis (analysis_id, method_id),
gc_concentration_meoh integer not null,
gc_concentration_benzene integer not null
);

It looks like in my case this supertype/subtype model in not the best choice. Instead, I should move the fields from the analysis table into all the child tables, and make a series of simple foreign key relationships. The advantage of the supertype/subtype model is when using the primary key of the supertype as a foreign key in another table. Since I am not doing this, the extra layer of complexity will not add anything.

When to use inherited tables in PostgreSQL?

In which situations you should use inherited tables? I tried to use them very briefly and inheritance didn't seem like in OOP world.
I thought it worked like this:
Table users has all fields required for all user levels. Tables like moderators, admins, bloggers, etc but fields are not checked from parent. For example users has email field and inherited bloggers has it now too but it's not unique for both users and bloggers at the same time. ie. same as I add email field to both tables.
The only usage I could think of is fields that are usually used, like row_is_deleted, created_at, modified_at. Is this the only usage for inherited tables?

There are some major reasons for using table inheritance in postgres.
Let's say, we have some tables needed for statistics, which are created and filled each month:
statistics
- statistics_2010_04 (inherits statistics)
- statistics_2010_05 (inherits statistics)
In this sample, we have 2.000.000 rows in each table. Each table has a CHECK constraint to make sure only data for the matching month gets stored in it.
So what makes the inheritance a cool feature - why is it cool to split the data?
PERFORMANCE: When selecting data, we SELECT * FROM statistics WHERE date BETWEEN x and Y, and Postgres only uses the tables, where it makes sense. Eg. SELECT * FROM statistics WHERE date BETWEEN '2010-04-01' AND '2010-04-15' only scans the table statistics_2010_04, all other tables won't get touched - fast!
Index size: We have no big fat table with a big fat index on column date. We have small tables per month, with small indexes - faster reads.
Maintenance: We can run vacuum full, reindex, cluster on each month table without locking all other data
For the correct use of table inheritance as a performance booster, look at the postgresql manual.
You need to set CHECK constraints on each table to tell the database, on which key your data gets split (partitioned).
I make heavy use of table inheritance, especially when it comes to storing log data grouped by month. Hint: If you store data, which will never change (log data), create or indexes with CREATE INDEX ON () WITH(fillfactor=100); This means no space for updates will be reserved in the index - index is smaller on disk.
UPDATE:
fillfactor default is 100, from http://www.postgresql.org/docs/9.1/static/sql-createtable.html:
The fillfactor for a table is a percentage between 10 and 100. 100 (complete packing) is the default

"Table inheritance" means something different than "class inheritance" and they serve different purposes.
Postgres is all about data definitions. Sometimes really complex data definitions. OOP (in the common Java-colored sense of things) is about subordinating behaviors to data definitions in a single atomic structure. The purpose and meaning of the word "inheritance" is significantly different here.
In OOP land I might define (being very loose with syntax and semantics here):
import life
class Animal(life.Autonomous):
metabolism = biofunc(alive=True)
def die(self):
self.metabolism = False
class Mammal(Animal):
hair_color = color(foo=bar)
def gray(self, mate):
self.hair_color = age_effect('hair', self.age)
class Human(Mammal):
alcoholic = vice_boolean(baz=balls)
The tables for this might look like:
CREATE TABLE animal
(name varchar(20) PRIMARY KEY,
metabolism boolean NOT NULL);
CREATE TABLE mammal
(hair_color varchar(20) REFERENCES hair_color(code) NOT NULL,
PRIMARY KEY (name))
INHERITS (animal);
CREATE TABLE human
(alcoholic boolean NOT NULL,
FOREIGN KEY (hair_color) REFERENCES hair_color(code),
PRIMARY KEY (name))
INHERITS (mammal);
But where are the behaviors? They don't fit anywhere. This is not the purpose of "objects" as they are discussed in the database world, because databases are concerned with data, not procedural code. You could write functions in the database to do calculations for you (often a very good idea, but not really something that fits this case) but functions are not the same thing as methods -- methods as understood in the form of OOP you are talking about are deliberately less flexible.
There is one more thing to point out about inheritance as a schematic device: As of Postgres 9.2 there is no way to reference a foreign key constraint across all of the partitions/table family members at once. You can write checks to do this or get around it another way, but its not a built-in feature (it comes down to issues with complex indexing, really, and nobody has written the bits necessary to make that automatic). Instead of using table inheritance for this purpose, often a better match in the database for object inheritance is to make schematic extensions to tables. Something like this:
CREATE TABLE animal
(name varchar(20) PRIMARY KEY,
ilk varchar(20) REFERENCES animal_ilk NOT NULL,
metabolism boolean NOT NULL);
CREATE TABLE mammal
(animal varchar(20) REFERENCES animal PRIMARY KEY,
ilk varchar(20) REFERENCES mammal_ilk NOT NULL,
hair_color varchar(20) REFERENCES hair_color(code) NOT NULL);
CREATE TABLE human
(mammal varchar(20) REFERENCES mammal PRIMARY KEY,
alcoholic boolean NOT NULL);
Now we have a canonical reference for the instance of the animal that we can reliably use as a foreign key reference, and we have an "ilk" column that references a table of xxx_ilk definitions which points to the "next" table of extended data (or indicates there is none if the ilk is the generic type itself). Writing table functions, views, etc. against this sort of schema is so easy that most ORM frameworks do exactly this sort of thing in the background when you resort to OOP-style class inheritance to create families of object types.

Inheritance can be used in an OOP paradigm as long as you do not need to create foreign keys on the parent table. By example, if you have an abstract class vehicle stored in a vehicle table and a table car that inherits from it, all cars will be visible in the vehicle table but a foreign key from a driver table on the vehicle table won't match theses records.
Inheritance can be also used as a partitionning tool. This is especially usefull when you have tables meant to be growing forever (log tables etc).

Main use of inheritance is for partitioning, but sometimes it's useful in other situations. In my database there are many tables differing only in a foreign key. My "abstract class" table "image" contains an "ID" (primary key for it must be in every table) and PostGIS 2.0 raster. Inherited tables such as "site_map" or "artifact_drawing" have a foreign key column ("site_name" text column for "site_map", "artifact_id" integer column for the "artifact_drawing" table etc.) and primary and foreign key constraints; the rest is inherited from the the "image" table. I suspect I might have to add a "description" column to all the image tables in the future, so this might save me quite a lot of work without making real issues (well, the database might run little slower).
EDIT: another good use: with two-table handling of unregistered users, other RDBMSs have problems with handling the two tables, but in PostgreSQL it is easy - just add ONLY when you are not interrested in data in the inherited "unregistered user" table.

The only experience I have with inherited tables is in partitioning. It works fine, but it's not the most sophisticated and easy to use part of PostgreSQL.
Last week we were looking the same OOP issue, but we had too many problems with Hibernate - we didn't like our setup, so we didn't use inheritance in PostgreSQL.

I use inheritance when I have more than 1 on 1 relationships between tables.
Example: suppose you want to store object map locations with attributes x, y, rotation, scale.
Now suppose you have several different kinds of objects to display on the map and each object has its own map location parameters, and map parameters are never reused.
In these cases table inheritance would be quite useful to avoid having to maintain unnormalised tables or having to create location id’s and cross referencing it to other tables.

I tried some operations on it, I will not point out if is there any actual use case for database inheritance, but I will give you some detail for making your decision. Here is an example of PostgresQL: https://www.postgresql.org/docs/15/tutorial-inheritance.html
You can try below SQL script.
CREATE TABLE IF NOT EXISTS cities (
name text,
population real,
elevation int -- (in ft)
);
CREATE TABLE IF NOT EXISTS capitals (
state char(2) UNIQUE NOT NULL
) INHERITS (cities);
ALTER TABLE cities
ADD test_id varchar(255); -- Both table would contains test col
DROP TABLE cities; -- Cannot drop because capitals depends on it
ALTER TABLE cities
ADD CONSTRAINT fk_test FOREIGN KEY (test_id) REFERENCES sometable (id);
As you can see my comments, let me summarize:
When you add/delete/update fields -> the inheritance table would also be affected.
Cannot drop the parent table.
Foreign keys would not be inherited.
From my perspective, in growing applications, we cannot easily predict the changes in the future, for me I would avoid applying this to early database developing.
When features are stable as well and we want to create some database model which much likely the same as the existing one, we can consider that use case.

Use it as little as possible. And that usually means never, it boiling down to a way of creating structures that violate the relational model, for instance by breaking the information principle and by creating bags instead of relations.
Instead, use table partitioning combined with proper relational modelling, including further normal forms.