When to use inherited tables in PostgreSQL? - postgresql

In which situations should you use inherited tables? I tried them briefly, and inheritance didn't seem to work like it does in the OOP world.
I thought it worked like this:
Table users has all the fields required for all user levels. Tables like moderators, admins, bloggers, etc. inherit from it, but constraints from the parent are not enforced on the children. For example, users has an email field and the inherited bloggers table now has it too, but it is not unique across users and bloggers at the same time, i.e. it behaves the same as if I had added an email field to both tables separately.
The only usage I could think of is for commonly repeated fields like row_is_deleted, created_at, modified_at. Is that the only use for inherited tables?
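For example, this is roughly the behaviour I observed (a minimal sketch with illustrative names):
CREATE TABLE users (
    email text UNIQUE
);
CREATE TABLE bloggers (
    blog_url text
) INHERITS (users);
-- both inserts succeed: the UNIQUE constraint only covers rows physically stored in users
INSERT INTO users (email) VALUES ('a@example.com');
INSERT INTO bloggers (email, blog_url) VALUES ('a@example.com', 'https://example.com');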

There are some major reasons for using table inheritance in postgres.
Let's say, we have some tables needed for statistics, which are created and filled each month:
statistics
- statistics_2010_04 (inherits statistics)
- statistics_2010_05 (inherits statistics)
In this sample, we have 2,000,000 rows in each table. Each table has a CHECK constraint to make sure only data for the matching month gets stored in it.
So what makes the inheritance a cool feature - why is it cool to split the data?
PERFORMANCE: When selecting data, we SELECT * FROM statistics WHERE date BETWEEN x AND y, and Postgres only scans the tables where it makes sense. E.g. SELECT * FROM statistics WHERE date BETWEEN '2010-04-01' AND '2010-04-15' only scans the table statistics_2010_04; all other tables won't get touched - fast!
Index size: We have no big fat table with a big fat index on column date. We have small tables per month, with small indexes - faster reads.
Maintenance: We can run VACUUM FULL, REINDEX, or CLUSTER on each monthly table without locking all the other data.
For the correct use of table inheritance as a performance booster, look at the PostgreSQL manual.
You need to set CHECK constraints on each table to tell the database, on which key your data gets split (partitioned).
I make heavy use of table inheritance, especially when it comes to storing log data grouped by month. Hint: if you store data which will never change (log data), create your indexes with CREATE INDEX ... WITH (fillfactor=100); this means no space for updates will be reserved in the index, so the index is smaller on disk.
UPDATE:
The fillfactor default for tables is 100, from http://www.postgresql.org/docs/9.1/static/sql-createtable.html:
The fillfactor for a table is a percentage between 10 and 100. 100 (complete packing) is the default.
For B-tree indexes, however, the default fillfactor is 90, so setting fillfactor=100 explicitly still makes a difference.
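A minimal sketch of such a monthly setup (table and column names are illustrative, not the original schema):
-- parent table plus one child per month; the CHECK constraint lets the planner
-- skip child tables that cannot contain matching rows (constraint exclusion)
CREATE TABLE statistics (
    date  date NOT NULL,
    value bigint
);
CREATE TABLE statistics_2010_04 (
    CHECK (date >= '2010-04-01' AND date < '2010-05-01')
) INHERITS (statistics);
-- index on the child table; fillfactor=100 packs the index completely,
-- which is fine for data that is never updated
CREATE INDEX statistics_2010_04_date_idx
    ON statistics_2010_04 (date) WITH (fillfactor = 100);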

"Table inheritance" means something different than "class inheritance" and they serve different purposes.
Postgres is all about data definitions. Sometimes really complex data definitions. OOP (in the common Java-colored sense of things) is about subordinating behaviors to data definitions in a single atomic structure. The purpose and meaning of the word "inheritance" is significantly different here.
In OOP land I might define (being very loose with syntax and semantics here):
import life

class Animal(life.Autonomous):
    metabolism = biofunc(alive=True)

    def die(self):
        self.metabolism = False

class Mammal(Animal):
    hair_color = color(foo=bar)

    def gray(self, mate):
        self.hair_color = age_effect('hair', self.age)

class Human(Mammal):
    alcoholic = vice_boolean(baz=balls)
The tables for this might look like:
CREATE TABLE animal
(name varchar(20) PRIMARY KEY,
metabolism boolean NOT NULL);
CREATE TABLE mammal
(hair_color varchar(20) REFERENCES hair_color(code) NOT NULL,
PRIMARY KEY (name))
INHERITS (animal);
CREATE TABLE human
(alcoholic boolean NOT NULL,
FOREIGN KEY (hair_color) REFERENCES hair_color(code),
PRIMARY KEY (name))
INHERITS (mammal);
But where are the behaviors? They don't fit anywhere. This is not the purpose of "objects" as they are discussed in the database world, because databases are concerned with data, not procedural code. You could write functions in the database to do calculations for you (often a very good idea, but not really something that fits this case) but functions are not the same thing as methods -- methods as understood in the form of OOP you are talking about are deliberately less flexible.
There is one more thing to point out about inheritance as a schematic device: as of Postgres 9.2 there is no way for a foreign key constraint to reference all of the partitions/table family members at once. You can write checks to do this or get around it another way, but it's not a built-in feature (it comes down to issues with complex indexing, really, and nobody has written the bits necessary to make that automatic). Instead of using table inheritance for this purpose, often a better match in the database for object inheritance is to make schematic extensions to tables. Something like this:
CREATE TABLE animal
(name varchar(20) PRIMARY KEY,
ilk varchar(20) REFERENCES animal_ilk NOT NULL,
metabolism boolean NOT NULL);
CREATE TABLE mammal
(animal varchar(20) REFERENCES animal PRIMARY KEY,
ilk varchar(20) REFERENCES mammal_ilk NOT NULL,
hair_color varchar(20) REFERENCES hair_color(code) NOT NULL);
CREATE TABLE human
(mammal varchar(20) REFERENCES mammal PRIMARY KEY,
alcoholic boolean NOT NULL);
Now we have a canonical reference for the instance of the animal that we can reliably use as a foreign key reference, and we have an "ilk" column that references a table of xxx_ilk definitions which points to the "next" table of extended data (or indicates there is none if the ilk is the generic type itself). Writing table functions, views, etc. against this sort of schema is so easy that most ORM frameworks do exactly this sort of thing in the background when you resort to OOP-style class inheritance to create families of object types.
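For instance, a view assembling the full "human" from its extension tables might look like this (a sketch assuming the schema above):
-- assemble the complete human record from the extension tables
CREATE VIEW human_full AS
SELECT a.name,
       a.metabolism,
       m.hair_color,
       h.alcoholic
  FROM animal a
  JOIN mammal m ON m.animal = a.name
  JOIN human  h ON h.mammal = m.animal;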

Inheritance can be used in an OOP paradigm as long as you do not need to create foreign keys on the parent table. For example, if you have an abstract class vehicle stored in a vehicle table and a table car that inherits from it, all cars will be visible in the vehicle table, but a foreign key from a driver table to the vehicle table won't match those records.
Inheritance can also be used as a partitioning tool. This is especially useful when you have tables meant to grow forever (log tables, etc.).

The main use of inheritance is for partitioning, but sometimes it's useful in other situations. In my database there are many tables differing only in a foreign key. My "abstract class" table "image" contains an "ID" (a primary key that must be in every table) and a PostGIS 2.0 raster. Inherited tables such as "site_map" or "artifact_drawing" have a foreign key column (a "site_name" text column for "site_map", an "artifact_id" integer column for the "artifact_drawing" table, etc.) and primary and foreign key constraints; the rest is inherited from the "image" table. I suspect I might have to add a "description" column to all the image tables in the future, so this might save me quite a lot of work without causing real issues (well, the database might run a little slower).
EDIT: another good use: handling registered and unregistered users in two tables is awkward in other RDBMSs, but in PostgreSQL it is easy - just add ONLY when you are not interested in the data in the inherited "unregistered user" table.
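A minimal sketch of that ONLY trick (hypothetical table names):
-- registered users plus an inherited table for unregistered ones
CREATE TABLE users (
    id    integer PRIMARY KEY,
    email text
);
CREATE TABLE unregistered_users () INHERITS (users);
SELECT * FROM users;       -- rows from users AND unregistered_users
SELECT * FROM ONLY users;  -- rows from users alone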

The only experience I have with inherited tables is in partitioning. It works fine, but it's not the most sophisticated and easy to use part of PostgreSQL.
Last week we were looking at the same OOP issue, but we had too many problems with Hibernate - we didn't like our setup, so we didn't use inheritance in PostgreSQL.

I use inheritance when more than one table has a one-to-one relationship with the same base table.
Example: suppose you want to store object map locations with attributes x, y, rotation, scale.
Now suppose you have several different kinds of objects to display on the map and each object has its own map location parameters, and map parameters are never reused.
In these cases table inheritance is quite useful: it avoids having to maintain unnormalised tables or having to create location IDs and cross-reference them from other tables.
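A rough sketch of that layout (illustrative names; the attribute types are assumptions):
-- parent holding the shared map-location attributes, one child per object kind
CREATE TABLE map_location (
    x        double precision NOT NULL,
    y        double precision NOT NULL,
    rotation double precision DEFAULT 0,
    scale    double precision DEFAULT 1
);
CREATE TABLE tree_location (
    tree_id integer PRIMARY KEY
) INHERITS (map_location);
CREATE TABLE building_location (
    building_id integer PRIMARY KEY
) INHERITS (map_location);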

I tried some operations on it. I will not say whether there is any actual use case for database inheritance, but I will give you some details to help you make your decision. Here is the PostgreSQL example: https://www.postgresql.org/docs/15/tutorial-inheritance.html
You can try the SQL script below.
CREATE TABLE IF NOT EXISTS cities (
name text,
population real,
elevation int -- (in ft)
);
CREATE TABLE IF NOT EXISTS capitals (
state char(2) UNIQUE NOT NULL
) INHERITS (cities);
ALTER TABLE cities
ADD test_id varchar(255); -- both tables now contain the test_id column
DROP TABLE cities; -- Cannot drop because capitals depends on it
ALTER TABLE cities
ADD CONSTRAINT fk_test FOREIGN KEY (test_id) REFERENCES sometable (id);
As you can see from my comments, let me summarize:
When you add/drop/alter columns on the parent, the inherited (child) tables are also affected.
You cannot drop the parent table while child tables depend on it.
Foreign keys are not inherited.
From my perspective, in growing applications we cannot easily predict future changes, so I would avoid using this early in database development.
When features are stable and we want to create a database model that is much like an existing one, we can consider this approach.

Use it as little as possible. And that usually means never: it boils down to a way of creating structures that violate the relational model, for instance by breaking the information principle and by creating bags instead of relations.
Instead, use table partitioning combined with proper relational modelling, including further normal forms.

Related

Recursively update rows based on foreign keys

I have a PostgreSQL 12 database with a largely hierarchical design. The tables are related something along these lines: study -> group -> subject -> sample -> assay -> measurements. One study can have several groups, which in turn can have several subjects and so on.
CREATE TABLE studies(study INTEGER PRIMARY KEY, is_secret BOOLEAN);
CREATE TABLE groups("group" INTEGER PRIMARY KEY, study INTEGER REFERENCES studies(study) ON DELETE CASCADE, is_secret BOOLEAN);
CREATE TABLE subjects(subject INTEGER PRIMARY KEY, "group" INTEGER REFERENCES groups("group") ON DELETE CASCADE, is_secret BOOLEAN);
...
Now I would like to set is_secret for a given study in the studies table and let it cascade to all dependent entries. Exactly the same as with ON DELETE CASCADE, but setting this column value instead. Is there a way to do that? Thanks!
EDIT: I forgot an important piece of information. This has to be done without explicitly stating the relationships between tables. It should "follow the primary keys". Also, it should be possible to trigger the update even when is_secret is already TRUE in studies (to update when there is new data).
EDIT2: It should be possible to set this property not only for studies, but also for individual groups, subjects and so on (and let it cascade). It's the general mimic-on-*-cascade behavior that I'm after. Maybe one could do a delete, somehow get the affected rows, roll back and then set is_secret for those rows?
The best solution is to stay normalized and not store is_secret on each dependent object.
Rather, store it only in studies, and whenever you want to know if a dependent table is secret, you join it with its study. Such joins will be executed as efficient nested loop joins and won't slow you down considerably.
If you need that last little bit of query performance so badly that you need to denormalize in the way you suggested, define a trigger on studies that modifies all dependent objects. In that case, you will trade query performance for data modification performance.
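A rough sketch of such a trigger, covering only the studies -> groups level (the lower levels would follow the same pattern):
-- propagate is_secret from studies to groups whenever it changes
CREATE FUNCTION propagate_is_secret() RETURNS trigger AS $$
BEGIN
    UPDATE groups
       SET is_secret = NEW.is_secret
     WHERE study = NEW.study;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER studies_is_secret_cascade
AFTER UPDATE OF is_secret ON studies
FOR EACH ROW
EXECUTE FUNCTION propagate_is_secret();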

In PostgreSQL 12, does creating partitioning via inheritance improve query performance if queries are contained within a child table?

Using PostgreSQL 12, I'd like to take advantage of partitioning to 1: aid query performance, and 2: allow removing historic data more easily to mitigate database growth.
Unfortunately, declarative partitioning requires the key to be part of the PKs. A temporal field as primary key doesn't work well for my model -- so I'm exploring using inheritance instead (as per the docs).
My question is whether using this approach will similarly isolate the amount of rows that my SELECT statement will be exposed to if an item in my WHERE statement limits the results to a single child table.
eg.
Books => BooksJan2020, BooksFeb2020, BooksMar2020.
SELECT * FROM Books WHERE created < '2020-01-20' AND author LIKE 'John%';
In declarative partitioning, I would expect the 'LIKE' statement to only be exposed to rows within the January table. Can I expect the same with inheritance? When studying how to create inherited tables, I don't see a mechanism that would tell the planner which child table to pull from.
SteveJ
You can do that by creating the appropriate check constraints on the inheritance children and leaving constraint_exclusion at its default value of partition.
But I want to dissuade you from using anything but declarative partitioning in v12. Partitioning by inheritance hurts. Besides, you cannot get a true primary key on anything that does not contain the partitioning key that way: even though you have a primary key on all partitions, nothing can prevent you from inserting the same key in different partitions.
My advice is to go with a primary key on (id, created). True, that does not guarantee global uniqueness of id, but it goes a long way towards that goal. With values generated from a single sequence, the risk of duplicates is marginal.
The remaining downside of a composite primary key is that you have to include both columns in any table that has a foreign key constraint to the partitioned table, but I'd say that is the price you pay for the advantages of partitioning. Besides, with inheritance partitioning you couldn't have foreign keys pointing to the partitioned table at all.
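A sketch of the suggested declarative setup (column names are illustrative):
-- declarative range partitioning with the partition key included in the primary key
CREATE TABLE books (
    id      bigint GENERATED ALWAYS AS IDENTITY,
    created timestamptz NOT NULL,
    author  text,
    PRIMARY KEY (id, created)
) PARTITION BY RANGE (created);

CREATE TABLE books_2020_01 PARTITION OF books
    FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');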

Database design with composite types in postgresql

How do I refer to another table built from a composite type from within my table?
I am trying to set up a database to understand PostgreSQL and its object-oriented features.
The problem statement is as follows: there are multiple companies which can have board members.
A company can be owned by another company or by a person.
This is the type of database design I am looking for.
create type companyType(
name: VARCHAR,
boardMembers : personType[],
owns: companyType[]
)
create type personType(
name: VARCHAR,
owns: companyType[]
)
Create table company as companyType
Create table person as personType
I understand that I cannot self reference the companyType so I will probably move this another table.
My question is: when I am inserting into, say, the company table, how do I insert a list of person table objects as a foreign key?
Would making an 'id' column of type SERIAL in each table work, so that I could use it as a foreign key?
That is not a relational database design, and you won't get happy with it.
Map each object to a table. The table columns are the attributes of the object. Add an artificial primary key (id bigint GENERATED ALWAYS AS IDENTITY). Don't use composite types or arrays.
Relationships are expressed like this:
If the relationship is one-to-many, add a foreign key to the "many" side.
If the relationship is many-to-many, add a "junction table" that has foreign keys to both tables. The primary key is the combination of these foreign keys.
Normalize the resulting data model to remove redundancies.
Sprinkle with unique and check constraints as appropriate.
That way your queries will become simple, and you can use your database's features to make your life easier.
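A minimal sketch of that advice applied to the companies/persons example (table and column names are illustrative):
-- one table per object, surrogate keys, junction tables for the relationships
CREATE TABLE company (
    company_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name       varchar NOT NULL
);
CREATE TABLE person (
    person_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name      varchar NOT NULL
);
-- board membership: many persons per company, many companies per person
CREATE TABLE board_member (
    company_id bigint REFERENCES company,
    person_id  bigint REFERENCES person,
    PRIMARY KEY (company_id, person_id)
);
-- ownership: the owner is either another company or a person
CREATE TABLE company_owner (
    owned_company_id bigint NOT NULL REFERENCES company,
    owner_company_id bigint REFERENCES company,
    owner_person_id  bigint REFERENCES person,
    CHECK (num_nonnulls(owner_company_id, owner_person_id) = 1)
);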

Many-to-Many in Postgres?

I went with PostgreSQL because it is an ORDBMS rather than a standard relational DBMS. I have a class/object (below) that I would like to implement in the database.
class User {
    int id;
    String name;
    ArrayList<User> friends;
}
Now, a user has many friends, so, logically, the table should be declared like so:
CREATE TABLE user_table(
id INT,
name TEXT,
friends TYPEOF(user_table)[]
)
However, to my knowledge, it is not possible to use a row of a table as a type (-10 points for postgreSQL), so, instead, my array of friends is stored as integers:
CREATE TABLE user_table(
id INT,
name TEXT,
friends INT[]
)
This is an issue because elements of an array cannot reference - only the array itself can. Added to this, there seems to be no way to import the whole user (that is to say, the user and all the user's friends) without doing multiple queries.
Am I using postgreSQL wrong? It seems to me that the only efficient way to use it is by using a relational approach.
I want a cleaner object-oriented approach similar to that of Java.
I'm afraid you are indeed using PostgreSQL wrong, and possibly misunderstanding the purpose of Object-relational databases as opposed to classic relational databases. Both classes of database are still inherently relational, but the former provides allowances for inheritance and user-defined types that the latter does not.
This answer to one of your previous questions provides you with some great pointers to achieve what you're trying to do using the Postgres pattern.
Well, first off PostgreSQL absolutely supports arrays of complex types like you describe (although I don't think it has a TYPEOF operator). How would the declaration you describe work, though? You are trying to use the table type in the declaration of the table. If what you want is a composite type in an array (and I'm not really sure that it is) you would declare this in two steps:
CREATE TYPE ima_type AS ( some_id integer, some_val text);
CREATE TABLE ima_table
( some_other_id serial NOT NULL
, friendz ima_type []
)
;
That runs fine. You can also create arrays of table types, because every table definition is a type definition in Postgres.
However, in a relational database, a more traditional model would use two tables:
CREATE TABLE persons
( person_id serial NOT NULL PRIMARY KEY
, person_name text NOT NULL
)
;
CREATE TABLE friend_lookup
( person_id integer NOT NULL REFERENCES persons
, friend_id integer NOT NULL REFERENCES persons(person_id)
, CONSTRAINT uq_person_friend UNIQUE (person_id, friend_id)
)
;
Ignoring the fact that the persons table has absolutely no way to prevent duplicate persons (what about misspellings, middle initials, spacing, honorifics, etc?; also two different people can have the same name), this will do what you want and allow for a simple query that lists all friends.
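The simple friends query might look like this (a sketch against the two tables above):
-- all friends of a given person, fetched through the junction table
SELECT f.person_id, f.person_name
  FROM friend_lookup fl
  JOIN persons f ON f.person_id = fl.friend_id
 WHERE fl.person_id = 1;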

postgresql hstore key/value vs traditional SQL performance

I need to develop a key/value backend, something like this:
Table T1 id-PK, Key - string, Value - string
INSERT into T1('String1', 'Value1')
INSERT INTO T1('String1', 'Value2')
Table T2 id-PK2, id2->external key to id
some other data in T2, which references data in T1 (like users which have those K/V etc)
I heard about PostgreSQL hstore with GIN/GiST indexes. Which is better performance-wise:
doing this the traditional way with SQL joins and separate Key/Value columns,
or does PostgreSQL hstore perform better in this case?
The format of the data should be any key=>any value.
I also want to do text matching e.g. partially search for (LIKE % in SQL or using the hstore equivalent).
I plan to have around 1M-2M entries in it and probably scale at some point.
What do you recommend? Going the traditional SQL way/PostgreSQL hstore, or some other distributed key/value store with persistence?
If it helps, my server is a VPS with 1-2GB RAM, so not particularly good hardware. I was also thinking of having a cache layer on top of this, but I think it rather complicates the problem. I just want good performance for 2M entries. Updates will be done often, but searches even more often.
Thanks.
Your question is unclear because you're not clear about your objective.
The key here is the index (pun intended) - if you're dealing with a large number of keys you want to be able to retrieve them with the fewest lookups and without pulling up unrelated data.
The short answer is you probably don't want to use hstore, but let's look at it in more detail...
Does each id have many key/value pairs (hundreds+)? Don't use hstore.
Will any of your values contain large blocks of text (4kb+)? Don't use hstore.
Do you want to be able to search by keys in wildcard expressions? Don't use hstore.
Do you want to do complex joins/aggregation/reports? Don't use hstore.
Will you update the value for a single key? Don't use hstore.
Multiple keys with the same name under an id? Can't use hstore.
So what's the use of hstore? Well, one good scenario would be if you wanted to hold key/value pairs for an external application where you know you always want to retrieve all key/values and will always save the data back as a block (i.e., it's never edited in place). At the same time you do want some flexibility to be able to search this data - albeit very simply - rather than storing it in, say, a block of XML or JSON. In this case, since the number of key/value pairs is small, you save on space because you're compressing several tuples into one hstore.
Consider this as your table:
CREATE TABLE kv (
id /* SOME TYPE */ PRIMARY KEY,
key_name TEXT NOT NULL,
key_value TEXT,
UNIQUE(id, key_name)
);
I think the design is poorly normalized. Try something more like this:
CREATE TABLE t1
(
t1_id serial PRIMARY KEY,
<other data which depends on t1_id and nothing else>,
-- possibly an hstore, but maybe better as a separate table
t1_props hstore
);
-- if properties are done as a separate table:
CREATE TABLE t1_properties
(
t1_id int NOT NULL REFERENCES t1,
key_name text NOT NULL,
key_value text,
PRIMARY KEY (t1_id, key_name)
);
If the properties are small and you don't need to use them heavily in joins or with fancy selection criteria, an hstore may suffice. Elliot laid out some sensible things to consider in that regard.
Your reference to users suggests that this is incomplete, but you didn't really give enough information to suggest where those belong. You might get by with an array in t1, or you might be better off with a separate table.
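If you do go the hstore route for t1_props, a minimal sketch of indexing and querying it might look like this (requires the hstore extension; names are taken from the example above):
CREATE EXTENSION IF NOT EXISTS hstore;
-- GIN index to support containment and key-existence searches on the hstore column
CREATE INDEX t1_props_idx ON t1 USING gin (t1_props);
-- rows having a given key
SELECT t1_id FROM t1 WHERE t1_props ? 'color';
-- rows having a given key/value pair
SELECT t1_id FROM t1 WHERE t1_props @> 'color => red';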