PostgreSQL: How to revalidate CHECKs - postgresql

My database has the following structure:
CREATE TYPE instrument_type AS ENUM (
'Stock',
...
'Currency',
...
);
CREATE FUNCTION get_instrument_type(instrument_id bigint) RETURNS instrument_type
LANGUAGE plpgsql STABLE RETURNS NULL ON NULL INPUT
AS $$
BEGIN
RETURN (SELECT instr_type FROM instruments WHERE id = instrument_id);
END
$$;
CREATE TABLE instruments (
id bigserial PRIMARY KEY,
instr_type instrument_type NOT NULL,
...
);
CREATE TABLE countries_currencies (
...
curr bigint NOT NULL
REFERENCES instruments (id)
ON UPDATE CASCADE ON DELETE CASCADE
CHECK (get_instrument_type(curr) = 'Currency'),
...
);
As you can see, I use one common table for instruments. There are a lot of foreign keys referencing that table. But some tables, like countries_currencies, require that the referenced item be a 'Currency'. Since I can't use subqueries in CHECK constraints, I have to use a function.
One day someone might change instrument_type from 'Currency' to something else. If there is a row in countries_currencies referencing the modified instrument, the CHECK becomes invalid for that row. But the CHECK is only applied to new rows, not to already existing ones.
Is there any standard way to revalidate CHECKs? I want to run such a procedure as part of a general data integrity test.
P.S. I know I could write a trigger on the instruments table and forbid the change if something would become broken. But that requires making sure I have covered all referencing tables and their constraints, so it is error-prone anyway.

You could simply update all rows in place to trigger the CHECK:
UPDATE countries_currencies SET curr = curr;
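Alternatively, you can make PostgreSQL itself re-check existing rows by dropping and re-adding the constraint, since adding a CHECK constraint validates all existing rows. A sketch, assuming the constraint carries the default name countries_currencies_curr_check (on newer versions you can also ADD ... NOT VALID and run VALIDATE CONSTRAINT later to shorten the lock):
ALTER TABLE countries_currencies
DROP CONSTRAINT countries_currencies_curr_check;
ALTER TABLE countries_currencies
ADD CONSTRAINT countries_currencies_curr_check
CHECK (get_instrument_type(curr) = 'Currency');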

Related

How can a relational database with foreign key constraints ingest data that may be in the wrong order?

The database is ingesting data from a stream, and the rows needed to satisfy a foreign key constraint may arrive late or never arrive.
This could likely be handled by using another datastore, one without foreign key constraints, and then, once all the needed data is available, loading it into the database that has the FK constraints. However, this adds complexity and I'd like to avoid it.
We're working on a solution that creates "placeholder" rows to point the foreign key to. When the real data comes in, the placeholder is replaced with real values. Again, this adds complexity, but it's the best solution we've found so far.
How do people typically solve this problem?
Edit: Some sample data which might help explain the problem:
Let's say we have these tables:
CREATE TABLE "order" (
id INTEGER NOT NULL,
order_number INTEGER,
PRIMARY KEY (id),
UNIQUE (order_number)
);
CREATE TABLE line_item (
id INTEGER NOT NULL,
order_number INTEGER REFERENCES "order" (order_number),
PRIMARY KEY (id)
);
If I insert an order first, not a problem! But let's say I try:
INSERT INTO line_item (order_number) values (123) before order 123 was inserted. This will fail the FK constraint, of course. But this might be the order in which I get the data, since it's read from a stream that collects this data from multiple sources.
Also, to address #philpxy's question, I didn't really find much on this. One thing that was mentioned was deferred constraints. This is a mechanism that defers the FK checks to the end of a transaction. I don't think it's possible to do that in my case, however, since these insert statements will be run at random times, whenever the data is received.
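For reference, a deferrable foreign key on the tables above would look roughly like this (the constraint name is made up); the check is merely postponed to COMMIT, so it only helps when the order and its line items are inserted in the same transaction:
ALTER TABLE line_item
ADD CONSTRAINT line_item_order_number_fkey
FOREIGN KEY (order_number) REFERENCES "order" (order_number)
DEFERRABLE INITIALLY DEFERRED;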
You have a business workflow problem, because line items of individual orders are coming in before the orders themselves have come in. One workaround, perhaps not ideal, would be to create a before insert trigger which checks, for every incoming insert to the line_item table, whether that order already exists in the order table. If not, then it will first insert the order record before trying the insert on line_item.
CREATE OR REPLACE FUNCTION "public"."fn_insert_order" () RETURNS trigger AS $$
BEGIN
INSERT INTO "order" (order_number)
SELECT NEW.order_number
WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
RETURN NEW;
END
$$
LANGUAGE plpgsql;
-- trigger
CREATE TRIGGER "trigger_insert_order"
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_insert_order();
Note: I am assuming that the id column of the order table in fact is auto increment, in which case Postgres would automatically assign a value to it when inserting as above. Most likely, this is what you want, as having two id columns which both need to be manually assigned does not make much sense.
You could accomplish that with a BEFORE INSERT trigger on line_item.
In that trigger you query order if a matching item exists, and if not, you insert a dummy row.
That will allow the INSERT to succeed, at the cost of some performance.
To insert rows into order, use
INSERT INTO "order" ...
ON CONFLICT (order_number) DO UPDATE SET
id = EXCLUDED.id;
Updating a primary key is problematic and may lead to conflicts. One way you could get around that is if you use negative ids for artificially generated orders (assuming that the real ids are positive). If you have any references to that primary key, you'd have to define the constraint with ON UPDATE CASCADE.
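A rough sketch of that placeholder trigger (function and trigger names are made up; a dedicated sequence for placeholder ids would be safer under concurrency than the MIN() trick shown here):
CREATE OR REPLACE FUNCTION create_placeholder_order() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number) THEN
-- negative ids mark artificially generated placeholder orders
INSERT INTO "order" (id, order_number)
VALUES ((SELECT COALESCE(MIN(id), 0) - 1 FROM "order" WHERE id <= 0), NEW.order_number);
END IF;
RETURN NEW;
END;
$$;
CREATE TRIGGER trg_placeholder_order
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE create_placeholder_order();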

Is it possible to refer a column in a view as foreign key (PostgreSQL 9.4)?

I know it was impossible in older versions; is it the same in version 9.4?
I'm trying to do something like this:
CREATE VIEW products AS
SELECT d1.id AS id, d1.price AS pr FROM dup.freshProducts AS d1
UNION
SELECT d2.id AS id, d2.price AS pr FROM dup.cannedProducts AS d2;
CREATE TABLE orderLines
(
line_id integer PRIMARY KEY,
product_no integer REFERENCES products (id)
);
I'm trying to implement an inheritance relationship where freshProducts and cannedProducts both inherit from products. I implemented it using two different tables and I created a view products that has only the common properties between freshProducts and cannedProducts. In addition, each row in orderLines has a relationship with a product, either a freshProduct or a cannedProduct. See image for clarification.
If referencing a view is still not possible, which solution do you think is best? I've thought of either a materialized view or implementing the restriction using triggers. Could you recommend any good example of such triggers to use as a basis?
Thank you very much!
Referencing a (materialized) view wouldn't work and a trigger might look like this:
CREATE OR REPLACE FUNCTION reject_not_existing_id()
RETURNS "trigger" AS
$BODY$
BEGIN
IF NEW.product_no NOT IN (SELECT id FROM dup.freshProducts UNION SELECT id FROM dup.cannedProducts) THEN
RAISE EXCEPTION 'The product id % does not exist', NEW.product_no;
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
CREATE TRIGGER tr_before_insert_or_update
BEFORE INSERT OR UPDATE OF product_no
ON orderLines
FOR EACH ROW
EXECUTE PROCEDURE reject_not_existing_id();
(See also http://www.tek-tips.com/viewthread.cfm?qid=1116256)
A materialized view might look like a good approach but fails for two reasons: like a view, you simply can't reference it, because it is not a table (go ahead and try). And assuming you could, there would still be the problem of preventing two equal ids in freshProducts and cannedProducts. Yes, you can define a UNIQUE INDEX on a materialized view, but how do you make sure the same id isn't used in both fresh and canned in the first place?
That's something you still have to solve if using the trigger in orderLines.
That brings me to suggest rethinking your model. 'Fresh' and 'canned' might as well be values of an attribute of a single products table, which would make all this trouble superfluous. If fresh and canned products significantly differ in (the number of) their attributes (I can't think of any other reason to create two different tables), then reference the product id from two other tables, like:
CREATE TABLE products
(
id ... PRIMARY KEY
, fresh_or_canned ...
, price ...
, another_common_attribute_1 ...
, ...
, another_common_attribute_n ...
);
CREATE TABLE canned_specific_data
(
canned_id ... REFERENCES products (id)
, type_of_can ...
, ...
, another_attribute_that_does_not_apply_to_fresh ...
);
CREATE TABLE fresh_specific_data
(
fresh_id ... REFERENCES products (id)
, date_of_harvest ...
, ...
, another_attribute_that_does_not_apply_to_canned ...
);
The simple answer to preventing ID duplication is to use the same sequence as the default value for the IDs in both freshProducts and cannedProducts.
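For example (a sketch against the schema above, keeping both integer id columns), both defaults can point at one shared sequence:
CREATE SEQUENCE dup.product_id_seq;
ALTER TABLE dup.freshProducts ALTER COLUMN id SET DEFAULT nextval('dup.product_id_seq');
ALTER TABLE dup.cannedProducts ALTER COLUMN id SET DEFAULT nextval('dup.product_id_seq');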
Now comes the question: why do you need a foreign key at all? Typically it is to prevent deletion of data that another table depends upon; however, you can write a trigger to prevent that. Likewise, there is the case of updating that value to something that doesn't exist in the referenced table, but you can write a trigger for that too.
So basically you can write triggers to implement all the desired functionality of a foreign key without actually needing a foreign key, with the added benefit that they WILL work with such a view.
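As an illustration (a sketch with made-up names), a delete-guard trigger that emulates the ON DELETE side of such a foreign key could look like this:
CREATE OR REPLACE FUNCTION prevent_referenced_delete() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
IF EXISTS (SELECT 1 FROM orderLines WHERE product_no = OLD.id) THEN
RAISE EXCEPTION 'Product % is still referenced by orderLines', OLD.id;
END IF;
RETURN OLD;
END;
$$;
CREATE TRIGGER tr_fresh_no_delete
BEFORE DELETE ON dup.freshProducts
FOR EACH ROW EXECUTE PROCEDURE prevent_referenced_delete();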

postgresql - designing a tree hierarchy with mixed node types (inheritance does not help!)

I have a question about implementing inheritance in postgresql(9.1).
The purpose is to build a geo-hierarchy model, where countries, states and continents can be mixed to create "regions", and then these regions in turn can be mixed with countries, etc. to create a truly awesome region hierarchy.
So in my logical model, everything is a type of "place". A region tree can be constructed edge-wise, each edge linking two "places". The design is as below, and is easy to manage in the Java layer.
create table place_t (
place_id serial primary key,
place_type varchar(10)
);
create table country_t (
short_name varchar(30) unique,
name varchar(255) null
) inherits(place_t);
create table region_t(
short_name varchar(30),
hierarchy_id integer, -- references hierarchy_t(hierarchy_id)
unique(short_name) -- (short_name,hierarchy_id)
) inherits(place_t);
create table region_hier_t(
parent integer references place_t(place_id), -- would prefer FK region_t(place_id)
child integer references place_t(place_id),
primary key(parent,child)
);
insert into region_t values(DEFAULT, 'region', 'NA', 'north american ops');
insert into region_t values(DEFAULT, 'region', 'EMEA', 'europe and middle east');
insert into country_t values(DEFAULT, 'country', 'US', 'USD', 'united states');
insert into country_t values(DEFAULT, 'country', 'CN', 'CND', 'canada');
So far so good. But the following fails:
insert into region_hier_t
select p.place_id, c.place_id
from region_t as p, country_t as c
where p.short_name = 'NA' and c.short_name = 'US';
The reason is that the first 4 inserts did not create any row in "place_t". RTFM! Postgres docs actually mention this.
The question is - is there a workaround? Insert triggers on region_t and country_t to implement my own "inheritance" are the only thing I could think of.
A second question is - is there a better design for such a mixed-node tree structure?
For certain reasons I do not want to rely too much on postgres-contrib features. Perhaps that's very silly and please feel free to chime in, but gently (and only after answering the other question)!
Thanks
The references on the parent and child columns in the region_hier_t table are wrong, because you cannot insert a key from country_t when the reference points at another table (child integer references place_t(place_id)). You can either drop them or add new ones.
So let's take the second option and add a unique constraint matching the given keys for the referenced tables region_t and country_t:
ALTER TABLE region_t
ADD CONSTRAINT pk_region_t PRIMARY KEY(place_id );
ALTER TABLE country_t
ADD CONSTRAINT pk_country_t PRIMARY KEY(place_id );
The correct CREATE statement for region_hier_t is:
create table region_hier_t(
parent integer references region_t(place_id),
child integer references country_t(place_id),
primary key(parent,child)
);
And finally you can run your INSERT.
So, as you can see, there are many improvements for you to make. Maybe you should reconsider your design. Take a look at this answer: How to store postal addresses and political divisions in a normalized way? It's much simpler than your solution and easier to maintain.
But if you want to stay with your solution, don't forget to set primary keys on the child tables (as shown above). Only CHECK constraints and NOT NULL constraints are inherited by child tables, and you haven't done that yet.
I also see that your other inserts don't work correctly:
insert into region_t values(DEFAULT, 'region', 'NA', 'north american ops');
ERROR: invalid input syntax for integer: "north american ops"
LINE 1: ...ert into region_t values(DEFAULT, 'region', 'NA', 'north ame...
So there is a problem with column assignment as well.
So it turns out that inheritance in PostgreSQL is somewhat different from that in typical OOP languages. In particular, the "superclass" table is not populated automatically. Since I would have had to use my own triggers to do that, there was no use case left for the inheritance structure.
So I abandoned PostgreSQL inheritance and created my own "place_t" table, plus "country_t", "state_t", "county_t" and "region_t" child tables, linked to the parent "place_t" through "place_id".
On these child tables, I created a BEFORE INSERT/UPDATE row-level trigger to ensure that "place_id" refers to a valid row in "place_t" and that the reference is not changed later. In other words, "place_id" in the child tables should behave like write-once-read-many.
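A minimal sketch of what such a trigger might look like, shown for one child table (function and trigger names here are illustrative):
CREATE FUNCTION enforce_place_link() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
IF TG_OP = 'UPDATE' AND NEW.place_id IS DISTINCT FROM OLD.place_id THEN
RAISE EXCEPTION 'place_id is write-once and cannot be changed';
END IF;
IF NOT EXISTS (SELECT 1 FROM place_t WHERE place_id = NEW.place_id) THEN
RAISE EXCEPTION 'place_id % not found in place_t', NEW.place_id;
END IF;
RETURN NEW;
END;
$$;
CREATE TRIGGER country_place_link
BEFORE INSERT OR UPDATE OF place_id ON country_t
FOR EACH ROW EXECUTE PROCEDURE enforce_place_link();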
Now I can insert the world geography and also define a new "region". I created a "region_composition_t" table to record the edges of a regional hierarchy, with the parent being a reference to "region_t" and the child being a reference to "place_t".
So far so good. The challenge now is how to suppress any update/delete cascading effects.
The workaround is to get rid of your foreign keys to place_t and do instead:
CREATE FUNCTION place_t_exists(id int)
RETURNS bool LANGUAGE SQL AS
$$
SELECT EXISTS (SELECT 1 FROM place_t WHERE place_id = $1);
$$;
CREATE FUNCTION fkey_place_t() RETURNS TRIGGER
LANGUAGE plpgsql AS $$
DECLARE
ref_id integer;
BEGIN
-- trigger arguments cannot carry NEW values, so pass the column name instead
ref_id := CASE TG_ARGV[0] WHEN 'parent' THEN NEW.parent ELSE NEW.child END;
IF place_t_exists(ref_id) THEN
RETURN NEW;
ELSE
RAISE EXCEPTION 'place_t row % does not exist', ref_id;
END IF;
END;
$$;
You also need something on the child tables to block changes while the row is still referenced as a hierarchy node:
CREATE FUNCTION hierarchy_exists(id int) RETURNS BOOL LANGUAGE SQL AS
$$
SELECT COUNT(*) > 0 FROM region_hier_t WHERE parent = $1 OR child = $1;
$$;
CREATE OR REPLACE FUNCTION fkey_hierarchy_trigger() RETURNS trigger LANGUAGE plpgsql AS
$$
BEGIN
IF hierarchy_exists(OLD.place_id) THEN
RAISE EXCEPTION 'Hierarchy node still exists';
ELSE
RETURN OLD;
END IF;
END;
$$;
Then you can create your triggers:
CREATE CONSTRAINT TRIGGER fkey_place_parent AFTER INSERT OR UPDATE ON region_hier_t
FOR EACH ROW EXECUTE PROCEDURE fkey_place_t('parent');
CREATE CONSTRAINT TRIGGER fkey_place_child AFTER INSERT OR UPDATE ON region_hier_t
FOR EACH ROW EXECUTE PROCEDURE fkey_place_t('child');
And then for each of the place_t child tables:
CREATE CONSTRAINT TRIGGER fkey_hier_t AFTER UPDATE OR DELETE ON [child_table]
FOR EACH ROW EXECUTE PROCEDURE fkey_hierarchy_trigger();
This solution may not be worth it, but it is worth knowing how to do it if you need to.

postgres autoincrement not updated on explicit id inserts

I have the following table in postgres:
CREATE TABLE "test" (
"id" serial NOT NULL PRIMARY KEY,
"value" text
)
I am doing following insertions:
insert into test (id, value) values (1, 'alpha')
insert into test (id, value) values (2, 'beta')
insert into test (value) values ('gamma')
In the first 2 inserts I am explicitly mentioning the id. However the table's auto increment pointer is not updated in this case. Hence in the 3rd insert I get the error:
ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (id)=(1) already exists.
I never faced this problem in MySQL with either the MyISAM or InnoDB engine. Explicit or not, MySQL always updates the autoincrement pointer based on the max row id.
What is the workaround for this problem in postgres? I need it because I want a tighter control for some ids in my table.
UPDATE:
I need it because for some values I need to have a fixed id. For other new entries I don't mind creating new ones.
I think it may be possible by manually incrementing the nextval pointer to max(id) + 1 whenever I am explicitly inserting the ids. But I am not sure how to do that.
That's how it's supposed to work: nextval('test_id_seq') is only called when the system needs a value for this column and you have not provided one. If you provide a value, no such call is performed, and consequently the sequence is not "updated".
You could work around this by manually setting the value of the sequence after your last insert with explicitly provided values:
SELECT setval('test_id_seq', (SELECT MAX(id) from "test"));
The name of the sequence is autogenerated and is by default tablename_columnname_seq.
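If you would rather not hard-code that name, pg_get_serial_sequence can look it up for you:
SELECT setval(pg_get_serial_sequence('test', 'id'), (SELECT MAX(id) FROM "test"));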
In recent versions of Django, this topic is discussed in the documentation:
Django uses PostgreSQL’s SERIAL data type to store auto-incrementing
primary keys. A SERIAL column is populated with values from a sequence
that keeps track of the next available value. Manually assigning a
value to an auto-incrementing field doesn’t update the field’s
sequence, which might later cause a conflict.
Ref: https://docs.djangoproject.com/en/dev/ref/databases/#manually-specified-autoincrement-pk
There is also the management command manage.py sqlsequencereset app_label ... which generates SQL statements for resetting sequences for the given app name(s).
Ref: https://docs.djangoproject.com/en/dev/ref/django-admin/#django-admin-sqlsequencereset
For example these SQL statements were generated by manage.py sqlsequencereset my_app_in_my_project:
BEGIN;
SELECT setval(pg_get_serial_sequence('"my_project_aaa"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_aaa";
SELECT setval(pg_get_serial_sequence('"my_project_bbb"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_bbb";
SELECT setval(pg_get_serial_sequence('"my_project_ccc"','id'), coalesce(max("id"), 1), max("id") IS NOT null) FROM "my_project_ccc";
COMMIT;
It can be done automatically using a trigger. This way you are sure that the largest value is always used as the next default value.
CREATE OR REPLACE FUNCTION set_serial_id_seq()
RETURNS trigger AS
$BODY$
BEGIN
-- TG_ARGV[0] is the name of the serial column; the sequence is assumed to
-- carry the default name <table>_<column>_seq
EXECUTE format('SELECT setval(''%s_%s_seq'', (SELECT MAX(%I) FROM %I))',
TG_TABLE_NAME,
TG_ARGV[0],
TG_ARGV[0],
TG_TABLE_NAME);
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER set_mytable_id_seq
AFTER INSERT OR UPDATE OR DELETE
ON mytable
FOR EACH STATEMENT
EXECUTE PROCEDURE set_serial_id_seq('id');
The function can be reused for multiple tables. Change "mytable" to the table of interest and pass the name of its serial column ('id' here) as the trigger argument.
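For example, attaching the same function to a second, hypothetical table invoices whose serial column is also named id:
CREATE TRIGGER set_invoices_id_seq
AFTER INSERT OR UPDATE OR DELETE
ON invoices
FOR EACH STATEMENT
EXECUTE PROCEDURE set_serial_id_seq('id');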
For more info regarding triggers:
https://www.postgresql.org/docs/9.1/plpgsql-trigger.html
https://www.postgresql.org/docs/9.1/sql-createtrigger.html

PostgreSQL ON INSERT CASCADE

I've got two tables - one is Product and one is ProductSearchResult.
Whenever someone tries to insert a SearchResult with a product that is not listed in the Product table, the foreign key constraint is violated, hence I get an error.
I would like to know how I could get my database to automatically create that missing Product in the Product table (just the ProductID; all other attributes can be left blank).
Is there such a thing as CASCADE ON INSERT? If there is, I was not able to get it working.
Rules get executed after the INSERT, so because we get an error beforehand they are useless if you use DO ALSO. If you use DO INSTEAD and add the INSERT command at the end, you end up with endless recursion.
I reckon a trigger is the way to go, but all my attempts to write one failed.
Any recommendations?
The Table Structure:
CREATE TABLE Product (
ID char(10) PRIMARY KEY,
Title varchar(150),
Manufacturer varchar(80),
Category smallint,
FOREIGN KEY(Category) REFERENCES Category(ID) ON DELETE CASCADE);
CREATE TABLE ProductSearchResult (
SearchTermID smallint NOT NULL,
ProductID char(10) NOT NULL,
DateFirstListed date NOT NULL DEFAULT current_date,
DateLastListed date NOT NULL DEFAULT current_date,
PRIMARY KEY (SearchTermID,ProductID),
FOREIGN KEY (SearchTermID) REFERENCES SearchTerm(ID) ON DELETE CASCADE,
FOREIGN KEY (ProductID) REFERENCES Product ON DELETE CASCADE);
Yes, triggers are the way to go. But before you can start to use triggers in plpgsql, you have to enable the language. As user postgres, run the command createlang with the proper parameters.
Once you've done that, you have to:
write a function in plpgsql,
create a trigger to invoke that function.
See example 39-3 for a basic example.
Note that a function body in Postgres is a string, with a special quoting mechanism: 2 dollar signs with an optional word in between them, as the quotes. (The word allows you to quote other similar quotes.)
Also note that you can reuse a trigger procedure for multiple tables, as long as they have the columns your procedure uses.
So the function has to:
check whether the value of NEW.ProductID exists in the Product table, with a SELECT statement (you ought to be able to use SELECT count(*) ... INTO someint, or SELECT EXISTS(...) INTO somebool);
if not, insert a new row into that table.
If you still get stuck, come back here.
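A basic sketch of such a trigger, following the steps above (function and trigger names are made up):
CREATE OR REPLACE FUNCTION ensure_product_exists() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
IF NOT EXISTS (SELECT 1 FROM Product WHERE ID = NEW.ProductID) THEN
-- create the missing product with only its ID; other attributes stay NULL
INSERT INTO Product (ID) VALUES (NEW.ProductID);
END IF;
RETURN NEW;
END;
$$;
CREATE TRIGGER trg_create_missing_product
BEFORE INSERT ON ProductSearchResult
FOR EACH ROW EXECUTE PROCEDURE ensure_product_exists();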
In any case (rules OR triggers) the insert needs to create a new key (and new values for the attributes) in the products table. In most cases, this implies that a (serial,sequence) surrogate primary key should be used in the products table, and that the "real world" product_id ("product number") should default to NULL, and be degraded to a candidate key.
BTW: a rule can be used; rules are just tricky to implement correctly for N:1 relations (they need the same kind of EXISTS logic as in Bart's answer above).
Maybe cascading on INSERT is not such a good idea after all. What do you want to happen if someone inserts a ProductSearchResult record for a non-existing product? [IMO an FK is always a domain; you cannot extend a domain just by referring to a non-existent value for it; that would make the FK constraint meaningless]