RETURNING clause with BEFORE INSERT trigger - postgresql

I'm using table inheritance to split a table into smaller ones. I'm using BEFORE INSERT trigger to route new data into correct inherited tables. This trigger returns NULL so actual INSERT won't run on parent table.
The side effect of this is lack of any result of actual INSERT:
INSERT INTO TABLE a VALUES (...) RETURNING a_id
triggers BEFORE INSERT which directs new data to another table a_CURRENT_DATE - dynamically created when necessary by the trigger function. The trigger returns NULL so actual INSERT into table a is suppressed.
Original query has no result hence no a_id (a_id is a SERIAL column) is available.
What is the most elegant way to obtain a_id value?

Good question. That is typical problem with partitioning. I'm afraid there is no good, or elegant solution, and all you can do is to introduce some workarounds:
inserting, and then deleting - yes, far from perfect,
if you need id generated by serial type, you can use currval()... That would mean another query.
Here there is yet another other way - you can create view, and use instead of trigger for that view. It is hard to tell if that is elegant, but for me that is quite close to that.

Related

How can a relational database with foreign key constraints ingest data that may be in the wrong order?

The database is ingesting data from a stream, and all the rows needed to satisfy a foreign key constraint may be late or never arrive.
This can likely be accomplished by using another datastore, one without foreign key constraints, and then when all the needed data is available, read into the database which has fk constraints. However, this adds complexity and I'd like to avoid it.
We're working on a solution that creates "placeholder" rows to point the foreign key to. When the real data comes in, the placeholder is replaced with real values. Again, this adds complexity, but it's the best solution we've found so far.
How do people typically solve this problem?
Edit: Some sample data which might help explain the problem:
Let's say we have these tables:
CREATE TABLE order (
id INTEGER NOT NULL,
order_number,
PRIMARY KEY (id),
UNIQUE (order_number)
);
CREATE TABLE line_item (
id INTEGER NOT NULL,
order_number INTEGER REFERENCES order(order_number),
PRIMARY KEY (id)
);
If I insert an order first, not a problem! But let's say I try:
INSERT INTO line_item (order_number) values (123) before order 123 was inserted. This will fail the fk constraint of course. But this might be the order I get the data, since it's reading from a stream that is collecting this data from multiple sources.
Also, to address #philpxy's question, I didn't really find much on this. One thing that was mentioned was deferred constraints. This is a mechanism that waits to do the fk constraints at the end of a transaction. I don't think it's possible to do that in my case however, since these insert statements will be run at random times whenever the data is received.
You have a business workflow problem, because line items of individual orders are coming in before the orders themselves have come in. One workaround, perhaps not ideal, would be to create a before insert trigger which checks, for every incoming insert to the line_item table, whether that order already exists in the order table. If not, then it will first insert the order record before trying the insert on line_item.
CREATE OR REPLACE FUNCTION "public"."fn_insert_order" () RETURNS trigger AS $$
BEGIN
INSERT INTO "order" (order_number)
SELECT NEW.order_number
WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
RETURN NEW;
END
$$
LANGUAGE 'plpgsql'
# trigger
CREATE TRIGGER "trigger_insert_order"
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_insert_order()
Note: I am assuming that the id column of the order table in fact is auto increment, in which case Postgres would automatically assign a value to it when inserting as above. Most likely, this is what you want, as having two id columns which both need to be manually assigned does not make much sense.
You could accomplish that with a BEFORE INSERT trigger on line_item.
In that trigger you query order if a matching item exists, and if not, you insert a dummy row.
That will allow the INSERT to succeed, at the cost of some performance.
To insert rows into order, use
INSERT INTO order ...
ON CONFLICT ON (order_number) DO UPDATE SET
id = EXCLUDED.id;
Updating a primary key is problematic and may lead to conflicts. One way you could get around that is if you use negative ids for artificially generated orders (assuming that the real ids are positive). If you have any references to that primary key, you'd have to define the constraint with ON UPDATE CASCADE.

Is it possible to refer a column in a view as foreign key (PostgreSQL 9.4)?

I know in older versions it was impossible, is it the same with version 9.4?
I'm trying to do something like this:
CREATE VIEW products AS
SELECT d1.id AS id, d1.price AS pr FROM dup.freshProducts AS d1
UNION
SELECT d2.id AS id, d2.price AS pr FROM dup.cannedProducts AS d2;
CREATE TABLE orderLines
(
line_id integer PRIMARY KEY,
product_no integer REFERENCES productView.id
);
I'm trying to implement an inheritance relationship where freshProducts and cannedProducts both inherit from products. I implemented it using two different tables and I created a view products that has only the common properties between freshProducts and cannedProducts. In addition, each row in orderLines has a relationship with a product, either a freshProduct or a cannedProduct. See image for clarification.
If referencing to a view is yet not possible, which solution do you think is best? I've thought of eihter a materialized view or implementing the restriction using triggers. Could you recommend any good example of such triggers to use as a basis?
Thank-you very much!
Referencing a (materialized) view wouldn't work and a trigger might look like this:
CREATE OR REPLACE FUNCTION reject_not_existing_id()
RETURNS "trigger" AS
$BODY$
BEGIN
IF NEW.product_no NOT IN (SELECT id FROM dup.freshProducts UNION SELECT id FROM dup.cannedProducts) THEN
RAISE EXCEPTION 'The product id % does not exist', NEW.product_no;
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
CREATE TRIGGER tr_before_insert_or_update
BEFORE INSERT OR UPDATE OF product_no
ON orderLines
FOR EACH ROW
EXECUTE PROCEDURE reject_not_existing_id();
(See also http://www.tek-tips.com/viewthread.cfm?qid=1116256)
A materialized view might look like a good approach but fails for two reasons: Like a view you simply can't reference it, because it is no table (go ahead and try). Assuming you could, there would still be the problem of preventing two equal ids in freshProducts and cannedProducts. Yes you can define an UNIQUE INDEX on a materialized view, but how to make sure the same id isn't used both in fresh an canned in the first place?
That's something you still have to solve if using the trigger in orderLines.
That brings me to suggest to rethink your model. 'Fresh' and 'canned' might as well be values of an attribute of a single table products, hence making all the trouble superfluous. If fresh and canned product significantly differ in (the number of) their attributes (can't think of any other reason to create two different tables) then reference the product id in two other tables. Like
CREATE TABLE products
(
id ... PRIMARY KEY
, fresh_or_canned ...
, price ...
, another_common_attribute_1 ...
, ...
, another_common_attribute_n ...
);
CREATE TABLE canned_specific_data
(
canned_id ... REFERENCES products (id)
, type_of_can ...
, ...
, another_attribute_that_does_not_apply_to_fresh ...
);
CREATE TABLE fresh_specific_data
(
fresh_id ... REFERENCES products (id)
, date_of_harvest ...
, ...
, another_attribute_that_does_not_apply_to_canned ...
);
The simple answer to preventing ID duplication is to simply use the same sequence as the default value for IDs in both freshProducts and cannedProducts.
Now, there comes the question, why do you need a foreign key at all? Typically this is to prevent deletion of data that another table depends upon, however, you can write a trigger to prevent that. Alsowise, you have updating that value to something that doesn't exist in the keyed-to table, but you can write a trigger for that too.
So basically you can write triggers to implement all the desired functionality of a foreign key without actually needing a foreign key, with the added benefit that they WILL work with such a view.

postgresql trigger (strategy) to update table based on entries of another table

I am fairly new to PostgreSQL (spoilt by django ORM!), and I would like to create a trigger which updates a table based on entries of another table.
So, I have the following table on my schema:
collection_myblogs(id, col1,col2,title,col4,col5)
..where field id is autogenerated. Now, I have a new table created like so:
CREATE TABLE FullText(id SERIAL NOT NULL, content text NOT NULL);
ALTER TABLE ONLY FullText ADD CONSTRAINT fulltext_pkey PRIMARY KEY (id);
and I insert values from collection_myblogs like so:
INSERT INTO FullText(content) SELECT title FROM collection_myblogs;
All fine so far...I would now like a trigger on FullText such that FullText updates itself with new entries everytime collection_myblogs has a new entry. So, I attempted creating a trigger as following:
CREATE TRIGGER collection_ft_update BEFORE INSERT OR UPDATE ON collection_myblogs FOR EACH ROW EXECUTE PROCEDURE ft_update();
Now, I am not entirely sure what should go on ft_update() function, and at the moment, I have:
CREATE FUNCTION ft_update() RETURNS trigger AS '
BEGIN
INSERT INTO FullText(content) SELECT new.title;
return new;
END
' LANGUAGE plpgsql;
..which works fine for INSERTS but not UPDATES. i.e if I update the title of the orginal column collections_myblog(title) it appears as a new entry on FullText I am unsure how to deal with ids here.
I would like the ids i.e primary keys to be the same on each table. So, the idea for me is to have FullText(id, content) == collection_myblogs(id, title) - if this makes sense. So, the id and the content should be replicated from collection_myblogs table. How would one go about achieving this?
My understanding is that I can use a trigger before any insert or an update on my collection_myblogs and somehow maintain FullText(id, content) == collection_myblogs(id, title)
I would appreciate any guidance on this.
There are actually a large number of ways to handle this problem. Some examples:
Use table inheritance to create an "interface" to your data (no trigger needed, the abstract table ends up functioning like a view). This is complicated territory though.
Use the trigger approach like you do and then handle UPDATE and DELETE separately. The big issue here is that if you have two areas of text that are identical, your update trigger needs to be able to separate them.
There are many others but those should get you started.
Actually, it turned out to be quite simple. Just had to follow this

What is the difference between these two T-SQL statements?

In a SSIS package at work there are some SQL tasks that create staging tables for holding import data. All the statements take this form:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.tbNewTable') AND type in (N'U'))
BEGIN
TRUNCATE TABLE dbo.tbNewTable
END
ELSE
BEGIN
CREATE TABLE dbo.tbNewTable (
ColumnA VARCHAR(10) NULL,
ColumnB VARCHAR(10) NULL,
ColumnC INT NULL
) ON PRIMARY
END
In Itzik Ben-Gan's T-SQL Fundamentals I see a different form of statement for creating a table:
IF OBJECT_ID('dbo.tbNewTable', 'U') IS NOT NULL
BEGIN
DROP TABLE dbo.tbNewTable
END
CREATE TABLE dbo.tbNewTable (
ColumnA VARCHAR(10) NULL,
ColumnB VARCHAR(10) NULL,
ColumnC INT NULL
) ON PRIMARY
Each of these appears to do the same thing. After execution, there will be a empty table called tbNewTable in the dbo schema.
Are there any practical or theoretical differences between the two? What implications might they have?
The first one assumes that if the table exists, it has the same columns as those it would create. The second one does not make that assumption. So if a table with that name happened to exist and had a different set of columns, the two would have very different results.
The first will not actually DROP the table -- it merely TRUNCATES all the data in said table. Hence why the CREATE is guarded.
Thus the form with the DROP will allow the subsequent CREATE to change the schema (when the new table is created) even if tbNewTable previously existed.
Because the DROP/CREATE alters the database schema it may not also be allowed in all cases. For instance, a view created with a SCHEMABINDING will prevent the table from being dropped. (This also hold true for more general FK relationships, should any exist.)
...when SCHEMABINDING is specified, the base table or tables cannot be modified in a way that would affect the view definition.
The TRUNCATE should be marginally faster in one of those constant "don't care" ways: there should be no performance consideration given to one over the other.
There are also permission differences. TRUNCATE only requires the ALTER permission.
The minimum permission required is ALTER on table_name. TRUNCATE TABLE permissions default to the table owner...
Happy coding.
These are very different..
The first does an equality check on the sys.objects system table and looks to see if there is a matching table name. If so, it truncates the table. Basically removing all rows but maintaining the table structure itself - i.e. the actual table is never dropped.
In the second, the check to make sure that the table exists is implicitly done using the OBJECT_ID() method. If so, the table is dropped completely - rows and structure.
If you have a primary and foreign key constraint on the table, you'll certainly have issues dropping it completely... and if you have other tables that are linked to the table you are trying to 'truncate' you'll have issues there too, unless you have cascade deletion turned on.
I tend to dislike either construction in an SSIS package. I create the tables in a deployment script and I want the package to fail if one of the tables I use is missing later on because then something drastically wrong has happened and I want to investigate what before I try putting data anywhere.

How do I INSERT and SELECT data with partitioned tables?

I set up a set of partitioned tables per the docs at http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html
CREATE TABLE t (year, a);
CREATE TABLE t_1980 ( CHECK (year = 1980) ) INHERITS (t);
CREATE TABLE t_1981 ( CHECK (year = 1981) ) INHERITS (t);
CREATE RULE t_ins_1980 AS ON INSERT TO t WHERE (year = 1980)
DO INSTEAD INSERT INTO t_1980 VALUES (NEW.year, NEW.a);
CREATE RULE t_ins_1981 AS ON INSERT TO t WHERE (year = 1981)
DO INSTEAD INSERT INTO t_1981 VALUES (NEW.year, NEW.a);
From my understanding, if I INSERT INTO t (year, a) VALUES (1980, 5), it will go to t_1980, and if I INSERT INTO t (year, a) VALUES (1981, 3), it will go to t_1981. But, my understanding seems to be incorrect. First, I can't understand the following from the docs
"There is currently no simple way to specify that rows must not be inserted into the master table. A CHECK (false) constraint on the master table would be inherited by all child tables, so that cannot be used for this purpose. One possibility is to set up an ON INSERT trigger on the master table that always raises an error. (Alternatively, such a trigger could be used to redirect the data into the proper child table, instead of using a set of rules as suggested above.)"
Does the above mean that in spite of setting up the CHECK constraints and the RULEs, I also have to create TRIGGERs on the master table so that the INSERTs go to the correct tables? If that were the case, what would be the point of the db supporting partitioning? I could just set up the separate tables myself? I inserted a bunch of values into the master table, and those rows are still in the master table, not in the inherited tables.
Second question. When retrieving the rows, do I select from the master table, or do I have to select from the individual tables as needed? How would the following work?
SELECT year, a FROM t WHERE year IN (1980, 1981);
Update: Seems like I have found the answer to my own question
"Be aware that the COPY command ignores rules. If you are using COPY to insert data, you must copy the data into the correct child table rather than into the parent. COPY does fire triggers, so you can use it normally if you create partitioned tables using the trigger approach."
I was indeed using COPY FROM to load data, so RULEs were being ignored. Will try with TRIGGERs.
Definitely try triggers.
If you think you want to implement a rule, don't (the only exception that comes to mind is updatable views). See this great article by depesz for more explanation there.
In reality, Postgres only supports partitioning on the reading side of things. You're going to have setup the method of insertition into partitions yourself - in most cases TRIGGERing. Depending on the needs and applicaitons, it can sometimes be faster to teach your application to insert directly into the partitions.
When selecting from partioned tables, you can indeed just SELECT ... WHERE... on the master table so long as your CHECK constraints are properly setup (they are in your example) and the constraint_exclusion parameter is set corectly.
For 8.4:
SET constraint_exclusion = partition;
For < 8.4:
SET constraint_exclusion = on;
All this being said, I actually really like the way Postgres does it and use it myself often.
Does the above mean that in spite of
setting up the CHECK constraints and
the RULEs, I also have to create
TRIGGERs on the master table so that
the INSERTs go to the correct tables?
Yes. Read point 5 (section 5.9.2)
If that were the case, what would be
the point of the db supporting
partitioning? I could just set up the
separate tables myself?
Basically: the INSERTS in the child tables must be done explicitly (either creating TRIGGERS, or by specifying the correct child table in the query). But the partitioning
is transparent for SELECTS, and (given the storage and indexing advantages of this schema) that's the point.
(Besides, because the partitioned tables are inherited,
the schema is inherited from the parent, hence consistency
is enforced).
Triggers are definitelly better than rules.
Today I've played with partitioning of materialized view table and run into problem with triggers solution.
Why ?
I'm using RETURNING and current solution returns NULL :)
But here's solution which works for me - correct me if I'm wrong.
1. I have 3 tables which are inserted with some data, there's an view (let we call it viewfoo) which contains
data which need to be materialized.
2. Insert into last table have trigger which inserts into materialized view table
via INSERT INTO matviewtable SELECT * FROM viewfoo WHERE recno=NEW.recno;
That works fine and I'm using RETURNING recno; (recno is SERIAL type - sequence).
Materialized view (table) need to be partitioned because it's huge, and
according to my tests it's at least x10 faster for SELECT in this case.
Problems with partitioning:
* Current trigger solution RETURN NULL - so I cannot use RETURNING recno.
(Current trigger solution = trigger explained at depesz page).
Solution:
I've changed trigger of my 3rd table TO NOT insert into materialized view table (that table is parent of partitioned tables), but created new trigger which inserts
partitioned table directly FROM 3rd table and that trigger RETURN NEW.
Materialized view table is automagically updated and RETURNING recno works fine.
I'll be glad if this helped to anybody.