Execute Function Once per Unique Value - postgresql

I have two tables that contain data related to everyday business:
CREATE TABLE main_table (
main_id serial,
cola text,
colb text,
colc text,
CONSTRAINT main_table_pkey PRIMARY KEY (main_id)
);
CREATE TABLE second_table (
second_id serial,
main_id integer,
cold text,
CONSTRAINT second_table_pkey PRIMARY KEY (second_id),
CONSTRAINT second_table_fkey FOREIGN KEY (main_id)
REFERENCES main_table (main_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
We have a need to know when some data was updated in these tables so that exports can be generated and pushed to third parties. I've created a third table to hold the update information:
CREATE TYPE field AS ENUM ('cola', 'colb', 'colc', 'cold');
CREATE TABLE table_updates (
main_id int,
field field
updated_on date NOT NULL DEFAULT NOW(),
CONSTRAINT table_updates_fkey FOREIGN KEY (main_id)
REFERENCES main_table (main_id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
);
main_table has a trigger to update table_updates before UPDATE queries, which satisfies the need to track three of the four column updates.
I can easily add the same type of trigger to second_table, however because main_id is not unique the function can be executed several times for a single main_id value, which is not desirable.
How can I create a function that, when updating several rows in second_table, executes only once per main_id?

How can I create a function that, when updating several rows in second_table, executes only once per main_id?
If your inserts are batched insert by main_id ie, INSERT INTO tbl (main_id...) VALUES (main_id ...),(main_id ...),(main_id ...) you can use the rule system to trigger once for the INSERT or UPDATE
For the things that can be implemented by both, which is best depends on the usage of the database. A trigger is fired once for each affected row. A rule modifies the query or generates an additional query. So if many rows are affected in one statement, a rule issuing one extra command is likely to be faster than a trigger that is called for every single row and must re-determine what to do many times. However, the trigger approach is conceptually far simpler than the rule approach, and is easier for novices to get right.
Shy of that, you may also want to look into the normal LISTEN, and NOTIFY. Which give you the ability to use Async actions. If that's your thing and you decide to keep the trigger method consider Trigger Change Notification module, via tcn.
My suggestion is to do this in the app (outside of the DB) if at all possible. Remember in PostgreSQL temp tables are local to the session. So you can have each loader-session do something like this,
BEGIN
CREATE TEMP TABLE UNLOGGED etl_inventory;
COPY foo FROM stdin;
-- Are they different, if so `NOTIFY`
-- UPSERT
COMMIT;
And then one have one daemon that does exportation add to exportation queue when it receives the NOTIFY event.

While Evan's answer is correct, I think this question could benefit from an example.
This is the rule definition I used with the example tables in the question:
CREATE OR REPLACE RULE update_update_table
AS ON UPDATE TO second_table
DO ALSO (
INSERT INTO table_updates (
main_id, field
)
SELECT DISTINCT OLD.main_id, 'cold'::field
WHERE NOT EXISTS (
SELECT TRUE
FROM table_updates
WHERE main_id = OLD.main_id
AND field = 'cold'
);
UPDATE table_updates
SET updated_on = NOW()
WHERE main_id = OLD.main_id
AND field = 'cold'
)

Related

How can a relational database with foreign key constraints ingest data that may be in the wrong order?

The database is ingesting data from a stream, and all the rows needed to satisfy a foreign key constraint may be late or never arrive.
This can likely be accomplished by using another datastore, one without foreign key constraints, and then when all the needed data is available, read into the database which has fk constraints. However, this adds complexity and I'd like to avoid it.
We're working on a solution that creates "placeholder" rows to point the foreign key to. When the real data comes in, the placeholder is replaced with real values. Again, this adds complexity, but it's the best solution we've found so far.
How do people typically solve this problem?
Edit: Some sample data which might help explain the problem:
Let's say we have these tables:
CREATE TABLE order (
id INTEGER NOT NULL,
order_number,
PRIMARY KEY (id),
UNIQUE (order_number)
);
CREATE TABLE line_item (
id INTEGER NOT NULL,
order_number INTEGER REFERENCES order(order_number),
PRIMARY KEY (id)
);
If I insert an order first, not a problem! But let's say I try:
INSERT INTO line_item (order_number) values (123) before order 123 was inserted. This will fail the fk constraint of course. But this might be the order I get the data, since it's reading from a stream that is collecting this data from multiple sources.
Also, to address #philpxy's question, I didn't really find much on this. One thing that was mentioned was deferred constraints. This is a mechanism that waits to do the fk constraints at the end of a transaction. I don't think it's possible to do that in my case however, since these insert statements will be run at random times whenever the data is received.
You have a business workflow problem, because line items of individual orders are coming in before the orders themselves have come in. One workaround, perhaps not ideal, would be to create a before insert trigger which checks, for every incoming insert to the line_item table, whether that order already exists in the order table. If not, then it will first insert the order record before trying the insert on line_item.
CREATE OR REPLACE FUNCTION "public"."fn_insert_order" () RETURNS trigger AS $$
BEGIN
INSERT INTO "order" (order_number)
SELECT NEW.order_number
WHERE NOT EXISTS (SELECT 1 FROM "order" WHERE order_number = NEW.order_number);
RETURN NEW;
END
$$
LANGUAGE 'plpgsql'
# trigger
CREATE TRIGGER "trigger_insert_order"
BEFORE INSERT ON line_item FOR EACH ROW
EXECUTE PROCEDURE fn_insert_order()
Note: I am assuming that the id column of the order table in fact is auto increment, in which case Postgres would automatically assign a value to it when inserting as above. Most likely, this is what you want, as having two id columns which both need to be manually assigned does not make much sense.
You could accomplish that with a BEFORE INSERT trigger on line_item.
In that trigger you query order if a matching item exists, and if not, you insert a dummy row.
That will allow the INSERT to succeed, at the cost of some performance.
To insert rows into order, use
INSERT INTO order ...
ON CONFLICT ON (order_number) DO UPDATE SET
id = EXCLUDED.id;
Updating a primary key is problematic and may lead to conflicts. One way you could get around that is if you use negative ids for artificially generated orders (assuming that the real ids are positive). If you have any references to that primary key, you'd have to define the constraint with ON UPDATE CASCADE.

PostgreSQL self referential table - how to store parent ID in script?

I've the following table:
DROP SEQUENCE IF EXISTS CATEGORY_SEQ CASCADE;
CREATE SEQUENCE CATEGORY_SEQ START 1;
DROP TABLE IF EXISTS CATEGORY CASCADE;
CREATE TABLE CATEGORY (
ID BIGINT NOT NULL DEFAULT nextval('CATEGORY_SEQ'),
NAME CHARACTER VARYING(255) NOT NULL,
PARENT_ID BIGINT
);
ALTER TABLE CATEGORY
ADD CONSTRAINT CATEGORY_PK PRIMARY KEY (ID);
ALTER TABLE CATEGORY
ADD CONSTRAINT CATEGORY_SELF_FK FOREIGN KEY (PARENT_ID) REFERENCES CATEGORY (ID);
Now I need to insert the data. So I start with parent:
INSERT INTO CATEGORY (NAME) VALUES ('PARENT_1');
And now I need the ID of the just inserted parent to add children to it:
INSERT INTO CATEGORY (NAME, PARENT_ID) VALUES ('CHILDREN_1_1', <what_goes_here>);
INSERT INTO CATEGORY (NAME, PARENT_ID) VALUES ('CHILDREN_1_2', <what_goes_here>);
How can I get and store the ID of the parent to later use it in the subsequent inserts?
You can use a data modifying CTE with the returning clause:
with parent_cat (parent_id) as (
INSERT INTO CATEGORY (NAME) VALUES ('PARENT_1')
returning id
)
INSERT INTO CATEGORY (NAME, PARENT_ID)
VALUES
('CHILDREN_1_1', (select parent_id from parent_cat) ),
('CHILDREN_1_2', (select parent_id from parent_cat) );
The answer is to use RETURNING along with WITH
WITH inserted AS (
INSERT INTO CATEGORY (NAME) VALUES ('PARENT_1')
RETURNING id
) INSERT INTO CATEGORY (NAME, PARENT_ID) VALUES
('CHILD_1_1', (SELECT inserted.id FROM inserted)),
('CHILD_2_1', (SELECT inserted.id FROM inserted));
( tl;dr : goto option 3: INSERT with RETURNING )
Recall that in postgresql there is no "id" concept for tables, just sequences (which are typically but not necessarily used as default values for surrogate primary keys, with the SERIAL pseudo-type).
If you are interested in getting the id of a newly inserted row, there are several ways:
Option 1: CURRVAL(<sequence name>);.
For example:
INSERT INTO persons (lastname,firstname) VALUES ('Smith', 'John');
SELECT currval('persons_id_seq');
The name of the sequence must be known, it's really arbitrary; in this example we assume that the table persons has an id column created with the SERIAL pseudo-type. To avoid relying on this and to feel more clean, you can use instead pg_get_serial_sequence:
INSERT INTO persons (lastname,firstname) VALUES ('Smith', 'John');
SELECT currval(pg_get_serial_sequence('persons','id'));
Caveat: currval() only works after an INSERT (which has executed nextval() ), in the same session.
Option 2: LASTVAL();
This is similar to the previous, only that you don't need to specify the sequence number: it looks for the most recent modified sequence (always inside your session, same caveat as above).
Both CURRVAL and LASTVAL are totally concurrent safe. The behaviour of sequence in PG is designed so that different session will not interfere, so there is no risk of race conditions (if another session inserts another row between my INSERT and my SELECT, I still get my correct value).
However they do have a subtle potential problem. If the database has some TRIGGER (or RULE) that, on insertion into persons table, makes some extra insertions in other tables... then LASTVAL will probably give us the wrong value. The problem can even happen with CURRVAL, if the extra insertions are done intto the same persons table (this is much less usual, but the risk still exists).
Option 3: INSERT with RETURNING
INSERT INTO persons (lastname,firstname) VALUES ('Smith', 'John') RETURNING id;
This is the most clean, efficient and safe way to get the id. It doesn't have any of the risks of the previous.
Drawbacks? Almost none: you might need to modify the way you call your INSERT statement (in the worst case, perhaps your API or DB layer does not expect an INSERT to return a value); it's not standard SQL (who cares); it's available since Postgresql 8.2 (Dec 2006...)
Conclusion: If you can, go for option 3. Elsewhere, prefer 1.
Note: all these methods are useless if you intend to get the last globally inserted id (not necessarily in your session). For this, you must resort to select max(id) from table (of course, this will not read uncommitted inserts from other transactions).

Sync Primary Key in Parent and Child Table

I need to keep 3 columns in 2 different tables synchronized.
MS SQL Server 2008
create table a(
id int IDENTITY(1,1),
name1 nvarchar(255),
name2 nvarchar(255),
date1 date default null,
date2 date default null,
...
CONSTRAINT pk primary key (id),
CONSTRAINT uniqueNames UNIQUE (name1,name2))
I'm looking to create another table that will duplicate the first 3 columns in my second table, and keep them up to date (insert, delete, update). I was looking at triggers, but the consensus seems to be that if I insert or update multiple rows at once, this will not execute correctly. Any idea's on how to accomplish this?
Ideally you would normalise the data by moving the duplicated fields in to another table and referencing them via a foreign key.
However triggers could also be used to solve this problem. take a look here : How to write trigger for multiple row update?

How to apply sequence order in trigger?

In Informix, I need to update 2 (two) tables when the trigger is executed. Let's say Table_A and Table_B. In Table_A, there is a int8 (long data type) column as primary key. When a new record is inserted, this primary key column will retrieve the value from a sequence. This is the code:
sequence_A.nextVal
In Table_B, there is a foreign key column that references the primary key in Table_A. In order to make the primary key and the foreign key tally, I use sequence_A.currVal to insert the value into this foreign key column.
I did try the code below but Informix give me syntax error.
create trigger The_Trigger
insert on The_Table referencing new as n
for each row
(
insert into Table_A(...) value(sequence_A.nextVal, ...)
insert into Table_B(...) value(sequence_A.currVal, ...)
)
If I separate the insert statement into 2 (two) difference triggers, it works. Thus I was thinking to create 2 (two) triggers on The_Table. Let's say Trigger_A and Trigger_B, may I know how can I ensure that Trigger_A will get execute first then only Thrigger_B. Can I specify the order execution on triggers? Can this be done? And how?
In your first attempt, you omitted the comma between the two INSERT statements, and the keyword is VALUES, not VALUE:
CREATE TRIGGER The_Trigger
INSERT ON The_Table REFERENCING NEW AS n
FOR EACH ROW
(
INSERT INTO Table_A(...) VALUES(sequence_A.nextVal, ...),
INSERT INTO Table_B(...) VALUES(sequence_A.currVal, ...)
)
With those two changes, I believe you would have sequential execution.
Given a sufficiently recent version of Informix, you can have multiple triggers for a single event on a single table. The sequence of execution is the sequence in which the triggers are defined.

PostgreSQL ON INSERT CASCADE

I've got two tables - one is Product and one is ProductSearchResult.
Whenever someone tries to Insert a SearchResult with a product that is not listed in the Product table the foreign key constrain is violattet, hence i get an error.
I would like to know how i could get my database to automatically create that missing Product in the Product Table (Just the ProductID, all other attributes can be left blank)
Is there such thing as CASCADE ON INSERT? If there is, i was not able not get it working.
Rules are getting executed after the Insert, so because we get an Error beforehand there are useless if you USE an "DO ALSO". If you use "DO INSTEAD" and add the INSERT Command at the End you end up with endless recursion.
I reckon a Trigger is the way to go - but all my attempts to write one failed.
Any recommendations?
The Table Structure:
CREATE TABLE Product (
ID char(10) PRIMARY KEY,
Title varchar(150),
Manufacturer varchar(80),
Category smallint,
FOREIGN KEY(Category) REFERENCES Category(ID) ON DELETE CASCADE);
CREATE TABLE ProductSearchResult (
SearchTermID smallint NOT NULL,
ProductID char(10) NOT NULL,
DateFirstListed date NOT NULL DEFAULT current_date,
DateLastListed date NOT NULL DEFAULT current_date,
PRIMARY KEY (SearchTermID,ProductID),
FOREIGN KEY (SearchTermID) REFERENCES SearchTerm(ID) ON DELETE CASCADE,
FOREIGN KEY (ProductID) REFERENCES Product ON DELETE CASCADE);
Yes, triggers are the way to go. But before you can start to use triggers in plpgsql, you
have to enable the language. As user postgres, run the command createlang with the proper parameters.
Once you've done that, you have to
Write function in plpgsql
create a trigger to invoke that function
See example 39-3 for a basic example.
Note that a function body in Postgres is a string, with a special quoting mechanism: 2 dollar signs with an optional word in between them, as the quotes. (The word allows you to quote other similar quotes.)
Also note that you can reuse a trigger procedure for multiple tables, as long as they have the columns your procedure uses.
So the function has to
check if the value of NEW.ProductID exists in the ProductSearchResult table, with a select statement (you ought to be able to use SELECT count(*) ... INTO someint, or SELECT EXISTS(...) INTO somebool)
if not, insert a new row in that table
If you still get stuck, come back here.
In any case (rules OR triggers) the insert needs to create a new key (and new values for the attributes) in the products table. In most cases, this implies that a (serial,sequence) surrogate primary key should be used in the products table, and that the "real world" product_id ("product number") should default to NULL, and be degraded to a candidate key.
BTW: a rule can be used, rules just are tricky to implement correctly for N:1 relations (they need the same kind of EXISTS-logic as in Bart's answer above).
Maybe cascading on INSERT is not such a good idea after all. What do you want to happen if someone inserts a ProductSearchResult record for a not-existing product? [IMO a FK is always a domain; you cannot just extend a domain just by referring to a not-existant value for it; that would make the FK constraint meaningless]