How to update an aggregate table in a trigger procedure while taking care of proper concurrency? - postgresql

For illustration, say I'm updating a table ProductOffers and their prices. Mutations to this table are of the form: add new ProductOffer, change price of existing ProductOffer.
Based on the above changes, I'd like to update a Product-table which holds pricing info per product aggregated over all offers.
It seems logical to implement this using a row-based update/insert trigger, where the trigger runs a procedure creating/updating a Product row.
I'd like to properly implement concurrent updates (and thus triggers). I.e.: updating productOffers of the same Product concurrently, would potentially lead to wrong aggregate values (because multiple triggered procedures would concurrently attempt to insert/update the same Product-row)
It seems I cannot use row-based locking on the product-table (i.e.: select .. for update) because it's not guaranteed that a particular product-row already exists. Instead the first time around a Product row must be created (instead of updated) once a ProductOffer triggers the procedure. Afaik, row-locking can't work with new rows to be inserted, which totally makes sense.
So where does that leave me? Would I need to roll my own optimistic locking scheme? This would need to include:
Check that the row does not exist => create a new row, failing if it already exists (which is possible if 2 triggers concurrently try to create the row). Try again afterwards, with an update.
Check that the row exists and has version=x => update the row, but fail if row.version != x. Try again afterwards.
Would the above work, or any better / more out-of-the-box solutions?
EDIT:
For future ref: found official example which exactly illustrates what I want to accomplish: Example 39-6. A PL/pgSQL Trigger Procedure For Maintaining A Summary Table

Things are much simpler than you think they are, thanks to the I in ACID.
The trigger you envision will run in the same transaction as the data modification that triggered it, and each modification to the aggregate table will first take an exclusive row lock on the row that it wants to update.
So if two concurrent transactions cause an UPDATE on the same row in the aggregate table, the first transaction will get the lock and proceed, while the second transaction will have to wait until the first transaction commits (or rolls back) before it can get the lock on the row and modify it.
So data modifications that update the same row in the aggregate table will effectively be serialized, which may hurt performance, but guarantees exact results.
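A minimal sketch of such a trigger, assuming PostgreSQL 9.5 or later and the default READ COMMITTED isolation level. The table and column names (product_offers, products, offer_count, price_sum) and the function name are hypothetical and not taken from the question. The row that may not exist yet is handled with INSERT ... ON CONFLICT DO UPDATE, so the first offer creates the Product row and later offers update it; concurrent transactions on the same product wait for each other on that row:

```sql
-- Hypothetical schema: one row per offer, one aggregate row per product.
CREATE TABLE product_offers (
    offer_id   bigint PRIMARY KEY,
    product_id bigint NOT NULL,
    price      numeric NOT NULL
);

CREATE TABLE products (
    product_id  bigint PRIMARY KEY,
    offer_count bigint NOT NULL,
    price_sum   numeric NOT NULL
);

CREATE FUNCTION maintain_product_aggregate() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        -- Creates the aggregate row, or updates it if it already exists.
        INSERT INTO products AS p (product_id, offer_count, price_sum)
        VALUES (NEW.product_id, 1, NEW.price)
        ON CONFLICT (product_id) DO UPDATE
            SET offer_count = p.offer_count + 1,
                price_sum   = p.price_sum + EXCLUDED.price_sum;
    ELSIF TG_OP = 'UPDATE' THEN
        -- Assumes product_id itself is never changed on an offer.
        UPDATE products
           SET price_sum = price_sum + NEW.price - OLD.price
         WHERE product_id = NEW.product_id;
    END IF;
    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$;

CREATE TRIGGER product_offers_aggregate
AFTER INSERT OR UPDATE OF price ON product_offers
FOR EACH ROW EXECUTE PROCEDURE maintain_product_aggregate();
```

The average price per product can then be derived as price_sum / offer_count. On versions before 9.5, the usual alternative is an UPDATE followed by an INSERT that traps unique_violation and retries.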

Related

Does Postgres guarantee to lock rows in the order of supplied update-statements?

I'd like to do batch updates to Postgres. Sometimes, the batch may contain update-statements to the same record. (*)
To this end I need to be sure that Postgres locks rows based on the order in which the update-statements are supplied.
Is this guaranteed?
To be clear, I'm sending a sequence of single row update-statements, so not a single multi-row update-statement. E.g.:
update A set x='abc', dt='<timeN>' where id='123';
update A set x='def', dt='<timeN+1>' where id='123';
update A set x='ghi', dt='<timeN+2>' where id='123';
*) This might seem redundant: just only save the last one. However, I have defined an after-trigger on the table so history is created in a different table. Therefore I need the multiple updates.
The rows will definitely be locked in the order of the UPDATE statements.
Moreover, locks only affect concurrent transactions, so if all the UPDATEs take place in one database session, you don't have to be afraid to get blocked by a lock.
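As a rough sketch of the scenario from the question: the column types, the history table a_history, the trigger function, and the literal timestamps below are assumptions made for illustration. The three UPDATEs run one after another in a single session, each one fires the after-trigger, and none of them blocks on a lock held by an earlier one:

```sql
-- Hypothetical schema; the question only shows table A with columns id, x, dt.
CREATE TABLE a (
    id text PRIMARY KEY,
    x  text,
    dt timestamptz
);

CREATE TABLE a_history (
    id text,
    x  text,
    dt timestamptz
);

CREATE FUNCTION a_log_update() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Record the new state of the row after every update.
    INSERT INTO a_history (id, x, dt) VALUES (NEW.id, NEW.x, NEW.dt);
    RETURN NULL;
END;
$$;

CREATE TRIGGER a_log_update_trg
AFTER UPDATE ON a
FOR EACH ROW EXECUTE PROCEDURE a_log_update();

INSERT INTO a (id) VALUES ('123');

BEGIN;
UPDATE a SET x = 'abc', dt = '2024-01-01 00:00:00+00' WHERE id = '123';
UPDATE a SET x = 'def', dt = '2024-01-01 00:00:01+00' WHERE id = '123';
UPDATE a SET x = 'ghi', dt = '2024-01-01 00:00:02+00' WHERE id = '123';
COMMIT;

-- a_history now holds three rows for id '123': abc, def, ghi (in that order).
SELECT x FROM a_history WHERE id = '123' ORDER BY dt;
```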

How to trigger creation/update of another row of record if one row is created/updated in postgresql

I am receiving a record CSV from outside. When I create or update the entry in PostgreSQL, I need to create a mirror entry that differs only in sign. This could be done at the program level, but I am curious to know whether it would be possible using triggers.
For the examples I can find, they all end with code,
FOR EACH ROW EXECUTE PROCEDURE foo()
And they usually deal with checks, add additional info using NEW.additionalfield, or insert into another table. If I use a trigger this way to insert another row in the same table, it seems the trigger will be triggered again and the creation becomes recursive.
Any way to work this out?
When dealing with triggers, the rules of thumb are:
If it changes the current row, based on some business rules or other (e.g. adding extra info or processing calculated fields), it belongs in a BEFORE trigger.
If it has side effects on one or more rows in separate tables, it belongs in an AFTER trigger.
If it runs integrity checks on any table that no other built-in constraints (checks, unique keys, foreign keys, exclude, etc.) can take care of, it belongs in a CONSTRAINT [after] trigger.
If it has side effects on one or more other rows within the same table, you should probably revisit your schema, your code flow, or both.
Regarding that last point, there actually are workarounds in Postgres, such as trying to get a lock or checking xmin vs the transaction's xid, to avoid getting bogged down in recursive scenarios. PostgreSQL 9.2 additionally introduced pg_trigger_depth(). But I'd still advise against it.
Note that a constraint trigger can be created as deferrable initially deferred. This will delay the constraint trigger until the very end of the transaction, rather than immediately after the statement.
Your question and nickname hint that you're wondering how to automatically balance a set of lines in a double-entry book-keeping application. Assuming so, do NOT create the balancing entry automatically. Instead, begin a transaction, enter each line separately, and have a (for each row, deferrable initially deferred) constraint trigger pick things up from there and reject the entire batch if anything is unbalanced. Proceeding that way will spare you a mountain of headaches when you want to balance more than two or three lines with each other.
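A minimal sketch of such a deferred constraint trigger. The table ledger_lines, its columns, and the function name are hypothetical, and the rule that the amounts of a batch must sum to zero is an assumption about what "balanced" means here:

```sql
-- Hypothetical table: each line belongs to a batch that must net to zero.
CREATE TABLE ledger_lines (
    line_id  bigserial PRIMARY KEY,
    batch_id bigint NOT NULL,
    amount   numeric NOT NULL
);

CREATE FUNCTION check_batch_balanced() RETURNS trigger
LANGUAGE plpgsql AS $$
DECLARE
    total numeric;
BEGIN
    SELECT sum(amount) INTO total
      FROM ledger_lines
     WHERE batch_id = NEW.batch_id;

    IF total <> 0 THEN
        RAISE EXCEPTION 'batch % is unbalanced by %', NEW.batch_id, total;
    END IF;
    RETURN NULL;
END;
$$;

-- Deferred: the check runs at COMMIT, after all lines have been entered.
CREATE CONSTRAINT TRIGGER ledger_lines_balanced
AFTER INSERT OR UPDATE ON ledger_lines
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE PROCEDURE check_batch_balanced();

BEGIN;
INSERT INTO ledger_lines (batch_id, amount) VALUES (1, 100.00);
INSERT INTO ledger_lines (batch_id, amount) VALUES (1, -60.00);
INSERT INTO ledger_lines (batch_id, amount) VALUES (1, -40.00);
COMMIT;  -- succeeds; an unbalanced batch would make the COMMIT fail instead
```

This sketch does not cover deletes, or updates that move a line to a different batch; those would need the same check against OLD.batch_id.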
Another reading might be that you want to create an audit trail. If so, create additional audit tables and use after triggers to populate them. There are multiple ways to create and manage these audit tables. Look into slowly changing dimensions. (Fwiw, type 6 with a start_end column of type tsrange or tstzrange works well for the audit tables if you're interested in a table's full history including its history of relationships with other audit tables.) Use the "live" tables for your application to keep things fast, and use the audit-tables when you need historical reporting.

How to wait during SELECT that pending INSERT commit?

I'm using PostgreSQL 9.2 in a Windows environment.
I'm in a 2PC (2 phase commit) environment using MSDTC.
I have a client application that starts a transaction at the SERIALIZABLE isolation level, inserts a new row of data in a table for a specific foreign key value (there is an index on the column), and votes for completion of the transaction (the transaction is PREPARED). The transaction will be COMMITTED by the Transaction Coordinator.
Immediately after that, outside of a transaction, the same client requests all the rows for this same specific foreign key value.
Because there may be a delay before the previous transaction is really committed, the SELECT may return a previous snapshot of the data. In fact, it does happen sometimes, and this is problematic. Of course the application may be redesigned, but until then I'm looking for a locking solution. An advisory lock?
I already solved the problem for UPDATEs on specific rows by using SELECT...FOR SHARE, and it works well. The SELECT waits until the transaction commits and returns old and new rows.
Now I'm trying to solve it for INSERT.
SELECT...FOR SHARE does not block and returns immediately.
There is no concurrency issue here as only one client deals with a specific set of rows. I already know about MVCC.
Any help appreciated.
To wait for a not-yet-committed INSERT you'd need to take a predicate lock. There's limited predicate locking in PostgreSQL for the serializable support, but it's not exposed directly to the user.
Simple SERIALIZABLE isolation won't help you here, because SERIALIZABLE only requires that there be an order in which the transactions could've occurred to produce a consistent result. In your case this ordering is SELECT followed by INSERT.
The only option I can think of is to take an ACCESS EXCLUSIVE lock on the table before INSERTing. This will only get released at COMMIT PREPARED or ROLLBACK PREPARED time, and in the meantime any other queries on the table will wait for the lock. You can enforce this via a BEFORE trigger to avoid the need to change the app. You'll probably get the odd deadlock and rollback if you do it that way, though, because the INSERT will take a weaker lock first and the trigger will then attempt to upgrade it. If possible, it's better to run the LOCK TABLE ... IN ACCESS EXCLUSIVE MODE command before the INSERT.
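A sketch of that sequence, assuming a hypothetical table child_rows with columns parent_id and payload, and a made-up transaction identifier 'tx-42' (with MSDTC the identifier is chosen by the driver). It also assumes max_prepared_transactions is greater than zero:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

-- Take the strongest table lock first, so there is no later lock upgrade.
LOCK TABLE child_rows IN ACCESS EXCLUSIVE MODE;

INSERT INTO child_rows (parent_id, payload) VALUES (42, 'new row');

-- Phase one: the lock stays held by the prepared transaction.
PREPARE TRANSACTION 'tx-42';

-- Phase two, issued later by the transaction coordinator; only now is the
-- lock released and any waiting SELECT on child_rows allowed to proceed.
COMMIT PREPARED 'tx-42';
```

The cost of this design is that every reader and writer of the table is blocked between PREPARE TRANSACTION and COMMIT PREPARED, not only queries for the affected foreign key value.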
As you've alluded to, this is mostly an application mis-design problem. Expecting to see not-yet-committed rows doesn't really make any sense.

How can I optimize this: Many Postgres triggers calling the same function within a transaction

I have a cached field in one of my database tables that gets updated based on many different tables and the state of fields in those tables.
Each of those tables calls the same function via a trigger: updateCachedField(basetable_id INTEGER). The updateCachedField function queries all these other tables and calculates the new cached value for the base table. The updateCachedField function is complicated and very cost heavy.
During a single transaction it's possible for many of the tables that affect the cached field to be changed. So during a single transaction, the updateCachedField function may get called 50 times... but with only 5 different basetable_id's.
Is there a way to optimize this so that the updateCachedField function only gets called 5 times instead of the 50+?
I think your best option is to use a deferrable constraint trigger and query based on the xmin value or other information. You could in fact also copy "log" data for snapshotting somewhere else.
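One possible way to use a deferrable constraint trigger for this, sketched under several assumptions: PostgreSQL 9.5 or later, a hypothetical queue table pending_cache_refresh, and hypothetical helper functions queue_cache_refresh and run_cache_refresh. Only updateCachedField comes from the question. The existing row triggers would call queue_cache_refresh instead of updateCachedField, so each basetable_id is queued at most once per transaction and the expensive function runs once per id at commit time:

```sql
-- Hypothetical queue of base-table ids whose cached field must be recomputed.
CREATE TABLE pending_cache_refresh (
    basetable_id integer PRIMARY KEY
);

-- Called from the existing row triggers in place of updateCachedField().
CREATE FUNCTION queue_cache_refresh(p_basetable_id integer) RETURNS void
LANGUAGE sql AS $$
    INSERT INTO pending_cache_refresh (basetable_id)
    VALUES (p_basetable_id)
    ON CONFLICT (basetable_id) DO NOTHING;  -- already queued: no new row, no new trigger event
$$;

CREATE FUNCTION run_cache_refresh() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- The expensive recalculation, once per queued id.
    PERFORM updateCachedField(NEW.basetable_id);
    -- Remove the queue entry so later transactions can queue the id again.
    DELETE FROM pending_cache_refresh WHERE basetable_id = NEW.basetable_id;
    RETURN NULL;
END;
$$;

-- Deferred: fires at the end of the transaction, once per inserted queue row.
CREATE CONSTRAINT TRIGGER pending_cache_refresh_run
AFTER INSERT ON pending_cache_refresh
DEFERRABLE INITIALLY DEFERRED
FOR EACH ROW EXECUTE PROCEDURE run_cache_refresh();
```

A side effect of this design is that concurrent transactions touching the same basetable_id wait for each other on the queue table's primary key.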
Without more information it's not really clear what you are looking at, but one option is to essentially have three sets of tables:
Live tables. Updates (deltas or other aggregate-useful info) get written to log tables on update or insert.
Log tables. These are append only.
Aggregate snapshot tables which periodically aggregate new sets of log tables.
You could then have a live view which would show current aggregate data, aggregating only from the last snapshot.

basic doubts in T-SQL triggers

What is the difference between FOR and AFTER in a trigger definition? Are there any benefits of using one vs the other?
If I issue an update statement which updates 5 rows, does the trigger (with FOR UPDATE) fire 5 times? If so, is there any way to make the trigger fire only once for the entire UPDATE statement (even though it updates multiple rows)?
Is there any chance/situation of having more than one row in the "inserted" or "deleted" table at any time in a trigger's life cycle? If so, can I have a very quick sample of that?
thanks
Triggers fire once per statement, not once per row, and should always be designed with that in mind. Yes, if you do a multi-row update, insert, or delete, all the rows will be in the inserted or deleted tables. For instance, the command
Delete table1 where state = 'CA'
would put all the rows that have a state of CA into the deleted table, even if there were 10,000,000 of them. That is why trigger testing is critical and why the trigger must be designed to handle multi-row actions. A trigger that works well for one row may bring the database to a screeching halt for hours if it is poorly designed to handle multiple rows, or it could cause data integrity issues if it is not designed correctly to handle multiple rows. For the most part, triggers should not rely on cursors or loops but on set-based operations. If you are assigning the contents of inserted or deleted to a variable, you are almost certainly expecting one row, and your trigger will not work properly when someone does a set-based operation on the table.
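A short T-SQL sketch of a set-based trigger for the delete above. The audit table table1_audit and the id column are hypothetical; only table1 and state appear in the example command. The trigger assumes table1_audit already exists with columns id, state, and deleted_at:

```sql
CREATE TRIGGER trg_table1_delete_audit
ON table1
AFTER DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- One INSERT ... SELECT handles every deleted row at once,
    -- whether the statement removed 1 row or 10,000,000.
    INSERT INTO table1_audit (id, state, deleted_at)
    SELECT d.id, d.state, SYSUTCDATETIME()
    FROM deleted AS d;
END;
```

Because the trigger reads from deleted as a table instead of copying a single value into a variable, it behaves the same for single-row and multi-row statements.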
SQL Server has two basic kinds of DML triggers. AFTER triggers (FOR is a synonym for AFTER) run after the rows have been changed in the table; they are typically used to update some other table as well. INSTEAD OF triggers take the place of the insert/update/delete; they are usually used for special processing on the table being modified. It is important to know that an INSTEAD OF trigger will not perform the action that was sent to the table, so if you still want to delete, update, or insert, you must write that into the trigger.