How to trigger creation/update of another row of record if one row is created/updated in postgresql - postgresql

I am receiving a record csv for outside, then when I create or update the entry into the postgresql, I need to create an mirror entry that only have sign differences. This is could be done at program level, I am curious to know would it possible using triggers.
For the examples I can find, they all end with code,
FOR EACH ROW EXECUTE PROCEDURE foo()
And usually deal with checks, add addtional info using NEW.additionalfield, or insert into another table. If I use trigger this way to insert another row in the same table, it seems the trigger will triggered again and the creation become recursive.
Any way to work this out?

When dealing with triggers, the rules of thumb are:
If it changes the current row, based on some business rules or other (e.g. adding extra info or processing calculated fields), it belongs in a BEFORE trigger.
If it has side effects on one or more rows in separate tables, it belongs in an AFTER trigger.
If it runs integrity checks on any table that no other built-in constraints (checks, unique keys, foreign keys, exclude, etc.) can take care of, it belongs in a CONSTRAINT [after] trigger.
If it has side effects on one or more other rows within the same table, you should probably revisit your schema, your code flow, or both.
Regarding that last point, there actually are workarounds in Postgres, such as trying to get a lock or checking xmin vs the transaction's xid, to avoid getting bogged down in recursive scenarios. A recent version additionally introduced pg_trigger_depth(). But I'd still advise against it.
Note that a constraint trigger can be created as deferrable initially deferred. This will delay the constraint trigger until the very end of the transaction, rather than immediately after the statement.
Your question and nickname hint that you're wondering how to automatically balance a set of lines in a double-entry book-keeping application. Assuming so, do NOT create the balancing entry automatically. Instead, begin a transaction, enter each line separately, and have a (for each row, deferrable initially deferred) constraint trigger pick things up from there and reject the entire batch if anything is unbalanced. Proceeding that way will spare you a mountain of headaches when you want to balance more than two or three lines with each other.
Another reading might be that you want to create an audit trail. If so, create additional audit tables and use after triggers to populate them. There are multiple ways to create and manage these audit tables. Look into slowly changing dimensions. (Fwiw, type 6 with a start_end column of type tsrange or tstzrange works well for the audit tables if you're interested in a table's full history including its history of relationships with other audit tables.) Use the "live" tables for your application to keep things fast, and use the audit-tables when you need historical reporting.

Related

Can I configure a table such that inserted rows always have a greater primary key

I would like to configure a table in Postgres to behave like an append only log. This table will have an automatically generated primary ID.
Workers will work on the items in the table in order and should only need to store the last row ID that they have completed.
How can i prevent rows being written to the table (perhaps by some transactions taking longer than others) where the row ID is less than the greatest value in the table?
There is no way to prevent concurrent inserts in the table (short of locking the table, which is a bad idea, because it breaks autovacuum).
So there is no way to to guarantee that rows are inserted in a certain order. The order in which rows are inserted isn't really a meaningful concept in PostgreSQL.
If you really want that, you have to use a different mechanism to serialize inserts, for example using PostgreSQL advisory locks or synchronization mechanisms on the client side.
Except the numbers assigned are session specific, so a session that starts earlier but lasts longer can write to the table with an id that is less then one that started later but finished sooner. So either you create your own number sequence generation that involves locking or you use an INSERT timestamp.

How to update an aggregate table in a trigger procedure while taking care of proper concurrency?

For illustration, say I'm updating a table ProductOffers and their prices. Mutations to this table are of the form: add new ProductOffer, change price of existing ProductOffer.
Based on the above changes, I'd like to update a Product-table which holds pricing info per product aggregated over all offers.
It seems logical to implement this using a row-based update/insert trigger, where the trigger runs a procedure creating/updating a Product row.
I'd like to properly implement concurrent updates (and thus triggers). I.e.: updating productOffers of the same Product concurrently, would potentially lead to wrong aggregate values (because multiple triggered procedures would concurrently attempt to insert/update the same Product-row)
It seems I cannot use row-based locking on the product-table (i.e.: select .. for update) because it's not guaranteed that a particular product-row already exists. Instead the first time around a Product row must be created (instead of updated) once a ProductOffer triggers the procedure. Afaik, row-locking can't work with new rows to be inserted, which totally makes sense.
So where does that leave me? Would I need to roll my own optimistic locking scheme? This would need to include:
check row not exists => create new row fail if already exists. (which is possible if 2 triggers concurrently try to create a row). Try again afterwards, with an update.
check row exists and has version=x => update row but fail if row.version !=x. Try again afterwards
Would the above work, or any better / more out-of-the-box solutions?
EDIT:
For future ref: found official example which exactly illustrates what I want to accomplish: Example 39-6. A PL/pgSQL Trigger Procedure For Maintaining A Summary Table
Things are much simpler than you think they are, thanks to the I an ACID.
The trigger you envision will run in the same transaction as the data modification that triggered it, and each modification to the aggregate table will first lock the row that it wants to update with an EXCLUSIVE lock.
So if two concurrent transactions cause an UPDATE on the same row in the aggregate table, the first transaction will get the lock and proceed, while the second transaction will have to wait until the first transaction commits (or rolls back) before it can get the lock on the row and modify it.
So data modifications that update the same row in the aggregate table will effectively be serialized, which may hurt performance, but guarantees exact results.

Does DROP COLUMN block on a postrgeSQL database

I have the following column in a postgreSQL database
column | character varying(10) | not null default 'default'::character varying
I want to drop it, but the database is huge and if it blocks updates for an extended period of time I will be publicly flogged, and likely drawn and quartered. I found a blog from braintree, here, which suggests this is a safe operation but its a little vague.
The ALTER TABLE command needs to acquire an ACCESS EXCLUSIVE lock on the table, which will block everything trying to access that table, including SELECTs, and, as the name implies, needs to wait for existing operations to finish so it can be exclusive.
So, if your table is extremely busy, it may not get an opportunity to actually acquire the exclusive lock, and will simply block for what is functionally forever.
It also depends whether this column has a lot of indexes and dependencies. If there are dependencies (i.e. foreign keys or views), you'll need to add CASCADE to the DROP COLUMN, and this will increase the work that needs to be done, and the amount of time it will need to hold the exclusive lock.
So, it's not risk free. However, you should know fairly quickly after trying it whether it's likely to block for a long time. If you can try it and safely take a minute or two of potentially blocking that table, it's worth a shot -- try the drop and see. If it doesn't complete within a relatively short period of time, abort the command and you'll likely need to schedule some downtime of at least the app(s) that are hammering the table. (You can take a look at the server activity and the lock activity to try to surmise what's hammering that table.)
does drop column block a PostgreSQL database
The answer to that is no, because it does not block the database.
However any DDL statement requires an exclusive lock on the table being changed. Which means no other transaction can access the table. So the table is "blocked", not the database.
However the time to drop a column is really very short, because the column isn't physically removed from the table but only marked as no longer there.
And don't forget to commit the DDL statement (if you have turned autocommit off), otherwise the table will be blocked until you commit your change.

How to manage foreign key errors from insert for the purpose of data validation (t-sql)

I am building a database in SQL Server 2000 and need to perform data validation by testing for foreign key violations. This post is related to an earlier post I made (Trigger exits on first failed insert and cant set xact_abort OFF in SQL Server 2000) which focussed on how to port from a working SQL Server 2005 implementation to a server 2000 implementation. Following the advice received on this post indicating wholesale recoding was required, i am now re-considering the design itself - hence this post. To recap on my application, my
I receive a daily data feed containing ~5k records into a Staging table. When this insert is done a single record is then added to a table called TRIGGER_DATA.
I have created a trigger ‘on insert’ on this table which then attempts to insert the data therein into a FACT_data table one record at a time.
The FACT_data table is foreign keyed to many DIM tables which define the acceptable inputs the field can take.
If any record violates a foreign key constraint the insert should fail and the record should instead be inserted into a Load_error table (which has no foreign key and all fields are Nullable).
Given the volume of records in each insert i thought it would be a bad idea to create the trigger on the Stage_data table since this would result in ~5k trigger firing in one go each day. However since i cannot set xact_abort off in a trigger under SQL Server 2000 and therefore on the first failure it aborts in the trigger i am wondering if it might be actually be a half decent solution.
Questions:
The basic question i am now asking myself is what is the typical approach for doing this - it seems to me that this kind of data validation through checking for FK violations must be common and therefore a consensus best practise may have emerged (although i really cant find any for server 2000 platform!)
Am i correct that the trigger on the stage_data table would be bad practise given the volume of records in each insert or is it acceptable?
Is my approach of looping through each record from within the trigger and testing the insert ok?
What are your thoughts on this alternative that i have just thought of. Stop using triggers altogether and, after the Stage table is loaded, update a 'stack' table with a record saying that data had been received and was ready to be validated and loaded to the FACT table (perhaps along with a priority level indicating order in which order tasks must be processed). This stack or 'job' table would then be a register of all requested inserts along with their status (created/in-progress/completed). I would then have a stored procedure continually poll this table and process the top priority record. This would mean that all stored proc calls would happen outwith the trigger.
Many thanks
You don't need a trigger at all. Unless there is some reason that you need split-second timing of this daily data load, just schedule a job (stored proc) that runs as often as necessary to look for data in the staging table.
When it finds any, process the records one at a time and load the ones that are OK and do whatever you do with the ones that have broken FKs (delete, move to a work queue, etc.).
If you use a schedule frequency that is often enough that there is some risk of the next job starting while the last one is still running, then you should create a sentinel table that your stored proc can write in to say that the job is running. This could work one of two ways. Either you just have one record that says "running" or "not running" or, you could have one record per job (like a transaction log) that has a status code indicating whether the job is complete or not.

basic doubts in T-SQL triggers

What is the difference between FOR and AFTER in trigger definition. Any benefits of using one vs another?
If I issue an update statement which updates 5 rows, does the trigger (with FOR UPDATE) fires 5 times? If it is so, is there any way to make trigger fire only once for the entire UPDATE statment (even though it updates multiple rows)
Is there any chance/situation of having more than one row in "inserted" or "deleted" table at any time in a trigger life cycle. If it so, can I have a very quick sample on that?
thanks
Trigger fire once for each batch and should always be designed with that in mind. Yes if you do a multi-row update insert or delte, allthe rows will be in the inserted or deleted tables. For instance the command
Delete table1 where state = 'CA'
would have all the rows in the table that have a state of CA in them even if it was 10,000,000 of them. That is why trigger testing is critical and why the trigger must be designed to handle multi-row actions. A trigger that works well for one row may bring the deatabase toa screeching halt for hours if poorly designed to handle mulitple rows or could cause data integrity issues if not designed correctly to handle mulitple rows. Triggers should not rely on either cursors or loops for the most part but on set-based operations. If you are setting the contents of inserted or delted to a variable, you are almost certainly expecting one row and yor trigger will not work properly when someone does a set-based operation on it.
SQL Server has two basic kinds of DML triggers, after triggers which happen after the record has been placed in the table. These are typically used to update some other table as well. Before triggers take the place of the insert/update/delete, they are used for special processing onthe table inserted usually. It is important to know that a before trigger will not perform the action that was sent to the table and if you still want to delete/update or insert as part of the trigger you must write that into the trigger.