How to investigate the time required to obtain a lock (and why) within a procedure - PostgreSQL

I am stumped on an issue I am having. The true context is rather complicated, but I can boil it down to these functional points (everything else is not related to the problematic table):
I have a trigger function that contains several SELECTs and then an UPDATE
The update takes an unreasonable amount of time to execute ("unreasonable" = > 1.4s)
The exact same queries, when run outside the trigger (for the same rows, parameters, etc.), do not have any issues (i.e. they execute in 1-2 ms)
I am pretty sure that indexes, etc., are working as necessary; i.e. there shouldn't be any issues.
There are no circular triggers
There is one trigger on the destination table, but even with that removed the behavior is the same.
I have done many tests to no avail, but these are pretty meaningful:
when the update is replaced with a SELECT, the response time is fast, as expected
when the update is replaced with a SELECT... FOR UPDATE, the response time is slow, the same as the update
This (as well as other things) has led me to believe that the delay is spent waiting to acquire a lock
No other transactions are really happening on that table. I am truly bewildered.
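To make the shape of the problem concrete, the trigger function looks roughly like this (heavily simplified; every name here is an invented placeholder, not my real schema):

-- heavily simplified sketch of the trigger function
CREATE OR REPLACE FUNCTION rollup_totals() RETURNS trigger AS $$
DECLARE
    v_total numeric;
BEGIN
    -- several SELECTs like this one, all of which are fast
    SELECT sum(amount) INTO v_total
    FROM detail_table
    WHERE parent_id = NEW.parent_id;

    -- the UPDATE that takes > 1.4 s inside the trigger, but 1-2 ms outside it
    UPDATE summary_table
    SET total = v_total
    WHERE id = NEW.parent_id;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_rollup_totals
AFTER INSERT OR UPDATE ON source_table
FOR EACH ROW EXECUTE FUNCTION rollup_totals();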
Server context: This is being run in AWS/RDS on db.m5.xlarge.
What I am looking for is whether there is a way to get some information about locks that are happening mid-transaction or possibly even a history of acquired locks? Or anything else that can give me insight into what is causing the delay that seems so closely related to acquiring a lock on that table.
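For example, this is the sort of snapshot I can take from a second session while the slow statement is running (pg_locks, pg_stat_activity and pg_blocking_pids() are standard PostgreSQL; my_table is a placeholder name), but it only shows the current state, not a history:

-- run from another connection while the slow UPDATE is executing
SELECT l.pid,
       l.locktype,
       l.relation::regclass AS relation,
       l.mode,
       l.granted,
       pg_blocking_pids(l.pid) AS blocked_by,
       a.wait_event_type,
       a.wait_event,
       a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE NOT l.granted
   OR l.relation = 'my_table'::regclass;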
Unfortunately, just to make everything even more frustrating, I cannot replicate the issue when I attempt to use EXPLAIN in the function body. The only way to do this (that I know of) is to use the EXECUTE... syntax with a query string. That doesn't have a delay - it's also useless for the trigger.
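The other route I can think of is server-side logging. These are standard PostgreSQL settings (sketch only; on RDS they live in the DB parameter group, and auto_explain has to be added to shared_preload_libraries):

# log any lock wait that lasts longer than deadlock_timeout
log_lock_waits = on
deadlock_timeout = '200ms'                 # lower the threshold so shorter waits get logged

# auto_explain can log plans for statements nested inside functions,
# which plain EXPLAIN cannot reach
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '100ms'
auto_explain.log_nested_statements = on
auto_explain.log_analyze = on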

Related

Most Performant way to implement time-dependent status

Central to a project I'm working on is a highlighting mechanic that can be applied to certain items on the website. The idea is that this highlighted status is only active for a certain amount of time.
I'm trying to find the most performant way to achieve this (in querying, setting the status, checking the status and revoking it).
A first approach would be to simply set a value 'highlighted: true' on the item. This seems to be the most performant way to query for highlighted items. The drawback I see here is that a date for the highlighting action also needs to be stored, and furthermore an interval job has to run to check the highlighted items and potentially revoke their highlighted status. Also, the exact moment when an item stops being highlighted can't be determined exactly, since it depends on the interval of the check function.
A second approach would be to mainly store the date of the highlighting action and run the query against it. Querying for highlighted objects seems far less performant, since every item ever created is checked, and on top of that it's not just a boolean but a proper function that compares those different date values to check whether the highlight is still valid. On the upside, no external cleanup function is necessary and every highlighting period ends exactly on time.
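For what it's worth, if the items live in a SQL database, the second approach could look roughly like this (all names are placeholders; the partial index is just an idea to keep the lookup cheap):

-- store an expiry timestamp instead of a boolean flag
CREATE TABLE items (
    id                bigserial PRIMARY KEY,
    highlighted_until timestamptz          -- NULL = not highlighted
);

-- index only the rows that have ever been highlighted
CREATE INDEX items_highlighted_idx
    ON items (highlighted_until)
    WHERE highlighted_until IS NOT NULL;

-- "highlight item 42 for 24 hours"
UPDATE items SET highlighted_until = now() + interval '24 hours' WHERE id = 42;

-- currently highlighted items: an index range scan, no cleanup job,
-- and the highlight ends exactly on time
SELECT id FROM items WHERE highlighted_until > now();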
Would love to have your input on this. Is there maybe a clever pattern for this?

tsql query kill scenario

In our webapp we have lots of queries running. Most of them read data, but some update queries with high priority might come in. We'd like to cancel the read queries, but when using KILL, I'd like the read query to return a certain dataset or execution result upon receiving the cancel.
My intention is to mimic the behavior of signal in C programs for which a signal handler is invoked upon receiving a kill signal.
Is there any method to define an asynchronous KILL signal handler for SPs?
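To be explicit, the cancellation I have in mind is the plain KILL command, e.g. (the session id 53 is just an example):

-- find the read query to cancel
SELECT session_id, status, command, blocking_session_id
FROM sys.dm_exec_requests;

-- terminate that session; its current batch is rolled back,
-- and the client just receives an error rather than a result set
KILL 53;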
This is not a fully tested answer, but it is a bit more than just a comment.
One option is a dirty read (WITH (NOLOCK)). This part is tested - I do this all the time; to build a large, scalable app you need to resort to this and manage it.
A dirty read will not block an update. A lot of people think a dirty read may return corrupt data, but if you are updating smith to johnson, the select is not going to get smison. The select is going to get smith, and it will be immediately stale. But how is that worse than taking a read lock? With a read lock, the read gets smith and blocks the update; only once the read locks are cleared does the update proceed. I would contend that blocking an update also gives you stale data.
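A minimal T-SQL sketch of what I mean (table and column names are invented; NOLOCK on a table is equivalent to READ UNCOMMITTED for that table):

-- dirty read: takes no shared locks, so it will not block a concurrent UPDATE,
-- but it may return uncommitted ("immediately stale") data
SELECT FirstName, LastName
FROM dbo.Customers WITH (NOLOCK)
WHERE CustomerId = 42;

-- or for the whole session:
-- SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;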
If you are using a reader, I think you could pass the same cancellation token to each select and then just cancel that one token.
But it may not process the CancellationToken until it reads a row, so it may not cancel a long-running query that has not yet returned any rows.
DbDataReader.ReadAsync Method (CancellationToken)
Or, if you are not using a reader, look at
SqlCommand.Cancel
As far as getting the cancel to return alternate data, I doubt SQL Server is going to do that.

Could I save a Postgres transaction and continue working with the db within it later?

I know about prepared transactions in Postgres, but it seems you can only commit or roll them back later. You cannot even view the transaction's db state before you've committed it. Is there any way to save a transaction for later use?
What I actually want to achieve is a preview (and correction) of some changes in the db (the changes are imports from a CSV file, so the user needs to see a preview before applying them). I want to make changes, add some more changes later, see the full state of the db, and then apply it (that is, commit the transaction).
I cannot find a very good reference in the docs, but I have a very strong feeling that the answer is: no, you cannot do that.
It would mean that when you "save" the transaction, the database would basically have to keep all of its locks in place for an indefinite amount of time. Even if that were possible, it would mean horrible failure modes and trouble on all fronts.
For the pattern that you are describing, I would use two separate transactions: import into a staging table and show that to the user (or import into the main table but mark the rows as "unapproved"). If the user approves, move or update those rows in another transaction.
You can always end up in a situation where the user simply leaves or crashes without clicking "OK" or "Cancel". If what you're describing were possible, you would end up with a hung transaction holding all those resources. In my proposed solution you merely end up with leftover rows in the staging table, which you can still show to the user later or remove.
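A rough sketch of that two-transaction pattern (all object names are invented; the COPY could just as well be client-side inserts):

-- transaction 1: load the CSV into a staging table and show it to the user
BEGIN;
CREATE TABLE IF NOT EXISTS import_staging (LIKE target_table INCLUDING ALL);
COPY import_staging FROM '/tmp/import.csv' WITH (FORMAT csv, HEADER true);
COMMIT;

-- ... the user reviews and corrects rows in import_staging ...

-- transaction 2: runs only after the user clicks "OK"
BEGIN;
INSERT INTO target_table SELECT * FROM import_staging;
TRUNCATE import_staging;
COMMIT;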
You may want to read up on the saga pattern. This is actually a very simple example of a well-known and well-researched problem.
To make a long story short, this pattern breaks a long-running process like yours down into smaller operations that are applied and persisted in separate transactions. If any of them fails (or does not occur as expected), you have compensating actions that undo what the steps executed so far have done (e.g. by throwing away stale/irrelevant data).
Here's a decent introduction:
https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part/
http://vasters.com/clemensv/2012/09/01/Sagas.aspx
This concept was formally introduced in the 80s, but is well alive and relevant today.

How triggers work internally in SQL Server

Please correct me if I am wrong.
What I know about triggers is that they are triggered by events (insert, update, delete), so we can run a stored procedure etc. in the trigger.
This should give the application good responsiveness, because the query the user interacts with is quite small and the other, longer-running work is taken care of by the server internally as a separate task.
But I do not know how triggers are handled inside the server. What I exactly want to know is what would happen in a scenario like the one below.
Take an insert AFTER trigger, and suppose the trigger executes a long stored procedure. In the middle of the trigger there can be another insert. What I want to know is what will happen to that second trigger. If possible, can I make that second trigger ignore itself?
marc_s has given the correct answer. I will copy it for the sake of completeness.
TRIGGERS ARE SYNCHRONOUS
If you want asynchronous functionality, go for a Service Broker implementation.
Triggers are triggered by events - and then they are executed - right now. Since you cannot control when and how often they are triggered, you should keep the processing in those triggers to an absolute minimum - I always try to make - at most - an entry into another table (an "Audit" table), or possibly put a "marker" row into a "command" table. But the actual processing of that info - running stored procedures etc. - should be left to an outside job - don't do extensive processing in a trigger! This will reliably KILL all your performance/responsiveness.
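As an illustration of that "marker" / "command table" idea (T-SQL, all object names invented), the trigger only records which rows changed and a separate job does the heavy lifting later:

-- the trigger itself only queues work, so it finishes almost instantly
CREATE TRIGGER trg_Orders_AfterInsert
ON dbo.Orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.OrderCommandQueue (OrderId, QueuedAt)
    SELECT i.OrderId, SYSUTCDATETIME()
    FROM inserted AS i;
END;

-- a scheduled job (SQL Agent, Service Broker activation, etc.) then processes
-- dbo.OrderCommandQueue outside the user's transaction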

Getting past Salesforce trigger governors

I'm trying to write an "after update" trigger that does a batch update on all child records of the record that has just been updated. This needs to be able to handle 15k+ child records at a time. Unfortunately, the limit appears to be 100, which is so far below my needs it's not even close to acceptable. I haven't tried splitting the records into batches of 100 each, since this will still put me at a cap of 10k updates per trigger execution. (Maybe I could just daisy-chain triggers together? ugh.)
Does anyone know what series of hoops I can jump through to overcome this limitation?
Edit: I tried calling the following @future function in my trigger, but it never updates the child records:
global class ParentChildBulkUpdater
{
    @future
    public static void UpdateChildDistributors(String parentId) {
        Account[] children = [SELECT Id FROM Account WHERE ParentId = :parentId];
        for (Account child : children) {
            child.Site = 'Bulk Updater Fired';
        }
        update children;
    }
}
The best (and easiest) route to take with this problem is to use Batch Apex: you can create a batch class and fire it from the trigger. Like @future it runs in a separate thread, but it can process up to 50,000,000 records!
You'll need to pass some information to your batch class before using Database.executeBatch so that it has the list of parent IDs to work with, or you could just get all of the accounts, of course ;)
I've only just noticed how old this question is but hopefully this answer will help others.
It's worse than that: you're not even going to be able to get those 15k records in the first place, because there is a 1,000-row query limit within a trigger (this scales with the number of rows the trigger is being called for, but that probably doesn't help).
I guess your only way to do it is with the @future annotation - read up on that in the docs. It gives you much higher limits. Although you can only call so many of those in a day, so you may need to somehow keep track of which parent objects have their children updating, and then process them offline.
A final option may be to use the API via some external tool. But you'll still have to make sure everything in your code is batched up.
I thought these limits were draconian at first, but actually you can do a hell of a lot within them if you batch things correctly; we regularly update thousands of rows from triggers. And from an architectural point of view, much more than that and you're really talking batch processing anyway, which isn't normally activated by a trigger. One thing's for sure - they make you jump through hoops to do it.
I think Codek is right, going the API / external tool route is a good way to go. The governor limits still apply, but are much less strict with API calls. Salesforce recently revamped their DataLoader tool, so that might be something to look into.
Another thing you could try is using a Workflow rule with an Outbound Message to call a web service on your end. Just send over the parent object and let a process on your end handle the child record updates via the API. One thing to be aware of with outbound messages: it is best to queue up the process on your end somehow and respond to Salesforce immediately, otherwise Salesforce will resend the message.
@future doesn't work (does not update records at all)? Weird. Did you try using your function in an automated test? It should work, and the annotation should be ignored (during the test it will be executed instantly; test methods have higher limits). I suggest you investigate this a bit more; it seems like the best solution to what you want to accomplish.
Also - maybe try to call it from your class, not the trigger?
Daisy-chaining triggers together will not work, I've tried it in the past.
Your last option might be Batch Apex (from the Winter '10 release, so all organisations should have it by now). It's meant for mass data update/validation jobs, the kind of thing you typically run overnight in normal databases (it can be scheduled). See http://www.salesforce.com/community/winter10/custom-cloud/program-cloud-logic/batch-code.jsp and the release notes PDF.
I believe in version 18 of the API the 1,000-row limit has been removed (so the documentation says, but in some cases I still hit a limit).
So you may be able to use Batch Apex with a single Apex update statement.
Something like:
List<childObect__c> children = new List<childObect__c>();
for (childObect__c c : [SELECT ....]) {
    c.foo__c = 'bar';
    children.add(c);
}
update children;
Be sure you bulkify your trigger; also see http://sfdc.arrowpointe.com/2008/09/13/bulkifying-a-trigger-an-example/
Maybe a change to your data model is the better option here. Consider creating a formula field on the child object that accesses the data from the parent. That would probably be far more efficient.