Why does Postgres sequence item go up even if object creation fails? - postgresql

I have a Postgres database where one of my models is Client, simply indexed by its primary key. I was having an issue creating clients because somewhere along the line someone created a client while explicitly setting its primary key, which I have read doesn't affect the Postgres sequence for the Clients table, which is responsible for auto-incrementing the primary key by 1 whenever a Client object is created.
I ran some SQL queries to play around with it, and found that the current sequence value, 262, was in fact 1 lower than the highest id for Clients in the database, 263, so it was saying that a Client with the ID 263 already existed. I tried creating a Client in our front end application, got the error again, and decided to rerun the queries. I saw that there was no new client created in the database, as expected, but I also noticed that the sequence value did go up to 263, so when I tried creating a client again it worked!
Is this normal behavior for a PostgreSQL sequence to increment even if creation of its related model fails? If so it seems like that could cause some serious issues.

Yes, this is the expected behaviour. See docs:
nextval
Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value.
If a sequence object has been created with default parameters, successive nextval calls will return successive values beginning with 1. Other behaviors can be obtained by using special parameters in the CREATE SEQUENCE command; see its command reference page for more information.
Important: To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the nextval later aborts. This means that aborted transactions might leave unused "holes" in the sequence of assigned values.
Note that nextval is normally set as the default value for an autoincrement/serial column.
Also try to imagine how hard and inefficient it would be if nextval were to roll back. Essentially you would have to lock every client on nextval until the whole transaction (the one that acquired the lock) is processed. In that case forget about concurrent inserts.
If so it seems like that could cause some serious issues.
Like what? The issue in your case was that someone manually specified a value for an autoincrement column. You should never do that unless you are a samurai. :)
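The scenario from the question can be replayed with a small toy model. This is only a sketch in plain Python, not PostgreSQL itself: ToySequence, DuplicateKey and insert_client are hypothetical names made up for the illustration, and the only property modelled is that a value handed out by nextval is never given back when the insert fails.

```python
class DuplicateKey(Exception):
    """Raised when the generated id already exists (stand-in for a unique violation)."""


class ToySequence:
    """Toy stand-in for a sequence: nextval always advances and is never undone."""

    def __init__(self, last_value):
        self.last_value = last_value

    def nextval(self):
        self.last_value += 1
        return self.last_value


def insert_client(seq, existing_ids):
    # The id is consumed before we know whether the insert will succeed.
    new_id = seq.nextval()
    if new_id in existing_ids:
        # The "transaction" fails, but the sequence has already moved on.
        raise DuplicateKey(new_id)
    existing_ids.add(new_id)
    return new_id


# State from the question: id 263 was inserted by hand, sequence is still at 262.
seq = ToySequence(last_value=262)
client_ids = {261, 262, 263}

try:
    insert_client(seq, client_ids)
    first_attempt = "ok"
except DuplicateKey:
    first_attempt = "duplicate key"

# The failed attempt used up 263, so the retry gets 264 and succeeds.
second_id = insert_client(seq, client_ids)
print(first_attempt, second_id, seq.last_value)
```

The first attempt collides with the hand-inserted row, and the retry works precisely because the failed attempt still advanced the sequence.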

Related

when should I use an After trigger instead of a Before trigger?

AFAIK, although I'm pretty new to Postgres, Before-triggers are less expensive than After-triggers.
After all, if you want to change the current record (using NEW), you can change the record before it is written. In contrast, with After-triggers you need two writes: 1 verbatim write and 1 as a result of the after-trigger.
At the same time, all functionality that is available in after-triggers seems to be available in before-triggers. If I'm not mistaken.
So why would you ever use After-triggers to begin with?
If you're changing the record upon which the trigger is acting use a BEFORE trigger. If you're doing some complex logic that may prevent the record from being changed, use a BEFORE trigger.
For almost anything else, use an AFTER trigger. An example might be where you're inserting child records which rely upon the primary key of a record being inserted, such as adding an entry to a history table for a newly inserted row. The parent row won't exist yet in the BEFORE trigger, so the child insert would fail foreign key checks.

What will happen if two processes modify data in two transactions at the same time and there is a unique constraint on the table?

I am thinking about a race condition in a production system I am working on. Database is PostgreSQL. Application is written in Java, but this is not relevant.
There is a table called "versions", which contains columns "entity_ID" and "version" (and some other fields). This table contains versions of a certain entity.
There is an application where user can modify those entities.
Every modification of an entity creates a new version in the table "versions" (using a trigger). This trigger finds the last version in the same table "versions" and inserts a new row with the same entity_ID, but version = (last version + 1).
There is a nightly job that is run in PostgreSQL every 4:00 that also changes those entities and therefore updates data in the table "versions". This procedure was designed to finish its work by the morning (before users of the application start to use it), but unfortunately runs into the day. As this procedure is run in a function, then it is one big transaction. Therefore the changes done by it are not visible to the application.
The nightly job uses the following workflow:
1. Set "failed_counter" = 0
2. Iterate over entities that need to be modified
3. Do modifications to the entity inside a BEGIN .. EXCEPTION .. END block
4. If there is an EXCEPTION, increase the "failed_counter". Log the exception and the failed entity to a log table.
5. If "failed_counter" > 10, cancel work.
6. End work
This has caused the following race condition to happen a few times (let's assume that X is the last version of entity A):
1. Nightly job starts
2. Nightly job modifies entity A, creating version X+1
3. Application is used to also modify entity A, creating also version X+1 (because the nightly job transaction has not COMMITed, so the version X+1 is not visible to the application)
4. Nightly job ends, causing COMMIT
5. There are now two versions with version number X+1, which causes the application to break.
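The sequence of steps above can be sketched as a toy simulation. This is plain Python, not PostgreSQL: it ignores locking and constraints entirely, and next_version is a hypothetical stand-in for the trigger's "last version + 1" logic. It only shows why two writers that cannot see each other's uncommitted rows compute the same number.

```python
def next_version(visible_versions):
    """Stand-in for the trigger: last visible version + 1."""
    return max(visible_versions) + 1


# Committed versions of entity A; X = 3 is the last version.
committed = [1, 2, 3]

# Step 2: the nightly job computes X+1 inside its own, still open, transaction.
job_pending = next_version(committed)

# Step 3: the application only sees committed rows, so it also computes X+1
# and commits immediately.
app_version = next_version(committed)
committed.append(app_version)

# Step 4: the nightly job commits its pending row.
committed.append(job_pending)

# Step 5: the same version number now exists twice.
has_duplicates = len(committed) != len(set(committed))
print(committed, has_duplicates)
```

Without a unique constraint on (entity_ID, version), nothing in this flow ever notices the collision.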
I thought that I could just solve the problem by using an UNIQUE CONSTRAINT over fields (entity_ID, version). I thought that it would cause the application to receive an error (due to violating the UNIQUE CONSTRAINT) at race condition step 3. But I am not sure how does the unique constraint work in this situation. In race condition step 3, when the application adds a version, does the database check the UNIQUE CONSTRAINT? I suppose not, since the transaction of the nightly process has not been completed. If I am correct and the UNIQUE CONSTRAINT is checked only at race condition step 4, when COMMIT is made, then this causes the whole nightly procedure to fail, which is not desired result.
So, the question is the following.
When is the UNIQUE CONSTRAINT checked: At race condition step 3 or race condition step 4?
If the answer to the last question is "race condition 4", then how could I change the design of the system to avoid the above-mentioned problems?
By default, unique constraints in PostgreSQL are checked at the end of each statement. It's easy to test the behavior using psql.
Some big, red flags . . .
As this procedure is run in a function, then it is one big transaction.
It's not one, big transaction because you're running a function. It's one, big transaction because you haven't run the function several times over smaller subsets of the data. Whether you can run the function over subsets is application-dependent.
Iterate over entities that need to be modified
Rough rule of thumb for SQL databases: iteration is always a mistake.
SQL is a set-oriented language. Dealing with sets is usually faster than iteration, and often by several orders of magnitude.
If "failed_counter" > 10, cancel work.
This looks suspicious. Why are ten failures OK? Why are any failures OK?
I thought that I could just solve the problem by using an UNIQUE CONSTRAINT over fields (entity_ID, version).
That you don't already have a unique constraint on those two columns is a big, waving red flag. Fix this first.
The fact that an application should apparently be waiting for a batch job to finish, but isn't waiting, might or might not be a system design issue. (It smells like a system design issue.)
There is a nightly job that is run in PostgreSQL every 4:00 ...
Did you think of starting at 3:00?
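Test this, but not on your production server.
1. Drop the trigger.
2. Add a column of type timestamp with time zone.
3. Set that column's default value. Most applications will use current_timestamp, but you might want clock_timestamp() instead: current_timestamp returns the start time of the current transaction, so every row written by one long transaction gets the same value, while clock_timestamp() returns the actual current time.
4. Add a unique constraint on {entity_id, new timestamp column}.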
Eliminating the trigger might speed things up enough for you.

Salesforce.com: UNABLE_TO_LOCK_ROW, unable to obtain exclusive access to this record

In our production org, we have a system of uploading sales data into Salesforce using command line data loader. This data is loaded into a temporary object Temp. We have created a formula field (which combines three fields) to form a unique key. The purpose of the object is to reduce user efforts for creating the key manually.
There is an after insert trigger on Temp which calls an asynchronous method which upserts the data to another object SalesData using the key. The insert/update trigger on SalesData checks the various fields and creates/updates the records in another object SalesRecords. After the insert/update is complete, all the records in the temporary object Temp are deleted. The SalesRecords object does not have any trigger on it and is a child of another object Sales. The Sales object has some rollup fields which sum up fields from the SalesRecords object.
Lately, we are getting the below error for some of the records which are updated.
UNABLE_TO_LOCK_ROW, unable to obtain exclusive access to this record
Please provide some pointers to resolve the issue.
This could be caused either by conflicting DML operations in the various trigger executions or by some recursive trigger execution. I would assume that the async executions cause multiple subsequent updates on the same records, probably on the SalesRecords object. I would recommend trying to simplify the process to avoid too many related trigger executions.
I'm a little surprised you were able to get this to work in the first place. After triggers should be used with caution and only when before triggers can't be. One reason for this is that in before triggers you don't need to perform additional DML to make changes to records: you simply change the values and the insert/update commit happens automatically. But recursive trigger firing is the main problem with after triggers.
One quick way to avoid trigger re-entry is to use a public static Boolean in a class that states whether you're already in this trigger from the same thread of execution.
Something like:
public static Boolean isExecuting = false;
Once set to true, any trigger code that is a re-fire can be avoided with:
if (!Class.isExecuting)
{
    Class.isExecuting = true;
    // Perform trigger logic
    // ...
}
Additionally, since the order of trigger execution cannot be determined up front, you might be seeing an issue with deletions or other data changes that depend on other parts of your flow to finish first.
Also, without knowing the details of your custom unique 3-part key, I'd wonder if there's a problem there too, such as whether it's truly unique or not. Case insensitivity is a common mistake and it's the reason there are both 15- and 18-character Ids in Salesforce. For example, when people export to Excel (a case-insensitive environment) and do VLOOKUPs, they would occasionally find the wrong record. The 3-character calculated suffix was added to disambiguate for case-insensitive environments.
Googling for this same error led me to this post:
http://boards.developerforce.com/t5/General-Development/Unable-to-obtain-exclusive-access-to-this-record/td-p/345319
Which points out some common causes for this to happen:
Sharing Rules are being calculated.
A picklist value has been replaced and replacement is in progress.
A custom index creation/removal is in progress.
Most unlikely one - someone else is already editing the same record that you are trying to access at the same time.
Posting here in case somebody else needs it.
I got this error multiple times today. Turned out one of our vendors was updating their installed package during that time in the same org. All kinds of things were going wrong also - some object validation exceptions were being thrown on DMLs, without any error message content.
Resolution
The error is shown when a field update, such as a roll-up summary field recalculation, is attempted on a parent object that already had a field update causing the roll-up summary field to calculate. This can also occur if a trigger or another Apex job is running on the master object and is also attempting an update.
You can either reduce the batch size and try again or create separate smaller files to be imported if this issue occurs.

How to safely increment a counter in Entity Framework

Let's say I have a table that tracks the number of times a file was downloaded, and I expose that table to my code via EF. When the file is downloaded I want to update the count by one. At first, I wrote something like this:
var fileRecord = (from r in context.Files where r.FileId == 3 select r).Single();
fileRecord.Count++;
context.SaveChanges();
But then when I examined the actual SQL that is generated by these statements I noticed that the incrementing isn't happening on the DB side but instead in my memory. So my program reads the value of the counter in the database (say 2003), performs the calculation (new value is 2004) and then explicitly updates the row with the new Count value of 2004. Clearly this isn't safe from a concurrency perspective.
I was hoping the query would end up looking instead like:
UPDATE Files SET Count = Count + 1 WHERE FileId=3
Can anyone suggest how I might accomplish this? I'd prefer not to lock the row before the read and then unlock after the update because I'm afraid of blocking reads by other users (unless there is some way to lock a row only for writes but not block reads).
I also looked at doing an Entity SQL command but it appears Entity SQL doesn't support updates.
Thanks
You're certainly welcome to call a stored procedure with EF. Write a sproc with the SQL you show then create a function import in your EF model mapped to said sproc.
You will need to do some locking in order to get this to work. But you can minimise the amount of locking.
When you read the count and you want to update it, you must lock it. This can be done by placing the read and the update inside a transaction scope, which will protect you from race conditions.
When you read the value and you just want to read it, you can do this with a transaction isolation level of ReadUncommitted. This read will then not be blocked by the read/write lock above.
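The difference between the read-modify-write pattern and the single UPDATE statement from the question can be sketched outside EF. The following is a minimal illustration using Python's built-in sqlite3 module as a stand-in database (an assumption made for this example, not the asker's setup), with the two "concurrent" callers simulated by interleaving statements on one connection. The table and column names follow the question; read_count is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Files (FileId INTEGER PRIMARY KEY, Count INTEGER NOT NULL)")
conn.execute("INSERT INTO Files (FileId, Count) VALUES (3, 2003)")


def read_count():
    """Return the current counter value for file 3."""
    return conn.execute("SELECT Count FROM Files WHERE FileId = 3").fetchone()[0]


# Read-modify-write: both callers read 2003 before either one writes.
seen_by_a = read_count()
seen_by_b = read_count()
conn.execute("UPDATE Files SET Count = ? WHERE FileId = 3", (seen_by_a + 1,))
conn.execute("UPDATE Files SET Count = ? WHERE FileId = 3", (seen_by_b + 1,))
lost_update_result = read_count()  # one of the two downloads is lost

# Reset, then let the database do the arithmetic in a single statement.
conn.execute("UPDATE Files SET Count = 2003 WHERE FileId = 3")
for _ in range(2):
    conn.execute("UPDATE Files SET Count = Count + 1 WHERE FileId = 3")
atomic_result = read_count()  # both downloads are counted

print(lost_update_result, atomic_result)
```

Because the second form never carries a stale value through application memory, each statement increments whatever the row holds at the time it runs.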

How to check for pending operations in a PostgreSQL transaction

I have a session (SQLAlchemy) on PostgreSQL, with an active uncommitted transaction. I have just passed the session to some call tree that may or may not have issued SQL INSERT/UPDATE/DELETE statements, through sqlalchemy.orm or directly through the underlying connection.
Is there a way to check whether there are any pending data-modifying statements in this transaction? I.e. whether commit would be a no-op or not, and whether rollback would discard something or not?
I've seen people point out v$transaction in Oracle for the same thing (see this SO question). I'm looking for something similar to use on PostgreSQL.
Start by checking into system view pg_locks.
http://www.postgresql.org/docs/8.4/interactive/view-pg-locks.html
Consider the following sequence of statements:
select txid_current();
begin;
select txid_current();
If the transaction ids returned by the two selects are equal, then there was already an open transaction. If not, then there wasn't (but now there is).
If the numbers are different, then as a side effect you will just have opened a transaction, which you will probably want to close.
UPDATE: In fact, as @r2evans points out (thanks for the insight!), you don't need the "begin": txid_current() will return the same number only if you are in a transaction.
Since Postgres 10:
select txid_current_if_assigned();
will return null if there is no current transaction.
If a Start Transaction has been issued, it will still return null if there have been no updates.
No, not from the database level, really. Perhaps you can add some tracing at the sqlalchemy level to track it?
Also, how do you define a no-op? What if you updated a value to the same value it had before, is that a no-op or not? From the database's perspective, if it had one, it would not be a no-op. But from the application's perspective, it probably would.