Why would the Postgres autoincrement counter move forward during Heroku DB failover?

Heroku did automatic failover to a follower DB.
I noticed a gap in some autoincremented primary key columns after that failover. E.g. before the failover, the latest record had ID 117019 and the next record got ID 117052. No records were deleted.
It's not an issue as such; I was just curious what's going on here, and whether I'm right to attribute it to the failover, or whether I should look for other explanations.

These gaps are probably the result of failed transactions that were rolled back.
Sequences are not transactional, that is, the sequence won't return the same value again after a rollback.
This is intentional, see the documentation:
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used and will not be returned again. This is true even if the surrounding transaction later aborts, or if the calling query ends up not using the value. [...] Such cases will leave unused “holes” in the sequence of assigned values. Thus, PostgreSQL sequence objects cannot be used to obtain “gapless” sequences.

PostgreSQL serial and identity columns are backed by ordinary sequences. So your sequence can hand out several values, and some of them may end up never being used.
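For illustration, a single rolled-back insert is enough to produce such a hole (a minimal sketch; the table name is made up):

CREATE TABLE event (id bigserial PRIMARY KEY, payload text);

BEGIN;
INSERT INTO event (payload) VALUES ('a');  -- nextval() hands out 1
ROLLBACK;                                  -- the row disappears, but the sequence does not rewind

INSERT INTO event (payload) VALUES ('b');  -- gets id 2
SELECT id, payload FROM event;             -- returns only (2, 'b'); id 1 is a permanent gap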

Related

Reliable way to poll data from a Postgres table

I want to use a table in a Postgres database as storage for input documents (there will be billions of them).
Documents are continuously added (using "UPSERT" logic to avoid duplicates), and are rarely removed from the table.
There will be multiple worker apps that should continuously read data from this table, from the first inserted row to the latest, and then poll new rows as they are being inserted, reading each row exactly once.
Also, when a worker's processing algorithm changes, all the data should be reread from the first row. Each app should be able to maintain its own row processing progress, independent of the other apps.
I'm looking for a way to track last processed row, to be able to pause and continue polling at any moment.
I can think of these options:
Using an autoincrement field
And then store the autoincrement field value of the last processed row somewhere, to use it in the next query like this:
SELECT * FROM document WHERE id > :last_processed_id LIMIT 100;
But after some research I found that in a concurrent environment, it is possible that rows with lower autoincrement values will become visible to clients LATER than rows with higher values, so some rows could be skipped.
Using a timestamp field
The problem with this option is that timestamps are not unique and could overlap during a high insertion rate, which, once again, leads to skipped rows. Also, adjusting the system time (manually or via NTP) may lead to unpredictable results.
Add a process completion flag to each row
This is the only really reliable way to do this that I could think of, but it has drawbacks: each row needs to be updated after it is processed, extra storage is needed for a completion flag field per app, and running a new app may require a DB schema change. This is the last resort for me; I'd like to avoid it if there are more elegant ways to do this.
I know the task definition screams I should use Kafka for this, but the problem is that it doesn't allow deleting individual messages from a topic, and I need this functionality. Keeping an external list of Kafka records that should be skipped during processing feels very clumsy and inefficient to me. Also, real-time deduplication with Kafka would require some external storage as well.
I'd like to know if there are other, more efficient approaches to this problem using the Postgres DB.
I ended up saving the transaction id for each row and then selecting only records whose txid is lower than the smallest transaction id still in progress at the moment of the query, like this:
SELECT * FROM document
WHERE ((txid = :last_processed_txid AND id > :last_processed_id) OR txid > :last_processed_txid)
AND txid < pg_snapshot_xmin(pg_current_snapshot())
ORDER BY txid, id
LIMIT 100
This way, even if Transaction #2, which started after Transaction #1, completes faster than the first one, the rows it wrote won't be read by a consumer until Transaction #1 finishes.
Postgres docs state that
xid8 values increase strictly monotonically and cannot be reused in the lifetime of a database cluster
so it should fit my case.
This solution is not that space-efficient, because an extra 8-byte txid field must be saved with each row, and an index for the txid field should be created, but the main benefits over other methods here are:
The DB schema remains the same when new consumers are added
No updates are needed to mark a row as processed; a consumer only has to keep the id and txid values of the last processed row
System clock drift or adjustment won't lead to rows being skipped
Having the txid for each row makes it possible to query data in insertion order even when multiple producers insert rows with ids generated from preallocated pools (for example, Producer 1 currently inserts rows with ids 1..100, Producer 2 with 101..200, and so on)
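For reference, a minimal sketch of how such a table could be set up (assuming PostgreSQL 13+, where xid8, pg_current_xact_id() and pg_current_snapshot() are available; names are illustrative):

CREATE TABLE document (
    id    bigint PRIMARY KEY,                             -- may come from a producer's preallocated pool
    txid  xid8   NOT NULL DEFAULT pg_current_xact_id(),   -- id of the inserting transaction
    body  jsonb  NOT NULL
);

-- Supports the polling query's filtering and ORDER BY txid, id:
CREATE INDEX document_txid_id_idx ON document (txid, id);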

Postgres: incrementing a counter field

How does one increment a field in a database such that even if a thousand connections to the database try to increment it at once, it will always be 1000 at the end (if started from zero)?
I mean, so that no two connections increment the same number, resulting in a lost increment.
How do you synchronize and make sure the data is consistent? Is a stored procedure a must for this? Database "locking"? How is that done?
What you're looking for is a Postgres SEQUENCE.
You call nextval('sequence_name') to get the next number in the sequence.
According to the docs,
sequences are designed with concurrency in mind:
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used, even if the transaction that did the nextval later aborts. This means that aborted transactions might leave unused "holes" in the sequence of assigned values.
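For example (the sequence name here is just illustrative):

CREATE SEQUENCE hit_counter;

-- Each of the 1000 connections runs this; Postgres serializes access to the
-- sequence internally, so every call returns a distinct, increasing value:
SELECT nextval('hit_counter');

-- The last value handed out to the current session:
SELECT currval('hit_counter');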
EDIT:
If you're looking for a gapless sequence, in 2006 someone posted a solution to the PostgreSQL mailing list: http://www.postgresql.org/message-id/44E376F6.7010802#seaworthysys.com. It appears there's also a lengthy discussion on locking, etc.
The gapless-sequence question was also asked on SO even though there was never an accepted answer:
postgresql generate sequence with no gap

mongo: save documents in monotonically increasing sequence

I know mongo docs provide a way to simulate auto_increment.
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
But it is not concurrency-proof as guaranteed by say MySQL.
Consider the sequence of events:
client 1 obtains an index of 1
client 2 obtains an index of 2
client 2 saves doc with id=2
client 1 saves doc with id=1
In this case, it is possible to save a doc with id less than the current max that is already saved. For MySql, this can never happen since auto increment id is assigned by the server.
How do I prevent this? One way is to do optimistic looping at each client, but for many clients, this will result in heavy contention. Any other better way?
The use case for this is to ensure id is "forward-only". This is important for say a chat room where many messages are posted, and messages are paginated, I do not want new messages to be inserted in a previous page.
But it is not concurrency-proof as guaranteed by say MySQL.
That depends on the definition of concurrency-proof, but let's see
In this case, it is possible to save a doc with id less than the current max that is already saved.
That is correct, but it depends on the definition of simultaneity and monotonicity. Let's say your code snapshots the state of some other part of the system, then fetches the monotonic key, then performs an insert that may take a while. In that case, this apparently non-monotonic insert might actually be 'more monotonic' in the sense that index 2 was indeed captured at a later time, possibly reflecting a more recent state. In other words: does the time it took to insert really matter?
For MySql, this can never happen since auto increment id is assigned by the server.
That sounds like folklore. Most relational dbs offer fine-grained control over these features, since strict guarantees severely impact concurrency.
MySQL neither guarantees that there are no gaps, nor that a transaction with a higher AUTO_INCREMENT id isn't visible to other readers before a transaction that acquired a lower AUTO_INCREMENT value has committed, unless you keep a table-level lock, which severely impacts concurrency.
For gaplessness, consider a transaction rollback of the first of two concurrent inserts. Does the second insert now get a new id assigned while it's being committed? No - from the InnoDB documentation:
You may see gaps in the sequence of values assigned to the AUTO_INCREMENT column if you roll back transactions that have generated numbers using the counter. (see end of 14.6.5.5.1, "Traditional InnoDB Auto-Increment Locking")
and
In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are “lost”
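To make the rollback case concrete, here is a small single-session sketch (the table is made up) that leaves exactly such a hole:

CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, v INT) ENGINE=InnoDB;

START TRANSACTION;
INSERT INTO t (v) VALUES (1);   -- reserves auto-increment id 1
ROLLBACK;                       -- the row is gone, but id 1 is not handed out again

INSERT INTO t (v) VALUES (2);   -- gets id 2; id 1 remains a permanent gap
SELECT id FROM t;               -- returns only 2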
Also, you're completely ignoring the problem of replication, where sequences lead to even more trouble:
Thus, table-level locks held until the end of a statement make INSERT statements using auto-increment safe for use with statement-based replication. However, those locks limit concurrency and scalability when multiple transactions are executing insert statements at the same time. (see 14.6.5.5.2 "Configurable InnoDB Auto-Increment Locking")
The sheer length of the documentation of the InnoDB behavior is a reminder of the true complexity of making apparently simple guarantees in a concurrent system. Yes, monotonicity of inserts is possible with table-level locks, but hardly desirable. If you take a distributed view of the system, things get worse, because we can't even be sure of the counter value in partition mode...

DB2 Read committed without locking?

We have a transaction that is modifying a record. The transaction must call a web service, rolling back the transaction if the service fails (so it can't commit it beforehand). Because the record is modified, the client app has a lock on it. However, the web service must retrieve that record to get information from it as part of its processing. Bam, deadlock.
We use WebSphere, which, for reasons that boggle my mind, defaults to the repeatable read isolation level. We knocked it down to read_committed, thinking that this would retrieve the row without taking a lock. In our dev environment it seemed to work, but in staging we're getting deadlocks.
I'm not asking why it behaved differently, we probably made a mistake somewhere. Nor am I asking about the specifics of the web service example above, because obviously this same thing could happen elsewhere.
But based on reading the docs, it seems like read_committed DOES acquire a shared lock during read, and as a result will wait for an exclusive lock held by another transaction (in this case the client app). But I don't want to go to read_uncommitted isolation level because I don't want dirty reads. Is there a less extreme solution? I need some middle ground where I can perform reads without any lock-waiting, and retrieve only committed data.
Is there such a goldilocks solution? Not too deadlock-y, not too dirty-read-y? If not in isolation levels, maybe some modifier I can tack onto my SQL? Anything?
I assume you are talking about JDBC isolation levels, not DB2 ones. The difference between read_committed (cursor stability in DB2) and repeatable_read (read stability) is how long the share locks are kept: repeatable_read keeps every lock that satisfied the predicates, while read_committed only keeps the lock until the next row that matches the predicate is found.
Have you compared the plans? If the plans are different you may end up with different behaviour.
Are there any lock escalations occurring?
Have you tried CURRENTLY_COMMITTED (assuming you are on 9.7+)?
Before currently committed there were the following registry settings: DB2_SKIPINSERTED, DB2_EVALUNCOMMITTED and DB2_SKIPDELETED.
The lowest isolation level that reads committed rows is read committed.
Usually, you process rows in a DB2 database like this:
1. Read the database row with no read locks (ordinary SELECT with read committed).
2. Process the data so you have a row with the changed values.
3. Read the database row again, this time with an update lock (SELECT ... FOR UPDATE).
4. Check whether the database row from step 1 matches the database row from step 3.
5. If the rows match, update the database row.
6. If the rows don't match, release the update lock and go back to step 2.
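A rough sketch of that cycle in SQL (DB2 LUW syntax; table, column and values are illustrative):

-- Step 1: ordinary read, no update lock kept (read committed / cursor stability):
SELECT balance FROM account WHERE id = 42;

-- Step 2: compute the new value in the application.

-- Step 3: read the row again, this time acquiring an update lock:
SELECT balance FROM account WHERE id = 42 FOR UPDATE;

-- Steps 4-6: if the value from step 3 still matches step 1, update and commit;
-- otherwise ROLLBACK (releasing the lock) and repeat from step 2:
UPDATE account SET balance = 150 WHERE id = 42;
COMMIT;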

Is there any way to insert data into an SQLite table sequentially?

I want to insert some data into an SQLite table with one column for string values and another column for a sequence number.
The SQLite documentation says that AUTOINCREMENT does not guarantee sequential insertion.
And I do not want to keep track of the previously inserted sequence number.
Is there any way to store data sequentially, without keeping track of the previously inserted row?
The short answer is that you're right: the autoincrement documentation makes it clear that an INTEGER PRIMARY KEY AUTOINCREMENT will be constantly increasing, though, as you point out, not necessarily sequentially so. So you obviously have to either modify your code so it's not contingent on strictly sequential values (which is probably the right course of action), or maintain your own sequential identifier yourself. I'm sure that's not the answer you're looking for, but I think it's the practical reality of the situation.
Short answer: Stop worrying about gaps in AUTOINCREMENT id sequences. They are inevitable when dealing with transactional databases.
Long answer:
SQLite cannot guarantee that AUTOINCREMENT will always increase by one, and the reason for this is transactions.
Say you have 2 database connections that started 2 parallel transactions at almost the same time. The first one acquires some AUTOINCREMENT id, which becomes the previously used value +1. One tick later, the second transaction acquires the next id, which is now +2. Now imagine that the first transaction rolls back for some reason (it encounters an error, the code decides to abort it, the program crashes, etc.). After that, the second transaction commits with id +2, creating a gap in the id numbering.
Now, what if the number of such parallel transactions is higher than 2? You cannot predict it, and you also cannot tell currently running transactions to reuse ids that went unused for whatever reason.
If you insert data sequentially into your SQLite database, it will be stored sequentially.
From the documentation: the automatically generated ROWIDs are guaranteed to be monotonically increasing.
So, for example, if you wanted to have a table for Person, you could use the following command to create a table with autoincrement:
CREATE TABLE person (personID INTEGER PRIMARY KEY AUTOINCREMENT, personName TEXT);
Link: http://www.sqlite.org/autoinc.html
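For instance (a quick illustration using the table above), the generated personIDs come back in insertion order:

INSERT INTO person (personName) VALUES ('alice');
INSERT INTO person (personName) VALUES ('bob');
SELECT personID, personName FROM person ORDER BY personID;
-- returns 1|alice, 2|bob: each new row gets a higher personID than any earlier row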