PostgreSQL Gap lock by SELECT ... FOR UPDATE

Here is a query that relies on a gap lock in MySQL/InnoDB:
SELECT id, time, count
FROM table_a
WHERE time
BETWEEN DATE_SUB(NOW(), INTERVAL 24 HOUR)
AND NOW()
FOR UPDATE
It locks the time range and returns a recent record if one is present (within the last 24 hours).
If not, the session still holds a lock over the last 24-hour range so that it can safely insert a new record.
Is it possible to get the same gap lock over the entire 24-hour range (even if there are no records) in PostgreSQL?

The way to do that in PostgreSQL is to use the SERIALIZABLE isolation level for all transactions.
Then you don't need the FOR UPDATE at all. PostgreSQL won't prevent rows from being inserted in the gap, but if two transactions both read and write values in the same gap simultaneously, one of them will get a serialization error and have to redo the transaction (on the second try, it will find the gap not empty).
The concept at work here is serializability: it is acceptable if someone else inserts into the gap without reading (that transaction is logically after the one with your SELECT). But if two transactions both find the gap empty and then insert something, that would create an anomaly that is prevented by SERIALIZABLE.
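A minimal sketch of that idea, reusing table_a from the question; the INSERT column list is only illustrative:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT id, time, count
FROM table_a
WHERE time BETWEEN now() - INTERVAL '24 hours' AND now();
-- if both sessions find the range empty and both insert:
INSERT INTO table_a (time, count) VALUES (now(), 1);
COMMIT;
-- one commit succeeds; the other fails with SQLSTATE 40001
-- ("could not serialize access ...") and has to retry the whole transaction.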

Related

High speed single row inserts with PostgreSQL & TimescaleDB

I have a case with a TimescaleDB hypertable looking approximately like this:
CREATE TABLE data (
pool_id INTEGER NOT NULL,
ts TIMESTAMP NOT NULL,
noise_err DECIMAL,
noise_val DECIMAL,
signal_err DECIMAL,
signal_val DECIMAL,
high_val DECIMAL,
low_val DECIMAL,
CONSTRAINT data_pid_fk FOREIGN KEY (pool_id) REFERENCES pools (id) ON DELETE CASCADE
);
CREATE UNIQUE INDEX data_pts_idx ON data (pool_id, ts);
SELECT create_hypertable('data', 'ts', 'pool_id', 100);
There are ~1000 pools, data contains >1 year of minute records for every pool, and quite a few analytical queries working with the last 3 to 5 days of data. New data is coming with arbitrary delay: 10ms to 30s depending on the pool.
Now the problem: I need to run analytical queries as fast as possible after the new record has been received, hence I can't insert in batches, and I need to speed up single row insertions.
I've run timescaledb-tune, then turned off synchronous commits (synchronous_commit = off), played with unlogged table mode, and tried to disable the auto vacuum, which didn't help much.
The best insert time I get is ~37 ms, and it degrades to 110 ms once concurrent inserts start.
What else except removing indexes/constraints can I do to speed up single row inserts?
First, why use timescaledb for this table in the first place? What are you getting from it that is worth this slowdown?
Second, you have 5200 partitions per year's worth of data. That is approaching an unmanageable number of partitions.
I question the requirement for analytical queries that need to see the latest split second of data.
Anyway, the way to speed up single row inserts is:
Set synchronous_commit to off.
But be aware that that means data loss of around half a second of committed transactions in the event of a crash! If that is unacceptable, play with commit_siblings and commit_delay; that will also reduce the number of WAL flushes.
Use prepared statements. With single-row inserts, planning time is significant (see the sketch after this list).
Don't use unlogged tables unless you don't mind if you lose the data after a crash.
Don't disable autovacuum.
Increase max_wal_size to get no more checkpoints than is healthy.
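A minimal sketch of the prepared-statement point, using the data table from the question but only three of its columns; the statement name ins_data is arbitrary:
PREPARE ins_data (INTEGER, TIMESTAMP, DECIMAL) AS
    INSERT INTO data (pool_id, ts, noise_val) VALUES ($1, $2, $3);
-- later executions reuse the parsed statement; after a few runs PostgreSQL
-- can switch to a cached generic plan instead of planning every insert
EXECUTE ins_data (42, now(), 0.07);
In an application, the same effect usually comes from the driver's server-side prepared statements rather than explicit PREPARE/EXECUTE.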

Postgres - Bulk transferring of data from one table to another

I need to transfer a large amount of data (several million rows) from one table to another. So far I’ve tried doing this….
INSERT INTO TABLE_A (field1, field2)
SELECT field1, field2 FROM TABLE_A_20180807_BCK;
This worked (eventually) for a table with about 10 million rows in it (took 24 hours). The problem is that I have several other tables that need the same process applied and they’re all a lot larger (the biggest is 20 million rows). I have attempted a similar load with a table holding 12 million rows and it failed to complete in 48 hours so I had to cancel it.
Other issues that probably affect performance are 1) TABLE_A has a field based on an auto-generated sequence, 2) TABLE_A has an AFTER INSERT trigger on it that parses each new record and adds a second record to TABLE_B
A number of other threads have suggested doing a pg_dump of TABLE_A_20180807_BCK and then loading the data back into TABLE_A. I'm not sure a pg_dump would actually work for me because I'm only interested in a couple of fields from TABLE_A, not the whole lot.
Instead I was wondering about the following….
Export to a CSV file…..
COPY TABLE_A_20180807_BCK (field1,field2) to 'd:\tmp\dump\table_a.dump' DELIMITER ',' CSV;
Import back into the desired table….
COPY TABLE_A(field1,field2) FROM 'd:\tmp\dump\table_a.dump' DELIMITER ',' CSV
Is the export/import method likely to be any quicker – I’d like some guidance on this before I start on another job that may take days to run, and may not even work any better! The obvious answer of "just try it and see" isn't really an option, I can't afford more downtime!
(this is follow-on question from this, if any background details are required)
Update....
I don't think there are any significant problems with the trigger. Under normal circumstances, records are inserted into TABLE_A at a rate of about 1000/sec (including trigger time). I think the issue is likely the size of the transaction: under normal circumstances records are inserted in blocks of 100 per INSERT, whereas the statement shown above attempts to add 10 million records in a single transaction. My guess is that this is the problem, but I have no way of knowing if it really is, or whether there is a suitable workaround (or whether the export/import method I've proposed would be any quicker).
Maybe I should have emphasized this earlier, every insert into TABLE_A fires a trigger that adds record to TABLE_B. It's the data that's in TABLE_B that's the final objective, so disabling the trigger isn't an option! This whole problem came about because I accidentally disabled the trigger for a few days, and the preferred solution to the question 'how to run a trigger on existing rows' seemed to be 'remove the rows and add them back again' - see the original post (link above) for details.
My current attempt involves using the COPY command with a WHERE clause to split the contents of TABLE_A_20180807_BCK into a dozen small files and then re-load them one at a time. This may not give me any overall time saving, but although I can't afford 24 hours of continuous downtime, I can afford 6 hours of downtime for 4 nights.
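For what it's worth, a sketch of that split-into-chunks idea: COPY ... TO takes either a bare table or a query, so the filter has to go into a subquery. The id column and the range boundaries here are placeholders:
COPY (SELECT field1, field2
      FROM TABLE_A_20180807_BCK
      WHERE id BETWEEN 1 AND 1000000)   -- hypothetical chunk boundaries
TO 'd:\tmp\dump\table_a_chunk1.dump' DELIMITER ',' CSV;

COPY TABLE_A (field1, field2)
FROM 'd:\tmp\dump\table_a_chunk1.dump' DELIMITER ',' CSV;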
Preparation (if you have access and can restart your server): set checkpoint_segments to 32 or perhaps more. This will reduce the frequency and number of checkpoints during this operation. You can undo it when you're finished. This step is not strictly necessary, but it should speed up writes considerably.
edit postgresql.conf and set checkpoint_segments to 32 or maybe more
Step 1: drop/delete all indexes and triggers on table A (a sketch follows after these steps).
EDIT: Step 1a
alter table table_a set unlogged;
(repeat step 1 for each table you're inserting into)
Step 2. (unnecessary if you do one table at a time)
begin transaction;
Step 3.
INSERT INTO TABLE_A (field1, field2)
SELECT field1, field2 FROM TABLE_A_20180807_BCK;
(repeat step 3 for all tables being inserted into)
Step 4. (unnecessary if you do one table at a time)
commit;
Step 5. Re-enable indexes and triggers on all tables.
Step 5a.
alter table table_a set logged;
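A hedged sketch of steps 1 and 5; the index name and column are placeholders, and whether disabling triggers applies here is your call, since the question needs the TABLE_B trigger to fire eventually:
-- Step 1: drop indexes and silence user triggers for the duration of the load
DROP INDEX table_a_field1_idx;
ALTER TABLE table_a DISABLE TRIGGER USER;

-- Step 5: put everything back afterwards
ALTER TABLE table_a ENABLE TRIGGER USER;
CREATE INDEX table_a_field1_idx ON table_a (field1);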

Postgres 9.4 detects Deadlock when read-modify-write on single table

We have an application with a simple table
given_entity{
UUID id;
TimeStamp due_time;
TimeStamp process_time;
}
This is a Spring Boot (1.2.5.RELEASE) application that uses spring-data-jpa 1.2.5.RELEASE with hibernate-4.3.10.FINAL as the JPA provider.
We have 5 instances of this application, each with a scheduler running every 2 seconds that queries the database for rows with a due_time within the last 2 minutes that have not yet been processed:
SELECT * FROM given_entity
WHERE process_time is null and due_time between now() and NOW() - INTERVAL '2 minutes'
FOR UPDATE
Requirement is each row of above table gets successfully processed by exactly one of application instances.
Then the application instance processes these rows and updates their process_time field in one transaction.
This may or may not take more than 2 seconds, which is scheduler interval.
Also, we don't have any index on this table other than the PK index.
A second point worth noting is that these instances might also insert rows into this table when called separately by clients.
Problem: in the logs I see this message from postgresql (rarely but it happens)
ERROR: deadlock detected
Detail: Process 10625 waits for ShareLock on transaction 25382449; blocked by process 10012.
Process 10012 waits for ShareLock on transaction 25382448; blocked by process 12238.
Process 12238 waits for AccessExclusiveLock on tuple (1371,45) of relation 19118 of database 19113; blocked by process 10625.
Hint: See server log for query details.
Where: while locking tuple (1371,45) in relation "given_entity"
Question:
How does this happen?
I checked postgresql locks and searched internet. I didn't find anything that says deadlock is possible on only one simple table.
I also couldn't reproduce this error using test.
Process A tries to lock row 1 followed by row 2. Meanwhile, process B tries to lock row 2 then row 1. That's all it takes to trigger a deadlock.
The problem is that the row locks are acquired in an indeterminate order, because the SELECT returns its rows in an indeterminate order. And avoiding this is just a matter of ensuring that all processes agree on an order when locking rows, i.e.:
SELECT * FROM given_entity
WHERE process_time is null and due_time between now() and NOW() - INTERVAL '2 minutes'
ORDER BY id
FOR UPDATE
In Postgres 9.5+, you can simply ignore any row which is locked by another process using FOR UPDATE SKIP LOCKED.
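A sketch of the 9.5+ variant, combining the deterministic ordering with SKIP LOCKED so each instance simply claims rows nobody else has locked:
SELECT * FROM given_entity
WHERE process_time IS NULL
  AND due_time BETWEEN now() - INTERVAL '2 minutes' AND now()  -- lower bound first
ORDER BY id
FOR UPDATE SKIP LOCKED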
This can easily happen.
There are probably several rows that satisfy the condition
due_time BETWEEN now() AND now() - INTERVAL '2 minutes'
so it can easily happen that the SELECT ... FOR UPDATE finds and locks one row and then is blocked locking the next row. Remember – for a deadlock it is not necessary that more than one table is involved, it is enough that more than one lockable resource is involved. In your case, those are two different rows in the given_entity table.
It may even be that the deadlock happens between two of your SELECT ... FOR UPDATE statements.
Since you say that there is none but the primary key index on the table, the query has to perform a sequential scan. In PostgreSQL, there is no fixed order for rows returned from a sequential scan. Rather, if two sequential scans run concurrently, the second one will “piggy-back” on the first and will start scanning the table at the current location of the first sequential scan.
You can check if that is the case by setting the parameter synchronize_seqscans to off and see if the deadlocks vanish. Another option would be to take a SHARE ROW EXCLUSIVE lock on the table before you run the statement.
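Two quick sketches of those options; synchronize_seqscans is a regular configuration parameter, and SHARE ROW EXCLUSIVE is an ordinary table lock mode:
-- Option 1: per session, stop sequential scans from piggy-backing on each other
SET synchronize_seqscans = off;

-- Option 2: serialize the competing workers explicitly
BEGIN;
LOCK TABLE given_entity IN SHARE ROW EXCLUSIVE MODE;
SELECT * FROM given_entity
WHERE process_time IS NULL
  AND due_time BETWEEN now() - INTERVAL '2 minutes' AND now()
FOR UPDATE;
-- ... process the rows, set process_time ...
COMMIT;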
Switch on hibernate batch updates in your application.properties
hibernate.jdbc.batch_size=100
hibernate.order_updates=true
hibernate.order_inserts=true
hibernate.jdbc.fetch_size = 400

How to know the actual count in a huge table with high insertion rate

Let's say I have a table with exactly 10M rows. I need to know the exact count of the rows. A COUNT request takes 5 seconds. Let's say exactly 100 rows are added to the table every second.
If I ask the DB the count at the moment it has exactly 10,000,000 rows, and this request takes exactly 5 seconds to complete, will the result be 10000000, 10000500 or something between these two values?
If you are not running the statement inside an explicit transaction, then the count it gives will be correct as-of the moment the statement started executing, so 10000000 not 10000500.
If you are running it inside a transaction, the exact behavior depends on the isolation level you use, and what happened previously in that transaction.
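A small sketch of the difference, using the my_big_table name from the queries below:
-- outside an explicit transaction: the result is as of the moment the statement started
SELECT count(*) FROM my_big_table;

-- under REPEATABLE READ, every count sees the snapshot taken by the first query
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM my_big_table;
SELECT count(*) FROM my_big_table;  -- same result, even if rows were inserted in between
COMMIT;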
Assuming your table has an auto-increment id column, you can execute this in a couple of milliseconds:
select max(id) - min(id)
from my_big_table
This assumes you don't have any gaps, which is the typical case.
Actually, you can know exactly the delta of how many rows are "missing" by running this once:
select max(id) - count(*) from my_big_table
And remembering the value. Unless you delete rows, this won't change (if you do delete rows, run it again).
Now that you know the delta, this will be accurate:
select max(id) - <delta>
from my_big_table
Which will be accurate and extremely fast, so you don't need to worry about the implications of slow queries.

In-order sequence generation

Is there a way to generate some kind of in-order identifier for a table's records?
Suppose that we have two threads doing queries:
Thread 1:
begin;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;
Thread 2:
begin;
insert into table1(id, value) values (nextval('table1_seq'), 'world');
commit;
It's entirely possible (depending on timing) that an external observer would see the (2, 'world') record appear before the (1, 'hello').
That's fine, but I want a way to get all the records in the 'table1' that appeared since the last time the external observer checked it.
So, is there any way to get the records in the order they were inserted? Maybe OIDs can help?
No. Since there is no natural order of rows in a database table, all you have to work with is the values in your table.
Well, there are the Postgres specific system columns cmin and ctid you could abuse to some degree.
The tuple ID (ctid) contains the file block number and position in the block for the row. So this represents the current physical ordering on disk. Later additions will have a bigger ctid, normally. Your SELECT statement could look like this
SELECT *, ctid -- save ctid from last row in last_ctid
FROM tbl
WHERE ctid > last_ctid
ORDER BY ctid
ctid has the data type tid. Example: '(0,9)'::tid
However, it is not stable as a long-term identifier, since VACUUM or any concurrent UPDATE or some other operations can change the physical location of a tuple at any time. For the duration of a transaction it is stable, though. And if you are just inserting and nothing else, it should work locally for your purpose.
I would add a timestamp column with default now() in addition to the serial column ...
I would also let a column default populate your id column (a serial or IDENTITY column). That retrieves the number from the sequence at a later stage than explicitly fetching and then inserting it, thereby minimizing (but not eliminating) the window for a race condition - the chance that a lower id would be inserted at a later time. Detailed instructions:
Auto increment table column
What you want is to force transactions to commit (making their inserts visible) in the same order that they did the inserts. As far as other clients are concerned the inserts haven't happened until they're committed, since they might roll back and vanish.
This is true even if you don't wrap the inserts in an explicit begin / commit. Transaction commit, even if done implicitly, still doesn't necessarily happen in the same order that the rows themselves were inserted. It's subject to operating system CPU scheduler ordering decisions, etc.
Even if PostgreSQL supported dirty reads this would still be true. Just because you start three inserts in a given order doesn't mean they'll finish in that order.
There is no easy or reliable way to do what you seem to want that will preserve concurrency. You'll need to do your inserts in order on a single worker - or use table locking as Tometzky suggests, which has basically the same effect since only one of your insert threads can be doing anything at any given time.
You can use advisory locking, but the effect is the same; a sketch follows at the end of this answer.
Using a timestamp won't help, since you don't know if for any two timestamps there's a row with a timestamp between the two that hasn't yet been committed.
You can't rely on an identity column where you read rows only up to the first "gap" because gaps are normal in system-generated columns due to rollbacks.
I think you should step back and look at why you have this requirement and, given this requirement, why you're using individual concurrent inserts.
Maybe you'll be better off doing small-block batched inserts from a single session?
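For completeness, a sketch of the advisory-lock variant mentioned above; the key 42 is an arbitrary placeholder, and the effect is the same single-writer serialization:
begin;
select pg_advisory_xact_lock(42);  -- only one session at a time gets past this call
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;  -- the advisory lock is released automatically at commit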
If you mean that every query that sees the 'world' row must also see the 'hello' row, then you'd need to do:
begin;
lock table table1 in share update exclusive mode;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;
This share update exclusive mode is the weakest lock mode which is self-exclusive — only one session can hold it at a time.
Be aware that this will not make this sequence gap-less — this is a different issue.
We found another solution with recent PostgreSQL servers, similar to Erwin's answer but using txid.
When inserting rows, instead of using a sequence, insert txid_current() as row id. This ID is monotonically increasing on each new transaction.
Then, when selecting rows from the table, add to the WHERE clause id < txid_snapshot_xmin(txid_current_snapshot()).
txid_snapshot_xmin(txid_current_snapshot()) corresponds to the transaction ID of the oldest still-open transaction. Thus, if row 20 is committed before row 19, it will be filtered out because transaction 19 will still be open. When transaction 19 is committed, both rows 19 and 20 will become visible.
When no transaction is opened, the snapshot xmin will be the transaction id of the currently running SELECT statement.
The returned transaction IDs are 64-bits, the higher 32 bits are an epoch and the lower 32 bits are the actual ID.
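A minimal sketch of this scheme, assuming a table t with a bigint id column and a reader that remembers the largest id it has already handed out (last_seen_id is a placeholder):
-- writer: use the current transaction id as the row id instead of a sequence
insert into t(id, value) values (txid_current(), 'hello');

-- reader: only rows below the oldest still-open transaction are safe to return
select *
from t
where id < txid_snapshot_xmin(txid_current_snapshot())
  and id > last_seen_id   -- placeholder for the reader's saved position
order by id;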
Here is the documentation of these functions: https://www.postgresql.org/docs/9.6/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT
Credits to tux3 for the idea.