Postgres eventually consistency Id - postgresql

Taken a table like this:
create table mytable(id SERIAL, largeImage bytea);
Imagine 2 process (A, B) simultaneously do inserts:
Process A arrives first but contains a very large file.
After that Process B arrives but very small file.
I suppose that inserts will work in paralell and consistency order (id assignament) is done because nextval is assigned on arrives moment. (Maybe I am wrong).
Because the Process B file is small than process A, the Process B is faster thant Process A to save on disk. (my guess)
My questions:
Process A will be assigned with ID=1 a, d Process B will get Id=2. Thats correct?
It is posible (rare but possible) that "select * from mytable" will return only the ID=2 because save disk operation of Process A (Id=1) has not finished already?
Thanks in advance.

Process A will be assigned with ID=1 and Process B will get Id=2. Thats correct?
Probably yes, but I have not tested it.
It is posible (rare but possible) that select * from mytable will return only the ID=2 because save disk operation of Process A (Id=1) has not finished already?
This is possible for a variety of reasons and not only for a short period of time. Imagine the process that got the ID=1 crashes before it finishes the transaction. You will then never see a row with ID=1 in your table.
If you catch yourself assuming that SERIAL means "continuous", take a step back and redesign your data model. SERIAL rather means "automatic value that is guaranteed to be unique". It is a good idea for an ID, it is generally not appropriate for numbering.

Related

Try to understand MVCC

I'm trying to understand MVCC and can't get it. For example.
Transaction1 (T1) try to read some data row. In that same time T2 update the same row.
The flow of transaction is
T1 begin -> T2 begin -> T2 commit -> T1 commit
So the first transaction get it snapshot of database and returns to user result on which he is gonna build other calculation. But as I understand, customer get the old data value? As I understand MVCC, after T1 transaction begins, that transaction doesn't know, that some other transaction change data. So if now user doing some calculation after that (without DB involved), he is doing it on wrong data? Or I'm not right and first transaction have some mechanisms to know that row was changed?
Let's now change the transaction flow.
T2beg -> T1beg -> T2com -> T1com
In 2PL user get the newer version of data because of locks (T1 must wait before exclusive lock released). But in case of MVCC it still will be the old data, as I understand the postgresql MVCC model. So I can get stale data in exchange for speed. Am I right? Or I miss something?
Thank you
Yes, it can happen that you read some old data from the database (that a concurrent transaction has modified), perform a calculation based on this and store “outdated” data in the database.
This is not a problem, because rather than the actual order in which transactions happen, the logical order is more relevant:
If T1 reads some data, then T2 modifies the data, then T1 modifies the database based on what it read, you can say that T1 logically took place before T2, and there is no inconsistency.
You only get an anomaly if T1 and T2 modify the same data: T1 reads data, T2 modifies the data, then T1 modifies the same data based on what it read. This anomaly is called a “lost update”.
Lost updates can only occur with the weakest (which is the default) isolation level, READ COMMITTED.
If you want better isolation from concurrent activity, you need to use at least REPEATABLE READ isolation. Then T1 would receive a serialization error when it tries to update the data, and you would have to repeat the transaction. On the second attempt, it would read the new data, and everything will be consistent.
The flow of transaction is next T1 begin -> T2 begin -> T2 commit -> T1 commit. So the first transaction get it snapshot of database and returns to user result on which he is gonna build other calculation. But as I understand, customer get the old data value?
Unless T1 and T2 try to update the same row, this could usually be reordered to be the same as: T1 begin; T1 commit; T2 begin; T2 commit;
In other words, any undesirable business outcome that could be achieved by T2 changing the data while T1 was making a decision, could have also occurred by T2 changing the data immediately after T1 made the same decision.
Each transaction can only see data that is younger or equal to that transaction's ID.
When transaction 1 reads data, it marks the read time stamp of that data to transaction 1.
If transaction 2 tries to read the same data, it checks the read timestamp of the data, if the read timestamp for the data is less than transaction 2, then transaction 2 is aborted because 1 < 2 -- 1 got there before us and they must finish before us.
At commit time, we also check if the read timestamp of the data is less than the committing transaction. If it is, we abort the transaction and restart with a new transaction ID.
We also check if write timestamps are less than our transaction. Younger transaction wins.
There is an edge case where a younger transaction can abort if an older transaction gets ahead of the younger transaction.
I have actually implemented MVCC in Java. (see transaction, runner and mvcc code)

Postgres - Bulk transferring of data from one table to another

I need to transfer a large amount of data (several million rows) from one table to another. So far I’ve tried doing this….
INSERT INTO TABLE_A (field1, field2)
SELECT field1, field2 FROM TABLE_A_20180807_BCK;
This worked (eventually) for a table with about 10 million rows in it (took 24 hours). The problem is that I have several other tables that need the same process applied and they’re all a lot larger (the biggest is 20 million rows). I have attempted a similar load with a table holding 12 million rows and it failed to complete in 48 hours so I had to cancel it.
Other issues that probably affect performance are 1) TABLE_A has a field based on an auto-generated sequence, 2) TABLE_A has an AFTER INSERT trigger on it that parses each new record and adds a second record to TABLE_B
A number of other threads have suggested doing a pg_dump of TABLE_A_20180807_BCK and then load the data back into TABLE_A. I’m not sure a pg_dump would actually work for me because I’m only interested in couple of fields from TABLE_A, not the whole lot.
Instead I was wondering about the following….
Export to a CSV file…..
COPY TABLE_A_20180807_BCK (field1,field2) to 'd:\tmp\dump\table_a.dump' DELIMITER ',' CSV;
Import back into the desired table….
COPY TABLE_A(field1,field2) FROM 'd:\tmp\dump\table_a.dump' DELIMITER ',' CSV
Is the export/import method likely to be any quicker – I’d like some guidance on this before I start on another job that may take days to run, and may not even work any better! The obvious answer of "just try it and see" isn't really an option, I can't afford more downtime!
(this is follow-on question from this, if any background details are required)
Update....
I don't think there is any significant problems with the trigger. Under normal circumstances records are INPUTed into TABLE_A at a rate of about 1000/sec (including trigger time). I think the issue is likely to be size of the transaction, under normal circumstances records are inserted into in blocks of 100 records per INSERT, the statement shown above attempts to add 10 million records in a single transaction, my guess is that this is the problem, but I've no way of knowing if it really is, or if there's a suitable work around (or if the export/import method I've proposed would be any quicker)
Maybe I should have emphasized this earlier, every insert into TABLE_A fires a trigger that adds record to TABLE_B. It's the data that's in TABLE_B that's the final objective, so disabling the trigger isn't an option! This whole problem came about because I accidentally disabled the trigger for a few days, and the preferred solution to the question 'how to run a trigger on existing rows' seemed to be 'remove the rows and add them back again' - see the original post (link above) for details.
My current attempt involves using the COPY command with a WHERE clause to split the contents of TABLE_A_20180807_BCK into a dozen small files and then re-load them one at a time. This may not give me any overall time saving, but although I can't afford 24 hours of continuous downtime, I can afford 6 hours of downtime for 4 nights.
Preparation (if you have access and can restart your server) set checkpoint_segments to 32 or perhaps more. This will reduce the frequency and number of checkpoints during this operation. You can undo it when you're finished. This step is not totally necessary but should speed up writes considerably.
edit postgresql.conf and set checkpoint_segments to 32 or maybe more
Step 1: drop/delete all indexes and triggers on table A.
EDIT: Step 1a
alter table_a set unlogged;
(repeat step 1 for each table you're inserting into)
Step 2. (unnecessary if you do one table at a time)
begin transaction;
Step 3.
INSERT INTO TABLE_A (field1, field2)
SELECT field1, field2 FROM TABLE_A_20180807_BCK;
(repeat step 3 for all tables being inserted into)
Step 4. (unnecessary if you do one table at a time)
commit;
Step 5 re-enable indexes and triggers on all tables.
Step 5a.
Alter table_a set logged;

How to list all locked rows of a table?

My application uses pessimistic locking. When a user opens the form for update a record, the application executes this query (table names are exemplary):
begin;
select *
from master m
natural join detail d
where m.master_id = 123456
for update nowait;
The query locks one master row and several (to several dozen) detail rows. Transaction is open until a user confirms or cancels updates.
I need to know what rows (at least master rows) are locked. I have excavated the documentation and postgres wiki without success.
Is it possible to list all locked rows?
PostgreSQL 9.5 added a new option to FOR UPDATE that provides a straightforward way to do this.
SELECT master_id
FROM master
WHERE master_id NOT IN (
SELECT master_id
FROM master
FOR UPDATE SKIP LOCKED);
This acquires locks on all the not-currently-locked rows, so think through whether that's a problem for you, especially if your table is large. If nothing else, you'll want to avoid doing this in an open transaction. If your table is huge you can apply additional WHERE conditions and step through it in chunks to avoid locking everything at once.
Is it possible? Probably yes, but it is the Greatest Mystery of Postgres. I think you would need to write your own extension for it (*).
However, there is an easy way to work around the problem. You can use very nice Postgres feature, advisory locks. Two arguments of the function pg_try_advisory_lock(key1 int, key2 int) you can interpret as: table oid (key1) and row id (key2). Then
select pg_try_advisory_lock(('master'::regclass)::integer, 123456)
locks row 123456 of table master, if it was not locked earlier. The function returns boolean.
After update the lock has to be freed:
select pg_advisory_unlock(('master'::regclass)::integer, 123456)
And the nicest thing, list of locked rows:
select classid::regclass, objid
from pg_locks
where locktype = 'advisory'
Advisory locks may be complementary to regular locks or you can use them independently. The second option is very temptive, as it can significantly simplify the code. But it should be applied with caution because you have to make sure that all updates (deletes) on the table in all applications are performed with this locking.
(*) Mr. Tatsuo Ishii did it (I did not know about it, have just found).

PostgreSql 9.3 -> several consumers + locking

There are 2 tables:
CREATE TABLE "job"
(
"id" SERIAL,
"processed" BOOLEAN NOT NULL,
PRIMARY KEY("id")
);
CREATE TABLE "job_result"
(
"id" SERIAL,
"job_id" INT NOT NULL,
PRIMARY KEY("id")
);
There are several consumers, that do the following (sequentially):
1) start transaction
2) search for job not processed yet
3) process it
4) save result ( set processed field to true and insert into job_result )
5) commit
Questions:
1) Is the following sql code correct, so no job could be processed more than one time?
2) If it is correct, can it be rewritten in more clean way ? ( I am confused about "UPDATE job SET id = id" )
UPDATE job
SET id = id
WHERE id =
(
SELECT MIN(id)
FROM job
WHERE processed = false AND pg_try_advisory_lock(id) = true
)
AND processed = false
RETURNING *
Thanks.
with job_update as (
update job
set processed = true
where id = (
select id
from (
select min(id)
from job
where processed = false
) s
for update
)
returning id
)
insert into job_result (job_id)
select id
from job_update
Question 1
To answer your first question, the processing can be done twice if the database crashes between step 3 and step 5. When the server/service recovers, it will be processed again.
If the processing step only computes results which are sent to the database in the same connection as the queuing queries, then no one will be able to see that it was processed twice, as the results of the first time were never visible.
However if the processing step talks to the outside world, such as sending an email or charging a credit card, that action will be taken twice and both will be visible. The only way to avoid that is to use two-phase commits for all dealings with the outside world. Also, if the worker keeps two connections to the database and is not disciplined about their use, then that can also lead to visible double-processing.
Question 2
For your second question, there are several ways it can be made cleaner.
Most importantly, you will want to change the advisory lock from session-duration to transaction-duration. If you leave it at session-duration, long-lived workers will be become slower and slower and will use more and more memory as time goes on. This is safe to do, because in the query as written you are checking the processed flag in both the sub-select and in the update itself.
You could make the table structure itself cleaner. You could have one table with both the processed flag and the results field, instead of two tables. Or if you want two tables, you could remove the processed flag from the job table and signify completion simply be deleting the completed record from the table, rather than updating the processed flag.
Assuming you don't want to make such changes, you could still clean up the SQL without changing the table structure or semantics. You do need to lock the tuple to avoid a race condition with the release of the advisory lock. But rather than using the degenerate id=id construct (which some future maintainer is likely to remove, because it is not intuitively obvious why it is even there), you might as well just set the tuple to its final state by setting processed=true, and then removing that second update step from your step 4. This is safe to do because you do not issue an intermediate commit, so no one can see the tuple in this intermediate state of having processed=true but not yet really being processed.
UPDATE job
SET processed = true
WHERE id =
(
SELECT MIN(id)
FROM job
WHERE processed = false AND pg_try_advisory_xact_lock(id) = true
)
AND processed = false
RETURNING id
However, this query still has the unwanted feature that often someone looking for the next job to process will find no rows. That is because it suffered a race condition which was then filtered out by the outer processed=false condition. This is OK as long as your workers are prepared to retry, but it leads to needless contention in the database. This can be improved by making the inner select lock the tuple when it first encounters it by switching from a min(id) to a LIMIT 1 query:
UPDATE job
SET processed=true
WHERE id =
(
SELECT id
FROM job
WHERE processed = false AND pg_try_advisory_xact_lock(id) = true
order by id limit 1 for update
)
RETURNING id
If PostgreSQL allowed ORDER BY and LIMIT on UPDATES, then you could avoid the subselect altogether, but that is currently implemented (maybe it will be in 9.5).
For good performance (or even to avoid memory errors), you will need an index like:
create index on job (id) where processed = false;

In-order sequence generation

Is there a way to generate some kind of in-order identifier for a table records?
Suppose that we have two threads doing queries:
Thread 1:
begin;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;
Thread 2:
begin;
insert into table1(id, value) values (nextval('table1_seq'), 'world');
commit;
It's entirely possible (depending on timing) that an external observer would see the (2, 'world') record appear before the (1, 'hello').
That's fine, but I want a way to get all the records in the 'table1' that appeared since the last time the external observer checked it.
So, is there any way to get the records in the order they were inserted? Maybe OIDs can help?
No. Since there is no natural order of rows in a database table, all you have to work with is the values in your table.
Well, there are the Postgres specific system columns cmin and ctid you could abuse to some degree.
The tuple ID (ctid) contains the file block number and position in the block for the row. So this represents the current physical ordering on disk. Later additions will have a bigger ctid, normally. Your SELECT statement could look like this
SELECT *, ctid -- save ctid from last row in last_ctid
FROM tbl
WHERE ctid > last_ctid
ORDER BY ctid
ctid has the data type tid. Example: '(0,9)'::tid
However it is not stable as long-term identifier, since VACUUM or any concurrent UPDATE or some other operations can change the physical location of a tuple at any time. For the duration of a transaction it is stable, though. And if you are just inserting and nothing else, it should work locally for your purpose.
I would add a timestamp column with default now() in addition to the serial column ...
I would also let a column default populate your id column (a serial or IDENTITY column). That retrieves the number from the sequence at a later stage than explicitly fetching and then inserting it, thereby minimizing (but not eliminating) the window for a race condition - the chance that a lower id would be inserted at a later time. Detailed instructions:
Auto increment table column
What you want is to force transactions to commit (making their inserts visible) in the same order that they did the inserts. As far as other clients are concerned the inserts haven't happened until they're committed, since they might roll back and vanish.
This is true even if you don't wrap the inserts in an explicit begin / commit. Transaction commit, even if done implicitly, still doesn't necessarily run in the same order that the row its self was inserted. It's subject to operating system CPU scheduler ordering decisions, etc.
Even if PostgreSQL supported dirty reads this would still be true. Just because you start three inserts in a given order doesn't mean they'll finish in that order.
There is no easy or reliable way to do what you seem to want that will preserve concurrency. You'll need to do your inserts in order on a single worker - or use table locking as Tometzky suggests, which has basically the same effect since only one of your insert threads can be doing anything at any given time.
You can use advisory locking, but the effect is the same.
Using a timestamp won't help, since you don't know if for any two timestamps there's a row with a timestamp between the two that hasn't yet been committed.
You can't rely on an identity column where you read rows only up to the first "gap" because gaps are normal in system-generated columns due to rollbacks.
I think you should step back and look at why you have this requirement and, given this requirement, why you're using individual concurrent inserts.
Maybe you'll be better off doing small-block batched inserts from a single session?
If you mean that every query if it sees world row it has to also see hello row then you'd need to do:
begin;
lock table table1 in share update exclusive mode;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;
This share update exclusive mode is the weakest lock mode which is self-exclusive — only one session can hold it at a time.
Be aware that this will not make this sequence gap-less — this is a different issue.
We found another solution with recent PostgreSQL servers, similar to #erwin's answer but with txid.
When inserting rows, instead of using a sequence, insert txid_current() as row id. This ID is monotonically increasing on each new transaction.
Then, when selecting rows from the table, add to the WHERE clause id < txid_snapshot_xmin(txid_current_snapshot()).
txid_snapshot_xmin(txid_current_snapshot()) corresponds to the transaction index of the oldest still-open transaction. Thus, if row 20 is committed before row 19, it will be filtered out because transaction 19 will still be open. When the transaction 19 is committed, both rows 19 and 20 will become visible.
When no transaction is opened, the snapshot xmin will be the transaction id of the currently running SELECT statement.
The returned transaction IDs are 64-bits, the higher 32 bits are an epoch and the lower 32 bits are the actual ID.
Here is the documentation of these functions: https://www.postgresql.org/docs/9.6/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT
Credits to tux3 for the idea.