Postgres looped table (start sequence from 1 and rewrite old rows) - postgresql

I want a table that holds logs and is looped: when the sequence reaches its last key, it starts over from one and overwrites the old logs.
I can do this if I mark the sequence as CYCLE and issue requests like this:
INSERT INTO logs (initiator_id, target_id, action, message) VALUES ($1, $2, $3, $4)
ON CONFLICT (id) DO UPDATE SET target_id=$2, initiator_id=$1, action=$3, message=$4;
What I don't like is that I need to repeat the same assignments for the conflict case (line 2).
What I'm curious about: is this a good way to do it, or are there other, more practical ways to handle tables that need to be cleared, let's say, every 3 months?
Or maybe there is a way to generate a unique id without using sequences?
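
On the repetition in line 2: inside ON CONFLICT ... DO UPDATE, PostgreSQL exposes the row that was proposed for insertion as EXCLUDED, so the parameters do not have to be listed twice. Below is a minimal sketch of that idea. It runs against Python's built-in sqlite3 module (SQLite 3.24 or newer accepts the same clause) so it can be tried without a server. The CAPACITY constant and the write_log helper are hypothetical, and the slot arithmetic only stands in for a CYCLE sequence.

```python
import sqlite3

CAPACITY = 3  # hypothetical ring size; stands in for the sequence's MAXVALUE

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE logs (
           id INTEGER PRIMARY KEY,
           initiator_id INTEGER, target_id INTEGER,
           action TEXT, message TEXT)"""
)

def write_log(conn, seq, initiator_id, target_id, action, message):
    """Write entry number `seq` (1, 2, 3, ...) into its ring slot (hypothetical helper)."""
    slot = (seq - 1) % CAPACITY + 1  # emulates a CYCLE sequence: 1..CAPACITY, then 1 again
    conn.execute(
        """INSERT INTO logs (id, initiator_id, target_id, action, message)
           VALUES (?, ?, ?, ?, ?)
           ON CONFLICT (id) DO UPDATE SET
               initiator_id = excluded.initiator_id,
               target_id    = excluded.target_id,
               action       = excluded.action,
               message      = excluded.message""",
        (slot, initiator_id, target_id, action, message),
    )

for seq in range(1, 6):  # five entries written into three slots
    write_log(conn, seq, 10, 20, "login", f"entry {seq}")

rows = conn.execute("SELECT id, message FROM logs ORDER BY id").fetchall()
```

After the loop, slots 1 and 2 hold the fourth and fifth entries, while slot 3 still holds the third one.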

For now I decided to use a two-column primary key (date, initiator_id), as a log entry is always one action at a time.

Related

Postgres nextval thread safety

We are experiencing a strange problem with triggers and a sequence generator in Postgres.
We have a trigger on a table that inserts audit log information (some business data before the update, some business data after the update, ...).
The problem we have is that the order of the generated sequence values doesn't match the timestamp and the data at the time of the capture.
We are sure that, functionally, the row with the id (7 228 916) happened after the row with the id (7 229 471), as the timestamps confirm it.
But the sequence values are not in the correct order.
At the end of the trigger, here is the insert statement:
INSERT INTO AUDIT_TABLE (ID, ROW_ID, CREATED_AT, ...)
VALUES (nextval('AUDIT_TABLE_SEQ'), NEW.PK_, current_timestamp, ...);
Does someone please have an explanation for this problem?
Thank you for your help
We are expecting to have the sequences to be ordered with timestamps of the capture of events.

INSERT INTO .. SELECT causing possible race condition?

INSERT INTO A
SELECT * FROM B WHERE timestamp > (SELECT max(timestamp) FROM A);
or, written differently:
WITH selection AS
(SELECT * FROM B WHERE timestamp > (SELECT max(timestamp) FROM A))
INSERT INTO A SELECT * FROM selection;
If these queries run multiple times simultaneously, is it possible that I will end up with duplicated rows in A?
How does Postgres process these queries? Is it one or multiple?
If it is multiple queries (find max(timestamp)[1], select[2] then insert[3]) I can imagine this will cause duplicated rows.
If that is correct, would wrapping it in BEGIN/END (a transaction) help?
Yes, that might result in duplicate values.
A single statement sees a consistent view of the data in all tables as of the point in time when the statement started.
Wrapping that single statement into a transaction won't change that (a single statement is always executed atomically, regardless of the number of sub-queries involved).
The statement will never see uncommitted data from other transactions (which is the root cause why you can wind up with duplicate values).
The only safe way to avoid duplicate values is to create a unique constraint (or index) on that column. In that case the INSERT would result in an error if such a value already exists.
If you want to avoid the error, use INSERT ... ON CONFLICT.
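
A minimal sketch of that combination (unique constraint plus ON CONFLICT DO NOTHING), again using Python's built-in sqlite3 so it runs without a server. The id and ts columns and the sample rows are assumptions made for illustration. The two "sessions" are simulated by reading the selection twice before inserting anything, which is the interleaving that produces duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE b (id INTEGER, ts INTEGER);
    CREATE TABLE a (id INTEGER PRIMARY KEY, ts INTEGER);  -- PRIMARY KEY acts as the unique constraint
    INSERT INTO b VALUES (1, 100), (2, 200), (3, 300);
    """
)

# coalesce covers the empty-table case, where max(ts) is NULL and the comparison matches nothing.
select_new = "SELECT id, ts FROM b WHERE ts > (SELECT coalesce(max(ts), 0) FROM a)"

# Both "sessions" read before either one writes, so both pick the same rows.
seen_by_session_1 = conn.execute(select_new).fetchall()
seen_by_session_2 = conn.execute(select_new).fetchall()

insert = "INSERT INTO a (id, ts) VALUES (?, ?) ON CONFLICT (id) DO NOTHING"
conn.executemany(insert, seen_by_session_1)
conn.executemany(insert, seen_by_session_2)  # conflicting rows are skipped, no error is raised

count = conn.execute("SELECT count(*) FROM a").fetchone()[0]
```

Without the primary key on a, the second batch would be stored a second time, leaving six rows instead of three.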
This depends on the isolation level set in your database.
According to the Postgres documentation, the default is Read Committed, which means that each statement sees only data committed before that statement began. If two such statements read before either one writes, you will get duplicate data in the table.
If you want to avoid having duplicate entries, you have a few options.
Try using the isolation level Serializable
Apply a unique index on a suitable column of table A. The timestamp is not a great candidate, as you might legitimately have two rows with the same timestamp. The id column is probably a better option.
Take a lock at the application level before performing such a query.

Are race conditions possible with PostgreSQL auto-increment

Are there any conditions under which records created in a table using a typical auto-increment field would be available for read out of sequence?
For instance, could a record with value 10 ever appear in the result of a select query when the record with value 9 is not yet visible to a select query?
The purpose for my question is… I want to know if it is reliable to use the maximum value retrieved from one query as the lower bound to identify previously unretrieved values in a later query, or could that potentially miss a row?
If that kind of race condition is possible under some circumstances, are any of the isolation levels that can be used for the select queries immune to that problem?
Yes, and good on you for thinking about it.
You can trivially demonstrate this with three concurrent psql sessions, given some table
CREATE TABLE x (
seq serial primary key,
n integer not null
);
then
SESSION 1                      SESSION 2                      SESSION 3
BEGIN;
                               BEGIN;
INSERT INTO x(n) VALUES (1);
                               INSERT INTO x(n) VALUES (2);
                               COMMIT;
                                                              SELECT * FROM x;
COMMIT;
                                                              SELECT * FROM x;
It is not safe to assume that for any generated value n, all lower generated values have been used by already-committed or already-aborted xacts. They might be in progress and commit after you see n.
I don't think isolation levels really help you here. There's no mutual dependency for SERIALIZABLE to detect.
This is partly why logical decoding was added, so you can get a consistent stream in commit order.
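
To make the consequence for the "maximum value as lower bound" idea concrete, here is a toy model in plain Python. It is not PostgreSQL: the committed set simply stands for the rows visible to a reader, and the poll helper is a hypothetical name for the reader's query. It replays the three-session timeline above.

```python
def poll(committed_ids, last_seen):
    """Return the ids above the high-water mark, plus the new mark (hypothetical helper)."""
    new_ids = sorted(i for i in committed_ids if i > last_seen)
    return new_ids, (new_ids[-1] if new_ids else last_seen)

committed = set()  # ids currently visible to readers
last_seen = 0
delivered = []

# Session 1 has drawn id 1 but not committed; session 2 draws id 2 and commits.
committed.add(2)
new_ids, last_seen = poll(committed, last_seen)  # session 3, first SELECT
delivered += new_ids

# Session 1 commits now: id 1 becomes visible after id 2 was already consumed.
committed.add(1)
new_ids, last_seen = poll(committed, last_seen)  # session 3, second SELECT
delivered += new_ids

missed = sorted(committed - set(delivered))
```

The reader ends up with only row 2 delivered. Row 1 is committed but lies below the high-water mark, so no later poll will pick it up.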

In-order sequence generation

Is there a way to generate some kind of in-order identifier for a table records?
Suppose that we have two threads doing queries:
Thread 1:
begin;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;
Thread 2:
begin;
insert into table1(id, value) values (nextval('table1_seq'), 'world');
commit;
It's entirely possible (depending on timing) that an external observer would see the (2, 'world') record appear before the (1, 'hello').
That's fine, but I want a way to get all the records in the 'table1' that appeared since the last time the external observer checked it.
So, is there any way to get the records in the order they were inserted? Maybe OIDs can help?
No. Since there is no natural order of rows in a database table, all you have to work with is the values in your table.
Well, there are the Postgres specific system columns cmin and ctid you could abuse to some degree.
The tuple ID (ctid) contains the file block number and position in the block for the row. So this represents the current physical ordering on disk. Later additions will have a bigger ctid, normally. Your SELECT statement could look like this
SELECT *, ctid -- save ctid from last row in last_ctid
FROM tbl
WHERE ctid > last_ctid
ORDER BY ctid
ctid has the data type tid. Example: '(0,9)'::tid
However, it is not stable as a long-term identifier, since VACUUM, any concurrent UPDATE, or some other operations can change the physical location of a tuple at any time. For the duration of a transaction it is stable, though. And if you are just inserting and nothing else, it should work locally for your purpose.
I would add a timestamp column with default now() in addition to the serial column ...
I would also let a column default populate your id column (a serial or IDENTITY column). That retrieves the number from the sequence at a later stage than explicitly fetching and then inserting it, thereby minimizing (but not eliminating) the window for a race condition - the chance that a lower id would be inserted at a later time. Detailed instructions:
Auto increment table column
What you want is to force transactions to commit (making their inserts visible) in the same order that they did the inserts. As far as other clients are concerned the inserts haven't happened until they're committed, since they might roll back and vanish.
This is true even if you don't wrap the inserts in an explicit begin / commit. Transaction commit, even if done implicitly, still doesn't necessarily run in the same order in which the row itself was inserted. It's subject to operating system CPU scheduler ordering decisions, etc.
Even if PostgreSQL supported dirty reads this would still be true. Just because you start three inserts in a given order doesn't mean they'll finish in that order.
There is no easy or reliable way to do what you seem to want that will preserve concurrency. You'll need to do your inserts in order on a single worker - or use table locking as Tometzky suggests, which has basically the same effect since only one of your insert threads can be doing anything at any given time.
You can use advisory locking, but the effect is the same.
Using a timestamp won't help, since you don't know if for any two timestamps there's a row with a timestamp between the two that hasn't yet been committed.
You can't rely on an identity column where you read rows only up to the first "gap" because gaps are normal in system-generated columns due to rollbacks.
I think you should step back and look at why you have this requirement and, given this requirement, why you're using individual concurrent inserts.
Maybe you'll be better off doing small-block batched inserts from a single session?
If you mean that every query if it sees world row it has to also see hello row then you'd need to do:
begin;
lock table table1 in share update exclusive mode;
insert into table1(id, value) values (nextval('table1_seq'), 'hello');
commit;
This share update exclusive mode is the weakest lock mode which is self-exclusive — only one session can hold it at a time.
Be aware that this will not make this sequence gap-less — this is a different issue.
We found another solution with recent PostgreSQL servers, similar to @erwin's answer but with txid.
When inserting rows, instead of using a sequence, insert txid_current() as row id. This ID is monotonically increasing on each new transaction.
Then, when selecting rows from the table, add to the WHERE clause id < txid_snapshot_xmin(txid_current_snapshot()).
txid_snapshot_xmin(txid_current_snapshot()) corresponds to the transaction index of the oldest still-open transaction. Thus, if row 20 is committed before row 19, it will be filtered out because transaction 19 will still be open. When the transaction 19 is committed, both rows 19 and 20 will become visible.
When no transaction is opened, the snapshot xmin will be the transaction id of the currently running SELECT statement.
The returned transaction IDs are 64-bit: the higher 32 bits are an epoch and the lower 32 bits are the actual ID.
Here is the documentation of these functions: https://www.postgresql.org/docs/9.6/static/functions-info.html#FUNCTIONS-TXID-SNAPSHOT
Credits to tux3 for the idea.
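
The filtering rule can be illustrated with a toy model in plain Python. This is not PostgreSQL and does not call the txid functions. The ToyDb class and its methods are hypothetical, it assumes one row per transaction with the transaction id used as the row id, and it treats "no open transaction" as "xmin is the next id to be assigned".

```python
class ToyDb:
    """Toy model: one row per transaction, row id = transaction id."""

    def __init__(self, first_txid=1):
        self.next_txid = first_txid
        self.open_txids = set()
        self.committed_rows = {}  # txid -> value

    def begin(self):
        txid = self.next_txid
        self.next_txid += 1
        self.open_txids.add(txid)
        return txid

    def commit(self, txid, value):
        self.open_txids.discard(txid)
        self.committed_rows[txid] = value

    def snapshot_xmin(self):
        # Oldest still-open transaction, or the next id to be assigned if none is open.
        return min(self.open_txids, default=self.next_txid)

    def read_since(self, last_seen):
        # Only rows strictly below xmin are returned, so nothing older can still appear later.
        xmin = self.snapshot_xmin()
        return sorted(t for t in self.committed_rows if last_seen < t < xmin)


db = ToyDb(first_txid=19)
t19 = db.begin()
t20 = db.begin()

db.commit(t20, "world")
first_read = db.read_since(0)   # transaction 19 is still open, so row 20 is held back

db.commit(t19, "hello")
second_read = db.read_since(0)  # both rows are released together, in id order
```

The first read returns nothing, and the second returns rows 19 and 20, so a reader that remembers the highest id it has seen never skips a row in this model.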

SQLite - a smart way to remove and add new objects

I have a table in my database, and I want each row to have a unique id and the rows to be numbered sequentially.
For example: I have 10 rows, each with an id, starting from 0 and ending at 9. When I remove a row from the table, let's say row number 5, a "hole" appears. Afterwards I add more data, but the "hole" is still there.
It is important for me to know the exact number of rows and to have data at every row id, so that I can access my table arbitrarily.
Is there a way to do this in SQLite? Or do I have to manage removing and adding data manually?
Thank you in advance,
Ilya.
It may be worth considering whether you really want to do this. Primary keys usually should not change through the lifetime of the row, and you can always find the total number of rows by running:
SELECT COUNT(*) FROM table_name;
That said, the following trigger should "roll down" every ID number whenever a delete creates a hole:
CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
BEGIN
UPDATE table_name SET id=id-1 WHERE id > OLD.id;
END;
I tested this on a sample database and it appears to work as advertised. If you have the following table:
id name
1 First
2 Second
3 Third
4 Fourth
And delete where id=2, afterwards the table will be:
id name
1 First
2 Third
3 Fourth
This trigger can take a long time and has very poor scaling properties (it takes longer for each row you delete and each remaining row in the table). On my computer, deleting 15 rows at the beginning of a 1000 row table took 0.26 seconds, but this will certainly be longer on an iPhone.
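
For anyone who wants to try the trigger, here is a small script using Python's built-in sqlite3 module and an in-memory database. The column types are an assumption, since the answer does not show the table definition, and only the single-row delete from the example is exercised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT);  -- column types are assumed
    INSERT INTO table_name VALUES (1, 'First'), (2, 'Second'), (3, 'Third'), (4, 'Fourth');

    CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
    BEGIN
        UPDATE table_name SET id = id - 1 WHERE id > OLD.id;
    END;
    """
)

# Deleting one row fires the trigger, which shifts every higher id down by one.
conn.execute("DELETE FROM table_name WHERE id = 2")
rows = conn.execute("SELECT id, name FROM table_name ORDER BY id").fetchall()
```

The remaining rows come back renumbered 1 to 3, matching the table shown above.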
I strongly suggest that you rethink your design. In my opinion you're asking for trouble in the future (e.g. if you create another table and want to have relations between the tables).
If you want to know the number of rows just use:
SELECT count(*) FROM table_name;
If you want to access rows in the order of id, just define this field with a PRIMARY KEY constraint:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
...
);
and get rows using ORDER BY clause with ASC or DESC:
SELECT * FROM table_name ORDER BY id ASC;
SQLite creates an index for the primary key field, so this query is fast.
I think that you would be interested in reading about LIMIT and OFFSET clauses.
The best source of information is the SQLite documentation.
If you don't want to take Stephen Jennings's very clever but performance-killing approach, just query a little differently. Instead of:
SELECT * FROM mytable WHERE id = ?
Do:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET ?
Note that OFFSET is zero-based, so you may need to subtract 1 from the variable you're indexing in with.
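
A short sketch of that positional lookup with Python's built-in sqlite3 module. The table contents and the nth_row helper are made up for illustration. It mirrors the question's setup of ten rows with ids 0 to 9 and row 5 removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)", [(i, f"row {i}") for i in range(10)]
)
conn.execute("DELETE FROM mytable WHERE id = 5")  # leaves a hole in the ids

def nth_row(conn, position):
    """Return the row at a zero-based position in id order, ignoring holes (hypothetical helper)."""
    return conn.execute(
        "SELECT id, name FROM mytable ORDER BY id LIMIT 1 OFFSET ?", (position,)
    ).fetchone()

total = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
sixth = nth_row(conn, 5)  # position 5 now lands on id 6, because id 5 is gone
```

The ids keep their hole, but positions 0 to total - 1 are always valid, which gives the arbitrary access the question asks for without rewriting any keys.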
If you want to reclaim deleted row ids, the VACUUM command or the auto_vacuum pragma may be what you seek:
http://www.sqlite.org/faq.html#q12
http://www.sqlite.org/lang_vacuum.html
http://www.sqlite.org/pragma.html#pragma_auto_vacuum