Postgres Sequence out of sync - postgresql

I'm running a multi-master setup with bucardo and postgres.
I'm finding that some of my table sequences are getting out of sync with each other, particularly the auto-incremented id columns.
example:
db1 - table1
INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
The id of the new row is 1
db2 - table1
INSERT INTO distributors (did, dname) VALUES (DEFAULT, 'XYZ Widgets')
The id of the new row is 1
The id of the new row on db2 should be 2, because Bucardo has replicated the data from db1, but db2's auto-increment is based on:
nextval('oauth_sessions_id_seq'::regclass)
And if we check "oauth_sessions_id_seq", we see the last value is 0.
phew... Make sense?
Anyway, can I do either of the following?
1. Replicate the session tables with Bucardo, so each DB's session data is shared?
2. Manipulate the default auto-increment function above to take into account the max existing items in the table?
If you have any better ideas, please feel free to throw them in. If you have questions, just ask. Thanks for any help.

You are going to have to change your id generation method, because there is no Bucardo solution according to this comment in the FAQ:
Can Bucardo replicate DDL?
No, Bucardo relies on triggers, and Postgres does not yet provide DDL triggers or triggers on its system tables.
Since Bucardo uses triggers, it cannot "see" the sequence changes, only the data in tables, which it replicates. Sequences are interesting objects that do not support triggers, but you can manually update them. I suppose you could add something like the code below before the INSERT, but there still might be issues.
SELECT setval('oauth_sessions_id_seq', (SELECT MAX(did) FROM distributors));
See this question for more information.
I am not fully up on all the issues involved, but you could perform the maximum calculation manually and do the insert operation in a re-try loop. I doubt it will work if you are actually doing inserts on both DBs and allowing Bucardo to replicate, but if you can guarantee that only one DB updates at a time, then you could try something like an UPSERT retry loop. See this post for more info. The "guts" of the loop might look like this:
INSERT INTO distributors (did, dname)
VALUES ((SELECT max(did)+1 FROM distributors), 'XYZ Widgets');
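For illustration only, here is a minimal sketch of such a retry loop wrapped in a plpgsql function (the function name is made up, and it assumes did has a primary key or unique constraint so a collision raises unique_violation):
CREATE OR REPLACE FUNCTION insert_distributor(p_dname text) RETURNS integer AS $$
DECLARE
    new_id integer;
BEGIN
    LOOP
        BEGIN
            -- Compute the candidate id from the current table contents.
            INSERT INTO distributors (did, dname)
            VALUES ((SELECT coalesce(max(did), 0) + 1 FROM distributors), p_dname)
            RETURNING did INTO new_id;
            RETURN new_id;
        EXCEPTION WHEN unique_violation THEN
            -- Another session took that id; loop and recompute.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
This still serializes inserts on the max(did) calculation, so treat it as a stopgap rather than a proper id generation scheme.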

Irrespective of the DB (PostgreSQL, Oracle, etc.), a sequence is created for each table whose primary key is auto-generated.
Most sequences go out of sync whenever a huge import of data happens or someone manually modifies the table's sequence.
Solution: The only way to set the sequence back is to take the max value of the table's primary key and set the sequence's next value to it.
The query below lists all the sequences created in your DB schema:
SELECT c.relname FROM pg_class c WHERE c.relkind = 'S';
Then take the max of the primary key and set the sequence just past it:
SELECT MAX(the_primary_key) FROM the_table;
SELECT setval('the_primary_key_sequence', (SELECT MAX(the_primary_key) FROM the_table) + 1);
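If many sequences are affected, something like the following sketch can reset every serial column's sequence in the public schema in one pass. It relies on pg_get_serial_sequence and assumes plain, unquoted table names; adjust the schema name to your setup:
DO $$
DECLARE
    rec record;
BEGIN
    FOR rec IN
        SELECT c.relname AS tbl, a.attname AS col,
               pg_get_serial_sequence(c.relname, a.attname) AS seq
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
        JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum > 0 AND NOT a.attisdropped
        WHERE n.nspname = 'public'
          AND c.relkind = 'r'
          AND pg_get_serial_sequence(c.relname, a.attname) IS NOT NULL
    LOOP
        -- Set the sequence to max(column) + 1; the third argument (false) makes
        -- the next nextval() return exactly that value.
        EXECUTE format('SELECT setval(%L, coalesce((SELECT max(%I) + 1 FROM %I), 1), false)',
                       rec.seq, rec.col, rec.tbl);
    END LOOP;
END $$;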

Related

How do you manage UPSERTs on PostgreSQL partitioned tables for unique constraints on columns outside the partitioning strategy?

This question is for a database using PostgreSQL 12.3; we are using declarative partitioning and ON CONFLICT against the partitioned table is possible.
We had a single table representing application event data from client activity. Therefore, each row has fields client_id int4 and a dttm timestamp field. There is also an event_id text field and a project_id int4 field which together formed the basis of a composite primary key. (While rare, it was possible for two event records to have the same event_id but different project_id values for the same client_id.)
The table became non-performant, and we saw that queries most often targeted a single client in a specific timeframe. So we shifted the data into a partitioned table: first by LIST (client_id) and then each partition is further partitioned by RANGE(dttm).
We are running into problems shifting our upsert strategy to work with this new table. We used to perform a query of INSERT INTO table SELECT * FROM staging_table ON CONFLICT (event_id, project_id) DO UPDATE ...
But since the columns that determine uniqueness (event_id and project_id) are not part of the partitioning strategy (dttm and client_id), I can't do the same thing with the partitioned table. I thought I could get around this by building UNIQUE indexes on each partition on (project_id, event_id) but the ON CONFLICT is still not firing because there is no such unique index on the parent table (there can't be, since it doesn't contain all partitioning columns). So now a single upsert query appears impossible.
I've found two solutions so far but both require additional changes to the upsert script that seem like they'd be less performant:
I can still do an INSERT INTO table_partition_subpartition ... ON CONFLICT (event_id, project_id) DO UPDATE ... (see the sketch after this question), but that requires explicitly determining the name of the partition for each row instead of just INSERT INTO table ... once for the entire dataset.
I could implement the "old way" UPSERT procedure: https://www.postgresql.org/docs/9.4/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE but this again requires looping through all rows.
Is there anything else I could do to retain the cleanliness of a single, one-and-done INSERT INTO table SELECT * FROM staging_table ON CONFLICT () DO UPDATE ... while still keeping the partitioning strategy as-is?
Edit: if it matters, concurrency is not an issue here; there's just one machine executing the UPSERT into the main table from the staging table on a schedule.
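To make option 1 above concrete, the per-partition statement would look roughly like this (the partition name and the payload column are hypothetical; it works because the UNIQUE index on (event_id, project_id) exists on that partition):
INSERT INTO table_client42_2020q2 (event_id, project_id, client_id, dttm, payload)
SELECT event_id, project_id, client_id, dttm, payload
FROM staging_table
WHERE client_id = 42
  AND dttm >= '2020-04-01' AND dttm < '2020-07-01'
ON CONFLICT (event_id, project_id)
DO UPDATE SET payload = EXCLUDED.payload;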

PostgreSQL shows wrong sequence details in the table's information schema

Recently I saw a strange scenario with my PostgreSQL DB. The information schema of my database is showing a different sequence name than the one actually allocated for the column of my table.
The issue is:
I have a table tab_1
id name
1 emp1
2 emp2
3 emp3
Previously the id column (integer) of the table was an auto-generated field whose sequence number was produced at run time by JPA (sequence name: tab_1_seq).
We made a change and updated the table's id column to bigserial, so the sequence is now maintained at the column level (newly allocated sequence: tab_1_temp_seq) and no longer handled by JPA.
After this change everything worked fine for a few months, and then we faced an error: "the sequence "tab_1_temp_seq" is not yet defined in this session".
On analyzing the issue I found that there is a mismatch between the sequences allocated for the table.
The table structure showed the sequence as tab_1_temp_seq, while information_schema listed the table with the old sequence, tab_1_seq.
I am not sure what has really triggered this to happen, as we are not managing our database system. If you have faced any issues like this, kindly let me know its root cause.
Queries:
SELECT table_name, column_name, column_default from information_schema.columns where table_name = 'tab_1';
result :
table_name column_name column_default
tab_1 id nextval('tab_1_seq'::regclass)
Below are the details found in the table structure/properties:
id nextval('tab_1_temp_seq'::regclass)
name varChar
Perhaps you are suffering from data corruption, but it is most likely that you are suffering from bad tools to visualize your database objects. Whatever program shows you the “table structure/properties” might be confused.
To find out the truth (which DEFAULT value PostgreSQL uses), run:
SELECT pg_get_expr(adbin, adrelid)
FROM pg_attrdef
WHERE adrelid = 'tab_1'::regclass;
This is also what information_schema.columns will show, but I added the naked query for clarity.
This DEFAULT value will be used whenever the INSERT statement either doesn't specify the id column or fills it with the special value DEFAULT.
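For example, both of the following insert forms would draw the id from whatever sequence that DEFAULT expression names (row values are made up):
INSERT INTO tab_1 (name) VALUES ('emp4');               -- id column omitted, DEFAULT applies
INSERT INTO tab_1 (id, name) VALUES (DEFAULT, 'emp5');  -- explicit DEFAULT keyword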
Perhaps the confusion is also caused by different programs that may set default values in their own way. The way I showed above is PostgreSQL's way, but nothing can keep a third-party tool from using its own sequence to fill the id.

Move truncated records to another table in Postgresql 9.5

Problem is following: remove all records from one table, and insert them to another.
I have a table that is partitioned by a date criterion. To avoid routing each record to its partition one by one, I'm collecting the data in one table and periodically moving it to another. The copied records then have to be removed from the first table. I'm using a DELETE query with RETURNING, but the side effect is that autovacuum has a lot of work to do to clean up the mess in the original table.
I'm trying to achieve the same effect (copy and remove records), but without creating additional work for vacuum mechanism.
As I'm removing all rows (a DELETE without WHERE conditions), I was thinking about TRUNCATE, but it does not support a RETURNING clause. Another idea was to somehow configure the table to automatically remove the tuple from the page on delete, without waiting for vacuum, but I could not find whether that is possible.
Can you suggest something, that I could use to solve my problem?
You need to use something like:
--Open your transaction
BEGIN;
--Prevent concurrent writes, but allow concurrent data access
LOCK TABLE table_a IN SHARE MODE;
--Copy the data from table_a to table_b, you can also use CREATE TABLE AS to do this
INSERT INTO table_b SELECT * FROM table_a;
--Empty table_a
TRUNCATE TABLE table_a;
--Commit and release the lock
COMMIT;

Alter the column type over several tables

In a PostgreSQL db I'm working on, half of the tables have one particular column, always named the same, that is of type varchar(5). The size became a bit too restricting and I want to change it to varchar(10).
The number of tables in my particular case is actually very manageable to do it by hand. But I was wondering how one could script this with a query for larger dbs. It generally should be possible in just a few steps.
1. Identify all the tables in the schema, then (?) filter by whether the column is present.
2. Create ALTER TABLE statements for each table found.
I have some idea about how to write a query that identifies all tables in the schema. But I wouldn't know how to filter them. And if I didn't filter them, I assume the generated alter table statements would break.
Would be great if someone could share their knowledge on this.
Thanks to Abelisto for providing some guidance. Eventually, this is how I did it.
First, I created a query that in turn creates the ALTER TABLE statements. MyDB and MyColumn need to reflect actual values.
SELECT
    'ALTER TABLE '||columns.table_name||' ALTER COLUMN '||columns.column_name||' TYPE varchar(10);'
FROM
    information_schema.columns
WHERE
    columns.table_catalog = 'MyDB' AND
    columns.table_schema = 'public' AND
    columns.column_name = 'MyColumn';
Then it was just a matter of executing the output as a new query. All done.
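As an alternative to copy-pasting the output, if you are on psql 9.6 or later you can append \gexec so psql executes each generated statement directly (a sketch with the same filters as above):
SELECT 'ALTER TABLE '||quote_ident(columns.table_name)||
       ' ALTER COLUMN '||quote_ident(columns.column_name)||' TYPE varchar(10)'
FROM information_schema.columns
WHERE columns.table_catalog = 'MyDB' AND
      columns.table_schema = 'public' AND
      columns.column_name = 'MyColumn' \gexec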

How do I INSERT and SELECT data with partitioned tables?

I set up a set of partitioned tables per the docs at http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html
CREATE TABLE t (year int, a int);
CREATE TABLE t_1980 ( CHECK (year = 1980) ) INHERITS (t);
CREATE TABLE t_1981 ( CHECK (year = 1981) ) INHERITS (t);
CREATE RULE t_ins_1980 AS ON INSERT TO t WHERE (year = 1980)
DO INSTEAD INSERT INTO t_1980 VALUES (NEW.year, NEW.a);
CREATE RULE t_ins_1981 AS ON INSERT TO t WHERE (year = 1981)
DO INSTEAD INSERT INTO t_1981 VALUES (NEW.year, NEW.a);
From my understanding, if I INSERT INTO t (year, a) VALUES (1980, 5), it will go to t_1980, and if I INSERT INTO t (year, a) VALUES (1981, 3), it will go to t_1981. But my understanding seems to be incorrect. First, I can't understand the following from the docs:
"There is currently no simple way to specify that rows must not be inserted into the master table. A CHECK (false) constraint on the master table would be inherited by all child tables, so that cannot be used for this purpose. One possibility is to set up an ON INSERT trigger on the master table that always raises an error. (Alternatively, such a trigger could be used to redirect the data into the proper child table, instead of using a set of rules as suggested above.)"
Does the above mean that in spite of setting up the CHECK constraints and the RULEs, I also have to create TRIGGERs on the master table so that the INSERTs go to the correct tables? If that were the case, what would be the point of the db supporting partitioning? I could just set up the separate tables myself? I inserted a bunch of values into the master table, and those rows are still in the master table, not in the inherited tables.
Second question. When retrieving the rows, do I select from the master table, or do I have to select from the individual tables as needed? How would the following work?
SELECT year, a FROM t WHERE year IN (1980, 1981);
Update: Seems like I have found the answer to my own question
"Be aware that the COPY command ignores rules. If you are using COPY to insert data, you must copy the data into the correct child table rather than into the parent. COPY does fire triggers, so you can use it normally if you create partitioned tables using the trigger approach."
I was indeed using COPY FROM to load data, so RULEs were being ignored. Will try with TRIGGERs.
Definitely try triggers.
If you think you want to implement a rule, don't (the only exception that comes to mind is updatable views). See this great article by depesz for more explanation there.
In reality, Postgres only supports partitioning on the reading side of things. You're going to have to set up the method of insertion into partitions yourself - in most cases with TRIGGERs. Depending on the needs and applications, it can sometimes be faster to teach your application to insert directly into the partitions.
When selecting from partitioned tables, you can indeed just SELECT ... WHERE ... on the master table as long as your CHECK constraints are properly set up (they are in your example) and the constraint_exclusion parameter is set correctly.
For 8.4:
SET constraint_exclusion = partition;
For < 8.4:
SET constraint_exclusion = on;
All this being said, I actually really like the way Postgres does it and use it myself often.
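For the tables in the question, a trigger-based redirect might look roughly like this (a sketch in the style of the PostgreSQL docs; note the RETURN NULL, which keeps the row out of the master table but also breaks RETURNING, as a later answer points out):
CREATE OR REPLACE FUNCTION t_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.year = 1980 THEN
        INSERT INTO t_1980 VALUES (NEW.*);
    ELSIF NEW.year = 1981 THEN
        INSERT INTO t_1981 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for year %', NEW.year;
    END IF;
    RETURN NULL;  -- the row is not stored in the master table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_insert_redirect
    BEFORE INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE t_insert_trigger();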
Does the above mean that in spite of setting up the CHECK constraints and the RULEs, I also have to create TRIGGERs on the master table so that the INSERTs go to the correct tables?
Yes. Read point 5 (section 5.9.2)
If that were the case, what would be the point of the db supporting partitioning? I could just set up the separate tables myself?
Basically: the INSERTs into the child tables must be done explicitly (either by creating TRIGGERs, or by specifying the correct child table in the query). But the partitioning is transparent for SELECTs, and (given the storage and indexing advantages of this schema) that's the point. (Besides, because the partitioned tables are inherited, the schema is inherited from the parent, hence consistency is enforced.)
Triggers are definitely better than rules.
Today I played with partitioning a materialized view table and ran into a problem with the trigger solution.
Why?
I'm using RETURNING, and the current solution returns NULL :)
But here's the solution which works for me - correct me if I'm wrong.
1. I have 3 tables which are inserted with some data, and there's a view (let's call it viewfoo) which contains the data that needs to be materialized.
2. The insert into the last table has a trigger which inserts into the materialized view table via INSERT INTO matviewtable SELECT * FROM viewfoo WHERE recno = NEW.recno;
That works fine, and I'm using RETURNING recno; (recno is of SERIAL type - a sequence).
The materialized view (table) needs to be partitioned because it's huge, and according to my tests SELECTs are at least 10x faster that way.
Problems with partitioning:
* The current trigger solution RETURNs NULL - so I cannot use RETURNING recno.
(Current trigger solution = the trigger explained on the depesz page.)
Solution:
I changed the trigger on my 3rd table to NOT insert into the materialized view table (that table is the parent of the partitioned tables), and instead created a new trigger which inserts into the partitioned table directly from the 3rd table, and that trigger RETURNs NEW.
The materialized view table is automagically updated and RETURNING recno works fine.
I'll be glad if this helps anybody.