Is there a way to start from max in a primary key in PostgreSQL if I import data from another db?

I have an old MS Access DB and I want to convert it to PostgreSQL. I found DBeaver very useful, but some operations have to be done by hand. This is the case with primary keys: you must set them manually, and I haven't found another way to do it. So I'm setting all this up in pgAdmin. The client uses this DB constantly, so I can only take the data on a holiday when the client is not working and import it into PostgreSQL. My goal is to have the database ready to receive the data from the production database. So far, in pgAdmin, I'm setting the primary key identity to "Always" and setting the start value to the last primary key number by hand. I don't think this is the right way to do it, and when I start importing the production data I don't want to set all of that by hand. How can I make the primary key start its autoincrement from the max of the existing IDs?

Using an identity column is the right way to go. Just set the sequence to a value safely above the current maximum; it doesn't matter if you lose a million sequence values.
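A minimal sketch of how to do that in SQL, assuming a hypothetical table "customers" with an identity column "id" (pg_get_serial_sequence also works for identity columns); run it once per table after the import:

SELECT setval(
    pg_get_serial_sequence('customers', 'id'),
    COALESCE((SELECT max(id) FROM customers), 0) + 1,
    false   -- with false, the next nextval() returns exactly this value
);

From then on the identity column hands out values above the imported maximum.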

Related

Why are the primary keys in my Postgres DB out of sync?

I developed an application using PostgreSQL + TimescaleDB + PostGIS where some primary keys repeatedly fall out of sync. Fixing this manually as described in How to reset postgres' primary key sequence when it falls out of sync? works, but I want to understand why this happens and how I can avoid it.
The affected tables are not managed by TimescaleDB and have no PostGIS-related columns.
My code contains only one insert statement per affected table, and it never sets primary keys directly.
I'm using SQLAlchemy for the inserts, but only the SQLAlchemy Expression Language layer, not the ORM.
Does anybody have an idea how to fix this issue?

Does Debezium support capturing Postgres schema change events?

Does Debezium support capturing Postgres schema changes like 'alter table xxx add/drop/alter column xxx'?
Seems like an old question, but in any case the short answer is yes. Check out the documentation here: https://debezium.io/documentation/reference/connectors/postgresql.html
With some exceptions:
The PostgreSQL connector retrieves schema information as part of the events sent by the logical decoding plug-in. However, the connector does not retrieve information about which columns compose the primary key. The connector obtains this information from the JDBC metadata (side channel). If the primary key definition of a table changes (by adding, removing or renaming primary key columns), there is a tiny period of time when the primary key information from JDBC is not synchronized with the change event that the logical decoding plug-in generates. During this tiny period, a message could be created with an inconsistent key structure. To prevent this inconsistency, update primary key structures as follows:
Put the database or an application into a read-only mode.
Let Debezium process all remaining events.
Stop Debezium.
Update the primary key definition in the relevant table.
Put the database or the application into read/write mode.
Restart Debezium.
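A hedged SQL sketch of steps 1, 4 and 5 of that procedure, assuming a hypothetical database "mydb" and a table "orders" whose primary key changes from (id) to (id, region); "orders_pkey" is just the default constraint name:

-- step 1: one way to approximate "read-only mode"
-- (only affects new transactions, and sessions can still override it)
ALTER DATABASE mydb SET default_transaction_read_only = on;
-- steps 2-3: let Debezium drain the remaining events, then stop the connector
-- step 4: change the primary key definition
ALTER TABLE orders DROP CONSTRAINT orders_pkey;
ALTER TABLE orders ADD PRIMARY KEY (id, region);
-- step 5: back to read/write, then restart the Debezium connector
ALTER DATABASE mydb SET default_transaction_read_only = off;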

Enforcing Foreign Key Constraint Over Table From pg_dump With --exclude-table-data

I'm currently working on dumping one of our customers' databases in a way that allows us to create new databases from that customer's basic structure, but without bringing along their private data.
So far, I've had success with pg_dump combined with the --exclude-table and --exclude-table-data options, which let me bring only the data I'll actually need for this task.
However, there are a few tables that mix rows referencing some of the data I left behind with rows referencing data I had to bring, and this is causing me a few issues during the restore operation. Specifically, when the restored dump tries to enforce FOREIGN KEY constraints on certain columns of these tables, it fails because some rows contain keys that have no matching row in the respective foreign table - because I chose not to bring that table's data!
I know I can log into the database after the restore is complete, delete any rows that reference data that no longer exists and create the constraint myself, but I'd like to automate the process as much as possible. Is there a way to tell pg_dump or pg_restore (or any other program) not to bring rows from table A if they reference table B and table B's data was excluded from the backup? Or to tell Postgres that I'd like that specific foreign key to be active before importing the table's data?
For reference, I'm working with PostgreSQL 9.2 on a RHEL 7 server.
What if you disable foreign key checking when you restore your database dump, and afterwards remove the orphaned rows from the referencing tables?
By the way, I recommend fixing your database schema so that there is no chance of wrong tuples being inserted into your database.
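A hedged sketch of that approach, assuming a hypothetical child table "orders" whose customer_id column references "customers"(id). On 9.2, disabling all triggers (which includes the internal foreign key triggers) requires superuser rights, and re-enabling them does not re-check existing rows, which is why the orphaned rows are deleted first:

ALTER TABLE orders DISABLE TRIGGER ALL;   -- suspends FK enforcement on this table
-- ... restore the data for "orders" here ...
DELETE FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id);  -- remove orphaned rows
ALTER TABLE orders ENABLE TRIGGER ALL;    -- FK enforcement active again for new writes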

UUID or SEQUENCE for primary key?

I am coming from MySQL, where you can use AUTO_INCREMENT for a row's unique ID as the primary key.
I find that there is no AUTO_INCREMENT in PostgreSQL, only SEQUENCE or UUID. I have read somewhere that we can use a UUID as the primary key of a table. This has the added advantage of masking other users' IDs (I want to build APIs that take the ID as a parameter). Which should I use for PostgreSQL?
A sequence in PostgreSQL does exactly the same as AUTO_INCREMENT in MySQL. A sequence is more efficient than a UUID because it is 8 bytes instead of 16 for the UUID. You can use a UUID as a primary key, just like almost any other data type.
However, I don't see how this relates to masking a user ID. If you want to hide the ID of a certain user from other users, you should carefully manage the table privileges and/or hash the ID using, for instance, md5().
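As a quick illustration (the "users" table and its columns are hypothetical), an API query could expose a hash instead of the raw serial:

SELECT md5(id::text) AS public_id, name
FROM users;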
If you want to protect a table with user data from snooping hackers who try to guess other IDs, then the uuid type is an excellent choice. The uuid-ossp extension provides several flavours; version 4 is the best choice here, as it has 122 random bits (the other 6 are used to identify the version). You can create a primary key like this:
id uuid PRIMARY KEY DEFAULT uuid_generate_v4()
and then you will never have to worry about it anymore.
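Putting that together, a minimal sketch (the "users" table is just an example, and the uuid-ossp extension must be installed in the database):

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE users (
    id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),  -- random version-4 UUID
    name text NOT NULL
);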
PostgreSQL 13+
You can now use the built-in function gen_random_uuid() to get a version 4 random UUID.
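With that, the same kind of column can be declared without installing any extension (same hypothetical table as above):

id uuid PRIMARY KEY DEFAULT gen_random_uuid()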
For many years I developed database applications using sequential numeric values for PKs and FKs. This worked perfectly, but in recent years, building cloud applications where information is exchanged between applications and we have integrations between various applications developed by us, we realized that using sequential IDs in our APIs ended up creating extra effort.
In some applications we have to look up the ID (in the target application) to be sent via the API call. On the other hand, the tables in all our applications have, in addition to the sequential PK/FK column, a UUID column, which was not used in API calls. In this scenario we decided to rewrite the APIs so that the UUID column is used.
This solved part of the problem, because one of our desktop applications had its data migrated to another cloud application, and that cloud application also used sequential PK/FK columns. When migrating this data we had to change the PK/FK values to new sequence values, since the sequences could clash between the desktop application's values and the cloud application's values. With this in mind we chose to switch the cloud application's PKs/FKs to UUID, since the data coming from the desktop application already had a UUID column.
The problem then was to convert the cloud application's tables, turning the INT columns (PKs and FKs) into UUID columns without losing the table data. That was a big job, so I ended up building an application that makes this change easier: it converts every integer PK/FK column to UUID while keeping the data and the relationships. Anyone interested can follow the link:
https://claytonbonelli.github.io/int_pk2uuid_pk/
You can use a UUID as the primary key in your table, as it will be unique. However, do keep in mind that a UUID occupies a bit more space than a SEQUENCE value, and UUIDs are not very fast. But they are guaranteed to be unique, so you can count on consistent data.
You can also refer to:
UUID Primary Keys in PostgreSQL
UUID vs. Sequences

Restore PostgreSQL dump with new primary key values

I've got a problem with a PostgreSQL dump/restore. We have a production application running on PostgreSQL 8.4. I need to create some values in the database in the testing environment and then import just this chunk of data into the production environment. The data is generated by the application, and I need to use this approach because it needs testing before going into production.
Now that I described the environment, here is my problem:
In the testing database, I leave nothing but the data I need to move to the production database. The data is spread across multiple tables linked by foreign keys over multiple levels (like a tree). I then use pg_dump to export the desired tables in binary format.
When I try to import it, the database correctly imports the root table entries with new primary key values, but does not import any of the data from the other tables. I believe the problem is that the foreign keys in the child tables no longer match the new primary keys.
Is there a way to achieve such an import which will update all the primary key values of all affected tables in the tree to correct serial (auto increment) values automatically and also update all foreign keys according to these new primary key values?
I have an idea how to do this with the assistance of a programming language while connected to both databases, but that would be very problematic for me, since I don't have direct access to the customer's production server.
Thanks in advance!
That seems like a complex migration issue to me. You can create PL/pgSQL migration scripts with the inserts and use RETURNING to get the new serial values, then use them as foreign keys for the tables further down the tree. I do not know the structure of your tree, but in some cases reading sequence values into arrays in advance may be required for complexity or performance reasons.
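A hedged sketch of the RETURNING approach, using hypothetical tables "parent" and "child" (wrapped in a PL/pgSQL function, since DO blocks are not available on 8.4):

CREATE OR REPLACE FUNCTION import_tree() RETURNS void AS $$
DECLARE
    new_parent_id integer;
BEGIN
    INSERT INTO parent (name)
    VALUES ('imported root row')
    RETURNING id INTO new_parent_id;               -- capture the serial assigned by the target database

    INSERT INTO child (parent_id, payload)
    VALUES (new_parent_id, 'imported child row');  -- the FK now points at the new parent ID
END;
$$ LANGUAGE plpgsql;

SELECT import_tree();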
Another approach is to examine the production sequence values and estimate sequence values that will not be used in the near future. Fabricate the test data in the test environment with serial values that will not collide with the production sequence values, then load that data into the production database and adjust the production sequences so that the test values will never be handed out. This leaves a gap in your ID sequence, so you must check whether anything (like other processes) relies on the sequence values being continuous.
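For instance (the sequence name and numbers are hypothetical): if production is currently around ID 150,000, the test rows could be fabricated with IDs starting at 1,000,000; after loading them into production, the production sequence is bumped past the reserved range:

SELECT setval('parent_id_seq', 2000000);  -- the next production ID will be 2000001, clear of the reserved block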