UUID or SEQUENCE for primary key? - postgresql

I am coming from MySQL, where you can use AUTO_INCREMENT to generate a row's unique id as the primary key.
I find that there is no AUTO_INCREMENT in PostgreSQL, only SEQUENCE or UUID. I have read somewhere that we can use a UUID as the primary key of a table. This has the added advantage of masking other users' ids (as I want to build APIs that take the ID in as a parameter). Which should I use in PostgreSQL?

A sequence in PostgreSQL does exactly what AUTO_INCREMENT does in MySQL. A sequence is also more efficient than a uuid, because it is 8 bytes instead of 16 for the uuid. You can use a uuid as a primary key, just like almost any other data type.
However, I don't see how this relates to masking a user ID. If you want to hide the ID of a certain user from other users, you should carefully manage the table privileges and/or hash the ID using - for instance - md5().
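For instance (assuming a hypothetical users table with an integer id), you could expose a hash of the key instead of the key itself:
-- return a hashed identifier rather than the raw sequential id
SELECT id, md5(id::text) AS public_id FROM users;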
If you want to protect a table with user data from snooping hackers who are trying to guess other IDs, then the uuid type is an excellent choice. The uuid-ossp extension offers several flavours; version 4 is the best choice here because it has 122 random bits (the other 6 are used to identify the version). You can create a primary key like this:
id uuid PRIMARY KEY DEFAULT uuid_generate_v4()
and then you will never have to worry about it anymore.
PostgreSQL 13+
You can now use the built-in function gen_random_uuid() to get a version 4 random UUID.
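For example (the table names below are just illustrations), here is the sequence-backed equivalent of MySQL's AUTO_INCREMENT next to a random-UUID primary key, assuming PostgreSQL 13+ for gen_random_uuid():
-- sequence-backed key, the PostgreSQL counterpart of AUTO_INCREMENT
CREATE TABLE account_seq (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);
-- random (version 4) UUID key, no extension needed on PostgreSQL 13+
CREATE TABLE account_uuid (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name text NOT NULL
);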

For many years I developed database applications using sequential numeric values for PKs and FKs. This worked perfectly, but in recent years, when building cloud applications where information is exchanged between applications and we have integrations between several applications developed by us, we realized that using sequential IDs in our APIs created extra work.
In some applications we had to look up the ID (of the target application) to send in the API call. On the other hand, the database tables in all our applications have, in addition to the sequential PK / FK column, a UUID column that was not used in API calls. In this scenario we decided to rewrite the APIs so that the UUID column was used.
This solved some of the problems, because one of our desktop applications had its data migrated to a cloud application that also used sequential PK / FK columns. When migrating this data we had to map the PK / FK values to new sequence values, since the sequences of the desktop application could clash with those of the cloud application. With this in mind we chose to switch the cloud application's PKs / FKs to UUID, since the data coming from the desktop application already had a UUID column.
The problem then was to convert the cloud application tables, turning the INT columns (PKs and FKs) into UUID columns without losing the table data. That was a big task, but it was made easier because I ended up building a tool that automates the change: it converts every PK / FK integer column to UUID while keeping the data and relationships. Anyone interested can follow the link:
https://claytonbonelli.github.io/int_pk2uuid_pk/
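A minimal sketch of the idea behind such a conversion, using hypothetical parent and child tables (the tool linked above does much more, e.g. handling every constraint automatically):
-- add UUID columns alongside the existing integer keys
ALTER TABLE parent ADD COLUMN id_uuid uuid NOT NULL DEFAULT gen_random_uuid();
ALTER TABLE child  ADD COLUMN parent_id_uuid uuid;
-- carry the relationship over to the new UUID columns
UPDATE child c
SET    parent_id_uuid = p.id_uuid
FROM   parent p
WHERE  p.id = c.parent_id;
-- drop the integer keys (referencing columns first) and promote the UUIDs
ALTER TABLE child  DROP COLUMN parent_id;
ALTER TABLE parent DROP COLUMN id;
ALTER TABLE parent ADD PRIMARY KEY (id_uuid);
ALTER TABLE child  ADD FOREIGN KEY (parent_id_uuid) REFERENCES parent (id_uuid);
ALTER TABLE parent RENAME COLUMN id_uuid TO id;
ALTER TABLE child  RENAME COLUMN parent_id_uuid TO parent_id;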

You can use a UUID as the primary key in your table, since it will be unique. However, do keep in mind that a UUID takes up more space than a SEQUENCE value (16 bytes versus 8), and that random UUIDs index and compare more slowly than sequential integers. But they are for sure unique, so you are guaranteed consistent data.
You can also refer:
UUID Primary Keys in PostgreSQL
UUID vs. Sequences

Related

Is there a way to start from max in a primary key in PostgreSQL if I import data from another db?

I have an old MS Access DB and I want to convert it to PostgreSQL. I found DBeaver very useful, but some operations have to be done by hand; this is the case for primary keys, which must be set manually. I couldn't find another way to do it, so I'm setting all this up in pgAdmin. The client uses this DB constantly, so I can only extract data on holidays when the client is not working and import it into PostgreSQL. My goal is to get the database ready to receive data from the production database. So far, in pgAdmin, I set the primary key identity to "Always" and set the starting value to the last primary key number by hand. I think this is not the right way to do it, and when I start to import the production data I don't want to set all of that by hand. How can I set up the primary key so that the autoincrement starts from the maximum existing ID?
Using an identity column is the right way. Just set the sequences to a value safely above the current maximum; it doesn't matter if you lose a million sequence values.
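For example (mytable and id are placeholder names), you can point the sequence behind the identity column at the highest imported value:
-- set the identity sequence to the current maximum id
SELECT setval(pg_get_serial_sequence('mytable', 'id'),
              (SELECT max(id) FROM mytable));
-- or, equivalently, restart the identity column at an explicit safe value
ALTER TABLE mytable ALTER COLUMN id RESTART WITH 100000;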

entity framework and tables requiring primary keys

I am looking at entity framework and trying learn more about it. So have created a simple project to play with.
I found out that I can't add a table if it does not have a primary key. Reading some posts here and elsewhere, I think that is correct; it is apparently required so that EF can do deletions, updates, etc. If I have a project where there will be no deletions or updates, just select queries, I'm guessing it doesn't matter which column I make the primary key? I understand most tables should have a primary key; this is just a question out of curiosity.
Also can EF handle a primary key on multiple columns, I assume so?
Although your application does not require deletions or updates, sooner or later you will need a primary key. If you choose a good primary key (here you have a good guide for this), the programming will be much easier. And yes, EF can handle a primary key on multiple columns.

Restore PostgreSQL dump with new primary key values

I've got a problem with a PostgreSQL dump / restore. We have a production application running on PostgreSQL 8.4. I need to create some values in the database in the testing environment and then import just this chunk of data into the production environment. The data is generated by the application, and I need to use this approach because it needs testing before going into production.
Now that I described the environment, here is my problem:
In the testing database, I leave nothing but the data I need to move to the production database. The data is spread across multiple tables linked with foreign keys with multiple levels (like a tree). I then use pg_dump to export the desired tables into binary format.
When I try to import, the database correctly imports the root table entries with new primary key values, but does not import any of the data from the other tables. I believe the problem is that the foreign keys in the child tables no longer match the new primary key values.
Is there a way to achieve such an import that automatically updates the primary key values of all affected tables in the tree to new serial (auto increment) values, and also updates all foreign keys to match these new primary key values?
I have an idea how to do this with the assistance of a programming language while connected to both databases, but that would be very problematic for me since I don't have direct access to the customer's production server.
Thanks in advance!
That looks to me like a complex migration issue. You can create PL/pgSQL migration scripts with inserts, and use RETURNING to capture the newly assigned serial values and use them as foreign keys for the other tables in the tree. I do not know the structure of your tree, but in some cases reading sequence values in advance into arrays may be required for complexity or performance reasons.
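A minimal sketch of the RETURNING approach, with hypothetical parent and child tables:
DO $$
DECLARE
    new_parent_id integer;
BEGIN
    -- insert the root row and capture the id assigned by the production sequence
    INSERT INTO parent (name)
    VALUES ('imported root')
    RETURNING id INTO new_parent_id;
    -- point the child rows at the newly assigned primary key
    INSERT INTO child (parent_id, payload)
    VALUES (new_parent_id, 'imported child 1'),
           (new_parent_id, 'imported child 2');
END $$;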
Another approach is to examine the production sequence values and estimate sequence values that will not be used in the near future. Fabricate the test data in the test environment with serial values that will not collide with the production sequence values, then load that data into the production database and adjust the production sequence values so that the test sequence values will not be reused. This will leave a gap in your ID sequence, so you must check whether anything (such as other processes) relies on the sequence values being continuous.

Advantages and disadvantages of having composite primary key

Instead of having a composite primary key (this table maintains the relationship between two tables which represent two entities), the proposed design is to use an identity column as the primary key, with a unique constraint enforced over the two columns that hold the primary key values of the two entities.
To me, having an identity column on each relationship table breaks the normalisation rules.
What is the industry standards?
What are the considerations to make before making the design decision on this?
Which approach is right?
There are lots of tables where you may want to have an identity column as a primary key. However, in the case of an M:M relationship table like the one you describe, best practice is NOT to use a new identity column for the primary key.
RThomas's link in his comment gives excellent reasons why the best practice is NOT to add an identity column. Here's that link.
The cons will outweigh the pros in pretty much every case, but since you asked for pros and cons I put a couple of unlikely pros in as well.
Cons
Adds complexity
Can lead to duplicate relationships unless you enforce uniqueness on the relationship (which a primary key would do by default).
Likely slower: db must maintain two indexes rather than one.
Pros
All the pros are pretty sketchy
If you had a situation where you needed to use the primary key of the relationship table as a join to a separate table (e.g. an audit table?) the join would likely be faster. (As noted though--adding and removing records will likely be slower. Further, if your relationship table is a relationship between tables that themselves use unique IDs, the speed increase from using one identity column in the join vs two will be minimal.)
The application, for simplicity, may assume that every table it works with has a unique ID as its primary key. (That's poor design in the app but you may not have control over it.) You could imagine a scenario where it is better to introduce some extra complexity in the DB than the extra complexity into such an app.
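To make the two designs concrete (hypothetical author/book tables), this is what the junction table looks like with a composite key versus a surrogate identity key; the composite version enforces uniqueness of the relationship by definition, while the surrogate version needs a separate unique constraint and a second index:
-- composite primary key: the relationship itself is the key
CREATE TABLE author_book (
    author_id integer NOT NULL REFERENCES author (id),
    book_id   integer NOT NULL REFERENCES book (id),
    PRIMARY KEY (author_id, book_id)
);
-- surrogate identity key: uniqueness of the pair must be enforced separately
CREATE TABLE author_book_surrogate (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    author_id integer NOT NULL REFERENCES author (id),
    book_id   integer NOT NULL REFERENCES book (id),
    UNIQUE (author_id, book_id)
);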
Cons:
Composite primary keys have to be imported into all referencing tables. That means larger indexes and more code to write (e.g. the joins, the updates). If you are systematic about using composite primary keys, it can become very cumbersome.
You can't update part of the primary key. E.g. if you use (university_id, student_id) as the primary key in a table of university students, and a student changes university, you have to delete and recreate the record.
Pros:
Composite primary keys allow you to enforce a common kind of constraint in a powerful and seamless way. Suppose you have a table UNIVERSITY, a table STUDENT, a table COURSE, and a table STUDENT_COURSE (which student follows which course). If there is a constraint that you must be a student of university A in order to follow a course of university A, then that constraint is validated automatically if university_id is part of the composite keys of both STUDENT and COURSE.
You have to repeat all the key columns in every table where they are used as a foreign key. This is the biggest disadvantage.
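A sketch of the university example from the answer above (the table definitions are simplified and hypothetical), showing how the composite foreign keys make the same-university rule self-enforcing:
CREATE TABLE university (
    university_id integer PRIMARY KEY
);
CREATE TABLE student (
    university_id integer REFERENCES university,
    student_id    integer,
    PRIMARY KEY (university_id, student_id)
);
CREATE TABLE course (
    university_id integer REFERENCES university,
    course_id     integer,
    PRIMARY KEY (university_id, course_id)
);
-- because university_id appears in both foreign keys, a student can only
-- follow a course offered by their own university
CREATE TABLE student_course (
    university_id integer,
    student_id    integer,
    course_id     integer,
    PRIMARY KEY (university_id, student_id, course_id),
    FOREIGN KEY (university_id, student_id) REFERENCES student,
    FOREIGN KEY (university_id, course_id)  REFERENCES course
);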

Foreign Key mapping in Core Data

I understand that Core Data is not a relational database, but I need to understand how it can be used to support a client/server model where the server uses a Rails, ActiveRecord, MySQL setup.
My app is pulling records from the server using JSON and I am mapping the relationships using Core Data.
The foreign key in the SQLite database is showing the PK field of the related table, even though I have set the User Info key/value primaryAttributeKey => id. (I can't remember where I saw this mentioned.)
Is there any way to setup the models so they will use my id as the PK so that it will clean up the export of related data back to the server?
Edward,
The PK is just a field in your object. If you want to maintain them in CD, they are just numbers. As you build your object graph, you have to maintain them in parallel with your relations. Of course, exporting records created on the device back to your server will have difficulty -- FKs and PKs are unique to each table and that uniqueness is determined on the server. Hence, tracking these numbers is not that useful.
May I suggest that your JSON needs to be structured such that it is redundant -- that it has both the data and the various PKs and FKs, if any?
Finally, you appear to be building a CRUD-focused API. Generally, those are low-performance APIs for remote devices. There are other problems with CRUD APIs, such as inconsistent business logic between servers and clients. I would suggest you rethink your APIs.
Andrew