Cryptographically random primary key in PostgreSQL

I am trying to solve a problem where I need to generate a non-sequential and cryptographically random primary key as each record is inserted into a table.
The reason is that each record is generated from an email list, but the records must not be linkable back to those email addresses (a secret-ballot situation). With a sequential key, anyone who managed to access the email list could derive from the insertion order who supplied the data for each record.
Is it possible to do this without generating an id of some kind in the application code, and do it purely in PostgreSQL? If not, what is the next best solution?

It seems that the best choice is to use pgcrypto and do the following:
CREATE EXTENSION pgcrypto;
CREATE TABLE whatever (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid()
);
The PostgreSQL 9.4 documentation on pgcrypto states that gen_random_uuid() generates a cryptographically random V4 UUID, which suits this situation perfectly.
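A quick way to see it in action (the value shown is illustrative; each insert yields a fresh random UUID):
-- insert a row using only the default; RETURNING shows the generated key
INSERT INTO whatever DEFAULT VALUES RETURNING id;
-- e.g. 32a4a9a9-8b38-4c2f-9a6c-5d0f3f9b6a11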

It's good practice to go with uuid; you can check out these links as well:
Advantages and Disadvantages of UUID
What's your opinion on using UUIDs as database row identifiers, particularly in…
Note: you need to enable the pgcrypto (PostgreSQL >= 9.4 only) or uuid-ossp extension to generate random UUIDs.
pgcrypto generator function
uuid-ossp generator functions
Hope this helps!

Another option is to use encryption to generate the unique row keys. Simply encrypt the numbers 0, 1, 2, 3, ... in turn using a block cipher. Provided you always use the same cipher key, the outputs are guaranteed unique because the inputs are unique. For 64-bit row keys use 3DES, for 128-bit row keys use AES. For other sizes use the Hasty Pudding cipher, which can work with effectively any block size you want.
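A minimal sketch of that idea inside PostgreSQL itself, using pgcrypto's encrypt() (which supports AES but not 3DES); the table name, sequence name, and key below are placeholders, and in practice the key must be kept secret:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE SEQUENCE ballot_seq;
-- Encrypt the sequence counter as a single 16-byte AES block.
-- Unique plaintexts under a fixed key give unique ciphertexts (ECB is
-- safe here because every block is distinct), so the resulting 128-bit
-- ids never collide yet reveal nothing about insertion order.
CREATE TABLE ballots (
  id bytea PRIMARY KEY DEFAULT encrypt(
    decode(lpad(to_hex(nextval('ballot_seq')), 32, '0'), 'hex'),
    '\x00112233445566778899aabbccddeeff',  -- example key only
    'aes-ecb/pad:none'
  )
);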

Related

SERIAL pseudo-type in PostgreSQL?

Currently I'm using the serial datatype for my project. I want to know the following details before changing my previously created SEQUENCE-based columns to the serial datatype.
From which version onwards does PostgreSQL support the serial datatype? If I change my id to serial, will it affect my future code?
What is the max size of serial and its impact?
Can ALTER SEQUENCE impact the sequence numbers?
Are there any drawbacks to the serial datatype going forward?
How do I create a gapless sequence?
All answers can be found in the manual:
serial goes back to Postgres 7.2.
The underlying sequence is a bigint; the maximum sizes are documented in the manual (serial maps to an integer column, bigserial to bigint). Also see this note in the documentation of CREATE SEQUENCE:
Sequences are based on bigint arithmetic, so the range cannot exceed the range of an eight-byte integer (-9223372036854775808 to 9223372036854775807).
Obviously. As documented in the manual, that command can set minvalue, restart with a new value, and manipulate many other properties that affect number generation.
You should use identity columns instead (available since PostgreSQL 10); see the sketch after this list.
Not possible - that's not what sequences are intended for. See e.g. here for a possible implementation of a gapless number generator, or the counter-table sketch below.
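A minimal comparison of the two key-generation styles (identity columns require PostgreSQL 10 or later; table names are illustrative):
-- legacy shorthand: creates an implicit sequence behind the scenes
CREATE TABLE orders_old (
  id serial PRIMARY KEY
);
-- standard-SQL identity column, preferred since PostgreSQL 10
CREATE TABLE orders_new (
  id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);
And, to show why gapless numbering is a different problem, the classic approach is a single-row counter table, trading concurrency for gaplessness:
CREATE TABLE invoice_counter (last_value bigint NOT NULL);
INSERT INTO invoice_counter VALUES (0);
-- the row lock serializes writers, and a rollback also undoes the
-- increment - which is exactly what keeps the numbering gapless
UPDATE invoice_counter
SET last_value = last_value + 1
RETURNING last_value;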

Way to migrate a create table with sequence from postgres to DB2

I need to migrate a DDL from Postgres to DB2, and it needs to work the same way as it does in Postgres. There is a table whose key is generated from a sequence, but the values can also be given explicitly.
Postgres
create sequence hist_id_seq;
create table benchmarksql.history (
  hist_id  integer not null default nextval('hist_id_seq') primary key,
  h_c_id   integer,
  h_c_d_id integer,
  h_c_w_id integer,
  h_d_id   integer,
  h_w_id   integer,
  h_date   timestamp,
  h_amount decimal(6,2),
  h_data   varchar(24)
);
(Note the nextval() call in the hist_id column's default, which supplies the primary key value.)
The business logic sometimes inserts into the table by explicitly providing an ID, and in other cases leaves the database to choose the number.
If I change this in DB2 to a GENERATED ALWAYS column, it will throw errors because some values are provided explicitly. On the other hand, if I create the table with GENERATED BY DEFAULT, DB2 throws an error (SQL0803N) when it later generates a value that was already inserted explicitly, because the "internal sequence" does not take the explicitly inserted values into account and does not retry with the next value.
Also, I do not want to restart the sequence each time an explicit ID is inserted.
This is the problem in BenchmarkSQL when trying to port it to DB2: https://sourceforge.net/projects/benchmarksql/ (File sqlTableCreates)
How can I implement in DB2 the same database logic that Postgres (and apparently Oracle) supports?
You're operating under a misconception: that sources external to the db get to dictate its internal keys. Ideally, autogenerated ids never need to be seen outside the db, as there should be unique natural keys for export or reporting. Still, there are times when applications need to manage some ids, often when setting up related entities (e.g., JPA seems to want to work this way).
However, if you add an id value that you generated from a different source, the db won't be able to manage it. How could it? There is no efficient way - attempting to do so would mean one of the following:
Be unsafe in the face of multiple clients (attempt to add duplicate keys)
Serialize access to the table (for a potentially slow query, too)
(This usually shows up when people attempt something like SELECT MAX(id) + 1, which would require locking the entire table for thread safety, likely blocking statements that don't even touch that column. Trying to find the first unused id - filling gaps - gets even more complicated and problematic.)
Neither is ideal, so it's best not to have the problem in the first place. This is usually done by having id columns be autogenerated, but (as pointed out earlier) there are situations where we may need to know what the id will be before we insert the row into the table. Fortunately, there's a standard SQL object for this: SEQUENCE. This provides a db-managed, thread-safe, fast way to get ids. PostgreSQL lets you use a sequence in a column's DEFAULT clause, but DB2 doesn't allow that. If you don't want to specify an id every time (it should be autogenerated some of the time), you'll need another way; this is the perfect time to use a BEFORE INSERT trigger:
CREATE TRIGGER add_generated_id
  NO CASCADE BEFORE INSERT ON benchmarksql.history
  REFERENCING NEW AS incoming_entity
  FOR EACH ROW
  WHEN (incoming_entity.hist_id IS NULL)
  SET incoming_entity.hist_id = NEXT VALUE FOR hist_id_seq
(something like this - not tested. You didn't specify where in the project this would belong)
So, if you then add a row with something like:
INSERT INTO benchmarksql.history (hist_id, h_data) VALUES(null, 'a')
or
INSERT INTO benchmarksql.history (h_data) VALUES('a')
an id will be generated and attached automatically. Note that ALL ids added to the table must come from the given sequence (as @mustaccio pointed out, this appears to be true even in PostgreSQL), or any UNIQUE CONSTRAINT on the column will start throwing duplicate-key errors. So any time your application needs an id before inserting a row into the table, you'll need some form of
SELECT NEXT VALUE FOR hist_id_seq
FROM sysibm.sysdummy1
... and that's it, pretty much. This is completely thread and concurrency safe, will not maintain/require long-term locks, nor require serialized access to the table.
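Putting the pieces together, a sketch of the full DB2 side (untested, mirroring the Postgres DDL above; the trigger shown earlier fills in any missing ids):
CREATE SEQUENCE hist_id_seq;
CREATE TABLE benchmarksql.history (
  hist_id  INTEGER NOT NULL PRIMARY KEY,
  h_c_id   INTEGER,
  h_c_d_id INTEGER,
  h_c_w_id INTEGER,
  h_d_id   INTEGER,
  h_w_id   INTEGER,
  h_date   TIMESTAMP,
  h_amount DECIMAL(6,2),
  h_data   VARCHAR(24)
);
-- when the application needs the id up front, fetch it first:
-- SELECT NEXT VALUE FOR hist_id_seq FROM sysibm.sysdummy1
-- then INSERT it explicitly like any other provided id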

PostgreSQL: How to store UUID value?

I am trying to store a UUID value in my table using PostgreSQL 9.3.
Example:
create table test
(
  uno UUID,
  name text,
  address text
);
insert into test values(1,'abc','xyz');
Note: how can I store an integer value in a UUID column?
The whole point of UUIDs is that they are automatically generated, because the algorithm used virtually guarantees that they are unique in your table, your database, or even across databases. UUIDs are stored as 16-byte datums, so you really only want to use them when one or more of the following holds true:
You need to store data indefinitely, forever, always.
When you have a highly distributed system of data generation (e.g. INSERTS in a single "system" which is distributed over multiple machines each having their own local database) where central ID generation is not feasible (e.g. a mobile data collection system with limited connectivity, later uploading to a central server).
Certain scenarios in load-balancing, replication, etc.
If one of these cases applies to you, then you are best off using the UUID as a primary key and having it generated automagically (note that uuid_generate_v4() comes from the uuid-ossp extension):
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE test (
  uno uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  name text,
  address text
);
This way you never have to worry about the UUIDs themselves; it is all done behind the scenes. Your INSERT statement would become:
INSERT INTO test (name, address) VALUES ('abc','xyz') RETURNING uno;
The RETURNING clause is of course optional; use it if you need the generated key to reference related data.
You can't simply cast an integer to the uuid type. Generally, UUIDs are generated either inside Postgres (see http://www.postgresql.org/docs/9.3/static/uuid-ossp.html for more detail on ways to do this) or via a client program.
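If the value does come from the client, it has to be a well-formed UUID literal, for example:
-- a valid UUID string is accepted directly by the uuid column
INSERT INTO test (uno, name, address)
VALUES ('3f06af63-a93c-11e4-9797-00505690773f', 'abc', 'xyz');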
If you just want unique IDs for your table itself, but don't actually need UUIDs (those are geared toward universal uniqueness, across tables and servers, etc.), you can use the serial type, which creates an implicit sequence for you which automatically increments when you INSERT into a table.
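A sketch of that alternative, reusing the question's table:
-- serial creates and wires up the backing sequence automatically
CREATE TABLE test (
  uno serial PRIMARY KEY,
  name text,
  address text
);
INSERT INTO test (name, address) VALUES ('abc', 'xyz') RETURNING uno;  -- 1, 2, 3, ...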

postgresql hstore key/value vs traditional SQL performance

I need to develop a key/value backend, something like this:
Table T1: id - PK, key - string, value - string
INSERT INTO T1 (key, value) VALUES ('String1', 'Value1');
INSERT INTO T1 (key, value) VALUES ('String1', 'Value2');
Table T2: id - PK2, id2 - foreign key to T1.id
plus some other data in T2 that references data in T1 (like users who have those K/V pairs, etc.)
I heard about PostgreSQL hstore with GIN/GIST. What is better (performance-wise)?
Doing this the traditional way with SQL joins and separate columns (Key/Value)?
Does PostgreSQL hstore perform better in this case?
The format of the data should be any key=>any value.
I also want to do text matching, e.g. partial searches on values (LIKE '%...%' in SQL, or the hstore equivalent).
I plan to have around 1M-2M entries in it and probably scale at some point.
What do you recommend? Going the traditional SQL way, PostgreSQL hstore, or some other distributed key/value store with persistence?
If it helps, my server is a VPS with 1-2GB RAM, so not great hardware. I was also thinking of putting a cache layer on top of this, but I think that rather complicates the problem. I just want good performance for 2M entries. Updates will be done often, but searches even more often.
Thanks.
Your question is unclear because you're not clear about your objective.
The key here is the index (pun intended) - if you're dealing with a large number of keys, you want to be able to retrieve them with the fewest lookups and without pulling up unrelated data.
The short answer is you probably don't want to use hstore, but let's look at this in more detail...
Does each id have many key/value pairs (hundreds+)? Don't use hstore.
Will any of your values contain large blocks of text (4kb+)? Don't use hstore.
Do you want to be able to search by keys in wildcard expressions? Don't use hstore.
Do you want to do complex joins/aggregation/reports? Don't use hstore.
Will you update the value for a single key? Don't use hstore.
Multiple keys with the same name under an id? Can't use hstore.
So what's the use of hstore? Well, one good scenario would be if you wanted to hold key/value pairs for an external application where you know you always want to retrieve all key/values and will always save the data back as a block (i.e., it's never edited in place). At the same time you do want some flexibility to be able to search this data - albeit very simply - rather than storing it in, say, a block of XML or JSON. In this case, since the number of key/value pairs is small, you save on space because you're compressing several tuples into one hstore.
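A small sketch of that scenario (the table and keys are made up for illustration):
CREATE EXTENSION IF NOT EXISTS hstore;
-- one row per external entity; its settings are saved/loaded as a block
CREATE TABLE app_settings (
  id    serial PRIMARY KEY,
  props hstore
);
INSERT INTO app_settings (props)
VALUES ('theme => dark, lang => en'::hstore);
-- simple searches are still possible: fetch one key, or test presence
SELECT props -> 'theme' FROM app_settings WHERE props ? 'lang';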
Consider this as your table:
CREATE TABLE kv (
  id /* SOME TYPE */ PRIMARY KEY,
  key_name TEXT NOT NULL,
  key_value TEXT,
  UNIQUE (id, key_name)
);
I think the design is poorly normalized. Try something more like this:
CREATE TABLE t1
(
  t1_id serial PRIMARY KEY,
  -- other data which depends on t1_id and nothing else,
  -- possibly an hstore, but maybe better as a separate table:
  t1_props hstore
);
-- if properties are done as a separate table:
CREATE TABLE t1_properties
(
  t1_id int NOT NULL REFERENCES t1,
  key_name text NOT NULL,
  key_value text,
  PRIMARY KEY (t1_id, key_name)
);
If the properties are small and you don't need to use them heavily in joins or with fancy selection criteria, an hstore may suffice. Elliot laid out some sensible things to consider in that regard.
Your reference to users suggests that this is incomplete, but you didn't really give enough information to suggest where those belong. You might get by with an array in t1, or you might be better off with a separate table.
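For the partial-match searches the question calls for, the separate-table layout pairs naturally with a trigram index (this assumes the pg_trgm extension is available; the index name is made up):
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- lets LIKE '%foo%' on property values use an index instead of a seq scan
CREATE INDEX t1_props_value_trgm
  ON t1_properties USING gin (key_value gin_trgm_ops);
SELECT t1_id, key_name, key_value
FROM t1_properties
WHERE key_value LIKE '%foo%';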

How to get the key fields for a table in plpgsql function?

I need to make a function that would be triggered after every UPDATE and INSERT operation and would check the key fields of the table that the operation is performed on against some conditions.
The function (and the trigger) needs to be a universal one; it shouldn't have the table name / field names hardcoded.
I got stuck on the part where I need to access the table name and its schema information - to check which fields are part of the PRIMARY KEY.
After getting the primary key info as already posted in the first answer, you can check the code in http://github.com/fgp/pg_record_inspect to get record field values dynamically in PL/pgSQL.
Have a look at How do I get the primary key(s) of a table from Postgres via plpgsql? The answer in that one should be able to help you.
Note that you can't dynamically access a record's fields (such as NEW's columns) in plain PL/pgSQL; it's too strongly typed a language for that. You'll have more luck with PL/Perl, in which you can access a hash of the columns and use regular Perl accessors to check them. (PL/Python would also work, but sadly it's an untrusted language only. PL/Tcl works too.)
In 8.4 you can use EXECUTE 'something' USING NEW, which in some cases is able to do the job.
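Putting both suggestions together, a sketch of a generic trigger function: look up the primary key column from the catalogs (as in the linked answer), then pull its value out of NEW with the EXECUTE ... USING trick. This handles single-column keys only, and the actual condition check is left as a stub:
CREATE OR REPLACE FUNCTION check_key_fields() RETURNS trigger AS $$
DECLARE
  pk_col text;
  pk_val text;
BEGIN
  -- find the primary key column of the table that fired the trigger
  SELECT a.attname INTO pk_col
  FROM pg_index i
  JOIN pg_attribute a ON a.attrelid = i.indrelid
                     AND a.attnum = ANY (i.indkey)
  WHERE i.indrelid = TG_RELID
    AND i.indisprimary
  LIMIT 1;

  -- dynamically extract that field from NEW (the 8.4+ EXECUTE ... USING trick)
  EXECUTE 'SELECT ($1).' || quote_ident(pk_col) INTO pk_val USING NEW;

  -- ... check pk_val against your conditions here ...
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;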