Building a primary key with JSON columns - postgresql-9.4

I am trying to set a unique constraint across several columns, some of which are JSON data types. Since there's no way to make a JSON column a primary key, I thought maybe I could hash the desired columns and build a primary key on the hash. For example:
CREATE TABLE petshop(
name text,
fav_food jsonb,
md5sum uuid);
I can do the following:
SELECT md5(name||fav_food::text) FROM petshop;
But I want that to happen by default and/or with a trigger that inserts the md5 sum into the md5sum column, and then build a pkey on that column.
But really, I just want to know whether the JSON object is unique, without restricting the keys in the JSON. So if anyone has a better idea, I'd appreciate the help!
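A possible sketch for 9.4 (my addition; the index, function and trigger names are made up): a unique expression index enforces uniqueness without maintaining the extra column, or a BEFORE trigger can fill md5sum so a pkey can be built on it:
-- option 1: no extra column, just a unique index on the hash expression
CREATE UNIQUE INDEX petshop_name_food_uniq
ON petshop (md5(name || fav_food::text));
-- option 2: fill md5sum with a trigger, then make it the primary key
CREATE FUNCTION petshop_set_md5sum() RETURNS trigger AS $$
BEGIN
NEW.md5sum := md5(NEW.name || NEW.fav_food::text)::uuid;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER petshop_md5sum_trg
BEFORE INSERT OR UPDATE ON petshop
FOR EACH ROW EXECUTE PROCEDURE petshop_set_md5sum();
ALTER TABLE petshop ADD PRIMARY KEY (md5sum);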

Related

Postgres JSONB unique constraint

I have a table like the following:
create table person (
firstname varchar,
lastname varchar,
person_info jsonb,
..
);
I already have a unique constraint on firstname + lastname. I recently noticed that there is always something different in the person_info jsonb, and I want to use person_info to uniquely identify rows.
Should I add person_info to the unique constraint (firstname + lastname + person_info)? Is there any performance impact with such an implementation? I have heard that JSONB is not good for indexing as the amount of data grows.
I am thinking of storing a hash of person_info in a separate field and including that hash field in the unique index.
I would appreciate some help from an expert on this.
This seems like a wrong idea.
A primary key should be immutable and uniquely identify a table row.
Names are not good for that, because
different people can have the same name
names can change
This is probably why you are tempted to add additional information to truly identify each individual row.
Unless you have some immutable attribute that uniquely identifies each person (such as the social security number), you should generate an artificial primary key for the table:
ALTER TABLE person
ADD id bigint
GENERATED ALWAYS AS IDENTITY
PRIMARY KEY;
Indexing a jsonb column is possible, but you will run into problems with long values, since index entries are limited in size, and you will get an error if you exceed that limit.
I recommend storing any attribute that you might want to index as a regular table column rather than inside a jsonb.
JSONB indexing, IMHO, refers to the ability to index fields inside the binary JSON rather than the whole block. Be aware also that key ordering is not kept: jsonb stores keys in a normalized order, while plain json keeps the input text verbatim, so you can obtain two different hashes for two JSON documents with exactly the same data but different key ordering. Instead, if you can find which JSON fields give you uniqueness, you can use those directly for indexing.
Also take a look at this page.
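A small illustration of both points (my sketch, not part of the original answer; the 'ssn' key is just an example):
SELECT '{"b":1,"a":2}'::jsonb::text;  -- returns {"a": 2, "b": 1}: jsonb normalizes key order
SELECT '{"b":1,"a":2}'::json::text;   -- returns {"b":1,"a":2}: plain json keeps the input verbatim
-- if one field inside person_info is what makes a row unique, index it directly:
CREATE UNIQUE INDEX person_info_ssn_uniq ON person ((person_info ->> 'ssn'));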

Generate a column value automatically from other columns' values and use it as the PRIMARY KEY

I have a table with columns named "source" and "id". This table is populated from open data DBs.
"id" can't be UNIQUE, since my data comes from other databases, each with its own id system. There is a real risk of getting the same id for really different data.
I want to create another column which combine source and id into a single value.
"openDataA" + 123456789 -> "openDataA123456789"
"openDataB" + 123456789 -> "openDataB123456789"
I have seen examples that use || and a function to concatenate values. This is good, but I want to make this third column my PRIMARY KEY, to avoid duplicates and create a truly unique id that I can query without much computation and use as a foreign key constraint in other tables.
I think composite types are what I'm looking for, but instead of setting the value manually each time, I want it derived automatically from just "source" and "id".
I'm fairly new to postgresql, so any help is welcome.
Thank you.
You could just have a composite key in your table:
CREATE TABLE mytable (
source VARCHAR(10),
id VARCHAR(10),
PRIMARY KEY (source, id)
);
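Other tables can then reference the composite key directly, so no concatenated column is needed for the foreign keys mentioned in the question (a sketch; the table name "child_table" and its columns are made up):
CREATE TABLE child_table (
source VARCHAR(10),
id VARCHAR(10),
FOREIGN KEY (source, id) REFERENCES mytable (source, id)
);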
If you really want a joined column, you could create a view to display it:
CREATE VIEW myview AS
SELECT *, source || id AS primary_key
FROM mytable;

redshift copy using amazon pipeline fails for missing primary key

I have a set of files on S3 that I am trying to load into redshift.
I am using the Amazon Data Pipeline to do it. The wizard took the cluster, db and file format info, but I get errors that a primary key is needed to keep existing fields in the table (KEEP_EXISTING).
My table schema is:
create table public.Bens_Analytics_IP_To_FileName(
Day date not null encode delta32k,
IP varchar(30) not null encode text255,
FileName varchar(300) not null encode text32k,
Count integer not null)
distkey(Day)
sortkey(Day,IP);
So then I added a composite primary key to the table to see if it would work, but I get the same error:
create table public.Bens_Analytics_IP_To_FileName(
Day date not null encode delta32k,
IP varchar(30) not null encode text255,
FileName varchar(300) not null encode text32k,
Count integer not null,
primary key(Day,IP,FileName))
distkey(Day)
sortkey(Day,IP);
So I decided to add an identity column as the last column and made it the primary key, but then the COPY operation wants a value in the input files for that identity column, which did not make much sense.
ideally I want it to work without a primary key or a composite primary key
any ideas?
Thanks
The documentation is not in a great condition. They have added a 'mergeKey' concept that can be any arbitrary key (announcement, docs). You should not have to define a primary key on the table with this.
But you would still need to supply a key to perform the join between your new incoming data and the existing data in the Redshift table.
In Edit Pipeline, under Parameters, there is a field named myPrimaryKeys (optional). Enter your PK there, instead of adding it to your table definition.

what is the right data type for unique key in postgresql DB?

Which data type should I choose for a unique key (the id of a user, for example) in a PostgreSQL database table?
Is bigint the right one?
Thanks
Use the serial type for automatically incrementing unique ids.
If you plan to have more than two billion entries, use bigserial. serial is the PostgreSQL equivalent of MySQL's AUTO_INCREMENT.
PostgreSQL documentation: Numeric Types
bigint (or bigserial if you need auto-incrementing keys) is just fine.
If you know for certain that you are not going to load too many rows, you might consider integer (or a regular serial) and potentially save some hard disk space.
According to this answer, the currently recommended approach for auto-incrementing unique IDs is to use the generated as identity syntax instead of serial.
Here's an example:
-- the old way
create table t1 (id serial primary key);
-- the new way
create table t2 (id integer primary key generated always as identity);
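A note on usage (my addition, not part of the linked answer): inserts look the same for both, but generated always as identity rejects an explicitly supplied value unless you override it:
insert into t2 default values;                            -- id is generated
insert into t2 (id) values (42);                          -- fails: column is generated always
insert into t2 (id) overriding system value values (42);  -- explicit value accepted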

Can I create an index on User-defined Table variables?

Just wanted to check whether we can create indexes on user-defined table variables. I know that we can create a PK on a UDT. Does that imply that the PK creates a (clustered) index internally? If an index is possible on a column of a UDT, where does the indexed data get stored?
To define an index on a table variable use a primary key or unique constraint. You can nominate one as clustered.
If you need an index on a non-unique field, simply add the unique key to the end of the index column list, to make it unique.
If the table variable has not got a unique field, add a dummy unique field using an identity column.
Something like this:
declare @t table (
dummy int identity primary key nonclustered,
val1 nvarchar(50),
val2 nvarchar(50),
unique clustered (val1, dummy)
)
Now you have a table variable with a clustered index on non-unique field val1.
With table variables, you can define primary key and unique constraints, but you are unable to define
any clustering behaviour. The indexes for these are stored alongside the actual data in the table variable - hopefully in memory within tempdb, but if necessary, spilled to disk, if memory pressure is high.
You're unable to define arbitrary indexes on such tables.
You can however define whatever indexes you want on temp tables.
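For comparison, a minimal sketch (the temp table and index names are made up): temp tables accept ordinary CREATE INDEX statements:
create table #tmp (val1 nvarchar(50), val2 nvarchar(50));
create nonclustered index ix_tmp_val1 on #tmp (val1);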