Unique partial composite primary key in Postgres

I'm guessing the answer is no, but... is it possible to enforce uniqueness on only part of a composite primary key?
create table foo (
id integer,
yesno boolean,
extra text,
primary key (id, yesno, extra)
)
The idea here is that I want id + yesno to be unique for this particular table, but I want to include extra in the index so I can take advantage of Postgres index-only scans.
Yes, I could create a second, unique index on id + yesno, but that would be wasteful.

You can use the INCLUDE option to add extra columns that are stored in the index but are not part of the key itself.
create table foo (
id integer not null,
yesno boolean not null,
extra text
);
create unique index foo_uk
on foo (id, yesno)
include (extra);
You did not indicate which Postgres version you have; note that INCLUDE requires at least version 11, so this may not be an option for you.
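For example, a query that touches only the key columns and the included column can then be answered from the index alone, assuming the visibility map is reasonably current. A hypothetical check:
-- The covering index can satisfy this query by itself (an index-only scan),
-- because "extra" is stored in the index via INCLUDE.
EXPLAIN (ANALYZE)
SELECT extra
FROM foo
WHERE id = 1 AND yesno = true;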

Related

postgres indexing all columns of composite primary key

In Postgres, just checking whether we need to index all columns of a composite primary key.
CREATE TABLE BOOK_TYPE(
ID TEXT NOT NULL,
TYPE TEXT NOT NULL,
LABELS HSTORE NOT NULL,
CONSTRAINT BOOK_TYPE_PKEY PRIMARY KEY (ID,TYPE)
);
Do I have to index ID and TYPE separately?
You don't need to create any extra index unless you happen to need it to speed up a query. The primary key will automatically create a unique index on (id, type), and that is all that is needed to guarantee consistency.
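If, for example, you often query by TYPE alone (the second column of the primary key), the index on (id, type) is less helpful, and a separate index might pay off. A sketch; the index name is made up:
-- Only add this if queries filtering on TYPE alone turn out to be slow;
-- the PK index on (ID, TYPE) already serves lookups that include ID.
CREATE INDEX book_type_type_idx ON book_type (type);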

Table with SERIAL field gets indexes automatically in PostgreSQL?

I am working on a database and I need to add indexes to it.
I have a table like this.
CREATE TABLE Customers
(
Id SERIAL PRIMARY KEY,
FirstName CHARACTER VARYING(30),
LastName CHARACTER VARYING(30),
Email CHARACTER VARYING(30),
Age INTEGER
);
CREATE UNIQUE INDEX customers_idx ON Customers (Id);
Do I need to add indexes to it if there is a field with the SERIAL attribute?
As per the documentation,
Adding a primary key will automatically create a unique B-tree index on the column or group of columns listed in the primary key, and will force the column(s) to be marked NOT NULL.
So no, in this case you do not need to create the customers_idx yourself, because you defined that column as a primary key. However, the serial type itself (if the column isn't a primary key or unique) does NOT automatically come with an index.
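To illustrate, here is a minimal sketch with a made-up table: the serial column gets a sequence and a default value, but no index, because it is neither a primary key nor unique:
CREATE TABLE events (
event_no serial, -- gets a sequence and a default, but NO index
payload text
);
-- An index on the serial column only exists once you create it explicitly:
CREATE INDEX events_event_no_idx ON events (event_no);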

Using tableoids as foreign keys in PostgreSQL

I was wondering whether it is possible to reference tableoids as foreign keys in an inheritance relationship. For example:
CREATE TABLE employee
(
name TEXT,
PRIMARY KEY(name, TABLEOID)
);
CREATE TABLE hourly_employee
(
hours_worked INT,
PRIMARY KEY(name)
) INHERITS(employee);
CREATE TABLE salaried_employee
(
anniversary_date DATE,
PRIMARY KEY(name)
) INHERITS(employee);
CREATE TABLE employee_training
(
training_id SERIAL,
due_date DATE,
employee_name TEXT,
emp_oid OID,
PRIMARY KEY(training_id),
FOREIGN KEY(employee_name, emp_oid) REFERENCES employee(name, TABLEOID)
);
INSERT INTO hourly_employee (name, hours_worked) VALUES ('Joe Smith', 40);
INSERT INTO salaried_employee(name, anniversary_date) VALUES ('Bob Brown', '2014-02-20');
INSERT INTO employee_training (due_date, employee_name, emp_oid) VALUES ('2016-08-16', 'Bob Brown', 'salaried_employee'::REGCLASS);
In this example, the foreign key is created without a problem, but the last insert will fail with the error Key (employee_name, emp_oid)=(Bob Brown, 16403) is not present in table "employee" even though I can confirm that 16403 is the correct tableoid for salaried_employee.
Is there any way to make this work?
Sadly, inheritance has some serious limitations: several features (including unique indexes and foreign keys) apply only to one table and not to its children. Personally, I've found it much less useful than I'd have liked it to be.
I know it's annoying to suggest a redesign, but in my opinion you'd be better off having a single table employee with optional columns instead of the parent/child relations.
CREATE TABLE employee
(
name TEXT,
employee_type TEXT,
hours_worked INT,
anniversary_date DATE,
PRIMARY KEY(name)
);
In the long run you often find the code becomes simpler and frankly much more portable between DBMS as well.
You can ensure the right fields are filled in for each type by using CHECK constraints to make the appropriate columns mandatory.
Eg:
ALTER TABLE employee ADD CHECK (
(employee_type = 'hourly' AND hours_worked IS NOT NULL)
OR (employee_type = 'salaried' AND anniversary_date IS NOT NULL));
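For illustration, here is how that check would play out on a couple of inserts, using the single-table design above:
-- Succeeds: hourly employee with hours_worked filled in
INSERT INTO employee (name, employee_type, hours_worked)
VALUES ('Joe Smith', 'hourly', 40);
-- Violates the CHECK: salaried employee without anniversary_date
INSERT INTO employee (name, employee_type)
VALUES ('Bob Brown', 'salaried');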

Composite PRIMARY KEY enforces NOT NULL constraints on involved columns

This is one strange, unwanted behavior I encountered in Postgres:
When I create a Postgres table with a composite primary key, it enforces a NOT NULL constraint on each column of the composite combination.
For example,
CREATE TABLE distributors (m_id integer, x_id integer, PRIMARY KEY(m_id, x_id));
enforces NOT NULL constraints on columns m_id and x_id, which I don't want!
MySQL doesn't do this. I think Oracle doesn't do it either.
I understand that PRIMARY KEY enforces UNIQUE and NOT NULL automatically but that makes sense for single-column primary key. In a multi-column primary key table, the uniqueness is determined by the combination.
Is there any simple way of avoiding this behavior of Postgres? When I execute this:
CREATE TABLE distributors (m_id integer, x_id integer);
I do not get any NOT NULL constraints of course. But I would not have a primary key either.
If you need to allow NULL values, use a UNIQUE constraint (or index) instead of a PRIMARY KEY (and add a surrogate PK column - I suggest a serial or IDENTITY column in Postgres 10 or later).
A UNIQUE constraint allows columns to be NULL:
CREATE TABLE distributor (
distributor_id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, m_id integer
, x_id integer
, UNIQUE(m_id, x_id) -- !
-- , CONSTRAINT distributor_my_name_uni UNIQUE (m_id, x_id) -- verbose form
);
The manual:
For the purpose of a unique constraint, null values are not considered equal, unless NULLS NOT DISTINCT is specified.
In your case, you could enter something like (1, NULL) for (m_id, x_id) any number of times without violating the constraint. Postgres never considers two NULL values equal - as per definition in the SQL standard.
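A quick sketch of that behavior against the table above:
-- Both inserts succeed: the two (1, NULL) rows do not conflict,
-- because the NULL values are considered distinct.
INSERT INTO distributor (m_id, x_id) VALUES (1, NULL);
INSERT INTO distributor (m_id, x_id) VALUES (1, NULL);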
If you need to treat NULL values as equal (i.e. "not distinct") to disallow such "duplicates", I see three options (the first requires Postgres 15):
0. NULLS NOT DISTINCT
This option was added with Postgres 15 and allows treating NULL values as "not distinct", so that two of them conflict in a unique constraint or index. This is the most convenient option going forward. The manual:
That means even in the presence of a unique constraint it is possible to store duplicate rows that contain a null value in at least one of the constrained columns. This behavior can be changed by adding the clause NULLS NOT DISTINCT ...
Detailed instructions:
Create unique constraint with null columns
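In short, on Postgres 15 or later the constraint from above could be declared like this (the constraint name is my own choice):
-- Postgres 15+: NULL values now conflict with each other in this constraint,
-- so a second (1, NULL) row is rejected.
ALTER TABLE distributor
ADD CONSTRAINT distributor_m_x_uni UNIQUE NULLS NOT DISTINCT (m_id, x_id);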
1. Two partial indexes
In addition to the UNIQUE constraint above:
CREATE UNIQUE INDEX dist_m_uni_idx ON distributor (m_id) WHERE x_id IS NULL;
CREATE UNIQUE INDEX dist_x_uni_idx ON distributor (x_id) WHERE m_id IS NULL;
But this gets out of hand quickly with more than two columns that can be NULL. See:
Create unique constraint with null columns
2. A multi-column UNIQUE index on expressions
This replaces the UNIQUE constraint above. We need a free default value that is never present in the involved columns, like -1. Add CHECK constraints to disallow it:
CREATE TABLE distributor (
distributor serial PRIMARY KEY
, m_id integer
, x_id integer
, CHECK (m_id <> -1)
, CHECK (x_id <> -1)
);
CREATE UNIQUE INDEX distributor_uni_idx
ON distributor (COALESCE(m_id, -1), COALESCE(x_id, -1));
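With this index in place, NULL values are folded to -1 and therefore do conflict:
INSERT INTO distributor (m_id, x_id) VALUES (1, NULL); -- OK
-- The same values again now fail with a duplicate key error on distributor_uni_idx:
INSERT INTO distributor (m_id, x_id) VALUES (1, NULL);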
When you want a polymorphic relation
Your table uses column names that indicate that they are probably references to other tables:
CREATE TABLE distributors (m_id integer, x_id integer);
So I think you are probably trying to model a polymorphic relation to other tables – where a record in your table distributors can refer to either one m record or one x record, but never both.
Polymorphic relations are difficult in SQL. The best resource I have seen about this topic is "Modeling Polymorphic Associations in a Relational Database". There, four alternative options are presented, and the recommendation for most cases is called "Exclusive Belongs To", which in your case would lead to a table like this:
CREATE TABLE distributors (
id serial PRIMARY KEY,
m_id integer REFERENCES m,
x_id integer REFERENCES x,
CHECK (
((m_id IS NOT NULL)::integer + (x_id IS NOT NULL)::integer) = 1
)
);
CREATE UNIQUE INDEX ON distributors (m_id) WHERE m_id IS NOT NULL;
CREATE UNIQUE INDEX ON distributors (x_id) WHERE x_id IS NOT NULL;
Like other solutions, this uses a surrogate primary key column, because the SQL standard requires primary key columns to not contain NULL values.
This solution adds a 4th option to the three in Erwin Brandstetter's answer for how to avoid the case where "you could enter something like (1, NULL) for (m_id, x_id) any number of times without violating the constraint." Here, that case is excluded by a combination of two measures:
Partial unique indexes on each column individually: two records (1, NULL) and (1, NULL) would not violate the constraint on the second column as NULLs are considered distinct, but they would violate the constraint on the first column (two records with value 1).
Check constraint: The missing piece is preventing multiple (NULL, NULL) records, which would still be allowed because NULLs are considered distinct, and which our partial indexes do not cover anyway (they skip NULL rows to save space and writes). This is achieved by the CHECK constraint, which rejects such records by requiring that exactly one column is non-NULL.
There's one difference though: all alternatives in Erwin Brandstetter's answer allow at least one (NULL, NULL) record and any number of records with no NULL value in any column (like (1, 2)). When modeling a polymorphic relation, you want to disallow such records. That is achieved by the check constraint in the solution above.
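To illustrate the combined effect, a few example inserts against the table above (assuming matching rows exist in m and x):
INSERT INTO distributors (m_id) VALUES (1); -- OK: refers to one m record
INSERT INTO distributors (x_id) VALUES (2); -- OK: refers to one x record
INSERT INTO distributors (m_id, x_id) VALUES (1, 2); -- fails the CHECK: both set
INSERT INTO distributors DEFAULT VALUES; -- fails the CHECK: neither set
INSERT INTO distributors (m_id) VALUES (1); -- fails the partial unique index on m_id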

Do I need a primary key for my table, which has a UNIQUE (composite 4-columns), one of which can be NULL?

I have the following table (PostgreSQL 8.3) which stores prices of some products. The prices are synchronised with another database; basically, most of the fields below (apart from one) are not updated by our client, but are instead dropped and refreshed every once in a while to sync with another stock database:
CREATE TABLE product_pricebands (
template_sku varchar(20) NOT NULL,
colourid integer REFERENCES colour (colourid) ON DELETE CASCADE,
currencyid integer NOT NULL REFERENCES currency (currencyid) ON DELETE CASCADE,
siteid integer NOT NULL REFERENCES site (siteid) ON DELETE CASCADE,
master_price numeric(10,2),
my_custom_field boolean,
UNIQUE (template_sku, siteid, currencyid, colourid)
);
On the synchronisation, I basically DELETE most of the data above except for data WHERE my_custom_field is TRUE (if it's TRUE, it means the client updated this field via their CMS and therefore this record should not be dropped). I then INSERT 100s to 1000s of rows into the table, and UPDATE where the INSERT fails (i.e. where the combination of (template_sku, siteid, currencyid, colourid) already exists).
My question is - what best practice should be applied here to create a primary key? Is a primary key even needed? I wanted to make the primary key = (template_sku, siteid, currencyid, colourid) - but the colourid field can be NULL, and using it in a composite primary key is not possible.
From what I read on other forum posts, I think I have done the above correctly, and just need to clarify:
1) Should I use a "serial" primary key just in case I ever need one? At the moment I don't, and don't think I ever will, because the important data in the table is the price and my custom field, only identified by the (template_sku, siteid, currencyid, colourid) combination.
2) Since (template_sku, siteid, currencyid, colourid) is the combination that I will use to query a product's price, should I add any further indexing to my columns, such as the "template_sku" which is a varchar? Or is the UNIQUE constraint a good index already for my SELECTs?
Should I use a "serial" primary key just in case I ever need one?
You can easily add a serial column later if you need one:
ALTER TABLE product_pricebands ADD COLUMN id serial;
The column will be filled with unique values automatically. You can even make it the primary key in the same statement (if no primary key is defined yet):
ALTER TABLE product_pricebands ADD COLUMN id serial PRIMARY KEY;
If you reference the table from other tables, I would advise using such a surrogate primary key, because linking by four columns is rather unwieldy. It is also slower in SELECTs with JOINs.
Either way, you should define a primary key. The UNIQUE index including a nullable column is not a full replacement. It allows duplicates for combinations including a NULL value, because two NULL values are never considered the same. This can lead to trouble.
As the colourid field can be NULL, you might want to create two unique indexes. The combination (template_sku, siteid, currencyid, colourid) cannot be a PRIMARY KEY because of the nullable colourid, but you can create a UNIQUE constraint like you already have (implementing an index automatically):
ALTER TABLE product_pricebands ADD CONSTRAINT product_pricebands_uni_idx
UNIQUE (template_sku, siteid, currencyid, colourid);
This index perfectly covers the queries you mention in 2).
Create a partial unique index in addition if you want to avoid "duplicates" with (colourid IS NULL):
CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;
That covers all bases. I wrote more about that technique in a related answer on dba.SE.
The simple alternative to the above is to make colourid NOT NULL and create a primary key instead of the above product_pricebands_uni_idx.
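That alternative could look like the following sketch. The sentinel colour row with colourid = 0 is purely an assumption; any reserved row in colour would do:
-- Assumes a dummy row colour(colourid = 0) exists to stand in for "no colour",
-- so the foreign key to colour still holds.
UPDATE product_pricebands SET colourid = 0 WHERE colourid IS NULL;
ALTER TABLE product_pricebands ALTER COLUMN colourid SET NOT NULL;
ALTER TABLE product_pricebands
ADD PRIMARY KEY (template_sku, siteid, currencyid, colourid);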
Also, as you basically DELETE most of the data for your refill operation, it will be faster to drop the indexes that are not needed during the refill and recreate them afterwards. It is faster by an order of magnitude to build an index from scratch than to add all rows incrementally.
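A rough sketch of that pattern, assuming the refill logic only relies on the main UNIQUE constraint to detect conflicts:
-- The partial index is not needed while refilling, so drop it up front:
DROP INDEX product_pricebands_uni_null_idx;
-- ... run the DELETE / INSERT / UPDATE refill here ...
-- Then rebuild it from scratch, much faster than maintaining it row by row:
CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;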
How do you know which indexes are used (needed)?
Test your queries with EXPLAIN ANALYZE.
Or use the built-in statistics. pgAdmin displays statistics in a separate tab for the selected object.
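For example (the filter values are placeholders):
-- If the plan shows a scan on the UNIQUE index, that index is earning its keep:
EXPLAIN ANALYZE
SELECT master_price
FROM product_pricebands
WHERE template_sku = 'ABC123'
AND siteid = 1 AND currencyid = 1 AND colourid = 2;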
It may also be faster to select the few rows with my_custom_field = TRUE into a temporary table, TRUNCATE the base table, and re-INSERT the survivors. This depends on whether you have foreign keys defined. It would look like this:
CREATE TEMP TABLE pr_tmp AS
SELECT * FROM product_pricebands WHERE my_custom_field;
TRUNCATE product_pricebands;
INSERT INTO product_pricebands SELECT * FROM pr_tmp;
This avoids a lot of vacuuming.