postgres prefixed index usage - postgresql

Setting a primary key, a composite of 3 columns, generates an index, which can be viewed with:
select t.relname as tbl, i.relname as idx, a.attname as col
from pg_class t, pg_class i, pg_index ix, pg_attribute a
where t.oid = ix.indrelid
and i.oid = ix.indexrelid
and a.attrelid = t.oid
and a.attnum = any(ix.indkey)
and t.relkind = 'r'
and t.relname not like 'pg%'
order by t.relname, i.relname;
The table is "customer" as defined in the TPC-C benchmarking guide. The question I have is: when creating the foreign key on the table, as required by the guide, does one need to create a corresponding index?
Given that the 2 columns for the foreign key match the first two columns of the primary key, would the index generated as part of the primary key constraint suffice?
Table & key DDL:
create table customer (c_id numeric, c_d_id numeric, c_w_id numeric, ...);
alter table customer add constraint pk_customer
primary key (c_w_id, c_d_id, c_id) ;
alter table customer add constraint fk_cust_district
foreign key (c_w_id, c_id) references district (d_w_id, d_id);
The reason for the question is that in Oracle and SQL Anywhere one need not create such an index: the respective optimizers will use whichever existing index helps with referential lookups, in this case the two-column prefix of the index generated as part of the primary key constraint.

The way your tables are created, the primary key index cannot support the foreign key constraint well because the columns of the foreign key definition are not at the beginning of the primary key constraint definition.
The primary key index is better than nothing, though: at least it can be scanned for c_w_id (with c_id as a filter), but not for both columns at the same time, as would be most efficient.
So PostgreSQL will make use of the index at hand, but it will still not be very efficient.
Unless there is a good reason that the primary key columns are defined in this order, I suggest that you swap the second and third column in the primary key definition. Then the index is a perfect fit for the foreign key constraint.
If that is not feasible, create a second index on (c_w_id, c_id).
(This would be just the same on Oracle, by the way, except that they have an index skip scan of – in my opinion – questionable merits.)
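Sketches of both options, assuming the DDL above (the index name is illustrative, and dropping the primary key may first require dropping foreign keys that reference it):
-- option 1: reorder the primary key so the FK columns form a prefix
alter table customer drop constraint pk_customer;
alter table customer add constraint pk_customer
primary key (c_w_id, c_id, c_d_id);
-- option 2: keep the primary key and add a dedicated index
create index customer_wid_cid_idx on customer (c_w_id, c_id);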

Related

How to use ON CONFLICT with a primary key on a foreign table

I am basically trying to replicate data from a table on one server to another.
I have two identical databases on the two servers. On the primary server I created a foreign table called opentickets_aux1 that represents the opentickets table on the secondary server. Both have a primary key of incidentnumber. I can access the data in the foreign table just fine, but when I try the following SQL, I get "ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification."
INSERT INTO opentickets_aux1 (SELECT * FROM opentickets)
ON CONFLICT (incidentnumber)
DO
UPDATE SET
status = EXCLUDED.status,
lastmodifieddate = EXCLUDED.lastmodifieddate
I want to update a few columns if the primary key exists. I use this statement for other queries and it works when it's a local table. Any ideas?
A foreign table cannot have a primary key constraint, because PostgreSQL wouldn't be able to enforce its integrity. Therefore, you cannot use INSERT ... ON CONFLICT with foreign tables.
Your idea also does not handle rows that are deleted on the foreign server, but maybe that's intentional.
If you want a local copy of a foreign table, the easiest way would be to create a materialized view on the foreign table.
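A minimal sketch of that approach (the view name opentickets_local is illustrative; materialized views require PostgreSQL 9.3 or later):
CREATE MATERIALIZED VIEW opentickets_local AS
SELECT * FROM opentickets_aux1;
-- re-run whenever the copy should be synchronised with the foreign server
REFRESH MATERIALIZED VIEW opentickets_local;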
If that is not your desire (perhaps because you don't want to copy deletions), you'd have to use statements like
INSERT INTO localtable
SELECT * FROM foreigntable f
WHERE NOT EXISTS
(SELECT 1 FROM localtable l
WHERE f.id = l.id);
UPDATE localtable l
SET /* all columns from f */
FROM foreigntable f
WHERE f.id = l.id
AND (f.*) <> (l.*);

DB2 find tables that reference my table

[db2-as400] I have a table ENR_DATA that has the column EnrollmentID as a primary key. This column is referenced as a foreign key by many tables. Is there a way to list all the tables that refer to EnrollmentID of the ENR_DATA table?
There are a few catalog views that each give just a part of the answer, and you have to join them all together.
SYSCST provides a list of constraints with the constraint type. From here we can select out the foreign key constraints. TABLE_NAME in this view is the table that contains the foreign key.
SYSKEYCST provides a list of columns for a given Foreign Key, Primary Key, or Unique constraint along with the ordinal position of the column in the key, and the associated table name.
SYSREFCST provides the name of the Primary or Unique Key constraint that is referenced by a given Foreign Key Constraint.
From these three views we can write the following SQL:
select cst.constraint_schema, cst.constraint_name,
fk.table_schema, fk.table_name, fk.ordinal_position, fk.column_name,
pk.table_schema, pk.table_name, pk.column_name
from qsys2.syscst cst
join qsys2.syskeycst fk
on fk.constraint_schema = cst.constraint_schema
and fk.constraint_name = cst.constraint_name
join qsys2.sysrefcst ref
on ref.constraint_schema = cst.constraint_schema
and ref.constraint_name = cst.constraint_name
join qsys2.syskeycst pk
on pk.constraint_schema = ref.unique_constraint_schema
and pk.constraint_name = ref.unique_constraint_name
where cst.constraint_type = 'FOREIGN KEY'
and fk.ordinal_position = pk.ordinal_position
and pk.table_name = 'ENR_DATA'
and pk.column_name = 'ENROLLMENTID'
order by cst.constraint_schema, cst.constraint_name;
This will get you the table names that reference 'ENR_DATA' via foreign key. Note I have ENROLLMENTID in all upper case. That is how DB2 for i stores all column names unless they are quoted using "".
DB2 on IBM i (AS/400) offers a system catalog, a set of views where metadata about all database objects is stored. SYSCST is the view with all constraints, the view SYSCSTCOL has information about the constraint columns, and SYSCSTDEP stores the dependencies.
So you would query SYSCST, SYSCSTCOL and SYSCSTDEP to find the details.

Index on foreign keys

I'm just trying to best understand indexes.
On page 106 of 70-461 - Querying Microsoft SQL Server 2012, it says that when you define a primary key or unique constraint, SQL Server will automatically create a unique index.
But no indexes are created for foreign keys.
Therefore, to make joins more efficient, is it best to just create a nonclustered index on the foreign keys?
Not sure what part is the question.
An index is used to enforce a unique constraint.
A FK by nature does not require an index.
But if the FK has an index the query optimizer will often use it in the join.
In this query docMVEnum1.valueID is a FK with an index.
The query optimizer used that index.
Even with the index it was still the most expensive part of the query.
select docMVEnum1.sID, docEnum1.value
from docMVEnum1
join docEnum1
on docEnum1.valueID = docMVEnum1.valueID
Also by nature a FK is often used in a where clause.
Indexes are not free.
They improve select but slow down insert and update.
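If the joins justify it, creating such an index on the FK from the example above is a one-liner (a sketch; the index name is illustrative):
CREATE NONCLUSTERED INDEX IX_docMVEnum1_valueID
ON docMVEnum1 (valueID);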
No, you don't need to create an index on the foreign keys, and there is no guarantee that it will make joins more efficient.
The indexes for unique and PK constraints are created to enforce uniqueness, which is checked on INSERT and UPDATE.
When you query with a JOIN, the optimizer will use zero or one index to seek/scan each table.
Let's say that you have a couple of tables like:
CREATE TABLE LookupTable
(
ID int PRIMARY KEY,
Description varchar(max)
);
CREATE TABLE MyTable
(
ID int PRIMARY KEY,
Description varchar(max),
ColumnFK int FOREIGN KEY REFERENCES LookupTable (ID)
);
SELECT MyTable.ID, MyTable.Description, MyTable.ColumnFK, LookupTable.Description
FROM MyTable
INNER JOIN LookupTable
ON LookupTable.ID = MyTable.ColumnFK
WHERE MyTable.ID BETWEEN 5 AND 10000;
Most probably the optimizer will use an index seek or scan to find all the relevant IDs in MyTable, and then fetch the ColumnFK and Description columns from the table.
If you were thinking of adding the FK column to the unique or PK index, just evaluate what happens when you have many FKs in the same table.
Note that I intentionally added MyTable.Description to the query and made it varchar(max) to show that you will still have to reach the table data for such a query.

Set column as primary key if the table doesn't have a primary key

I have a table in the db which has 5 columns but no primary key.
One of the columns is named myTable_id and is integer.
I want to check if the table has a primary key column. If it doesn't, then make myTable_id a primary key and an identity column. Is there a way to do this?
I tried with this:
ALTER TABLE Persons
DROP CONSTRAINT pk_PersonID
ALTER TABLE Persons
ADD PRIMARY KEY (P_Id)
and I get syntax error in Management studio.
This checks if a primary key exists; if not, it is created:
IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE CONSTRAINT_TYPE = 'PRIMARY KEY' AND TABLE_NAME = 'Persons'
AND TABLE_SCHEMA ='dbo')
BEGIN
ALTER TABLE Persons ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id)
END
ELSE
BEGIN
-- Key exists
END
fiddle: http://sqlfiddle.com/#!6/e165d/2
Adding the primary key itself is straightforward:
ALTER TABLE Persons
ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id)
However, an IDENTITY property can't be added to an existing column, so how you add that needs to be your initial thought. There are two options:
Create a new table including a primary key with identity and drop the existing table
Create a new primary key column with identity and drop the existing 'P_ID' column
There is a third way, which is a better approach for very large tables via the ALTER TABLE...SWITCH statement. See Adding an IDENTITY to an existing column for an example of each. In answer to this question, if the table isn't too large, I recommend running the following:
-- Check that the table/column exist and no primary key is already on the table.
IF COL_LENGTH('PERSONS','P_ID') IS NOT NULL
AND NOT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE CONSTRAINT_TYPE = 'PRIMARY KEY' AND TABLE_NAME = 'PERSONS')
-- Add table schema to the WHERE clause above e.g. AND TABLE_SCHEMA ='dbo'
BEGIN
-- GO is a batch separator, not a statement, and can't appear inside
-- BEGIN...END, so the steps are separated with semicolons instead.
ALTER TABLE PERSONS
ADD P_ID_new int IDENTITY(1, 1);
ALTER TABLE PERSONS
DROP COLUMN P_ID;
EXEC sp_rename 'PERSONS.P_ID_new', 'P_ID', 'Column';
ALTER TABLE PERSONS
ADD CONSTRAINT PK_P_ID PRIMARY KEY CLUSTERED (P_ID);
END
Notes:
By explicitly using the CONSTRAINT keyword the primary key constraint is given a particular name rather than depending on SQL Server to auto-assign a name.
Only include CLUSTERED on the PRIMARY KEY if the balance of searches for a particular P_ID and the amount of writing outweighs the benefits of clustering the table by some other index. See Create SQL IDENTITY as PRIMARY KEY.
You can check whether a primary key exists using the OBJECTPROPERTY Transact-SQL function; use 'TableHasPrimaryKey' as the second argument.
DECLARE @ISHASPRIMARYKEY INT;
SELECT @ISHASPRIMARYKEY = OBJECTPROPERTY(OBJECT_ID('PERSONS'), 'TABLEHASPRIMARYKEY');
-- TableHasPrimaryKey returns 0 (not NULL) when the table exists but has no primary key
IF @ISHASPRIMARYKEY = 0
BEGIN
-- generate identity column
ALTER TABLE PERSONS
DROP COLUMN P_ID;
ALTER TABLE PERSONS
ADD P_ID INT IDENTITY(1,1);
-- add primary key
ALTER TABLE PERSONS
ADD CONSTRAINT PK_PERSONID PRIMARY KEY (P_ID);
END;
I don't think you can do that. To turn an existing column into an identity column, I think you have to recreate the table.

Do I need a primary key for my table, which has a composite 4-column UNIQUE constraint, one column of which can be NULL?

I have the following table (PostgreSQL 8.3), which stores prices of some products. The prices are synchronised with another database; basically, most of the fields below (apart from one) are not updated by our client but are instead dropped and refreshed every once in a while to sync with another stock database:
CREATE TABLE product_pricebands (
template_sku varchar(20) NOT NULL,
colourid integer REFERENCES colour (colourid) ON DELETE CASCADE,
currencyid integer NOT NULL REFERENCES currency (currencyid) ON DELETE CASCADE,
siteid integer NOT NULL REFERENCES site (siteid) ON DELETE CASCADE,
master_price numeric(10,2),
my_custom_field boolean,
UNIQUE (template_sku, siteid, currencyid, colourid)
);
On synchronisation, I basically DELETE most of the data above, except rows WHERE my_custom_field is TRUE (if it's TRUE, the client updated this field via their CMS and therefore the record should not be dropped). I then INSERT hundreds to thousands of rows into the table, and UPDATE where the INSERT fails (i.e. where the combination of (template_sku, siteid, currencyid, colourid) already exists).
My question is - what best practice should be applied here to create a primary key? Is a primary key even needed? I wanted to make the primary key = (template_sku, siteid, currencyid, colourid) - but the colourid field can be NULL, and using it in a composite primary key is not possible.
From what I read on other forum posts, I think I have done the above correctly, and just need to clarify:
1) Should I use a "serial" primary key just in case I ever need one? At the moment I don't, and don't think I ever will, because the important data in the table is the price and my custom field, only identified by the (template_sku, siteid, currencyid, colourid) combination.
2) Since (template_sku, siteid, currencyid, colourid) is the combination that I will use to query a product's price, should I add any further indexing to my columns, such as the "template_sku" which is a varchar? Or is the UNIQUE constraint a good index already for my SELECTs?
Should I use a "serial" primary key just in case I ever need one?
You can easily add a serial column later if you need one:
ALTER TABLE product_pricebands ADD COLUMN id serial;
The column will be filled with unique values automatically. You can even make it the primary key in the same statement (if no primary key is defined, yet):
ALTER TABLE product_pricebands ADD COLUMN id serial PRIMARY KEY;
If you reference the table from other tables, I would advise using such a surrogate primary key, because it is rather unwieldy to link by four columns; it is also slower in SELECTs with JOINs.
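For illustration, a referencing table using the surrogate key could look like this (a hypothetical table; priceband_note and its columns are made up for the example, and it assumes id was made the primary key as above):
CREATE TABLE priceband_note (
note_id serial PRIMARY KEY,
priceband_id integer NOT NULL REFERENCES product_pricebands (id),
note text
);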
Either way, you should define a primary key. The UNIQUE index including a nullable column is not a full replacement. It allows duplicates for combinations including a NULL value, because two NULL values are never considered the same. This can lead to trouble.
As the colourid field can be NULL, you might want to create two unique indexes. The combination (template_sku, siteid, currencyid, colourid) cannot be a PRIMARY KEY because of the nullable colourid, but you can create a UNIQUE constraint like you already have (implementing an index automatically):
ALTER TABLE product_pricebands ADD CONSTRAINT product_pricebands_uni_idx
UNIQUE (template_sku, siteid, currencyid, colourid);
This index perfectly covers the queries you mention in 2).
Create a partial unique index in addition if you want to avoid "duplicates" with (colourid IS NULL):
CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;
That covers all bases. I wrote more about that technique in a related answer on dba.SE.
The simple alternative to the above is to make colourid NOT NULL and create a primary key instead of the above product_pricebands_uni_idx.
Also, as you basically DELETE most of the data for your refill operation, it will be faster to drop the indexes that are not needed during the refill and recreate them afterwards. It is faster by an order of magnitude to build an index from scratch than to add all rows incrementally.
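For example, supposing the partial index from above is not needed while the refill runs, the pattern would be (a sketch using the names defined earlier):
DROP INDEX product_pricebands_uni_null_idx;
-- ... run the DELETE / INSERT / UPDATE refill here ...
CREATE UNIQUE INDEX product_pricebands_uni_null_idx
ON product_pricebands (template_sku, siteid, currencyid)
WHERE colourid IS NULL;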
How do you know which indexes are used (needed)?
Test your queries with EXPLAIN ANALYZE.
Or use the built-in statistics. pgAdmin displays statistics in a separate tab for the selected object.
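For instance, to check the price lookup from 2) (a sketch; the literal values are placeholders):
EXPLAIN ANALYZE
SELECT master_price
FROM product_pricebands
WHERE template_sku = 'ABC123'
AND siteid = 1
AND currencyid = 1
AND colourid = 2;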
It may also be faster to select the few rows with my_custom_field = TRUE into a temporary table, TRUNCATE the base table and re-INSERT the survivors. Depends on whether you have foreign keys defined. Would look like this:
CREATE TEMP TABLE pr_tmp AS
SELECT * FROM product_pricebands WHERE my_custom_field;
TRUNCATE product_pricebands;
INSERT INTO product_pricebands SELECT * FROM pr_tmp;
This avoids a lot of vacuuming.