Composite key with user-supplied string column, foreign keys - postgresql

Let's say I have the following table:
CREATE TABLE subgroups (
    group_id      t_group_id      NOT NULL REFERENCES groups(group_id),
    subgroup_name t_subgroup_name NOT NULL
    -- more attributes ...
);
subgroup_name is UNIQUE within a group (group_id).
A group can have many subgroups.
The subgroup_names are user-supplied. (I would like to avoid using a subgroup_id column. subgroup_name has meaning in the model and is more than just a label; I provide a list of predetermined names but allow users to add their own for flexibility.)
This table has 2 levels of referencing child tables containing subgroup attributes (with many-to-one relations).
I would like to have a PRIMARY KEY on (group_id, upper(trim(subgroup_name)));
From what I know, Postgres doesn't allow a PRIMARY KEY or UNIQUE constraint on an expression.
IIRC, the relational model also requires key columns to be used as stored.
CREATE UNIQUE INDEX ON subgroups (group_id, upper(trim(subgroup_name))); doesn't solve my problem, as other tables in my model will have FOREIGN KEYs pointing to those two columns, and an expression like upper(trim(subgroup_name)) is not a column a FOREIGN KEY can reference.
I see two options.
Option A)
Store a cleaned-up subgroup name in subgroup_name.
Add an extra column called subgroup_name_raw that would contain the uncleaned string.
Option B)
Create both a UNIQUE INDEX and a PRIMARY KEY on my key pair (seems like a huge waste).
Any insights?
Note: I'm using Postgres 9.2

Actually, you can enforce uniqueness on the output of a function. You can't do it in the table definition, though; what you need to do is create a unique index afterwards. So something like:
CREATE UNIQUE INDEX subgroups_ukey2 ON subgroups(group_id, upper(trim(subgroup_name)));
PostgreSQL has a number of absolutely amazing indexing capabilities, and the ability to create unique (and partial unique) indexes on function output is quite underrated.
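A unique index on an expression, however, is not something a FOREIGN KEY can point at, so for the referencing child tables one workable route is Option A from the question. A minimal sketch of it using a BEFORE trigger (assumptions: the domain types are replaced with plain int/text, and clean_subgroup_name is a hypothetical helper, not part of the original schema):
CREATE TABLE subgroups (
    group_id      int  NOT NULL REFERENCES groups(group_id),
    subgroup_name text NOT NULL,
    PRIMARY KEY (group_id, subgroup_name)  -- plain columns, so FKs can reference them
);

CREATE FUNCTION clean_subgroup_name() RETURNS trigger AS $$
BEGIN
    -- store the canonical form; the raw input could go into a separate
    -- subgroup_name_raw column, as Option A suggests
    NEW.subgroup_name := upper(trim(NEW.subgroup_name));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER subgroups_clean_name
BEFORE INSERT OR UPDATE ON subgroups
FOR EACH ROW EXECUTE PROCEDURE clean_subgroup_name();
With the stored value already normalized, the expression index becomes redundant, and child tables can simply declare FOREIGN KEY (group_id, subgroup_name) REFERENCES subgroups.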

Related

Is it bad for columns in composite keys to have mismatching types?

Problem:
I'd like to make a composite primary key from columns id and user_id for a postgres database table. Column user_id is a foreign key with an integer type, whereas id is a string. Will this cause a conflict because the types are different?
Edit: Also, are there combinations of types that would cause problems?
Context:
I obviously should match the type of the User.id field for its foreign key. Also, the id for my table will be derived from a uuid to prevent data leaks, so I would prefer not to change the type of either field I want in this table.
Research:
I am using sqlalchemy. Their documentation mentions how to create a composite primary key, but it doesn't discuss dealing with different types for each column.
No, this won't be a problem.
Your question seems to indicate that you think the values of the indexed columns are somehow concatenated and then stored in the index as a single value. This is not the case. Each column value is stored independently, but together, similar to the way the column values are stored in the actual table.
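For illustration, here's what such a mixed-type composite primary key looks like in plain SQL (the table and column names are made up, not taken from the question):
CREATE TABLE users (
    id integer PRIMARY KEY
);

CREATE TABLE user_items (
    id      text    NOT NULL,        -- string key, e.g. derived from a uuid
    user_id integer NOT NULL REFERENCES users (id),
    PRIMARY KEY (id, user_id)
);
Each column keeps its own type in the index entries; nothing is coerced or concatenated.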

Setting constraint for two unique fields in PostgreSQL

I'm new to Postgres. I wonder what the PostgreSQL way is to set a constraint on a pair of fields (so that each pair of values would be unique). Should I create an INDEX on the bar and baz fields?
CREATE UNIQUE INDEX foo ON table_name(bar, baz);
If not, what is a right way to do that? Thanks in advance.
If each field needs to be unique unto itself, then create unique indexes on each field. If they need to be unique in combination only, then create a single unique index across both fields.
Don't forget to set each field NOT NULL if it should be. NULLs never compare equal, so something like this can happen:
create table test (a int, b int);
create unique index test_a_b_unq on test (a,b);
insert into test values (NULL,1);
insert into test values (NULL,1);
and get no error, because the two NULLs don't count as duplicates.
You can do what you are already thinking of: create a unique constraint on both fields. A unique index will be created behind the scenes, and you will get the behavior you need. Plus, that information can be picked up from information_schema for metadata inference on the fact that both fields need to be unique together. I would recommend this option. You could also use triggers for this, but a unique constraint is far better suited to this specific requirement.
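For reference, the constraint form suggested above would be (the constraint name is just illustrative):
ALTER TABLE table_name ADD CONSTRAINT table_name_bar_baz_key UNIQUE (bar, baz);
It behaves exactly like the unique index from the question, but it is visible as a constraint in information_schema.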

Is this kind of DB relation design favourable and correct? Should it be converted to a no-sql solution?

First of all, I did my research, but being rather a newbie I am not well acquainted with the terminology, so I might have failed to find the correct words. I beg your pardon in case this is a duplicate.
Question #1:
I have a table consisting of ID [PK] and LABEL [varchar 128]. Each record (row) here is unique. What I want is to define relations between these LABELS.
Requisite:
There will be some number n of groups, each group containing one or more of these LABELS. In each group, each LABEL either exists or doesn't (meaning a group does not contain the same LABEL twice).
How should I define this relation?
I thought of creating another table with ID [PK] - Group ID [randomly assigned unique key] - LABEL_ID [ID of the Labels table pointing to a single Label].
Is this correct and favourable? If a group has 10 LABELS, then there will be 10 records, each with a unique ID, the same randomly assigned Group ID, and a LABEL_ID pointing to the LABELS table.
Question #2:
Should I let go of the relational solution (as described above) and opt for a NoSQL solution, where each group is stored on its own as a single entry in the database with an ID [PK] - Data [containing either labels or IDs of labels pointing to the Label table]?
If NoSQL is the way to go, how should I store this data?
a) Should I have ID - Data (containing Labels)?
b) ID - Data (containing IDs of Labels)?
Question #3:
If NoSQL solution here is the best way, which NoSQL database should I choose for this use case?
Thank you.
There's no real need for an ID column in this GroupLabels table:
CREATE TABLE GroupLabels (
    GroupID int NOT NULL,
    LabelID int NOT NULL,
    CONSTRAINT PK_GroupLabels PRIMARY KEY (GroupID, LabelID),
    CONSTRAINT FK_GroupLabels_Groups FOREIGN KEY (GroupID) REFERENCES Groups,
    CONSTRAINT FK_GroupLabels_Labels FOREIGN KEY (LabelID) REFERENCES Labels
);
By doing the above, we've automatically achieved a constraint: the same label can't be added to the same group more than once.
With the above, I'd say it's a reasonably common SQL solution.
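As a quick sanity check (sample values only, assuming the referenced Groups and Labels rows exist), the composite primary key allows the same label in different groups but rejects a duplicate pair:
INSERT INTO GroupLabels (GroupID, LabelID) VALUES (1, 10); -- OK
INSERT INTO GroupLabels (GroupID, LabelID) VALUES (2, 10); -- same label, different group: OK
INSERT INTO GroupLabels (GroupID, LabelID) VALUES (1, 10); -- duplicate pair: violates PK_GroupLabels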
There is too little information here to make recommendations on the question of "to SQL or not to SQL".
However, the relational approach would be as you describe, I think.
CREATE TABLE Groups
(
    GroupId int PRIMARY KEY
);

CREATE TABLE Labels
(
    LabelId int PRIMARY KEY,
    Value varchar(100) UNIQUE
);

CREATE TABLE GroupLabels
(
    GroupId int REFERENCES Groups,
    LabelId int REFERENCES Labels,
    UNIQUE (GroupId, LabelId)
);
Here, every label is unique. Many labels may be in each group, and each label may be in many groups, but each label can only appear in each group once.
As @Damien_The_Unbeliever indicates, the Groups table can be omitted if you don't need to store any additional attributes about each group; the GroupId column on the GroupLabels table then identifies groups on its own.
You might need to change the syntax slightly for whatever RDBMS you're using.

DB associative entities and indexing

This is kind of a general DB design question. If one has an associative entity table, i.e. a cross-reference, containing records that basically just consist of two FK references, should it be indexed in some way? Is it necessary to explicitly index that table, since the PKs in the associated tables are already indexed by definition? If one should index it, should it be a combination index, consisting of the two FK fields together?
Indexes on the referenced pk columns in the other tables do not cover it.
By defining the two fk columns as composite primary key of the "associative entity" table (as you should in most cases - provided that associations are unique), you implicitly create a multi-column index.
That covers all queries involving both columns, or just the first column, optimally.
It also covers queries on the second column, but less effectively.
If you have important queries involving just the second column, create an additional index on that one, too.
Read all the details about the topic at this related question on dba.SE, or at this question on SO, which also covers it.
Suppose your associative table has a schema such as:
CREATE TABLE Association
(
    ReferenceA INTEGER NOT NULL CONSTRAINT FK1_Association REFERENCES TableA,
    ReferenceB INTEGER NOT NULL CONSTRAINT FK2_Association REFERENCES TableB,
    CONSTRAINT PK_Association PRIMARY KEY (ReferenceA, ReferenceB)
);
The chances are that your DBMS will automatically create some indexes.
Some DBMS will create an index for each of the two foreign keys and also a unique index for the primary key. This is slightly wasteful since the PK index could be used for accessing ReferenceA too.
Ideally, there will be just two indexes: the PK (unique) index and the (duplicates allowed) FK index for ReferenceB, assuming that the PK index has ReferenceA as the first column.
If a DBMS does not automatically create indexes to enforce the referential integrity constraints, you'll want to create the duplicates-allowed FK index yourself. If it doesn't automatically create an index to enforce the PK constraint, you'll want to create that unique index too. The upside is that you'll create only the indexes needed for the ideal case.
Depending on your DBMS, you might find it more effective to create the table without the constraints, then to add the indexes, and then to add the constraints (which will then use the indexes you created). Things like fragmentation schemes can also factor into this; I ignored them above.
The concept remains simple: you want two indexes in total, one to enforce uniqueness on both columns and provide fast access on the leading column, and a non-unique (duplicates-allowed) index on the trailing column.
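Using the Association table above, that second, duplicates-allowed index is a one-liner (the index name is just illustrative):
CREATE INDEX IX_Association_ReferenceB ON Association (ReferenceB);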

Can I create an index on User-defined Table variables?

Just wanted to check whether we can create indexes on user-defined table variables. I know that we can create a PK on a table variable. Does that imply that the PK internally creates a (clustered) index? If an index is possible on a column of a table variable, where does the indexed data get stored?
To define an index on a table variable, use a primary key or unique constraint. You can nominate one as clustered.
If you need an index on a non-unique field, simply add a unique key to the end of the index column list to make the combination unique.
If the table variable has no unique field, add a dummy unique field using an identity column.
Something like this:
declare @t table (
    dummy int identity primary key nonclustered,
    val1 nvarchar(50),
    val2 nvarchar(50),
    unique clustered (val1, dummy)
)
Now you have a table variable with a clustered index on the non-unique field val1.
With table variables, you can define primary key and unique constraints, and nominate one of them as clustered, but that is the extent of the clustering behaviour you can control. The indexes for these constraints are stored alongside the actual data in the table variable: hopefully in memory within tempdb, but spilled to disk if memory pressure is high.
You're unable to define arbitrary indexes on such tables.
You can however define whatever indexes you want on temp tables.
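By contrast, a temp table accepts arbitrary CREATE INDEX statements. A quick sketch (the names are made up):
create table #t (val1 nvarchar(50), val2 nvarchar(50));
create index ix_t_val1 on #t (val1);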