Is this kind of DB relation design favourable and correct? Should it be converted to a NoSQL solution?

First of all, I did my research, but being rather a newbie I am not well acquainted with the terminology, so I may have failed to find the right search terms. I beg your pardon if this is a duplicate.
Question #1:
I have a table consisting of ID [PK] and LABEL [Varchar 128]. Each record (row) here is unique. What I want is, to define relations between these LABELS.
Requisite:
There will be an n amount of groups, each group containing one or more of these LABELS. In each group, each LABEL can either exist or not exist (meaning a group does not have 2x of same LABEL).
How should I define this relation?
I thought of creating another table with ID [PK] - Group ID [randomly assigned unique key] - LABEL_ID [ID of Labels table pointing to a single Label]
Is this correct and favourable? If a group has 10 LABELS then there will be 10 records with unique ID, same uniquely assigned Group ID and LABEL_ID pointing to LABELS table.
Question #2:
Should I let go of the relational solution (as described above) and opt for a NoSQL solution, where each group is stored on its own as a single entry in the database with an ID [PK] - Data [containing either labels or IDs of labels pointing to the Label table]?
If NoSQL is the way to go, how should I store this data?
a) Should I have ID - Data (containing Labels)?
b) ID - Data (containing IDs of Labels)?
Question #3:
If NoSQL solution here is the best way, which NoSQL database should I choose for this use case?
Thank you.

There's no real need for an ID column in this GroupLabels table:
CREATE TABLE GroupLabels (
GroupID int not null,
LabelID int not null,
constraint PK_GroupLabels PRIMARY KEY (GroupID,LabelID),
constraint FK_GroupLabels_Groups FOREIGN KEY (GroupID) references Groups,
constraint FK_GroupLabels_Labels FOREIGN KEY (LabelID) references Labels
)
By doing the above, we've automatically achieved a constraint - that the same label can't be added to the same group more than once.
With the above, I'd say it's a reasonably common SQL solution.
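To see the composite primary key doing its job, here is a minimal sketch that runs the same design through Python's built-in sqlite3 module. SQLite is only a stand-in for whatever RDBMS you use; the Labels and Groups table definitions, the sample rows, and the add_label helper are assumptions made for illustration and are not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Labels (ID INTEGER PRIMARY KEY, Label VARCHAR(128) NOT NULL UNIQUE);
CREATE TABLE Groups (GroupID INTEGER PRIMARY KEY);
CREATE TABLE GroupLabels (
    GroupID INTEGER NOT NULL REFERENCES Groups (GroupID),
    LabelID INTEGER NOT NULL REFERENCES Labels (ID),
    PRIMARY KEY (GroupID, LabelID)
);
-- Hypothetical sample data
INSERT INTO Labels (ID, Label) VALUES (1, 'red'), (2, 'green');
INSERT INTO Groups (GroupID) VALUES (10), (11);
INSERT INTO GroupLabels (GroupID, LabelID) VALUES (10, 1), (10, 2), (11, 1);
""")

def add_label(group_id, label_id):
    """Try to put a label into a group; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO GroupLabels (GroupID, LabelID) VALUES (?, ?)",
                (group_id, label_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = add_label(10, 1)  # label 1 is already in group 10
new_pair_accepted = add_label(11, 2)   # label 2 is not yet in group 11

labels_in_group_10 = [
    row[0]
    for row in conn.execute(
        "SELECT l.Label FROM GroupLabels gl "
        "JOIN Labels l ON l.ID = gl.LabelID "
        "WHERE gl.GroupID = 10 ORDER BY l.Label"
    )
]
```

The same label can still appear in several groups; only a repeated (GroupID, LabelID) pair is refused.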

There is too little information here to make recommendations on the question of "to SQL or not to SQL".
However, the relational approach would be as you describe, I think.
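-- GROUP is a reserved word, so the table name is quoted (T-SQL brackets here)
CREATE TABLE [Group]
(
GroupId int PRIMARY KEY
);
-- Label is created before GroupLabel so the foreign key has a target
CREATE TABLE Label
(
LabelId int PRIMARY KEY,
Value varchar(100) UNIQUE
);
CREATE TABLE GroupLabel
(
GroupId int NOT NULL FOREIGN KEY REFERENCES [Group],
LabelId int NOT NULL FOREIGN KEY REFERENCES Label,
UNIQUE (GroupId, LabelId)
);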
Here, every label is unique. Many labels may be in each group and each label may be in many groups, but each label can only be in each group once.
As @Damien_The_Unbeliever indicates, the Group table can be omitted if you don't need to store any additional attributes about each group. GroupId then becomes a plain column on the GroupLabel table with no foreign key, and the uniqueness of the (GroupId, LabelId) pair is all that is enforced.
You might need to change the syntax slightly for whatever RDBMS you're using.

Related

PostgreSQL database: Get rid of redundant transitive relation (maybe 3NF is failed)

I'm creating a hybrid between "X-Com Enemy Unknown" and "The Sims". I maintain game state in a database--PostgreSQL--but my question is structural, not engine-specific.
As in X-Com, there are some bases in different locations, so I create a table named Base with ID autoincrement identity as primary key.
Every base has some facilities in its territory, so I create a table named Facility with a foreign key Facility.Base_ID, referring to Base.ID.
Every base has some landing crafts in its hangars, so I create a table named Craft with a foreign key Craft.Base_ID, referring to Base.ID.
Every base has some troopers in its barracks, so I create a table named Trooper with a foreign key Trooper.Base_ID, referring to Base.ID.
Just to this point, everything seems to be ok, doesn't it? However...
I want to have some sort of staff instruction. Like in the X-Com game, every trooper can be assigned to some craft for offense action, or can be unassigned. In addition, every trooper can be assigned to some facility (or can be unassigned) for defense action. So, I have to add nullable foreign keys Trooper.Craft_ID and Trooper.Facility_ID, referring to Craft.ID and Facility.ID respectively.
That database has a redundancy. If some trooper is assigned to a craft or to a facility (or both), it has two (or even three) relations to the base--one direct relation through its Base_ID and some indirect relations as Facility(Trooper.Facility_ID).Base_ID and Craft(Trooper.Craft_ID).Base_ID. Even if I get rid of Trooper.Base_ID (e.g. I can make both assignments mandatory and create a mock craft and a mock facility in every base), I can't get rid of both trooper-facility-base and trooper-craft-base relations.
In addition to this redundancy, there is a worse problem--in case of a mistake, some trooper can be assigned to a craft from one base and to a facility from another base, that's a really nasty situation. I can prohibit it in the application business logic tier, but it's still allowed by the database.
There can be some constraints to apply, but is there any structural modification to the schema that can get rid of the redundancy and potential inconsistency as a result of a good structure, not as a result of constraints?
CREATE TABLE base (
id int PRIMARY KEY
);
CREATE TABLE facility (
id int PRIMARY KEY,
base_id int REFERENCES base
);
CREATE TABLE craft (
id int PRIMARY KEY,
base_id int REFERENCES base
);
CREATE TABLE trooper (
id int PRIMARY KEY,
assigned_facility_id int REFERENCES facility,
assigned_craft_id int REFERENCES craft,
base_id int REFERENCES base
);
Now I want to get some sort of constraints on a trooper t so that
facilities.get(t.assigned_facility_id).base_id IS NULL OR EQUAL TO t.base_id
crafts.get(t.assigned_craft_id).base_id IS NULL OR EQUAL TO t.base_id
This hypothetical constraint has to be applied to table trooper, because it applies in boundaries of each trooper row separately. Constraints on one table have to check equality between fields of two other tables.
I would like to create a database schema where there is exactly one way, having a trooper.id, to find its referenced base.id. How do I normalise my schema?
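One common way to get the behaviour described by the two hypothetical constraints is a composite foreign key: facility and craft expose (id, base_id) as a unique pair, and trooper references that pair using its own base_id. The sketch below tries this out with Python's sqlite3 module as a stand-in for PostgreSQL. Note the assumptions: it is still a declarative constraint rather than a purely structural change, it keeps trooper.base_id, it makes the base_id columns NOT NULL, and the add_trooper helper and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE base (id INTEGER PRIMARY KEY);
CREATE TABLE facility (
    id INTEGER PRIMARY KEY,
    base_id INTEGER NOT NULL REFERENCES base (id),
    UNIQUE (id, base_id)            -- target for the composite foreign key
);
CREATE TABLE craft (
    id INTEGER PRIMARY KEY,
    base_id INTEGER NOT NULL REFERENCES base (id),
    UNIQUE (id, base_id)
);
CREATE TABLE trooper (
    id INTEGER PRIMARY KEY,
    base_id INTEGER NOT NULL REFERENCES base (id),
    assigned_facility_id INTEGER,   -- NULL means unassigned
    assigned_craft_id INTEGER,      -- NULL means unassigned
    -- The trooper's own base_id is part of both keys, so an assignment
    -- can only point at a facility or craft of the same base.
    FOREIGN KEY (assigned_facility_id, base_id) REFERENCES facility (id, base_id),
    FOREIGN KEY (assigned_craft_id, base_id) REFERENCES craft (id, base_id)
);
-- Hypothetical sample data: facility 10 is in base 1, craft 20 is in base 2
INSERT INTO base (id) VALUES (1), (2);
INSERT INTO facility (id, base_id) VALUES (10, 1);
INSERT INTO craft (id, base_id) VALUES (20, 2);
""")

def add_trooper(trooper_id, base_id, facility_id=None, craft_id=None):
    """Insert a trooper; return False if the database rejects the row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO trooper (id, base_id, assigned_facility_id, assigned_craft_id) "
                "VALUES (?, ?, ?, ?)",
                (trooper_id, base_id, facility_id, craft_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

unassigned_ok = add_trooper(1, 1)                               # no assignments
same_base_ok = add_trooper(2, 1, facility_id=10)                # facility in base 1
cross_base_ok = add_trooper(3, 1, facility_id=10, craft_id=20)  # craft is in base 2
```

With this layout trooper.base_id is the single place to look up a trooper's base, and a cross-base assignment is refused by the database rather than by application logic.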

understanding an inheritance in Postgres; why key "fails" in insert/update command

(One image, thousands of words)
I made a few tables that inherit from one another (persons).
Then I added a child table (address) and related it only to the "base" table (person).
When I try to insert into the child table a record related to a row in an inherited table, the insert statement fails because there is no key in the master table.
Yet as I insert records into the descendant tables, the records are also available in the base table (so, IMHO, they should be visible/accessible through the inherited tables).
Please take a look at the attached image. Obviously I am doing something wrong or have missed some point...
Thank you in advance!
Sorry, that's how Postgres table inheritance works. 5.10.1 Caveats explains.
A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children. This is true on both the referencing and referenced sides of a foreign key constraint. Thus, in the terms of the above example:
Specifying that another table's column REFERENCES cities(name) would allow the other table to contain city names, but not capital names. There is no good workaround for this case.
In their example, capitals inherits from cities as organization_employees inherits from person. If person_address REFERENCES person(idt_person) it will not see entries in organization_employees.
Inheritance is not as useful as it seems, and it's not a way to avoid joins. This can be better done with a join table with some extra columns. It's unclear why an organization would inherit from a person.
-- A person
CREATE TABLE person (
id bigserial PRIMARY KEY,
name text NOT NULL,
verified boolean NOT NULL DEFAULT false,
vat_nr text,
foto bytea
);
-- An organization is not a person
CREATE TABLE organization (
id bigserial PRIMARY KEY,
name text NOT NULL
);
-- Joins a person with an organization
-- Stores information about that relationship
CREATE TABLE organization_employee (
person_id bigint NOT NULL REFERENCES person(id),
organization_id bigint NOT NULL REFERENCES organization(id),
usr text,
pwd text
);
-- Get each employee, their name, and their org's name.
SELECT
person.name AS person_name,
organization.name AS organization_name
FROM organization_employee
JOIN person ON person_id = person.id
JOIN organization ON organization_id = organization.id;
Use bigserial (bigint) for primary keys; 2 billion comes faster than you think.
Don't enshrine arbitrary business rules in the schema, like how long a name can be. You're not saving any space by limiting it, and every time the business rule changes you have to alter your schema. Use the text type. Enforce arbitrary limits in the application or as constraints.
idt_table_name primary keys make for long, inconsistent column names that are hard to guess. Why is the primary key of person_address not idt_person_address? Why is the primary key of organization_employee idt_person? You can't tell, at a glance, which is the primary key and which is a foreign key. You still need to prepend the table name to disambiguate; for example, if you join person with person_address you need person.idt_person and person_address.idt_person. Confusing and redundant. id (or idt if you prefer) makes it obvious what the primary key is and clearly differentiates it from table_id (or idt_table) foreign keys. SQL already has the means to resolve ambiguities: person.id.

Generate column value automatically from other columns values and be used as PRIMARY KEY

I have a table with columns named "source" and "id". This table is populated from open data databases.
"id" can't be UNIQUE, since my data comes from other databases, each with its own id system. There is a real risk of having the same id for completely different data.
I want to create another column which combines source and id into a single value.
"openDataA" + 123456789 -> "openDataA123456789"
"openDataB" + 123456789 -> "openDataB123456789"
I have seen examples that use || and functions to concatenate values. This is good, but I want to make this third column my PRIMARY KEY, to avoid duplicates and to create a truly unique id that I can query without much computation and that I can use as a foreign key target for other tables.
I think Composite Types are what I'm looking for, but instead of setting the value manually each time, I want it to be generated automatically when I set only "source" and "id".
I'm fairly new to PostgreSQL, so any help is welcome.
Thank you.
You could just have a composite key in your table:
CREATE TABLE mytable (
source VARCHAR(10),
id VARCHAR(10),
PRIMARY KEY (source, id)
);
If you really want a joined column, you could create a view to display it:
CREATE VIEW myview AS
SELECT *, source || id AS primary_key
FROM mytable;
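As a quick check of both statements, the sketch below runs them through Python's sqlite3 module, used here only as a convenient stand-in for PostgreSQL. The NOT NULL markers are an addition for SQLite, which does not imply them for this kind of primary key, and the sample rows are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (
    source VARCHAR(10) NOT NULL,
    id VARCHAR(10) NOT NULL,
    PRIMARY KEY (source, id)
);
CREATE VIEW myview AS
SELECT *, source || id AS primary_key
FROM mytable;
-- Same id from two different sources: allowed by the composite key
INSERT INTO mytable (source, id)
VALUES ('openDataA', '123456789'), ('openDataB', '123456789');
""")

# Repeating an existing (source, id) pair is rejected by the primary key.
try:
    with conn:
        conn.execute(
            "INSERT INTO mytable (source, id) VALUES (?, ?)",
            ("openDataA", "123456789"),
        )
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

# The view exposes the concatenated value without storing it.
combined = [
    row[0]
    for row in conn.execute("SELECT primary_key FROM myview ORDER BY primary_key")
]
```

One design note: plain concatenation can be ambiguous in general ('a' with '12' and 'a1' with '2' both give 'a12'), which is one reason to keep the two columns as the real key and treat the joined value as display only.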

Composite key with user-supplied string column, foreign keys

Let's say I have the following table
TABLE subgroups (
group_id t_group_id NOT NULL REFERENCES groups(group_id),
subgroup_name t_subgroup_name NOT NULL,
more attributes ...
)
subgroup_name is UNIQUE to a group(group_id).
A group can have many subgroups.
The subgroup_names are user-supplied. (I would like to avoid using a subgroup_id column. subgroup_name has meaning in the model and is more than just a label; I am providing a list of predetermined names but allow users to add their own for flexibility.)
This table has 2 levels of referencing child tables containing subgroup attributes (with many-to-one relations).
I would like to have a PRIMARY KEY on (group_id, upper(trim(subgroup_name))).
From what I know, Postgres doesn't allow a PRIMARY KEY/UNIQUE constraint on a function.
IIRC, the relational model also requires columns to be used as stored.
CREATE UNIQUE INDEX ON subgroups (group_id, upper(trim(subgroup_name))); doesn't solve my problem
as other tables in my model will have FOREIGN KEYs pointing to those two columns.
I see two options.
Option A)
Store a cleaned up subgroup name in subgroup_name
Add an extra column called subgroup_name_raw that would contain the uncleaned string
Option B)
Create both a UNIQUE INDEX and PRIMARY KEY on my key pair. (seems like a huge waste)
Any insights?
Note: I'm using Postgres 9.2
Actually you can enforce uniqueness on the output of a function. You can't do it in the table definition though. What you need to do is create a unique index afterwards. So something like:
CREATE UNIQUE INDEX subgroups_ukey2 ON subgroups(group_id, upper(trim(subgroup_name)));
PostgreSQL has a number of absolutely amazing indexing capabilities, and the ability to create unique (and partial unique) indexes on function output is quite underrated.
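To illustrate what such an index accepts and rejects, here is a small sketch using Python's sqlite3 module, which also supports unique indexes on expressions (assuming SQLite 3.9 or later). It is a stand-in for PostgreSQL; the trimmed-down subgroups table, the sample names, and the add_subgroup helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subgroups (
    group_id INTEGER NOT NULL,
    subgroup_name TEXT NOT NULL
);
CREATE UNIQUE INDEX subgroups_ukey2
    ON subgroups (group_id, upper(trim(subgroup_name)));
INSERT INTO subgroups (group_id, subgroup_name) VALUES (1, 'Alpha');
""")

def add_subgroup(group_id, name):
    """Insert a subgroup; return False if the unique index rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO subgroups (group_id, subgroup_name) VALUES (?, ?)",
                (group_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False

same_name_other_case = add_subgroup(1, "  alpha ")  # normalises to ALPHA in group 1
same_name_other_group = add_subgroup(2, "alpha")    # different group
different_name = add_subgroup(1, "Beta")            # different name, same group
```

Keep in mind that a foreign key references columns, not expressions, so an index like this guards against near-duplicate names but does not by itself give child tables a key to point at.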

Primary key defined by many attributes?

Can I define a primary key according to three attributes? I am using Visual Paradigm and Postgres.
CREATE TABLE answers (
time SERIAL NOT NULL,
"{Users}{userID}user_id" int4 NOT NULL,
"{Users}{userID}question_id" int4 NOT NULL,
reply varchar(255),
PRIMARY KEY (time, "{Users}{userID}user_id", "{Users}{userID}question_id"));
A picture may clarify the question.
Yes, you can, just as you showed (though I question your naming of the 2nd and 3rd columns).
From the docs:
"Primary keys can also constrain more than one column; the syntax is similar to unique constraints:
CREATE TABLE example (
a integer,
b integer,
c integer,
PRIMARY KEY (a, c)
);
A primary key indicates that a column or group of columns can be used as a unique identifier for rows in the table. (This is a direct consequence of the definition of a primary key. Note that a unique constraint does not, by itself, provide a unique identifier because it does not exclude null values.) This is useful both for documentation purposes and for client applications. For example, a GUI application that allows modifying row values probably needs to know the primary key of a table to be able to identify rows uniquely.
A table can have at most one primary key (while it can have many unique and not-null constraints). Relational database theory dictates that every table must have a primary key. This rule is not enforced by PostgreSQL, but it is usually best to follow it.
"
Yes, you can. There is just such an example in the documentation. However, I'm not familiar with the bracketed terms you're using. Are you doing some variable evaluation before creating the database schema?
Yes, you can.
If you'd run it, you would see it in no time.
I would really, really, really suggest rethinking the naming convention. A time column that contains a serial integer? Column names like "{Users}{userID}user_id"? Oh my.