One-way multi column UNIQUE constraint - postgresql

I have a system in which I am trying to describe event-based interactions between two targets. In our system, an event (interaction) has a "source" and a "target" (basically who [did what] to whom):
-- tried to remove some of the "noise" from this for the sake of the post:
CREATE TABLE interaction_relationship (
id integer CONSTRAINT interaction_pk PRIMARY KEY,
source_id integer CONSTRAINT source_fk NOT NULL REFERENCES entity(id),
target_id integer CONSTRAINT target_fk NOT NULL REFERENCES entity(id),
-- CONSTRAINT(s)
CREATE CONSTRAINT interaction_relationship_deduplication UNIQUE(source_id, target_id)
);
Constraint interaction_relationship_deduplication is the source of my question:
In our system, a single source can interact with a single target multiple times, but that relationship can only exist once, i.e. if I am a mechanic working on a car, I may see that car multiple times in my shop, but I have only one relationship with that single car:
id
source_id
target_id
a
123abc
456def
b
123abc
789ghi
Ideally, this table also represents a unidirectional relationship. source_id is always the "owner" of the interaction, i.e. if the car 456def ran over the mechanic 123abc there would be another entry in the interaction_relationship table:
id
source_id
target_id
1
123abc
456def
2
123abc
789ghi
3
456def
123abc
So, my question: does UNIQUE on multiple columns take value order into consideration? Or would the above cause a failure?

does UNIQUE on multiple columns take value order into consideration?
Yes. The tuple (123abc, 456def) is different from the tuple (456def, 123abc), they may both exist in the table at the same time.
That said, you might want to remove the surrogate id from the relationship table, there's hardly any use for it. A relation table (as opposed to an entity table, and even there) would do totally fine with a multi-column primary key, which is naturally the combination of source and target.

Related

PostgreSQL database: Get rid of redundant transitive relation (maybe 3NF is failed)

I'm creating a hybrid between "X-Com Enemy Unknown" and "The Sims". I maintain game state in a database--PostgreSQL--but my question is structural, not engine-specific.
As in X-Com, there are some bases in different locations, so I create a table named Base with ID autoincrement identity as primary key.
Every base has some facilities in its territory, so I create a table named Facility with a foreign key Facility.Base_ID, referring to Base.ID.
Every base has some landing crafts in its hangars, so I create a table named Craft with a foreign key Craft.Base_ID, referring to Base.ID.
Every base has some troopers in its barracks, so I create a table named Trooper with a foreign key Trooper.Base_ID, referring to Base.ID.
Just to this point, everything seems to be ok, doesn't it? However...
I want to have some sort of staff instruction. Like in the X-Com game, every trooper can be assigned to some craft for offense action, or can be unassigned. In addition, every trooper can be assigned to some facility (or can be unassigned) for defense action. So, I have to add nullable foreign keys Trooper.Craft_ID and Trooper.Facility_ID, referring to Craft.ID and Facility.ID respectively.
That database has a redundancy. If some trooper is assigned to a craft or to a facility (or both), it has two (or even three) relations to the base--one direct relation through its Base_ID and some indirect relations as Facility(Trooper.Facility_ID).Base_ID and Craft(Trooper.Craft_ID).Base_ID. Even if I get rid of Trooper.Base_ID (e.g. I can make both assignment mandatory and create a mock craft and a mock facility in every base), I can't get rid of both trooper-facility-base and trooper-craft-base relations.
In addition to this redundancy, there is a worse problem--in case of a mistake, some trooper can be assigned to a craft from one base and to a facility from another base, that's a really nasty situation. I can prohibit it in the application business logic tier, but it's still allowed by the database.
There can be some constraints to apply, but is there any structural modification to the schema that can get rid of the redundancy and potential inconsistency as a result of a good structure, not as a result of constraints?
CREATE TABLE base (
id int PRIMARY KEY
);
CREATE TABLE facility (
id int PRIMARY KEY,
base_id int REFERENCES base
);
CREATE TABLE craft (
id int PRIMARY KEY,
base_id int REFERENCES base
);
CREATE TABLE trooper (
id int PRIMARY KEY,
assigned_facility_id int REFERENCES facility,
assigned_craft_id int REFERENCES craft,
base_id int REFERENCES base
);
Now I want to get some sort of constraints on a trooper t so that
facilities.get(t.assigned_facility_id).base_id IS NULL OR EQUAL TO t.base_id
crafts.get(t.assigned_craft_id).base_id IS NULL OR EQUAL TO t.base_id
This hypothetical constraint has to be applied to table trooper, because it applies in boundaries of each trooper row separately. Constraints on one table have to check equality between fields of two other tables.
I would like to create a database schema where there is exactly one way, having a trooper.id, to find its referenced base.id. How do I normalise my schema?

Would this PostgresQL model work for long-term use and security?

I'm making a real-time chat app and was stuck figuring out how the DB model should look like. I've made this diagram, but would this work? My issue is more to do with foreign keys.
I know this is a very vague question. But have been struggling with this model for a while now. This is the first database I'm setting up so it's probably got a load of errors.
Actually you are fairly close, but over complicated it a bit. At the conceptual/logical model you have just 2 entities. Users and Messages
with a many-to-many relationship. At the physical level the Channels table resolves the M:M into the 2 one_to_many you have described. But the
viewing this way ravels a couple issues. The attribute user is not required in the Messages table and if physically implemented requires a not easily done validation
that the user there exists in the Channels table. Further everything that Message:User relationship provides is a available
via Users:Channels:Messages relationship. A similar argument applies to Channels column in Users - completely resolved by the resolution table. Suggestion: drop user from message table and channels from users.
Now lets look at the columns of Channels. It looks like you using a boiler plate for created_at and updated_at, but are they necessary?
Well at least for updated_at No. What can be updated? If either User or Message is updated you have a brand new entry. Yes it may seem like the same physical row (actually it is not)
but the meaning is completely different. Well how about last massage? What is it trying to indicate that the max value created at for the user does not give you?
I cannot see anything. I guess you could change the created at but what is the point of tracking when I changed that column. Suggestion: drop last message sent and updated at (unless required by Institution standards) from message table.
That leaves the Users table itself. Besides Channels mentioned above there is the Contacts column. Physically as a array it violates 1NF and becomes difficult to manage - (as wall as validating that the contact is in fact a user)
Logically it is creating a M:M on USER:USER. So resolve it the same way as User:Messages, pull it out into another table, say User_Contacts with 2 attributes to the Users table. Suggestion drop contacts for the users table and create a resolution table.
Unfortunately, I do not have a good ERD diagrammer, so I just provide DDL.
create table users (
user_id integer generated always as identity primary key
, name text
, phone_number text
, last_login timestamptz
, created_at timestamptz
, updated_at timestamptz
) ;
create type message_type as enum ('short', 'long'); -- list all values
create table messages(
msg_id integer generated always as identity primary key
, msg_type message_type
, message text
, created_at timestamptz
, updated_at timestamptz
);
create table channels( -- resolves M:M Users:Messages
user_id integer
, msg_id integer
, created_at timestamptz
, constraint channels_pk
primary key (user_id, msg_id)
, constraint channels_2_users_fk
foreign key (user_id)
references users(user_id)
, constraint channels_2_messages_fk
foreign key (msg_id)
references messages(msg_id )
);
create table user_contacts( -- resolves M:M Users:Users
user_id integer
, contact_id integer
, created_at timestamptz
, constraint user_contacts_pk
primary key (user_id, contact_id)
, constraint user_2_users_fk
foreign key (user_id)
references users(user_id)
, constraint contact_2_user_fk
foreign key (user_id)
references users(user_id)
, constraint contact_not_me_check check (user_id <> contact_id)
);
Notes:
Do not use text as PK, use either integer (bigint) or UUID, and generate them during insert.
Caution on ENUM. In Postgres you can add new values, but you cannot remove a value. Depending upon number of values and how often the change consider creating a lookup/reference table for them.
Do not use the data type TIME. It is really not that useful without the date. Simple example I login today at 15:00, you login tomorrow at 13:00. Now, from the database itself, which of us logged in first.

understanding an inheritance in Postgres; why key "fails" in insert/update command

(One image, tousands of words)
I'd made few tables that are inherited between themselves. (persons)
And then assign child table (address), and relate it only to "base" table (person).
When try to insert in child table, and record is related to inherited table, insert statement fail because there is no key in master table.
And as I insert records in descendant tables, records are salo available in base table (so, IMHO, should be visible/accessible in inherited tables).
Please take a look on attached image. Obviously do someting wrong or didn't get some point....
Thank You in advanced!
Sorry, that's how Postgres table inheritance works. 5.10.1 Caveats explains.
A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children. This is true on both the referencing and referenced sides of a foreign key constraint. Thus, in the terms of the above example:
Specifying that another table's column REFERENCES cities(name) would allow the other table to contain city names, but not capital names. There is no good workaround for this case.
In their example, capitals inherits from cities as organization_employees inherits from person. If person_address REFERENCES person(idt_person) it will not see entries in organization_employees.
Inheritance is not as useful as it seems, and it's not a way to avoid joins. This can be better done with a join table with some extra columns. It's unclear why an organization would inherit from a person.
person
id bigserial primary key
name text not null
verified boolean not null default false
vat_nr text
foto bytea
# An organization is not a person
organization
id bigserial not null
name text not null
# Joins a person with an organization
# Stores information about that relationship
organization_employee
person_id bigint not null references person(id)
organization_id bigint not null references organization(id)
usr text
pwd text
# Get each employee, their name, and their org's name.
select
person.name
organization.name
from
organization_employee
join person on person_id = person.id
join organization on organization_id = organization.id
Use bigserial (bigint) for primary keys, 2 billion comes faster than you think
Don't enshrine arbitrary business rules in the schema, like how long a name can be. You're not saving any space by limiting it, and every time the business rule changes you have to alter your schema. Use the text type. Enforce arbitrary limits in the application or as constraints.
idt_table_name primary keys makes for long, inconsistent column names hard to guess. Why is the primary key of person_address not idt_person_address? Why is the primary key of organization_employee idt_person? You can't tell, at a glance, which is the primary key and which is a foreign key. You still need to prepend the column name to disambiguate; for example, if you join person with person_address you need person.idt_person and person_address.idt_person. Confusing and redundant. id (or idt if you prefer) makes it obvious what the primary key is and clearly differentiates it from table_id (or idt_table) foreign keys. SQL already has the means to resolve ambiguities: person.id.

Foreign key from events table 1-1 0r many?

I'm likely overthinking a problem here and may well get downvoted but I'm prepared to take the hit. I'm building my first schema in a data warehouse.
2 tables: events and contacts:
events(id(pk), cid, other, fields, here)
contacts(id (pk), cid(fk), other, fields, here)
Someone visits our website and registers. A line item is generated in events column "id" and a "cid" for contacts is generated. A new record is added to contacts.
I have two questions:
Can I make the primary key of contacts cid? Thus the primary key is also a foreign key?
I'm using MySQL Workbench to create the schema. When I create the contacts table I am able to set the foreign key of cid and the cardinality as either 1-1 or 1-many. From the point of view of contacts table, is the relationship 1-1 or to many? There will only ever be 1 cid record in contacts but if that user does multiple things (like receive an email from us etc) they will appear multiple times in events table. So, logically 1-many. But when creating this in Workbench the relation line appears as though it's a 1-many relation with the many part being at contacts, not the other way around as desired. It should be the other way around?
What is the relationship between events.cid and contacts.cid?
If a user's registration results in a single contact_ record while each user visit to the web site (each Session started) results in an event_ record belonging to that user’s contact_ record, then you have a One-To-Many relationship.
`contact_` = parent table (the One)
`event_` = child table (the Many)
Notice how I boiled down that relationship into a single sentence. That should be your goal when doing analysis work to determine table structure.
Relationships are almost always defined as a link from a primary key on parent table to a foreign key on a child table.
How you define the primary key is up to you. First decide whether you want a natural key or a surrogate key. In my experience a natural key never works out as the values always eventually change.
If using a surrogate key, decide what type. The usual choices are an integer tied to an automatically incrementing sequence generator, or a UUID. If ever federating data with other databases or systems then UUID is the way to go. If using an integer, decide on size, with 32-bit integers handling a total of 2-4 billion records. A 64-bit integer can track 18 quintillion records.
The foreign key in child table is simply a copy of its assigned parent’s primary key value. So the foreign key must have same data type as that parent primary key.
If a particular parent record owns multiple records in the child table, each of those child records will carry a copy of that parent’s primary key. So if the user Susan has five events, her primary key value appears once in the contact_ table and that same value appears five times in the event_ table stored in the foreign key column.
If cid uniquely identifies each contact_ record amongst all the other contact_ records, then you have a primary key. No need to define another.

Is this kind of DB relation design favourable and correct? Should it be converted to a no-sql solution?

First of all, I did my research but being rather a newbie, I am not that well acquainted with words so might have failed in founding the correct ones. I beg your pardon in case of a possible duplicate.
Question #1:
I have a table consisting of ID [PK] and LABEL [Varchar 128]. Each record (row) here is unique. What I want is, to define relations between these LABELS.
Requisite:
There will be an n amount of groups, each group containing one or more of these LABELS. In each group, each LABEL can either exist or not exist (meaning a group does not have 2x of same LABEL).
How should I define this relation?
I thought of creating another table with ID [PK] - Group ID [randomly assigned unique key] - LABEL_ID [ID of Labels table pointing to a single Label]
Is this correct and favourable? If a group has 10 LABELS then there will be 10 records with unique ID, same uniquely assigned Group ID and LABEL_ID pointing to LABELS table.
Question #2:
Should I let go of the Relational solution (as described above) and opt for a NoSQL solution? Where Each group is stored on it's own as a single entry into the database with an ID [PK] - Data [Containing either labels or IDs of labels pointing to the Label table]?
If NoSQL is the way to go, how should I store this data?
a) Should I have ID - Data (containing Labels)?
b) ID - Data (containing IDs of Labels)?
Question #3:
If NoSQL solution here is the best way, which NoSQL database should I choose for this use case?
Thank you.
There's no real need for an ID column in this GroupLabels table:
CREATE TABLE GroupLabels (
GroupID int not null,
LabelID int not null,
constraint PK_GroupLabels PRIMARY KEY (GroupID,LabelID),
constraint FK_GroupLabels_Groups FOREIGN KEY (GroupID) references Groups,
constraint FK_GroupLabels_Labels FOREIGN KEY (LabelID) references Labels
)
By doing the above, we've automatically achieved a constraint - that the same label can't be added to the same group more than once.
With the above, I'd say it's a reasonably common SQL solution.
There is too little information here to make recommendations on the question of "to SQL or not to SQL".
However, the relational approach would be as you describe, I think.
CREATE TABLE Group
(
GroupId int PRIMARY KEY
)
CREATE TABLE GroupLabel
(
GroupId int FOREIGN KEY REFERENCES Group,
LabelId int FOREIGN KEY REFERENCES Label,
UNIQUE (GroupId, LabelId)
)
CREATE TABLE Label
(
LabelId int PRIMARY KEY,
Value varchar(100) UNIQUE
)
Here, every label is unique, Many labels may be in each group and each label may be in many groups but each label can only be in each group once.
As #Damien_The_Unbeliever indicates, the Group table can be omitted if you don't need to store any additional attributes about each group by making the GroupId column on the GroupLabels table solely unique.
You might need to change the syntax slightly for whatever RDBMS you're using.