I building my first PostgreSQL database. It covers where people lived and worked over several decades (1890 to 1930). I have people, address, and restaurant name tables. The people moved around, both their residences and places of work.
How do I establish the link to say from a person to the address for certain years? In other words there might be from 1 to ~20 years (some people stayed put), but I'll want to query for each year (actually it's going to become a map).
I understand that if it were only once, it would be a foreign key.
I'm also going to be linking restaurant names to various addresses. In some cases I only have the names of the owners, so I'll have a link by year and whether they were employees or owners. I'll tackle that one next. Maybe with the first question answered I'll see my way to this one.
Thanks for any help.
You need a m:n relationship qualified by date. That's typically implemented with a "join table".
e.g. given:
create table person (
person_id integer primary key,
...
);
create table address (
address_id integer primary key,
...
);
you might write a table like:
create table residence_period (
person_id integer references person(person_id),
address_id integer references address(address_id),
moved_in date not null,
moved_out date,
constraint moved_in_before_moved_out
check (moved_out is null or moved_in < moved_out)
);
You could use a daterange type instead of two fields if you preferred.
If you want to assert that someone didn't have multiple overlapping residences you can use an exclusion constraint.
Related
I'm making a real-time chat app and was stuck figuring out how the DB model should look like. I've made this diagram, but would this work? My issue is more to do with foreign keys.
I know this is a very vague question. But have been struggling with this model for a while now. This is the first database I'm setting up so it's probably got a load of errors.
Actually you are fairly close, but over complicated it a bit. At the conceptual/logical model you have just 2 entities. Users and Messages
with a many-to-many relationship. At the physical level the Channels table resolves the M:M into the 2 one_to_many you have described. But the
viewing this way ravels a couple issues. The attribute user is not required in the Messages table and if physically implemented requires a not easily done validation
that the user there exists in the Channels table. Further everything that Message:User relationship provides is a available
via Users:Channels:Messages relationship. A similar argument applies to Channels column in Users - completely resolved by the resolution table. Suggestion: drop user from message table and channels from users.
Now lets look at the columns of Channels. It looks like you using a boiler plate for created_at and updated_at, but are they necessary?
Well at least for updated_at No. What can be updated? If either User or Message is updated you have a brand new entry. Yes it may seem like the same physical row (actually it is not)
but the meaning is completely different. Well how about last massage? What is it trying to indicate that the max value created at for the user does not give you?
I cannot see anything. I guess you could change the created at but what is the point of tracking when I changed that column. Suggestion: drop last message sent and updated at (unless required by Institution standards) from message table.
That leaves the Users table itself. Besides Channels mentioned above there is the Contacts column. Physically as a array it violates 1NF and becomes difficult to manage - (as wall as validating that the contact is in fact a user)
Logically it is creating a M:M on USER:USER. So resolve it the same way as User:Messages, pull it out into another table, say User_Contacts with 2 attributes to the Users table. Suggestion drop contacts for the users table and create a resolution table.
Unfortunately, I do not have a good ERD diagrammer, so I just provide DDL.
create table users (
user_id integer generated always as identity primary key
, name text
, phone_number text
, last_login timestamptz
, created_at timestamptz
, updated_at timestamptz
) ;
create type message_type as enum ('short', 'long'); -- list all values
create table messages(
msg_id integer generated always as identity primary key
, msg_type message_type
, message text
, created_at timestamptz
, updated_at timestamptz
);
create table channels( -- resolves M:M Users:Messages
user_id integer
, msg_id integer
, created_at timestamptz
, constraint channels_pk
primary key (user_id, msg_id)
, constraint channels_2_users_fk
foreign key (user_id)
references users(user_id)
, constraint channels_2_messages_fk
foreign key (msg_id)
references messages(msg_id )
);
create table user_contacts( -- resolves M:M Users:Users
user_id integer
, contact_id integer
, created_at timestamptz
, constraint user_contacts_pk
primary key (user_id, contact_id)
, constraint user_2_users_fk
foreign key (user_id)
references users(user_id)
, constraint contact_2_user_fk
foreign key (user_id)
references users(user_id)
, constraint contact_not_me_check check (user_id <> contact_id)
);
Notes:
Do not use text as PK, use either integer (bigint) or UUID, and generate them during insert.
Caution on ENUM. In Postgres you can add new values, but you cannot remove a value. Depending upon number of values and how often the change consider creating a lookup/reference table for them.
Do not use the data type TIME. It is really not that useful without the date. Simple example I login today at 15:00, you login tomorrow at 13:00. Now, from the database itself, which of us logged in first.
If i have three type of users. Let's say seller, consumers, and sales persons. Should i make individual table for there details like name, email passwords and all other credentials etc with a role_type table or separate table for each of them. Which is the best approach for a large project considering all engineering principles for DBMS like normalization etc.
Also tell me Does it effect the performance of the app if i have lots of joins in tables to perform certain operations?
If the only thing that distinguishes those people is the role but all details are the same, then I would definitely go for a single table.
The question is however, can a single person have more than one role? If that is never the case, then add a role_type column to the person table. Depending on how fixed those roles are maybe use a lookup table and a foreign key, e.g.:
create table role_type
(
id integer primary key,
name varchar(20) not null unique
);
create table person
(
id integer primary key,
.... other attributes ...,
role_id integer not null references role_type
);
However, in my experience the restriction to exactly one role per person usually doesn't hold, so you would need a many-to-many relation ship
create table role_type
(
id integer primary key,
name varchar(20) not null unique
);
create table person
(
id integer primary key,
.... other attributes ...,
);
create table person_role
(
person_id integer not null references person,
role_id integer not null references role_type,
primary key (person_id, role_id)
);
It sounds like this is a case of trying to model inheritance in your relational database. Complex topic, discussed here and here.
It sounds like your "seller, consumer, sales person" will need lots of different attributes and relationships. A seller typically belongs to a department, has targets, is linked to sales. A consumer has purchase history, maybe a credit limit, etc.
If that's the case,I'd suggest "class table inheritance" might be the right solution.
That might look something like this.
create table user_account
(id int not null,
username varchar not null,
password varchar not null
....);
create table buyer
(id int not null,
user_account_id int not null(fk),
credit_limit float not null,
....);
create table seller
(id int not null,
user_account_id int not null(fk),
sales_target float,
....);
To answer your other question - relational databases are optimized for joining tables. Decades of research and development have gone into this area, and a well-designed database (with indexes on the columns you're joining on) will show no noticeable performance impact due to joins. From practical experience, queries with hundreds of millions of records and ten or more joins run very fast on modern hardware.
I will explain the problem with an example:
I am designing a specific case of referential integrity in a table. In the model there are two tables, enterprise and document. We register the companies and then someone insert the documents associated with it. The name of the enterprise is variable. When it comes to recovering the documents, I need the name of the enterprise to be the same as it was when it was registered and not the value it currently has. The solution that I thought was to register the company again in each change with the same code, the updated name in this way would have the expected result, but I am not sure if it is the best solution. Can someone make a suggestion?
There are several possible solutions and it is hard to determine which one will exactly be the easiest.
Side comment: your question is limited to managing names efficiently but I would like to comment the fact that your DB is sensitive to files being moved, renamed or deleted. Your database will not be able to keep records up-to-date if anything happen at OS level. You should consider to do something about it too.
Amongst the few solution I considered, the one that is best normalized is the schema below:
CREATE TABLE Enterprise
(
IdEnterprise SERIAL PRIMARY KEY
, Code VARCHAR(4) UNIQUE
, IdName INTEGER DEFAULT -1 /* This will be used to get a single active name */
);
CREATE TABLE EnterpriseName (
IDName SERIAL PRIMARY KEY
, IdEnterprise INTEGER NOT NULL REFERENCES Enterprise(IdEnterprise) ON UPDATE NO ACTION ON DELETE CASCADE
, Name TEXT NOT NULL
);
ALTER TABLE Enterprise ADD FOREIGN KEY (IdName) REFERENCES EnterpriseName(IdName) ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED;
CREATE TABLE Document
(
IdDocument SERIAL PRIMARY KEY
, IdName INTEGER NOT NULL REFERENCES EnterpriseName(IDName) ON UPDATE NO ACTION ON DELETE NO ACTION
, FilePath TEXT NOT NULL
, Description TEXT
);
Using flag and/or timestamps or moving the enterprise name to the document table are appealing solutions, but only at first glance.
Especially, the part where you have to ensure a company always has 1, and 1 only "active" name is no easy thing to do.
Add a date range to your enterprise: valid_from, valid_to. Initialise to -infinity,+infinity. When you change the name of an enterprise, instead: update existing rows where valid_to = +infinity to be now() and insert the new name with valid_from = now(), valid_to = +infinity.
Add a date field to the document, something like create_date. Then when joining to enterprise you join on ID and d.create_date between e.valid_from and e.valid_to.
This is a simplistic approach and breaks things like uniqueness for your id and code. To handle that you could record the name in a separate table with the id,from,to,name. Leaving your original table with just the id and code for uniqueness.
I am building a platform with two kinds of users: Users_A create projects with unique virtual coins associated, and Users_B can buy and exchange this coins.
The problem:
Approach 1: if I use one unique table as a virtual wallet, the User_B ID will be the row, and each column will be each coin. In this way, I have to ALter the table each time a new project is created.
Approach 2: I create an electronic wallet (table) for every single User_B.
Which one of the two is worse/better in terms of performance?
Is there any other possible approach?
It's a bit unclear to me what exactly you are trying to model. But any model that requires ALTERing a table because you add new content to the database is flawed.
That sounds like a basic many-to-many relationship to me:
You definitely need a table for the users:
create table users
(
user_id integer primary key,
... other columns ...
);
and one for the different coins:
create table coin
(
coin_id integer primary key,
... other columns ...
);
You need a table for the projects. You said "unique virtual coins associated", so I assume one project deals with exactly one type of coins:
create table project
(
project_id integer primary key,
owner_user_id integer not null references users,
coin_id integer not null references coin
... other columns
);
I am not sure what exactly you mean with "buy and exchange" coins, but you probably need something like a transfer table:
create table coin_transfer
(
from_user_id integer not null references users,
to_user_id integer not null references users,
project_id integer not null references project,
transfer_type text not null check (transfer_type in ('buy', 'exchange'))
amount numeric not null
);
You also mention a "wallet" that belongs to a user. You would never create one table for each wallet, instead a table that contains the information which user owns the wallet. Assuming each user would have one wallet for each coin type you'd need something like this:
create table wallet
(
wallet_id integer primary key,
owner_user_id integer not null references users,
coin_id integer not null references coin,
... other columns ...
);
The above is only a very rough sketch and because there is a lot of information missing from your question.
I have two tables:
Project_Info table that contains the project id (primary key), project name, and the project budget.
Project_Forecast table that contains project name and forecast amount
My question is...should I refer to the primary key in the Project_Info table in the Project_Forecast table? I'm new to SQL so I might be misunderstanding the concept but would doing this essentially refer each forecast amount back to the project via the project ID? If not, what would be better way of leveraging the primary key/foreign keys between these two tables?
should I refer to the primary key in the Project_Info table in the Project_Forecast table?
Yes
so I might be misunderstanding the concept but would doing this essentially refer each forecast amount back to the project via the project ID?
That's correct
Here is a basic schema for the tables you described:
CREATE TABLE project_info (
id serial unique, -- project ID
name text,
budget int
);
CREATE TABLE project_forecast (
id serial unique,
project_id int REFERENCES project_info (id),
forecast_budget int
);
you should build your tables like this,
Project_Info(
PojectID PK,
ProjectName,
ProjectBudget
)
Poject_Forecast(
ForcastID PK,
ProjectID FK,
ForcastAmount
)
The reason you don't want to have project name in the forecast table is that it is a property of a project and it would be redundant (non-normalized) to have it in two tables and cumbersome to manage if the project name changes. But to answer your question having the FK would relate each forecast to a project and would also not allow forecasts be created for projects that don't exist, or projects with forecasts be deleted. In other words when you make foreign keys the database will enforce referential integrity.
also I'm sure some people will mention if I don't you don't absolutely need a ForcastID in the ProjectForecast Table but it is probably a good idea.
I hope this answers your question.