I need the name of the enterprise to be the same as it was when it was registered and not the value it currently has - postgresql

I will explain the problem with an example:
I am designing a specific case of referential integrity in a table. In the model there are two tables, enterprise and document. We register the companies and then someone insert the documents associated with it. The name of the enterprise is variable. When it comes to recovering the documents, I need the name of the enterprise to be the same as it was when it was registered and not the value it currently has. The solution that I thought was to register the company again in each change with the same code, the updated name in this way would have the expected result, but I am not sure if it is the best solution. Can someone make a suggestion?

There are several possible solutions and it is hard to determine which one will be the easiest.
Side comment: your question is limited to managing names, but I would like to point out that your DB is sensitive to files being moved, renamed or deleted. Your database will not be able to keep records up to date if anything happens at the OS level. You should consider doing something about that too.
Amongst the few solutions I considered, the one that is best normalized is the schema below:
CREATE TABLE Enterprise
(
IdEnterprise SERIAL PRIMARY KEY
, Code VARCHAR(4) UNIQUE
, IdName INTEGER DEFAULT -1 /* This will be used to get a single active name */
);
CREATE TABLE EnterpriseName (
IDName SERIAL PRIMARY KEY
, IdEnterprise INTEGER NOT NULL REFERENCES Enterprise(IdEnterprise) ON UPDATE NO ACTION ON DELETE CASCADE
, Name TEXT NOT NULL
);
ALTER TABLE Enterprise ADD FOREIGN KEY (IdName) REFERENCES EnterpriseName(IdName) ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED;
CREATE TABLE Document
(
IdDocument SERIAL PRIMARY KEY
, IdName INTEGER NOT NULL REFERENCES EnterpriseName(IDName) ON UPDATE NO ACTION ON DELETE NO ACTION
, FilePath TEXT NOT NULL
, Description TEXT
);
Using flags and/or timestamps, or moving the enterprise name into the document table, are appealing solutions, but only at first glance.
In particular, ensuring that a company always has one, and only one, "active" name is no easy thing to do.
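To make the flow concrete, here is a minimal, illustrative sketch of how that schema would be used (the literal id values 1, 10 and 11 are only assumed for the example; in practice you would capture them from RETURNING):
BEGIN;
INSERT INTO Enterprise (Code) VALUES ('ACME') RETURNING IdEnterprise;   -- say it returns 1
INSERT INTO EnterpriseName (IdEnterprise, Name) VALUES (1, 'Acme Ltd') RETURNING IdName;   -- say it returns 10
UPDATE Enterprise SET IdName = 10 WHERE IdEnterprise = 1;   -- the deferred FK is satisfied at COMMIT
COMMIT;
-- a document references the name that was current when it was filed
INSERT INTO Document (IdName, FilePath) VALUES (10, '/archive/contract.pdf');
-- renaming only adds a new EnterpriseName row and repoints IdName;
-- existing documents keep pointing at the old name
BEGIN;
INSERT INTO EnterpriseName (IdEnterprise, Name) VALUES (1, 'Acme Global Ltd') RETURNING IdName;   -- say 11
UPDATE Enterprise SET IdName = 11 WHERE IdEnterprise = 1;
COMMIT;
-- retrieving a document together with the name it was registered under
SELECT d.FilePath, en.Name
FROM Document d
JOIN EnterpriseName en ON en.IdName = d.IdName;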

Add a date range to your enterprise: valid_from, valid_to. Initialise them to -infinity, +infinity. When you change the name of an enterprise, instead of overwriting it, update the existing row where valid_to = +infinity to set valid_to = now(), and insert the new name with valid_from = now(), valid_to = +infinity.
Add a date field to the document, something like create_date. Then when joining to enterprise you join on id and d.create_date between e.valid_from and e.valid_to.
This is a simplistic approach and breaks things like uniqueness for your id and code. To handle that you could record the name in a separate table with the id, from, to and name, leaving your original table with just the id and code for uniqueness.
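A minimal sketch of that separate-name-table variant (all table and column names here are placeholders, and enterprise id 1 is assumed):
CREATE TABLE enterprise (
id SERIAL PRIMARY KEY
, code VARCHAR(4) UNIQUE
);
CREATE TABLE enterprise_name (
enterprise_id INTEGER NOT NULL REFERENCES enterprise(id)
, valid_from timestamptz NOT NULL DEFAULT '-infinity'
, valid_to timestamptz NOT NULL DEFAULT 'infinity'
, name TEXT NOT NULL
);
-- rename: close the current row and open a new one
UPDATE enterprise_name SET valid_to = now()
WHERE enterprise_id = 1 AND valid_to = 'infinity';
INSERT INTO enterprise_name (enterprise_id, valid_from, name)
VALUES (1, now(), 'New Name');
-- documents carry create_date and join on the range
SELECT d.*, en.name
FROM document d
JOIN enterprise_name en ON en.enterprise_id = d.enterprise_id
AND d.create_date BETWEEN en.valid_from AND en.valid_to;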

Would this PostgresQL model work for long-term use and security?

I'm making a real-time chat app and was stuck figuring out how the DB model should look. I've made this diagram, but would this work? My issue is mostly to do with foreign keys.
I know this is a very vague question, but I have been struggling with this model for a while now. This is the first database I'm setting up, so it probably has a load of errors.
Actually you are fairly close, but have over-complicated it a bit. At the conceptual/logical level you have just 2 entities, Users and Messages,
with a many-to-many relationship. At the physical level the Channels table resolves the M:M into the 2 one-to-many relationships you have described. But
viewing it this way reveals a couple of issues. The user attribute is not required in the Messages table, and if physically implemented it requires a validation that is not easily done:
that the user there exists in the Channels table. Further, everything the Message:User relationship provides is available
via the Users:Channels:Messages relationship. A similar argument applies to the Channels column in Users - it is completely resolved by the resolution table. Suggestion: drop user from the Messages table and channels from Users.
Now let's look at the columns of Channels. It looks like you are using boilerplate for created_at and updated_at, but are they necessary?
Well, at least for updated_at, no. What can be updated? If either User or Message is updated you have a brand new entry. Yes, it may seem like the same physical row (actually it is not),
but the meaning is completely different. Well, how about last message? What is it trying to indicate that the max created_at for the user does not give you?
I cannot see anything. I guess you could change created_at, but what is the point of tracking when I changed that column? Suggestion: drop last message sent and updated at (unless required by institution standards) from the message table.
That leaves the Users table itself. Besides Channels, mentioned above, there is the Contacts column. Physically, as an array it violates 1NF and becomes difficult to manage (as well as to validate that the contact is in fact a user).
Logically it is creating a M:M on User:User. So resolve it the same way as Users:Messages: pull it out into another table, say User_Contacts, with 2 foreign keys to the Users table. Suggestion: drop contacts from the Users table and create a resolution table.
Unfortunately, I do not have a good ERD diagrammer, so I just provide DDL.
create table users (
user_id integer generated always as identity primary key
, name text
, phone_number text
, last_login timestamptz
, created_at timestamptz
, updated_at timestamptz
) ;
create type message_type as enum ('short', 'long'); -- list all values
create table messages(
msg_id integer generated always as identity primary key
, msg_type message_type
, message text
, created_at timestamptz
, updated_at timestamptz
);
create table channels( -- resolves M:M Users:Messages
user_id integer
, msg_id integer
, created_at timestamptz
, constraint channels_pk
primary key (user_id, msg_id)
, constraint channels_2_users_fk
foreign key (user_id)
references users(user_id)
, constraint channels_2_messages_fk
foreign key (msg_id)
references messages(msg_id)
);
create table user_contacts( -- resolves M:M Users:Users
user_id integer
, contact_id integer
, created_at timestamptz
, constraint user_contacts_pk
primary key (user_id, contact_id)
, constraint user_2_users_fk
foreign key (user_id)
references users(user_id)
, constraint contact_2_user_fk
foreign key (contact_id)
references users(user_id)
, constraint contact_not_me_check check (user_id <> contact_id)
);
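For completeness, a quick illustrative query over the DDL above, showing that the dropped user column in Messages is not needed - a user's messages are reachable through the resolution table:
select u.name, m.message, c.created_at
from users u
join channels c on c.user_id = u.user_id
join messages m on m.msg_id = c.msg_id
where u.user_id = 1   -- some user of interest
order by c.created_at;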
Notes:
Do not use text as PK, use either integer (bigint) or UUID, and generate them during insert.
Caution on ENUM. In Postgres you can add new values, but you cannot remove a value. Depending on the number of values and how often they change, consider creating a lookup/reference table for them instead.
Do not use the data type TIME. It is really not that useful without the date. Simple example: I log in today at 15:00, you log in tomorrow at 13:00. Now, from the database itself, which of us logged in first?
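If the set of message types is likely to change, a small lookup table along these lines (names are only placeholders) avoids the enum limitation mentioned above:
create table message_types (
msg_type text primary key   -- rows can be added and removed freely
);
-- and in messages: msg_type text references message_types(msg_type)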

Why does Id increases by two instead of one using insert

I have spent lots of hours trying to understand why this is happening and still cannot.
I have created two tables and added a foreign key with ALTER:
CREATE TABLE stores (
id SERIAL PRIMARY KEY,
store_name TEXT
-- add more fields if needed
);
CREATE TABLE products (
id SERIAL,
store_id INTEGER NOT NULL,
title TEXT,
image TEXT,
url TEXT UNIQUE,
added_date timestamp without time zone NOT NULL DEFAULT NOW(),
PRIMARY KEY(id, store_id)
);
ALTER TABLE products
ADD CONSTRAINT "FK_products_stores" FOREIGN KEY ("store_id")
REFERENCES stores (id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE RESTRICT;
and every time I insert a row into products by doing
INSERT INTO public.products (store_id, title, image, url)
VALUES ((SELECT id FROM stores WHERE store_name = 'footish'),
'Teva Flatform Universal Pride',
'https://www.footish.se/sneakers/teva-flatform-universal-pride-t1116376',
'https://www.footish.se/pub_images/large/teva-flatform-universal-pride-t1116376-p77148.jpg?timestamp=1623417840')
I can see that the id column increases by two every time I insert, instead of one, and I would like to know the reason behind that.
I have not been able to figure out why and it would be nice to know! :)
There could be 3 reasons:
You've tried to insert data and it failed. Even on a failed insert and transaction rollback, the sequence still advances. A used number is never put back.
You're using a global sequence and created other data in the meantime. A global sequence increases whenever data is added to any table that uses it, even when other tables are modified.
The sequence is configured with a step size / allocation size of 2. It can be configured however you want.
Overall it is not important. The most important thing is that it increases automatically and that, even after an error or delete, an already-used ID is never put back.
If you want concrete information you need to provide the details of the sequence. You can check that using a SQL CLI or show it via DBeaver/....
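For example, assuming the default sequence name that SERIAL generates for the products table (products_id_seq - an assumption; check yours with \d products in psql), you can inspect its settings on PostgreSQL 10 or later like this:
SELECT sequencename, increment_by, last_value
FROM pg_sequences
WHERE sequencename = 'products_id_seq';
An increment_by of 1 there would rule out reason 3 and point at rolled-back or failed inserts instead.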

PostgreSQL - on conflict update for GENERATED ALWAYS AS IDENTITY

I have a table of values which I want to manage with my application ...
let's say this is the table
CREATE TABLE student (
id_student int4 NOT NULL GENERATED ALWAYS AS IDENTITY,
id_teacher int2,
student_name varchar(255),
age int2,
CONSTRAINT provider_pk PRIMARY KEY (id_student)
);
In the application, each teacher can see the list of all his/her students .. and they can edit or add new students
I am trying to figure out how to UPSERT data into the table in PostgreSQL ... what I am doing now: each teacher (after the manipulation in the app) is allowed to edit only on the FE, in JS (without having to save each change individually) ... after editing, they click the SAVE button, and that's when I need to store the changes and new records in the DB ...
What I do now is delete all records for that particular teacher and store the new object/array they created (by editing, adding, .. whatever) - so it's easy and I don't have to check for changes and new records ... the drawbacks are a brutal waste of the ID_STUDENT sequence (autogenerated on the DB side) and of course a huge overhead on indexes while inserting (= rebuilding); considering there will be a lot of teachers saving a lot of their students, that might cause some perf issues ... not to mention the fragmentation (HWM), so I would have to VACUUM this table regularly
In Oracle, I could easily use MERGE INTO (which is fantastic for this use case), but MERGE is not available in PostgreSQL (prior to version 15) :(
the only thing I know about is INSERT ... ON CONFLICT ... DO UPDATE ... but the problem is, how am I supposed to apply this with a GENERATED ALWAYS AS IDENTITY key? I do not provide this value (on top of that, I don't even know the latest number) and therefore I cannot trigger ON CONFLICT (id_student) ....
Is there any nice way out of this sh*t? Or is DELETE / INSERT really the way to go?
You shouldn't be too worried about the data churn – after all, an UPDATE also writes a new version of the row, so it wouldn't be that much different. And the sequence is no problem, because you used bigint for the primary key, right (anything else would have been a mistake)?
If you want to use INSERT ... ON CONFLICT in combination with an auto-generated sequence, you need some way besides the primary key to identify a row, that is, you need a UNIQUE constraint that you can use with ON CONFLICT. If there is no candidate for such a constraint, how can you identify the records for the teacher?
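For illustration, assuming that (id_teacher, student_name) happens to identify a student uniquely - that is an assumption on top of your schema, not something it currently guarantees - the upsert could look like this:
ALTER TABLE student
ADD CONSTRAINT student_teacher_name_uq UNIQUE (id_teacher, student_name);
INSERT INTO student (id_teacher, student_name, age)
VALUES (42, 'Jane Doe', 15)
ON CONFLICT (id_teacher, student_name)
DO UPDATE SET age = EXCLUDED.age;
The identity column is never supplied, so GENERATED ALWAYS is not a problem; the conflict target is the secondary UNIQUE constraint, not the primary key.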

How to maintain record history on table with one-to-many relationships?

I have a "services" table for detailing services that we provide. Among the data that needs recording are several small one-to-many relationships (all with a foreign key constraint to the service_id) such as:
service_owners -- user_ids responsible for delivery of service
service_tags -- e.g. IT, Records Management, Finance
customer_categories -- ENUM value
provider_categories -- ENUM value
software_used -- self-explanatory
The problem I have is that I want to keep a history of updates to a service, for which I'm using an update trigger on the table, that performs an insert into a history table matching the original columns. However, if a normalized approach to the above data is used, with separate tables and foreign keys for each one-to-many relationship, any update on these tables will not be recognised in the history of the service.
Does anyone have any suggestions? It seems like I need to store child keys in the service table to maintain the integrity of the service history. Is a delimited text field a valid approach here or, as I am using PostgreSQL, perhaps arrays are also a valid option? These feel somewhat dirty though!
Thanks.
If your table is:
create table T (
ix int generated always as identity primary key,
val varchar(50)
);
And your history table is:
create table THistory (
ix int, -- the id of the row in T that was changed
val varchar(50),
updateType char(1), -- C=Create, U=Update or D=Delete
updateTime timestamptz,
updateUsername text
);
Then you just need to put an update trigger on all tables of interest. You can then find out what the state of any or all of the tables was at any point in history, to determine what the relationships were at that time.
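A rough PostgreSQL sketch of such a trigger (current_user stands in for whatever user tracking you actually use):
create or replace function t_history_trg() returns trigger as $$
begin
insert into THistory (ix, val, updateType, updateTime, updateUsername)
values (old.ix, old.val, 'U', now(), current_user);
return new;
end;
$$ language plpgsql;
create trigger t_history
before update on T
for each row execute procedure t_history_trg();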
I'd avoid using arrays in any database whenever possible.
I don't like updates, for the exact reason you are describing here ... you lose information as it is overwritten. My answer is quite simple ... don't update. Not sure if you're at a point where this can be implemented ... but if you can, I'd recommend using the main table itself to store the history (no need for a second set of history tables).
Add a column to your main header table called 'active'. This can be a character or a bit (0 is off and 1 is on). Then it's a bit of trigger magic ... when an update is performed, you insert a row into the table identical to the record being overwritten, with a status of '0' (inactive), and then update the existing row (this process keeps the ID column on the active record the same; the newly inserted record is the inactive one with a new ID).
This way no data is ever lost (admittedly you are storing quite a few rows ...) and the history can easily be viewed with a select where active = 0.
The pain here is if you are working on something already implemented ... every existing query that hits this table will need to be updated to include a check on the active column. That makes this solution very easy to implement if you are designing a new system, but a pain if it's a long-standing application. Unfortunately, existing reports will include both inactive and active records (without throwing an error) until you can modify their where clauses.
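A rough sketch of that idea in PostgreSQL, using the services table from the question (the column list and the auto-generated service_id are assumptions; a real trigger would copy every column):
alter table services add column active smallint not null default 1;
create or replace function services_keep_history() returns trigger as $$
begin
-- keep a copy of the old version as an inactive row; the update then
-- proceeds on the original row, which keeps its id and stays active
insert into services (name, active)
values (old.name, 0);
return new;
end;
$$ language plpgsql;
create trigger services_history
before update on services
for each row execute procedure services_keep_history();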

PostgreSQL ON INSERT CASCADE

I've got two tables - one is Product and one is ProductSearchResult.
Whenever someone tries to insert a SearchResult with a product that is not listed in the Product table, the foreign key constraint is violated, hence I get an error.
I would like to know how I could get my database to automatically create that missing product in the Product table (just the ProductID; all other attributes can be left blank).
Is there such a thing as CASCADE ON INSERT? If there is, I was not able to get it working.
Rules are executed after the insert, so because we get an error beforehand they are useless if you use DO ALSO. If you use DO INSTEAD and add the INSERT command at the end, you end up with endless recursion.
I reckon a trigger is the way to go - but all my attempts to write one have failed.
Any recommendations?
The Table Structure:
CREATE TABLE Product (
ID char(10) PRIMARY KEY,
Title varchar(150),
Manufacturer varchar(80),
Category smallint,
FOREIGN KEY(Category) REFERENCES Category(ID) ON DELETE CASCADE);
CREATE TABLE ProductSearchResult (
SearchTermID smallint NOT NULL,
ProductID char(10) NOT NULL,
DateFirstListed date NOT NULL DEFAULT current_date,
DateLastListed date NOT NULL DEFAULT current_date,
PRIMARY KEY (SearchTermID,ProductID),
FOREIGN KEY (SearchTermID) REFERENCES SearchTerm(ID) ON DELETE CASCADE,
FOREIGN KEY (ProductID) REFERENCES Product ON DELETE CASCADE);
Yes, triggers are the way to go. But before you can start to use triggers in plpgsql, you
have to enable the language (on old versions only; on any recent PostgreSQL, plpgsql is installed by default). As user postgres, run the command createlang with the proper parameters.
Once you've done that, you have to
write a function in plpgsql
create a trigger to invoke that function
See Example 39-3 in the documentation for a basic example.
Note that a function body in Postgres is a string, with a special quoting mechanism: 2 dollar signs with an optional word in between them as the quotes. (The word lets you nest dollar quotes inside one another.)
Also note that you can reuse a trigger procedure for multiple tables, as long as they have the columns your procedure uses.
So the function has to
check if the value of NEW.ProductID exists in the Product table, with a select statement (you ought to be able to use SELECT count(*) ... INTO someint, or SELECT EXISTS(...) INTO somebool)
if not, insert a new row in that table
If you still get stuck, come back here.
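To make that concrete, here is a minimal sketch of such a trigger along the lines described above (the function and trigger names are just examples):
CREATE OR REPLACE FUNCTION ensure_product_exists() RETURNS trigger AS $body$
BEGIN
IF NOT EXISTS (SELECT 1 FROM Product WHERE ID = NEW.ProductID) THEN
-- create the missing product with only its ID; other attributes stay NULL
INSERT INTO Product (ID) VALUES (NEW.ProductID);
END IF;
RETURN NEW;
END;
$body$ LANGUAGE plpgsql;
CREATE TRIGGER product_autocreate
BEFORE INSERT ON ProductSearchResult
FOR EACH ROW EXECUTE PROCEDURE ensure_product_exists();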
In any case (rules OR triggers) the insert needs to create a new key (and new values for the attributes) in the Product table. In most cases, this implies that a (serial/sequence) surrogate primary key should be used in the Product table, and that the "real world" product_id ("product number") should default to NULL and be demoted to a candidate key.
BTW: a rule can be used; rules are just tricky to implement correctly for N:1 relations (they need the same kind of EXISTS logic as in Bart's answer above).
Maybe cascading on INSERT is not such a good idea after all. What do you want to happen if someone inserts a ProductSearchResult record for a non-existent product? [IMO a FK is always a domain; you cannot extend a domain just by referring to a non-existent value in it; that would make the FK constraint meaningless]