Suppose I have a psql table with a primary key and some data:
pkey | price
----------------------+-------
0075QlyLvw8bi7q6XJo7 | 20
(1 row)
However, I would like to save historical updates on it without losing the functionality that comes from referencing it's key in other tables as foreign keys.
I am thinking of doing some kind of revision_number + timestamp approach where each "update" would be a new row, example:
pkey | price | rev_no
----------------------+-------+--------
0075QlyLvw8bi7q6XJo7 | 20 | 0
----------------------+-------+--------
0075QlyLvw8bi7q6XJo7 | 15 | 1
(2 rows)
Then create a view that always takes the highest revision number of the table and reference keys from that view.
However to me this workaraound seems a bit too heavy for a task that in my opinion should be fairly common. Is there something I'm missing? Do you have a better solution or is there a well known paradigm for these types of problems which I don't know about?
Assuming PKey is actually the defined primary key you cannot do the revision scheme you outlined without creating a history table and moving old data to it. The primary key must be unique for any revision. But if you have a properly normalized table there several valid method, the following is one:
Review the other attributes and identify the candidate business keys (columns of business meaning that could be defined unique -- perhaps the item name.
If not already present add 2 columns: effective timestamp and superseded timestamp.
Now create a partial unique index on the identified column,from #1) and the superseded timestamp being a column meaning this is the currently active version.
Create a simple view as Select * from table. Since this is a simple view it is fully update-able. Use this View for Select,Insert and Delete, but
for Update create an instead of trigger. This trigger will set the superseded timestamp of the current active row and insert a new row update applied and the updated the version number.
With the above you can get you uniquely keep on the current active revision. Further you maintain the history of all relationships at each version. (See demo, including a couple useful functions)
Related
I have a table of values which I want to manage with my application ...
let's say this is the table
CREATE TABLE student (
id_student int4 NOT NULL GENERATED ALWAYS AS IDENTITY,
id_teacher int2
student_name varchar(255),
age int2
CONSTRAINT provider_pk PRIMARY KEY (id_student)
);
In the application, each teacher can see the list of all his/her students .. and they can edit or add new students
I am trying to figure out how to UPSERT data in the table in PostgreSQL ... what I am doing now is for each teacher (after the manipulation in the app) they are allowed to edit on FE only in JS (without the necessity of saving each change individually)... so after the edit, they click SAVE button and that's the time I need to store the changes and new records in the DB ...
what I do now, is I delete all records for that particular teacher and store the new object/array they created (by editing, adding, .. whatever) - so it's easy and I don't have to check for changes and new records ... the drawbacks is a brutal waste of the sequence for ID_STUDENT (autogenerated on the DB side) and of course a huge overhead on indexes while inserting (=rebuilding) considering there will be a lot of teachers saving a lot of their students .. that might cause some perf issues ... not to mention the fragmenting (HWM) so I would have to VACUUM regularly on this table
In Oracle, I could easily use MERGE INTO (which is fantastic for this use case) but the MERGE is not in the PostgreSQL :(
the only thing I know about is the INSERT ON CONFLICT UPDATE ... but the problem is, how am I supposed to apply this on GENERATED ALWAYS AS IDENTITY key? I do not provide this sequence (on top of that I don't even know the latest number) and therefore I cannot trigger the ON CONFLICT (id_student) ....
is there any nice way out of this sh*t ? Or DELETE / INSERT is really the way to go?
You shouldn't be too worried about the data churn – after all, an UPDATE also writes a new version of the row, so it wouldn't be that much different. And the sequence is no problem, because you used bigint for the primary key, right (anything else would have been a mistake)?
If you want to use INSERT ... ON CONFLICT in combination with an auto-generated sequence, you need some way beside the primary key to identify a row, that is, you need a UNIQUE constraint that you can use with ON CONFLICT. If there is no candidate for such a constraint, how can you identify the records for the teacher?
Can I do row-specific update / delete operations in a DB2 table Via SQL, in a NON QUNIQUE Primary Key Context?
The Table is a PHYSICAL FILE on the NATIVE SYSTEM of the AS/400.
It was, like many other Files, created without the unique definition, which leads DB2 to the conclusion, that The Table, or PF has no qunique Key.
And that's my problem. I can't override the structure of the table to insert a unique ID ROW, because, I would have to recompile ALL my correlating Programs on the AS/400, which is a serious issue, much things would not work anymore, "perhaps". Of course, I can do that refactoring for one table, but our system has thousands of those native FILES, some well done with Unique Key, some without Unique definition...
Well, I work most of the time with db2 and sql on that old files. And all files which have a UNIQUE Key are no problem for me to do those important update / delete operations.
Is there some way to get an additional column to every select with a very unique row id, respective row number. And in addition, what is much more important, how can I update this RowNumber.
I did some research and meanwhile I assume, that there is no chance to do exact alterations or deletes, when there is no unique key present. What I would wish would be some additional ID-ROW which is always been sent with the table, which I can Refer to when I do my update / delete operations. Perhaps my thinking here has an fallacy as non Unique Key Tables are purposed to be edited in other ways.
Try the RRN function.
SELECT RRN(EMPLOYEE), LASTNAME
FROM EMPLOYEE
WHERE ...;
UPDATE EMPLOYEE
SET ...
WHERE RRN(EMPLOYEE) = ...;
I will explain the problem with an example:
I am designing a specific case of referential integrity in a table. In the model there are two tables, enterprise and document. We register the companies and then someone insert the documents associated with it. The name of the enterprise is variable. When it comes to recovering the documents, I need the name of the enterprise to be the same as it was when it was registered and not the value it currently has. The solution that I thought was to register the company again in each change with the same code, the updated name in this way would have the expected result, but I am not sure if it is the best solution. Can someone make a suggestion?
There are several possible solutions and it is hard to determine which one will exactly be the easiest.
Side comment: your question is limited to managing names efficiently but I would like to comment the fact that your DB is sensitive to files being moved, renamed or deleted. Your database will not be able to keep records up-to-date if anything happen at OS level. You should consider to do something about it too.
Amongst the few solution I considered, the one that is best normalized is the schema below:
CREATE TABLE Enterprise
(
IdEnterprise SERIAL PRIMARY KEY
, Code VARCHAR(4) UNIQUE
, IdName INTEGER DEFAULT -1 /* This will be used to get a single active name */
);
CREATE TABLE EnterpriseName (
IDName SERIAL PRIMARY KEY
, IdEnterprise INTEGER NOT NULL REFERENCES Enterprise(IdEnterprise) ON UPDATE NO ACTION ON DELETE CASCADE
, Name TEXT NOT NULL
);
ALTER TABLE Enterprise ADD FOREIGN KEY (IdName) REFERENCES EnterpriseName(IdName) ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY DEFERRED;
CREATE TABLE Document
(
IdDocument SERIAL PRIMARY KEY
, IdName INTEGER NOT NULL REFERENCES EnterpriseName(IDName) ON UPDATE NO ACTION ON DELETE NO ACTION
, FilePath TEXT NOT NULL
, Description TEXT
);
Using flag and/or timestamps or moving the enterprise name to the document table are appealing solutions, but only at first glance.
Especially, the part where you have to ensure a company always has 1, and 1 only "active" name is no easy thing to do.
Add a date range to your enterprise: valid_from, valid_to. Initialise to -infinity,+infinity. When you change the name of an enterprise, instead: update existing rows where valid_to = +infinity to be now() and insert the new name with valid_from = now(), valid_to = +infinity.
Add a date field to the document, something like create_date. Then when joining to enterprise you join on ID and d.create_date between e.valid_from and e.valid_to.
This is a simplistic approach and breaks things like uniqueness for your id and code. To handle that you could record the name in a separate table with the id,from,to,name. Leaving your original table with just the id and code for uniqueness.
I have a table with a boolean column automated. When that field is set to TRUE, then another table needs to have an entry referring to that row.
Table A
id | automated
---------------------
1 | False
2 | True
3 | False
Table B
id | FK-TableA | Value
-------------------------------
2 | 2 | X
So whenever a new entry gets inserted into Table A where automated is set to TRUE, then there also has to be a row inserted (or present) in Table B with a referece to it.
It seems an unnatural flow to me, with the constraint you are stating the natural flow should be creating a TRIGGER on Table B that inserts a record on Table A whenever a new Table B record is inserted.
But I understand this is a simplification of a more elaborated problem so if you really need to create this kind of procedure, there is still a question to be answered, what happens when check is negative, should there be an exception? should the record be inserted with FALSE instead of TRUE, should the record need to be ignored? there are two options from my point of view:
Create a TRIGGER before INSERT on table A that updates the table accordingly (Create a PROCEDURE that checks if this exists and a TRIGGER that executes this procedure)
Create a RULE on insert on your table A that checks that record exists on Table B and changes record or does instead nothing.
With a little more background I can help you with Trigger/Rule.
Anyway, take into account this can be performance-wise a real mistake if this table gets lots of INSERTs and you should go for some offline(as not being done on the live INSERT) procedure instead of doing on live INSERT
It is ugly and introduces a redundancy into the database, but I cannot think of a better way than this:
Introduce a new column b_id to a.
Add a UNIQUE constraint on ("FK-TableA", id) to b.
Add a foreign key on a so that (id, b_id) REFERENCES b("FK-TableA", id).
Add a CHECK (b_id IS NOT NULL OR NOT automated) constraint on a.
Then you have to point b_id to one of the rows in b that points back to this a row.
To make it perfect, you'd have to add triggers that guarantee that after each modification, the two foreign keys are still consistent.
I told you it was ugly!
I have a "services" table for detailing services that we provide. Among the data that needs recording are several small one-to-many relationships (all with a foreign key constraint to the service_id) such as:
service_owners -- user_ids responsible for delivery of service
service_tags -- e.g. IT, Records Management, Finance
customer_categories -- ENUM value
provider_categories -- ENUM value
software_used -- self-explanatory
The problem I have is that I want to keep a history of updates to a service, for which I'm using an update trigger on the table, that performs an insert into a history table matching the original columns. However, if a normalized approach to the above data is used, with separate tables and foreign keys for each one-to-many relationship, any update on these tables will not be recognised in the history of the service.
Does anyone have any suggestions? It seems like I need to store child keys in the service table to maintain the integrity of the service history. Is a delimited text field a valid approach here or, as I am using postgreSQL, perhaps arrays are also a valid option? These feel somewhat dirty though!
Thanks.
If your table is:
create table T (
ix int identity primary key,
val nvarchar(50)
)
And your history table is:
create table THistory (
ix int identity primary key,
val nvarchar(50),
updateType char(1), -- C=Create, U=Update or D=Delete
updateTime datetime,
updateUsername sysname
)
Then you just need to put an update trigger on all tables of interest. You can then find out what the state of any/all of the tables were at any point in history, to determine what the relationships were at that time.
I'd avoid using arrays in any database whenever possible.
I don't like updates for the exact reason you are saying here...you lose information as it's over written. My answer is quite simple...don't update. Not sure if you're at a point where this can be implemented...but if you can I'd recommend using the main table itself to store historical (no need for a second set of history tables).
Add a column to your main header table called 'active'. This can be a character or a bit (0 is off and 1 is on). Then it's a bit of trigger magic...when an update is preformed, you insert a row into the table identical to the record being over-written with a status of '0' (or inactive) and then update the existing row (this process keeps the ID column on the active record the same, the newly inserted record is the inactive one with a new ID).
This way no data is ever lost (admittedly you are storing quite a few rows...) and the history can easily be viewed with a select where active = 0.
The pain here is if you are working on something already implemented...every existing query that hits this table will need to be updated to include a check for the active column. Makes this solution very easy to implement if you are designing a new system, but a pain if it's a long standing application. Unfortunately existing reports will include both off and on records (without throwing an error) until you can modify the where clause