Versioning in the database

Versioning in the database - postgresql

I want to store full versioning of the row every time a update is made for amount sensitive table.
So far, I have decided to use the following approach.
Do not allow updates.
Every time a update is made create a new
entry in the table.
However, I am undecided on what is the best database structure design for this change.
Current Structure
Primary Key: id
id(int) | amount(decimal) | other_columns
First Approach
Composite Primary Key: id, version
id(int) | version(int) | amount(decimal) | change_reason
1 | 1 | 100 |
1 | 2 | 20 | correction
Second Approach
Primary Key: id
Uniqueness Index on [origin_id, version]
id(int) | origin_id(int) | version(int) | amount(decimal) | change_reason
1 | NULL | 1 | 100 | NULL
2 | 1 | 2 | 20 | correction

I would suggest a new table which store unique id for item. This serves as lookup table for all available items.
item Table:
id(int)
1000
For the table which stores all changes for item, let's call it item_changes table. item_id is a FOREIGN KEY to item table's id. The relationship between item table to item_changes table, is one-to-many relationship.
item_changes Table:
id(int) | item_id(int) | version(int) | amount(decimal) | change_reason
1 | 1000 | 1 | 100 | NULL
2 | 1000 | 2 | 20 | correction
With this, item_id will never be NULL as it is a valid FOREIGN KEY to item table.

The best method is to use Version Normal Form (vnf). Here is an answer I gave for a neat way to track all changes to specific fields of specific tables.
The static table contains the static data, such as PK and other attributes which do not change over the life of the entity or such changes need not be tracked.
The version table contains all dynamic attributes that need to be tracked. The best design uses a view which joins the static table with the current version from the version table, as the current version is probably what your apps need most often. Triggers on the view maintain the static/versioned design without the app needing to know anything about it.
The link above also contains a link to a document which goes into much more detail including queries to get the current version or to "look back" at any version you need.

Why you are not going for SCD-2 (Slowly Changing Dimension), which is a rule/methodology to describe the best solution for your problem. Here is the SCD-2 advantage and example for using, and it makes standard design pattern for the database.
Type 2 - Creating a new additional record. In this methodology, all history of dimension changes is kept in the database. You capture attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and new rows contain as attributes the natural key(or other durable identifiers). Also 'effective date' and 'current indicator' columns are used in this method. There could be only one record with the current indicator set to 'Y'. For 'effective date' columns, i.e. start_date, and end_date, the end_date for current record usually is set to value 9999-12-31. Introducing changes to the dimensional model in type 2 could be very expensive database operation so it is not recommended to use it in dimensions where a new attribute could be added in the future.
id | amount | start_date |end_date |current_flag
1 100 01-Apr-2018 02-Apr-2018 N
2 80 04-Apr-2018 NULL Y
Detail Explanation::::
Here, all you need to add the 3 extra column, START_DATE, END_DATE, CURRENT_FLAG to track your record properly. When the first time record inserted # source, this table will be store the value as:
id | amount | start_date |end_date |current_flag
1 100 01-Apr-2018 NULL Y
And, when the same record will be updated then you have to update the "END_DATE" of the previous record as current_system_date and "CURRENT_FLAG" as "N", and insert the second record as below. So you can track everything about your records. as below...
id | amount | start_date |end_date |current_flag
1 100 01-Apr-2018 02-Apr-2018 N
2 80 04-Apr-2018 NULL Y

Related

Is this INSERT statement containing SELECT subquery safe for multiple concurrent writes?

In Postgres, suppose I have the following table to be used like to a singly linked list, where each row has a reference to the previous row.
Table node
Column | Type | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+----------------------
id | uuid | | not null | gen_random_uuid()
created_at | timestamp with time zone | | not null | now()
name | text | | not null |
prev_id | uuid | | |
I have the following INSERT statement, which includes A SELECT to look up the last row as data for the new row to be inserted.
INSERT INTO node(name, prev_id)
VALUES (
:name,
(
SELECT id
FROM node
ORDER BY created_at DESC
LIMIT 1
)
)
RETURNING id;
I understand storing prev_id may seem redundant in this example (ordering can be derived from created_at), but that is beside the point. My question: Is the above INSERT statement safe for multiple concurrent writes? Or, is it necessary to explicitly use LOCK in some way?
For clarity, by "safe", I mean is it possible that by the time the SELECT subquery executed and found the "last row", another concurrent query would have just finished an insert, so the "last row" found earlier is no longer the last row, so this insert would use the wrong "last row" value. The effect is multiple rows may share the same prev_id values, which is invalid for a linked list structure.

Is it possible to use different forms and create one row of information in a table?

I have been searching for a way to combine two or more rows of one table in a database into one row.
I am currently creating multiple web-based forms that connect to one table in my database. Is there any way to write some mysql and php code that will take separate form submissions and put them into one row of the database instead of multiple rows?
Here is an example of what is going into the database:
This is all in one table with three rows.
Form_ID represents the three different forms that I used to insert the data into the table.
Form_ID | Lot_ID| F_Name | L_Name | Date | Age
------------------------------------------------------------
1 | 1 | John | Evans | *NULL* | *NULL*
-------------------------------------------------------------
2 |*NULL* | *NULL* | *NULL* | 2017-07-06 | *NULL*
-------------------------------------------------------------
3 |*NULL* | *NULL* | *NULL* | *NULL* | 22
This is an example of three separate forms going into one table. Every time the submit button is hit the data just inserts down to the next row of information.
I need some sort of join or update once the submit button is hit to replace the preceding NULL values.
Here is what I want to do after the submit button is hit:
I want it to be combined all into one row but still in one table
Form_ID is still the three separate forms but only in one row now.
Form_ID |Lot_ID | F_Name | L_Name | Date | Age
----------------------------------------------------------
1 | 1 | John | Evans | 2017-07-06 | 22
My goal is once a one form has been submitted I want the next, different form submission to replace the NULL values in the row above it and so on to create a single row of information.

I found a way to solve this issue. I used UPDATE tablename SET columname = newColumnName WHERE Form_ID = newID
So this way when I want to update rows that have blanks spaces I have it finding the matching ID's

DB2 table partitioning and delete old records based on condition

I have a table with few million records.
___________________________________________________________
| col1 | col2 | col3 | some_indicator | last_updated_date |
-----------------------------------------------------------
| | | | yes | 2009-06-09.12.2345|
-----------------------------------------------------------
| | | | yes | 2009-07-09.11.6145|
-----------------------------------------------------------
| | | | no | 2009-06-09.12.2345|
-----------------------------------------------------------
I have to delete records which are older than month with some_indicator=no.
Again I have to delete records older than year with some_indicator=yes.This job will run everyday.
Can I use db2 partitioning feature for above requirement?.
How can I partition table using last_updated_date column and above two some_indicator values?
one partition should contain records falling under monthly delete criterion whereas other should contain yearly delete criterion records.
Are there any performance issues associated with table partitioning if this table is being frequently read,upserted?
Any other best practices for above requirement will surely help.

I haven't done much with partitioning (I've mostly worked with DB2 on the iSeries), but from what I understand, you don't generally want to be shuffling things between partitions (ie - making the partition '1 month ago'). I'm not even sure if it's even possible. If it was, you'd have to scan some (potentially large) portion of your table every day, just to move it (select, insert, delete, in a transaction).
Besides which, partitioning is a DB Admin problem, and it sounds like you just have a DB User problem - namely, deleting 'old' records. I'd just do this in a couple of statements:
DELETE FROM myTable
WHERE some_indicator = 'no'
AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 MONTH, TIME('00:00:00'))
and
DELETE FROM myTable
WHERE some_indicator = 'yes'
AND last_updated_date < TIMESTAMP(CURRENT_DATE - 1 YEAR, TIME('00:00:00'))
.... and you can pretty much ignore using a transaction, as you want the rows gone.
(as a side note, using 'yes' and 'no' for indicators is terrible. If you're not on a version that has a logical (boolean) type, store character '0' (false) and '1' (true))

How to create a PostgreSQL partitioned sequence?

Is there a simple (ie. non-hacky) and race-condition free way to create a partitioned sequence in PostgreSQL. Example:
Using a normal sequence in Issue:
| Project_ID | Issue |
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
Using a partitioned sequence in Issue:
| Project_ID | Issue |
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |

I do not believe there is a simple way that is as easy as regular sequences, because:
A sequence stores only one number stream (next value, etc.). You want one for each partition.
Sequences have special handling that bypasses the current transaction (to avoid the race condition). It is hard to replicate this at the SQL or PL/pgSQL level without using tricks like dblink.
The DEFAULT column property can use a simple expression or a function call like nextval('myseq'); but it cannot refer to other columns to inform the function which stream the value should come from.
You can make something that works, but you probably won't think it simple. Addressing the above problems in turn:
Use a table to store the next value for all partitions, with a schema like multiseq (partition_id, next_val).
Write a multinextval(seq_table, partition_id) function that does something like the following:
Create a new transaction independent on the current transaction (one way of doing this is through dblink; I believe some other server languages can do it more easily).
Lock the table mentioned in seq_table.
Update the row where the partition id is partition_id, with an incremented value. (Or insert a new row with value 2 if there is no existing one.)
Commit that transaction and return the previous stored id (or 1).
Create an insert trigger on your projects table that uses a call to multinextval('projects_table', NEW.Project_ID) for insertions.
I have not used this entire plan myself, but I have tried something similar to each step individually. Examples of the multinextval function and the trigger can be provided if you want to attempt this...

Oracle: TABLE ACCESS FULL with Primary key?

There is a table:
CREATE TABLE temp
(
IDR decimal(9) NOT NULL,
IDS decimal(9) NOT NULL,
DT date NOT NULL,
VAL decimal(10) NOT NULL,
AFFID decimal(9),
CONSTRAINT PKtemp PRIMARY KEY (IDR,IDS,DT)
)
;
Let's see the plan for select star query:
SQL>explain plan for select * from temp;
Explained.
SQL> select plan_table_output from table(dbms_xplan.display('plan_table',null,'serial'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
---------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 61 | 2 (0)|
| 1 | TABLE ACCESS FULL| TEMP | 1 | 61 | 2 (0)|
---------------------------------------------------------------
Note
-----
- 'PLAN_TABLE' is old version
11 rows selected.
SQL server 2008 shows in the same situation Clustered index scan. What is the reason?

select * with no where clause -- means read every row in the table, fetch every column.
What do you gain by using an index? You have to go to the index, get a rowid, translate the rowid into a table offset, read the file.
What happens when you do a full table scan? You go the th first rowid in the table, then read on through the table to the end.
Which one of these is faster given the table you have above? Full table scan. Why? because it skips having to to go the index, retreive values, then going back to the other to where the table lives and fetching.

To answer this more simply without mumbo-jumbo, the reason is:
Clustered Index = Table
That's by definition in SQL Server. If this is not clear, look up the definition.
To be absolutely clear once again, since most people seem to miss this, the Clustered Index IS the table itself. It therefore follows that "Clustered Index Scan" is another way of saying "Table Scan". Or what Oracle calls "TABLE ACCESS FULL"

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse