DB Design for record tracking - entity-framework-core

Background: I am writing an app and designing the database with EF Core. Certain tables store user actions, and for those I need to track who created each record, who last modified it, and who deleted it (soft deletes).
I have a user table with an int as its PK, and each of the respective fields (CreatedBy, LastModifiedBy, DeletedBy) holds an int that points to a user row.
I do have a full audit setup where a row's entire old/new contents are stored on save, and that works fine. This particular question is only about the more immediate created/modified/deleted-by tracking.
Help desk are generally the ones who use these fields on a daily basis to help users determine quickly what's going on, but there are a lot of places in the app itself that will eventually draw upon those fields (more so created/modified from the app's perspective).
Question: I was going to create a PK/FK relationship between these tables and the user table. However, it got me thinking about whether there is a better strategy than adding those three fields and relationships to every single table going forward. Maybe a single table that stores the table name along with its PK and the created/modified/deleted columns, each with a relationship back to the user table, so that only one table carries those PK/FK relationships back to user. I just feel there must be a better, more efficient way to handle this. Is there?
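For concreteness, the design in question would look something like this as EF Core entities (a sketch; WorkOrder and its properties are hypothetical stand-ins for the real tables):

public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

// The pattern being asked about: every tracked table repeats these three FKs.
public class WorkOrder
{
    public int Id { get; set; }
    public string Description { get; set; } = "";

    public int CreatedBy { get; set; }       // FK -> User.Id
    public int? LastModifiedBy { get; set; } // FK -> User.Id
    public int? DeletedBy { get; set; }      // FK -> User.Id (soft delete)
}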

Don't do what you are thinking; stick with your original design and keep the auditing fields, as they relate to each table, on the table itself. Adding a table that stores auditing fields from other tables just creates a design nightmare.
Now, a bigger question is how to track the audit transaction. A design I like is as follows:
CreateBy: add default binding SUBSTRING(SUSER_NAME(), 1, 50)
CreateTs: add default binding GETDATE()
UpdateBy
UpdateTs
Hard deletes are allowed only for bad data. Soft deletes come in the form of an additional column called ActiveInd (BIT), where that transaction is stored as an update. This means that updates and soft deletes are both recorded in the UpdateBy/UpdateTs columns.
That should get you what you need if you intend to track activity from a web application. If you have a back-end system that loads and manipulates data, then I would include a LoadInfo table that tracks all of the jobs. You can add both a LoadSequenceKey and a ParentSequenceKey (add a self-referencing foreign key here), and then create a foreign key on all tables that are modified by jobs, storing the sequence key as either a CreateSequenceKey or an UpdateSequenceKey.
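Translated into EF Core, the audit-column scheme above might look like the following sketch (SQL Server assumed; Auditable, SomeEntity, and the current-user lookup are hypothetical):

using System;
using Microsoft.EntityFrameworkCore;

public abstract class Auditable
{
    public string CreateBy { get; set; } = "";
    public DateTime CreateTs { get; set; }
    public string? UpdateBy { get; set; }
    public DateTime? UpdateTs { get; set; }
    public bool ActiveInd { get; set; } = true; // soft-delete flag
}

public class SomeEntity : Auditable
{
    public int Id { get; set; }
}

public class AppDbContext : DbContext
{
    // Stand-in for however the current user is actually resolved.
    private readonly string currentUser = Environment.UserName;

    public DbSet<SomeEntity> SomeEntities => Set<SomeEntity>();

    protected override void OnModelCreating(ModelBuilder mb)
    {
        // The default bindings suggested above, applied per entity.
        mb.Entity<SomeEntity>().Property(e => e.CreateBy)
            .HasDefaultValueSql("SUBSTRING(SUSER_NAME(), 1, 50)");
        mb.Entity<SomeEntity>().Property(e => e.CreateTs)
            .HasDefaultValueSql("GETDATE()");
    }

    public override int SaveChanges()
    {
        foreach (var entry in ChangeTracker.Entries<Auditable>())
        {
            // Updates and soft deletes (ActiveInd = 0) both land here.
            if (entry.State == EntityState.Modified)
            {
                entry.Entity.UpdateBy = currentUser;
                entry.Entity.UpdateTs = DateTime.Now;
            }
        }
        return base.SaveChanges();
    }
}

With the defaults in place, inserts get CreateBy/CreateTs stamped by SQL Server even when rows are created outside the app, while the SaveChanges override stamps updates and soft deletes.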

Related

Lagom persistent read side and model evolution

To learn Lagom I created a simple application with some simple persistent entities and a persistent read side (as per the official documentation, using Cassandra).
The official doc contains a section about model evolution, describing how to change the model. However, there is no mention of evolution when it comes to the read side.
Assuming I have an entity called Item, with an ID and a name, the read side creates a table like CREATE TABLE IF NOT EXISTS items (id TEXT, name TEXT, PRIMARY KEY (id))
I now want to change the item to include a description. This is trivial for the persistent entity, but the read side has to be changed as well.
I can see several approaches to (maybe) achieve that:
use a model evolution tool like Liquibase or Play evolutions to change the read-side tables
somehow include table-update statements in createTables that migrate the model
create additional tables containing the additional information, and keep the old tables unmodified
Which approach would be the most fitting? Is there something better?
Creating a new table and dropping the old table is an option too IMHO.
It is as simple as modifying your "create table" command ("create table mytable_v2 ..." and "drop table mytable ..."), changing the offset name, and modifying your event handlers.
override def buildHandler(): ReadSideProcessor.ReadSideHandler[MyEvent] = {
  readSide.builder[MyEvent]("myOffset") // change it to "myOffset_v2"
  ...
}
This results in all events being replayed and your read-side table being reconstructed from scratch. This may not be an option if the current table is really huge, as the reconstruction may take a very long time.
Regarding what @erip says, I see it as perfectly normal to add a new column to your read-side table. Suppose there are lots of records in this table listing all entities, and you want to retrieve a list of entities based on some criteria, so you need some columns to be included in the WHERE clause. Retrieving the list of all entities and asking each of them whether it complies with the criteria is not an option at all; it could be very inefficient, as it needs more time, memory, and network usage.
The point of a read side is to materialize views from the entity state changes in your service's event stream. In this respect, you, as the controller of the service, can decide what is important for your subscribers to know about. This is handled by creating read sides with an anti-corruption layer (or ACL).
Typically your subscribers will subscribe to API events, which should experience no evolution. Your internal events (or impl events) will likely need to evolve; because of this, there should be a transformation from the impl events to the API events.
This is why it's very important to consider your domain very carefully before design: you really need to nail down what subscribers will need to know about. In the case of a description, it strikes me as unlikely that subscribers will need (or want!) to know about that.

How to avoid duplicate records with user edits in a referential system?

PostgreSQL 10.1
I am curious to see how the SO Community deals with this common database problem.
The problem is this. I have textual descriptions of various problems being entered at the desktop into the table ICD_Descriptions. Generally speaking, there will also be one code associated with each description. However, over time (i.e., years), the codes for a specific text phrase/description will change. Hence, a general many-to-many relationship will exist between some codes and descriptions. So a third table, dx_log, is being used to allow for a many-to-many relationship between the codes and the descriptions. Lastly, other child tables that need to see a specific combination of code and description are given a reference to the primary key (the recid) of the dx_log. I believe this arrangement to be fairly standard database management.
O.K., now for the problem. I wish the codes in the icd_code table to be unique to that table. I also wish the descriptions in the icd_description table to be unique to their table.
The problem. This being a referential system, changes to either the "data" part of the code table (the code) or the description (in the descriptions table) will be seen in the child tables.
But, how to correctly manage user edits to a code or description that would conflict with the "unique" rule of the respective table?
As an example, let's say the initial text of one description is "this is mispelled", whereas another description's text is "this is misspelled". At one point in time, both phrases coexist and are unique. However, at a later point in time, the wrong record is corrected (i.e., mispelled ---> misspelled). When the edit is saved, it will be detected that the corrected record already exists.
So if the dx_log table is also unique on (icd_description_recid, icd_code_recid), then simply repointing the reference to the already existing, correctly spelled record will lead to a violation of uniqueness in the dx_log table. Therefore, I can think of only three solutions:
The table which the child tables actually reference CANNOT HAVE A UNIQUE constraint on its reference pointers,
Leave the uniqueness constraint on dx_log but, when an edit would violate uniqueness, use a migration procedure to move the child-table references over to the already existing record (in PostgreSQL this would mean heavy use of the catalogs) and then delete the record that was being edited.
Add an additional self-referencing pointer to dx_log such that, when a record would conflict with an already existing record in dx_log, it is not changed; instead, a pointer to the already existing "correct" record is placed in the dx_log.
I hope I've explained my question well enough. What is the recommended approach?
Thanks for any comments.
I would say that option 2 is the correct solution.
Yes, that requires all dependent records in dx_log to be updated when two entries are consolidated, but you have an index on dx_log(icd_description_recid) anyway, right?
The other solutions compromise data consistency and will make all queries on the system more complicated and probably slower.
Separate the idea of a natural unique constraint from the idea of a key. Use a system generated surrogate key for referential integrity, and a UNIQUE INDEX for the natural key. Then you don't have to reparent anything.
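Given the EF slant of this digest, the same idea can be expressed as an EF Core sketch (entity and context names hypothetical):

using Microsoft.EntityFrameworkCore;

public class IcdDescription
{
    public int RecId { get; set; }                // surrogate key; the only thing children reference
    public string Description { get; set; } = ""; // natural key, unique but never referenced
}

public class ClinicContext : DbContext
{
    public DbSet<IcdDescription> Descriptions => Set<IcdDescription>();

    protected override void OnModelCreating(ModelBuilder mb)
    {
        mb.Entity<IcdDescription>(e =>
        {
            e.HasKey(d => d.RecId);                    // referential integrity
            e.HasIndex(d => d.Description).IsUnique(); // natural uniqueness
        });
    }
}

Because child rows reference RecId rather than the text itself, correcting a description never requires reparenting, while the unique index still rejects true duplicates.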

How do I tell EF to ignore columns from my database?

I'm building my EF (v4.0.30319) data model from my SQL Server database. Each table has Created and Updated fields populated via database triggers.
This database is the backend of an ASP.NET Web API application, and I recently discovered a problem: since the Created and Updated fields are not populated in the model passed into the API endpoint, they are written to the database as NULL. This overwrites any values already in the database for those properties.
I discovered I can edit the EF data model and just delete those two columns from the entity. It works, and the datetimes are not overwritten with NULL. But this leads to another less serious but more annoying problem: my data model has a bunch of tables that contain these properties, and all of them need to be updated by removing these two columns.
Is there a way to tell EF to ignore certain columns in entities across the entire data model without manually deleting them?
As far as I know, no. Generating the model from the database is always going to create all of the fields from the database table. Well, as long as it has a primary key, it will.
It is possible to only update the fields that you want, i.e., don't include the "Created" and "Updated" fields in your create and update methods. I'd have to say, though, that it'd be better if those fields didn't even exist on the model at that point. You may at some point see those fields on the model and not remember that they won't get persisted to the DB.
You may want to look into just inserting the datetimes into those fields when you call your create() and update() methods. Then you could just ditch the triggers. You'd obviously want to use a class library for all of your database operations so this functionality would be in one place. To keep it nice and neat, you know?
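As an aside, a common alternative to deleting the columns from the model is telling EF that the database owns them: in the EDMX designer, set StoreGeneratedPattern to Computed on those columns. In EF Core terms the equivalent would be a sketch like this (entity name hypothetical):

using System;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata;

public class Order // hypothetical entity with trigger-maintained columns
{
    public int Id { get; set; }
    public DateTime? Created { get; set; }
    public DateTime? Updated { get; set; }
}

public class ApiDbContext : DbContext
{
    public DbSet<Order> Orders => Set<Order>();

    protected override void OnModelCreating(ModelBuilder mb)
    {
        foreach (var entityType in mb.Model.GetEntityTypes())
        {
            foreach (var name in new[] { "Created", "Updated" })
            {
                var property = entityType.FindProperty(name);
                if (property == null) continue;
                // Treat the column as database-computed: read it back after
                // saves, but never include it in INSERT/UPDATE statements,
                // so the triggers' values are not overwritten with NULL.
                property.ValueGenerated = ValueGenerated.OnAddOrUpdate;
                property.SetBeforeSaveBehavior(PropertySaveBehavior.Ignore);
                property.SetAfterSaveBehavior(PropertySaveBehavior.Ignore);
            }
        }
    }
}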

Change tracking with EF over time?

I'd like to know what is the best practice to track and/or persist changes over time if I use EF. I'd like to get started with EF for a new project. What I need is a kind of change history.
Here's how I did it before: if a record was created, it was saved with an ID and with the same ID as its InvariantID. If the record was updated, I marked it as deleted and created a new record with the new values and a new ID, but the same InvariantID. This way I always had my current record, but a history of changes as well.
This works perfectly fine for my scenarios. The amount of historical records is not an issue because I use this usually only for data that's not changing very often.
Is this built into EF somehow, or what's the best way to get this behavior with EF?
No, it is not built into EF, and it will not work this way. I don't even think it is a good approach at the database level, because it makes referential integrity very complex.
With EF this will work only if you use the following approach:
You will use conditional mapping for your entity; the condition will be IsDeleted = 0. This ensures that only non-deleted entities are used in queries (see the sketch after this list).
You will have a mapped stored procedure for the delete operation to correctly set IsDeleted = 1 instead of really deleting the record.
You will have to manually call DeleteObject to delete your record and then insert the new record; the reason is that EF is not able to deal with a scenario where an entity changes its PK value during an update.
Your entities will not be able to participate in relations unless you manually rebuild referential integrity with some other stored procedure
You will need a stored procedure to query historical (deleted) records.
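For what it's worth, in EF Core the conditional-mapping step becomes a global query filter. A minimal sketch of the whole pattern, with hypothetical Record/HistoryContext names:

using Microsoft.EntityFrameworkCore;

public class Record
{
    public int Id { get; set; }
    public int InvariantId { get; set; } // stable across all versions
    public string Value { get; set; } = "";
    public bool IsDeleted { get; set; }
}

public class HistoryContext : DbContext
{
    public DbSet<Record> Records => Set<Record>();

    protected override void OnModelCreating(ModelBuilder mb)
    {
        // EF Core equivalent of the IsDeleted = 0 conditional mapping.
        mb.Entity<Record>().HasQueryFilter(r => !r.IsDeleted);
    }

    // An "update" soft-deletes the current version and inserts a new one,
    // keeping the InvariantId so the history can be traced.
    public void UpdateWithHistory(Record current, string newValue)
    {
        current.IsDeleted = true;
        Records.Add(new Record { InvariantId = current.InvariantId, Value = newValue });
        SaveChanges();
    }
}

Historical (deleted) versions can then be read with Records.IgnoreQueryFilters(), which stands in for the stored procedure mentioned in the last point.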

iPhone and Core Data: how to retain user-entered data between updates?

Consider an iPhone application that is a catalogue of animals. The application should allow the user to add custom information for each animal -- let's say a rating (on a scale of 1 to 5), as well as some notes they can enter in about the animal. However, the user won't be able to modify the animal data itself. Assume that when the application gets updated, it should be easy for the (static) catalogue part to change, but we'd like the (dynamic) custom user information part to be retained between updates, so the user doesn't lose any of their custom information.
We'd probably want to use Core Data to build this app. Let's also say that we have a previous process already in place to read in animal data to pre-populate the backing (SQLite) store that Core Data uses. We can embed this database file into the application bundle itself, since it doesn't get modified. When a user downloads an update to the application, the new version will include the latest (static) animal catalogue database, so we don't ever have to worry about it being out of date.
But, now the tricky part: how do we store the (dynamic) user custom data in a sound manner?
My first thought is that the (dynamic) database should be stored in the Documents directory for the app, so application updates don't clobber the existing data. Am I correct?
My second thought is that since the (dynamic) user custom data database is not in the same store as the (static) animal catalogue, we can't naively make a relationship between the Rating and the Notes entities (in one database) and the Animal entity (in the other database). In this case, I would imagine one solution would be to have an "animalName" string property in the Rating/Notes entity, and match it up at runtime. Is this the best way to do it, or is there a way to "sync" two different databases in Core Data?
Here's basically how I ended up solving this.
While Amorya's and MHarrison's answers were valid, they had one assumption: that once created, not only the tables but each row in each table would always be the same.
The problem is that my process to pre-populate the "Animals" database, using existing data (that is updated periodically), creates a new database file each time. In other words, I can't rely on creating a relationship between the (static) Animal entity and a (dynamic) Rating entity in Core Data, since that entity may not exist the next time I regenerate the application. Why not? Because I have no control over how Core Data stores that relationship behind the scenes. Since it's an SQLite backing store, it's likely using a table with foreign key relations. But when you regenerate the database, you can't assume anything about what value each row gets for a key. The primary key for Lion may be different the second time around if I've added a Lemur to the list.
The only way to avoid this problem would require pre-populating the database only once, and then manually updating rows each time there's an update. However, that kind of process isn't really possible in my case.
So, what's the solution? Well, since I can't rely on the foreign key relations that Core Data makes, I have to make up my own. What I do is introduce an intermediate step in my database generation process: instead of taking my raw data (which happens to be UTF-8 text but is actually MS Word files) and creating the SQLite database with Core Data directly, I first convert the .txt to .xml. Why XML? Not because it's a silver bullet, but simply because it's a data format I can parse very easily.

So what does this XML file add? A hash value that I generate for each Animal, using MD5, which I'll assume is unique. What is the hash value for? Well, now I can create two databases: one for the "static" Animal data (for which I already have a process), and one for the "dynamic" Ratings database, which the iPhone app creates and which lives in the application's Documents directory. For each Rating, I create a pseudo-relationship with the Animal by saving the Animal entity's hash value. So every time the user brings up an Animal detail view on the iPhone, I query the "dynamic" database to find whether a Rating entity exists that matches the Animal.md5Hash value.
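A sketch of that hashing step (written here in C# for consistency with the rest of this digest; the actual pipeline and names will differ):

using System;
using System.Security.Cryptography;
using System.Text;

static class AnimalHasher
{
    // A stable identifier derived from the animal's name; stored in the
    // intermediate XML and later as Animal.md5Hash in both databases.
    // (MD5.HashData and Convert.ToHexString require .NET 5 or later.)
    public static string Md5Hash(string animalName)
    {
        byte[] digest = MD5.HashData(Encoding.UTF8.GetBytes(animalName));
        return Convert.ToHexString(digest);
    }
}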
Since I'm saving this intermediate XML data file, the next time there's an update I can diff it against the last XML file I used to see what's changed. Now, if the name of an animal was changed (let's say a typo was corrected), I revert the hash value for that Animal in situ. This means that even if an Animal's name is changed, I'll still be able to find a matching Rating, if one exists, in the "dynamic" database.
This solution has another nice side effect: I don't need to handle any migration issues. The "static" Animal database that ships with the app can stay embedded as an app resource. It can change all it wants. The "dynamic" Ratings database may need migration at some point, if I modify its data model to add more entities, but in effect the two data models stay totally independent.
The way I'm doing this is: ship a database of the static stuff as part of your app bundle. On app launch, check if there is a database file in Documents. If not, copy the one from the app bundle to Documents. Then open the database from Documents: this is the only one you read from and edit.
When an upgrade has happened, the new static content will need to be merged with the user's editable database. Each static item (Animal, in your case) has a field called factoryID, which is a unique identifier. On the first launch after an update, load the database from the app bundle, and iterate through each Animal. For each one, find the appropriate record in the working database, and update any fields as necessary.
There may be a quicker solution, but since the upgrade process doesn't happen too often, the time taken shouldn't be too problematic.
Storing your SQLite database in the Documents directory (NSDocumentDirectory) is certainly the way to go.
In general, you should avoid application changes that modify or delete SQL tables as much as possible (adding is OK). However, when you absolutely have to make a change in an update, something like what Amorya said would work: open up the old DB, import whatever you need into the new DB, and delete the old one.
Since it sounds like you want a static database with an "Animal" table that can't be modified, simply replacing this table on upgrade shouldn't be an issue, as long as the IDs of the entries don't change. The way you should store user data about animals is to create a relation with a foreign key to an animal ID for each entry the user creates. This is what you would need to migrate when an upgrade changes it.