In a SQLite database, is it better to use triggers to handle cascading table changes, or is it better to do it programmatically?

Background
I have a couple of projects that use a SQLite DB for data. The data is, as you'd expect, spread across several tables, linked by key/foreign key values.
The thing is, in these databases, when one record changes I have to update several other tables. The best example off the top of my head is deleting a record: I have to make sure all other records related to the one being deleted are deleted as well. Now, this particular case can be handled with key/foreign key constraints, I believe, but what about more complicated updates?
Now I'm no pro DB admin, but I know that there needs to be data integrity in the DB or things get ugly.
The Question
So, my question. I know that I have greater control when updating related tables programmatically, but at the cost of human error and time. I may miss something or not implement the table updates correctly, and it takes a lot longer to code in the updates. On the other hand, I can put in triggers and let the DB handle the updates to other tables, but I then lose a lot of control.
So, which one is better? Is each better in different situations?

On the other hand, I can put in triggers and let the DB handle the updates to other tables, but I then lose a lot of control.
What control do you think you're losing? If data integrity requires that "such-and-such an update here requires additional updates there and there", you're not losing control by coding that in a trigger. You're centralizing control, and delegating it to the dbms, which is the only piece of software that can guarantee every application follows those requirements.
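For instance, here is a minimal SQLite sketch of the kind of rule the question describes, assuming a hypothetical projects table with dependent tasks rows (the names and the "archived" flag are made up for illustration):

    -- Hypothetical rule: archiving a project archives its tasks,
    -- and deleting a project deletes them.
    CREATE TRIGGER project_archive_cascade
    AFTER UPDATE OF archived ON projects
    WHEN NEW.archived = 1
    BEGIN
        UPDATE tasks SET archived = 1 WHERE project_id = OLD.id;
    END;

    CREATE TRIGGER project_delete_cascade
    AFTER DELETE ON projects
    BEGIN
        DELETE FROM tasks WHERE project_id = OLD.id;
    END;

Once those triggers exist, every application that touches the database gets the same behavior, whether it was written with this rule in mind or not.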
I know that I have greater control when updating related tables programmatically, but at the cost of human error and time. I may miss something or not implement the table updates correctly, and it takes a lot longer to code in the updates.
You're thinking like a programmer, not a database designer. (That's an observation, not a criticism.) Don't think, "I might miss something". That way of thinking really misses the mark.
Instead, when you're tempted to delegate data integrity to application code, think "Every programmer and every new or changed application that hits this database from now until the end of time has to get it perfectly right."
Now, honestly, does that really sound like a good idea to you?
(The last Fortune 500 company I worked in had programs written in at least two dozen different languages hitting their OLTP database.)

Related

Event Sourcing - How to query inside a command?

We would like to be able to read state inside a command use case.
We could get the state from the event store for the specific aggregate, but what about querying aggregates by field (not ID), or performing more complicated queries that are not a good fit for the event store?
The approach we were considering was to use our read model for those cases as well, not only for query use cases.
This might be inconsistent, so a solution could be to have the latest version of the aggregate stored in both write/read models, in order to be able to tell if the state is correct or stale.
Does this make sense? And if so, when we need to get state by ID, should we use the event store or the read model?
If you want the absolute latest state of an event-sourced aggregate, you're going to have to read the latest snapshot (assuming that you are snapshotting) and then replay events since that snapshot from the event store. You can be aggressive about snapshotting (conceivably even saving a snapshot after every command), but you're giving away some write performance to make the read faster.
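If the event store happens to be a SQL table (one common arrangement, not something the question states), "replay events since that snapshot" is just an ordered read; the table and column names here are hypothetical:

    -- Hypothetical event-store table (aggregate_id, version, payload),
    -- where version increases by one per event within an aggregate.
    SELECT version, payload
      FROM events
     WHERE aggregate_id = :aggregate_id
       AND version > :snapshot_version   -- only events newer than the snapshot
     ORDER BY version;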
Updating the read model directly is conceivably possible, though that level of coupling is something that should be considered very carefully. Note also that you will very likely need some sort of two-phase commit to ensure that the read model is only updated when the write model is updated and vice versa. I strongly suggest considering why you're using CQRS/ES in this project, because you are quite possibly undermining that reason by doing this sort of thing.
In general, if you need a query for processing a particular command, it's likely that query will generally be the same, i.e. you don't need free-form query support. In that case, you can often have a read model that's tuned for exactly that query and which only cares about events which could affect that query: often a fairly small subset of the events. The finer-grained the read model, the easier it is to keep in sync (if it ignores 99% of events, for instance, it can't really fall that far behind).
Needing to make complex queries as part of command processing could also be a sign that your aggregate boundaries aren't right and could do with a re-examination.
Does this make sense
Maybe. Let's start with
This might be inconsistent
Yup, it might be. So what?
We typically respond to a query by sending an unlocked copy of the answer. In other words, it's possible that the actual information in the write model will change after this response is dispatched but before the response arrives at its destination. The client will be looking at a copy of the answer taken from the past.
So we might reasonably ask how much better it is to get information no more than one minute old compared to information no more than five minutes old. If the difference in value is pennies, then you should probably deploy the five minute version. If the difference is millions of dollars, then you're in a good position to negotiate a real budget to solve the problem.
For processing a command in our own write model, that kind of inconsistency isn't usually acceptable or wise. But neither of the two common answers requires keeping the read and write models synchronized. The most common answer is to just work with the write model alone. The less common answer is to grab a snapshot out of a cache, and then apply any additional events to it to bring it up to date. The latter approach is "just" a performance optimization (first rule: don't.)
The variation that trips everyone up is trying to process a command somewhere else, enforcing a consistency rule on our data here. Once again, you need a really clear picture of how valuable the consistency is to the business. If it's really important, that may be a signal that the information in question shouldn't be split into two different piles - you may be working with the wrong underlying data model.
Possibly useful references
Pat Helland Data on the Outside Versus Data on the Inside
Udi Dahan Race Conditions Don't Exist

Is it practical to use one table for reading purposes only in a relational database?

I know this question would not be ideal in a real database world; however, I am building a web REST API to serve a result that potentially needs to join almost every table (I use normalization, for sure).
So is it OK to have one single table that holds the metadata used by the read API, but that also gets updated whenever data in the other tables is updated? I am using PostgreSQL, by the way.
This is not very clear, so I will state my understanding of the question and give you what I see as the trade-offs.
First: it sounds to me like you want to effectively materialize a metadata table and have it live-updated when other tables update. This is not really what the MATERIALIZED VIEW support in PostgreSQL is for.
You can use a trigger to update the data whenever something changes. Because of the way PostgreSQL handles things, this leads to more disk and CPU activity, and will probably add more of the latter than the former. So if you are heavily CPU-bound, that will pose more problems than if you are I/O-bound.
Using triggers in this way adds a fair bit of complexity to your database and may reduce write scaling a bit but if the data is seldom written but read frequently it may be a clear win.
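A rough sketch of that trigger approach, assuming a hypothetical orders table and a one-row-per-customer order_totals summary table (customer_id as its primary key); this only covers inserts and updates, and a DELETE trigger would be needed as well:

    CREATE OR REPLACE FUNCTION refresh_order_totals() RETURNS trigger AS $$
    BEGIN
        -- Recompute this customer's total and upsert it into the summary table.
        INSERT INTO order_totals (customer_id, total)
        SELECT NEW.customer_id, COALESCE(SUM(amount), 0)
          FROM orders
         WHERE customer_id = NEW.customer_id
        ON CONFLICT (customer_id) DO UPDATE SET total = EXCLUDED.total;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    -- PostgreSQL 11+ syntax; older versions use EXECUTE PROCEDURE.
    CREATE TRIGGER orders_refresh_totals
    AFTER INSERT OR UPDATE ON orders
    FOR EACH ROW EXECUTE FUNCTION refresh_order_totals();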
So in answer to your question, yes it is practical in at least some cases. Whether it is practical in your case, that will be for you to decide.

Synchronizing two tables, best practice

I need to synchronize two tables across databases, whenever either one changes, on either update, delete or insert. The tables are NOT identical.
So far the easiest and best solution I have been able to find is adding SQL triggers.
I have slowly started adding them, and they seem to be working fine. But before I finish, I want to be sure that this is a good idea and, in general, good practice.
If not, what is a better option for this scenario?
Triggers will work, but there are quite a few different options available to consider.
Are all data modifications to these tables done through stored procedures? If so, consider putting the logic in the stored procedures instead of in a trigger.
Do the updates have to be real-time? If not, consider a job that regularly synchronizes the tables instead of a trigger. This probably gets tricky with deletes, though. Not impossible, just tricky.
We had one situation where the tables were very similar, but had slightly different column names or orders. In that case, we created a view to the original table that let the application use the view instead of the second copy of the table. We were also able to use a Synonym one time to point to the original table, but that requires that the table structures be the same.
Generally speaking, a lot of people try to avoid unnecessary triggers as they're just too easy to miss when doing other work in the database. That doesn't make them bad, but can lead to interesting times when trying to troubleshoot problems.
In your scenario, I'd probably briefly explore other options before continuing with the triggers. Just watch out for cascading trigger effects, where your one update results in the second table updating, which passes the update back to the first table, then the second, and so on. You can guard against this a little with nesting levels. Otherwise you run the risk of hitting the maximum recursion level and throwing errors.
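If this happens to be SQL Server (the mention of synonyms and nesting levels suggests it, but the question doesn't say), a minimal sketch of that guard could look like the following; the table and column names are made up:

    CREATE TRIGGER trg_TableA_sync ON TableA
    AFTER INSERT, UPDATE
    AS
    BEGIN
        -- If we were fired by the mirror trigger on the other table,
        -- stop here instead of bouncing the change back again.
        IF TRIGGER_NESTLEVEL() > 1 RETURN;

        UPDATE b
           SET b.SomeColumn = i.SomeColumn
          FROM TableB AS b
          JOIN inserted AS i ON i.Id = b.Id;
    END;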

Database to choose for game

I'm working on a game (something like chess) where every move needs to be saved into a database per move.
Since it's very important that both reads and writes are fast, and I don't need complex queries, I'm probably going to go with a NoSQL solution.
Which one though? I'm thinking about MongoDB but will its durability be an issue?
It's not the end of the world if some game movelists are corrupted, but would be problematic if it occurs too often.
Suggestions?
Just about any database technology would work in your scenario. There are no compelling reasons (that you've given, anyway) that would recommend "NoSQL" style data access over any other technology.
Last time I looked at a chess game, you would need about one insert every five minutes. You could use an LDAP server for that. But of course I do not know your use case.
MongoDB excels at document-like records where you want to extend the information over time. It has a nice ad-hoc query facility and is relatively speedy, which means about as fast as a MySQL database on the same machine.
At first glance your data looks highly structured, so the document flexibility will not add a lot of benefit.
If you have a huge number of records (many millions, say), then it will show the difference.
Now if you want to use MongoDB, fine, you probably won't be disappointed, but I see no compelling case here.
If your database has more than one table, then NoSQL is a non-starter. I can immediately think of two here: Positions and Moves.
The only advantage I can see to MongoDB or CouchDB is that you could conceivably store the entire match in a single record, thus making your data a little simpler. You wouldn't have to do the traditional one-to-many mapping between a moves table and a games table. CouchDB could be interesting because it supports offline replication and a pure REST interface, so all your code could be JavaScript. But you probably don't want your users to interface with the database directly!
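For comparison, the relational version of that one-to-many mapping is tiny; this is just a sketch with invented column names:

    CREATE TABLE games (
        id         INTEGER PRIMARY KEY,
        white      TEXT NOT NULL,
        black      TEXT NOT NULL,
        started_at TEXT NOT NULL
    );

    CREATE TABLE moves (
        game_id INTEGER NOT NULL REFERENCES games(id),
        ply     INTEGER NOT NULL,   -- half-move number within the game
        san     TEXT    NOT NULL,   -- the move itself, e.g. 'Nf3'
        PRIMARY KEY (game_id, ply)
    );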

Propagated delete in code or database?

I'm working on an iPhone application with a few data relationships (Author -> Books for example). When a user deletes an Author object from the application, I have a few SQLite triggers that run on the delete to remove any books from the database that have a foreign key matching the Author's primary key.
I'm also using a trigger to insert some data when a new item is created.
I can't shake the feeling that this might be bad design or lead to some problems down the road that I'm not thinking of. That said, should I rely on code in my app to handle propagating deletes like this, when the database has the capability built in to handle it?
What say you?
True. Use the built-in capabilities of the database as much as possible. At least try to start off like that, and only compromise when things really demand it.
I would make use of the database's features to ensure relational integrity, especially with respect to updates/deletes. There are cases where I might use a trigger to insert some additional data (auditing comes to mind), though I would tend to avoid this and insert all of the data from my application. If you are doing multiple inserts, though, make sure to wrap it all in a single transaction so that you don't end up with a partial insert which could lead to loss of relational integrity.
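For the multiple-insert case, the wrapping itself is small (SQLite syntax; the tables and columns are hypothetical):

    BEGIN TRANSACTION;
    INSERT INTO authors (id, name) VALUES (1, 'New Author');
    INSERT INTO books (author_id, title) VALUES (1, 'First Book');
    COMMIT;  -- both rows are committed together, or (on error/rollback) neither is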
I like the idea of using the database's built-in functionality (I am not familiar with how it works), but I would worry that if I went back to the code a year from now, I wouldn't remember how it worked (given the code isn't right in front of me).
I imagine if you add a lot of comments to remind yourself about how it works now, if anything goes wrong in the future, at least you won't need to relearn the database features when you need to go do some debugging.
You're a few steps ahead of me: I recently learned about how to do that stuff with triggers and I am tempted to use them myself.
Based on the other answers here, it seems like a philosophical choice. It would probably be fine to use either triggers or code, but best to be consistent. So don't use triggers for cascading deletes on one table but then C code for another table.
Since you tagged the question iphone, I think the most important difference would be relative performance of C code versus a trigger. You'd probably have to code both and experiment to determine the difference, if any.
Another thing that comes to mind is that, of all the horror stories that I read on thedailywtf.com, about half of them seem to involve database triggers.
Unfortunately, SQLite does NOT support ON DELETE CASCADE and the like. From the SQLite documentation:
http://www.sqlite.org/omitted.html
FOREIGN KEY constraints are parsed but are not enforced. However, the equivalent constraint enforcement can be achieved using triggers. The SQLite source tree contains source code and documentation for a C program that will read an SQLite database, analyze the foreign key constraints, and generate appropriate triggers automatically.
There is some support for triggers but it is not complete. Missing subfeatures include FOR EACH STATEMENT triggers (currently all triggers must be FOR EACH ROW), INSTEAD OF triggers on tables (currently INSTEAD OF triggers are only allowed on views), and recursive triggers - triggers that trigger themselves.
Therefore, the only way to get ON DELETE CASCADE behavior and the like in SQLite is with triggers.
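To make that concrete for the Author -> Books case in the question, here is a hedged sketch of a trigger that emulates ON DELETE CASCADE (the table and column names are guesses):

    CREATE TRIGGER delete_author_books
    AFTER DELETE ON authors
    FOR EACH ROW
    BEGIN
        -- Remove every book that pointed at the author that was just deleted.
        DELETE FROM books WHERE author_id = OLD.id;
    END;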
Code goes in your app.
Triggers are code. The functionality goes in your app. Not in the database.
I think that databases should be used for data, not processing. I think apps should be used for processing, not data.
Database processing features merely muddy the water.