Event logging for auditing with replay - postgresql

I need to implement an auditing log for GDPR compliance so that we have a record of every consent given or revoked (an event) per user of our system. It has to store how & when it happened alongside things like what the wording of the consent actually was at the time.
So that we can recover from a backup restore, this log will be stored separately from our main DB. We will then need to be able to update the state of the user consent so that it accurately reflects the event log (i.e. the last known value (true/false) of each consent question per user)
I could simply do this using a second postgres instance (our main DB is postgres) with a single table to store the information and then some simple application code to log each event as well as update the main DB. There could also be some simple application logic to find the last known states of each consent from the event log and update the master DB.
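To make that concrete, the replay logic I have in mind is roughly this (a minimal Python sketch; the event fields are placeholders and nothing about the schema is decided):

    # Minimal sketch: fold the event log into the last known value per
    # (user, consent question). Field names are placeholders.
    from datetime import datetime, timezone

    events = [
        {"user_id": 1, "question": "marketing_emails", "granted": True,
         "wording": "May we email you about offers?",
         "occurred_at": datetime(2018, 1, 5, tzinfo=timezone.utc)},
        {"user_id": 1, "question": "marketing_emails", "granted": False,
         "wording": "May we email you about offers?",
         "occurred_at": datetime(2018, 3, 9, tzinfo=timezone.utc)},
    ]

    def current_consent_state(events):
        """Replay events oldest-first; the last write per key wins."""
        state = {}
        for event in sorted(events, key=lambda e: e["occurred_at"]):
            state[(event["user_id"], event["question"])] = event["granted"]
        return state

    # {(1, 'marketing_emails'): False} -- this is what gets written back to the main DB
    print(current_consent_state(events))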
Using postgres just to store this info seems like a bit of overkill to me, though adding a new technology to store it also seems like overkill. Are there any technologies that are more suitable for this sort of thing? It sounds a lot like Event Sourcing to me.

If you're already running postgres, it doesn't seem like overkill, given that it needs to be online and queryable. Something like kafka is often a natural fit for this kind of problem, but that's even more overkill.
This bears a passing resemblance to event sourcing, but on a really small scale. Event sourcing usually means that all your data is expressed in terms of events, and replayed from beginning to end to materialize the current state.
Could you elaborate on this?:
So that we can recover from a backup restore, this log will be stored separately from our main DB.
Doesn't your main database recover from a backup / restore?

Related

Read Directly from the Event Store Or Implement a Copy of the Events in the Read Side

I'm working on a project where I have implemented CQRS without event sourcing. I am using Mongo for both the read database and the write database. When something has to be changed, it is first changed in the write db and then the read db is synchronized.
Later I introduced something like an Event Store, also a MongoDB instance. I keep a history of all the events that changed the other databases in some way. The read db does not synchronize with the Event Store, so I have no way to read the events.
I've reached a situation where I need the information that's inside the events in the Event Store. Should I connect directly to the Event Store and read from there, or should I make the read db synchronize with the Event Store and basically hold a duplicate of the Event Store?
Thank you in advance guys! I am using C# .NET Core if someone needs this kind of info.
There's no functional reason why you shouldn't just read events directly from the event store.
The motivation for read models is usually low latency queries; we take the representation of data in the book of record (the event streams), and reshape it into a data structure that can answer queries. We accept the consequences of eventual consistency to get fast response times.
But if the shape we need for a query is an event stream, then we can just use the source data.
If the queries of the event store are having a negative performance impact on the writes to the store, then we might redesign the system to direct queries to a cached copy of the events instead.
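To make the distinction concrete, a toy sketch (Python; the event names and shapes are invented, not tied to your Mongo setup):

    # Toy illustration: the event store is just a list of events.
    event_store = [
        {"stream": "order-42", "type": "OrderPlaced", "total": 100},
        {"stream": "order-42", "type": "OrderShipped"},
    ]

    # If the shape a query needs *is* an event stream, read the source directly.
    def events_for_stream(stream_id):
        return [e for e in event_store if e["stream"] == stream_id]

    # Only if those reads start hurting write performance would you project the
    # events into a cached copy (a read model) and point the queries at that.
    read_model = {}
    for e in event_store:
        read_model.setdefault(e["stream"], []).append(e)

    assert events_for_stream("order-42") == read_model["order-42"]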

MongoDB - Safeguard against .remove() entire database?

I'm using MongoDB as my database, and as a first-time back-end developer the ease with which I can delete an entire database/collection really bothers me.
Simply typing db.collection.remove() removes all records from that collection!
I know that an effective backup strategy should render this a non-issue, but I occasionally do run .remove() on some collections, and I'd hate to type in the wrong collection name by accident and (a) have to go through a backup restore, and (b) lose whatever data I had gathered between the backup and the restore, especially as my app gathers a lot of user data.
Is there any 'safeguard' I can set up my database to use, even if it's just a warning/confirmation that says
"Yo, are you sure you want to remove everything from <collectionname>? Choose: Yes/No"
User roles won't fix your problem. If your account has permissions to delete one user, you could accidentally delete them all. If your account has permissions to update an attribute for one user, you could accidentally update all of your users.
There's a simple fix for this however.
Step 0: Back up your database. And test your backups regularly. And make sure you get alerted if the backup did not run, or errored. Replica sets are not backups. I know this is obvious, but evidently it's not obvious to everybody.
Step 1: Write a web admin GUI for your database. It will only take a day or two -- and it should be simple enough that a secretary or intern could use it without fear for your data. (If you think this will take a long time, find a framework with more bells and whistles. Your admin console doesn't even need to be written in the same language as your app.)
Step 2: Data migrations (maintenance transformations of your database) should always be run from scripts checked into source control and tested on non-prod beforehand. The script could be as simple as mongo --eval "db.foo.update(blah)", but you should run it as a script to avoid copy-and-paste errors. Ideally, you would even have a checklist for all migrations. (Check that you have a recent backup. Check the database log and system load beforehand. Write a before and after query that will tell you if the migration was successful...) A rough sketch of such a migration script follows the tl;dr below.
Step 3: You now no longer need to use the production Mongo console. So don't. It's a useful tool for development, but that's only needed on local development databases.
The above-mentioned Roles might be useful for read-only queries. But you can already do that against the non-master replica set member.
tl;dr: You can go pretty far using cowboy admin techniques, but eventually you're going to figure out that it's better (and not much more work) to automate everything.
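For illustration, a Step 2-style migration script might look roughly like this (Python with a reasonably recent pymongo; the connection string, collection, and field names are invented):

    # Hypothetical migration, kept in source control and tested on non-prod first.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["myapp"]

    # Before-query: know what you expect to touch.
    before = db.users.count_documents({"plan": "legacy"})
    print(f"about to migrate {before} users")

    # The migration itself: a scoped update, never a bare remove()/drop().
    result = db.users.update_many({"plan": "legacy"}, {"$set": {"plan": "basic"}})

    # After-query: verify it did what you expected.
    after = db.users.count_documents({"plan": "legacy"})
    print(f"modified {result.modified_count}; {after} legacy users remaining")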
There is nothing you can do in the current version to provide this functionality.
In a future version, when user-defined roles are available, you could define a role which allows insert() and update() but not remove() or drop(), etc., and thereby make yourself log in as a different, higher-privileged user -- but that's not available in the current (2.4) version.

New/Read Flags in CQRS

I am currently drafting a concept for a (mostly) HTML-based collaboration suite which I plan to implement using CQRS. This software will contain messages that can be sent to the user (which can either be read or unread, obviously) and other elements which shall be marked "new" if they were created after the last user login.
Hardly something new, but I am not quite sure how that would be correctly implemented using CQRS. As I understand it, Change of any kind should, without exception, only be possible via Commands. But creating commands for every single (new) element that is being accessed seems a bit too much, not to mention the overhead.
I don't know if I need it, but what would be the best way to implement a Last-Accessed Timestamp on elements? Basically the same problem as the above, with the difference that the change happens EVERY time the element is accessed, not only the first time for each user.
CQRS seems to be an awesome concept but it really needs more learning material. Can't wait till a book is released :)
Regards
[Edit] No one? Wouldn't have thought that this is such a complicated issue..
I assume you're using event sourcing, in which case, once you allow your query-service/event-handlers to raise appropriate events, this becomes fairly easy to solve.
For your messages/elements: when handling the specific creation events of your elements, either add to your existing event-handlers or create additional ones that store to a messages read-model with a status of "new" and the appropriate information about the element.
As part of your user login, I don't see why you can't raise a user-logged-in event (from the security/query service, depending on how you're implementing authentication) to say the user has logged in. An event-handler could capture this and write the last-login timestamp to a specific user-last-login read-model.
In addition, the user-logged-in event-handler would need to update all the "new" messages (for that user) to an "unread" status. Seeing as we're changing the status of the messages as the user logs in, do you still need to store the last-login timestamp?
For your last-accessed timestamp, perhaps you could just work this into your query service as queries for your different elements complete. Raise a query-completed event with element id/type information.
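A very rough sketch of the message-created and user-logged-in handlers described above (Python, with in-memory dictionaries standing in for the read-models and invented event shapes):

    messages_read_model = {}   # message_id -> {"user_id": ..., "body": ..., "status": ...}
    user_last_login = {}       # user_id -> timestamp of last login

    def on_message_created(event):
        # Creation-event handler: store the element with a status of "new".
        messages_read_model[event["message_id"]] = {
            "user_id": event["recipient_id"],
            "body": event["body"],
            "status": "new",
        }

    def on_user_logged_in(event):
        # Write the last-login timestamp to its own read-model...
        user_last_login[event["user_id"]] = event["logged_in_at"]
        # ...and flip that user's "new" messages to "unread".
        for message in messages_read_model.values():
            if message["user_id"] == event["user_id"] and message["status"] == "new":
                message["status"] = "unread"

    on_message_created({"message_id": 1, "recipient_id": 7, "body": "hi"})
    on_user_logged_in({"user_id": 7, "logged_in_at": "2012-05-01T09:00:00Z"})
    print(messages_read_model[1]["status"])   # "unread"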

Core Data with Web Services recommended pattern?

I am writing an app for iOS that uses data provided by a web service. I am using core data for local storage and persistence of the data, so that some core set of the data is available to the user if the web is not reachable.
In building this app, I've been reading lots of posts about core data. While there seems to be lots out there on the mechanics of doing this, I've seen less on the general principles/patterns for this.
I am wondering if there are some good references out there for a recommended interaction model.
For example, the user will be able to create new objects in the app. Let's say the user creates a new employee object: the user will typically create it, update it, and then save it. I've seen recommendations that send each of these steps to the server --> when the user creates it, when the user makes changes to the fields, and, if the user cancels at the end, a delete is sent to the server. Another recommendation for the same operation is to keep everything local, and only send the complete update to the server when the user saves.
This example aside, I am curious if there are some general recommendations/patterns on how to handle CRUD operations and ensure they are sync'd between the webserver and coredata.
Thanks much.
I think the best approach in the case you mention is to store data only locally until the point the user commits the adding of the new record. Sending every field edit to the server is somewhat excessive.
A general idiom of iPhone apps is that there isn't such a thing as "Save". The user generally will expect things to be committed at some sensible point, but it isn't presented to the user as saving per se.
So, for example, imagine you have a UI that lets the user edit some sort of record that will be saved to local Core Data and also be sent to the server. At the point the user exits the UI for creating a new record, they will perhaps hit a button called "Done" (N.B. not usually called "Save"). At the point they hit "Done", you'll want to kick off a Core Data write and also start a push to the remote server. The server push won't necessarily hog the UI or make them wait till it completes -- it's nicer to allow them to continue using the app -- but it is happening. If the update push to the server fails, you might want to signal it to the user or do something appropriate.
A good question to ask yourself when planning the granularity of writes to core data and/or a remote server is: what would happen if the app crashed out, or the phone ran out of power, at any particular spots in the app? How much loss of data could possibly occur? Good apps lower the risk of data loss and can re-launch in a very similar state to what they were previously in after being exited for whatever reason.
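Platform aside, the flow described above is roughly this (a Python sketch; save_locally and push_to_server are stand-ins for the Core Data write and the network call):

    import threading

    def save_locally(record):
        print("committed locally:", record)      # e.g. the managed object context save

    def push_to_server(record):
        print("pushed to server:", record)       # e.g. an HTTP POST; may raise

    def on_done_tapped(record):
        save_locally(record)                     # commit immediately; don't wait on the network

        def push():
            try:
                push_to_server(record)
            except Exception:
                # Don't block the UI; flag the failure and retry on the next sync.
                print("push failed, will retry later")

        threading.Thread(target=push, daemon=True).start()

    on_done_tapped({"name": "New employee"})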
Be prepared to tear your hair out quite a bit. I've been working on this, and the problem is that the Core Data samples are quite simple. The minute you move to a complex model and you try to use the NSFetchedResultsController and its delegate, you bump into all sorts of problems with using multiple contexts.
I use one context to populate data from your web service in a background "block", and a second for the table view to use - you'll most likely end up using a table view for a master list and a detail view.
Brush up on using blocks in Cocoa if you want to keep your app responsive whilst receiving or sending data to/from a server.
You might want to read about 'transactions' - which is basically the grouping of multiple actions/changes as a single atomic action/change. This helps avoid partial saves that might result in inconsistent data on server.
Ultimately, this is a very big topic - especially if server data is shared across multiple clients. At the simplest, you would want to decide on basic policies. Does last save win? Is there some notion of remotely held locks on objects in server data store? How is conflict resolved, when two clients are, say, editing the same property of the same object?
With respect to how things are done on the iPhone, I would agree with occulus that "Done" provides a natural point for persisting changes to server (in a separate thread).

Best strategy for synching data in iPhone app

I am working on a regular iPhone app which pulls data from a server (XML, JSON, etc.), and I'm wondering what the best way is to implement data syncing. Criteria are speed (less network data exchange), robustness (data recovery in case an update fails), offline access, and flexibility (adaptable when the structure of the database changes slightly, like a new column). I know it varies from app to app, but can you guys share some of your strategy/experience?
For me, I'm thinking of something like this:
1) Store Last Modified Date in iPhone
2) Upon launching, send a message like getNewData.php?lastModifiedDate=...
3) Server will process and send back only modified data from last time.
4) This data is formatted as so:
<+><data id="..."></data></+> // add this to SQLite/CoreData
<-><data id="..."></data></-> // remove this
<%><data id="..."><attribute>newValue</attribute></data></%> // new modified value
I don't want to make <+>, <->, <%>... for each attribute as well, because it would be too complicated, so probably when I receive a <%> field, I would just remove the data with the specified id and then add it again (assuming id here is not an auto-incremented field).
5) Once everything is downloaded and updated, I will update the Last Modified Date field.
The main problem with this strategy: if the network goes down while I am updating something => the Last Modified Date is not yet updated => next time I relaunch the app, I will have to go through the same thing again. Not to mention potentially inconsistent data. If I use a temporary table for the update and make the whole thing atomic, it would work, but then again, if the update is too long (lots of data changes), the user has to wait a long time until the new data is available. Should I use a Last-Modified-Date for each data field and update the data gradually?
I would start by making the update routine atomic, since you'll have enough on your hands figuring out how to get the client-server communication working properly.
After that is a good time to consider tweaking it to be incremental, but only after you do some testing to figure out if it's really necessary. If you're tuning your update protocol to be as low bandwidth as possible, you might discover that even a "big" update is downloaded fast enough.
Another way to look at it is to ask yourself, how often is there going to be network trouble when an average user is doing a sync? You probably don't want to tune for unlikely scenarios.
If you are trying to optimize (minimize) the data transfer you may want to consider a different format than XML, since XML is fairly verbose. Or at least you may want to trade in XML readability for space by making each element name and attribute as small as possible, and eliminate all unnecessary whitespace.
Your basic scheme is good. The thing you need to do is to somehow make your updates idempotent so that you can restart a partially-completed transfer without risk. This is a better way to go than to try to implement some sort of true atomic commit (though you could do that too, using, eg, the SQLite database).
In our experience fairly large updates (10s of KB) can be downloaded quite rapidly, if the server is fast enough. No great need to break updates up into tiny bits. But certainly it won't hurt to try to minimize the amount of data transferred by keeping more granular info on "last update".
(And definitely you should use JSON rather than XML as your transmitted data representation.)
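To make the idempotent-apply idea concrete, here is a rough sketch (Python/sqlite3; the delta layout loosely mirrors the <+>/<->/<%> format above and is only an assumption):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE meta (key TEXT PRIMARY KEY, value TEXT)")

    # Delta as it might arrive from the server: adds/changes and removals by id.
    delta = {
        "upserts": [(1, "alpha"), (2, "beta")],   # covers both <+> and <%>
        "removes": [(3,)],                        # <->
        "last_modified": "2011-06-01T12:00:00Z",
    }

    with conn:  # one transaction: either the whole delta applies, or none of it
        # INSERT OR REPLACE keyed on id makes re-running the same delta harmless.
        conn.executemany("INSERT OR REPLACE INTO items (id, value) VALUES (?, ?)",
                         delta["upserts"])
        # Deleting a row that is already gone is also harmless.
        conn.executemany("DELETE FROM items WHERE id = ?", delta["removes"])
        # Only bump the Last Modified Date once everything above has succeeded.
        conn.execute("INSERT OR REPLACE INTO meta (key, value) VALUES ('last_modified', ?)",
                     (delta["last_modified"],))

    print(conn.execute("SELECT * FROM items").fetchall())  # [(1, 'alpha'), (2, 'beta')]

Because re-applying the same delta produces the same end state, a partially completed transfer can simply be restarted from the same Last Modified Date.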
Wonder if you have considered using a sync framework to manage the synchronization. If that interests you, take a look at the open source project OpenMobster's Sync service. You can do the following sync operations:
two-way
one-way client
one-way device
bootup
Besides that, all modifications are automatically tracked and synced with the cloud. You can have your app offline when the network connection is down. It will track any changes and automatically synchronize them with the cloud in the background when the connection returns. It also provides iCloud-like synchronization across multiple devices.
Also, modifications in the Cloud are synched using Push notifications, so the data is always current even if it is stored locally.
In your case,
Criteria are speed (less network data exchange), robustness (data recovery in case update fails), offline access
Speed: Only the changes are sent across the network in both directions
Robustness: It stores data in a transactional store like sqlite and any failed updates are communicated in the SyncML payload. Only the successful operations are processed while the failed operations are re-tried during the next sync
Here is a link to the open source project: http://openmobster.googlecode.com
Here is a link to iPhone App Sync: http://code.google.com/p/openmobster/wiki/iPhoneSyncApp