How to evolve akka-persistence events in production? (Scala)

Let's say we have designed our system using akka-persistence, and events are already stored in the event store. While the system was in production, a new feature was requested, and we find the best way to implement it is to add or modify a field in an event, say by changing the field's name or type.
Now we have two versions of the event: the one already persisted in production and the one in the new deployment, and they are not compatible. Recovery will fail if we try to replay events stored in the old format.
What is the best way to go about that, other than data migration?

This is definitely one of the bigger issues with using akka persistence in production. There has been a lot of discussion about this on the akka-user mailing list.
I would say that as long as the new feature requires only additional information, going with a serialization format that allows limited schema evolution, such as Google Protocol Buffers or JSON, would be a solution. You can then upcast old events to the new shape during recovery, as sketched below.
If the new feature requires changing the existing data, there is nothing you can do but data migration.
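One way to do that upcasting at recovery time is akka-persistence's EventAdapter hook. A minimal sketch, with hypothetical ItemAddedV1/ItemAdded event classes and an assumed default for the new field:

```scala
import akka.persistence.journal.{ EventAdapter, EventSeq }

// Hypothetical event versions: the old event already in the journal,
// and the new one with an added field.
case class ItemAddedV1(itemId: String)              // old, persisted in production
case class ItemAdded(itemId: String, quantity: Int) // new shape

class ItemEventAdapter extends EventAdapter {
  override def manifest(event: Any): String = event.getClass.getSimpleName

  // Always write the newest version to the journal unchanged.
  override def toJournal(event: Any): Any = event

  // On recovery, upcast old events to the current version with a default.
  override def fromJournal(event: Any, manifest: String): EventSeq = event match {
    case ItemAddedV1(id) => EventSeq.single(ItemAdded(id, quantity = 1))
    case other           => EventSeq.single(other)
  }
}
```

The adapter is then bound to the event classes under your journal plugin's event-adapters and event-adapter-bindings configuration sections, so the journal applies it transparently on both write and replay.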

Related

Distributed database which allows custom CRDT merging

I'm rather new to distributed databases, though I have already studied the related literature (e.g. the CAP theorem, CRDTs) and implemented some POCs to allow scaling my application horizontally.
Now, however, I face a challenging problem. In order to scale the app horizontally, communication between services is done via a distributed queue. As background: I require a custom CRDT merge method to keep the data eventually consistent, and I require my application to work like a cache (loosely comparable to Redis).
The challenge is that I also need to persist the data, which requires me to keep the data in the application cache and the database eventually consistent. I've checked Cassandra, and I saw a ticket [1] where somebody tried to add support for custom CRDT merge functions (which, as mentioned, I require for a reason). That never made it into Cassandra, and it seems to have a few unresolved issues.
What are my options, either in the form of a concrete distributed database engine that allows custom merging, or an algorithm that could help solve the problem (e.g. in the form of a DB trigger or something similar)?
[1] https://issues.apache.org/jira/browse/CASSANDRA-6412
As far as I know, there are very few databases that allow you to specify your own custom conflict-resolution algorithm. To be honest, the only one I really found (disclaimer: I'm not a Microsoft advocate) is Azure Cosmos DB. It has a MongoDB-compatible API and can be configured with a master-master replication strategy in which you specify your own conflict-resolution algorithm (in JavaScript). You can use that to define your own merge operation.
If you look beyond database-native solutions to application-level ones, there are several tools, like Akka (available in both JVM and .NET versions), that let you write custom CRDTs on top of the Distributed Data module; a sketch follows below. The JVM version additionally supports multi-datacenter persistence, which is conceptually closer to how commutative CRDTs work and can be integrated with a Cassandra backend.
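All the module requires of a custom type is a commutative, associative, idempotent merge; MaxRegister here is a made-up example, but your own CRDT would plug in the same way:

```scala
import akka.cluster.ddata.ReplicatedData

// Minimal custom CRDT for Akka Distributed Data: a register that converges
// to the maximum observed value. Any type works as long as merge is
// commutative, associative and idempotent.
final case class MaxRegister(value: Long) extends ReplicatedData {
  type T = MaxRegister

  override def merge(that: MaxRegister): MaxRegister =
    if (that.value > value) that else this
}
```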
I've implemented a MerkleClock CRDT at my merkle-crdt repository.
You could take the approach that, when you update a database record's column, you first fetch the column's stored value, merge it with the CRDT representing your current state, and then, on save, serialise the merged CRDT as JSON and store it back in the database. A sketch of that read-merge-write cycle follows.
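Here Crdt, Database and the JSON codec functions are hypothetical placeholders for your own CRDT type, codec and database driver:

```scala
// Hypothetical minimal interfaces for the sketch.
trait Crdt[A] { def merge(that: A): A }
trait Database {
  def read(key: String): Option[String]
  def write(key: String, json: String): Unit
}

def saveState[A <: Crdt[A]](db: Database, key: String, current: A,
                            fromJson: String => A, toJson: A => String): Unit = {
  // 1. Fetch the stored value (if any) for this key.
  val stored: Option[A] = db.read(key).map(fromJson)
  // 2. Merge it into the current in-memory state; because merge is
  //    idempotent, replays and races converge to the same result.
  val merged = stored.fold(current)(current.merge)
  // 3. Serialise the merged CRDT back to JSON and store it.
  db.write(key, toJson(merged))
}
```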

Keeping track of changed properties in JPA

Currently, I'm working on a Java EE project with some non-trivial requirements regarding persistence management. Changes to entities by users first need to be applied to some working copy before being validated, after which they are applied to the "live data". Any changes on that live data also need to have some record of them, to allow auditing.
The entities are managed via JPA, and Hibernate will be used as provider. That is a given, so we don't shy away from Hibernate-specific stuff. For the first requirement, two persistence units are used. One maps the entities to the "live data" tables, the other to the "working copy" tables. For the second requirement, we're going to use Hibernate Envers, a good fit for our use-case.
So far so good. Now, when users view the data on the (web-based) front-end, it would be very useful to be able to indicate which fields were changed in the working copy compared to the live data. A different colour would suffice. For this, we need some way of knowing which properties were altered. My question is, what would be a good way to go about this?
Using the JavaBeans API, a PropertyChangeListener could suffice to be notified of any changes in an entity of the working copy and keep a set of them. But the set would also need to be persisted, since the application could be restarted and changes can be long-lived before they're validated and applied to the live data. And applying the changes on the live data to obtain the working copy every time it is needed isn't feasible (hence the two persistence units).
We could also compare the working copy to the live data and find fields that are different. Some introspection and reflection code would suffice, but again that seems rather processing-intensive, not to mention the live data would need to be fetched.
Maybe I'm missing something simple, or someone knows of a wonderful JPA/Hibernate feature I can use. Even if I can't avoid creating (a) separate database table(s) for storing such information until it is applied to the live data, some best practices or real-life experience with this scenario would be very useful.
I realize it's a semi-open question but surely other people must have encountered a requirement like this. Any good suggestion is appreciated, and any pointer to a ready-made solution would be a good candidate as accepted answer.
Maybe you can use Hibernate's flush-entity event listener. The dirty properties are calculated before the flush, and you can store them somewhere in your database.
Below is a sketch of using Hibernate's dirty-property detection, which may give you an idea.
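This assumes Hibernate's event SPI; the listener still has to be registered (e.g. through an Integrator / EventListenerRegistry), and recordChange is a hypothetical audit sink:

```scala
import org.hibernate.event.internal.DefaultFlushEntityEventListener
import org.hibernate.event.spi.FlushEntityEvent

// Hooks into Hibernate's dirty checking at flush time and records which
// properties changed, so they can be persisted for the front end to colour.
class DirtyPropertyAuditListener extends DefaultFlushEntityEventListener {

  override def onFlushEntity(event: FlushEntityEvent): Unit = {
    super.onFlushEntity(event) // run the default dirty check first
    val dirty = event.getDirtyProperties // indices of changed properties, or null
    if (dirty != null && dirty.nonEmpty) {
      val names = event.getEntityEntry.getPersister.getPropertyNames
      dirty.foreach(i => recordChange(event.getEntity, names(i)))
    }
  }

  private def recordChange(entity: AnyRef, property: String): Unit = {
    // Hypothetical sink: write (entity, property) to your change-tracking table.
    println(s"dirty: ${entity.getClass.getSimpleName}.$property")
  }
}
```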

Is Entity Framework also a persistence layer?

OK, I know that Entity Framework is an ORM. We use it for mapping data from the database to the object model, and from objects to relational data. But where does it fit in the context of a persistence layer? Can we say that Entity Framework is also the persistence layer?
I would say: no! There are a lot of articles on this topic, but in general you don't want your object model to be tied to how it is persisted. In fact, exactly the opposite: by keeping it persistence ignorant you can benefit from using your data classes with different types of data providers, such as relational databases, web services, XML files, and so on.
To handle persistence you can take advantage of design patterns like the Repository pattern and Unit of Work, so you can really decouple your business layer from your data layer.
OK, to make myself clear, since it's very difficult through comments, here's an update to what I wanted to explain. Please keep in mind that this is just my interpretation and the way I use EF. I've used it this way in different projects (desktop and web); it's not universal, but it still covers a lot of the most common scenarios.
Since I'm a big fan of Code First, I'll write from that perspective. The database model is where your entities live; later on, based on those entities, EF will generate your database. What's important at this stage of development is to have your database normalized and all navigation properties set correctly. These tasks are not as trivial as they may seem, but that's it: here you only care about how efficient your database will be.
Now comes the tricky moment: somehow you have to deliver your data to the business layer. It's true that, as long as we are talking only about data from a database, using a repository is very arguable. However, even then, one advantage you get from having this repository between the data and the business logic is that you don't have to take the business needs into account while creating the data model, and doing so doesn't make it any harder to use your data from inside the business layer, even though you don't yet know exactly what your front end will look like at the time you create the database model.
At this point, consider again the example where your database model has two entities: Customers and Orders. When a user logs in to your application and wants to see his orders, you need to join two tables to provide the front end with the information it needs.
Option 1: you don't have a repository and you use the DbContext directly from the method that returns the data. That means two things. First, you will have to write the same code everywhere you need this specific piece of information. Second, if the business requirements change, and the view that until now showed a customer and his orders must now show some additional information taken from, say, a third table, you have to go to every place where you use this view and change the way you retrieve the data.
Option 2: you have a repository where all your data-access methods live, the business layer is completely ignorant of how the data is fetched, and the database model is equally ignorant of the needs of the business model, which leads to loose coupling and a single place to make changes. In the scenario above, if your repository has a method called GetUserOrders() that makes the database call, performs the joins and so on, all the business layer has to do is call that method. When the requirements change and you have to include one more table, you don't have to hunt down every place where this data is used; you modify one method and that's all. A language-agnostic sketch of this follows.
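Since the question is about EF, take this only as a sketch of the shape of such a repository (written in Scala here, with an in-memory data source standing in for the DbContext-backed implementation); Customer, Order and the view type are made-up placeholders:

```scala
case class Customer(id: Long, name: String)
case class Order(id: Long, customerId: Long, total: BigDecimal)
case class CustomerOrdersView(customerName: String, orders: Seq[Order])

trait OrderRepository {
  // The single place that knows how the join is performed.
  def getUserOrders(customerId: Long): CustomerOrdersView
}

class InMemoryOrderRepository(customers: Seq[Customer], orders: Seq[Order])
    extends OrderRepository {
  // With EF this body would be a LINQ query over the DbContext; if the view
  // later needs a third table, only this method changes.
  override def getUserOrders(customerId: Long): CustomerOrdersView = {
    val customer = customers.find(_.id == customerId).get // assumes the customer exists
    CustomerOrdersView(customer.name, orders.filter(_.customerId == customerId))
  }
}
```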
It's pretty much the same logic on the way back. When some complex data comes back from your front end and you want to save it or update the old data with the new, you could again do it from the business layer, but that leads to the same problem as when fetching data. Instead, you pass the complex data to another repository method that knows how to deal with it (perhaps some of the data should be saved directly to the database while other parts feed a web service, or whatever scenario comes to mind). And here again: when something changes, say you want to rely more heavily on web services, or the opposite, you want to migrate to a more database-centric design, all you have to do is change the method that deals with the affected data, and nothing more.
So even though, as I write this, I can see that the DbContext can very well act as a repository, and in that regard also as a persistence layer, there are still some valid reasons not to let that happen. Especially right now, when web services are more and more popular, Web API 2 is out, and RESTful services are frequently used, I think keeping EF as persistence ignorant as possible is the way to go.
But again, this is my opinion. There are a lot of articles on this topic, so I urge you to google and read about it, since I think this is a very important part of the architecture of every application.
P.S.
In response to your comment, which was written while I was editing my answer:
"If I change the data source I need to make changes in the DAL anyway, or in my example in the repository." The answer is yes; there is no way to change the data source without changing the DAL. The question is how easy that will be. I think with what I've written already you can decide for yourself which way is better, but because I really think this is one of the few really strong arguments for keeping EF persistence ignorant, I'll write it again: when you have a repository with methods that take care of data manipulation, anything related to the way the data is fetched affects only those methods and nothing else. If you use the context freely in your business layer, even a little change may cause you a lot of trouble, simply because it's always possible to miss something; you have to go through the entire codebase to make sure you've fixed every place, and that's just not as efficient as having it all in one place.

iPhone app with CoreData

I am planning to create an iPhone app which uses Core Data. Enhancements might be added later as new versions of the app.
My question is:
When using Core Data, what are the factors to keep in mind to ensure that if the user upgrades the version, his previous data remains intact? For example, I heard we should keep the .sqlite file name the same. What other factors should be kept in mind when releasing Core Data apps?
Thank you.
Data migration concepts are important to understand if you're going to maintain the app over time, since you're likely to want to change at least some things eventually.
The ideal is lightweight migration, where minor conversions from your old data model to your new one are automatic. As noted in Apple's documentation, it can take care of itself if your changes are:
Simple addition of a new attribute
A non-optional attribute becoming optional
An optional attribute becoming non-optional, and defining a default value
Renaming an entity or an attribute is also easy and nearly automatic.
Everything beyond that -- new or removed entities, new or removed or changed relationships -- is hairier. It's not incredibly difficult, but it's definitely more work, with more room for failure.
As such, a little speculation about likely future changes may make it easier and more efficient to provide a little wiggle room in advance. Obviously, if you do too much, especially with theoretical-but-currently-unused relationships, you risk slowing down the current system for no reason.
Worth consideration.
One thing we have done is to manage two separate core data databases.
First, a "read-only" core data database that gets supplied with app updates (assuming you want to be sending data with the app, if not then don't bother with this part).
Second, a local core data database (data store) that's stored on the phone that is initially populated with the data from the first, and then added to by the user or with updates from a server that you control. This second core data store can stay persistent between updates.
For later modifications and updates you have two options. You can add new features in a separate Core Data store, as long as you don't need to access the new data at the same time as the old data. The other option is Apple's Core Data migration support, which you can read more about in Apple's migration documentation.
There are also plenty of more specific Core Data examples on Stack Overflow for gearing up with Core Data.
Finally, if you plan on significantly adding to or modifying your Core Data store, I'd suggest looking into SQLite directly. In my experience it's easier to change across updates than migrating an existing Core Data store to a new schema, especially if the schema changes often.

How do we share data between two different services

I am currently working on a web service which is periodically polled. It does not store its state and is instantiated every time it is queried. Essentially, it retrieves the state of other external entities (e.g. databases) and delivers it back to the requester.
Recently, the need to store state has arisen, in that:
There is the need to continuously collect data from a particular source and store the bits that are important/relevant
There is the need to collect the aggregate of a particular data source over a period of time
The idea I came up with was to use a static class (essentially a global) to share data between the two services.
My main concern here is exactly that static class. Is there a better way of doing this?
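The original snippet isn't shown, but the pattern under discussion (a mutable global that both services read and write) might look something like this sketch, with a Scala object standing in for a .NET static class:

```scala
import scala.collection.concurrent.TrieMap

// Mutable global shared by the polling service and the collector service.
// This is the part the answers below suggest replacing.
object SharedState {
  private val data = TrieMap.empty[String, String]

  def put(key: String, value: String): Unit = data.put(key, value)
  def get(key: String): Option[String]      = data.get(key)
}
```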
Edit: Thanks for the responses thus far. Apologies for the vagueness of this question: I am just trying to work out the best way to share data across different services and am unsure as to the specifics (i.e. what is required). The platform I am developing on is the .NET Framework, and both services are simply WCF services hosted in a Windows service.
The database route sounds like the most conventional way to go. However, I am reluctant to go down that path for now, mainly for deployment/setup reasons: it introduces the need to create new tables and so on, in addition to simply installing the software, and at this point only relatively small amounts of data are transferred. This may of course change in the future, and going the database route might be the way to go at that point.
Is there any other way besides adding a database persistence layer?
If you need to collect and aggregate data, you might want to consider using a database between the two layers. Or have I misunderstood something?
You should consider enhancing your question with more requirements: pretty much all options are open here.
Sure, how about data binding? I don't have a lot of information to go on here about your platform, but most sufficiently advanced systems offer it in some form.
You could replace your static shared data with a database representation, with a caching layer (like memcached) between the database and the web service, so that most of the time the data is available very quickly from the cache, but can be retrieved from the database as needed. A sketch of this cache-aside lookup follows.
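Here, Cache and Database are hypothetical stand-ins for a memcached client and your database driver:

```scala
// Minimal interfaces for the sketch; swap in a real memcached client and DB driver.
trait Cache    { def get(key: String): Option[String]; def put(key: String, v: String): Unit }
trait Database { def query(key: String): Option[String] }

def lookup(cache: Cache, db: Database, key: String): Option[String] =
  cache.get(key).orElse {
    // Cache miss: fall back to the database and repopulate the cache.
    val fromDb = db.query(key)
    fromDb.foreach(cache.put(key, _))
    fromDb
  }
```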
I appreciate that you want to keep the architecture simple. Depending on the number of items you have to look up and their permanence, you might just consider leveraging your file system or a message queue. It sounds like you want a file system, because that sounds like the least impact on your design.
If you start dealing with tens of thousands of small files, your directories can get hard to navigate and slow for file lookups. I typically shoot for about 1,000 to 10,000 files per directory and concoct a routine that generates a path to the file based on the file-name pattern; a sketch of such a routine follows. Keeping the number of subdirectories balanced is important, as some file systems have a limit on the number of subdirectories in a parent directory.
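Hashing the file name into two levels of 256 buckets keeps directories evenly filled; the root location and fan-out are assumptions to tune for your file system:

```scala
import java.nio.file.{ Path, Paths }

// Generate a sharded path for a file: two directory levels of up to 256
// buckets each, derived from a hash of the file name.
def shardedPath(root: Path, fileName: String): Path = {
  val h  = fileName.hashCode & 0x7fffffff // non-negative hash
  val d1 = f"${h % 256}%02x"              // first-level directory
  val d2 = f"${(h / 256) % 256}%02x"      // second-level directory
  root.resolve(d1).resolve(d2).resolve(fileName)
}

// e.g. shardedPath(Paths.get("/var/data"), "report.xml")
// might yield /var/data/9c/3f/report.xml (create directories on first write).
```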