hazelcast spring-data write-through - spring-data

I am using Spring-Boot, Spring-Data/JPA with Hazelcast client/server topology. In parts of my test application, I am calculating time when performing CRUD operations on the client side (the server is the one interacting with a relational db). I configured the map(Store) to be write-behind by setting write-delay-seconds to 10.
Spring-Data's save() returns the persisted entity. In the client app, therefore, the application flow will be blocked until the (server) returns the persisted entity.
Would like to know is there is an alternative in which case the client does NOT have to wait for the entity to persist. Was under the impression that once new data is stored in the Map, persisting to the backed happens asynchronously -> the client app would NOT have to wait.
Map config in hazelast.xml:
<map name="com.foo.MyMap">
<map-store enabled="true" initial-mode="EAGER">
<class-name>com.foo.MyMapStore</class-name>
<write-delay-seconds>10</write-delay-seconds>
</map-store>
</map>
#NeilStevenson I don't find your response particularly helpful. I asked on an earlier post about where and how to generate the Map keys. You pointed me to the documentation which fails to shed any light on this topic. Same goes for the hazelcast (and other) examples.
The point of having the cache in the 1st place, is to avoid hitting the database. When we add data (via save()), we need to also generate an unique key for the Map. This key also becomes the Entity.Id in the database table. Since, again, its the hazelcast client that generates these Ids, there is no need to wait for the record to be persisted in the backend.
The only reason to wait for save() to return the persisted object would be to catch any exceptions NOT because of the ID.

That unfortunately is how it is meant to work, see https://docs.spring.io/spring-data/commons/docs/current/api/org/springframework/data/repository/CrudRepository.html#save-S-.
Potentially the external store mutates the saved entry in some way.
Although you know it won't do this, there isn't a variant on the save defined.
So the answer seems to be this is not currently available in the general purpose Spring repository definition. Why not raise a feature request for the Spring Data team ?

Related

Caching in a microservice with multiple replicas in k8s

I've a Golang based micro-service which has an in-memory cache as follows:
Create object -> Put it in cache -> Persist
Update object -> Update the cache -> Persist
Get -> Get it from the cache
Delete -> Delete cache entry -> Remove from data store.
On a service re-start, the cache is populated from the data store.
The cache organizes the data in different ways that matches my access patterns.
Note that one client can create the object, and other clients can update it at a later point in time.
Everything works fine as long as I've one replica. But, this pattern will break when I increase the replica count in my deployment.
If I have to go to the DB for each GET, it defeats the purpose of the cache. The first thought is, to move the cache out. But, this seems like a fairly common problem when moving to multi-replica microservices. So, curious to understand alternatives.
Thanks for your time.
Mainly many things depends on how you structure your application.
One common solution is use Redis Cache or Distributed Cache. Here advantage is that your all services will go to same cache to manage object. This will give more consistent data.
Another approach that you can take and this will be some how more complex. Try to use sharding.
For Get Operation based on Id of object, you have to route request to specific instance. That instance will have that object in cache. If not then it read from db and put it in that instance cache. Eachtime for that object it will go that instance. This is applicable to Update and Delete operation.
For create operation.
If you want DB generate Id automatically for object then there is once chance object created in DB and then it return that Id and based on Id you have to route request and that way for first access after creation will be from DB but after that it will be in cache of that instance.
If you have provision that Id can be manually generated then during creation if you have to prefix Id with something that map to instance.
Note : In distributed system , there is no one solution. You always have to decide which approach works for you scenario.

Using spring data rest with postgresql and cache as redis

I have a simple model with repository configured persisting to postgresql. Using spring-data-rest, the api's are available out of the box for all the crud operations.
Now I want to introduce the caching with redis-6.0. So that any write(rest api's for POST or PUT, DELETE) operation, the model is persisted to db first and updated to the cache.
For the read operation(rest api GET), the item is looked into cache first, if available, then use that or else use spring-data-rest default behavior in this case i.e. find it in postgresql.
Write Operations (POST, PUT, DELETE):
Using the RepositoryEventHandler, HandleAfterCreate, HandleAfterDelete, HandleAfterSave events are subscribed and used to sync up the cache. This reasonably keep the cache to latest.
Read Operations(GET):
I do not see any event listener for read operation. Read is the only operation, that I want to bypass hitting the db as much as possible. But currently do not find a way to do this.
Please let me know, if there is a way to listen for the read operation and do cache lookup first.
Thanks.

How to keep state consistent across distributed systems

When building distributed systems, it must be ensured the client and the server eventually ends up with consistent view of the data they are operating on, i.e they never get out of sync. Extra care is needed, because network can not be considered reliable. In other words, in the case of network failure, client never knows if the operation was successful, and may decide to retry the call.
Consider a microservice, which exposes simple CRUD API, and unbounded set of clients, maintained in-house by the same team, by different teams and by different companies also.
In the example, client request a creation of new entity, which the microservice successfully creates and persists, but the network fails and client connection times out. The client will most probably retry, unknowingly persisting the same entity second time. Here is one possible solution to this I came up with:
Use client-generated identifier to prevent duplicate post
This could mean the primary key as it is, the half of the client and server -generated composite key, or the token issued by the service. A service would either persist the entity, or reply with OK message in the case the entity with that identifier is already present.
But there is more to this: What if the client gives up after network failure (but entity got persisted), mutates it's internal view of the entity, and later decides to persist it in the service with the same id. At this point and generally, would it be reasonable for the service just silently:
Update the existing entity with the state that client posted
Or should the service answer with some more specific status code about what happened? The point is, developer of the service couldn't really influence the client design solutions.
So, what are some sensible practices to keep the state consistent across distributed systems and avoid most common pitfalls in the case of network and system failure?
There are some things that you can do to minimize the impact of the client-server out-of-sync situation.
The first measure that you can take is to let the client generate the entity IDs, for example by using GUIDs. This prevents the server to generate a new entity every time the client retries a CreateEntityCommand.
In addition, you can make the command handing idempotent. This means that if the server receives a second CreateEntityCommand, it just silently ignores it (i.e. it does not throw an exception). This depends on every use case; some commands cannot be made idempotent (like updateEntity).
Another thing that you can do is to de-duplicate commands. This means that every command that you send to a server must be tagged with an unique ID. This can also be a GUID. When the server receives a command with an ID that it already had processed then it ignores it and gives a positive response (i.e. 200), maybe including some meta-information about the fact that the command was already processed. The command de-duplication can be placed on top of the stack, as a separate layer, independent of the domain (i.e. in front of the Application layer).

Service Fabric Actors - save state to database

I'm working on a sample Service Fabric project, where I have to maintain a shopping list. For this I have a ShoppingList actor, which is identifiable by a specific id. It stores the current list content in its state using StateManager. All works fine.
However, in parallel I'd like to maintain the shopping list content in a sql database. In particular:
store all add/remove item request for future analysis (ML)
on first actor initialization load list content from db (e.g. after cluster has been re-created)
What is the best approach to achieve that? Create a custom StateProvider (how? can't find examples)?
Or maybe have another service/actor for handling all db operations (possibly using queues and reminders)?
All examples seem to completely rely on default StateManager, with no data persistence to external storage, so I'm not sure what's the best practice.
The best way will be to have a separate entity responsible for storing data to DB. And actor will just send an event (not implying SF events) with some data about performed operation, and another entity will catch it and perform the rest of the work.
But of course you can implement this thing in actor itself, but it will bring two possible issues:
Actor will be not able to process other requests if there will be some issues with DB or connectivity between actor and DB or if there will be high loading of DB itself and it will process requests slowly. The actor would have to wait till transferring to DB successfully completes.
Possible overloading of DB with many single connections from many actors instead of one or several connection from another entity and batch insertion.
So, your final solution will depend on workload of your system. But definitely you will need a reliable queue to safely store data in DB if value of such data is too high to afford a loss.
Also, I think you could use default state manager to store logs and information about transactions before it will be transferred to DB and remove from service's state after transaction completes. There is no need to have permanent storage of such data in services.
And another things to take into consideration — reading from DB. Probably, if you have relationship database and will update with new records only one table + if there will be huge amount of actors that will query such data on activation, you will have performance degradation as this table will be locked for reading or writing if you will not configure it to behave differently. So, probably, you will need caching system to read data for actors activation — depends on your workload.
And about implementing your custom State Manager: take a look at this example. Basically, all you need to do is to implement IReliableStateManagerReplica interface and pass it to StatefullService constructor.

Spring Batch: reuse existing service as a reader

I want to reuse an existing, transactional,paginated service class, which retrieves the items using JPA from a database, inside a Spring batch job, as a reader. I want to do that instead of using directly the JpaPagingItemReader basically because the JPA query is more complex to build and the service already provides this functionality.
My question would be what are the things I should take into account when developing the Spring batch adapter over this service. Although the reference documentation http://docs.spring.io/spring-batch/trunk/reference/html/readersAndWriters.html#pagingItemReaders has a section on reusing existing services, it doesn't say anything regarding the constraints, if there are any, of using such a transactional service.
Now, I looked at the JpaPagingItemReader as an example for building the reader, and I came up with a couple of questions I couldn't find answers for netiher in the documentation or on stackoverflow, although this post https://stackoverflow.com/a/26549831/4473261 helped.
The first thing I noticed is that a new transaction is used by the JpaPagingItemReader for reading a page of data. The above post says that this new transaction is needed "so that features like retry and skip can be correctly performed.". I have also found this article related to the matter https://blog.codecentric.de/en/2012/03/transactions-in-spring-batch-part-3-skip-and-retry/ that says that "when a skippable exception occurs during reading, we just increase the skip count and keep the exception for a later call on the onSkipInRead method of the SkipListener, if configured. There’s no rollback". So I assume that the reader has to do any reading of the records in a new transaction so that if a rollback of the transaction started when the processing of the chunk began happened, then the reader is not affected. I am wondering if this is true and if in this case my adapter should create a new transaction, invoke the service inside that transaction and then commit the transaction, similarly to how the JpaPagingItemReader does it. If that's true though, I wonder why there isn't any template provided by the framework which creates the transaction, delegates to the service the actual call to retrieve the data and then commits the transaction.
Greetings,
Cristi
From a reader perspective, there really isn't much to be concerned about. You can see in our JmsItemReader which obviously works with a transactional store that we don't take any additional precautions within the ItemReader itself.
What really matters is how you configure your step. When configuring your step, you'll need to mark the reader as transactional so that Spring Batch handles rollback correctly. When Spring Batch reads items in a fault tollerant step, the default behavior is to buffer them so that they won't be re-read on failure (retry, skip, etc). However, since the items read from a transactional store are tied to the transaction (and therefore reset when the rollback occurs), you need to tell Spring Batch to not buffer the items as they are read.
To mark the ItemReader as transactional, you'll set the not-quite-well-named flag is-reader-transactional-queue to true. You can read more about configuring steps and transactions in the documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/configureStep.html