How to keep operations thread-safe across multiple pods of a Spring Data Reactive Repository microservice - mongodb

I have a microservice to wrap the access to a MongoDB (DaaS). It is implemented with Spring Data Reactive Repository (ReactiveMongoRepository).
I have deployed it as a Docker image, running on Kubernetes (in Google Cloud). I have configured the orchestrator to keep a minimum of 2 pods of my microservice (and a maximum of 4).
In another microservice, I have implemented a multithreaded batch process that calls my DaaS with the following sequence:
findById
Modify some fields (including increments of counters)
save
Here is the relevant code:
public Mono<Element> updateElement(String id) {
    return this.daasClient.findById(id)
            .map(elem -> modify(elem))
            .flatMap(elem -> this.daasClient.save(elem));
}
When there are lots of operations, each pod runs some of them (including the previous code), and I have seen that the access to the resource (Mongo) is not thread safe, so the result is not as expected.
I guess two pods run the findById simultaneously, so each one updates a stale copy rather than the "real" document, and whichever invokes save last overrides the changes of the other.
Does anybody know how I could avoid this, i.e. implement this operation in a thread-safe (pod-safe) way?
Thanks
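One commonly suggested direction for this kind of lost update is to let MongoDB apply the change atomically on the server instead of doing the read-modify-write in the application. A minimal sketch, assuming a ReactiveMongoTemplate is injected and the incremented field is called "counter" (both are assumptions, not taken from the question):

import org.springframework.data.mongodb.core.FindAndModifyOptions;
import org.springframework.data.mongodb.core.ReactiveMongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Update;
import reactor.core.publisher.Mono;

public class ElementDao {

    private final ReactiveMongoTemplate template;

    public ElementDao(ReactiveMongoTemplate template) {
        this.template = template;
    }

    public Mono<Element> incrementCounter(String id) {
        // findAndModify applies the $inc on the server as a single atomic
        // operation per document, so concurrent pods cannot overwrite each
        // other's increments the way findById + save can.
        return template.findAndModify(
                Query.query(Criteria.where("_id").is(id)),
                new Update().inc("counter", 1),
                FindAndModifyOptions.options().returnNew(true),
                Element.class);
    }
}

Fields that are not simple increments can be handled the same way with other update operators, or with optimistic locking (a @Version field) and retries; which fits better depends on what modify() actually changes.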

Related

Caching in a microservice with multiple replicas in k8s

I have a Golang-based microservice which has an in-memory cache, used as follows:
Create object -> Put it in cache -> Persist
Update object -> Update the cache -> Persist
Get -> Get it from the cache
Delete -> Delete cache entry -> Remove from data store.
On a service re-start, the cache is populated from the data store.
The cache organizes the data in different ways that matches my access patterns.
Note that one client can create the object, and other clients can update it at a later point in time.
Everything works fine as long as I have one replica, but this pattern will break when I increase the replica count in my deployment.
If I have to go to the DB for each GET, it defeats the purpose of the cache. The first thought is to move the cache out, but this seems like a fairly common problem when moving to multi-replica microservices, so I'm curious to understand the alternatives.
Thanks for your time.
Much here depends on how you structure your application.
One common solution is to use Redis or another distributed cache. The advantage is that all of your services go to the same cache to manage the object, which gives you more consistent data.
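A rough illustration of that first option (a Java sketch; Jedis, the host name and the key naming are assumptions):

import redis.clients.jedis.Jedis;

public class SharedObjectCache {

    private final Jedis redis = new Jedis("redis-host", 6379);

    // Every replica reads and writes the same Redis instance instead of its
    // own in-process map, so all replicas see one copy of the object.
    public void put(String id, String json) {
        redis.set("object:" + id, json);
    }

    public String get(String id) {
        // On a miss, fall back to the data store and backfill the cache.
        return redis.get("object:" + id);
    }
}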
Another approach you can take, which is somewhat more complex, is sharding.
For a Get operation, route the request to a specific instance based on the object's id. That instance will have the object in its cache; if not, it reads it from the db and puts it into its own cache. Every access to that object then goes to that same instance. The same applies to Update and Delete operations.
For the Create operation:
If you want the DB to generate the id automatically, the object is first created in the DB, which returns the id; you then route requests based on that id, so the first access after creation comes from the DB, but after that the object sits in that instance's cache.
If ids can be generated manually, then during creation you can prefix the id with something that maps to an instance.
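A minimal sketch of that id-based routing (a Java illustration; the instance list and the CRC32 choice are assumptions, and a proper consistent-hash ring would reduce reshuffling when replicas change):

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

public class IdRouter {

    private final List<String> instances; // e.g. stable pod addresses

    public IdRouter(List<String> instances) {
        this.instances = instances;
    }

    public String instanceFor(String objectId) {
        CRC32 crc = new CRC32();
        crc.update(objectId.getBytes(StandardCharsets.UTF_8));
        // The same id always lands on the same instance while the list is
        // stable, so only that instance's cache ever holds the object.
        int index = (int) (crc.getValue() % instances.size());
        return instances.get(index);
    }
}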
Note: in a distributed system there is no single solution. You always have to decide which approach works for your scenario.

How do I seed default data to Mongo db (or any database) in a microservice architecture?

I have a use case where there are multiple microservices and one of them deals with roles and resources (let's call this microservice A). Resources are just endpoints.
A maintains a collection (let's call it X) to store all the resources from the different microservices. For each microservice other than A, I would like to store all of its resources (endpoints) in X the first time that microservice boots up.
I am thinking of having a json file with all the resources in each microservice and calling A's endpoint to add resources whenever a microservice boots up.
Is there any idiomatic way to do this?
Consider using Viper, so you can set default data from multiple different sources like YAML, JSON, remote config such as etcd, live watching of files, among others. You can also configure the call to an endpoint with its remote configuration feature.
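Separately from Viper, the boot-time registration idea from the question could look roughly like this (a Spring Boot sketch; the file name, A's endpoint path and the payload shape are all assumptions):

import java.io.InputStream;
import java.util.List;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.boot.context.event.ApplicationReadyEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class ResourceSeeder {

    private final RestTemplate rest = new RestTemplate();
    private final ObjectMapper mapper = new ObjectMapper();

    @EventListener(ApplicationReadyEvent.class)
    public void seedResources() throws Exception {
        // Read the bundled list of this service's endpoints and push it to A.
        try (InputStream in = getClass().getResourceAsStream("/resources.json")) {
            List<String> endpoints = mapper.readValue(in,
                    mapper.getTypeFactory().constructCollectionType(List.class, String.class));
            // A should treat repeated registrations as idempotent, otherwise
            // every restart would duplicate entries in collection X.
            rest.postForObject("http://service-a/resources", endpoints, Void.class);
        }
    }
}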

Getting "java.lang.IllegalStateException:Was expecting to find transaction set on current strand" while running nodes from terminal

There is a similar question on Stack Overflow, but in my case I run the nodes from the console (deployNodes, runnodes), so there is no StartedMockNode class whose transaction{} function I could use.
What’s wrong with it and how can I fix it?
Here is the method throwing the exception
serviceHub.withEntityManager {
    persist(callbackData)
}
Debugged this through with Hayk on Slack.
Database transactions are handled by Corda, and they are only created at two points: during node startup, so Corda services can run database queries and inserts, and inside flows.
In this scenario, the database was being accessed outside of node startup and not during a flow's invocation.
To get around this, a new flow needs to be created that handles the db operations that were causing the error. The db operation can still be kept inside a Corda service, but it must be called from a flow.
This flow does not need a responder. It should be annotated with @StartableByService and should not need @InitiatingFlow (need to double check that one). The body of call simply invokes the db operation and returns the result to the caller.
TLDR - all db operations must be called from inside a flow or during node startup.
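A rough sketch of such a flow (Java; the flow name is an assumption, and the entity is carried over from the question's callbackData as a plain Object):

import co.paralleluniverse.fibers.Suspendable;
import net.corda.core.flows.FlowLogic;
import net.corda.core.flows.StartableByService;

@StartableByService
public class PersistCallbackDataFlow extends FlowLogic<Void> {

    private final Object callbackData; // the entity from the question

    public PersistCallbackDataFlow(Object callbackData) {
        this.callbackData = callbackData;
    }

    @Suspendable
    @Override
    public Void call() {
        // Inside a flow a database transaction is already set on the strand,
        // so withEntityManager no longer throws the IllegalStateException.
        getServiceHub().withEntityManager(em -> em.persist(callbackData));
        return null;
    }
}

The Corda service would then call serviceHub.startFlow(new PersistCallbackDataFlow(callbackData)) instead of calling withEntityManager directly.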

hazelcast spring-data write-through

I am using Spring Boot and Spring Data/JPA with a Hazelcast client/server topology. In parts of my test application, I am measuring the time taken by CRUD operations on the client side (the server is the one interacting with the relational db). I configured the map (store) to be write-behind by setting write-delay-seconds to 10.
Spring Data's save() returns the persisted entity, so in the client app the application flow is blocked until the server returns the persisted entity.
I would like to know if there is an alternative in which the client does NOT have to wait for the entity to be persisted. I was under the impression that once new data is stored in the Map, persisting to the backend happens asynchronously, so the client app would NOT have to wait.
Map config in hazelcast.xml:
<map name="com.foo.MyMap">
    <map-store enabled="true" initial-mode="EAGER">
        <class-name>com.foo.MyMapStore</class-name>
        <write-delay-seconds>10</write-delay-seconds>
    </map-store>
</map>
@NeilStevenson I don't find your response particularly helpful. I asked in an earlier post about where and how to generate the Map keys. You pointed me to the documentation, which fails to shed any light on this topic. The same goes for the Hazelcast (and other) examples.
The point of having the cache in the first place is to avoid hitting the database. When we add data (via save()), we also need to generate a unique key for the Map. This key also becomes the Entity.Id in the database table. Since, again, it's the Hazelcast client that generates these ids, there is no need to wait for the record to be persisted in the backend.
The only reason to wait for save() to return the persisted object would be to catch any exceptions NOT related to the id.
That unfortunately is how it is meant to work, see https://docs.spring.io/spring-data/commons/docs/current/api/org/springframework/data/repository/CrudRepository.html#save-S-.
Potentially the external store mutates the saved entry in some way.
Although you know it won't do this, there isn't a variant of save defined without that behaviour.
So the answer seems to be that this is not currently available in the general-purpose Spring repository definition. Why not raise a feature request with the Spring Data team?
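One workaround, not part of the answer above, is to bypass the repository for writes and put the value into the Hazelcast map directly; with the write-behind MapStore configured above, persistence to the relational database happens later without the client waiting on it. A sketch (Hazelcast 4+ imports; the value type is left as Object, and the key is assumed to be generated on the client as the question describes):

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class MyMapWriter {

    private final IMap<String, Object> map;

    public MyMapWriter() {
        HazelcastInstance client = HazelcastClient.newHazelcastClient();
        this.map = client.getMap("com.foo.MyMap");
    }

    public void saveWithoutWaiting(String id, Object entity) {
        // setAsync completes once the cluster holds the entry; the write-behind
        // MapStore flushes it to the database after write-delay-seconds, so the
        // client never blocks on the relational store.
        map.setAsync(id, entity);
    }
}

The trade-off is exactly the one the answer points out: nothing mutated by the store, and no persistence error, ever comes back to the caller.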

Service Fabric Actors - save state to database

I'm working on a sample Service Fabric project, where I have to maintain a shopping list. For this I have a ShoppingList actor, which is identifiable by a specific id. It stores the current list content in its state using StateManager. All works fine.
However, in parallel I'd like to maintain the shopping list content in a sql database. In particular:
store all add/remove item requests for future analysis (ML)
on first actor initialization load list content from db (e.g. after cluster has been re-created)
What is the best approach to achieve that? Create a custom StateProvider (how? can't find examples)?
Or maybe have another service/actor for handling all db operations (possibly using queues and reminders)?
All examples seem to completely rely on default StateManager, with no data persistence to external storage, so I'm not sure what's the best practice.
The best way would be to have a separate entity responsible for storing the data in the DB. The actor would just send an event (not implying SF events) with some data about the performed operation, and that other entity would catch it and do the rest of the work.
But of course you can implement this in the actor itself; doing so brings two possible issues:
The actor will not be able to process other requests if there are issues with the DB, with connectivity between the actor and the DB, or if the DB itself is under heavy load and processes requests slowly. The actor would have to wait until the transfer to the DB completes successfully.
The DB may be overloaded with many single connections from many actors, instead of one or a few connections from the other entity plus batched insertion.
So, your final solution will depend on the workload of your system. But if the data is too valuable to afford losing it, you will definitely need a reliable queue to make sure it safely reaches the DB.
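The hand-off and batching part of that could look roughly like this (plain Java, deliberately without any Service Fabric API; the in-memory queue is only for illustration, whereas the answer calls for a reliable queue if the data must not be lost):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ShoppingListEventWriter implements Runnable {

    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();

    // Called by actors: a cheap hand-off instead of a direct DB write,
    // so the actor is free to process its next request immediately.
    public void submit(String event) {
        events.offer(event);
    }

    @Override
    public void run() {
        List<String> batch = new ArrayList<>();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                batch.add(events.take());   // wait for at least one event
                events.drainTo(batch, 99);  // then collect up to 100 per round
                persistBatch(batch);        // one connection, one batched INSERT
                batch.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    private void persistBatch(List<String> batch) {
        // Batched INSERT into the SQL store over a single connection goes here.
    }
}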
Also, I think you could use the default state manager to store logs and information about transactions until they have been transferred to the DB, and remove them from the service's state once the transfer completes. There is no need to keep such data permanently in the service's state.
Another thing to take into consideration is reading from the DB. If you use a relational database, append new records to only one table, and have a huge number of actors querying that data on activation, you will see performance degradation, because that table will be locked for reading or writing unless you configure it to behave differently. So you will probably need a caching layer for the data read during actor activation, depending on your workload.
And about implementing your custom State Manager: take a look at this example. Basically, all you need to do is implement the IReliableStateManagerReplica interface and pass it to the StatefulService constructor.