I have a Golang-based microservice with an in-memory cache that works as follows:
Create object -> Put it in cache -> Persist
Update object -> Update the cache -> Persist
Get -> Get it from the cache
Delete -> Delete cache entry -> Remove from data store.
On a service re-start, the cache is populated from the data store.
The cache organizes the data in different ways that match my access patterns.
Note that one client can create the object, and other clients can update it at a later point in time.
Everything works fine as long as I have one replica, but this pattern will break when I increase the replica count in my deployment.
If I have to go to the DB for each GET, it defeats the purpose of the cache. The first thought is to move the cache out. But this seems like a fairly common problem when moving to multi-replica microservices, so I'm curious to understand the alternatives.
Thanks for your time.
Much of this depends on how you structure your application.
One common solution is to use Redis or another distributed cache. The advantage is that all of your replicas go to the same cache to manage objects, which gives you more consistent data.
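As a rough illustration, here is a minimal Go sketch of that approach with a shared Redis (the go-redis client, the address, and the key scheme are assumptions for the example, not part of the original service):

package main

import (
    "context"
    "fmt"

    "github.com/redis/go-redis/v9"
)

func main() {
    ctx := context.Background()
    // Every replica points at the same Redis, so a write from one replica
    // is immediately visible to GETs served by the others.
    rdb := redis.NewClient(&redis.Options{Addr: "redis:6379"})

    if err := rdb.Set(ctx, "object:42", `{"name":"example"}`, 0).Err(); err != nil {
        panic(err)
    }
    val, err := rdb.Get(ctx, "object:42").Result()
    if err != nil {
        panic(err)
    }
    fmt.Println(val)
}

The trade-off is an extra network hop on every GET, but the cache stays consistent across replicas.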
Another approach you can take, which is somewhat more complex, is sharding.
For a Get operation, route the request to a specific instance based on the object's ID. That instance will have the object in its cache; if not, it reads the object from the DB and puts it in its own cache. From then on, every request for that object goes to that instance. The same applies to Update and Delete operations.
For the Create operation:
If you want the DB to generate the object's ID automatically, the object is first created in the DB, the DB returns the ID, and from then on you route requests based on that ID. The first access after creation will hit the DB, but after that the object will be in that instance's cache.
If IDs can be generated manually, then during creation prefix the ID with something that maps to an instance.
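A minimal Go sketch of that routing idea (the instance URLs and the hashing scheme are made up for illustration; note that plain modulo hashing reshuffles keys whenever the replica count changes, so a real deployment would want consistent hashing):

package main

import (
    "fmt"
    "hash/fnv"
)

// In a real deployment this list would come from service discovery.
var instances = []string{
    "http://replica-0.myservice:8080",
    "http://replica-1.myservice:8080",
    "http://replica-2.myservice:8080",
}

// instanceFor maps an object ID to one replica deterministically, so every
// Get/Update/Delete for that object lands on the same cache.
func instanceFor(objectID string) string {
    h := fnv.New32a()
    h.Write([]byte(objectID))
    return instances[int(h.Sum32()%uint32(len(instances)))]
}

func main() {
    for _, id := range []string{"order-17", "order-18", "order-19"} {
        fmt.Println(id, "->", instanceFor(id))
    }
}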
Note: in distributed systems there is no single solution. You always have to decide which approach works for your scenario.
This question seemed foolish to me at first sight, but then I realized that I don't have a proper answer yet, and interestingly I also didn't find a good explanation in my searches.
I'm new to Domain-Driven Design concepts, so even if the question is basic, feel free to add any considerations to it.
I'm designing a REST API to configure server instances, and I came up with an Aggregate called Instance that contains a list of Configurations; only one specific Configuration will be active at a given time.
To add a Configuration, one would call an endpoint POST /instances/{id}/configurations with the desired configuration in the body. In response, if all is okay, it would receive an HTTP 204 with a Location header containing the new Configuration ID.
I'm planning to have only one controller, InstanceController, which calls an InstanceService that manipulates the Instance Aggregate and then stores it to the repo.
Since the IDs are generated by the repository, if I call Instance.addConfiguration and then InstanceRepository.store, how would I get the ID of the newly created Configuration? I mean, it's a list, so it's not as trivial as calling Instance.configuration.identity.
An option would be to implement a method in Instance like getLastAddedConfiguration, but this seems really brittle.
What is the general approach in this situation?
the IDs are generated by the repository
You could remove this extra complexity. Since Configuration is an entity of the Instance aggregate, its ID only needs to be unique inside the aggregate, not across the whole application. Therefore, the easiest approach is for the Aggregate to assign the ConfigurationId in the Instance.addConfiguration method (the aggregate can easily ensure the uniqueness of the new ID). This method can return the new ConfigurationId, or the whole object with the ID if necessary; see the sketch at the end of this answer.
What is the general approach in this situation?
I'm not sure about the general approach, but in my opinion, the sooner you create the IDs the better. For Aggregates, you'd create the ID before storing them (maybe a GUID); for entities, the Aggregate can create the ID at the moment of creating/adding the entity. This allows you to perform other actions (e.g. publishing an event) using these IDs without having to store and retrieve them from the DB, which would necessarily have an impact on how you implement and use your repositories, and that is not ideal.
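For what it's worth, here is a minimal Go sketch of the first suggestion (the Instance/Configuration names mirror the question; the sequential ID scheme is just one possible choice, not a prescribed one):

package main

import "fmt"

type ConfigurationID int

type Configuration struct {
    ID   ConfigurationID
    Body string
}

type Instance struct {
    nextConfigID   ConfigurationID
    configurations []Configuration
}

// AddConfiguration assigns the next ID itself (uniqueness only has to hold
// inside this aggregate) and returns it to the caller.
func (i *Instance) AddConfiguration(body string) ConfigurationID {
    i.nextConfigID++
    i.configurations = append(i.configurations, Configuration{ID: i.nextConfigID, Body: body})
    return i.nextConfigID
}

func main() {
    inst := &Instance{}
    newID := inst.AddConfiguration(`{"port": 8080}`)
    // The service can store the aggregate and build the Location header
    // from newID without asking the repository for anything.
    fmt.Println("created configuration", newID)
}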
I am using Spring Boot and Spring Data/JPA with a Hazelcast client/server topology. In parts of my test application, I am measuring the time taken by CRUD operations on the client side (the server is the one interacting with a relational DB). I configured the map store to be write-behind by setting write-delay-seconds to 10.
Spring Data's save() returns the persisted entity. In the client app, therefore, the application flow is blocked until the server returns the persisted entity.
I would like to know if there is an alternative in which the client does NOT have to wait for the entity to persist. I was under the impression that once new data is stored in the Map, persisting to the backend happens asynchronously, so the client app would NOT have to wait.
Map config in hazelcast.xml:
<map name="com.foo.MyMap">
    <map-store enabled="true" initial-mode="EAGER">
        <class-name>com.foo.MyMapStore</class-name>
        <write-delay-seconds>10</write-delay-seconds>
    </map-store>
</map>
@NeilStevenson I don't find your response particularly helpful. I asked in an earlier post where and how to generate the Map keys. You pointed me to the documentation, which fails to shed any light on this topic. The same goes for the Hazelcast (and other) examples.
The point of having the cache in the first place is to avoid hitting the database. When we add data (via save()), we also need to generate a unique key for the Map. This key also becomes the Entity.Id in the database table. Since, again, it's the Hazelcast client that generates these IDs, there is no need to wait for the record to be persisted in the backend.
The only reason to wait for save() to return the persisted object would be to catch any exceptions, NOT to get the ID.
That, unfortunately, is how it is meant to work; see https://docs.spring.io/spring-data/commons/docs/current/api/org/springframework/data/repository/CrudRepository.html#save-S-.
Potentially the external store mutates the saved entry in some way.
Although you know it won't do this here, there isn't a variant of save defined for that case.
So the answer seems to be that this is not currently available in the general-purpose Spring repository definition. Why not raise a feature request with the Spring Data team?
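To make the desired behaviour concrete, here is a minimal Go sketch of a write-behind cache, i.e. the semantics the questioner expected. This is a generic illustration only, not Hazelcast's or Spring Data's actual implementation; the persist callback stands in for the JDBC write.

package main

import (
    "fmt"
    "sync"
    "time"
)

// WriteBehindCache answers writes from memory immediately and flushes dirty
// entries to a slow backing store on a timer, analogous to write-delay-seconds.
type WriteBehindCache struct {
    mu      sync.Mutex
    data    map[string]string
    dirty   map[string]string
    persist func(key, value string) // stand-in for the slow DB write
}

func NewWriteBehindCache(delay time.Duration, persist func(k, v string)) *WriteBehindCache {
    c := &WriteBehindCache{
        data:    map[string]string{},
        dirty:   map[string]string{},
        persist: persist,
    }
    go func() {
        for range time.Tick(delay) {
            c.mu.Lock()
            batch := c.dirty
            c.dirty = map[string]string{}
            c.mu.Unlock()
            for k, v := range batch { // the caller is long gone by now
                c.persist(k, v)
            }
        }
    }()
    return c
}

// Put returns as soon as the in-memory map is updated; it never waits for
// the store.
func (c *WriteBehindCache) Put(key, value string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.data[key] = value
    c.dirty[key] = value
}

func main() {
    cache := NewWriteBehindCache(2*time.Second, func(k, v string) {
        fmt.Println("persisted", k, "=", v)
    })
    cache.Put("42", "hello") // does not block on the store
    time.Sleep(3 * time.Second)
}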
I'm working on a sample Service Fabric project, where I have to maintain a shopping list. For this I have a ShoppingList actor, which is identifiable by a specific id. It stores the current list content in its state using StateManager. All works fine.
However, in parallel I'd like to maintain the shopping list content in a sql database. In particular:
store all add/remove item requests for future analysis (ML)
on first actor initialization load list content from db (e.g. after cluster has been re-created)
What is the best approach to achieve that? Create a custom StateProvider (how? can't find examples)?
Or maybe have another service/actor for handling all db operations (possibly using queues and reminders)?
All examples seem to completely rely on default StateManager, with no data persistence to external storage, so I'm not sure what's the best practice.
The best way would be to have a separate entity responsible for storing the data in the DB. The actor then just sends an event (not implying SF events) with some data about the performed operation, and the other entity catches it and performs the rest of the work; see the sketch at the end of this answer.
But of course you can implement this in the actor itself, although that brings two possible issues:
The actor will not be able to process other requests if there are issues with the DB, with connectivity between the actor and the DB, or if the DB itself is under high load and processing requests slowly. The actor would have to wait until the transfer to the DB completes successfully.
Possible overloading of the DB with many single connections from many actors, instead of one or a few connections from another entity doing batch insertion.
So your final solution will depend on the workload of your system. But you will definitely need a reliable queue to safely buffer data on its way to the DB if the data is too valuable to afford losing it.
Also, I think you could use the default state manager to store logs and information about operations before they are transferred to the DB, and remove them from the service's state once the transfer completes. There is no need to keep such data permanently in the service.
Another thing to take into consideration is reading from the DB. If you have a relational database, are appending new records to a single table, and have a huge number of actors querying that data on activation, you may see performance degradation, since the table will be locked for reading or writing unless you configure it to behave differently. So you may need a caching layer for the data read during actor activation; it depends on your workload.
And about implementing your custom State Manager: take a look at this example. Basically, all you need to do is implement the IReliableStateManagerReplica interface and pass it to the StatefulService constructor.
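Not Service Fabric-specific, but here is a minimal Go sketch of the shape of that pattern: actors drop operation events on a queue, and a single consumer batches them into the DB. A channel stands in for the reliable queue, and the print stands in for the SQL insert; a real system would need a queue that survives process failure.

package main

import (
    "fmt"
    "time"
)

// ListEvent records one add/remove operation performed by a shopping-list actor.
type ListEvent struct {
    ListID, Op, Item string
}

// dbWriter drains the queue and writes in batches, so the DB sees a few bulk
// inserts instead of many single connections from many actors.
func dbWriter(events <-chan ListEvent, batchSize int, flushEvery time.Duration) {
    batch := make([]ListEvent, 0, batchSize)
    flush := func() {
        if len(batch) > 0 {
            fmt.Printf("bulk insert of %d events\n", len(batch)) // stand-in for the SQL write
            batch = batch[:0]
        }
    }
    ticker := time.NewTicker(flushEvery)
    defer ticker.Stop()
    for {
        select {
        case ev, ok := <-events:
            if !ok {
                flush()
                return
            }
            batch = append(batch, ev)
            if len(batch) == batchSize {
                flush()
            }
        case <-ticker.C:
            flush()
        }
    }
}

func main() {
    events := make(chan ListEvent, 100) // stand-in for a durable queue
    events <- ListEvent{ListID: "list-1", Op: "add", Item: "milk"}
    events <- ListEvent{ListID: "list-1", Op: "remove", Item: "eggs"}
    close(events)
    dbWriter(events, 10, time.Second)
}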
I currently have a ReliableActor for every user in the system. This actor is appropriately named User, and for the sake of this question has a Location property. What would be the recommended approach for querying Users by Location?
My current thought is to create a ReliableService that contains a ReliableDictionary. The data in the dictionary would be a projection of the User data. If I did that, then I would need to:
Query the dictionary. After GA, this seems like the recommended approach.
Keep the dictionary in sync. Perhaps through Pub/Sub or IActorEvents.
Another alternative would be to have a persistent store outside Service Fabric, such as a database. This feels wrong, as it goes against some of the ideals of using Service Fabric. If I did, I would assume something similar to the above, but using a stateless service?
Thank you very much.
I'm personally exploring the use of Actors as the main datastore (i.e. the source of truth) for my entities. As Actors are added, updated, or deleted, I use MassTransit to publish events. I then have Reliable Stateful Services subscribed to these events. The services receive the events and update their internal IReliableDictionary instances. The services can then be queried to find the entities required by the client. Each service keeps only the entity data that it requires to perform its queries.
I'm also exploring the use of EventStore to publish the events as well. That way, if in the future I decide I need to query the entities in a new way, I could create a new service and replay all the events to it.
These Pub/Sub methods do mean the query services are only eventually consistent, but in a distributed system, this seems to be the norm.
While the standard recommendation is definitely Vaclav's response, if querying is the exception then Actors could still be appropriate. For me, whether they're suitable or not is determined by the normal way of accessing them; if it's by key (as it presumably would be for a user record), then Actors work well.
It is possible to iterate over Actors, but it's quite a heavy task, so, as I say, it's only appropriate in the exceptional case. The following code builds up a set of actor IDs; you can then iterate over this set to fetch the actors and use LINQ or similar on the collection you've built up.
var actorServiceProxy = ActorServiceProxy.Create(
    new Uri("fabric:/MyActorApp/MyActorService"), partitionKey);

// Page through every actor in this partition, collecting the IDs.
var actorIds = new List<ActorId>();
ContinuationToken continuationToken = null;
do
{
    var queryResult = actorServiceProxy
        .GetActorsAsync(continuationToken, cancellationToken)
        .GetAwaiter().GetResult();
    actorIds.AddRange(queryResult.Items.Select(i => i.ActorId));
    continuationToken = queryResult.ContinuationToken;
} while (continuationToken != null);
TL;DR: It's not always advisable to query over actors, but it can be achieved if required. The code above will get you started.
If you find yourself needing to query across a data set by some data property, like User.Location, then Reliable Collections are the right answer. Reliable Actors are not meant to be queried this way.
In your case, a user could simply be a row in a Reliable Dictionary.
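To illustrate the projection idea, here is a minimal Go sketch in which plain maps stand in for the Reliable Dictionary (the UserProjection name and the secondary index by location are assumptions for the example):

package main

import "fmt"

// User is the projection of the actor's state kept for querying.
type User struct {
    ID       string
    Location string
}

type UserProjection struct {
    byID       map[string]User
    byLocation map[string][]string // secondary index: location -> user IDs
}

func NewUserProjection() *UserProjection {
    return &UserProjection{byID: map[string]User{}, byLocation: map[string][]string{}}
}

// Upsert would be called from the pub/sub handler that keeps the projection
// in sync with the actors.
func (p *UserProjection) Upsert(u User) {
    if old, ok := p.byID[u.ID]; ok {
        // Drop the old index entry in case the user moved.
        ids := p.byLocation[old.Location]
        for i, id := range ids {
            if id == u.ID {
                p.byLocation[old.Location] = append(ids[:i], ids[i+1:]...)
                break
            }
        }
    }
    p.byID[u.ID] = u
    p.byLocation[u.Location] = append(p.byLocation[u.Location], u.ID)
}

func (p *UserProjection) ByLocation(loc string) []User {
    var out []User
    for _, id := range p.byLocation[loc] {
        out = append(out, p.byID[id])
    }
    return out
}

func main() {
    p := NewUserProjection()
    p.Upsert(User{ID: "u1", Location: "Oslo"})
    p.Upsert(User{ID: "u2", Location: "Oslo"})
    fmt.Println(p.ByLocation("Oslo"))
}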
TL;DR
What's the best way to handle dependency between types of data that is loaded asynchronously from different backend endpoints?
Problem
My app fetches data from a backend; for each entity I have an endpoint to fetch all instances.
For example, api.myserver.com/v1/users for the User model and api.myserver.com/v1/things for the Thing model.
This data is parsed and placed into data store objects (e.g. UserDataStore and ThingDataStore) that serve these models to the rest of the app.
Question
What should I do if the data that comes from /things depends on data that comes from /users, and the fetch operations are async? In my case, /things returns the ID of the user that created each thing. This means that if /things returns before /users, I won't have enough data to create the Thing model.
Options
Have /things also return the relevant /users data, nested.
This is bad because:
I'll then have multiple User model instances for the same actual user: one that came from /users and one that came nested in /things.
Increases the total payload size transferred.
In a system with some permission policy, the data returned for /users can differ from the data returned for /things, and that would allow partially populated models into the app.
Create an operational dependency between the two data stores, so that ThingDataStore has to wait for UserDataStore to be populated before it attempts to load its own data.
This is also bad because:
Design-wise, this dependency is not welcome.
Operation-wise, it will very quickly become complicated once you throw in more data stores (dependency cycles, etc.).
What is the best solution for my problem and in general?
This is obviously not platform- or language-dependent.
I see two possible solutions:
Late initialization of UserDataStore in ThingDataStore. You will have to allow the creation of an object that is not fully valid, and you will also need a method that tells you whether the UserDataStore has been initialized or not. Not perfect, because for some time an invalid instance will exist.
Create some kind of proxy, or maybe a builder object, for ThingDataStore that holds all the information about a particular thing and creates the ThingDataStore object as soon as the UserDataStore data related to that instance has been received; a sketch of this idea follows below.
Maybe this will help you. Good luck!
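For the second option, here is a minimal Go sketch of the buffering/builder idea (the store names mirror the question; the handler wiring and field names are assumptions for the example):

package main

import (
    "fmt"
    "sync"
)

type User struct {
    ID, Name string
}

// PendingThing holds the raw /things payload until its user is known;
// this is the "builder" object from the second option.
type PendingThing struct {
    ID, CreatorID string
}

type Thing struct {
    ID      string
    Creator User
}

type Stores struct {
    mu      sync.Mutex
    users   map[string]User
    pending []PendingThing
    things  []Thing
}

// OnUsersLoaded is the /users response handler; it also drains any things
// that arrived first and were parked as pending.
func (s *Stores) OnUsersLoaded(users []User) {
    s.mu.Lock()
    defer s.mu.Unlock()
    if s.users == nil {
        s.users = map[string]User{}
    }
    for _, u := range users {
        s.users[u.ID] = u
    }
    remaining := s.pending[:0]
    for _, p := range s.pending {
        if u, ok := s.users[p.CreatorID]; ok {
            s.things = append(s.things, Thing{ID: p.ID, Creator: u})
        } else {
            remaining = append(remaining, p)
        }
    }
    s.pending = remaining
}

// OnThingsLoaded is the /things response handler; it builds full Thing
// models when the user is already known and parks the rest.
func (s *Stores) OnThingsLoaded(raw []PendingThing) {
    s.mu.Lock()
    defer s.mu.Unlock()
    for _, p := range raw {
        if u, ok := s.users[p.CreatorID]; ok {
            s.things = append(s.things, Thing{ID: p.ID, Creator: u})
        } else {
            s.pending = append(s.pending, p)
        }
    }
}

func main() {
    s := &Stores{}
    s.OnThingsLoaded([]PendingThing{{ID: "t1", CreatorID: "u1"}}) // /things wins the race
    s.OnUsersLoaded([]User{{ID: "u1", Name: "Ada"}})
    fmt.Println(s.things)
}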