Is there a way to back up a single ReliableCollection in Service Fabric? - azure-service-fabric

I have three reliable collections defined in my StatefulService defined this way:
this.dataCount = await StateManager
    .GetOrAddAsync<IReliableDictionary<string, int>>("count");
this.dataDictionary1 = await StateManager
    .GetOrAddAsync<IReliableDictionary<string, ResourceElement>>("data1");
this.dataDictionary2 = await StateManager
    .GetOrAddAsync<IReliableDictionary<string, ResourceElement>>("data2");
Now I would like to move collection data2 (its contents) to a separate StatefulService and I am not sure how to proceed.
Ideally, I am looking for a mechanism that would allow me to back up data2 (I don't mind if I have to add a method to my service) to a file and then restore the backup from that file into a different service.
Is there anything like this available?

Backups work at the service partition level. The only way to create a backup of a single collection is to have a stateful service with one partition that holds only that collection.
What you can do is restore backup(s) of your old service (service 1) into a new service (service 2) that you create just for the purpose of moving data from the data2 collection to another service (service 3). After that you delete service 2 and continue running service 3.
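As a rough illustration, here is a minimal sketch of the backup and restore hooks on a StatefulService (the class name and the external-storage helpers at the bottom are placeholders you would replace with your own code):
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class DataService : StatefulService
{
    public DataService(StatefulServiceContext context) : base(context) { }

    // Trigger a full backup of this partition. The callback receives the local
    // folder containing the backup files; copy it to external storage there.
    public Task TriggerBackupAsync()
    {
        var description = new BackupDescription(BackupOption.Full, BackupCallbackAsync);
        return this.BackupAsync(description);
    }

    private async Task<bool> BackupCallbackAsync(BackupInfo backupInfo, CancellationToken cancellationToken)
    {
        // backupInfo.Directory is the local folder with the backup files.
        await CopyFolderToExternalStorageAsync(backupInfo.Directory, cancellationToken);
        return true; // confirm the backup so Service Fabric can clean up the local files
    }

    // Service Fabric calls this on the target service when you start a restore
    // (e.g. with Start-ServiceFabricPartitionDataLoss).
    protected override async Task<bool> OnDataLossAsync(RestoreContext restoreCtx, CancellationToken cancellationToken)
    {
        string backupFolder = await DownloadBackupFromExternalStorageAsync(cancellationToken);
        await restoreCtx.RestoreAsync(new RestoreDescription(backupFolder), cancellationToken);
        return true;
    }

    // Placeholder helpers: replace with real blob storage / file share code.
    private Task CopyFolderToExternalStorageAsync(string folder, CancellationToken ct) => Task.CompletedTask;
    private Task<string> DownloadBackupFromExternalStorageAsync(CancellationToken ct) => Task.FromResult(@"C:\backups\latest");
}
You would expose TriggerBackupAsync through whatever communication listener the service already has.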

Related

Caching in a microservice with multiple replicas in k8s

I have a Golang-based microservice which has an in-memory cache as follows:
Create object -> Put it in cache -> Persist
Update object -> Update the cache -> Persist
Get -> Get it from the cache
Delete -> Delete cache entry -> Remove from data store.
On a service re-start, the cache is populated from the data store.
The cache organizes the data in different ways that matches my access patterns.
Note that one client can create the object, and other clients can update it at a later point in time.
Everything works fine as long as I've one replica. But, this pattern will break when I increase the replica count in my deployment.
If I have to go to the DB for each GET, it defeats the purpose of the cache. The first thought is, to move the cache out. But, this seems like a fairly common problem when moving to multi-replica microservices. So, curious to understand alternatives.
Thanks for your time.
Much of this depends on how you structure your application.
One common solution is to use Redis or another distributed cache. The advantage is that all of your services go to the same cache to manage objects, which gives you more consistent data.
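For example, a minimal sketch of the shared-cache idea (shown in C# with StackExchange.Redis; the connection string and key prefix are assumptions, and in real code you would reuse a single ConnectionMultiplexer):
using System.Text.Json;
using System.Threading.Tasks;
using StackExchange.Redis;

// Every replica reads and writes the same Redis instance instead of its own
// in-memory map, so all pods see the same object state.
public class SharedObjectCache
{
    private readonly IDatabase db =
        ConnectionMultiplexer.Connect("redis:6379").GetDatabase();

    public Task PutAsync<T>(string id, T value) =>
        db.StringSetAsync($"obj:{id}", JsonSerializer.Serialize(value));

    public async Task<T?> GetAsync<T>(string id)
    {
        string? json = await db.StringGetAsync($"obj:{id}");
        return json is null ? default : JsonSerializer.Deserialize<T>(json);
    }

    public Task DeleteAsync(string id) => db.KeyDeleteAsync($"obj:{id}");
}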
Another approach, somewhat more complex, is to use sharding.
For a Get operation, route the request to a specific instance based on the object's Id. That instance will have the object in its cache; if not, it reads it from the DB and puts it in its own cache. From then on, every request for that object goes to that instance. The same applies to Update and Delete operations.
For the Create operation:
If you want the DB to generate the Id automatically, the object is first created in the DB, which returns the Id; you then route by that Id, so the first access after creation hits the DB, but after that the object is served from that instance's cache.
If Ids can be generated manually, you can prefix the Id at creation time with something that maps to an instance.
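A minimal sketch of the Id-to-instance routing described above (in C#; the replica addresses and the hash choice are only illustrative, e.g. the stable pod names of a Kubernetes StatefulSet):
using System;
using System.Security.Cryptography;
using System.Text;

// Hash the object Id to a stable index so every request for the same Id
// lands on the same instance (and therefore the same local cache).
static class ShardRouter
{
    private static readonly string[] Replicas =
    {
        "cache-service-0.cache-service",
        "cache-service-1.cache-service",
        "cache-service-2.cache-service",
    };

    public static string InstanceFor(string objectId)
    {
        using var md5 = MD5.Create();
        byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(objectId));
        uint bucket = BitConverter.ToUInt32(hash, 0) % (uint)Replicas.Length;
        return Replicas[(int)bucket];
    }
}
Note that with plain modulo hashing, changing the replica count remaps most Ids; consistent hashing avoids most of that churn.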
Note: in distributed systems there is no single solution. You always have to decide which approach works for your scenario.

How to keep thread safe with multiple pods of a Spring Data Reactive Repository microservice

I have a microservice to wrap the access to a MongoDB (DaaS). It is implemented with Spring Data Reactive Repository (ReactiveMongoRepository).
I have deployed it within a Docker image, running on Kubernetes (in Google Cloud). I have configured the orchestrator to keep a minimum of 2 pods of my microservice (and a maximum of 4).
In another microservice, I have implemented a multithreaded batch process that calls my DaaS with the following sequence:
findById
Modify some fields (including increments of counters)
save
Here is the relevant code:
public Mono<Element> updateElement(String id) {
    return this.daasClient.findById(id)
        .map(elem -> modify(elem))
        .flatMap(elem -> this.daasClient.save(elem));
}
When there are lots of operations, each pod runs some of them (including the previous code), and I have seen that access to the resource (Mongo) is not thread safe, so the result is not as expected.
I guess 2 pods run the findById simultaneously, so the update is not applied to the "real" (latest) document, and the last one to invoke the save method overwrites the changes of the other.
Does anybody know how I could avoid this, i.e., implement this operation in a thread-safe (pod-safe) way?
Thanks

How do I seed default data to Mongo db (or any database) in a microservice architecture?

I have a use case where there are multiple microservices and one of them deals with roles and resources (let's call this microservice A). Resources are just endpoints.
A maintains a collection (let's call this X) to store all the resources from different microservices. For each microservice other than A, I would like to store all of its resources (endpoints) into X the first time that microservice boots up.
I am thinking of having a json file with all the resources in each microservice and calling A's endpoint to add resources whenever a microservice boots up.
Is there any idiomatic way to do this?
Consider using Viper so you can set default data from multiple different sources such as YAML, JSON, remote config stores like etcd, and live watching of files, among others. You can also configure the call to an endpoint with its remote configuration feature.
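Whichever config library you use, the boot-time registration itself can stay simple. A minimal sketch (in C#; the file name, endpoint URL and payload shape are hypothetical) of reading a bundled resource list and registering it with A on startup:
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;
using System.Threading.Tasks;

// On boot, read this service's endpoints from a bundled JSON file and
// register them with microservice A.
public static class ResourceSeeder
{
    public static async Task SeedAsync()
    {
        string json = await File.ReadAllTextAsync("resources.json");
        List<string>? resources = JsonSerializer.Deserialize<List<string>>(json);

        using var client = new HttpClient();
        // A's endpoint should be idempotent, i.e. ignore resources it already knows about.
        await client.PostAsJsonAsync("http://service-a/api/resources", resources);
    }
}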

Service Fabric Actors - save state to database

I'm working on a sample Service Fabric project, where I have to maintain a shopping list. For this I have a ShoppingList actor, which is identifiable by a specific id. It stores the current list content in its state using StateManager. All works fine.
However, in parallel I'd like to maintain the shopping list content in a sql database. In particular:
store all add/remove item requests for future analysis (ML)
on first actor initialization load list content from db (e.g. after cluster has been re-created)
What is the best approach to achieve that? Create a custom StateProvider (how? can't find examples)?
Or maybe have another service/actor for handling all db operations (possibly using queues and reminders)?
All examples seem to completely rely on default StateManager, with no data persistence to external storage, so I'm not sure what's the best practice.
The best way would be to have a separate entity responsible for storing data in the DB. The actor would just send an event (not implying SF events) with some data about the performed operation, and that other entity would catch it and do the rest of the work.
Of course, you can implement this in the actor itself, but that brings two possible issues:
The actor will not be able to process other requests if there are issues with the DB, with connectivity between the actor and the DB, or if the DB is under high load and handles requests slowly. The actor would have to wait until the transfer to the DB successfully completes.
Possible overloading of the DB with many individual connections from many actors, instead of one or a few connections from a separate entity doing batch insertion.
So your final solution will depend on the workload of your system. But if the data is too valuable to lose, you will definitely need a reliable queue to get it safely into the DB.
Also, I think you could use the default state manager to store logs and information about operations before they are transferred to the DB, and remove them from the actor's state once the transfer completes. There is no need to keep such data permanently in the service.
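As a rough sketch of that buffering idea (the IShoppingList interface and the DB client stub are hypothetical; only the actor, state manager and reminder calls are real Service Fabric APIs):
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Runtime;

// Hypothetical actor interface for the shopping list.
public interface IShoppingList : IActor
{
    Task AddItemAsync(string item);
}

[StatePersistence(StatePersistence.Persisted)]
internal class ShoppingListActor : Actor, IShoppingList, IRemindable
{
    public ShoppingListActor(ActorService actorService, ActorId actorId)
        : base(actorService, actorId) { }

    public async Task AddItemAsync(string item)
    {
        // Update the list itself in actor state.
        var items = await StateManager.GetOrAddStateAsync("items", new List<string>());
        items.Add(item);
        await StateManager.SetStateAsync("items", items);

        // Buffer the operation so it can be written to the DB later.
        var pending = await StateManager.GetOrAddStateAsync("pendingOps", new List<string>());
        pending.Add($"add:{item}");
        await StateManager.SetStateAsync("pendingOps", pending);

        // Make sure a reminder exists that will drain the buffer.
        await RegisterReminderAsync("flush", null, TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(30));
    }

    public async Task ReceiveReminderAsync(string reminderName, byte[] state, TimeSpan dueTime, TimeSpan period)
    {
        if (reminderName != "flush") return;

        var pending = await StateManager.GetOrAddStateAsync("pendingOps", new List<string>());
        if (pending.Count == 0) return;

        // Hypothetical DB client; replace with your own persistence code.
        await ShoppingListDbClient.WriteBatchAsync(this.Id.ToString(), pending);

        // Clear the buffer only after the DB write succeeded.
        await StateManager.SetStateAsync("pendingOps", new List<string>());
    }
}

// Stub standing in for the separate persistence code.
internal static class ShoppingListDbClient
{
    public static Task WriteBatchAsync(string listId, IReadOnlyCollection<string> ops) => Task.CompletedTask;
}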
Another thing to take into consideration is reading from the DB. If you have a relational database, insert new records into only one table, and have a huge number of actors querying that data on activation, you may see performance degradation, because the table will be locked for reading or writing unless you configure it to behave differently. So you will probably need a caching layer for reads during actor activation, depending on your workload.
As for implementing your own custom State Manager: take a look at this example. Basically, all you need to do is implement the IReliableStateManagerReplica interface and pass it to the StatefulService constructor.

Embedding custom metadata with Service Fabric application/service

The objective that I have is to run multiple applications with some metadata embedded into applications/services so that I could query applications/services using the metadata. Is this possible?
I was looking at the following post and the answer hints at this possibility, but no specific details on how to achieve the result.
The primary piece of "metadata" you get is the service/application instance name. That's what I talked about in my other post. The way that works is by creating each service/application instance with a name that contains some information clients can use when resolving them. Clients can then query Service Fabric for named application/service instances and connect to a specific one. A service/application instance name is a URI, so you can use a path hierarchy to categorize information.
Continuing with the audio/video example: Let's extend that example so we have an application that can perform specific tasks for specific media formats for audio or video. Each combination of task + media format is a unique named service instance, resulting in a deployment that looks something like this:
Application:
fabric:/avapp
Services:
fabric:/avapp/video/encoding/mp4
fabric:/avapp/video/encoding/h264
fabric:/avapp/video/captioning/english
fabric:/avapp/video/captioning/czech
fabric:/avapp/audio/encoding/aac
fabric:/avapp/audio/encoding/mp3
etc.
Now clients can query Service Fabric to discover what services are available:
FabricClient fabricClient = new FabricClient();
System.Fabric.Query.ServiceList services = await fabricClient.QueryManager.GetServiceListAsync(new Uri("fabric:/avapp"));
Then you can simply query the list of services with LINQ. For example, if I want to see all services that do video encoding:
services.Where(x => x.ServiceName.AbsolutePath.Contains("video/encoding"));
And then you can resolve an address for a specific service to connect to it:
ServicePartitionResolver resolver = ServicePartitionResolver.GetDefault();
ResolvedServicePartition servicePartition = await resolver.ResolveAsync(new Uri("fabric:/avapp/video/encoding/h264"), new ServicePartitionKey(1), cancellationToken);
ResolvedServiceEndpoint endpoint = servicePartition.GetEndpoint();
There's a bit more to the address resolution part (see here), but that's the general idea.
Application instances also allow you to set custom application parameters (key-value pairs) that can be set per instance at creation time. They don't show up in the application name, but you get that information back when you ask Service Fabric for a list of running application instances. That can potentially also be used as metadata by clients when they need to decide what application to connect to.
Update: More info on application instance parameters:
When you create a new application instance you can supply a set of key-value pairs in the application description. Then when you query Service Fabric for application instances you get back a list of Application result objects that have said parameters. This also shows up in Visual Studio, in your application project, where you have environment-specific application parameter files. Visual Studio extracts those key-value pairs from the XML files and uses them in the application description when it creates an instance of your application.
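A minimal sketch of that flow with FabricClient (the application type name, version, and parameter names are just examples):
FabricClient fabricClient = new FabricClient();

// Create an application instance with custom key-value parameters.
var description = new System.Fabric.Description.ApplicationDescription(
    new Uri("fabric:/avapp-tenant1"), "AvAppType", "1.0.0");
description.ApplicationParameters.Add("Tenant", "tenant1");
description.ApplicationParameters.Add("Region", "west-europe");
await fabricClient.ApplicationManager.CreateApplicationAsync(description);

// Later, clients can read those parameters back as metadata.
System.Fabric.Query.ApplicationList applications =
    await fabricClient.QueryManager.GetApplicationListAsync();
foreach (System.Fabric.Query.Application app in applications)
{
    foreach (System.Fabric.Description.ApplicationParameter parameter in app.ApplicationParameters)
    {
        Console.WriteLine($"{app.ApplicationName}: {parameter.Name} = {parameter.Value}");
    }
}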