Blue/Green deployment for service that using Kafka in docker swarm - apache-kafka

Goal: to organize blue/green deployment for a particular Spring Boot service that uses Kafka. (I'm not interested in how to solve B/G at the REST or DB layer, so let's assume that part of B/G is already handled at the LB.)
I want: to run two instances of the service in a Docker Swarm cluster simultaneously. But one of them must be in sleep mode, i.e. it must not produce or consume any messages.
Biggest problem: setting scale = 2 for my service is not a big deal. However, in that case each instance of the service will consume events and process them, which leads to disaster. So I need a simple and transparent mechanism for turning off all of the service's producers and consumers and restoring them at a specific offset.
I'm looking for an example or suggestions on how to achieve that.
Current idea: store the current offsets in ZooKeeper and write a custom layer that will poll those configs and manage consumers and producers based on them. However, I believe some better and simpler way/framework exists.

There are several approaches:
1. Pause and resume - https://docs.spring.io/spring-kafka/reference/html/_reference.html#pause-resume
2. Split each topic into two topics, active and non-active, and switch between them
3. If you use a public cloud - https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-kafka-on-aws/
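A minimal sketch of the pause/resume idea in plain Java, with no Kafka dependency: each instance holds a switch, and every produce path checks it before doing work. The names ActivationGate and GatedProducer are hypothetical, and the in-memory list only stands in for a broker. In a real Spring Kafka service the consumer side would more likely be handled by pausing and resuming the listener containers, as the linked reference describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical switch shared by all Kafka-facing code in one service instance. */
class ActivationGate {
    // Every instance starts asleep; the deployment flips exactly one to active.
    private final AtomicBoolean active = new AtomicBoolean(false);

    void activate() { active.set(true); }
    void deactivate() { active.set(false); }
    boolean isActive() { return active.get(); }
}

/** Hypothetical producer wrapper that refuses to send while the instance sleeps. */
class GatedProducer {
    private final ActivationGate gate;
    final List<String> sent = new ArrayList<>(); // stand-in for the real broker

    GatedProducer(ActivationGate gate) { this.gate = gate; }

    /** Returns true if the message was handed to the (stand-in) broker. */
    boolean send(String message) {
        if (!gate.isActive()) {
            return false; // sleeping instance: produce nothing
        }
        sent.add(message);
        return true;
    }
}

class BlueGreenGateDemo {
    public static void main(String[] args) {
        ActivationGate gate = new ActivationGate();
        GatedProducer producer = new GatedProducer(gate);

        System.out.println(producer.send("ignored"));   // false
        gate.activate();
        System.out.println(producer.send("delivered")); // true
        System.out.println(producer.sent);              // [delivered]
    }
}
```

How the switch gets flipped (an admin endpoint, a polled config value) is left open here; the sketch only shows that a single flag per instance is enough to silence it.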

Related

Hazelcast IMap Lock not working on kubernetes across different pods

We are using Hazelcast 4 to implement distributed locking across two pods on Kubernetes.
We have developed a distributed application, and two pods of the microservice have been created. Both instances are auto-discovered and form a cluster as members.
We are trying to use the IMap.lock(key) method to achieve distributed locking across the two pods. However, both pods acquire the lock at the same time and therefore execute the business logic concurrently. Also, Hazelcast Management Center shows zero locks for the created IMap.
Can you please help with how to synchronize IMap.lock(key) so that only a single pod holds the lock for a given key at any given point in time?
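Code snippet:
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
IMap<Object, Object> map = client.getMap("customers");
// Acquire the lock before entering try, so finally never unlocks a lock that was not taken.
map.lock(key);
try {
    // business logic
} finally {
    map.unlock(key);
}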
Can you create an MCVE and confirm the version of Hazelcast used, please?
There are tests for locks here that you could use as a starting point to simplify the case and determine where the fault lies.

Number of runtime instances (pods) on Stream deployment on Spring Cloud Dataflow

We are currently busy with a new project where we want to introduce SCDF, but we are running into one major issue and were wondering whether you have faced a similar issue and how you solved it.
What we saw is that, for every stream we create in SCDF, the deployment (on Kubernetes) creates separate instances of the microservices per stream. So if microservice A is used in 3 different streams, at runtime we have 3 instances of microservice A. In our solution we have a lot of reusable microservices, but if SCDF instantiates these microservices per stream we end up running almost 400 instances (pods) in production, and if we scale on top of this, we use an enormous amount of resources. We need to find a way to share pods (instances) across streams.
Did you face this issue? If yes, what was your approach to it?
There are a couple of ways to reduce the number of pods.
Use function composition. All of the prepackaged apps are now function based, meaning you can combine functions into a single source, sink or processor app. The SCDF stream definition requires at least a source and a sink, but the out-of-the-box functions are designed to be reused in custom apps, which may apply the functions to implement intermediate steps as necessary. Bear in mind that composed functions process data in memory, eliminating the messaging middleware used to stream data between separate pods. This could make your app more susceptible to data loss. There are always trade-offs.
Use named destinations: You may share parts of a streaming pipeline using named destinations. This allows you to fan-in or fan-out. In this example, 3 stream definitions enable 2 sources to feed a shared processor and sink.
source1 > :my-named-destination
source2 > :my-named-destination
:my-named-destination > processor1 | sink1
The commercial edition of SCDF supports stream definitions using custom components that implement multiple inputs/outputs. This gives you options similar to the above, where custom routing logic is implemented internally.
You can deploy a custom task in place of a stream if appropriate for your use case. The task may incorporate out-of-the-box functions and function composition as needed.
An important consideration when combining components is increased coupling and dependencies among pipeline steps. Simple linear processing creates more pods but is much simpler to implement, deploy, manage, and reason about.
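The function composition option can be illustrated in plain Java. This is only a sketch: the two steps (TRIM and UPPERCASE) are hypothetical stand-ins for processors that would otherwise run as separate pods, and the composition is done directly with java.util.function.Function rather than through any SCDF or Spring Cloud Function configuration.

```java
import java.util.function.Function;

class ComposedProcessorDemo {
    // Two hypothetical processing steps that would otherwise be separate apps.
    static final Function<String, String> TRIM = String::trim;
    static final Function<String, String> UPPERCASE = String::toUpperCase;

    // Composition runs both steps in one JVM, with no broker hop in between.
    // The intermediate value lives only in memory, which is the data-loss
    // trade-off mentioned above.
    static final Function<String, String> COMPOSED = TRIM.andThen(UPPERCASE);

    public static void main(String[] args) {
        System.out.println(COMPOSED.apply("  order-42  ")); // ORDER-42
    }
}
```

One pod now does the work of two, at the cost of losing the middleware buffer between the steps.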

Stateful or Stateless service for processing servicebus queues

I have a session-enabled Azure Service Bus queue. I need some form of service that can read messages from the queue, process them, and save the result (in memory, for later retrieval). We are using Azure Service Fabric in our current architecture. I have a few questions about which one to choose: a Stateful or a Stateless service.
If I use a Stateful service, then based on the documentation my understanding is that the service will run on 1 primary node (assuming 1 partition) and 2 active secondary nodes. That means that if I have a 10-node Service Fabric cluster, this stateful service will primarily utilize only one node (VM).
So if I add a listener to this stateful service to read messages from the queue, then the service on the primary node will read messages from the queue and the remaining 9 nodes won't be utilized. Is this correct?
Whereas if I use a Stateless service, I can create instances on all 10 nodes, and all of them could listen to the messages in the queue and process them in parallel. However, I will lose the option to save the results.
Please advise.
"So if I add a listener to this stateful service to read messages from the queue, then the service on the primary node will read messages from the queue and the remaining 9 nodes won't be utilized. Is this correct?"
That is correct. In the stateful service scenario, only the primary replica will have its listener executed, and only there will work be done. Other replicas can be used in read-only mode, but they would not be writing anything into reliable collections.
"Whereas if I use a Stateless service, I can create instances on all 10 nodes, and all of them could listen to the messages in the queue and process them in parallel."
Exactly. Stateless services can perform their work in parallel, and no state is persisted. That's also the reason why there's no reliable collection available for this Service Fabric model.
"However, I will lose the option to save the results."
Not necessarily true. You could still save your data in a centralized/shared DB, just like you'd do with stateless solutions in the past (for example Cloud Services, or an Azure Web App).
What you should ask yourself is what problem you are solving. If you have data sharding, then Stateful makes more sense. If you don't have data sharding and/or you need to scale out your processing power rather than scale up, Stateless is a better approach.

How to run something on each node in service fabric

In a Service Fabric application, using Actors or Services, what would the design be if you wanted to make sure that your block of code is run on each node?
My first idea is that it would have to be a Service with instance count set to -1, but what about cases where you had set it to 3 instances? How would you design it so that the service ensured it ran some operation on each instance?
My own idea would be to have an Actor with state controlling the operations that need to run, which would iterate over the services using ServiceProxy to call methods on each instance. But that's just a naive idea, and I don't know whether it's possible or whether it is the proper way to do it.
Some background info
Only Stateless services can be given a -1 for instance count. You can't use a ServiceProxy to target a specific instance.
Stateful services are deployed using 1 or more partitions (data shards). Partition count is configured in advance, as part of the service deployment and can't be changed automatically. For instance if your cluster is scaled out, partitions aren't added automatically.
Autonomous workers
Maybe you can invert the control flow by running Stateless services (on all nodes) and have them query a 'repository' for work items. The repository could be a Stateful service, that stores work items in a Queue.
This way, adding more instances (scaling out the cluster) increases throughput without code modification. The stateless service instances become autonomous workers.
(as opposed to an intelligent orchestrator Actor)
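The inverted control flow can be sketched in plain Java, outside Service Fabric. Everything here is an assumption made for illustration: WorkRepository stands in for the Stateful service holding the queue, and each thread stands in for one Stateless service instance on a node.

```java
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the Stateful "repository" service. */
class WorkRepository {
    private final BlockingQueue<String> items = new LinkedBlockingQueue<>();

    void add(String item) { items.add(item); }

    /** Returns the next work item, or null when no work is left. */
    String poll() { return items.poll(); }
}

class AutonomousWorkers {
    /** Starts workerCount workers that pull items until the repository is empty. */
    static Queue<String> drain(WorkRepository repo, int workerCount) {
        Queue<String> processed = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        for (int w = 0; w < workerCount; w++) {
            final int id = w;
            pool.execute(() -> {
                String item;
                // Each worker asks for work itself; nobody pushes work to it.
                while ((item = repo.poll()) != null) {
                    processed.add("worker-" + id + ":" + item);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return processed;
    }

    public static void main(String[] args) {
        WorkRepository repo = new WorkRepository();
        for (int i = 0; i < 20; i++) {
            repo.add("item-" + i);
        }
        System.out.println(drain(repo, 3).size()); // 20
    }
}
```

Changing the worker count changes throughput without touching the worker code, which is the point of the design: the repository never needs to know how many instances exist.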

How can I reach a specific replica of a stateless service

I've created a stateless service within Service Fabric. It has a SingletonPartition, but multiple instances (InstanceCount is -1 in my case).
I want to communicate with a specific replica of this service. To find all replicas I use:
var fabricClient = new FabricClient();
var serviceUri = new Uri(SERVICENAME);
Partition partition = (await fabricClient.QueryManager.GetPartitionListAsync(serviceUri)).First();
foreach(Replica replica in await fabricClient.QueryManager.GetReplicaListAsync(partition.PartitionInformation.Id))
{
// communicate with this replica, but how to construct the proxy?
//var eventHandlerServiceClient = ServiceProxy.Create<IService>(new Uri(replica.ReplicaAddress));
}
The problem is that there is no overload of the ServiceProxy to create one to the replica. Is there another way to communicate with a specific replica?
Edit
The scenario we are building is the following. We have different moving parts with counter information: 1 named partitioned stateful service (with a couple of hundred partitions), 1 int64 partitioned stateful service, and 1 actor with state. To aggregate the counter information, we need to reach out to all service-partitions and actor-instances.
We could of course reverse it and let everyone send their counts to a single (partitioned) service. But that would add a network call in the normal flow (and thus overhead).
Instead, we came up with the following. The mentioned services and actors are combined into one executable and one service manifest, so they run in the same process. We add a stateless service with instance count -1 alongside them. All counter information is stored in a static variable, and the stateless service can read this counter information.
Now, we only need to reach out to the stateless service (which has an upper limit of the number of nodes).
Just to get some terminology out of the way first, "replica" only applies to stateful services where you have a unique replica set for each partition of a service and replicate state between them for HA. Stateless services just have instances, all of which are equal and identical.
Now to answer your actual question: ServiceProxy doesn't have an option to connect to a specific instance of a deployed stateless service. You have the following options:
Primary replica: connect to the primary replica of a stateful service partition.
Random instance: connect to a random instance of a stateless service.
Random replica: connect to a random replica - regardless of its role - of a stateful service partition.
Random secondary replica: connect to a random secondary replica of a stateful service partition.
E.g.:
ServiceProxy.Create<IMyService>(serviceUri, partitionKey, TargetReplicaSelector.RandomInstance)
So why no option to connect to a specific stateless service instance?
Well, I would turn this question around and ask why would you want to connect to a specific stateless service instance? By definition, each stateless instance should be identical. If you are keeping some state in there - like user sessions - then now you're stateful and should use stateful services.
You might think of intelligently deciding which instance to connect to for load balancing, but again since it's stateless, no instance should be doing more work than any other as long as requests are distributed evenly. And for that, Service Proxy has the random distribution option.
With that in mind, if you still have some reason to seek out specific stateless service instances, you can always use a different communication stack - like HTTP - and do whatever you want.
"Well, I would turn this question around and ask why would you want to connect to a specific stateless service instance?"
One example would be if you have multiple (3x) stateless service instances, each holding WebSocket connections to different clients, let's say 500 each, and you want to send the same message to all 1500 (500x3) users. If it were possible to connect directly to a specific instance (which I would expect to be possible, since I can query for those instances using the FabricClient), I could send the message to each instance, which would forward it to all of its connected clients.
Instead we have to come up with any of multiple workarounds:
Have all instances connect to some evented system that allows them to trigger on incoming message, e.g. Azure Event Hubs, Azure Service Bus, RedisCache.
Host an additional endpoint, as mentioned here, which makes it 3 endpoints per service instance: WCF, WebSocket, HTTP.
Change to a stateful partitioned service that doesn't hold any state or have any replicas, but simply allows calling partitions.
We are currently having some serious issues with RedisCache, so we are migrating away from it, and we would like to avoid external dependencies such as Event Hubs and Service Bus just for this scenario.
We are sending many messages each second, so an extra HTTP call adds overhead, and the request then needs to transition over to the WebSocket context.
In order to target a specific instance of a stateless service, you can use named partitions. You can have a single instance per partition and use multiple named partitions. For example, you can have 5 named partitions [0,1,2,3,4], each of which has only one instance of the "service". Then you can call it like this:
ServiceProxy.Create<IMyService>(serviceUri, partitionKey, TargetReplicaSelector.RandomInstance)
where the partitionKey parameter has one of the values [0,1,2,3,4].
A real example would be:
_proxyFactory.CreateServiceProxy<IMyService>(
_myServiceUri,
new ServicePartitionKey("0"), // One of "0,1,2,3,4"
TargetReplicaSelector.Default,
MyServiceEndpoints.ServiceV1);
This way you can choose one of the 5 instances. But all 5 instances may not always be available, for example during startup, or when the service dies and SF is recreating it, or while it is in the InBuild stage. For this reason you should run partition discovery.
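As a rough illustration of why availability matters when addressing instances by partition key, here is a plain Java sketch of broadcasting to every named partition while tolerating ones that are not ready. It does not use the Service Fabric API: PartitionCall is a hypothetical stand-in for a per-partition proxy call, and throwing an exception is an assumed way of modelling an unavailable partition.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a proxy call addressed to one named partition. */
interface PartitionCall {
    void send(String partitionKey, String message) throws Exception;
}

class PartitionBroadcast {
    /** Sends the message to every key and returns the keys that could not be reached. */
    static List<String> broadcast(List<String> partitionKeys, String message, PartitionCall call) {
        List<String> failed = new ArrayList<>();
        for (String key : partitionKeys) {
            try {
                call.send(key, message);
            } catch (Exception e) {
                // The partition may be starting up or being rebuilt; remember it for a retry.
                failed.add(key);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("0", "1", "2", "3", "4");
        List<String> failed = broadcast(keys, "hello", (key, msg) -> {
            if (key.equals("2")) {
                throw new IllegalStateException("partition not ready");
            }
        });
        System.out.println(failed); // [2]
    }
}
```

The returned list is what a caller would retry after re-running partition discovery, rather than assuming the fixed set of keys is always reachable.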