Changing number of partitions for a reliable actor service - azure-service-fabric

When I create a new Service Fabric actor the underlying (auto generated) actor service is configured to use 10 partitions.
I'm wondering how much I need to care about this value?
In particular, I wonder whether the Actor Runtime has support for changing the number of partitions of an actor service on a running cluster.
The "Partition Service Fabric reliable services" topic says:
In rare cases, you may end up needing more partitions than you have initially chosen. As you cannot change the partition count after the fact, you would need to apply some advanced partition approaches, such as creating a new service instance of the same service type. You would also need to implement some client-side logic that routes the requests to the correct service instance, based on client-side knowledge that your client code must maintain.
However, given the nature of Actors and the fact that they are managed by the Actor Runtime, I'm tempted to believe this would indeed be possible -- that the Actor Runtime could take care of all the heavy lifting required to re-partition actor instances.
Is that at all possible?

The number of partitions in a running service cannot be changed. This is true of Actors as well as Reliable Services. Typically, you would want to pick a large number of partitions (more than the number of nodes) up front and then scale out the number of nodes in the cluster instead of trying to repartition your data on the fly. Take a look at Abhishek and Matthew's comments in the discussion here for some ideas on how to estimate how many partitions you might need.

Related

Kafka and parallel consumers: why is order important in a microservice architecture?

I started to dive into the Kafka ecosystem.
I was surprised to find out that, by default, each consumer digests only one "event" at a time, sequentially!
This follows from offset acknowledgement, the fact that the unit of parallelism is the partition, and some other details... you can find a nice write-up here.
If I need to consume received messages in parallel on my application node's thread pool, I have to put in some non-default development effort to get it.
On the other hand, several technologies ship their own recipes for this: Quarkus/SmallRye, Confluent's parallel-consumer, Spring, ...
I had hoped to find a by-default code configuration that provides it.
This suggests to me that perhaps some other technologies are more suitable for consuming messages straightforwardly...
Why isn't a parallel consumer provided by default in the client libraries?
Why is order important in a microservice architecture?
KafkaConsumer is a relatively low-level object that is basically capable of reading records from a given offset position, seeking to a particular offset, and reading and saving that offset in Kafka's internal store (the __consumer_offsets topic). Likewise, the receive API is fully synchronous, as its poll(Duration) signature suggests.
If more custom (e.g. asynchronous) behaviour is desired, you can use wrappers like parallel-consumer or spring-kafka.
When it comes to library design, it is often preferable to do only one thing (essentially the single responsibility principle, applied).
As an example, if the "main" library were asynchronous, the library providers would need to define thread creation and lifecycle semantics, what happens when there are no records (compare spring-kafka's listeners), and so on. By exposing only a low-level API, these concerns, which are not immediately relevant to Kafka itself, can be avoided.
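To make that concrete, here is a minimal poll loop in Scala against the plain Java client (broker address, topic, and group id are placeholder values). Note that everything, including the commit, happens synchronously on the calling thread; any parallelism beyond one thread per consumer is left entirely to you:

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer

object PollLoop {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group")
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("events"))
    try {
      while (true) {
        // poll(Duration) blocks the calling thread: the API itself is synchronous.
        val records = consumer.poll(Duration.ofMillis(500))
        for (record <- records.asScala)
          println(s"partition=${record.partition} offset=${record.offset} value=${record.value}")
        consumer.commitSync() // offsets are saved to the __consumer_offsets topic
      }
    } finally consumer.close()
  }
}
```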
Why isn't a parallel consumer provided by default in the client libraries?
Kafka clients form a largely pluggable ecosystem. The core developers focus on optimizing the server code, and the built-in client libraries (and serializers) work "well enough" (TM). Therefore, a by-default code configuration for parallel consumption simply doesn't exist.
Why is order important in a microservice architecture?
That depends entirely on your app, but one example is payment processing, or handling any sort of ledger system (after all, Kafka is a kind of distributed ledger). You cannot withdraw money without first depositing a balance. This is not unique to microservices.

Scaling Kafka: how is new event-processing capacity added dynamically?

To a large extent, getting throughput in a Kafka-based system rests on these degrees of freedom:
(highly recommended) messages should be share-nothing. If they are share-nothing, they can be randomly assigned to different partitions within a topic and processed independently of other messages
(highly recommended) the partition count per topic should be sized generously. More partitions per topic means a greater possible level of parallelism
(highly recommended) to avoid hotspots within a topic partition, the Kafka key may need to include time or some other varying data point, so that a single partition does not unintentionally get the majority of the work (see the sketch after this list)
(helpful) the processing time per message should be small when possible
https://dzone.com/articles/20-best-practices-for-working-with-apache-kafka-at mentions other items that fine-tune these principles
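As an illustration of the hotspot bullet, here is a hedged Scala sketch of salting the record key with a coarse time bucket (the topic name, broker address, and one-minute bucket are all invented for the example); the trade-off is that one client's records no longer share a single partition, so they lose a single total order:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object SaltedKeyProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    val producer = new KafkaProducer[String, String](props)

    // Keying on clientId alone would pin a hot client to one partition;
    // appending a one-minute time bucket spreads its records around.
    val clientId = "big-client"
    val bucket   = System.currentTimeMillis() / 60000
    producer.send(new ProducerRecord("events", s"$clientId-$bucket", """{"clientId":"big-client"}"""))
    producer.flush()
    producer.close()
  }
}
```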
Now suppose that an otherwise healthy system gets a lot of new work. For example, a new and large client may be added mid-day, or an existing client may need to onboard a new account, adding zillions of new events. How do we scale horizontally, adding new capacity for this work?
If the messages are truly share-nothing throughout the entire system --- I have a data pipeline of services where A gets a message, processes it, and publishes a new message to another service B, and so on --- then adding new capacity to the system could be as easy as sending a message on a separate administration topic telling the consumer task(s) to spin up new threads, as sketched below. Then, so long as the number of partitions in the topic(s) is not a bottleneck, one would indeed have added new processing capacity.
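In sketch form, that administration topic might look like this (Scala; the `workers=N` message format and the resize protocol are invented for illustration):

```scala
import java.time.Duration
import java.util.concurrent.ThreadPoolExecutor
import scala.jdk.CollectionConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

object AdminChannel {
  // Hypothetical: `adminConsumer` is subscribed to the administration topic and
  // `pool` executes message handlers; a record such as "workers=8" resizes it.
  def applyAdminCommands(adminConsumer: KafkaConsumer[String, String],
                         pool: ThreadPoolExecutor): Unit =
    for (record <- adminConsumer.poll(Duration.ofMillis(100)).asScala)
      if (record.value.startsWith("workers=")) {
        val target = record.value.stripPrefix("workers=").toInt
        // Order matters: the max pool size must never drop below the core size.
        if (target >= pool.getMaximumPoolSize) {
          pool.setMaximumPoolSize(target); pool.setCorePoolSize(target)
        } else {
          pool.setCorePoolSize(target); pool.setMaximumPoolSize(target)
        }
      }
}
```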
This approach is certainly doable but is still suboptimal in these respects:
Work on different clientIds is definitely share-nothing. Merely adding new threads gets through the work faster, but any new work would interleave behind and within the existing clients' work. Had a new topic been available, with a new pub/sub process pair(s), the new work could have been done in parallel, given spare cluster capacity for the new topic(s)
In general, share-nothing work may not always be possible at every step in a data pipeline. If ordering is ever required, adding new subscriber threads can deliver messages out of order for a given (topic, partition). This happens when there are M partitions in a topic but more than M subscriber threads. I have one such order-sensitive case. It's worth noting, then, that ordering effectively means at most one subscriber thread per partition, so sizing partitions may be even more important
The sysadmin may not allow tasks to add topics at runtime
Even if adding topics at runtime is possible, system orchestration is required to tell the various producers that a clientID is no longer associated with the old topic T but rather with T'. Work in progress on T should be flushed before T' is used
How does the Kafka community deal with adding capacity at runtime, or is this day-dreaming? Dynamic, elastic horizontal capacity seems to broadly center on these principles:
have spare capacity on your cluster
have extra unused topics for greater parallelism; create them at runtime, or pre-create them and leave them unused if sysadmins don't allow dynamic creation
equip the system so that events for a given clientID can be intercepted before they enter the pipeline and deferred to a special queue; know when the clientID's existing events have flushed through the system; then update the config(s) so that the held/deferred events, and any events for new clients, go to the new topic
Telling consumers to spin up more listeners
Dynamically adding more partitions? (I doubt that's possible or practical)
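On that last point: growing (never shrinking) a topic's partition count at runtime is in fact supported through Kafka's AdminClient, with the caveat that the key-to-partition mapping changes for new records, breaking per-key ordering across the resize. A minimal sketch (broker address, topic name, and target count are placeholders):

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewPartitions}

object GrowTopic {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    val admin = AdminClient.create(props)
    try {
      // Raise the partition count of "events" to 12 (must exceed the current count).
      admin
        .createPartitions(Collections.singletonMap("events", NewPartitions.increaseTo(12)))
        .all()
        .get() // block until the controller has applied the change
    } finally admin.close()
  }
}
```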

High Scalability Question: How to sync data across multiple microservices

I have the following use cases:
Assume you have two microservices, AccountManagement and ActivityReporting, that process event U.
When a user registers, event U, containing the user information, will be published onto a broker for the two microservices to process.
The AccountManagement and ActivityReporting microservices are each replicated across two instances for performance and scalability reasons.
Each microservice instance has a consumer listening on the broker topic. The choice of a topic is so that both AccountManagement and ActivityReporting can process U concurrently.
However, I want only one instance of AccountManagement to process event U, and only one instance of ActivityReporting to process event U.
Please share your experience implementing a consume-once-per-application-group broker system, as this would effectively solve the problem.
If all your consumer listeners, even ones from different instances, have the same group.id property, then only one of them will receive each message. You set this property when you initialise the consumer. So in your case you need one group.id for AccountManagement and another for ActivityReporting.
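A minimal sketch of that setup in Scala (broker address, topic, and group names are mine, not from the question):

```scala
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer

object GroupPerService {
  // Hypothetical helper: one consumer group per microservice.
  def consumerFor(service: String): KafkaConsumer[String, String] = {
    val props = new Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    props.put(ConsumerConfig.GROUP_ID_CONFIG, service)
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, classOf[StringDeserializer].getName)
    new KafkaConsumer[String, String](props)
  }

  def main(args: Array[String]): Unit = {
    // Both AccountManagement instances join the same group, so each event U
    // is delivered to exactly one of them; ActivityReporting, in its own
    // group, independently receives its own copy of U.
    val accountMgmt       = consumerFor("account-management")
    val activityReporting = consumerFor("activity-reporting")
    accountMgmt.subscribe(Collections.singletonList("user-events"))
    activityReporting.subscribe(Collections.singletonList("user-events"))
  }
}
```

Within a group, the topic's partitions are divided among the group's instances, so the topic needs at least as many partitions as each service has instances for all of them to receive work.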
I would recommend Cadence Workflow, which is a much more powerful solution for microservice orchestration.
It offers a lot of advantages over using queues for your use case.
Built-in exponential retries with an unlimited expiration interval
Failure handling. For example, it allows executing a task that notifies another service if both updates couldn't succeed within a configured interval.
Support for long-running, heartbeating operations
Ability to implement complex task dependencies. For example, chaining of calls, or compensation logic in case of unrecoverable failures (SAGA)
Gives complete visibility into the current state of the update. For example, when using queues, all you know is whether there are some messages in a queue, and you need an additional DB to track overall progress. With Cadence, every event is recorded.
Ability to cancel an update in flight.
See the presentation that goes over the Cadence programming model.

Distributed DDD entities with Akka

Suppose we'd have a large number of persistent Person actors, each constructed with an identity and a name argument. What would be the best way to distribute these actors in a cluster, in such a manner that:
new actors are appointed a node by strategy X (round robin, consistent hash, etc.)
a "coordinator" actor contains a mapping from identity to ActorRef
one or more nodes can fail and the affected actors are recovered on other nodes
there is no SPOF (single point of failure)
I've considered the following, which don't seem to solve the problem:
Cluster sharding; all actors are initialised identically and created by a coordinator
Cluster-aware routing; groups and pools are fixed-size and can't be modified dynamically
Sounds like you are pretty much exactly describing Akka Cluster Sharding, and there isn't enough information to see why it would not fit.
The common solution to such a design problem is to have an uninitialized state of the sharded entity, in which it only accepts an initialize command containing the needed values (something like CreateUser(id, name)); when it receives that, it toggles to its "normal" behavior.
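A minimal sketch of that two-phase entity with Akka Typed (the CreateUser and GetName messages are invented for illustration, and the ClusterSharding wiring is omitted):

```scala
import akka.actor.typed.{ActorRef, Behavior}
import akka.actor.typed.scaladsl.Behaviors

object Person {
  sealed trait Command
  // Hypothetical initialize command carrying the would-be constructor arguments.
  final case class CreateUser(id: String, name: String) extends Command
  final case class GetName(replyTo: ActorRef[String]) extends Command

  // The sharded entity starts uninitialized and only accepts CreateUser...
  def apply(): Behavior[Command] =
    Behaviors.receiveMessage {
      case CreateUser(id, name) => initialized(id, name)
      case _                    => Behaviors.same // or stash until initialized
    }

  // ...then toggles to its "normal" behavior with the values in scope.
  private def initialized(id: String, name: String): Behavior[Command] =
    Behaviors.receiveMessage {
      case GetName(replyTo) =>
        replyTo ! name
        Behaviors.same
      case _: CreateUser =>
        Behaviors.same // already initialized; ignore re-creates
    }
}
```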
Another option could be to introduce an intermediate actor that doesn't start the actual actor until it has extracted the name value, if you have no means of changing the design of your Person actor.
Of course, you could also drop down to the Akka cluster APIs directly and build something that exactly matches your use case, but handling redistribution on cluster topology changes (adding or removing nodes, etc.) is far from trivial to get right.
I think you would also come to the realisation that building such a tool so that it is entirely non-invasive for your entities, without the sharding solution being tightly coupled to your business logic, is very hard.

Akka and state among actors in cluster

I am working on my BSc thesis project, which is a Minecraft server written in Scala and Akka. The server should be easily deployable in the cloud or onto a cluster (I'm not sure whether I'm using the proper terminology... it should run on multiple nodes). I am, however, a newbie in Akka, and I have been wondering how to implement such a thing. The problem I'm trying to figure out right now is how to share state among actors on different nodes.
My first idea was to have a Camel actor that would read the TCP stream from Minecraft clients and then send it to a load balancer, which would select a node to process the request and then send some response back to the client via TCP. Let's say I have an actor implementing an AuthenticationService that checks whether the credentials provided by a user are valid. Every node would have such an actor (or perhaps more of them), and all the actors should have exactly the same database (or state) of users at all times.
My question is: what is the best approach to keeping this state? I have come up with some solutions I could think of, but I haven't done anything like this before, so please point out the faults:
Solution #1: Keep the state in a database. This would probably work very well for the authentication example, where the state is only represented by something like a list of usernames and passwords, but it probably wouldn't work in cases where the state contains objects that can't easily be broken down into integers and strings.
Solution #2: Every time there is a request to a certain actor that would change its state, the actor will, after processing the request, broadcast information about the change to all other actors of the same type, which would then change their state according to the info sent by the original actor. This seems very inefficient and rather clumsy.
Solution #3: Have a certain node serve as a sort of state node, holding actors that represent the state of the entire server. Every other actor would have no state and would ask the actors in the "state node" every time they needed some data. This also seems inefficient and hardly fault-tolerant.
So there you have it. The only solution I actually like is the first one, but as I said, it probably works for only a very limited subset of problems (when the state can be broken down into Redis structures). Any response from more experienced gurus would be very appreciated.
Regards, Tomas Herman
Solution #1 could possibly be slow. Also, it is a bottleneck and a single point of failure (meaning the application stops working if the node with the database fails). Solution #3 has similar problems.
Solution #2 is less trivial than it seems. First, it is a single point of failure. Second, there are no atomicity or other ordering guarantees (such as regularity) for reads or writes, unless you do a total order broadcast (which is more expensive than a regular broadcast). In fact, most distributed register algorithms will do broadcasts under-the-hood, so, while inefficient, it may be necessary.
From what you've described, you need atomicity for your distributed register. What do I mean by atomicity? Atomicity means that any read or write in a sequence of concurrent reads and writes appears as if it occurs at a single point in time.
Informally, in Solution #2 with a single actor holding a register, this guarantees that if two subsequent writes, W1 and then W2, occur on the register (meaning two broadcasts), then no other actor reading values from the register will read them in an order other than first W1 and then W2 (it's actually more involved than that). If you walk through a couple of examples of subsequent broadcasts where the messages arrive at their destinations at different points in time, you will see that such an ordering property isn't guaranteed at all.
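To make the hazard concrete, here is a small self-contained Scala simulation (plain threads and queues stand in for actors and the network; all names are invented). Two writes are each broadcast to two replicas, and because per-destination delivery delays vary, the replicas frequently observe W1 and W2 in different orders:

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.util.Random

object BroadcastOrderDemo {
  def main(args: Array[String]): Unit = {
    // Two replicas, each with its own mailbox.
    val replicaA = new ConcurrentLinkedQueue[String]()
    val replicaB = new ConcurrentLinkedQueue[String]()

    // A plain broadcast: one message per destination, each with its own delay.
    def broadcast(write: String): Unit =
      for (mailbox <- Seq(replicaA, replicaB))
        new Thread(() => {
          Thread.sleep(Random.nextInt(20)) // simulated network delay
          mailbox.add(write)
        }).start()

    broadcast("W1")
    broadcast("W2")
    Thread.sleep(100) // let all deliveries land

    // Frequently prints the writes in different orders on the two replicas.
    println(s"replica A saw: ${replicaA.toArray.mkString(", ")}")
    println(s"replica B saw: ${replicaB.toArray.mkString(", ")}")
  }
}
```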
If ordering guarantees or atomicity aren't an issue, some sort of a gossip-based algorithm might do the trick to slowly propagate changes to all the nodes. This probably wouldn't be very helpful in your example.
If you want something fully fault-tolerant and atomic, I recommend reading this book on reliable distributed programming by Rachid Guerraoui and Luís Rodrigues, or at least the parts related to distributed register abstractions. These algorithms are built on top of a message-passing communication layer and maintain a distributed register supporting read and write operations. You can use such an algorithm to store distributed state information. However, they aren't applicable to thousands of nodes or large clusters because they do not scale, typically having complexity polynomial in the number of nodes.
On the other hand, you may not need the state of the distributed register replicated across all of the nodes. Replicating it across a subset of your nodes (instead of just one node) and accessing those to read or write from it provides a certain level of fault-tolerance (the register information is lost only if the entire subset of nodes fails). You can possibly adapt the algorithms in the book to serve this purpose.