How to count discarded entities in a FIFO queue using Simulink? - matlab

I'm trying to model a single-queue, single-server simulation using Simulink in MATLAB. I've only recently installed it and I'm pretty new to it.
I've created a Time-Based Entity Generator (with exponentially distributed interarrival times), a FIFO Queue with a capacity of 50 entities, and a Single Server with an exponential service time, as shown in this image:
I wonder how I can count the number of entities that are generated but can't get into the FIFO Queue because it is full (it already holds 50 entities), and discard them.

This will probably not help you anymore, but I found a solution to this problem and thought I would share it for future reference. The way to solve it is to use an Output Switch block with 2 output ports. Connect the first to your FIFO queue and the second to a sink (or wherever you want the discarded entities to go), and select "First port that is not blocked" as the switching criterion. Picture here: http://i.imgur.com/qxmQS4s.png. Cheers!
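For anyone who wants to sanity-check the block diagram numerically, here is a minimal sketch of the same logic in plain Scala (purely illustrative, since the actual model is built from SimEvents blocks, and the rates here are arbitrary): arrivals that find the FIFO at capacity are counted as discarded instead of entering the queue, which is exactly what routing the Output Switch's second port to a sink achieves.

```scala
import scala.collection.mutable
import scala.util.Random

// Single queue, single server: entities arriving while the FIFO already
// holds `capacity` entities are counted as discarded (the "second port").
object BoundedFifoSim extends App {
  val capacity = 50
  val rng      = new Random(42)
  def exp(mean: Double): Double = -mean * math.log(1.0 - rng.nextDouble())

  val waiting      = mutable.Queue.empty[Double] // arrival times of queued entities
  var clock        = 0.0                         // current simulation time
  var serverFreeAt = 0.0                         // when the server finishes its current entity
  var served, discarded = 0

  for (_ <- 1 to 100000) {
    clock += exp(mean = 1.0)                     // exponential interarrival time
    // Entities whose service could start before this arrival have left the queue.
    while (waiting.nonEmpty && math.max(serverFreeAt, waiting.head) <= clock) {
      val start = math.max(serverFreeAt, waiting.dequeue())
      serverFreeAt = start + exp(mean = 0.95)    // exponential service time
      served += 1
    }
    if (waiting.size < capacity) waiting.enqueue(clock)
    else discarded += 1                          // FIFO full: entity is turned away
  }
  println(s"served=$served discarded=$discarded still queued=${waiting.size}")
}
```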

Related

Tracking an expected set of Kafka events

Say I have N cities and each will report its temperature for the hour (H) by producing Kafka events. I have a complex model I want to run, but I want to ensure it doesn't attempt to kick off before all N reports are read.
Say the reports are produced in batches. I understand that, to ensure at-least-once consumption, if a consumer fails mid-batch it will pick up again at the front of the batch. I have built this into my model by counting unique cities (if a city is sent multiple times, it overwrites the existing record).
My current plan is to set it up as follows:
An application creates an initial event which says "Expect these N cities to report for H o'clock".
The events are persisted (in a DB, Redis, etc.) by another application. After writing, it produces an event which states how many unique cities have reported in total so far for H.
Some process matches the initial "Expect N" events with the "N written" events. When the two are equal, it alerts the rest of the system that the data set for H is ready for building the model.
Does this problem have a name and are there common patterns or libraries available to manage it?
Does the solution as outlined have glaring holes or overcomplicate the issue?
What you're describing sounds like an Aggregator, described in Gregor Hohpe and Bobby Woolf's "Enterprise Integration Patterns" as:
a special Filter that receives a stream of messages and identifies messages that are correlated. Once a complete set of messages has been received [...], the Aggregator collects information from each correlated message and publishes a single, aggregated message to the output channel for further processing.
This could be done on top of Kafka Streams, using its built-in aggregation, or with a stateful service like you suggested.
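A minimal sketch of the stateful-service variant in plain Scala (CityReport, HourAggregator, and onReady are illustrative names, not from any library): track the unique cities seen per hour, let redelivered events overwrite earlier records, and fire a readiness callback once the set of reporters covers the expected one.

```scala
import scala.collection.mutable

// Correlates "expect these cities for hour H" with per-city reports and
// signals readiness exactly once when all expected cities have reported.
final case class CityReport(hour: Int, city: String, temperature: Double)

final class HourAggregator(onReady: Int => Unit) {
  private val expected = mutable.Map.empty[Int, Set[String]]
  private val received = mutable.Map.empty[Int, mutable.Map[String, Double]]
  private val done     = mutable.Set.empty[Int]

  def expect(hour: Int, cities: Set[String]): Unit = {
    expected(hour) = cities
    checkReady(hour)
  }

  def report(r: CityReport): Unit = {
    // At-least-once redelivery is harmless: a repeated city overwrites itself.
    val byCity = received.getOrElseUpdate(r.hour, mutable.Map.empty[String, Double])
    byCity(r.city) = r.temperature
    checkReady(r.hour)
  }

  private def checkReady(hour: Int): Unit =
    for (cities <- expected.get(hour))
      if (!done(hour) && received.get(hour).exists(m => cities.subsetOf(m.keySet))) {
        done += hour
        onReady(hour) // the data set for hour H is complete; kick off the model
      }
}
```

A Kafka Streams version would keep the same per-hour state in its aggregation and emit the readiness event downstream; the correlation logic stays the same.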
One other suggestion -- designing processes like this with event-driven choreography can be tricky. I have seen strong engineering teams fail to deliver similar solutions due to diving into the deep end without first learning to swim. If your scale demands it and your organization is already primed for event-driven distributed architecture, then go for it, but if not, consider an orchestration-based alternative (for example, AWS Step Functions, Airflow, or another workflow orchestration tool). These are much easier to reason about and debug.

Is there a way of assigning an int number to different instances of stateless services?

I'm building a solution where we'll have a (Service Fabric) stateless service deployed to K instances. This service is tasked with some workload (like querying), and I want to split the workload between the instances as evenly as I can. I also want this to be dynamic: if I decide to go from K instances to N instances tomorrow, the workload should automatically be redistributed across the N instances. I don't have any partitions specified for this service.
As an example -
Let's say I'd like to query a database to retrieve a particular chunk of the records. I have 5 nodes, and I want each of these 5 nodes to retrieve a different 1/5th of the set of records. This can be achieved with query logic like (row_id % N == K), where N is the total number of instances and K is the unique instance number.
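For illustration, that modulo split as a Scala sketch (instanceIndex is the hypothetical stable K this question is really asking for):

```scala
// Keep only this instance's 1/N share of the rows. `instanceIndex` (K) must
// be stable across restarts; `instanceCount` (N) must be known to everyone.
def mySlice(rowIds: Seq[Long], instanceIndex: Int, instanceCount: Int): Seq[Long] =
  rowIds.filter(id => id % instanceCount == instanceIndex)

// With 5 instances, instance 2 keeps rows 2, 7, 12, 17, ...
```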
I was hoping to leverage FabricRuntime.GetNodeContext().NodeId, but this returns a GUID, which is not overly useful.
I'm looking for a way to deterministically say "this is instance number M out of N" (I need to be able to number the instances 1..N) so I can set my querying logic accordingly. One of the requirements is that if an instance goes down or crashes, when SF automatically restarts it, it should still identify as the same instance, so that 2 or more instances don't query the same set of results.
What is the best way of solving this problem? Is there a solution that involves pure configuration through ApplicationManifest.xml or ServiceManifest.xml?
There is no out-of-the-box solution for your problem, but it can be done in many different ways.
The simplest way is to use the Queue-Based Load Leveling pattern in conjunction with the Competing Consumers pattern.
It consists of creating a queue and adding the work to it; each instance takes one message and processes that work. If an instance goes down before the message is processed, the message goes back to the queue and another instance picks it up.
This way you don't have to worry about the number of instances running, failures, and so on.
Regarding the work being put in the queue, it depends on whether you want to do batch processing or process item by item.
Item by item, you put one message in the queue for each item to be processed. This is a simple way to handle the work: each instance processes one message at a time, or multiple messages in parallel.
In batches, you put one message representing a list of items, and each instance processes that batch until completion. This is a bit trickier, because you might have to track the progress of the work so that, in case of failure, the next attempt can continue from where it stopped.
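A minimal sketch of the competing-consumers idea in plain Scala (an in-process stand-in; in a real deployment the queue would be a durable service such as Azure Service Bus or a Storage Queue): every worker simply takes the next message, so going from K to N instances requires no reassignment logic.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}

// Competing consumers over one shared queue: work distribution adapts
// automatically to however many workers happen to be running.
object CompetingConsumers extends App {
  val queue = new LinkedBlockingQueue[String]()
  (1 to 100).foreach(i => queue.put(s"item-$i"))

  val workerCount = 5 // change to N tomorrow; nothing else changes
  val pool = Executors.newFixedThreadPool(workerCount)
  (1 to workerCount).foreach { id =>
    pool.execute(() => {
      var item = queue.poll(1, TimeUnit.SECONDS)
      while (item != null) {
        println(s"worker-$id processed $item") // the real work goes here
        item = queue.poll(1, TimeUnit.SECONDS)
      }
    })
  }
  pool.shutdown()
  pool.awaitTermination(1, TimeUnit.MINUTES)
}
```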
The queue approach is a reactive design: work has to be put in the queue to trigger the processing. If you want a proactive approach, and need to keep track of which work goes to whom, you are probably better off using some other mechanism, like leasing, where each instance acquires a lease that belongs to it until it releases the lease. This is more suitable when you work with partitioned data or some other scheme where you can easily split the load.
Regarding the ID issue, an option would be the InstanceId of the replica you are on, reachable via StatelessService.Context.InstanceId. It is not a sequential ID but a random number; still, it is better than the node ID, because you might have multiple partitions on the same node and their IDs would conflict with each other.
If you decide to use named partitions, you could encode an order in the partition names instead, so that each partition has a sequential name.
Worth mentioning that Service Fabric has a limitation that doesn't allow a service to have multiple instances on the same node; because of this limitation you might have to design your services with it in mind, otherwise you won't be able to scale out once the limit is reached. Also, the same thread has some discussion about approaches to processing multiple distributed items that might give you some ideas.

Algorithm and Data Structure - Queue

The 2 queueing strategies are as follows:
1. A single queue. Each server will take the next customer as soon as the server becomes available.
2. A queue for each server. Customers choose the server with the shortest queue on arrival and are not allowed to jump queues thereafter.
Can someone explain the second strategy? Does it mean the same thing as the first, except that the customer chooses the shortest queue on arrival (the one that will presumably get to them fastest)? Where can I get more information about this strategy, or is there any sample code?
Image representing the two queuing strategies
It has been found that the single-queue, multiple-servers approach is more efficient than the multiple-queues approach. In this approach, the waiting time is distributed almost equally among all the customers, even though the processing time differs from customer to customer.
Here is a link to a detailed analysis and mathematical proof of the same.
Comparison Between Single and Multiple Queues
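For readers who prefer to experiment, here is a small simulation sketch (Scala; the arrival and service rates are arbitrary) that runs both strategies over an identical sequence of arrivals and service times. The shared queue consistently produces the lower mean wait:

```scala
import scala.collection.mutable
import scala.util.Random

// Compares the two strategies on the same arrival/service sequence:
//   1. one shared FIFO feeding all servers
//   2. one FIFO per server, customers joining the shortest queue on arrival
object QueueStrategies extends App {
  val rng     = new Random(1)
  val servers = 3
  val n       = 100000
  def exp(mean: Double) = -mean * math.log(1.0 - rng.nextDouble())

  // Identical workload for both strategies.
  val arrivals = Iterator.iterate(0.0)(_ + exp(0.4)).take(n).toVector
  val services = Vector.fill(n)(exp(1.0))

  // Strategy 1: each customer, in order, takes the earliest-available server.
  def singleQueue(): Double = {
    val freeAt = Array.fill(servers)(0.0)
    var totalWait = 0.0
    for (i <- 0 until n) {
      val j     = freeAt.indices.minBy(j => freeAt(j))
      val start = math.max(arrivals(i), freeAt(j))
      totalWait += start - arrivals(i)
      freeAt(j)  = start + services(i)
    }
    totalWait / n
  }

  // Strategy 2: join the shortest queue on arrival; no jumping afterwards.
  def queuePerServer(): Double = {
    val freeAt   = Array.fill(servers)(0.0)
    val inSystem = Array.fill(servers)(mutable.Queue.empty[Double]) // departure times
    var totalWait = 0.0
    for (i <- 0 until n) {
      val t = arrivals(i)
      inSystem.foreach(q => while (q.nonEmpty && q.head <= t) q.dequeue())
      val j     = inSystem.indices.minBy(j => inSystem(j).size)
      val start = math.max(t, freeAt(j))
      totalWait += start - t
      freeAt(j)  = start + services(i)
      inSystem(j).enqueue(freeAt(j))
    }
    totalWait / n
  }

  println(f"single shared queue : mean wait ${singleQueue()}%.3f")
  println(f"queue per server    : mean wait ${queuePerServer()}%.3f")
}
```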

Using Akka Actors/Remoting to distribute graph algorithms across a cluster

So I'm currently working on distributing my Scala code across multiple machines for a large graph ("part 1" of the question) and am using the Akka framework in the hope of leveraging Actors and Remoting.
I read the documentation here, and it seems the way the example was done can be extended to what I want to do, but I have a few concerns about this method...
1) How do we decide how many instances of Actors we should create? Do we have to use trial and error to see what works best, or is there a more intuitive way to go about it?
2) I am thinking of structuring my task similarly to the example: a Master that spawns several Workers and communicates using case classes as messages. What I want to do is compute some metric between pairs of vertices (a random walk), for all pairs. I have a graph class that implements a method to calculate the metric given two vertices.
I will give each Worker two vertices 'u' and 'v' to calculate the metric for, and have the Workers return the value.
When the Master sends messages to the Workers to calculate the metric, the Workers need the graph structure. Do I just include the graph structure (i.e., the adjacency list, which is a HashMap) in the message? Will this cause overhead by copying the graph structure each time, do all Workers simply share that graph, or is there a better way to go about this?
3) Does the algorithm for calculating the metric between pairs of vertices need to be re-implemented in the extended Actor class, or is there a way for individual Actors to access the same graph structure and call the method (I guess this is similar to the question above about passing the entire graph structure as part of the message)?
Thanks!
Regards,
-kstruct
1) How do we decide how many instances of Actors we should create?
Although actors abstract away the underlying thread management, creating fewer actors than available CPU cores wastes computational power. If you have 10 servers with 8 cores each, create at least 80 actors, 8 per machine.
If the algorithm is CPU-intensive, creating more won't give you a performance boost: the extra workers will simply wait for an available core.
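With classic Akka routers, for instance, a local worker pool sized to the core count looks roughly like this (a sketch; MetricWorker and its metric are placeholders):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.routing.RoundRobinPool

// One compute-bound worker per core saturates the CPU without oversubscribing.
class MetricWorker extends Actor {
  def receive = { case (u: Int, v: Int) =>
    val metric = (u + v).toDouble // placeholder for the real random-walk metric
    sender() ! (u, v, metric)
  }
}

object MetricMaster extends App {
  val system = ActorSystem("graph")
  val cores  = Runtime.getRuntime.availableProcessors()
  // A round-robin pool with one routee per core on this machine.
  val workers = system.actorOf(RoundRobinPool(cores).props(Props[MetricWorker]()), "workers")
  workers ! ((3, 7)) // replies go to deadLetters here; a Master actor would receive them
}
```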
[...] Worker needs the graph structure - do I just do this by including the graph structure (i.e. adjacency list that is a HashMap) in the message? [...]
There is no overhead if all your actors live in the same JVM: you are simply passing a reference to the graph structure in a message. However, in a distributed environment this will cause the graph to be serialized and sent over the wire, probably a lot of data.
Consider sharing this data structure among all actors instead.
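One way to do that, assuming the workers live in one JVM and the adjacency list is immutable, is to hand every worker the same graph reference at construction, so messages only carry vertex ids (a sketch with placeholder names):

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Messages carry only vertex ids; the immutable graph is shared by
// reference, so nothing is copied per message inside a single JVM.
final case class Compute(u: Int, v: Int)

class GraphWorker(adjacency: Map[Int, Vector[Int]]) extends Actor {
  def receive = { case Compute(u, v) =>
    // Walk `adjacency` here; this stands in for the real metric method.
    val metric = adjacency(u).size + adjacency(v).size
    println(s"metric($u,$v) = $metric")
  }
}

object ShareGraph extends App {
  val graph  = Map(1 -> Vector(2, 3), 2 -> Vector(1), 3 -> Vector(1))
  val system = ActorSystem("graph")
  // Both workers close over the same reference, not a copy.
  val w1 = system.actorOf(Props(new GraphWorker(graph)))
  val w2 = system.actorOf(Props(new GraphWorker(graph)))
  w1 ! Compute(1, 2)
}
```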
I don't understand question 3.

Akka and state among actors in cluster

I am working on my bachelor's thesis project, which should be a Minecraft server written in Scala and Akka. The server should be easily deployable to the cloud or onto a cluster (not sure whether I'm using the proper terminology... it should run on multiple nodes). I am, however, a newbie in Akka, and I have been wondering how to implement such a thing.
The problem I'm trying to figure out right now is how to share state among actors on different nodes. My first idea was to have a Camel actor that would read the TCP stream from Minecraft clients and send it to a load balancer, which would select a node to process the request and then send some response back to the client via TCP.
Let's say I have an actor implementing an AuthenticationService that checks whether the credentials provided by a user are valid. Every node would have such an actor (or perhaps more of them), and all the actors should have exactly the same database (or state) of users at all times. My question is: what is the best approach to keep this state? I have come up with some solutions I could think of, but I haven't done anything like this before, so please point out the faults:
Solution #1: Keep the state in a database. This would probably work very well for the authentication example, where the state is only represented by something like a list of usernames and passwords, but it probably wouldn't work in cases where the state contains objects that can't easily be broken down into integers and strings.
Solution #2: Every time there is a request to a certain actor that would change its state, the actor will, after processing the request, broadcast information about the change to all other actors of the same type, which would then update their state according to the info sent by the original actor. This seems very inefficient and rather clumsy.
Solution #3: Have a certain node serve as a sort of state node containing actors that represent the state of the entire server. All other actors would be stateless and would ask the actors in the "state node" every time they needed some data. This also seems inefficient and not exactly fault-tolerant.
So there you have it. The only solution I actually like is the first one, but as I said, it probably works for only a very limited subset of problems (when the state can be broken down into Redis structures). Any response from more experienced gurus would be very appreciated.
Regards, Tomas Herman
Solution #1 could possibly be slow. Also, it is a bottleneck and a single point of failure (meaning the application stops working if the node with the database fails). Solution #3 has similar problems.
Solution #2 is less trivial than it seems. First, it is a single point of failure. Second, there are no atomicity or other ordering guarantees (such as regularity) for reads or writes, unless you do a total order broadcast (which is more expensive than a regular broadcast). In fact, most distributed register algorithms will do broadcasts under the hood, so, while inefficient, it may be necessary.
From what you've described, you need atomicity for your distributed register. What do I mean by atomicity? Atomicity means that any read or write in a sequence of concurrent reads and writes appears as if it occurs at a single point in time.
Informally, in Solution #2 with a single actor holding a register, this guarantees that if 2 subsequent writes W1 and then W2 to the register occur (meaning 2 broadcasts), then no other actor reading the values from the register will read them in an order other than first W1 and then W2 (it's actually more involved than that). If you go through a couple of examples of subsequent broadcasts where the messages arrive at their destinations at different points in time, you will see that such an ordering property isn't guaranteed at all.
If ordering guarantees or atomicity aren't an issue, some sort of gossip-based algorithm might do the trick to slowly propagate changes to all the nodes. This probably wouldn't be very helpful in your example.
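To make the gossip idea concrete, here is a miniature last-writer-wins sketch in plain Scala (hypothetical names; real implementations add anti-entropy schedules and failure detection). Each node merges state with a random peer, so an update spreads to everyone eventually, but reads can be stale in the meantime:

```scala
import scala.util.Random

// Last-writer-wins gossip: every entry carries a version, and merging keeps
// the newer one. Repeated pairwise merges eventually spread each update.
final case class Versioned(value: String, version: Long)

final class GossipNode(val name: String) {
  var state: Map[String, Versioned] = Map.empty

  def put(key: String, value: String, version: Long): Unit =
    state += key -> Versioned(value, version)

  def mergeFrom(peer: GossipNode): Unit =
    state = (state.keySet ++ peer.state.keySet).map { k =>
      k -> (state.get(k).toList ++ peer.state.get(k)).maxBy(_.version)
    }.toMap
}

object GossipDemo extends App {
  val nodes = Vector.tabulate(5)(i => new GossipNode(s"node-$i"))
  nodes(0).put("alice", "password-hash-1", version = 1L)

  val rng = new Random(7)
  for (_ <- 1 to 20) { // a few random pairwise exchanges
    val a = nodes(rng.nextInt(nodes.size))
    val b = nodes(rng.nextInt(nodes.size))
    a.mergeFrom(b); b.mergeFrom(a)
  }
  nodes.foreach(n => println(s"${n.name}: ${n.state}"))
}
```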
If you want full fault tolerance and atomicity, I recommend reading this book on reliable distributed programming by Rachid Guerraoui and Luís Rodrigues, or at least the parts related to distributed register abstractions. These algorithms are built on top of a message-passing communication layer and maintain a distributed register supporting read and write operations. You can use such an algorithm to store distributed state information. However, they aren't applicable to thousands of nodes or large clusters, because they do not scale, typically having complexity polynomial in the number of nodes.
On the other hand, you may not need the state of the distributed register to be replicated across all of the nodes. Replicating it across a subset of your nodes (instead of just one node), and accessing those nodes to read or write from it, provides a certain level of fault tolerance (the register information is lost only if that entire subset of nodes fails). You can possibly adapt the algorithms in the book to serve this purpose.