As seen below in my flowchart I am trying to model jobs that are being sent to servers. In the service block, my resource pool is servers.
My current model has Agent 'Jobs' being created in the source. they are then sent to the Queue and to the Service block where the Service block will seize a server(Server Agent) from the resource pool.
I have set out my simulation so that servers are deleted at random times.
My trouble is: When a server that is currently working on a Job is deletes (at a random time), how is it possible to send the Job back to the queue.
I'm having an issue getting the service block/server pool accessing the Jobs agent
I'm not sure how you're deleting your servers but if you're doing so by reducing the capacity of the resource pool my answer will work as you desire.
For you to return the job back to queue, first you'll need some changes to your flowchart. (See Image)
Then, in your service block, change your settings to match mine:
And voilá, that's it. If you're using a different type of deletion and this approach doesn't work, let me know.
Cheers,
Luís Pereira
Related
Followup to Configuration of MessageChannelPartitionHandler for assortment of remote steps
Even though the first question was answered (I think well), I think I'm confused enough that I'm not able to ask the right questions. Please direct me if you can.
Here is a sketch of the architecture I am attempting to build. Today, we have a job that runs a step across the cluster that works. We want to extend the architecture to run n (unbounded and different) jobs with n (unbounded and different) remote steps across the cluster.
I am not confusing job executions and job instances with jobs. We already run multiple job instances across the cluster. We need to be able to run other processes that are scalable in hte same way as the one we have defined already.
The source data is all coming from database which are known to the steps. The partitioner is defining the range of data for the "where" clause in the source database and putting that in the stepContext. All of the actual work happens in the stepContext. The jobContext simply serves to spawn steps, wait for completion, and provide the control API.
There will be 0 to n jobs running concurrently, with 0 to n steps from however many jobs running on the slave VM's concurrently.
Does each master job (or step?) require its own request and reply channel, and by extension its own OutboundChannelAdapter? Or are the request and reply channels shared?
Does each master job (or step?) require its own aggregator? By implication this means each job (or step) will have its own partition handler (which may be supported by the existing codebase)
The StepLocator on the slave appears to require a single shared replyChannel across all steps, but it appears to me that the messageChannelpartitionHandler requires a separate reply channel per step.
What I think is unclear (but I can't tell since it's unclear) is how the single reply channel is picked up by the aggregatedReplyChannel and then returned to the correct step. Of course I could be so lost I'm asking the wrong questions.
Thank you in advance
Does each master job (or step?) require its own request and reply channel, and by extension its own OutboundChannelAdapter? Or are the request and reply channels shared?
No, there is no need for that. StepExecutionRequests are identified with a correlation Id that makes it possible to distinguish them.
Does each master job (or step?) require its own aggregator? By implication this means each job (or step) will have its own partition handler (which may be supported by the existing codebase)
That should not be the case, as requests are uniquely identified with a correlation ID (similar to the previous point).
The StepLocator on the slave appears to require a single shared replyChannel across all steps, but it appears to me that the messageChannelpartitionHandler requires a separate reply channel per step.
The messageChannelpartitionHandler should be step or job scoped, as mentioned in the Javadoc (see recommendation in the last note). As a side note, there was an issue with message crossing in a previous version due to the reply channel being instance based, but it was fixed here.
I'm trying to represent a machine that works for a x amount of time before warning the operator that the oil tank needs to be refilled. Have in mind that the machine doesn't stop as soon as it send the warning message out. That way, the operator will wait until the machine stops any activity it had already started, and once it's done, he'll stop the machine and fill the tank.
In order to represent this process I'm using a Station block from the Material Handling library, that seizes a resource from a resource pool block, to which a downtime block is applied.
Is there a way to make the downtime block wait until the machine stops before performing the maintenance?
I also want to associate a resource pool representative of the operator to the downtime block, so that the operator is busy during the downtime, since he's the responsible for filling the tank. Can I do that?
Thank you in advance!
Is there a way to make the downtime block wait until the machine stops before performing the maintenance?
Yes, explore how Priorities work. Give your machine task a higher priority than the downtime task and ensure that the downtime block does not preempt other tasks:
I also want to associate a resource pool representative of the operator to the downtime block, so that the operator is busy during the downtime, since he's the responsible for filling the tank. Can I do that?
Yes, set the task type to "go to flowchart" and use a custom flow chart to seize from a resource pool (again, check the help on how to set this up in detail):
PS: Please only ask 1 question per issue always. See https://stackoverflow.com/help/how-to-ask and for AnyLogic https://www.benjamin-schumann.com/blog/2021/4/1/how-to-win-at-anylogic-on-stackoverflow
I'm new with anylogic so I'm not sure how to do this simple thing but is there a way for the service block to only do one agent at a time? I'm making a simulation but it seems that if one agent goes to the service block, it will be serviced even if there is still another agent being serviced? I don't know how to stop the new agent when there is still an agent being serviced. Please help me. Thanks
the problem is probably because you have not set any resources for service block. if you define a resource for your service block, only one agent can be processed at one time and other agents would wait in queue until the delay of first agent is finished.
We are using Uber Cadence and periodically we run into issues on the production environment.
The setup is the following:
One Java 14 BE with Cadence client 2.7.5
Cadence service version 0.14.1 with Postgres DB
There are multiple domains, for all domains the single BE server is registered as a worker.
What is visible in the logs is that sometimes during a query the cadence seems to lose stickiness to the BE service:
"msg":"query direct through matching failed on sticky, clearing sticky before attempting on non-sticky","service":"cadence-history","shard-id":1,"address":"10.1.1.111:7934"
"msg":"query directly though matching on non-sticky failed","service":"cadence-history","shard-id":1,"address":"10.1.1.111:7934"..."error":"code:deadline-exceeded message:timeout"
"msg":"query directly though matching on non-sticky failed","service":"cadence-history","shard-id":1,"address":"10.1.1.111:7934"..."error":"code:deadline-exceeded message:timeout"
"msg":"query directly though matching on non-sticky failed","service":"cadence-history","shard-id":1,"address":"10.1.1.111:7934"..."error":"code:deadline-exceeded message:timeout"
"msg":"query directly though matching on non-sticky failed","service":"cadence-history","shard-id":1,"address":"10.1.1.111:7934"..."error":"code:deadline-exceeded message:timeout"
...
In the backend in the meanwhile nothing is visible. However, during this time if I check the pollers on the cadence web client I see that the task list is there, but it is not considered as a decision handler any more (http://localhost:8088/domains/mydomain/task-lists/mytasklist/pollers). Because of this pretty much the whole environment is dead because there is nothing that can progress with the decision. The only option is to restart the backend service and let it re-register as a worker.
At this point the investigation is stuck, so some help would be appreciated.
Does anyone know about how a worker or task list can lose its ability to be a decision handler? Is it managed by cadence, like based on how many errors the worker generates? I was not able to find anything about this.
As I understand when the stickiness is lost, cadence will check for another worker to replay the workflow and continue it (in my case this will be the same worker as there is only one). Is it possible that replaying the flow is not possible (although I think it would generate something in the backend log from the cadence client) or at that point the worker is already removed from the list and that causes the time-out?
Any help would be more than welcome! Thanks!
Does anyone know about how a worker or task list can lose its ability to be a decision handler
This will happen when worker stops polling for decision tasks. For example if you configure the worker only polls for activity tasks, then it will show like that. So apparently it will also happen if for some reason the worker stops polling for decision tasks.
As I understand when the stickiness is lost, cadence will check for another worker to replay the workflow and continue
Yes, as long as there is another worker polling for decision tasks. Note that Query tasks is considered as of the the decision task types. (this is a wrong design, we are working on to separate it).
From your logs:
"msg":"query directly though matching on non-sticky failed","service":"cadence-history","shard-id":1,"address":"10.1.1.111:7934"..."error":"code:deadline-exceeded message:timeout"
This means that Cadence dispatch the Query tasks to a worker, and a worker accepted, but didn't respond back within timeout.
It's very highly possible that there is some bugs in your Query handler logic. The bug caused decision worker to crash(which means Cadence java client also has a bug too, user code crashing shouldn't crash worker). And then a query task loop over all instances of your worker pool, finally crashed all your decision workers.
I'm trying to emulate what QUEST does when a buffer is queried for a certain Part. In there if the part is not in the buffer the request is left pending and if a Part arrives to the buffer it's released to the machine requesting it. I have also seen this behavior in SimPy which is another DES engine.
I can't seem to find a simple way to do this in AL. The queue block has the following methods:
release(agent): Will return false and forget about the request if there's not an agent as the one specified
remove(agent): Will return null if there's no agent in the queue
So those methods won't do what I want...
It gets a little more complicated as the queue contains agents with parameters and I want to request a specific set of parameters (let's say the agents have a number parameter that can go from 1 to 3 and I'm only interested in agents in the queue if this parameter has the value 2).
Also there's a series of agents pulling this agents from the queue simultaneously and I'd like a priority to be set (let's say FIFO)
so there's a couple things that I've tried and have lead me nowhere:
Using a seize block instead of queue and adding the agents to the embedded queue in the seize block. -> I can't find the proper method to seize from the buffer in a different way from a buffer block (so I moved to option 2) but seize does have a promising customize resource choice that could help with the parameter down-selection
Using a seize block and storing the agents in a pool as resources. issues with dynamic creation of resources, seizing the appropriate one etc...
Creating a queue of requests that have returned null from a queue. This sounds like an overkill but I'll look into it
All of those appear to be a bit complex for such a simple thing in other softwares for simulation so I'm wondering if I'm missing something or if someone has come across this issue before
Suggestion 1: may it helps you to store the agents in the queue in a collection (or different collections, according to the parameter settings). Events: "on enter" and "on exit"
Suggestion 2: may the Wait - block helps you here?