I have an application that queues requests received from callers. Each request results in a call to an external web service that has restrictions on how many calls we can make to it; e.g., we can only make X calls per minute.
Each request is added to a Quartz.NET scheduler, and I need to be able to schedule the jobs in such a way that we do not violate the terms of the external web service.
I've considered somehow keeping track of the last time a job was added to the scheduler and making sure jobs are triggered N milliseconds apart (i.e. each incoming job is set to trigger at LastJobTime + N), where N = 60000/X. However, I'm not sure if this is reasonable.
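Roughly, the idea in code would be something like this (a sketch only, against the Quartz.NET 3.x fluent API and inside an async method; lastJobTime, jobDetail and scheduler are assumed to exist elsewhere in my code):

```csharp
// Sketch: trigger each incoming job N ms after the previous one,
// where N = 60000 / X and X is the service's calls-per-minute limit.
int x = 30;                          // example limit
double n = 60000.0 / x;

// lastJobTime is tracked by the application (would need to be thread-safe).
DateTimeOffset nextFireTime = lastJobTime.AddMilliseconds(n);
lastJobTime = nextFireTime;

ITrigger trigger = TriggerBuilder.Create()
    .StartAt(nextFireTime)           // fire once, at the computed time
    .Build();

await scheduler.ScheduleJob(jobDetail, trigger);
```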
Is there a better way to accomplish this? If not, must I keep track of LastJobTime myself or can Quartz.NET provide some help here?
Thanks
You can create a trigger listener that implements the ITriggerListener interface. ITriggerListener gives you the ability to veto a job. Just count how many calls you have made in the current minute, and if you are over your call quota, veto the job.
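A minimal sketch of such a listener, assuming the Quartz.NET 3.x async ITriggerListener interface (the fixed one-minute window counter is my simplification; a sliding window would enforce the limit more strictly):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Quartz;

// Vetoes job executions once the per-minute quota is used up.
public class RateLimitTriggerListener : ITriggerListener
{
    private readonly int maxCallsPerMinute;
    private int callsThisMinute;
    private DateTime windowStart = DateTime.UtcNow;
    private readonly object sync = new object();

    public RateLimitTriggerListener(int maxCallsPerMinute)
    {
        this.maxCallsPerMinute = maxCallsPerMinute;
    }

    public string Name => "RateLimitTriggerListener";

    public Task<bool> VetoJobExecution(ITrigger trigger, IJobExecutionContext context,
        CancellationToken ct = default)
    {
        lock (sync)
        {
            // Reset the counter when a new one-minute window starts.
            if (DateTime.UtcNow - windowStart >= TimeSpan.FromMinutes(1))
            {
                windowStart = DateTime.UtcNow;
                callsThisMinute = 0;
            }

            if (callsThisMinute >= maxCallsPerMinute)
                return Task.FromResult(true);   // true = veto this execution

            callsThisMinute++;
            return Task.FromResult(false);
        }
    }

    // The remaining interface members can be no-ops for this purpose.
    public Task TriggerFired(ITrigger trigger, IJobExecutionContext context,
        CancellationToken ct = default) => Task.CompletedTask;
    public Task TriggerMisfired(ITrigger trigger,
        CancellationToken ct = default) => Task.CompletedTask;
    public Task TriggerComplete(ITrigger trigger, IJobExecutionContext context,
        SchedulerInstruction instruction, CancellationToken ct = default) => Task.CompletedTask;
}
```

Register it with scheduler.ListenerManager.AddTriggerListener(...). Keep in mind that a vetoed execution is skipped, not re-queued, so if every request must eventually run you would also need some retry mechanism.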
I want to try out Azure Functions with the following project:
Triggered by time (every 30 minutes), my initial function1 puts some data into queue1.
This queue1 triggers another function2 that calls an external REST API, modifies the response, and puts the results into another queue, queue3.
This queue3 triggers another function3 that does the rest.
My problem is that the REST API is rate limited. So if my function1 puts 100 items into queue1 and function2 is called 100 times in parallel, my API calls will be blocked. I therefore need some kind of throttling.
How would you achieve that? I could tell function2 to wait a specific time and then add the item back to queue1, but since everything is parallel I might run into a deadlock?
Thanks in advance for any ideas!
I would recommend that you take a look at Azure Durable Functions. The Durable Functions framework allows you to orchestrate complex workflows and manage state. In your example, you could use Durable Functions to work around the rate limiting issue.
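For illustration, a throttled orchestrator could look something like this (a sketch only; the function names and the one-second spacing are placeholders, not part of the Durable Functions API):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class RateLimitedOrchestration
{
    [FunctionName("RateLimitedOrchestrator")]
    public static async Task Run(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        string[] items = context.GetInput<string[]>();

        foreach (string item in items)
        {
            // "CallExternalApi" is a placeholder activity function that
            // would wrap the REST call currently made by function2.
            await context.CallActivityAsync("CallExternalApi", item);

            // A durable timer spaces the calls out so the external
            // API's rate limit is not exceeded.
            DateTime next = context.CurrentUtcDateTime.AddSeconds(1);
            await context.CreateTimer(next, CancellationToken.None);
        }
    }
}
```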
"is called 100 times parallel"
You can limit this (to some extent) by configuring host.json (example below):
batchSize for Storage Queues
maxConcurrentCalls for Service Bus
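For example, to process Storage Queue messages one at a time per instance, host.json could look like this (assuming the Functions v2+ schema):

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1,
      "newBatchThreshold": 0
    }
  }
}
```

Note that this limits concurrency per function app instance; if the runtime scales out to multiple instances, each one still pulls its own batch.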
If that's not enough, you could do something more sophisticated:
Function 1 knows how many items it has to process, so it could calculate the "ideal" distribution of those for the next 30 minutes
When adding messages to queue1, it could set the time when each message should be picked up (ScheduledEnqueueTimeUtc for Service Bus or initialVisibilityDelay for CloudQueue), as sketched after this list
Function2 will be called "on schedule" which should prevent throttling, if the total amount of messages is not too high
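A sketch of that spreading logic for a Storage Queue, using the CloudQueue.AddMessageAsync overload that takes initialVisibilityDelay (the class and method names around it are illustrative):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Queue;

public static class Spreader
{
    // Spreads 'items' evenly across the next 30 minutes by delaying
    // each message's visibility until its slot.
    public static async Task EnqueueSpreadAsync(CloudQueue queue, string[] items)
    {
        TimeSpan window = TimeSpan.FromMinutes(30);
        TimeSpan step = TimeSpan.FromTicks(window.Ticks / Math.Max(items.Length, 1));

        for (int i = 0; i < items.Length; i++)
        {
            var message = new CloudQueueMessage(items[i]);

            // The message stays invisible until its slot, so function2
            // picks messages up on a staggered schedule.
            await queue.AddMessageAsync(
                message,
                timeToLive: null,
                initialVisibilityDelay: TimeSpan.FromTicks(step.Ticks * i),
                options: null,
                operationContext: null);
        }
    }
}
```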
MarkLogic Scheduled Tasks cannot be configured to run at an interval less than a minute.
Is there any way I can execute an XQuery module at an interval of 1 second?
NOTE:
Consider the situation where the Task Server is fully loaded: I need to make sure that the task scheduled to run every second gets a Task Server thread whenever it needs one.
Please let me know if there is anything in MarkLogic that can be used to achieve this.
Wanting rapid-fire scheduled tasks may be a hint that the design needs rethinking.
Even running a task once a minute can be risky, and needs careful thought to manage the possibilities of overlapping tasks and runaway tasks. If the application design calls for a scheduled task to run once a second, I would raise that as a potentially serious problem. Back up a few steps, and if necessary ask a new question about the higher-level problem that led to looking at scheduled tasks.
There was a sub-question about managing queue priority for tasks. Task priorities can handle some of that. There are two priorities: normal and higher. The Task Server empties the higher-priority queue first, then the normal queue. But each queue is still a simple queue, and there's no way to change priorities after a task has been spawned. So if you always queue tasks with priority=higher, then they'll all be in the higher priority queue and they'll all run in order. You can play some games with techniques like using server fields as signals to already-running tasks. But wanting to reorder tasks within a queue could be another hint that the design needs rethinking.
If, after careful thought about all the pitfalls and dangers, I decided I needed a rapid-fire task of some kind, I would probably do it using external requests. Pick any scripting language and write a simple while loop with an HTTP request to the MarkLogic cluster. Even so, spend some time thinking about overlapping requests and locking. What happens if the request times out on the client side? Will it keep running on the server? Will that lead to overlapping requests and require deadlock resolution? Could it lead to runaway resource consumption?
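A bare-bones version of that external loop, sketched here in C# (the endpoint URL, port, and one-second spacing are placeholders):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class RapidFireCaller
{
    public static async Task Main()
    {
        // Short timeout so a slow round does not pile up behind us.
        using var client = new HttpClient { Timeout = TimeSpan.FromSeconds(5) };

        while (true)
        {
            try
            {
                // Placeholder URL: an HTTP app server endpoint
                // that invokes the XQuery module.
                var response = await client.GetAsync("http://marklogic-host:8010/run-task.xqy");
                response.EnsureSuccessStatusCode();
            }
            catch (Exception ex)
            {
                // Log and keep going; consider whether the failed request
                // is still running server-side (overlap risk).
                Console.Error.WriteLine(ex.Message);
            }

            await Task.Delay(TimeSpan.FromSeconds(1));
        }
    }
}
```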
Avoid any ideas that use xdmp:sleep. That will tie up a Task Server thread during the sleep period, and then you'll have two problems.
I'm planning a mechanism whose usage scenarios would be like cron's: a clock-ish mechanism that attempts task execution at prespecified times. Cron itself doesn't seem suitable, because these tasks trigger Scala method calls, and the task queue is stored in a cloud database.
I imagine it like this: every x minutes, the tasks' execution dates are retrieved from the database and compared against the current time; if a task is overdue, it is executed and removed from the queue.
My question is: how do I run the aforementioned check every x minutes on a distributed environment?
All advice encouraged.
I think the Akka scheduler might be what you are looking for. Here's a link to the Akka documentation and here's another link describing how to use Akka in Play.
Update: as Viktor Klang points out, Akka is not a scheduler; however, it does allow you to run a task periodically. I've used it in this mode quite successfully.
The best known library for this is Quartz Scheduler.
I've noticed that when I execute a SCOM task on demand from a PowerShell script, there are two columns in the Task Status view called Schedule Time and Start Time. There seems to be an interval of around 15 seconds between these two fields. I'm wondering if there is a way to minimize this delay so that I get a shorter response time when I execute a SCOM task on demand.
This is not generally something that users can control. The "ScheduledTime" corresponds to the time when the SDK received the request to execute the task. The "StartTime" represents the time the agent healthservice actually began executing the task workflow locally.
In between those times, things are moving as fast as they can. The request needs to propagate to the database, and a server healthservice needs to be notified that a task is being triggered. The servers then need to determine the correct route for the task message to take, then the healthservices need to actually send and receive the message. Finally, it gets to the actual agent where the task will execute. All of these messages go through the same queues as other monitoring data.
That sequence can be very quick (when running a task against the local server), or fairly slow (in a big Management Group, or when there is lots of load, or if machines/network are slow). Besides upgrading your hardware, you can't really do anything to make the process run quicker.
Can WWF handle high throughput scenarios where several dozen records are 'actively' being processed in parallel at any one time?
We want to build a workflow process which handles a few thousand records per hour. Each record takes up to a minute to process, because it makes external web service calls.
We are testing Windows Workflow Foundation to do this. But our demo programs show that records appear to be processed in sequence, not in parallel, when we use parallel activities to process several records at once within one workflow instance.
Should we use multiple workflow instances or parallel activities?
Are there any known patterns for high performance WWF processing?
You should definitely use a new workflow per record. Each workflow instance only gets one thread to run in, so even with a ParallelActivity the records will still be handled sequentially.
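For illustration, the one-workflow-per-record pattern with the WF 3.x runtime looks roughly like this (Record and ProcessRecordWorkflow are placeholder types, and "Record" must match a public property on the workflow):

```csharp
using System.Collections.Generic;
using System.Workflow.Runtime;

public static class RecordProcessor
{
    // One workflow instance per record; the runtime schedules many
    // instances concurrently on its own thread pool.
    public static void ProcessAll(IEnumerable<Record> records)
    {
        using (WorkflowRuntime runtime = new WorkflowRuntime())
        {
            runtime.StartRuntime();

            foreach (Record record in records)
            {
                var args = new Dictionary<string, object> { { "Record", record } };
                WorkflowInstance instance =
                    runtime.CreateWorkflow(typeof(ProcessRecordWorkflow), args);
                instance.Start();
            }

            // A real host would wait for completion (e.g. via the
            // WorkflowCompleted event) before disposing the runtime.
        }
    }
}
```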
I'm not sure about the performance of Windows Workflow, but from what I heard about .NET 4 at Tech-Ed, its Workflow components will be dramatically faster than the ones from .NET 3.0 and 3.5. So if you really need a lot of performance, maybe you should consider waiting for .NET 4.0.
Another option could be to consider BizTalk. But it's pretty expensive.
I think the common pattern is to use one workflow instance per record. The workflow runtime runs multiple instances in parallel.
One workflow instance runs on one thread at a time. The parallel activity calls the Execute method of each child activity sequentially on this single thread. You may still get a performance improvement from a parallel activity, however, if the activities are asynchronous and spend most of their time waiting for an external process to finish its work. E.g. if an activity calls an external web method and then waits for the reply, it returns from its Execute method and does not occupy the thread while waiting, so another activity in the Parallel group can start its own work (e.g. another web service call) at the same time.