ScheduledExecutorService.scheduleWithFixedDelay usage in vertx

I would like to schedule a task that runs periodically, with each run starting only after the previous one has completed. Previously I used ScheduledExecutorService.scheduleWithFixedDelay, but now that I am on Vert.x I wonder whether it may cause issues, since Vert.x already uses its own threads for the event loop and worker verticles.
I checked Vertx.setPeriodic, but that just fires periodically without checking or waiting for the previous run to complete before scheduling the next.
Having explored those options, I currently have a workaround: I use Vertx.setTimer to schedule the task and, on completion, call Vertx.setTimer again inside the handler.
At a high level, the scheduled task will query records from one table and update another table.
If anyone has a better solution, please guide me.
Vertx version - 3.9.4

To the best of my knowledge, ScheduledExecutorService.scheduleWithFixedDelay shouldn't cause any problems with Vert.x, and you can continue using it.
Your approach of setting a new timer in the callback is the better way of solving this, though.
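
As a rough illustration of the re-arming pattern, here is a minimal sketch that uses only java.util.concurrent, with a one-shot ScheduledExecutorService.schedule call standing in for Vertx.setTimer. The class name RearmingTimer, the maxRuns cap, and the delay values are assumptions made for the example, not part of either API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: re-arms a one-shot timer only after the task has finished.
class RearmingTimer {

    // Runs `task` maxRuns times (at least once), waiting delayMillis after each run
    // finishes before starting the next. Returns how many runs finished.
    static int runWithFixedDelay(Runnable task, long delayMillis, int maxRuns) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger finished = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);

        Runnable step = new Runnable() {
            @Override
            public void run() {
                try {
                    task.run();
                } finally {
                    // The next timer is armed here, after the run, so runs cannot overlap.
                    if (finished.incrementAndGet() < maxRuns) {
                        timer.schedule(this, delayMillis, TimeUnit.MILLISECONDS);
                    } else {
                        done.countDown();
                    }
                }
            }
        };

        timer.schedule(step, delayMillis, TimeUnit.MILLISECONDS);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        timer.shutdown();
        return finished.get();
    }

    public static void main(String[] args) {
        int runs = runWithFixedDelay(() -> System.out.println("query one table, update the other"), 20, 3);
        System.out.println("finished runs: " + runs);
    }
}
```

In Vert.x the same shape would presumably have the handler call Vertx.setTimer again once the database work has finished; the cap on runs is only there so the demo terminates.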

Related

Processing Groups of Results with Vertx - How to coordinate?

I have a job processing system where each job contains thousands of individual tasks that require different strategies to complete. The individual tasks make up the whole job. If all tasks complete, the job is marked as successfully completed and other steps are taken. If any task fails, the job must be marked as failed and other steps are taken. If the job times out, it must likewise be marked as failed and other steps are taken.
Once all of the results for a job have been received, the next job can be fetched. The next job shouldn't be fetched while a job is currently being processed.
Here is what the flow looks like:
The Job Polling Verticle publishes a job to the event bus, and the Job Processing Verticle publishes each task to the event bus. When the job strategy completes, it publishes the task result to the event bus.
The issue is that I don't know the right way to determine when all tasks have been completed in this model. All verticles are stateless, the Job Processing Verticle doesn't await any futures, and even if the Job Results Verticle were stateful, it wouldn't know how many results to expect.
The only way I can think to do this would be to have a global stateful object. But I don't think this is good design.
Additionally, I need to know when a Job has timed out. That is, if it has run longer than it should, I need to consider it failed, log it, and move on.
I could do this with the global state, but again I don't think that's the right solution.
Does this verticle pattern make sense for what I'm trying to do?

First, let me try to address your questions. Then I'll try to explain what problems this design has.
"The issue is that I don't know the right way to determine when all tasks have been completed in this model. All verticles are stateless, the Job Processing Verticle doesn't await any futures, and even if the Job Results Verticle were stateful, it wouldn't know how many results to expect."
One solution is a reference-counting verticle. Each worker should emit a start message on the event bus with the jobId when it starts, and an end message with the jobId when it completes. Even with fan-out (the case where you don't know in advance how many workers there are), the counting verticle will know how many are outstanding. In your diagram, the "Job Post Processing Verticle" is a good candidate for this. It can maintain a counter and start the next job only when the counter reaches zero. That also avoids sharing a memory reference between verticles.
"Additionally, I need to know when a Job has timed out. That is, if it has run longer than it should, I need to consider it failed, log it, and move on."
In the same verticle, you can start a timer every time you get a new start message. If you get the end message, cancel the timer. Otherwise, when the timer fires, cancel the current job and start again.
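
The counting and timeout ideas above can be sketched in plain Java, leaving out the event bus wiring. JobTracker, onStart, onEnd, and isTimedOut are hypothetical names invented for this example, and the timer is modelled as a stored deadline rather than a real Vert.x timer. The sketch assumes all calls arrive on a single thread, as they would inside one verticle.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker: counts outstanding tasks per job and keeps a deadline per job.
class JobTracker {
    private final Map<String, Integer> pending = new HashMap<>();
    private final Map<String, Long> deadlines = new HashMap<>();
    private final long timeoutMillis;

    JobTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Handle a "start" message: one more task is outstanding, and the deadline is re-armed.
    void onStart(String jobId, long nowMillis) {
        pending.merge(jobId, 1, Integer::sum);
        deadlines.put(jobId, nowMillis + timeoutMillis);
    }

    // Handle an "end" message. Returns true when the job has no outstanding tasks left.
    boolean onEnd(String jobId) {
        if (!pending.containsKey(jobId)) {
            return false; // unknown or already finished job
        }
        int left = pending.merge(jobId, -1, Integer::sum);
        if (left > 0) {
            return false;
        }
        pending.remove(jobId);
        deadlines.remove(jobId); // the equivalent of cancelling the timer
        return true;
    }

    // True when the job is still tracked and its deadline has passed.
    boolean isTimedOut(String jobId, long nowMillis) {
        Long deadline = deadlines.get(jobId);
        return deadline != null && nowMillis > deadline;
    }

    public static void main(String[] args) {
        JobTracker tracker = new JobTracker(1_000);
        tracker.onStart("job-1", 0);
        tracker.onStart("job-1", 100);
        System.out.println("done after first end:  " + tracker.onEnd("job-1"));
        System.out.println("done after second end: " + tracker.onEnd("job-1"));
    }
}
```

Note that the counter can touch zero early if one task ends before the next start message arrives; the sketch does not guard against that.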
Now, this solution will work, but the design has two main flaws. The first is that you seem to keep your whole flow in memory. If your application crashes, all progress is lost, and it's not clear how you record it. Polling a Jobs table in the DB might actually be better, since your job execution is sequential anyway.
The second is that all those timeouts and reference counts amount to a homemade implementation of structured concurrency. You may want to look at something like Kotlin coroutines, as they will handle many of these problems for you.

How can I create a Scheduled Task that will run every Second in MarkLogic?

MarkLogic Scheduled Tasks cannot be configured to run at an interval less than a minute.
Is there any way I can execute an XQuery module at an interval of 1 second?
NOTE:
Consider also the situation where the Task Server is fully loaded: I need to make sure that the task scheduled every second gets a Task Server thread whenever it needs one.
Please let me know if there is anything in MarkLogic that can be used to achieve this.

Wanting rapid-fire scheduled tasks may be a hint that the design needs rethinking.
Even running a task once a minute can be risky, and needs careful thought to manage the possibilities of overlapping tasks and runaway tasks. If the application design calls for a scheduled task to run once a second, I would raise that as a potentially serious problem. Back up a few steps, and if necessary ask a new question about the higher-level problem that led to looking at scheduled tasks.
There was a sub-question about managing queue priority for tasks. Task priorities can handle some of that. There are two priorities: normal and higher. The Task Server empties the higher-priority queue first, then the normal queue. But each queue is still a simple queue, and there's no way to change priorities after a task has been spawned. So if you always queue tasks with priority=higher, then they'll all be in the higher priority queue and they'll all run in order. You can play some games with techniques like using server fields as signals to already-running tasks. But wanting to reorder tasks within a queue could be another hint that the design needs rethinking.
If, after careful thought about all the pitfalls and dangers, I decided I needed a rapid-fire task of some kind, I would probably do it using external requests. Pick any scripting language and write a simple while loop with an HTTP request to the MarkLogic cluster. Even so, spend some time thinking about overlapping requests and locking. What happens if the request times out on the client side? Will it keep running on the server? Will that lead to overlapping requests and require deadlock resolution? Could it lead to runaway resource consumption?
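
Such an external loop could look roughly like the following Java sketch (Java 11 or later, for java.net.http). The URL, the class name SecondlyPoller, the timeouts, and the bound of three rounds are placeholders chosen for the example; nothing here is a MarkLogic API.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical client-side loop that calls an HTTP endpoint about once per second.
class SecondlyPoller {

    // Calls `request` `rounds` times, strictly one after another, pausing between calls.
    // Each call blocks until it returns or fails, so this client never overlaps its own requests.
    // Returns how many calls answered with HTTP 200.
    static int pollSequentially(Callable<Integer> request, int rounds, long pauseMillis) {
        int ok = 0;
        for (int i = 0; i < rounds; i++) {
            try {
                if (request.call() == 200) {
                    ok++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } catch (Exception e) {
                System.err.println("request failed: " + e);
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        // Placeholder URL: point this at whatever endpoint runs your module.
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8000/run-task"))
                .timeout(Duration.ofSeconds(5)) // client-side timeout only
                .GET()
                .build();
        int ok = pollSequentially(
                () -> client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode(),
                3, 1000);
        System.out.println("successful rounds: " + ok);
    }
}
```

The client-side timeout only bounds how long the loop waits. It says nothing about whether the work continues on the server, which is exactly the question raised above.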
Avoid any ideas that use xdmp:sleep. That will tie up a Task Server thread during the sleep period, and then you'll have two problems.

How to trigger a method call every x minutes in Scala?

I'm planning a mechanism whose usage scenarios would be like cron's. It's a clock-like mechanism that attempts task execution at prespecified times. Cron doesn't seem suitable, because these tasks trigger Scala method calls and the queue is stored in a cloud database.
I imagine it like this: every x minutes, the tasks' execution dates are retrieved from the database and compared against the current time. If a task is overdue, it is executed and removed from the queue.
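
The check described above might look like the following sketch, written in Java with an in-memory list standing in for the database queue. DueTaskCheck, ScheduledTask, and runOverdue are hypothetical names, and the sketch does not address the distributed part of the problem.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the "run whatever is overdue" check.
class DueTaskCheck {

    // A queued task: when it is due and what to run.
    static final class ScheduledTask {
        final Instant dueAt;
        final Runnable action;

        ScheduledTask(Instant dueAt, Runnable action) {
            this.dueAt = dueAt;
            this.action = action;
        }
    }

    // Runs and removes every task whose due time is not after `now`.
    // Returns how many tasks were run.
    static int runOverdue(List<ScheduledTask> queue, Instant now) {
        int ran = 0;
        Iterator<ScheduledTask> it = queue.iterator();
        while (it.hasNext()) {
            ScheduledTask task = it.next();
            if (!task.dueAt.isAfter(now)) {
                task.action.run();
                it.remove();
                ran++;
            }
        }
        return ran;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<ScheduledTask> queue = new ArrayList<>();
        queue.add(new ScheduledTask(now.minusSeconds(60), () -> System.out.println("overdue task ran")));
        queue.add(new ScheduledTask(now.plusSeconds(3600), () -> System.out.println("future task ran")));
        System.out.println("tasks run: " + runOverdue(queue, now));
    }
}
```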
My question is: how do I run the aforementioned check every x minutes on a distributed environment?
All advice encouraged.

I think the Akka scheduler might be what you are looking for. Here's a link to the Akka documentation and here's another link describing how to use Akka in Play.
Update: as Viktor Klang points out Akka is not a scheduler, however it does allow you to run a task periodically. I've used it in this mode quite successfully.

The best known library for this is Quartz Scheduler.

Running parallel tasks in background

I am using NSOperationQueue to run tasks in the background in my application. But while adding tasks to the NSOperationQueue, I found that they are added in queue order.
Does NSOperationQueue perform parallel or sequential task processing?
If not parallel, then how can I achieve parallel task operations in my app?

I would recommend you read this document about concurrency programming.

You can perform parallel tasks by using the "performSelectorInBackground" method. With it, the foreground work stays visible while the background task is in progress. I have already answered this at the link below; please see the accepted answer there. Perhaps it will help you.
http://stackoverflow.com/questions/8725636/cant-get-the-activity-indicator-to-show-on-iphone-app/8725875#8725875
Thanks!
Regards!
Khalid Usman

Use setMaxConcurrentOperationCount: to set the maximum number of concurrent operations.

I would suggest that you read this. The relevant information for you is:
Operation queues work with the system to restrict the number of concurrent operations to a value that is appropriate for the available cores and system load.
Although the NSOperationQueue class is designed for the concurrent execution of operations, it is possible to force a single queue to run only one operation at a time. The setMaxConcurrentOperationCount: method lets you configure the maximum number of concurrent operations for an operation queue object. Passing a value of 1 to this method causes the queue to execute only one operation at a time. Although only one operation at a time may execute, the order of execution is still based on other factors, such as the readiness of each operation and its assigned priority.

Networking using run loop

I have an application which uses an external library for analytics. The problem is that I suspect it does some things synchronously, which blocks my thread and makes the watchdog kill my app after 10 secs (0x8badf00d code). It is really hard to reproduce (I cannot), but there are quite a few cases "in the wild".
I've read some documentation which suggested that instead of creating another thread I should use run loops. Unfortunately, the more I read about them, the more confused I get. And the last thing I want to do is release a fix which will break even more things :/
What I am trying to achieve is:
From the main thread, add a task to the run loop which calls just one function: initMyAnalytics(). My thread continues running, even if initMyAnalytics() gets stuck waiting for network data. After initMyAnalytics() finishes, it quietly quits and never gets called again (so it doesn't loop or anything).
Any ideas how to achieve it? Code examples are welcome ;)
Regards!

You don't need to use a run loop in that case. The purpose of a run loop is to process events from various sources sequentially in a particular thread and to stay idle when there is nothing to do. Of course, you can detach a thread, create a run loop, add a source for your function and run the run loop until the function ends. In the same way, you can use a semi-trailer truck to carry your groceries home.
Here, what you need are dispatch queues. Dispatch queues are First-In-First-Out data structures that run tasks asynchronously. Unlike a run loop, a dispatch queue isn't tied to a particular thread: the worker threads are automatically created and terminated as and when required.
As you only have one task to execute, you don't need to create a dispatch queue. Instead you will use an existing global concurrent queue. A concurrent queue executes one or more tasks concurrently, which is perfectly fine in our case. But if we had many tasks to execute and wanted each task to wait for its predecessor to end, we would need to create a serial queue.
So all you have to do is:
create a task for your function by enclosing it into a Block
get a global queue using dispatch_get_global_queue
add the task to the queue using dispatch_async.
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    initMyAnalytics();
});
DISPATCH_QUEUE_PRIORITY_DEFAULT is a macro that evaluates to 0. You can get different global queues with different priorities. The second parameter is reserved for future use and should always be 0.