The task scheduling problem of Flink, in Flink, how to place subtasks in a slot of the specified task manager? - scheduled-tasks

Recently, I am studying the problem of task scheduling in Flink. My purpose is to schedule subtasks to a slot of the specified node according to my own needs by modifying some source codes of the scheduling part. Through remote debugging and checking the source code, I found the following method call stack, most of which I can't understand (the comments are a little less), especially in this method: org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateMultiTaskSlot. I guess the code that allocates slots to tasks is around here. Because it is too difficult to read the source code, I have to ask you for help. Of course, if there is a better way to achieve my needs, please specify one or two. Sincerely look forward to your reply! Thank you very much!!!
The method call stack is as follows(The version of Flink I use is 1.11.1):
org.apache.flink.runtime.jobmaster.JobMaster#startJobExecution
org.apache.flink.runtime.jobmaster.JobMaster#resetAndStartScheduler
org.apache.flink.runtime.jobmaster.JobMaster#startScheduling
org.apache.flink.runtime.scheduler.SchedulerBase#startScheduling
org.apache.flink.runtime.scheduler.DefaultScheduler#startSchedulingInternal
org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy#startScheduling
(This is like the method call chain of PipelinedRegionSchedulingStrategy class. In order to simply write it as the method call chain of EagerSchedulingStrategy class, it should have no effect)
org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy#allocateSlotsAndDeploy
org.apache.flink.runtime.scheduler.DefaultScheduler#allocateSlotsAndDeploy
org.apache.flink.runtime.scheduler.DefaultScheduler#allocateSlots
org.apache.flink.runtime.scheduler.DefaultExecutionSlotAllocator#allocateSlotsFor
org.apache.flink.runtime.executiongraph.SlotProviderStrategy.NormalSlotProviderStrategy#allocateSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateSlotInternal
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#internalAllocateSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateSharedSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateMultiTaskSlot
(I feel that this is the key to allocate slot for subtask, that is, execution vertex, but there is no comment, and I don't understand the process idea, so I can't understand it.)

Related

Understanding GLib Task and Context

I don't understand the GTask functionality? why do I need this?
In my mind it is like callback.. you set a callback to a source in some context and this callback is then called when event is happening.
In general, i'm a bit confused about what is a Context and a Task in GLib and why do we need them.
In my understanding there is a main loop (only 1?) that can run several contexts (what is a context?) and each context is related to several sources which in their turn have callbacks that are like handlers.
So can someone please make some sense for me in it all.
I don't understand the GTask functionality? why do I need this? In my mind it is like callback.. you set a callback to a source in some context and this callback is then called when event is happening.
The main functionality GTask exposes is easily and safely running a task in a thread and returning the result back to the main thread.
In general, i'm a bit confused about what is a Context and a Task in GLib and why do we need them. In my understanding there is a main loop (only 1?) that can run several contexts (what is a context?) and each context is related to several sources which in their turn have callbacks that are like handlers.
For simplicity I think its safe to consider contexts and loops the same thing and there can be multiple of them. So in order to be thread-safe the task must know which context the result is returned to.

Measuring Duration Of Service Fabric Actor Methods

I wish to measure the time it takes for some of my Actor methods to execute. I plan to use OnPreActorMethodAsync and OnPostActorMethodAsync to log when these methods have begun and finished execution.
However I also wish to measure the time it takes for these methods to execute. My original plan was to use an instance of StopWatch to begin timing in the OnPreActorMethodAsync class and stop in the OnPostActorMethodAsync class. However I'm unsure how I can accessthe same StopWatch instance in the second method. If there is a better way of doing this I would be very interested.
If anyone could point me in the right direction on this matter that would be great.
Many Thanks
By using some dirty reflection tricks you can access the Actors' call context.
More about it here: https://stackoverflow.com/a/39526751/5946937
(Disclaimer: Actor internals may have changed since then)

how to debug in simpy

I have a general question about how to debug in Simpy. Normal debugging tools don't seem to work, since everything is working on the event loop, and you can't step through the code line by line and inspect what exists at any point in time.
Primarily, I'm interested in finding what kinds of processes and callbacks are in existence at a particular time, and how to remove them at the appropriate point. Are there any best practices surrounding debugging in discrete event simulation generally?
I would just use a bunch of print()s.
One thing you might find useful is the specific requests that can be passed to primitives such as resources. For example you can ask a resource how many users it currently has or how big the queue to use the resource is with:
All of these commands can be found in the documentation, here is the resource example: https://simpy.readthedocs.io/en/latest/api_reference/simpy.resources.html

Tutorial OpenCl event handling

In my last question, OpenCl cleanup causes segfault. , somebody hinted that missing event handling, i.e. not waiting for code to finish, could cause the seg faults. Since then I looked again into the tutorials I used, but they don't pay attention to events (Matrix Multiplication 1 (OpenCL) and NVIDIA_OpenCL_GettingStartedLinux.pdf) or talk about it in detail and (for me) understandable.
Do you know a tutorial on where and how to wait in OpenCL?
Merci!
I don't have a tutorial on events in OpenCL, and I'm by no means an expert, but since no one else is responding...
As a rule of thumb, you'll need to wait for any function named clEnqueue*. Those functions return immediately before the job is done. The easiest way to make sure your queue is finished is to call clFinish(). It won't return until the entire queue has completed.
If you want to get a little fancier, most of the clEnqueue* functions have an optional cl_event parameter that you can pass in. You can check on a particular event with clGetEventInfo(), and you can wait for a particular set of events to finish with clWaitForEvents().

How to do a "kill_proc()" in Linux Kernel 2.6.31.5

Trying this free forum for developers. I am migrating a serial driver to kernel 2.6.31.5. I have used various books and articles to solve problems going from 2.4
Now I have a couple of kill_proc that is not supported anymore in kernel 2.6.31.5
What would be the fastest way to migrate this into the kernel 2.6.31.5 way of killing a thread. In the books they say use kill() but it does not seem to be so in 2.6.31.5. Using send_signal would be a good way, but how do I do this? There must be a task_struct or something, I wich I could just provide my PID and SIGTERM and go ahaed and kill my thread, but it seems more complicated, having to set a struct with parameters I do not know of.
If anyone have a real example, or a link to a place with up to date info on kernel 2.6.31 I would be very thankful. Siply put, I need to kill my thread, and this is not suppose to be hard. ;)
This is my code now:
kill_proc(ex_pid, SIGTERM, 1);
/Jörgen
For use with kthreads, there is now kthread_stop that the caller (e.g. the module's exit function) can invoke. The kthread itself has to check using kthread_should_stop. Examples of that are readily available in the kernel source tree.