Running two instances of the ScheduledThreadPoolExecutor - scheduled-tasks

I have a number of asynchronous tasks to run in parallel. The tasks fall into two types: type A (time-consuming) and type B (everything else, quick to execute).
With a single ScheduledThreadPoolExecutor with a pool size of x, at some point all threads end up busy executing type A tasks; as a result, type B tasks get blocked and delayed.
What I'm trying to accomplish is to run type A tasks in parallel with type B tasks, and I also want the tasks within each type to run in parallel with one another for performance.
Do you think it's prudent to have two instances of ScheduledThreadPoolExecutor, one exclusively for type A and one for type B, each with its own thread pool? Do you see any issues with this approach?

No, that seems reasonable.
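A minimal sketch of the two-pool setup, in case it helps. The pool sizes, the class name TwoPoolsDemo and the task bodies are assumptions made up for illustration. It saturates the type A pool with blocked tasks and checks that a type B task on its own pool still runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class TwoPoolsDemo {
    // Illustrative sizes (assumption); tune them to the real workload.
    static final int SLOW_POOL_SIZE = 2;
    static final int FAST_POOL_SIZE = 2;

    /** Returns true if a type B task completes while every type A thread is busy. */
    static boolean fastTaskRunsWhileSlowPoolIsBusy() {
        ScheduledThreadPoolExecutor slowPool = new ScheduledThreadPoolExecutor(SLOW_POOL_SIZE);
        ScheduledThreadPoolExecutor fastPool = new ScheduledThreadPoolExecutor(FAST_POOL_SIZE);
        CountDownLatch releaseSlowTasks = new CountDownLatch(1);
        CountDownLatch fastTaskDone = new CountDownLatch(1);

        // Stand-in for a time-consuming type A task: blocks until released.
        Runnable slowTask = () -> {
            try {
                releaseSlowTasks.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // Stand-in for a quick type B task.
        Runnable fastTask = fastTaskDone::countDown;

        try {
            // Occupy every thread of the type A pool.
            for (int i = 0; i < SLOW_POOL_SIZE; i++) {
                slowPool.schedule(slowTask, 0, TimeUnit.MILLISECONDS);
            }
            // The type B task has its own threads, so it is not delayed.
            fastPool.schedule(fastTask, 10, TimeUnit.MILLISECONDS);
            return fastTaskDone.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            releaseSlowTasks.countDown();
            slowPool.shutdown();
            fastPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("type B ran while type A pool was busy: "
                + fastTaskRunsWhileSlowPoolIsBusy());
    }
}
```

The trade-off is that the two pools no longer share idle threads, so each one has to be sized for its own peak load.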
I am doing something similar: I need to execute tasks serially depending on some id. For example, all the tasks for the component with id="1" need to be executed serially relative to one another, and in parallel with all the tasks for components with different ids.
So basically I need a separate queue of tasks for each component, with the tasks pulled one after another from each specific queue.
To achieve that, I use
Executors.newSingleThreadExecutor(new JobThreadFactory(componentId));
for each component.
Additionally, I need an ExecutorService for a different type of task that is not bound to a componentId; for that I create an additional ExecutorService instance:
Executors.newFixedThreadPool(DEFAULT_THREAD_POOL_SIZE, new JobThreadFactory());
This works fine for my case at least.
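Roughly, the per-component part could look like the sketch below. It is not my actual code: the class name PerComponentExecutors and the map-based lookup are assumptions for illustration, and the default thread factory is used instead of JobThreadFactory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class PerComponentExecutors {
    // One single-threaded executor (and therefore one FIFO queue) per component id.
    private final Map<String, ExecutorService> byComponent = new ConcurrentHashMap<>();

    void submit(String componentId, Runnable task) {
        byComponent
            .computeIfAbsent(componentId, id -> Executors.newSingleThreadExecutor())
            .execute(task);
    }

    /** Stops accepting tasks and waits for the queued ones; returns true if all finished. */
    boolean shutdownAndWait(long timeoutSeconds) {
        boolean allDone = true;
        try {
            for (ExecutorService e : byComponent.values()) {
                e.shutdown();
            }
            for (ExecutorService e : byComponent.values()) {
                allDone &= e.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            return false;
        }
        return allDone;
    }

    /** Submits numbered tasks for component "1" and returns the order in which they ran. */
    static List<Integer> runDemo() {
        PerComponentExecutors executors = new PerComponentExecutors();
        List<Integer> seenForComponent1 = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 100; i++) {
            final int n = i;
            executors.submit("1", () -> seenForComponent1.add(n));
            executors.submit("2", () -> { /* unrelated work for another component */ });
        }
        executors.shutdownAndWait(5);
        return seenForComponent1;
    }
}
```

Tasks for the same id run in submission order because each id has exactly one worker thread; tasks for different ids are free to overlap.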
The only problem I can think of arises if the tasks need to be executed in order, i.e. task2 NEEDS to be executed after task1 and so on. But I doubt this is the case here.


Difference between executing StreamTasks in the same instance v/s multiple instances

Say I have a topic with 3 partitions
Method 1: I run one instance of Kafka Streams. It starts 3 tasks [0_0,0_1,0_2], and each of these tasks consumes from one partition.
Method 2: I spin up three instances of the same streams application. Again three tasks are started, but now they are distributed among the 3 instances that were created.
Which method is preferable and why?
In method 1, do all the tasks run as part of the same thread, and in method 2 do they run on different threads, or is it different?
Consider that the streams application has a very simple topology and only maps the values of a single stream.
By default, a single KafkaStreams instance runs one thread, so in "Method 1" all three tasks are executed by a single thread. In "Method 2" each task is executed by its own thread. Note that you can also configure multiple threads per KafkaStreams instance via the num.stream.threads configuration parameter. If you set it to 3 for "Method 1", both methods are more or less the same. How many threads you need depends on your workload, i.e., how many messages you need to process per time unit and how expensive the computation is. It also depends on the hardware: for a single-core CPU, it may not make sense to configure more than one thread; instead you should deploy multiple instances on multiple machines to get more hardware. Conversely, if your workload is lightweight, one single-threaded instance might be enough.
Also note that you may be network bound. In that case, starting more threads would not help, and you would want to scale out to multiple machines, too.
The last consideration is fault-tolerance. Even if a single thread/instance may be powerful enough to not lag, what should happen if the instance crashes? If you only have one instance, the whole computation goes down. If you run two instances, the second instance would take over all the work and your application stays online.
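As a sketch of the multi-threaded variant of "Method 1", the configuration could be built as below. The application id and broker address are made-up placeholders; only the key num.stream.threads matters here. The snippet uses plain java.util.Properties so it runs without the Kafka libraries; in a real application the result would be passed to the KafkaStreams constructor together with the topology.

```java
import java.util.Properties;

final class StreamsThreadConfig {
    /** Builds a config that asks one KafkaStreams instance to run three stream threads. */
    static Properties threeThreadConfig() {
        Properties props = new Properties();
        props.setProperty("application.id", "value-mapper-app");   // hypothetical name
        props.setProperty("bootstrap.servers", "localhost:9092"); // hypothetical broker
        // One thread per task for a 3-partition input topic.
        props.setProperty("num.stream.threads", "3");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(threeThreadConfig());
    }
}
```

Setting more threads than there are tasks would leave the extra threads idle, since a task is the unit of work that gets assigned to a thread.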

Dynamic growth of `Actor`s vs Primal creation

In my project, I have defined groups of Actors called cells. Those cells process messages and perform basic calculations of several kinds, with one small Actor type per kind.
What is the advantage of spawning those small Actors on demand and killing them once the job is done, rather than creating them when my cell is initialized and keeping them around until system shutdown?
If you create one actor per job, you get parallel processing of messages, whereas if you create the actors at initialization, messages of the same type will be processed one by one.
Usually, you shouldn't use actors for parallel execution of your program; their task is to manage shared resources, such as incrementing counters in a multi-threaded program. If you want parallel processing of messages, use futures.
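To illustrate the futures suggestion, here is a minimal sketch using the JDK's CompletableFuture rather than any actor library. The square calculation and the class name ParallelJobs are stand-ins invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class ParallelJobs {
    // Stand-in for one of the "basic calculations" (assumption).
    static int square(int x) {
        return x * x;
    }

    /** Starts one future per input so the calculations may run in parallel. */
    static List<Integer> squareAll(List<Integer> inputs) {
        // First pass: start every job without waiting for any of them.
        List<CompletableFuture<Integer>> futures = inputs.stream()
            .map(x -> CompletableFuture.supplyAsync(() -> square(x)))
            .collect(Collectors.toList());
        // Second pass: collect the results in input order.
        return futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squareAll(Arrays.asList(1, 2, 3)));
    }
}
```

Starting all futures before joining any of them is what allows the jobs to overlap; joining inside the first pass would serialize them again.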

UVM shared variables

I have a question regarding UVM. Suppose I have a DUT with two interfaces, each with its own agent, generating transactions on the same clock. These transactions are handled with analysis imports (and write functions) on the scoreboard. My problem is that both of these transactions read and modify shared variables of the scoreboard.
My questions are:
1) Do I have to guarantee mutual exclusion explicitly through a semaphore? (I suppose yes)
2) Is this, in general, a correct way to proceed?
3) And the main problem: can the order of execution be fixed in some way?
Depending on that order, the values of the shared variables can change, generating inconsistency. Moreover, that order is fixed by the specification.
Thanks in advance.
While SystemVerilog tasks and functions do run concurrently, they do not run in parallel. It is important to understand the difference between parallelism and concurrency and it has been explained well here.
So while a SystemVerilog task or function could be executing concurrently with another task or function, in reality the two do not actually run at the same time (run time context). The SystemVerilog scheduler keeps a list of all the tasks and functions that need to run at the same simulation time, and at that time it executes them one by one (sequentially) on the same processor (concurrency) and not together on multiple processors (parallelism). As a result, mutual exclusion is implicit and you do not need to use semaphores on that account.
The sequence in which two such concurrent functions are executed is not deterministic, but it is repeatable. So when you execute a testbench multiple times on the same simulator, the sequence of execution will be the same. But two different simulators (or different versions of the same simulator) could execute these functions in a different order.
If the specification requires a certain order of execution, you need to ensure that order by making one of these tasks/functions wait on the other. In your scoreboard example, since you are using analysis ports, you will have two "write" functions (perhaps declared with the uvm_analysis_imp_decl macro) executing concurrently. To ensure an order (since functions cannot wait), you can fork out join_none threads and make one thread wait on the other: introduce an event that is triggered at the conclusion of the first thread, and have the other thread wait for this event at its start.
This is a pretty difficult problem to address. If you get 2 transactions in the same time step, you have to be able to process them regardless of the order in which they get sent to your scoreboard. You can't know for sure which monitor will get triggered first. The only thing you can do is collect the transactions and at the end of the time step do your modeling/checking/etc.
Semaphores only help you if you have concurrent threads that take (simulation) time that are trying to access a shared resource. If you get things from an analysis port, then you get them in 0 time, so semaphores won't help you here.
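The "collect now, process at the end of the time step" idea can be modelled outside the simulator. The sketch below is a language-neutral illustration written in Java, not SystemVerilog or UVM code. The port numbers, the add/multiply operations and the rule "port 0 is applied before port 1" are all assumptions invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DeferredScoreboard {
    static final class Txn {
        final int port;
        final int value;
        Txn(int port, int value) {
            this.port = port;
            this.value = value;
        }
    }

    private final List<Txn> pending = new ArrayList<>();
    private int shared = 0; // the shared scoreboard variable

    /** Called by either monitor, in whatever order the simulator picks. */
    void write(int port, int value) {
        pending.add(new Txn(port, value)); // only store, do not touch shared state yet
    }

    /** Called once when the time step is over: applies the updates in the specified order. */
    int endOfTimeStep() {
        // Assumed specification: port 0 first, then port 1 (stable sort keeps arrival order per port).
        pending.sort(Comparator.comparingInt((Txn t) -> t.port));
        for (Txn t : pending) {
            shared = (t.port == 0) ? shared + t.value : shared * t.value;
        }
        pending.clear();
        return shared;
    }
}
```

Because the shared variable is only touched in endOfTimeStep, the result no longer depends on which write call happened to arrive first.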
So to my understanding, the answer is: the compiler/vendor/UVM cannot ensure the order of execution. If you need to enforce an order among things that happen in the same time step, you need to use a semaphore correctly to make it work the way you want.
Another point: only you know which one must execute after the other when they fall in the same simulation time.
This is a classical race condition, where the result depends upon the actual thread order.
First of all, you have to decide whether the write race is problematic for you and/or whether there is a priority order in this case. If you don't care, the last access wins.
If the access isn't atomic, you might need a semaphore to ensure that only one access is handled at a time and the next one waits until the first has finished.
You can also try to control the order by changing the structure or by introducing thread ordering (wait_order), or, if possible, you can remove timing altogether (instead of operating directly on the data you receive, you simply store the data for some time and operate on it later).

How to synchronize tasks in different dispatch queues?

I'm new to queues and I'm having some trouble setting up the following scheme.
I have three tasks that need doing.
Task A: Can only run on the main queue, can run asynchronously with task B, cannot run asynchronously with task C. Runs a lot but runs fairly quickly.
Task B: Can run on any queue, can run asynchronously with task A, cannot run asynchronously with task C. Runs rarely, but takes a long time to run. Needs Task C to run afterwards, but once again task C cannot run asynchronously with task A.
Task C: Can run on any queue. Cannot run asynchronously with either task A or task B. Runs rarely and runs quickly.
Right now I have it like this:
Task A is submitted to the main queue by a Serial Queue X (a task is submitted to Serial Queue X to submit task A to the main queue).
Task B is submitted to Serial Queue X.
Task C is submitted to the main queue by Serial Queue X, just like task A.
The problem here is that task C sometimes runs at the same time as task B. The main queue sometimes runs task C at the same time that the serial queue runs task B.
So, how can I ensure that task B and task C never run at the same time while still allowing A and B to run at the same time and preventing A and C from running at the same time? Further, is there any easy way to make sure they run the same number of times? (alternating back and forth)
You know, I think I had this problem on my GRE, only A, B, and C were Bob, Larry, and Sue and they all worked at the same office.
I believe that this can be solved with a combination of a serial queue and a dispatch semaphore. If you set up a single-wide serial dispatch queue and submit tasks B and C to that, you'll guarantee that they won't run at the same time. You can then use a dispatch semaphore with a count set to 1 that is shared between tasks A and C to guarantee that only one of them will run at a time. I describe how such a semaphore works in my answer here. You might need to alter that code to use DISPATCH_TIME_FOREVER so that task A is held up before submission rather than just tossed aside if C is running (likewise for submission of C).
This way, A and B will be running on different queues (the main queue and your serial queue) so they can execute in parallel, but B and C cannot run at the same time due to their shared queue, nor can A and C because of the semaphore.
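The same arrangement can be sketched with JDK primitives. This is an analogy in Java, not Grand Central Dispatch code: two single-threaded executors stand in for the main queue and the serial queue, and a java.util.concurrent.Semaphore with one permit plays the role of the dispatch semaphore. One simplification is that the wait happens inside the task body rather than before submission. All names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class ExclusionDemo {
    /** Returns true if all tasks finished and A and C were never inside their bodies together. */
    static boolean run() {
        ExecutorService mainQueue = Executors.newSingleThreadExecutor();   // runs task A
        ExecutorService serialQueue = Executors.newSingleThreadExecutor(); // runs tasks B and C
        Semaphore aOrC = new Semaphore(1);                                 // shared by A and C only
        AtomicInteger insideAOrC = new AtomicInteger();
        AtomicBoolean overlapSeen = new AtomicBoolean(false);

        // Body used for both A and C: at most one of them may hold the permit.
        Runnable guardedBody = () -> {
            aOrC.acquireUninterruptibly();
            try {
                if (insideAOrC.incrementAndGet() > 1) {
                    overlapSeen.set(true);
                }
                Thread.yield(); // stand-in for real work
                insideAOrC.decrementAndGet();
            } finally {
                aOrC.release();
            }
        };
        // B takes no permit, so it may overlap with A but never with C (same queue).
        Runnable taskB = () -> Thread.yield();

        for (int i = 0; i < 200; i++) {
            mainQueue.execute(guardedBody);           // task A, frequent
            if (i % 20 == 0) {
                serialQueue.execute(taskB);           // task B, rare
                serialQueue.execute(guardedBody);     // task C, queued right after B
            }
        }
        mainQueue.shutdown();
        serialQueue.shutdown();
        try {
            boolean finished = mainQueue.awaitTermination(10, TimeUnit.SECONDS)
                    && serialQueue.awaitTermination(10, TimeUnit.SECONDS);
            return finished && !overlapSeen.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```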
As far as load balancing on A and C goes (what I assume you want to balance), that's probably going to be fairly application-specific, and might require some experimentation on your part to see how to interleave actions properly without wasting cycles. I'd also make sure that you really need them to alternate evenly, or if you can get by with one running slightly more than another.
Did you check out NSOperation to synchronize your operations? You can handle dependencies there.
There's a much simpler way, of course, assuming that C must always follow A and B, which is to have A and B schedule C as completion callbacks for their own operations (and have C check to make sure it's not already running, in case A and B both ask for it to happen simultaneously). The completion callback pattern (described in dispatch_async man page) is very powerful and a great way of serializing async operations that need to be nonetheless coupled.
Where the problem is A, B, C, D and E, where A-D can run async and E must always run at the end, dispatch groups are a better solution, since you can set E to run as the completion callback for the entire group and then simply put A-D in that group.
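For comparison, the "group with a final callback" shape looks like this with the JDK's CompletableFuture. Again this is an analogy in Java rather than dispatch-group code, and the task bodies just record their names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Stream;

final class GroupThenFinal {
    /** Runs A-D asynchronously, then E once all four are done; returns the completion log. */
    static List<String> run() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());

        // A-D form the "group": they may run in any order and in parallel.
        CompletableFuture<?>[] group = Stream.of("A", "B", "C", "D")
            .map(name -> CompletableFuture.runAsync(() -> log.add(name)))
            .toArray(CompletableFuture[]::new);

        // E is the completion callback for the whole group.
        CompletableFuture.allOf(group)
            .thenRun(() -> log.add("E"))
            .join();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The order of A-D in the log can vary from run to run; only the position of E is fixed.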

Condor job using DAG with some jobs needing to run the same host

I have a computation task which is split into several individual program executions, with dependencies. I'm using Condor 7 as the task scheduler (with the Vanilla Universe, due to constraints on the programs beyond my reach, so no checkpointing is involved), so a DAG looks like a natural solution. However, some of the programs need to run on the same host. I could not find a reference on how to do this in the Condor manuals.
Example DAG file:
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
I need to express that B and D need to be run on the same computer node, without breaking the parallel execution of B and C.
Thanks for your help.
Condor doesn't have any simple solutions, but there is at least one kludge that should work:
Have B leave some state behind on the execute node, probably in the form of a file, that says something like MyJobRanHere="UniqueIdentifier". Use the STARTD_CRON support to detect this and advertise it in the machine ClassAd. Have D use Requirements=MyJobRanHere=="UniqueIdentifier". As part of D's final cleanup, or perhaps in a new node E, remove the state. If you're running large numbers of jobs through, you'll probably need to clean out left-over state occasionally.
I don't know the answer but you should ask this question on the Condor Users mailing list. The folks who support the DAG functionality in Condor monitor it and will respond. See this page for subscription information. It's fairly low traffic.
It's generally fairly difficult to keep two jobs together on the same host in Condor without locking them to a specific host in advance, DAG or no DAG. I actually can't think of a really viable way to do this that would let B start before C or C start before B. If you were willing to enforce that B must always start before C, you could make part of the work that Job B does when it starts running be to modify the Requirements portion of Job C's ClassAd so that it has a "Machine == " clause naming the machine B landed on. This would also require that Job C be submitted held, or not submitted at all, until B was running; B would also have to release it as part of its start-up work.
That's pretty complicated...
So I just had a thought: you could use Condor's dynamic startd/slots features and collapse your DAG to achieve what you want. In your DAG where you currently have two separate nodes, B and C, you would collapse this down into one node B' that would run both B and C in parallel when it starts on a machine. As part of the job requirements you note that it needs 2 CPUs on a machine. Switch your startd's to use the dynamic slot configuration so machines advertise all of their resources and not just statically allocated slots. Now you have B and C running concurrently on one machine always. There are some starvation issues with dynamic slots when you have a few multi-CPU jobs in a queue with lots of single-CPU jobs, but it's at least a more readily solved problem.
Another option is to tag B' with a special job attribute:
MultiCPUJob = True
And target it just at slot 1 on machines:
Requirements = Slot == 1 && ...your other requirements...
And have a static slot startd policy that says, "If a job with MultiCPUJob=True tries to run on slot 1 on me preempt any job that happens to be in slot 2 on this machine because I know this job will need 2 cores/CPUs".
This is inefficient but can be done with any version of Condor past 6.8.x. I actually use this type of setup in my own statically partitioned farms so if a job needs a machine all to itself for benchmarking it can happen without reconfiguring machines.
If you're interested in knowing more about that preemption option let me know and I can point you to some further configuration reading in the condor-user list archives.
The solution here is to use the fact that you can modify submit descriptions even while DAGMan is running as long as DAGMan has not yet submitted the node. Assume a simple DAG of A -> B -> C. If you want all nodes to run on the same host you can do the following:
1) Define a POST script on node A.
2) The POST script searches condor_history for the ClusterId of the completed node A. Something like condor_history -l -attribute LastRemoteHost -m1 $JOB_ID ... You'll need to clean up the output and what not, but you'll be left with the host that ran node A.
3) The POST script then searches for and modifies the dependent jobs' submit files, inserting a job requirement at the top of each submit file. Just make sure you build your job requirements incrementally so that they pick up this new requirement if it is present.
4) When the POST script completes, DAGMan will then look to submit ready nodes, of which in this example we have one: B. The submission of B will now be done with the new requirement you added in step 3, so that it will run on the same execute host as A.
I do this currently with numerous jobs. It works great.