I noticed some strange behavior in my RxJava workflow.
The problem occurs when I combine long data streams with short ones (depending on what getEntries() returns).
The long streams take minutes; the short ones take up to a couple of seconds.
Usually, even if the short subscription starts after the long one, it receives onComplete first. However, in about 1 of every 4 or 5 attempts, the short stream's onComplete is received only right after the long stream's.
I suspect this is because the computation scheduler schedules the onComplete on the Rx thread that is kept busy by the long subscription. What boggles me is that the other 7 Rx computation threads (on an 8-core machine) are doing nothing.
When I give each subscription its own single-thread executor, it works as expected every time:
Executor executor = Executors.newSingleThreadExecutor();
// ...then, in that subscription's chain:
.observeOn(Schedulers.from(executor))
.subscribeOn(Schedulers.from(executor))
The code is below.
Service
public Flowable<E> entryFlow()
{
return Flowable
.fromIterable(getEntries())
.filter(Objects::nonNull)
.concatMap(this::transformEntry);
}
Flowable<E> transformEntry(E entry)
{
return Flowable.just(entry);
}
transformEntry may be overridden in some cases, but in my trial runs it always returned a single element, so I'm not sure whether concatMap plays a role here.
Consumer
service
.entryFlow()
.doOnCancel(this::onCancel)
.observeOn(Schedulers.computation())
.subscribeOn(Schedulers.computation())
.subscribe(
this::onNext,
this::onError,
this::onComplete);
I looked at the question "RxJava - Schedulers vs ExecutorService?" and the referenced blog post and comments, but it is still not clear to me why this happens, let alone why at this frequency (~1 in 4 runs).
I would appreciate any ideas on this.
Thank you
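The suspicion above can be illustrated with a small Python analogue. It assumes that, as I understand RxJava 2's ComputationScheduler, each Worker (subscribeOn and observeOn each create one) is pinned to one of a fixed number of single-threaded event loops, handed out round-robin, and that a worker never migrates to an idle loop when its own loop is busy. RoundRobinPool and short_task_wait are hypothetical names, not RxJava API.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


class RoundRobinPool:
    """Hypothetical stand-in for Schedulers.computation(): a fixed set of
    single-threaded event loops, handed out to new workers round-robin."""

    def __init__(self, size: int) -> None:
        self.loops = [ThreadPoolExecutor(max_workers=1) for _ in range(size)]
        self._cycle = itertools.cycle(self.loops)

    def create_worker(self) -> ThreadPoolExecutor:
        # A worker stays pinned to one loop (one thread) for its whole life.
        return next(self._cycle)

    def shutdown(self) -> None:
        for loop in self.loops:
            loop.shutdown(wait=True)


def short_task_wait(pool_size: int, workers_in_between: int, long_s: float = 0.3) -> float:
    """Seconds the 'short' stream's first task waited before it could run."""
    pool = RoundRobinPool(pool_size)
    start = time.perf_counter()
    pool.create_worker().submit(time.sleep, long_s)  # the long stream hogs its loop
    for _ in range(workers_in_between):
        pool.create_worker()  # workers created elsewhere (other chains, timers, ...)
    short_worker = pool.create_worker()  # may wrap around onto the busy loop
    waited = short_worker.submit(lambda: time.perf_counter() - start).result()
    pool.shutdown()
    return waited
```

With 8 loops and 7 workers created in between, the short task lands on the busy loop and waits for the long one; with none in between it runs immediately on an idle loop. If other code creates computation workers at varying times, which loop the short stream gets could vary from run to run, which might explain an intermittent failure rate (speculation). A dedicated single-thread executor per subscription avoids the sharing entirely.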
Related
I am analysing the performance of my Spark application on small datasets. My lineage graph looks something like the following:
someList.toDS()
.repartition(x)
.mapPartitions(func1)
.mapPartitions(func2)
.mapPartitions(func3)
.filter(cond1)
.count()
I have a cluster of 2 nodes with 8 cores each. Executors are configured to use 4 cores, so when the application runs, four executors come up, using 4 cores each.
I observe that at least one (and usually only one) task on each thread (i.e. 16 tasks in total) takes much longer than the other tasks. For example, in one run these tasks take approx. 15-20 seconds, compared to other tasks running in a second or less.
On profiling the code, I found the bottleneck to be in func3 above:
def func3 = (partition: Iterator[DomainObject]) => {
val l = partition.toList // This takes almost all of the time
val t = doSomething(l)
}
The conversion from an Iterator to a List takes up almost all of the time.
The partitions are very small (fewer than 50 elements in some cases), and their sizes are nearly uniform across partitions, yet only one task per thread takes up the time.
I would have assumed that by the time func3 runs on the executor for a task, the data within that partition would already be present on the executor. Is this not the case? (Does it iterate over the entire dataset to filter out data for this partition somehow, during the execution of func3?!)
Otherwise, why would converting an Iterator over fewer than fifty objects to a List take that much time?
Another thing I noticed (not sure if it is relevant): the GC time (per the Spark UI) for these tasks is a consistent 2s for all sixteen of them, unlike the other tasks (even so, 2s << 20s).
Update:
Following is how the event timeline looks for the four executors:
The first realization is during repartition().
The second is after the filter operation, where all three mapPartitions calls start executing (when the count action is called). How the DAG is built, and where the time goes, depends on what doSomething() does in each of those functions; optimize accordingly.
It appears that the data in the partition is available as soon as the task starts executing (or at least there is no significant cost in iterating through that data, as the question made it seem).
The bottleneck in the above code is actually in func2 (which I did not investigate properly!) and is caused by the lazy nature of iterators in Scala. The problem is not related to Spark at all.
Firstly, the functions in the mapPartitions calls above appear to get chained and called like so: func3( func2( func1(Iterator[A]) ) ) : Iterator[B]. So the Iterator produced as output of func2 is fed to func3 directly.
Secondly, in the issue above, func1 (and func2) are defined as:
def func1(x: Iterator[A]): Iterator[B] = x.map(...).filter(...)
Since these take an iterator and map it to another iterator, they are not executed right away. When func3 runs, partition.toList forces the map closure in func2 to execute. So profiling attributes all the time to func3, whereas it is actually func2's code that slows the application down.
(Specific to the problem above, func2 serialises case objects to a JSON string. This appears to run some time-consuming implicit code only for the first object on each thread. Since that happens once per thread, each thread has exactly one task that takes very long, which explains the event timeline above.)
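The same effect can be reproduced with Python generators and map, which are lazy just like Scala iterators. This sketch mirrors the chaining func3(func2(func1(...))); slow_transform is a hypothetical stand-in for the expensive work inside func2's map closure, and func3 returns a list rather than an iterator to keep the example small.

```python
from typing import Iterator, List

calls: List[str] = []  # records the order in which work actually happens


def slow_transform(x: int) -> int:
    # Stand-in for the expensive work in func2's map closure.
    calls.append(f"func2 map({x})")
    return x * 10


def func1(it: Iterator[int]) -> Iterator[int]:
    return (x + 1 for x in it)  # lazy: nothing is computed yet


def func2(it: Iterator[int]) -> Iterator[int]:
    return map(slow_transform, it)  # also lazy


def func3(it: Iterator[int]) -> List[int]:
    calls.append("func3 start")
    items = list(it)  # like partition.toList: forces func1 and func2 to run here
    calls.append("func3 end")
    return items


result = func3(func2(func1(iter([1, 2, 3]))))
```

Every call to slow_transform is recorded between "func3 start" and "func3 end", so a profiler that measures func3 would charge func2's cost to it.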
I have a Python asyncio application with several coroutines running within a single thread. Some data is passed between them using queues.
The queue consumer looks like this:
async def queue_consumer(q):
"""Consume from an asyncio.Queue, making it an async iterable"""
while True:
try:
e = await q.get()
yield e
except:
continue
The consumer is iterated with async for.
In this particular case, the coroutine that consumes from a specific queue sequentially calls some code that puts data into its queue with put_nowait.
EDIT: In this particular case, coroutine A, which listens for inbound network traffic, puts messages into coroutine B's queue.
I have noticed a consistent ~50ms delay between a call to put_nowait in coroutine A and the data being processed after coroutine B pulls it from the queue's async iterable.
I suspect it might have something to do with some internal asyncio polling resolution, but I am not sure, nor do I know where such configuration could be modified.
I would very much like to increase the event polling frequency of the asyncio loop and hence decrease the observed delay between put_nowait and get on a queue shared between coroutines. Is there also perhaps a way to hint to asyncio that it should process items from the queue earlier?
NB: the application I am working with is not doing any computationally demanding work.
It turns out the problem was caused by my app doing some UI updates with prompt_toolkit. I tracked this down by placing some measurements within _run_once. Anyway, the queue was not being processed because the event loop was busy executing some UI code that I did not expect to take so much time.
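A minimal sketch of the mechanism: asyncio has no polling interval to tune here; a coroutine that does synchronous work without awaiting simply holds the loop, so a consumer woken by put_nowait cannot run until that work finishes. The time.sleep call is a hypothetical stand-in for the slow UI update. Instead of instrumenting _run_once, asyncio's debug mode (asyncio.run(..., debug=True)) logs any step that runs longer than loop.slow_callback_duration, which points directly at such code.

```python
import asyncio
import time


async def consumer(q: asyncio.Queue) -> float:
    sent_at = await q.get()
    return time.perf_counter() - sent_at  # put_nowait -> get latency


async def producer(q: asyncio.Queue, blocking_s: float) -> None:
    q.put_nowait(time.perf_counter())
    # Hypothetical stand-in for the slow UI update: synchronous work that
    # never awaits, so the event loop cannot run anything else meanwhile.
    time.sleep(blocking_s)


async def measure_latency(blocking_s: float) -> float:
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.02  # in debug mode, warn about steps slower than this
    q: asyncio.Queue = asyncio.Queue()
    consumer_task = asyncio.create_task(consumer(q))
    await asyncio.sleep(0)  # let the consumer start waiting on q.get()
    await producer(q, blocking_s)
    return await consumer_task


latency = asyncio.run(measure_latency(0.05), debug=True)
```

The measured latency is at least the 50ms of blocking work, and debug mode logs a warning about the slow step; with no blocking work, the item is picked up almost immediately.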
I have been trying to google this but am getting a bit stuck.
Let's say we have a class that fires an event, and that event could be fired by several threads at the same time.
Using Observable.FromEventPattern, we create an Observable and subscribe to that event. How exactly does Rx handle several such events being fired at once? Let's say we have 3 events fired in quick succession on different threads. Does it queue them internally and then call the Subscribe delegate synchronously for each one? If we subscribe on a thread pool, can we still guarantee that the subscriber calls never overlap in time?
Following on from that, let's say that for each event we want to perform an action, but it's a method that's potentially not thread safe, so we only want one thread to be in this method at a time. I see we can use an EventLoopScheduler; presumably we then wouldn't need to implement any locking in the code?
Also, would observing on the current thread be an option? Is the current thread the thread the event was fired from, or the thread the subscription was set up on? I.e., is that current thread guaranteed to always be the same, or could we end up with two threads in the method at the same time?
Thx
PS: I put an example together, but I always seem to end up on the same thread in my subscribe method, even when I ObserveOn the thread pool, which is confusing :S
PPS: From a few more experiments, it seems that if no scheduler is specified, Rx just executes on whatever thread the event was fired on, meaning it processes several events concurrently. As soon as I introduce a scheduler, it always runs things consecutively, no matter what type of scheduler it is. Strange :S
According to the Rx Design Guidelines, an observable should never call OnNext of an observer concurrently. It will always wait for the current call to complete before making the next call. All Rx methods honor this convention. And, more importantly, they assume you also honor this convention. When you violate this condition, you may encounter subtle bugs in the behavior of your Observable.
For those times when you have source data that does not honor this convention (i.e., it can produce data concurrently), they provide Synchronize.
Observable.FromEventPattern assumes you will not be firing concurrent events and so does nothing to prevent concurrent downstream notifications. If you plan on firing events from multiple threads, sometimes concurrently, then use Synchronize() as the first operation you do after FromEventPattern:
// this will get you in trouble if your event source might fire events concurrently.
var events = Observable.FromEventPattern(...).Select(...).GroupBy(...);
// this version will protect you in that case.
var events = Observable.FromEventPattern(...).Synchronize().Select(...).GroupBy(...);
Now all of the downstream operators (and eventually your observer) are protected from concurrent notifications, as promised by the Rx Design Guidelines. Synchronize works by using a simple mutex (aka the lock statement). There is no fancy queueing or anything. If one thread attempts to raise an event while another thread is already raising it, the 2nd thread will block until the first thread finishes.
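The "plain lock, no queueing" behaviour of Synchronize can be sketched in Python. SynchronizedObserver and run_demo are hypothetical illustrations of the idea, not the Rx.NET implementation: several threads act as event sources, and the handler records how many calls are ever inside it at once.

```python
import threading
import time
from typing import Callable, List, Tuple


class SynchronizedObserver:
    """Hypothetical helper illustrating the idea behind Rx's Synchronize():
    one plain lock around every notification, no queueing."""

    def __init__(self, on_next: Callable[[int], None]) -> None:
        self._on_next = on_next
        self._lock = threading.Lock()

    def on_next(self, value: int) -> None:
        # A second thread raising an event blocks here until the first returns.
        with self._lock:
            self._on_next(value)


def run_demo(num_threads: int = 3, events_per_thread: int = 5) -> Tuple[int, List[int]]:
    stats = {"active": 0, "max_active": 0}
    stats_lock = threading.Lock()
    received: List[int] = []

    def handler(value: int) -> None:
        with stats_lock:
            stats["active"] += 1
            stats["max_active"] = max(stats["max_active"], stats["active"])
        time.sleep(0.001)  # widen the window in which overlapping calls would show up
        received.append(value)
        with stats_lock:
            stats["active"] -= 1

    observer = SynchronizedObserver(handler)

    def fire(start: int) -> None:
        # Each thread plays the part of an event source firing several events.
        for v in range(start, start + events_per_thread):
            observer.on_next(v)

    threads = [threading.Thread(target=fire, args=(i * 100,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["max_active"], received


max_seen, received = run_demo()
```

Every event is delivered and max_seen stays at 1; calling handler directly from fire, without the wrapper, would typically let calls overlap.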
In addition to the recommendation to use Synchronize, it's probably worth reading the Intro to Rx section on scheduling and threading. It covers the different schedulers and their relationship to threads, as well as the differences between ObserveOn and SubscribeOn, etc.
If you have several producers, there are Rx methods for combining them in a thread-safe way:
For combining streams of the same type of event into a single stream
Observable.Merge
For combining streams of different types of events into a single stream, using a selector to transform the latest value on each stream into a new value:
Observable.CombineLatest
For example, combining stock prices from different sources:
IObservable<StockPrice> source0;
IObservable<StockPrice> source1;
IObservable<StockPrice> combinedSources = source0.Merge(source1);
or creating balloons at the current position every time there is a click (note that CombineLatest also emits, and so creates a balloon, whenever the position changes after the first click):
IObservable<ClickEvent> clicks;
IObservable<Position> positions;
IObservable<Balloon> balloons = clicks.CombineLatest(
positions,
(click, position) => new Balloon(position.X, position.Y));
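The difference between CombineLatest and "only on click, with the latest position" (what some Rx libraries call withLatestFrom) can be shown on a recorded event sequence. This is a plain-Python sketch of the semantics, not Rx; combine_latest, with_latest_from and balloon_at are hypothetical names, and a "balloon" is just its position here.

```python
from typing import Callable, List, Tuple, TypeVar

R = TypeVar("R")
Event = Tuple[str, object]  # ("click", click_id) or ("position", (x, y))


def combine_latest(events: List[Event], selector: Callable[[object, object], R]) -> List[R]:
    """CombineLatest semantics: once both sources have produced a value,
    emit on every new value from either source."""
    latest = {}
    out: List[R] = []
    for source, value in events:
        latest[source] = value
        if "click" in latest and "position" in latest:
            out.append(selector(latest["click"], latest["position"]))
    return out


def with_latest_from(events: List[Event], selector: Callable[[object, object], R]) -> List[R]:
    """Emit only when a click arrives, paired with the latest position seen."""
    have_position = False
    position: object = None
    out: List[R] = []
    for source, value in events:
        if source == "position":
            have_position, position = True, value
        elif source == "click" and have_position:
            out.append(selector(value, position))
    return out


events: List[Event] = [
    ("position", (0, 0)),
    ("click", "c1"),
    ("position", (5, 5)),
    ("position", (7, 7)),
    ("click", "c2"),
]
balloon_at = lambda click, position: position
```

For two clicks and three position updates, combine_latest produces four balloons, while with_latest_from produces one per click.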
To make this specific to your question: you say there is a class which combines events from different threads. I would use Observable.Merge to combine the individual event sources and expose the result as an Observable on your main class.
BTW, if your threads are actually tasks that fire events to say they have completed, here is an interesting pattern:
IObservable<Job> jobSource;
IObservable<IObservable<JobResult>> resultTasks = jobSource
.Select(job => Observable.FromAsync(token => DoJob(token, job)));
IObservable<JobResult> results = resultTasks.Merge();
Here you receive a stream of jobs. From the jobs you create a stream of asynchronous tasks (not running yet). Merge then subscribes to each of them, which starts the tasks, and collects the results. It is a map-reduce style pattern. The cancellation token can be used to cancel running async tasks if the observable is unsubscribed from (i.e., cancelled).
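A rough asyncio analogue of the jobs pattern, as a sketch only: do_job and run_jobs are hypothetical, asyncio.create_task plays the role of Merge subscribing to each FromAsync (starting the work), asyncio.as_completed yields results in completion order, and cancelling leftover tasks stands in for the cancellation token on unsubscribe.

```python
import asyncio
from typing import List


async def do_job(job: int) -> int:
    # Hypothetical job: variable duration, then a result.
    await asyncio.sleep(0.01 * (3 - job % 3))
    return job * job


async def run_jobs(jobs: List[int]) -> List[int]:
    # Like Select(job => FromAsync(...)): one coroutine per job; a coroutine
    # object does nothing until it is scheduled.
    pending = [do_job(j) for j in jobs]
    tasks = [asyncio.create_task(c) for c in pending]  # like Merge subscribing: start them all
    results: List[int] = []
    try:
        for next_done in asyncio.as_completed(tasks):  # results in completion order
            results.append(await next_done)
    finally:
        for t in tasks:
            t.cancel()  # like unsubscribing: cancel anything still running
    return results


results = asyncio.run(run_jobs(list(range(6))))
```

Results arrive in completion order rather than job order, as they would from Merge.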
I have a task which can easily be broken into parts that can and should be processed in parallel to optimize performance.
I wrote a producer actor that prepares each part of the task that can be processed independently. This preparation is relatively cheap.
I wrote a consumer actor that processes each of the independent tasks. Depending on the parameters, each independent piece may take up to a couple of seconds to process. All tasks are much the same: they run the same algorithm on the same amount of data (with different values, of course), so they take about equally long to process.
So the producer is much faster than the consumer. Hence 200 or 2000 tasks may quickly be prepared (depending on the parameters), all of them consuming memory while only a couple of them can be executed at once.
Now I see two simple strategies to consume and process the tasks:
Create a new consumer actor instance for each task.
Each consumer processes only one task.
I assume there would be many consumer actor instances at the same time, while only a couple of them can be processed at any point in time.
How does the default scheduler work? Can each consumer actor finish processing before the next consumer is scheduled? Or will a consumer be interrupted and replaced by another consumer, so that the first task takes longer to finish? I think actor scheduling is not the same as process or thread scheduling, but I can imagine that interruption could still have some disadvantages (e.g., more cache misses).
The other strategy is to use N instances of the consumer actor and send the tasks to process as messages to them.
Each consumer processes multiple tasks in sequence.
It is up to me to find an appropriate value for N (the number of consumers).
The distribution of the tasks over the N consumers is also up to me.
I could imagine a more sophisticated solution where more coordination is done between the producer and the consumers, but I can't make a good decision without knowledge about the scheduler.
If a manual solution would not give significantly better performance, I would prefer a default solution (provided by some part of the Scala world) where scheduling the tasks is not left up to me (like strategy 1).
Question roundup:
How does the default scheduler work?
Can each consumer actor finish processing before the next consumer will be scheduled?
Or will a consumer be interrupted and be replaced by another consumer resulting in longer time until the first task will be finished?
What are the disadvantages when the scheduler frequently interrupts an actor and schedules another one? Cache-Misses?
Would this interruption and scheduling be like a context-change in process scheduling or thread scheduling?
Are there any more advantages or disadvantages comparing these strategies?
In particular, does strategy 1 have disadvantages compared to strategy 2?
Which of these strategies is the best?
Is there a better strategy than I proposed?
I'm afraid questions like the last two cannot be answered absolutely, but maybe it is possible this time, since I tried to make the case as concrete as possible.
I think the other questions can be answered without much discussion. With those answers it should be possible to choose the strategy fitting the requirements best.
I did some research and thinking myself and came up with some assumptions. If any of them are wrong, please tell me.
If I were you, I would go ahead with the 2nd option. A new actor instance for each task would be too tedious. Also, with a smart choice of N, the full system resources can be used.
This is not a complete solution, but one possible option: can't the producer stop or slow down the rate at which it produces tasks? That would be ideal: the producer would only produce more tasks when a consumer is available.
Assuming you are using Akka (if you aren't, you should ;-)), you could use a SmallestMailboxRouter to start a number of actors (you can also add a Resizer), and the message distribution will be handled according to some rules. You can read everything about routers here.
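The core rule of smallest-mailbox routing can be modelled in a few lines of Python. This is a simplified, hypothetical sketch, not Akka code: Akka's router applies extra rules (for example, it prefers an idle routee with an empty mailbox), while this model only compares mailbox sizes and breaks ties by index.

```python
from collections import deque
from typing import Deque, List


class SmallestMailboxRouter:
    """Simplified, hypothetical model: each message goes to the routee whose
    mailbox currently holds the fewest messages (ties go to the lowest index)."""

    def __init__(self, n_routees: int) -> None:
        self.mailboxes: List[Deque[str]] = [deque() for _ in range(n_routees)]

    def route(self, message: str) -> int:
        # min() returns the first index with the smallest mailbox.
        target = min(range(len(self.mailboxes)), key=lambda i: len(self.mailboxes[i]))
        self.mailboxes[target].append(message)
        return target

    def process_one(self, routee: int) -> str:
        # A routee finishing a message shrinks its mailbox.
        return self.mailboxes[routee].popleft()


router = SmallestMailboxRouter(3)
first_targets = [router.route(f"task{i}") for i in range(5)]  # spread evenly
finished = router.process_one(2)  # routee 2 drains its mailbox...
next_target = router.route("task5")  # ...so it receives the next message
```

Because routing looks at current mailbox sizes, a routee that works faster automatically receives more messages, which takes the distribution problem of strategy 2 off your hands.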
For such a simple task, actors give no benefit at all. Implement the producer as a Thread and each task as a Runnable. Use a thread pool from java.util.concurrent to run the tasks. Use a java.util.concurrent.Semaphore to limit the number of prepared and running tasks: before creating the next task, the producer acquires the semaphore, and each task releases the semaphore at the end of its execution.
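The semaphore scheme above, translated into a Python sketch: threading.Semaphore and concurrent.futures.ThreadPoolExecutor stand in for java.util.concurrent.Semaphore and a Java thread pool, and run_bounded with its parameters is a hypothetical name. The producer blocks once max_prepared tasks are waiting or running, so memory use stays bounded no matter how fast tasks could be prepared.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_bounded(num_tasks: int, max_prepared: int, workers: int) -> Tuple[int, List[int]]:
    permits = threading.Semaphore(max_prepared)
    stats_lock = threading.Lock()
    in_flight = 0  # tasks prepared but not yet finished
    peak = 0
    results: List[int] = []

    def task(i: int) -> None:
        nonlocal in_flight
        try:
            time.sleep(0.005)  # stand-in for the real per-task computation
            with stats_lock:
                results.append(i * i)
        finally:
            with stats_lock:
                in_flight -= 1
            permits.release()  # frees a slot so the producer can prepare another task

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(num_tasks):
            permits.acquire()  # producer blocks while max_prepared tasks are pending
            with stats_lock:
                in_flight += 1
                peak = max(peak, in_flight)
            pool.submit(task, i)
    return peak, results


peak, results = run_bounded(num_tasks=50, max_prepared=4, workers=2)
```

Setting max_prepared a little above the number of workers keeps the pool busy while still capping how many tasks exist at once.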
Can anybody explain the difference between sleep(), usleep() and [NSThread sleepForTimeInterval:]?
When is it best to use each of these methods?
sleep(3) is a POSIX standard library function that attempts to suspend the calling thread for the specified number of seconds. usleep(3) does the same, except that it takes a time in microseconds instead. Both are actually implemented with the nanosleep(2) system call.
The last method does the same thing except that it is part of the Foundation framework rather than being a C library call. It takes an NSTimeInterval that represents the amount of time to be slept as a double indicating seconds and fractions of a second.
For all intents and purposes, they all do functionally the same thing, i.e., attempt to suspend the calling thread for some specified amount of time.
What is the best condition to use these methods?
Never
Or, really, almost never outside of the most unusual circumstances.
What are you trying to do?
On most OSs, sleep(0) and its variants can improve efficiency in a polling situation by giving other threads a chance to run until the thread scheduler decides to wake up the polling thread. It beats a full-on busy while loop. I haven't found much use for a non-zero timeout, though, and Apple in particular has done a pretty good job of building an event-driven architecture that should eliminate the need for polling in most situations anyway.
An example use of sleep is the following scenario:
In a network simulation, events are usually executed one at a time by a scheduler, in order of their scheduled times.
When an event finishes executing and the scheduler moves to the next event, the scheduler compares the next event's execution time with the machine clock. If the next event is scheduled for a future time, the simulator sleeps until that real time is reached and then executes the next event.
From the Linux man pages:
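The real-time scheduler described above can be sketched in Python. run_realtime is a hypothetical name, events are (offset_seconds, name) pairs relative to the simulation start, a heap keeps them ordered by time, and time.sleep plays the role of sleep/usleep.

```python
import heapq
import time
from typing import List, Tuple


def run_realtime(events: List[Tuple[float, str]], log: List[Tuple[str, float]]) -> None:
    """Execute events in time order, sleeping until each event's
    wall-clock time is reached before running it."""
    start = time.monotonic()
    queue = list(events)
    heapq.heapify(queue)  # ordered by scheduled offset
    while queue:
        offset, name = heapq.heappop(queue)
        delay = start + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # next event is in the future: sleep until then
        log.append((name, time.monotonic() - start))  # "execute" the event


log: List[Tuple[str, float]] = []
schedule = [(0.03, "b"), (0.0, "a"), (0.06, "c")]
run_realtime(schedule, log)
```

Sleeping only for the remaining delay, rather than a fixed interval, keeps the simulation from drifting when event handlers themselves take time.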
The usleep() function suspends execution of the calling thread for (at least) usec microseconds. The sleep may be lengthened slightly by any system activity or by the time spent processing the call or by the granularity of system timers.
sleep likewise delays the execution of a task (a thread or anything else) for some time. Refer to 1 and 2 for more details about the functions.