Best practices setting onSubscribe thread from an API - rx-java2

Let's say I have an API that exposes two methods, each of which returns an Observable:
import org.assertj.core.util.VisibleForTesting;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import io.reactivex.Observable;
import io.reactivex.Scheduler;
class SomeApiClass {
private static final String[] doOnSubscribeThread = new String[1];
static Observable<Integer> immediatelyDoWork() {
return Observable.just(1, 2)
.doOnSubscribe(ignore -> doOnSubscribeThread[0] = Thread.currentThread().getName())
.flatMap(ignore -> doWork());
}
static Observable<Integer> periodicallyDoWork() {
// interval is using default computation scheduler
return Observable.interval(1, TimeUnit.SECONDS)
.doOnSubscribe(ignore -> doOnSubscribeThread[0] = Thread.currentThread().getName())
.flatMap(ignore -> doWork());
}
@VisibleForTesting
static String getSubscribedOnThread() {
return doOnSubscribeThread[0];
}
private static Observable<Integer> doWork() {
return Observable.create(emitter -> {
Random random = new Random();
emitter.onNext(random.nextInt());
emitter.onComplete();
});
}
}
Most APIs would just let the calling application set the subscribeOn thread (imagine these tests are my application):
import org.junit.Test;
import io.reactivex.Observable;
import io.reactivex.android.schedulers.AndroidSchedulers;
import io.reactivex.observers.TestObserver;
import io.reactivex.schedulers.Schedulers;
import static com.google.common.truth.Truth.assertThat;
public class ExampleTest {
@Test
public void canSetSubscribeOnThread() {
Observable<Integer> coloObservable = SomeApiClass.immediatelyDoWork()
.subscribeOn(Schedulers.newThread())
.observeOn(AndroidSchedulers.mainThread());
TestObserver<Integer> testObserver = coloObservable.test();
testObserver.awaitCount(2); // wait for a few emissions
assertThat(SomeApiClass.getSubscribedOnThread()).contains("RxNewThreadScheduler");
}
@Test
public void canSetSubscribeOnThreadIfApiUsesInterval() {
Observable<Integer> coloObservable = SomeApiClass.periodicallyDoWork()
.subscribeOn(Schedulers.newThread())
.observeOn(AndroidSchedulers.mainThread());
TestObserver<Integer> testObserver = coloObservable.test();
testObserver.awaitCount(2); // wait for a few emissions
assertThat(SomeApiClass.getSubscribedOnThread()).contains("RxNewThreadScheduler");
}
}
IIUC, in the immediate example all subscription side effects (including just()) will happen on the new thread. Karnok explains this well here.
But in the periodic example, interval will use the default (computation) scheduler. What do most APIs do in this case? Do they let the caller set the subscribeOn thread for all subscription side effects except interval itself? In the periodic test above, we can still set the subscribeOn thread for everything but interval. Or do they add an argument to set this scheduler too:
/**
* Works like {@link #periodicallyDoWork()} but allows the caller to set the subscribeOnScheduler
*/
static Observable<Integer> periodicallyDoWork(Scheduler subscribeOnScheduler) {
return Observable.interval(1, TimeUnit.SECONDS, subscribeOnScheduler)
.doOnSubscribe(ignore -> doOnSubscribeThread[0] = Thread.currentThread().getName())
.flatMap(ignore -> doWork());
}
And then allow callers to omit the subscribeOn() method:
@Test
public void canSetSubscribeOnThreadIfApiUsesInterval() {
Observable<Integer> coloObservable = SomeApiClass.periodicallyDoWork(Schedulers.newThread())
.observeOn(AndroidSchedulers.mainThread());
TestObserver<Integer> testObserver = coloObservable.test();
testObserver.awaitCount(2); // wait for a few emissions
assertThat(SomeApiClass.getSubscribedOnThread()).contains("RxNewThreadScheduler");
}
Is this overkill? As long as the caller also calls subscribeOn() is there any danger in just letting interval use the default computation scheduler?

In my opinion, an API that creates observer chains must provide a way to inject schedulers. Without that capability, unit testing becomes almost impossible to manage.
I have quite a bit of experience writing tests for real-time systems. Simply being able to supply a TestScheduler or two to the unit under test makes the difference between being able to test reasonably and not bothering. Consider a subsystem that uses debounce() with a period of 1 second. It is not feasible to write unit tests for several dozen cases without a TestScheduler and advanceTimeBy() to control the clock. With them, unit tests that would take minutes on a regular scheduler run in tens of milliseconds.
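To make the virtual-time idea concrete without pulling in RxJava, here is a minimal JDK-only sketch. VirtualClockScheduler and Debouncer are hypothetical names invented for this illustration; they are not RxJava's TestScheduler or debounce(), only a rough model of why injecting the scheduler lets a test cover 1.5 seconds of "time" instantly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a test scheduler: time moves only when the test says so.
class VirtualClockScheduler {
    private static final class Task {
        final long dueNanos;
        final long seq; // keeps insertion order for tasks due at the same instant
        final Runnable action;

        Task(long dueNanos, long seq, Runnable action) {
            this.dueNanos = dueNanos;
            this.seq = seq;
            this.action = action;
        }
    }

    private final PriorityQueue<Task> queue = new PriorityQueue<>(
            Comparator.comparingLong((Task t) -> t.dueNanos).thenComparingLong(t -> t.seq));
    private long nowNanos;
    private long nextSeq;

    void schedule(Runnable action, long delay, TimeUnit unit) {
        queue.add(new Task(nowNanos + unit.toNanos(delay), nextSeq++, action));
    }

    // Runs every task that becomes due within the given span, in due order.
    void advanceTimeBy(long amount, TimeUnit unit) {
        long target = nowNanos + unit.toNanos(amount);
        while (!queue.isEmpty() && queue.peek().dueNanos <= target) {
            Task task = queue.poll();
            nowNanos = task.dueNanos;
            task.action.run();
        }
        nowNanos = target;
    }
}

// Hypothetical unit under test: emits a value only after a quiet period.
class Debouncer {
    private final VirtualClockScheduler scheduler; // injected, never created internally
    private final long quietMillis;
    private final List<String> emitted = new ArrayList<>();
    private long generation;

    Debouncer(VirtualClockScheduler scheduler, long quietMillis) {
        this.scheduler = scheduler;
        this.quietMillis = quietMillis;
    }

    void onInput(String value) {
        long mine = ++generation;
        scheduler.schedule(() -> {
            if (mine == generation) { // no newer input arrived during the quiet period
                emitted.add(value);
            }
        }, quietMillis, TimeUnit.MILLISECONDS);
    }

    List<String> emitted() {
        return emitted;
    }
}

class VirtualTimeDemo {
    static List<String> run() {
        VirtualClockScheduler scheduler = new VirtualClockScheduler();
        Debouncer debouncer = new Debouncer(scheduler, 1000);
        debouncer.onInput("a");
        scheduler.advanceTimeBy(500, TimeUnit.MILLISECONDS);
        debouncer.onInput("b"); // restarts the quiet period, so "a" is dropped
        scheduler.advanceTimeBy(1000, TimeUnit.MILLISECONDS);
        return debouncer.emitted();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [b]
    }
}
```

With RxJava itself, the equivalent move is to pass a TestScheduler into the API method and call advanceTimeBy() from the test; the sketch only shows why that injection point matters.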

Related

What is a good way/pattern to use Temporal/Cadence versioning API

The Versioning API is powerful. However, the usual pattern of using it quickly makes the code messy and hard to read and maintain.
Over time, the product needs to move fast to introduce new business requirements. Is there any advice on using this API wisely?
I would suggest using a Global Version Provider design pattern in Cadence/Temporal workflow if possible.
Key Idea
The versioning API is very powerful: it lets you change the behavior of existing workflow executions in a deterministic (backward-compatible) way. In the real world, you may only care about adding new behavior, and be fine with introducing it only to newly started workflow executions. In that case, you can use a global version provider to unify versioning for the whole workflow.
The key idea is that we version the whole workflow (that's why it's called GlobalVersionProvider). Every time we add a new version, we update the version provider and provide a new version.
Example In Java
import com.google.common.annotations.VisibleForTesting;
import com.google.common.collect.ImmutableMap;
import io.temporal.workflow.Workflow;
import java.util.HashMap;
import java.util.Map;
public class GlobalVersionProvider {
private static final String WORKFLOW_VERSION_CHANGE_ID = "global";
private static final int STARTING_VERSION_USING_GLOBAL_VERSION = 1;
private static final int STARTING_VERSION_DOING_X = 2;
private static final int STARTING_VERSION_DOING_Y = 3;
private static final int MAX_STARTING_VERSION_OF_ALL =
STARTING_VERSION_DOING_Y;
// Workflow.getVersion can release a thread and subsequently cause a non-deterministic error.
// We're introducing this map in order to cache our versions on the first call, which should
// always occur at the beginning of a workflow
private static final Map<String, GlobalVersionProvider> RUN_ID_TO_INSTANCE_MAP =
new HashMap<>();
private final int versionOnInstantiation;
private GlobalVersionProvider() {
versionOnInstantiation =
Workflow.getVersion(
WORKFLOW_VERSION_CHANGE_ID,
Workflow.DEFAULT_VERSION,
MAX_STARTING_VERSION_OF_ALL);
}
private int getVersion() {
return versionOnInstantiation;
}
public boolean isAfterVersionOfUsingGlobalVersion() {
return getVersion() >= STARTING_VERSION_USING_GLOBAL_VERSION;
}
public boolean isAfterVersionOfDoingX() {
return getVersion() >= STARTING_VERSION_DOING_X;
}
public boolean isAfterVersionOfDoingY() {
return getVersion() >= STARTING_VERSION_DOING_Y;
}
public static GlobalVersionProvider get() {
String runId = Workflow.getInfo().getRunId();
GlobalVersionProvider instance;
if (RUN_ID_TO_INSTANCE_MAP.containsKey(runId)) {
instance = RUN_ID_TO_INSTANCE_MAP.get(runId);
} else {
instance = new GlobalVersionProvider();
RUN_ID_TO_INSTANCE_MAP.put(runId, instance);
}
return instance;
}
// NOTE: this should be called at the beginning of the workflow method
public static void upsertGlobalVersionSearchAttribute() {
int workflowVersion = get().getVersion();
Workflow.upsertSearchAttributes(
ImmutableMap.of(
WorkflowSearchAttribute.TEMPORAL_WORKFLOW_GLOBAL_VERSION.getValue(),
workflowVersion));
}
// Call this in each replay test to clear the cache
@VisibleForTesting
public static void clearInstances() {
RUN_ID_TO_INSTANCE_MAP.clear();
}
}
Note that, because of a bug in the Temporal/Cadence Java SDK, Workflow.getVersion can release a thread and subsequently cause a non-deterministic error.
The map caches the version on the first call, which should always occur at the beginning of the workflow execution.
Call clearInstances in each replay test to clear the cache.
Therefore, in the workflow code:
public class HelloWorldImpl{
private GlobalVersionProvider globalVersionProvider;
@VisibleForTesting
public HelloWorldImpl(final GlobalVersionProvider versionProvider){
this.globalVersionProvider = versionProvider;
}
public HelloWorldImpl(){
this.globalVersionProvider = GlobalVersionProvider.get();
}
@Override
public void start(final Request request) {
if (globalVersionProvider.isAfterVersionOfUsingGlobalVersion()) {
GlobalVersionProvider.upsertGlobalVersionSearchAttribute();
}
...
...
if (globalVersionProvider.isAfterVersionOfDoingX()) {
// doing X here
...
}
...
if (globalVersionProvider.isAfterVersionOfDoingY()) {
// doing Y here
...
}
...
}
}
Best practice with the pattern
How to add a new version
For every new version:
Add a new constant STARTING_VERSION_XXXX
Add a new API: public boolean isAfterVersionOfXXX()
Update MAX_STARTING_VERSION_OF_ALL
Apply the new API in the workflow code wherever you want to add the new logic
Maintain the replay test JSON files using the naming pattern HelloWorldWorkflowReplaytest-version-x-description.json. Always add a new replay test for every new version you introduce to the workflow. When generating the JSON from a workflow execution, make sure it exercises the new code path; otherwise it won't protect determinism. If it takes more than one workflow execution to exercise all branches, make multiple JSON files for replay.
How to remove an old version
To remove an old code path (version), add a new version that does not execute the old code path, then later use a search attribute query like
GlobalVersion>=STARTING_VERSION_DOING_X AND GlobalVersion<STARTING_VERSION_NOT_DOING_X to find out whether any workflow executions are still running with those versions.
Instead of waiting for workflows to close, you can terminate or reset them.
Example of deprecating a code path DoingX:
Therefore, in the workflow code:
public class HelloWorldImpl implements Helloworld{
...
@Override
public void start(final Request request) {
...
...
if (globalVersionProvider.isAfterVersionOfDoingX() && !globalVersionProvider.isAfterVersionOfNotDoingX()) {
// doing X here
...
}
}
}
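The deprecation example relies on a STARTING_VERSION_NOT_DOING_X constant and an isAfterVersionOfNotDoingX() method that the provider shown earlier does not define yet. The following is a library-free sketch of what adding them might look like; VersionGates, the value 4, and shouldDoX() are assumptions made for illustration, and the version is passed in directly rather than read from Workflow.getVersion so the gating logic can be checked in isolation.

```java
// Library-free sketch of the version gates; not the actual GlobalVersionProvider.
class VersionGates {
    static final int STARTING_VERSION_USING_GLOBAL_VERSION = 1;
    static final int STARTING_VERSION_DOING_X = 2;
    static final int STARTING_VERSION_DOING_Y = 3;
    // Hypothetical new version that switches the X code path off again.
    static final int STARTING_VERSION_NOT_DOING_X = 4;
    static final int MAX_STARTING_VERSION_OF_ALL = STARTING_VERSION_NOT_DOING_X;

    private final int version;

    VersionGates(int version) {
        this.version = version;
    }

    boolean isAfterVersionOfDoingX() {
        return version >= STARTING_VERSION_DOING_X;
    }

    boolean isAfterVersionOfNotDoingX() {
        return version >= STARTING_VERSION_NOT_DOING_X;
    }

    // X runs only for versions in [STARTING_VERSION_DOING_X, STARTING_VERSION_NOT_DOING_X).
    boolean shouldDoX() {
        return isAfterVersionOfDoingX() && !isAfterVersionOfNotDoingX();
    }

    public static void main(String[] args) {
        for (int v = 1; v <= MAX_STARTING_VERSION_OF_ALL; v++) {
            System.out.println("version " + v + " does X: " + new VersionGates(v).shouldDoX());
        }
    }
}
```

The half-open range in shouldDoX() is the same range the search attribute query above selects, so the query tells you which running executions still depend on the old code path.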
TODO: Example in Golang
Benefits
Prevents the spaghetti code that comes from using the native Temporal versioning API everywhere in the workflow code
Provides a search attribute to find workflows of a particular version. This fills the gap left by the Temporal Java SDK's missing TemporalChangeVersion feature.
Even though the Cadence Java/Golang SDKs have CadenceChangeVersion, this global
version search attribute is much better for queries, because it's an
integer instead of a keyword.
Provides a pattern for maintaining replay tests easily
Provides a way to test different versions without this missing feature
Cons
There shouldn't be any cons. Using this pattern doesn't stop you from using the raw versioning API directly in the workflow. You can combine this pattern with others together.

Reactor spring mongodb repository combine multiple results together

I'm kind of new to reactive programming and am currently working on a Spring WebFlux based application. I'm stuck on a few questions.
public class FooServiceImpl {
@Autowired
private FooDao fooDao;
@Autowired
private AService aService;
@Autowired
private BService bService;
public long calculateSomething(long fooId) {
Foo foo = fooDao.findById(fooId); // Blocking call one
if (foo == null) {
foo = new Foo();
}
Long bCount = bService.getCountBByFooId(fooId); // Blocking call two
AEntity aEntity = aService.getAByFooId(fooId); // Blocking call three
// Do some calculation using foo, bCount and aEntity
// ...
// ...
return someResult;
}
}
This is how we write blocking code that uses three external API call results (let's treat them as DB calls). I'm struggling to convert this into reactive code. If all three become Monos and I subscribe to all three, will the outer subscriber get blocked?
public Mono<Long> calculateSomething(long fooId) {
return Mono.create(sink -> {
Mono<Foo> monoFoo = fooDao.findById(fooId); // Reactive call one
monoFoo.subscribe(foo -> {
if (foo == null) {
foo = new Foo();
}
Mono<Long> monoCount = bService.getCountBByFooId(fooId); // Reactive call two
monoCount.subscribe(aLong -> {
Mono<AEntity> monoA = aService.getAByFooId(fooId); // Reactive call three
monoA.subscribe(aEntity -> {
//...
//...
sink.success(someResult);
});
});
});
});
}
I saw there is a function called zip, but it only works with two results. Is there a way to apply it here?
Also, what happens if we subscribe to something inside the create method? Will it block the thread?
I would be very thankful if you could help me.
If you gave me the calculation you want to do with those values, it would be easier for me to show the Reactor way of doing it. But let's suppose you want to read a value from the database and then use that value for something else. Use flatMap to build a single reactive chain, which reduces the lines of code and the complexity; there is no need to call subscribe(), as others have said. Example:
return fooDao.findById(fooId)
.flatMap(foo -> bService.getCountBByFooId(fooId))
.flatMap(bCount -> aService.getAByFooId(fooId)
.map(aEntity -> aEntity.getCount() + bCount)); // getCount() is assumed to exist on AEntity

Is RxJava Completable's Emitter.onComplete happens-before Observer's callback?

Given the following code, is it guaranteed that System.out.println(v) will print 1? What if I change the io and computation schedulers to other schedulers?
I have checked the source of the computation scheduler; it seems to use the executor's submit method, and according to the documentation, submit happens-before the execution of the actual runnable. So I think the happens-before relationship is guaranteed in this case, but does this apply to other schedulers?
import io.reactivex.Completable;
import io.reactivex.schedulers.Schedulers;
public class Test {
static int v = 0;
public static void main(String[] args){
Completable.create(e -> {v = 1; e.onComplete();})
.subscribeOn(Schedulers.io())
.observeOn(Schedulers.computation())
.subscribe(() -> System.out.println(v));
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
Also, if I assign 1 to v before Completable#create, is this change visible to the Completable's body?
Given the following code, is it guaranteed that System.out.println(v) will print 1?
Yes.
If you, however, swapped the order, there is no guarantee:
Completable.create(e -> {e.onComplete(); v = 1;})
What if I change the io and computation schedulers to other schedulers?
All standard schedulers have this guarantee.
but does this apply to other schedulers?
Any asynchronous scheduler is expected to provide this happens-before relationship and the standard ones are guaranteed because of the underlying ExecutorService.
if I assign 1 to v before Completable#create, is this change visible to Completable's body?
subscribeOn also establishes a happens-before relationship, so upon subscription the write to v is committed and the body of create will see the value.
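The guarantee comes from the executor hand-off, which can be seen with JDK classes alone. The sketch below is only an analogy, not RxJava code: the two single-thread executors (ioLike and computationLike are made-up names) stand in for the io and computation schedulers, and v is deliberately a plain, non-volatile field.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class HappensBeforeDemo {
    static int v = 0; // deliberately a plain field: no volatile, no locks

    static int run() {
        // Hypothetical stand-ins for the io and computation schedulers.
        ExecutorService ioLike = Executors.newSingleThreadExecutor();
        ExecutorService computationLike = Executors.newSingleThreadExecutor();
        try {
            v = 1; // written before submit(), so the task below is guaranteed to see it
            Future<Integer> result = ioLike.submit(() -> {
                v = v + 1; // written before the hand-off to the second executor
                return computationLike.submit(() -> v).get();
            });
            // Future.get() makes the task's actions visible to this thread as well.
            return result.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            ioLike.shutdown();
            computationLike.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2
    }
}
```

Each write happens before the submit() call that follows it, which is why both reads are guaranteed. Moving a write to after the hand-off, as in the swapped onComplete example above, removes that ordering.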

How to turn a proprietary stream into a Flux from spring webflux

I have a custom event bus where I can subscribe a lambda like
bus.subscribe(topic, event -> {/*gets executed for every new event*/}, exception -> {})
Now the lambda is obviously running in a different thread. My question is: how can I connect this kind of interface to a Flux<Event>? Do I have to write my own Publisher? People say that's not a good idea.
A mock implementation would be
import java.util.function.Consumer
class Mock extends Thread {
Consumer<String> lambda
public Mock(Consumer<String> lambda) {
this.lambda = lambda
}
@Override
void run() {
while(true) {
Thread.sleep(1000)
lambda.accept("lala")
}
}
}
Flux<String> flux = new Mock({ /*TODO write to flux*/ }).start()
You’re right, you should not implement your own publisher. In most cases, you should not have to deal with threads either and instead rely on static methods on Flux.
Something like:
Flux<Event> events = Flux.<Event>create(emitter -> {
bus.subscribe(topic, event -> emitter.next(event),
exc -> emitter.error(exc));
// you should also unsubscribe
emitter.onDispose(() -> {
bus.unsubscribe(topic, ...);
});
});

How to access memcached asynchronously in netty

I am writing a server in netty, in which I need to make a call to memcached. I am using spymemcached and can easily do the synchronous memcached call. I would like this memcached call to be async. Is that possible? The examples provided with netty do not seem to be helpful.
I tried using callbacks: I created an ExecutorService pool in my Handler and submitted a callback worker to this pool. Like this:
public class MyHandler extends ChannelInboundMessageHandlerAdapter<MyPOJO> implements CallbackInterface{
...
private static ExecutorService pool = Executors.newFixedThreadPool(20);
@Override
public void messageReceived(ChannelHandlerContext ctx, MyPOJO pojo) {
...
CallingbackWorker worker = new CallingbackWorker(key, this);
pool.submit(worker);
...
}
public void myCallback() {
//get response
this.ctx.nextOutboundMessageBuf().add(response);
}
}
CallingbackWorker looks like:
public class CallingbackWorker implements Callable {
public CallingbackWorker(String key, CallbackInterface c) {
this.c = c;
this.key = key;
}
public Object call() {
//get value from key
c.myCallback(value);
}
However, when I do this, this.ctx.nextOutboundMessageBuf() in myCallback gets stuck.
So, overall, my question is: how to do async memcached calls in Netty?
There are two problems here: a small-ish issue with the way you're trying to code this, and a bigger one with many libraries that provide async service calls, but no good way to take full advantage of them in an async framework like Netty. That forces users into suboptimal hacks like this one, or a less-bad, but still not ideal approach I'll get to in a moment.
First the coding problem. The issue is that you're trying to call a ChannelHandlerContext method from a thread other than the one associated with your handler, which is not allowed. That's pretty easy to fix, as shown below. You could code it a few other ways, but this is probably the most straightforward:
private static ExecutorService pool = Executors.newFixedThreadPool(20);
public void channelRead(final ChannelHandlerContext ctx, final Object msg) {
//...
final GetFuture<String> future = memcachedClient().getAsync("foo", stringTranscoder());
// first wait for the response on a pool thread
pool.execute(new Runnable() {
public void run() {
String value;
Exception err;
try {
value = future.get(3, TimeUnit.SECONDS); // or whatever timeout you want
err = null;
} catch (Exception e) {
err = e;
value = null;
}
// put results into final variables; compiler won't let us do it directly above
final String fValue = value;
final Exception fErr = err;
// now process the result on the ChannelHandler's thread
ctx.executor().execute(new Runnable() {
public void run() {
handleResult(fValue, fErr);
}
});
}
});
// note that we drop through to here right after calling pool.execute() and
// return, freeing up the handler thread while we wait on the pool thread.
}
private void handleResult(String value, Exception err) {
// handle it
}
That will work, and might be sufficient for your application. But you've got a fixed-size thread pool, so if you're ever going to handle much more than 20 concurrent connections, that will become a bottleneck. You could increase the pool size, or use an unbounded one, but at that point, you might as well be running under Tomcat, as memory consumption and context-switching overhead start to become issues, and you lose the scalability that was the attraction of Netty in the first place!
And the thing is, Spymemcached is NIO-based, event-driven, and uses just one thread for all its work, yet provides no way to fully take advantage of its event-driven nature. I expect they'll fix that before too long, just as Netty 4 and Cassandra have recently by providing callback (listener) methods on Future objects.
Meanwhile, being in the same boat as you, I researched the alternatives, and not being too happy with what I found, I wrote (yesterday) a Future tracker class that can poll up to thousands of Futures at a configurable rate, and call you back on the thread (Executor) of your choice when they complete. It uses just one thread to do this. I've put it up on GitHub if you'd like to try it out, but be warned that it's still wet, as they say. I've tested it a lot in the past day, and even with 10000 concurrent mock Future objects, polling once a millisecond, its CPU utilization is negligible, though it starts to go up beyond 10000. Using it, the example above looks like this:
// in some globally-accessible class:
public static final ForeignFutureTracker FFT = new ForeignFutureTracker(1, TimeUnit.MILLISECONDS);
// in a handler class:
public void channelRead(final ChannelHandlerContext ctx, final Object msg) {
// ...
final GetFuture<String> future = memcachedClient().getAsync("foo", stringTranscoder());
// add a listener for the Future, with a timeout in 2 seconds, and pass
// the Executor for the current context so the callback will run
// on the same thread.
Global.FFT.addListener(future, 2, TimeUnit.SECONDS, ctx.executor(),
new ForeignFutureListener<String,GetFuture<String>>() {
public void operationSuccess(String value) {
// do something ...
ctx.fireChannelRead(someval);
}
public void operationTimeout(GetFuture<String> f) {
// do something ...
}
public void operationFailure(Exception e) {
// do something ...
}
});
}
You don't want more than one or two FFT instances active at any time, or they could become a drain on CPU. But a single instance can handle thousands of outstanding Futures; about the only reason to have a second one would be to handle higher-latency calls, like S3, at a slower polling rate, say 10-20 milliseconds.
One drawback of the polling approach is that it adds a small amount of latency. For example, polling once a millisecond, on average it will add 500 microseconds to the response time. That won't be an issue for most applications, and I think is more than offset by the memory and CPU savings over the thread pool approach.
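For readers who want to see the polling idea in code, here is a rough JDK-only sketch. It is not the ForeignFutureTracker class mentioned above and does not reproduce its API: PollingFutureTracker, its addListener signature, and the single BiConsumer callback are all assumptions made for illustration. One scheduler thread sweeps the pending Futures at a fixed rate and hands each completed (or timed-out) one to the caller's Executor.

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BiConsumer;

// Hypothetical, simplified poller (not the ForeignFutureTracker API).
class PollingFutureTracker {
    // One entry per tracked Future; returns true once it has been handed off.
    private interface Pending {
        boolean pollOnce(long nowNanos);
    }

    private final Queue<Pending> pending = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService poller =
            Executors.newSingleThreadScheduledExecutor();

    PollingFutureTracker(long period, TimeUnit unit) {
        // A single thread sweeps every pending Future at the configured rate.
        poller.scheduleWithFixedDelay(this::pollAll, period, period, unit);
    }

    private void pollAll() {
        long now = System.nanoTime();
        pending.removeIf(p -> {
            try {
                return p.pollOnce(now);
            } catch (RuntimeException e) {
                return true; // e.g. the callback executor rejected the task: drop the entry
            }
        });
    }

    <T> void addListener(Future<T> future, long timeout, TimeUnit unit,
                         Executor callbackExecutor, BiConsumer<T, Exception> callback) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        pending.add(now -> {
            if (future.isDone()) {
                callbackExecutor.execute(() -> deliver(future, callback));
                return true;
            }
            if (now - deadline >= 0) {
                callbackExecutor.execute(
                        () -> callback.accept(null, new TimeoutException("gave up waiting")));
                return true;
            }
            return false;
        });
    }

    private static <T> void deliver(Future<T> future, BiConsumer<T, Exception> callback) {
        T value;
        try {
            value = future.get(); // returns immediately: the Future is already done
        } catch (Exception e) {
            callback.accept(null, e);
            return;
        }
        callback.accept(value, null);
    }

    void shutdown() {
        poller.shutdownNow();
    }
}

class PollingDemo {
    static String run() {
        PollingFutureTracker tracker = new PollingFutureTracker(1, TimeUnit.MILLISECONDS);
        // Stands in for ctx.executor(): the thread the callback must run on.
        ExecutorService handlerThread = Executors.newSingleThreadExecutor();
        // Stands in for a client Future that offers no listener API.
        FutureTask<String> foreign = new FutureTask<>(() -> "cached-value");
        CompletableFuture<String> seen = new CompletableFuture<>();
        try {
            tracker.addListener(foreign, 2, TimeUnit.SECONDS, handlerThread,
                    (value, err) -> seen.complete(err == null ? value : "error: " + err));
            new Thread(foreign).start(); // the "remote call" completes on another thread
            return seen.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            tracker.shutdown();
            handlerThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints cached-value
    }
}
```

The polling period is the trade-off discussed above: a shorter period lowers the added latency but makes the sweep thread wake up more often.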
I expect within a year or so this will be a non-issue, as more async clients provide callback mechanisms, letting you fully leverage NIO and the event-driven model.