Why does IoScheduler use a ScheduledExecutorService with a core pool size of 1? - rx-java2

I found that IoScheduler.createWorker() creates a NewThreadWorker immediately if there is no cached NewThreadWorker. This may result in an OutOfMemoryError.
If I submit 1000 units of work to the IoScheduler at once, it creates 1000 NewThreadWorker instances, each with its own ScheduledExecutorService.
private void submitWorkers(int workerCount) {
    for (int i = 0; i < workerCount; i++) {
        Single.fromCallable(new Callable<String>() {
            @Override
            public String call() throws Exception {
                Thread.sleep(1000);
                return "String-call(): " + Thread.currentThread().hashCode();
            }
        })
        .subscribeOn(Schedulers.io())
        .subscribe(new Consumer<String>() {
            @Override
            public void accept(String s) throws Exception {
                // TODO
            }
        });
    }
}
If I set workerCount to 1000, I get an OutOfMemoryError. I want to know why IoScheduler uses a NewThreadWorker backed by a ScheduledExecutorService that executes only a single piece of work.
Every time new work arrives, a new NewThreadWorker and ScheduledExecutorService are created if there is no cached NewThreadWorker. Why is it designed this way?

The standard workers of RxJava each use a dedicated thread to avoid excessive thread hopping and work migration in flows.
The standard IO scheduler uses an unbounded number of worker threads because its main use is to allow blocking operations to block a worker thread while other operations can proceed on other worker threads. The difference from newThread is that thread reuse is allowed once a worker is returned to an internal pool.
If there were a limit on the number of threads, it would drastically increase the likelihood of deadlocks due to resource exhaustion. Also, unlike the computation scheduler, there is no good default for such a limit: 1, 10, 100, 1000?
There are several ways to work around this problem, such as:
use Schedulers.from() with an arbitrary ExecutorService, which you can limit and configure as you wish (a sketch follows this list),
use ParallelScheduler from the Extensions project and define an arbitrarily large but fixed pool of workers.
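For illustration, here is a minimal sketch of the first option, bounding blocking work with a fixed-size pool via Schedulers.from(); the pool size of 10 and the class name are arbitrary choices for the example, not anything prescribed by RxJava:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import io.reactivex.Scheduler;
import io.reactivex.Single;
import io.reactivex.schedulers.Schedulers;

public class BoundedIoExample {
    public static void main(String[] args) throws InterruptedException {
        // A fixed pool caps thread creation; 10 is an arbitrary example size.
        ExecutorService executor = Executors.newFixedThreadPool(10);
        Scheduler boundedIo = Schedulers.from(executor);

        for (int i = 0; i < 1000; i++) {
            Single.fromCallable(() -> {
                Thread.sleep(1000); // simulated blocking work
                return "String-call(): " + Thread.currentThread().hashCode();
            })
            .subscribeOn(boundedIo) // at most 10 tasks run concurrently
            .subscribe(s -> { /* handle the result */ });
        }

        Thread.sleep(5000);  // demo only: let some of the work complete
        executor.shutdown(); // release the pool's threads when done
    }
}

Unlike Schedulers.io(), this never creates more than 10 threads; excess tasks wait in the executor's queue instead of each spawning a thread.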

Related

How to implement a distributed rate limiter?

Let's say I have P processes running some business logic on N physical machines. These processes call some web service S. I want to ensure that no more than X calls per second are made to service S by all P processes combined.
How can such a solution be implemented?
Google Guava's RateLimiter works well for processes running on a single box, but not in a distributed setup.
Are there any standard, ready-to-use solutions available for Java, perhaps based on ZooKeeper?
Thanks!
Bucket4j is a Java implementation of the token-bucket rate-limiting algorithm. It works both locally and distributed (on top of JCache). For the distributed use case you are free to choose any JCache implementation, such as Hazelcast or Apache Ignite. See this example of using Bucket4j in a cluster.
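For a flavor of the API, here is a minimal single-JVM sketch, assuming the classic Bucket4j builder entry points (Bandwidth.simple and Bucket4j.builder; exact names may differ between versions). The 10-per-minute limit is an arbitrary example:

import java.time.Duration;

import io.github.bucket4j.Bandwidth;
import io.github.bucket4j.Bucket;
import io.github.bucket4j.Bucket4j;

public class Bucket4jLocalExample {
    public static void main(String[] args) {
        // Allow at most 10 calls per minute (example numbers).
        Bandwidth limit = Bandwidth.simple(10, Duration.ofMinutes(1));
        Bucket bucket = Bucket4j.builder().addLimit(limit).build();

        for (int i = 0; i < 15; i++) {
            if (bucket.tryConsume(1)) { // takes one token if available
                System.out.println("call " + i + " allowed");
            } else {
                System.out.println("call " + i + " rejected");
            }
        }
    }
}

The distributed variant keeps the same Bandwidth definitions but builds the bucket through the JCache integration, so the token state lives in the grid rather than in the local JVM.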
I have been working on an open-source solution for this kind of problem.
Limitd is a "server" for limits. The limits are implemented using the token bucket algorithm.
Basically you define limits in the service configuration like this:
buckets:
  "request to service a":
    per_minute: 10
  "request to service b":
    per_minute: 5
The service is run as a daemon listening on a TCP/IP port.
Then your application does something along these lines:
var limitd = new Limitd('limitd://my-limitd-address');

limitd.take('request to service a', 'app1', 1, function (err, result) {
  if (result.conformant) {
    console.log('everything is okay - this should be allowed');
  } else {
    console.error('too many calls to this thing');
  }
});
We are currently using this for rate-limiting and debouncing some application events.
The server is on:
https://github.com/auth0/limitd
We are planning to work on several SDKs, but for now we only have Node.js and a partially implemented Go client:
https://github.com/limitd
https://github.com/jdwyah/ratelimit-java provides distributed rate limits that should do just this. You can configure your limit as S per second/minute etc. and choose the burst size/refill rate of the leaky bucket that sits under the covers.
Here is simple rate limiting in Java, achieving a concurrency of 3 transactions every 3 seconds. If you want to centralize this, store the tokens array in ElastiCache or any database, and replace the synchronized block with a distributed lock flag.
import java.util.Date;

public class RateLimiter implements Runnable {

    // Each slot holds the timestamp (ms) at which a token was last taken.
    private final long[] tokens = new long[3];

    public static void main(String[] args) {
        RateLimiter rateLimiter = new RateLimiter();
        for (int i = 0; i < 20; i++) {
            Thread thread = new Thread(rateLimiter, "Thread-" + i);
            thread.start();
        }
    }

    @Override
    public void run() {
        long currentStartTime = System.currentTimeMillis();
        while (true) {
            if (System.currentTimeMillis() - currentStartTime > 100000) {
                throw new RuntimeException("timed out");
            }
            if (getToken()) {
                System.out.println(Thread.currentThread().getName() +
                        " at " + new Date(System.currentTimeMillis()) + " says hello");
                break;
            }
        }
    }

    // A token is free if it was never used or was taken more than 3 seconds ago.
    private synchronized boolean getToken() {
        for (int i = 0; i < 3; i++) {
            if (tokens[i] == 0 || System.currentTimeMillis() - tokens[i] > 3000) {
                tokens[i] = System.currentTimeMillis();
                return true;
            }
        }
        return false;
    }
}
With any distributed rate-limiting architecture, you'll need a single backend store that acts as the single source of truth for tracking the number of requests. You can always use ZooKeeper as an in-memory datastore for this out of convenience, although there are better choices, such as Redis.
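As a hedged illustration of the Redis approach: a fixed-window counter using the Jedis client, where an atomic INCR plus an EXPIRE approximates "at most X calls per second across all processes". The key naming, limit, and class name are made up for the example:

import redis.clients.jedis.Jedis;

public class RedisFixedWindowLimiter {

    private static final int LIMIT_PER_SECOND = 100; // the X from the question

    // Returns true if a call is allowed in the current one-second window.
    public static boolean allowCall(Jedis jedis, String serviceName) {
        long window = System.currentTimeMillis() / 1000; // current second
        String key = "ratelimit:" + serviceName + ":" + window;

        long count = jedis.incr(key); // atomic across all P processes
        if (count == 1) {
            jedis.expire(key, 2);     // let old windows expire on their own
        }
        return count <= LIMIT_PER_SECOND;
    }

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            System.out.println(allowCall(jedis, "service-s"));
        }
    }
}

Note that a fixed window permits bursts of up to 2X around window boundaries; a token bucket (as in Bucket4j above) or a sliding window smooths that out.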

JeroMQ shutdown correctly

I am wondering how to shut down JeroMQ properly. So far I know three methods, each with pros and cons, and I have no clue which one is best.
The situation:
Thread A: owns context, shall provide start/stop methods
Thread B: actual listener thread
My current method:
Thread A

static ZContext CONTEXT = new ZContext();
B listener;
Thread thread;

public void start() {
    listener = new B();
    thread = new Thread(listener);
    thread.start(); // start() returns void, so it cannot be chained into the assignment
}

public void stop() throws InterruptedException {
    listener.stopping = true;
    thread.join();
}
Thread B

volatile boolean stopping = false; // volatile so the write from thread A is visible here
ZMQ.Socket socket;

public void run() {
    socket = CONTEXT.createSocket(ROUTER);
    ... // socket setup
    socket.setReceiveTimeout(10);
    while (!stopping) {
        socket.recv();
    }
    if (NUM_SOCKETS >= 1) {
        CONTEXT.destroySocket(socket);
    } else {
        CONTEXT.destroy();
    }
}
This works just fine. Ten milliseconds to shut down is no problem for me, but it needlessly burns CPU when no messages are being received. At the moment I prefer this one.
The second method shares the socket between the two threads:
Thread A

static ZContext CONTEXT = new ZContext();
ZMQ.Socket socket;
B listener;
Thread thread;

public void start() {
    socket = CONTEXT.createSocket(ROUTER);
    ... // socket setup
    listener = new B(socket);
    thread = new Thread(listener);
    thread.start();
}

public void stop() {
    listener.stopping = true;
    CONTEXT.destroySocket(socket);
}
Thread B

volatile boolean stopping = false;
ZMQ.Socket socket;

public void run() {
    try {
        while (!stopping) {
            socket.recv();
        }
    } catch (ClosedSelectorException e) {
        // socket closed by A
        socket = null;
    }
    if (socket != null) {
        // close socket myself
        if (NUM_SOCKETS >= 1) {
            CONTEXT.destroySocket(socket);
        } else {
            CONTEXT.destroy();
        }
    }
}
This works like a charm too, but even when recv is already blocking, the exception sometimes does not get thrown. If I wait one millisecond after the listener thread has started, the exception is always thrown. I don't know whether this is a bug or just a consequence of my misuse, since I am sharing the socket.
"revite" asked this question before (https://github.com/zeromq/jeromq/issues/116) and got an answer which is the third solution:
https://github.com/zeromq/jeromq/blob/master/src/test/java/guide/interrupt.java
Summary:
They call ctx.term() and interrupt the thread blocking in socket.recv().
This works fine, but I do not want to terminate my whole context, just this single socket. I would have to use one context per socket, and then I would not be able to use inproc.
Summary
At the moment I have no clue how to get thread B out of its blocking state other than using timeouts, sharing the socket, or terminating the whole context.
What is the correct way of doing this?
It is often mentioned that you can just destroy the zmq context and anything sharing that context will then exit, but this creates a nightmare because your exiting code has to do its best to avoid a minefield of accidental calls into dead socket objects.
Attempting to close the socket doesn't work either, because sockets are not thread safe and you'll end up with crashes.
ANSWER: The best way is to do as the ZeroMQ guide suggests for any use across multiple threads: use zmq sockets, not thread mutexes/locks/etc. Set up an additional listener socket that you connect and send something to on shutdown, and have your run() use a JeroMQ Poller to check which of your two sockets received anything; if the additional socket received something, exit.
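A minimal sketch of that pattern, assuming a reasonably recent JeroMQ API (SocketType, ZContext.createPoller) and an illustrative inproc://shutdown endpoint:

import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

public class B implements Runnable {

    private final ZContext ctx;

    public B(ZContext ctx) {
        this.ctx = ctx;
    }

    @Override
    public void run() {
        ZMQ.Socket router = ctx.createSocket(SocketType.ROUTER);
        // ... socket setup ...

        // Control socket: thread A connects and sends one message to stop us.
        ZMQ.Socket control = ctx.createSocket(SocketType.PAIR);
        control.bind("inproc://shutdown");

        ZMQ.Poller poller = ctx.createPoller(2);
        poller.register(router, ZMQ.Poller.POLLIN);
        poller.register(control, ZMQ.Poller.POLLIN);

        while (poller.poll() >= 0) {   // blocks without burning CPU
            if (poller.pollin(0)) {
                byte[] msg = router.recv(0);
                // ... handle message ...
            }
            if (poller.pollin(1)) {
                break;                 // anything on control means: exit
            }
        }
        ctx.destroySocket(router);
        ctx.destroySocket(control);
    }
}

// In thread A's stop():
//   ZMQ.Socket control = CONTEXT.createSocket(SocketType.PAIR);
//   control.connect("inproc://shutdown");
//   control.send("STOP"); // content is irrelevant; arrival is the signal
//   thread.join();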
Old question, but just in case...
I'd recommend checking out the ZThread source. You should be able to create an instance of IAttachedRunnable that you can pass to the fork method; the run method of your instance will be passed a PAIR socket and execute on another thread, while fork returns the connected PAIR socket for communicating with the PAIR socket your IAttachedRunnable received.
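Roughly like the following sketch, assuming JeroMQ ships the czmq-style ZThread helper with this shape (treat the exact signatures as an assumption to verify against your version):

import org.zeromq.ZContext;
import org.zeromq.ZMQ;
import org.zeromq.ZThread;
import org.zeromq.ZThread.IAttachedRunnable;

public class ForkExample {

    static class Worker implements IAttachedRunnable {
        @Override
        public void run(Object[] args, ZContext ctx, ZMQ.Socket pipe) {
            // 'pipe' is this thread's end of a PAIR connection to the parent.
            while (true) {
                String msg = pipe.recvStr();
                if (msg == null || "STOP".equals(msg)) {
                    break; // parent asked us to shut down (or context closed)
                }
                // ... do the actual work ...
            }
        }
    }

    public static void main(String[] args) {
        try (ZContext ctx = new ZContext()) {
            // fork() starts the worker thread and returns the parent's pipe end.
            ZMQ.Socket pipe = ZThread.fork(ctx, new Worker());
            pipe.send("STOP"); // demo: one message cleanly stops the worker
        }
    }
}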
Check out the JeroMQ source; even when you're doing a "blocking" recv, you're still burning CPU the entire time (the thread never sleeps). If you're worried about that, have the second thread sleep between polls and let the parent thread interrupt it. Something like this (just the relevant portions):
Thread A

public void stop() {
    thread.interrupt();
    thread.join();
}

Thread B

while (!Thread.interrupted()) {
    socket.recv(); // do whatever
    try {
        Thread.sleep(10); // milliseconds
    } catch (InterruptedException e) {
        break;
    }
}
Also, with regard to your second solution: in general you should not share sockets between threads. The ZeroMQ guide is pretty clear on this: "Don't share ØMQ sockets between threads. ØMQ sockets are not threadsafe." Remember that a major use for ZMQ is inter-thread communication: threads communicating through connected sockets, not sharing the same end of one socket. Then there is no need for things like shared boolean stop variables.

boost::asio io_service stop specific thread

I've got a boost::asio based thread pool running on N threads.
It is used mainly for IO tasks (DB data storage/retrieval). It also launches a self-diagnostic timer job to check how 'busy' the pool is (it calculates the ms difference between 'time added' and 'time handler called').
So the question is: is there any way to stop M of the N threads (for cases when the load is very low and the pool does not need so many threads)?
When the load is high (as determined by the diagnostic task), a new thread is added:
_workers.emplace_back(srv::unique_ptr<srv::thread>(new srv::thread([this]
{
    _service.run();
})));
(the srv namespace is used to switch quickly between boost and std)
But when the 'peak load' has passed, I need some way to stop the additional threads. Is there any solution for this?
What you are looking for is a way to interrupt a thread that is waiting on the io_service. You can implement some sort of interruption mechanism using exceptions.
class worker_interrupted : public std::runtime_error
{
public:
    worker_interrupted()
        : runtime_error("thread interrupted") {}
};

_workers.emplace_back(srv::unique_ptr<srv::thread>(new srv::thread([this]
{
    try
    {
        _service.run();
    }
    catch (const worker_interrupted& interruption)
    {
        // thread function exits gracefully.
    }
})));
You could then just use io_service::post to enqueue a completion handler that simply throws a worker_interrupted exception.

Memory Leak using Windows ThreadPool API

I am using Windows thread pools in my application, and am experiencing a memory leak of 136 bytes for every call to CreateThreadpoolWork(), as seen via UMDH:
+ 1257728 ( 1286424 - 28696) 9459 allocs BackTraceB0035CC
+ 9248 ( 9459 - 211) BackTraceB0035CC allocations
ntdll!RtlUlonglongByteSwap+B52
ntdll!TpAllocWork+8D
KERNEL32!CreateThreadpoolWork+25
... My Code ...
I am using a cleanup group, so per the documentation I am not calling CloseThreadpoolWork().
My code for handling the ThreadPool is:
typedef PTP_WORK ThreadHandle_t;
typedef PTP_WORK_CALLBACK THREAD_ENTRY_POINT_T;

static PTP_POOL pool = NULL;
static TP_CALLBACK_ENVIRON CallBackEnviron;
static PTP_CLEANUP_GROUP cleanupgroup = NULL;

int mtInitialize()
{
    InitializeThreadpoolEnvironment(&CallBackEnviron);

    pool = CreateThreadpool(NULL);
    if (NULL == pool)
    {
        return -1;
    }

    cleanupgroup = CreateThreadpoolCleanupGroup();
    if (NULL == cleanupgroup)
    {
        return -1;
    }

    SetThreadpoolCallbackPool(&CallBackEnviron, pool);
    SetThreadpoolCallbackCleanupGroup(&CallBackEnviron, cleanupgroup, NULL);

    return 0; // Success
}

void mtDestroy()
{
    CloseThreadpoolCleanupGroupMembers(cleanupgroup, FALSE, NULL);
    CloseThreadpoolCleanupGroup(cleanupgroup);
    DestroyThreadpoolEnvironment(&CallBackEnviron);
    CloseThreadpool(pool);
}

// Create thread
ThreadHandle_t mtRunThread(THREAD_ENTRY_POINT_T entry_point, void *thread_args)
{
    PTP_WORK work = CreateThreadpoolWork(entry_point, thread_args, &CallBackEnviron);
    if (NULL == work) {
        // CreateThreadpoolWork() failed.
        return 0;
    }
    SubmitThreadpoolWork(work);
    return work;
}

// Wait for a thread to finish
void mtWaitForThread(ThreadHandle_t thread)
{
    WaitForThreadpoolWorkCallbacks(thread, FALSE);
}
Am I doing something wrong?
Any ideas why I'm leaking memory?
I'm guessing you figured it out, given your comment, but the problem is that you only call CloseThreadpoolCleanupGroupMembers() in mtDestroy().
If you have a persistent thread pool, the memory will not be freed unless you call CloseThreadpoolCleanupGroupMembers() periodically. Your code and comments suggest that you do, though I can't confirm this without the code responsible for creating and destroying your thread pool.
My recommendation for persistent thread pools is to just call CloseThreadpoolWork() in the callback functions. Microsoft's recommendations work better if you're creating and destroying thread pools, but CloseThreadpoolWork() is simpler and easier than periodically calling CloseThreadpoolCleanupGroupMembers() if you're maintaining one thread pool for the life of your application.
By the way, it's safe to do both as long as you tell CloseThreadpoolCleanupGroupMembers() to cancel any pending callbacks (pass fCancelPendingCallbacks as TRUE) to ensure CloseThreadpoolWork() is called on any cleaned up work items:
"You can revoke the work object's membership only by closing it, which can be done on an individual basis with the CloseThreadpoolWork function. The thread pool knows that the work object is a member of the cleanup group and revokes its membership before closing it. This ensures that the application doesn't crash when the cleanup group later attempts to close all of its members. The inverse isn't true: if you first instruct the cleanup group to close all of its members and then call CloseThreadpoolWork on the now invalid work object, your application will crash."
From Windows with C++ - Thread Pool Cancellation and Cleanup

Rx queue implementation and Dispatcher's buffer

I want to implement a queue capable of taking events/items from multiple producers on multiple threads and consuming them all on a single thread.
This queue will work in a critical environment, so I am quite concerned with its stability.
I have implemented it using Rx capabilities, but I have two questions:
Is this implementation OK? Or is it flawed in some way I do not know of? (As an alternative: a manual implementation with a Queue and locks.)
What is the Dispatcher's buffer length? Can it handle 100k queued items?
The code below illustrates my approach, using a simple TestMethod. Its output shows that all values are put in from different threads but are processed on a single, different thread.
[TestMethod()]
public void RxTest()
{
    Subject<string> queue = new Subject<string>();

    queue
        .ObserveOnDispatcher()
        .Subscribe(s =>
            {
                Debug.WriteLine("Value: {0}, Observed on ThreadId: {1}", s, Thread.CurrentThread.ManagedThreadId);
            },
            () => Dispatcher.CurrentDispatcher.InvokeShutdown());

    for (int j = 0; j < 10; j++)
    {
        ThreadPool.QueueUserWorkItem(o =>
        {
            for (int i = 0; i < 100; i++)
            {
                Thread.Sleep(10);
                queue.OnNext(string.Format("value: {0}, from thread: {1}", i.ToString(), Thread.CurrentThread.ManagedThreadId));
            }
            queue.OnCompleted();
        });
    }

    Dispatcher.Run();
}
I'm not sure about the behaviour of Subject in heavily multithreaded scenarios. I can imagine, though, that something like BlockingCollection (and its underlying ConcurrentQueue) is well worn in the situations you're talking about. And it's simple to boot.
var queue = new BlockingCollection<long>();

// subscribing
queue.GetConsumingEnumerable()
    .ToObservable(Scheduler.NewThread)
    .Subscribe(i => Debug.WriteLine("Value: {0}, Observed on ThreadId: {1}", i, Thread.CurrentThread.ManagedThreadId));

// sending
Observable.Interval(TimeSpan.FromMilliseconds(500), Scheduler.ThreadPool)
    .Do(i => Debug.WriteLine("Value: {0}, Sent on ThreadId: {1}", i, Thread.CurrentThread.ManagedThreadId))
    .Subscribe(i => queue.Add(i));
You certainly don't want to touch queues and locks. The ConcurrentQueue implementation is excellent and will certainly handle queues of the size you're talking about effectively.
Take a look at EventLoopScheduler. It's built into Rx and I think it does everything you want.
You can take any number of observables, call .ObserveOn(els) (where els is your EventLoopScheduler instance), and you're now marshalling multiple observables from multiple threads onto a single thread, with each call to OnNext queued serially.