How can I process a list of delayed jobs in Vertx (actually
hundreds of HTTP GET requests, to limited API that bans fast requesting hosts)? now, I am using this code and it gets blocked because Vertx starts all requests at once. It is desirable to process each request with a 5-second delay between each request.
public void getInstrumnetDailyInfo(Instrument instrument,
Handler<AsyncResult<OptionInstrument>> handler) {
webClient
.get("/Loader")
.addQueryParam("i", instrument.getId())
.timeout(30000)
.send(
ar -> {
if (ar.succeeded()) {
String html = ar.result().bodyAsString();
Integer thatData = processHTML(html);
instrument.setThatData(thatData);
handler.handle(Future.succeededFuture(instrument));
} else {
// error
handler.handle(Future.failedFuture("error " +ar.cause()));
}
});
}
public void start(){
List<Instrument> instruments = loadInstrumentsList();
instruments.forEach(
instrument -> {
webClient.getInstrumnetDailyInfo(instrument,
async -> {
if(async.succeeded()){
instrumentMap.put(instrument.getId(), instrument);
}else {
log.warn("getInstrumnetDailyInfo: ", async.cause());
}
});
});
}
You can consider using a timer to fire events (rather than all at startup).
There are two variants in Vertx,
.setTimer() that fires a specific event after a delay
vertx.setTimer(interval, new Handler<T>() {});
and
2. .setPeriodic() that fires every time a specified period of time has passed.
vertx.setPeriodic(interval, new Handler<Long>() {});
setPeriodic seems to be what you are looking for.
You can get more info from the documentation
For more sophisticated Vertx scheduling use-cases, you can have a look at Chime or other schedulers or this module
You could use any out of the box rate limiter function and adapt it for async use.
An example with the RateLimiter from Guava:
// Make permits available at a rate of one every 5 seconds
private RateLimiter limiter = RateLimiter.create(1 / 5.0);
// A vert.x future that completes when it obtains a throttle permit
public Future<Double> throttle() {
return vertx.executeBlocking(p -> p.complete(limiter.acquire()), true);
}
Then...
throttle()
.compose(d -> {
System.out.printf("Waited %.2f before running job\n", d);
return runJob(); // runJob returns a Future result
});
Everything I read says this should work: I need my listener to trigger every 10 seconds with events. What I am getting now is every event in, it a listener trigger. What am I missing? The basic requirements are to create summarized statistics every 10s. Ideally I just want to pump data into the runtime. So, in this example, I would expect a dump of 10 records, once every 10 seconds
class StreamTest {
private final Configuration configuration = new Configuration();
private final EPRuntime runtime;
private final CompilerArguments args = new CompilerArguments();
private final EPCompiler compiler;
public DatadogApplicationTests() {
configuration.getCommon().addEventType(CommonLogEntry.class);
runtime = EPRuntimeProvider.getRuntime(this.getClass().getSimpleName(), configuration);
args.getPath().add(runtime.getRuntimePath());
compiler = EPCompilerProvider.getCompiler();
}
#Test
void testDisplayStatsEvery10S() throws Exception{
// Display stats every 10s about the traffic during those 10s:
EPCompiled compiled = compiler.compile("select * from CommonLogEntry.win:time(10)", args);
runtime.getDeploymentService().deploy(compiled).getStatements()[0].addListener(
(old, newEvents, epStatement, epRuntime) ->
Arrays.stream(old).forEach(e -> System.out.format("%s: received %n", LocalTime.now()))
);
new BufferedReader(new InputStreamReader(this.getClass().getResourceAsStream("/access.log"))).lines().map(CommonLogEntry::new).forEachOrdered(e -> {
runtime.getEventService().sendEventBean(e, e.getClass().getSimpleName());
try {
Thread.sleep(TimeUnit.SECONDS.toMillis(1));
} catch (InterruptedException ex) {
System.err.println(ex);
}
});
}
}
Which currently outputs every second, corresponding to the sleep in my stream:
11:00:54.676: received
11:00:55.684: received
11:00:56.689: received
11:00:57.694: received
11:00:58.698: received
11:00:59.700: received
A time window is a sliding window. There is a chapter on basic concepts that explains how they work. Here is the link to the basic concepts chapter.
It is not clear what the requirements are but I think what you want to achieve is collecting events for a while and then releasing them. You can draw inspiration from the solution patterns.
This will collect events for 10 seconds.
create schema StockTick(symbol string, price double);
create context CtxBatch start #now end after 10 seconds;
context CtxBatch select * from StockTick#keepall output snapshot when terminated;
I would like to set up an Rx subscription that can respond to an event right away, and then ignore subsequent events that happen within a specified "cooldown" period.
The out of the box Throttle/Buffer methods respond only once the timeout has elapsed, which is not quite what I need.
Here is some code that sets up the scenario, and uses a Throttle (which isn't the solution I want):
class Program
{
static Stopwatch sw = new Stopwatch();
static void Main(string[] args)
{
var subject = new Subject<int>();
var timeout = TimeSpan.FromMilliseconds(500);
subject
.Throttle(timeout)
.Subscribe(DoStuff);
var factory = new TaskFactory();
sw.Start();
factory.StartNew(() =>
{
Console.WriteLine("Batch 1 (no delay)");
subject.OnNext(1);
});
factory.StartNewDelayed(1000, () =>
{
Console.WriteLine("Batch 2 (1s delay)");
subject.OnNext(2);
});
factory.StartNewDelayed(1300, () =>
{
Console.WriteLine("Batch 3 (1.3s delay)");
subject.OnNext(3);
});
factory.StartNewDelayed(1600, () =>
{
Console.WriteLine("Batch 4 (1.6s delay)");
subject.OnNext(4);
});
Console.ReadKey();
sw.Stop();
}
private static void DoStuff(int i)
{
Console.WriteLine("Handling {0} at {1}ms", i, sw.ElapsedMilliseconds);
}
}
The output of running this right now is:
Batch 1 (no delay)
Handling 1 at 508ms
Batch 2 (1s delay)
Batch 3 (1.3s delay)
Batch 4 (1.6s delay)
Handling 4 at 2114ms
Note that batch 2 isn't handled (which is fine!) because we wait for 500ms to elapse between requests due to the nature of throttle. Batch 3 is also not handled, (which is less alright because it happened more than 500ms from batch 2) due to its proximity to Batch 4.
What I'm looking for is something more like this:
Batch 1 (no delay)
Handling 1 at ~0ms
Batch 2 (1s delay)
Handling 2 at ~1000s
Batch 3 (1.3s delay)
Batch 4 (1.6s delay)
Handling 4 at ~1600s
Note that batch 3 wouldn't be handled in this scenario (which is fine!) because it occurs within 500ms of Batch 2.
EDIT:
Here is the implementation for the "StartNewDelayed" extension method that I use:
/// <summary>Creates a Task that will complete after the specified delay.</summary>
/// <param name="factory">The TaskFactory.</param>
/// <param name="millisecondsDelay">The delay after which the Task should transition to RanToCompletion.</param>
/// <returns>A Task that will be completed after the specified duration.</returns>
public static Task StartNewDelayed(
this TaskFactory factory, int millisecondsDelay)
{
return StartNewDelayed(factory, millisecondsDelay, CancellationToken.None);
}
/// <summary>Creates a Task that will complete after the specified delay.</summary>
/// <param name="factory">The TaskFactory.</param>
/// <param name="millisecondsDelay">The delay after which the Task should transition to RanToCompletion.</param>
/// <param name="cancellationToken">The cancellation token that can be used to cancel the timed task.</param>
/// <returns>A Task that will be completed after the specified duration and that's cancelable with the specified token.</returns>
public static Task StartNewDelayed(this TaskFactory factory, int millisecondsDelay, CancellationToken cancellationToken)
{
// Validate arguments
if (factory == null) throw new ArgumentNullException("factory");
if (millisecondsDelay < 0) throw new ArgumentOutOfRangeException("millisecondsDelay");
// Create the timed task
var tcs = new TaskCompletionSource<object>(factory.CreationOptions);
var ctr = default(CancellationTokenRegistration);
// Create the timer but don't start it yet. If we start it now,
// it might fire before ctr has been set to the right registration.
var timer = new Timer(self =>
{
// Clean up both the cancellation token and the timer, and try to transition to completed
ctr.Dispose();
((Timer)self).Dispose();
tcs.TrySetResult(null);
});
// Register with the cancellation token.
if (cancellationToken.CanBeCanceled)
{
// When cancellation occurs, cancel the timer and try to transition to cancelled.
// There could be a race, but it's benign.
ctr = cancellationToken.Register(() =>
{
timer.Dispose();
tcs.TrySetCanceled();
});
}
if (millisecondsDelay > 0)
{
// Start the timer and hand back the task...
timer.Change(millisecondsDelay, Timeout.Infinite);
}
else
{
// Just complete the task, and keep execution on the current thread.
ctr.Dispose();
tcs.TrySetResult(null);
timer.Dispose();
}
return tcs.Task;
}
Here's my approach. It's similar to others that have gone before, but it doesn't suffer the over-zealous window production problem.
The desired function works a lot like Observable.Throttle but emits qualifying events as soon as they arrive rather than delaying for the duration of the throttle or sample period. For a given duration after a qualifying event, subsequent events are suppressed.
Given as a testable extension method:
public static class ObservableExtensions
{
public static IObservable<T> SampleFirst<T>(
this IObservable<T> source,
TimeSpan sampleDuration,
IScheduler scheduler = null)
{
scheduler = scheduler ?? Scheduler.Default;
return source.Publish(ps =>
ps.Window(() => ps.Delay(sampleDuration,scheduler))
.SelectMany(x => x.Take(1)));
}
}
The idea is to use the overload of Window that creates non-overlapping windows using a windowClosingSelector that uses the source time-shifted back by the sampleDuration. Each window will therefore: (a) be closed by the first element in it and (b) remain open until a new element is permitted. We then simply select the first element from each window.
Rx 1.x Version
The Publish extension method used above is not available in Rx 1.x. Here is an alternative:
public static class ObservableExtensions
{
public static IObservable<T> SampleFirst<T>(
this IObservable<T> source,
TimeSpan sampleDuration,
IScheduler scheduler = null)
{
scheduler = scheduler ?? Scheduler.Default;
var sourcePub = source.Publish().RefCount();
return sourcePub.Window(() => sourcePub.Delay(sampleDuration,scheduler))
.SelectMany(x => x.Take(1));
}
}
The solution I found after a lot of trial and error was to replace the throttled subscription with the following:
subject
.Window(() => { return Observable.Interval(timeout); })
.SelectMany(x => x.Take(1))
.Subscribe(i => DoStuff(i));
Edited to incorporate Paul's clean-up.
Awesome solution Andrew! We can take this a step further though and clean up the inner Subscribe:
subject
.Window(() => { return Observable.Interval(timeout); })
.SelectMany(x => x.Take(1))
.Subscribe(DoStuff);
The initial answer I posted has a flaw: namely that the Window method, when used with an Observable.Interval to denote the end of the window, sets up an infinite series of 500ms windows. What I really need is a window that starts when the first result is pumped into the subject, and ends after the 500ms.
My sample data masked this problem because the data broke down nicely into the windows that were already going to be created. (i.e. 0-500ms, 501-1000ms, 1001-1500ms, etc.)
Consider instead this timing:
factory.StartNewDelayed(300,() =>
{
Console.WriteLine("Batch 1 (300ms delay)");
subject.OnNext(1);
});
factory.StartNewDelayed(700, () =>
{
Console.WriteLine("Batch 2 (700ms delay)");
subject.OnNext(2);
});
factory.StartNewDelayed(1300, () =>
{
Console.WriteLine("Batch 3 (1.3s delay)");
subject.OnNext(3);
});
factory.StartNewDelayed(1600, () =>
{
Console.WriteLine("Batch 4 (1.6s delay)");
subject.OnNext(4);
});
What I get is:
Batch 1 (300ms delay)
Handling 1 at 356ms
Batch 2 (700ms delay)
Handling 2 at 750ms
Batch 3 (1.3s delay)
Handling 3 at 1346ms
Batch 4 (1.6s delay)
Handling 4 at 1644ms
This is because the windows begin at 0ms, 500ms, 1000ms, and 1500ms and so each Subject.OnNext fits nicely into its own window.
What I want is:
Batch 1 (300ms delay)
Handling 1 at ~300ms
Batch 2 (700ms delay)
Batch 3 (1.3s delay)
Handling 3 at ~1300ms
Batch 4 (1.6s delay)
After a lot of struggling and an hour banging on it with a co-worker, we arrived at a better solution using pure Rx and a single local variable:
bool isCoolingDown = false;
subject
.Where(_ => !isCoolingDown)
.Subscribe(
i =>
{
DoStuff(i);
isCoolingDown = true;
Observable
.Interval(cooldownInterval)
.Take(1)
.Subscribe(_ => isCoolingDown = false);
});
Our assumption is that calls to the subscription method are synchronized. If they are not, then a simple lock could be introduced.
Use .Scan() !
This is what I use for Throttling when I need the first hit (after a certain period) immediately, but delay (and group/ignore) any subsequent hits.
Basically works like Throttle, but fires immediately if the previous onNext was >= interval ago, otherwise, schedule it at exactly interval from the previous hit. And of course, if within the 'cooling down' period multiple hits come, the additional ones are ignored, just like Throttle does.
The difference with your use case is that if you get an event at 0 ms and 100 ms, they will both be handled (at 0ms and 500ms), which might be what you actually want (otherwise, the accumulator is easy to adapt to ignore ANY hit closer than interval to the previous one).
public static IObservable<T> QuickThrottle<T>(this IObservable<T> src, TimeSpan interval, IScheduler scheduler)
{
return src
.Scan(new ValueAndDueTime<T>(), (prev, id) => AccumulateForQuickThrottle(prev, id, interval, scheduler))
.Where(vd => !vd.Ignore)
.SelectMany(sc => Observable.Timer(sc.DueTime, scheduler).Select(_ => sc.Value));
}
private static ValueAndDueTime<T> AccumulateForQuickThrottle<T>(ValueAndDueTime<T> prev, T value, TimeSpan interval, IScheduler s)
{
var now = s.Now;
// Ignore this completely if there is already a future item scheduled
// but do keep the dueTime for accumulation!
if (prev.DueTime > now) return new ValueAndDueTime<T> { DueTime = prev.DueTime, Ignore = true };
// Schedule this item at at least interval from the previous
var min = prev.DueTime + interval;
var nextTime = (now < min) ? min : now;
return new ValueAndDueTime<T> { DueTime = nextTime, Value = value };
}
private class ValueAndDueTime<T>
{
public DateTimeOffset DueTime;
public T Value;
public bool Ignore;
}
I got another one for your. This one doesn't use Repeat() nor Interval() so it might be what you are after:
subject
.Window(() => Observable.Timer(TimeSpan.FromMilliseconds(500)))
.SelectMany(x => x.Take(1));
Well the most obvious thing will be to use Repeat() here. However, as far as I know Repeat() might introduce problems so that notifications disappear in between the moment when the stream stops and we subscribe again. In practice this has never been a problem for me.
subject
.Take(1)
.Concat(Observable.Empty<long>().Delay(TimeSpan.FromMilliseconds(500)))
.Repeat();
Remember to replace with the actual type of your source.
UPDATE:
Updated query to use Concat instead of Merge
I have stumbled upon this question while trying to re-implement my own solution to the same or similar problem using .Window
Take a look, it seems to be the same as this one and solved quite elegantly:
https://stackoverflow.com/a/3224723/58463
It's an old post, but no answer could really fill my needs, so I'm giving my own solution :
public static IObservable<T> ThrottleOrImmediate<T>(this IObservable<T> source, TimeSpan delay, IScheduler scheduler)
{
return Observable.Create<T>((obs, token) =>
{
// Next item cannot be send before that time
DateTime nextItemTime = default;
return Task.FromResult(source.Subscribe(async item =>
{
var currentTime = DateTime.Now;
// If we already reach the next item time
if (currentTime - nextItemTime >= TimeSpan.Zero)
{
// Following item will be send only after the set delay
nextItemTime = currentTime + delay;
// send current item with scheduler
scheduler.Schedule(() => obs.OnNext(item));
}
// There is still time before we can send an item
else
{
// we schedule the time for the following item
nextItemTime = currentTime + delay;
try
{
await Task.Delay(delay, token);
}
catch (TaskCanceledException)
{
return;
}
// If next item schedule was change by another item then we stop here
if (nextItemTime > currentTime + delay)
return;
else
{
// Set next possible time for an item and send item with scheduler
nextItemTime = currentTime + delay;
scheduler.Schedule(() => obs.OnNext(item));
}
}
}));
});
}
First item is immediately sent, then following items are throttled. Then if a following item is sent after the delayed time, it's immediately sent too.
I'm using vertx.io web framework to send a list of items to a downstream HTTP server.
records.records() emits 4 records and I have specifically set the web client to connect to the wrong I.P/port.
Processing... prints 4 times.
Exception outer! prints 3 times.
If I put back the proper I.P/port then Susbscribe outer! prints 4 times.
io.reactivex.Flowable
.fromIterable(records.records())
.flatMap(inRecord -> {
System.out.println("Processing...");
// Do stuff here....
Observable<Buffer> bodyBuffer = Observable.just(Buffer.buffer(...));
Single<HttpResponse<Buffer>> request = client
.post(..., ..., ...)
.rxSendStream(bodyBuffer);
return request.toFlowable();
})
.subscribe(record -> {
System.out.println("Subscribe outer!");
}, ex -> {
System.out.println("Exception outer! " + ex.getMessage());
});
UPDATE:
I now understand that on error RX stops right a way. Is there a way to continue and process all records regardless and get an error for each?
Given this article: https://medium.com/#jagsaund/5-not-so-obvious-things-about-rxjava-c388bd19efbc
I have come up with this... Unless you see something wrong with this?
io.reactivex.Flowable
.fromIterable(records.records())
.flatMap
(inRecord -> {
Observable<Buffer> bodyBuffer = Observable.just(Buffer.buffer(inRecord.toString()));
Single<HttpResponse<Buffer>> request = client
.post("xxxxxx", "xxxxxx", "xxxxxx")
.rxSendStream(bodyBuffer);
// So we can capture how long each request took.
final long startTime = System.currentTimeMillis();
return request.toFlowable()
.doOnNext(response -> {
// Capture total time and print it with the logs. Removed below for brevity.
long processTimeMs = System.currentTimeMillis() - startTime;
int status = response.statusCode();
if(status == 200)
logger.info("Success!");
else
logger.error("Failed!");
}).doOnError(ex -> {
long processTimeMs = System.currentTimeMillis() - startTime;
logger.error("Failed! Exception.", ex);
}).doOnTerminate(() -> {
// Do some extra stuff here...
}).onErrorResumeNext(Flowable.empty()); // This will allow us to continue.
}
).subscribe(); // Don't handle here. We subscribe to the inner events.
Is there a way to continue and process all records regardless and get
an error for each?
According to the doc, the observable should be terminated if it encounters an error. So you can't get each error in onError.
You can use onErrorReturn or onErrorResumeNext() to tell the upstream what to do if it encounters an error (e.g. emit null or Flowable.empty()).
Following the documentation at:
https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
I am using Manual Offset Control and am trying a simple test case with the commitSync() method to set the offset to a static value. Then, I want to call the poll() method again to consume from that offset. This should be handled with the while loop:
try {
while(true) {
System.out.println("outer loop");
ConsumerRecords<String, String> records = consumer.poll(Long.MAX_VALUE);
System.out.println("partition size: " + records.partitions().size());
for (TopicPartition partition : records.partitions()) {
List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
for (ConsumerRecord<String, String> record : partitionRecords) {
System.out.println(record.offset() + ": " + record.value());
long offset = record.offset();
if (SOME CONDITION) {
consumer.commitSync(Collections.singletonMap(partition, new OffsetAndMetadata(offset+1)));
}
}
}
}
} finally {
consumer.close();
}
However, it looks like I always get partitions of size 0 after the 1st poll() call.
In my output I just see the following loop forever:
outer loop
partition size: 0
However, if I terminate my program and rerun it, I can see that the commitSync() worked and it starts reading from offset 6. I want to know how to do this without manually terminating and rerunning the program.