Flink watermark is negative - scala

I was trying to assign timestamp and watermarks to a stream by implementing the AssignerWithPeriodicWatermarks, inside the function, it implements:
override def getCurrentWatermark: Watermark = {
// this guarantees that the watermark never goes backwards.
val potentialWM = currentMaxTimestamp - maxOutOfOrderness
if (potentialWM >= lastEmittedWatermark) lastEmittedWatermark = potentialWM
new Watermark(lastEmittedWatermark)
}
override def extractTimestamp(element: T, previousElementTimestamp: Long): Long = {
val timestamp = element.streamTime // something exists in the stream
if (timestamp > currentMaxTimestamp) currentMaxTimestamp = timestamp
timestamp
}
However, I still got watermarks of the default value -9223372036854775808, and when I tried to add printing inside both functions, I found only println inside extractTimestamp was printed, which is saying function of getCurrentWatermark was never called.
The implementations seem to be right, because the same code was able to run on another script(some code not written by me).
PS: It's not the first time that I encountered negative watermark, what I found is that after a certain period of time, the watermark will go positive, however I am still quite confused what happened at the beginning.

The issue is that You are using AssignerWithPeriodicWatermark which does not generate watermarks per event but in intervals. Whenever You are using AssingerWithPeriodicWatermark You should set call the setTheAutowatermarkInterval on the execution environment. The value that You provide there will be the interval that the getCurrentWatermark will be called with.
If You haven't set it then the method will never be called, thus You will never have a watermark changed.
For testing and learning, You can consider using AssignerWithPunctuatedWatermark as this will simply emit watermark for each event.
EDIT:
As it was mentioned below this answer, the default value for autowatermarkInterval is actually 200 ms. Also, using the AssignerWithPunctuatedWatermark doesn't mean You need to emit the Watermark for each event, but the method for emitting them will be called for each event. If You don't want to emit the Watermark then the method should simply return null.

Related

Flutter screenshot.dart returns a nullable value when calling capture?

I'm having a hard time understanding in what case capturing a screenshot can return null or throw an Error.
##Note: I've made sure the delay value is set to Duration(milliseconds: 100).
Github Thread
screenshot.dart capture implementation
I've updated the milliseconds to 100 just in case it doesn't rasterize on time. The documentation points out this function is unreliable: https://api.flutter.dev/flutter/flutter_driver/FlutterDriver/screenshot.html

flink streaming window trigger

I have flink stream and I am calucating few things on some time window say 30 seconds.
here what happens it is giving me result my aggregating previous windows as well.
say for first 30 seconds I get result 10.
next thiry seconds I want fresh result, instead I get last window result + new
and so on.
so my question is how I get fresh result for each window.
You need to use a purging trigger. What you want is FIRE_AND_PURGE (emit and remove window content), what the default flink trigger does is FIRE (emit and keep window content).
input
.keyBy(...)
.timeWindow(Time.seconds(30))
// The important part: Replace the default non-purging ProcessingTimeTrigger
.trigger(new PurgingTrigger[..., TimeWindow](ProcessingTimeTrigger))
.reduce(...)
For a more in depth explanation have a look into Triggers and FIRE vs FIRE_AND_PURGE.
A Trigger determines when a window (as formed by the window assigner) is ready to be processed by the window function. Each WindowAssigner comes with a default Trigger. If the default trigger does not fit your needs, you can specify a custom trigger using trigger(...).
When a trigger fires, it can either FIRE or FIRE_AND_PURGE. While FIRE keeps the contents of the window, FIRE_AND_PURGE removes its content. By default, the pre-implemented triggers simply FIRE without purging the window state.
The functionality you describe can be found in Tumbling Windows: https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html#tumbling-windows
A bit more detail and/or code would help :)
I'm little late into this question but I encountered the same issue with OP's. What I found out later was a bug in my own code. FYI my mistake could be good reference for your problem.
// Old code (modified to be an example):
val tenSecondGrouping: DataStream[MyCustomGrouping] = userIdsStream
.keyBy(_.somePartitionedKey)
.window(TumblingProcessingTimeWindows.of(Time.of(10, TimeUnit.SECONDS)))
.trigger(ProcessingTimeTrigger.create())
.aggregate(new MyCustomAggregateFunc(new MyCustomGrouping()))
Bug happened at new MyCustomGrouping: I unintentionally created a singleton MyCustomGrouping object and reusing it in MyCustomAggregateFunc. As more tumbling windows created, the later aggregation results grow crazy! The fix was to create new MyCustomGrouping each time MyCustomAggregateFunc is triggered. So:
// New code, problem solved
...
.aggregate(new MyCustomAggregateFunc(() => new MyCustomGrouping()))
// passing in a func to create new object per trigger

Spark::KMeans calls takeSample() twice?

I have many data and I have experimented with partitions of cardinality [20k, 200k+].
I call it like that:
from pyspark.mllib.clustering import KMeans, KMeansModel
C0 = KMeans.train(first, 8192, initializationMode='random', maxIterations=10, seed=None)
C0 = KMeans.train(second, 8192, initializationMode='random', maxIterations=10, seed=None)
and I see that initRandom() calls takeSample() once.
Then the takeSample() implementation doesn't seem to call itself or something like that, so I would expect KMeans() to call takeSample() once. So why the monitor shows two takeSample()s per KMeans()?
Note: I execute more KMeans() and they all invoke two takeSample()s, regardless of the data being .cache()'d or not.
Moreover, the number of partitions doesn't affect the number takeSample() is called, it's constant to 2.
I am using Spark 1.6.2 (and I cannot upgrade) and my application is in Python, if that matters!
I brought this to the mailing list of the Spark devs, so I am updating:
Details of 1st takeSample():
Details of 2nd takeSample():
where one can see that the same code is executed.
As suggested by Shivaram Venkataraman in Spark's mailing list:
I think takeSample itself runs multiple jobs if the amount of samples
collected in the first pass is not enough. The comment and code path
at GitHub
should explain when this happens. Also you can confirm this by
checking if the logWarning shows up in your logs.
// If the first sample didn't turn out large enough, keep trying to take samples;
// this shouldn't happen often because we use a big multiplier for the initial size
var numIters = 0
while (samples.length < num) {
logWarning(s"Needed to re-sample due to insufficient sample size. Repeat #$numIters")
samples = this.sample(withReplacement, fraction, rand.nextInt()).collect()
numIters += 1
}
However, as one can see, the 2nd comment said it shouldn't happen often, and it does happen always to me, so if anyone has another idea, please let me know.
It was also suggested that this was a problem of the UI and takeSample() was actually called only once, but that was just hot air.

How to find all set intervals using CoffeeScript?

The first handler listens some channel of messages and if there is an incoming message, it sets interval:
toggleFlagInterval = setInterval (-> toggleFlag), 500
Messages can be arbitrarily much, but I need to set only one interval.
Second handler reads the message and in it I want to remove the interval:
clearInterval toggleFlagInterval
I want to control that was always zero or one interval .
To do this, I need to find all set intervals.
How to find all set intervals using CoffeeScript?
I would be very grateful for your help.
Thanks to all.
That doesn't make sense. You cannot find all functions registered with setInterval, with or without CoffeeScript (that would be a JavaScript question, it has nothing to do with CoffeeScript). You just need to keep track of them yourself.
It seems like in this specific case, you simply need to choose to conditionally not set an interval, if one is already set.
To do so, your setting code would use ?=:
toggleFlagInterval ?= setInterval (-> toggleFlag), 500
And your clearing code would reset toggleFlagInterval to null:
clearInterval toggleFlagInterval
toggleFlagInterval = null
Alternatively, you need to cancel any already set interval at the point when you set a new one:
clearInterval(toggleFlagInterval) if toggleFlagInterval?
toggleFlagInterval = setInterval (-> toggleFlag), 500

IPhone: different system timers?

I have been using mach_absolute_time() for all my timing functions so far. calculating how long between frames etc.
I now want to get the exact time touch input events happen using event.timestamp in the touch callbacks.
the problem is these two seem to use completely different timers. sure, you can get them both in seconds, but their origins are different and seemingly random...
is there any way to sync the two different timers?
or is there anyway to get access to the same timer that the touch input uses to generate that timestamp property? otherwise its next to useless.
Had some trouble with this myself. There isn't a lot of good documentation, so I went with experimentation. Here's what I was able to determine:
mach_absolute_time depends on the processor of the device. It returns ticks since the device was last rebooted (otherwise known as uptime). In order to get it in a human readable form, you have to modify it by the result from mach_timebase_info (a ratio), which will return billionth of seconds (or nanoseconds). To make this more usable I use a function like the one below:
#include <mach/mach_time.h>
int getUptimeInMilliseconds()
{
static const int64_t kOneMillion = 1000 * 1000;
static mach_timebase_info_data_t s_timebase_info;
if (s_timebase_info.denom == 0) {
(void) mach_timebase_info(&s_timebase_info);
}
// mach_absolute_time() returns billionth of seconds,
// so divide by one million to get milliseconds
return (int)((mach_absolute_time() * s_timebase_info.numer) / (kOneMillion * s_timebase_info.denom));
}
Get the initial difference between two i.e
what is returned by mach_absolute_time() initally when your application starts and also get the event.timestamp initially at the same time...
store the difference... it would remain same through out the time your application runs.. so you can use this time difference to convert one to another...
How about CFAbsoluteTimeGetCurrent?