How to use delta trigger in flink? - scala

I want to use the deltatrigger in apache flink (flink 1.3) but I have some trouble with this code :
.trigger(DeltaTrigger.of(100, new DeltaFunction[uniqStruct] {
override def getDelta(oldFp: uniqStruct, newFp: uniqStruct): Double = newFp.time - oldFp.time
}, TypeInformation[uniqStruct]))
And I have this error:
error: object org.apache.flink.api.common.typeinfo.TypeInformation is not a value [ERROR] }, TypeInformation[uniqStruct]))
I don't understand why DeltaTrigger need TypeSerializer[T]
and I don't know what to do to remove this error.
Thanks a lot everyone.

I would read into this a bit https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/types_serialization.html sounds like you can create a serializer using typeInfo.createSerializer(config) on your type info. Note what you're passing in currently is a type itself and NOT the type info which is why you're getting the error you are.
You would need to do something more like
val uniqStructTypeInfo: TypeInformation[uniqStruct] = createTypeInformation[uniqStruct]
val uniqStrictTypeSerializer = typeInfo.createSerializer(config)
To quote the page above regarding the config param you need to pass to create serializer
The config parameter is of type ExecutionConfig and holds the
information about the program’s registered custom serializers. Where
ever possibly, try to pass the programs proper ExecutionConfig. You
can usually obtain it from DataStream or DataSet via calling
getExecutionConfig(). Inside functions (like MapFunction), you can get
it by making the function a Rich Function and calling
getRuntimeContext().getExecutionConfig().

DeltaTrigger needs a TypeSerializer because it uses Flink's managed state mechanism to store each element for later comparison with the next one (it just keeps one element, the last one, which is updated as new elements arrive).
You will find an example (in Java) here.
But if all you need is a window that triggers every 100msec, then it'll be easier to just use a TimeWindow, such as
input
.keyBy(<key selector>)
.timeWindow(Time.milliseconds(100)))
.apply(<window function>)
Updated:
To have hour-long windows that trigger every 100msec, you could use sliding windows. However, you would have 10 * 60 * 60 windows, and every event would be placed into each of these 36000 windows. So that's not a great idea.
If you use a GlobalWindow with a DeltaTrigger, then the window will be triggered only when events are more than 100msec apart, which isn't what you've said you want.
I suggest you look at ProcessFunction. It should be straightforward to get what you want that way.

Related

Wiremock response of a certain regex

I have to send a random value back from wiremocked response. I have seen examples using {{randomValue type='ALPHANUMERIC'}}
However I could not find anything where I can give randomvalue of a particular regex - say alphanumeric value which starts with ABC and 9 random digits.
I did try -
{{randomValue regex='ABC[0-9]{9}'}}
But this is not working. I am not sure if there is any other way to do this.Please guide me to any appropriate resource if available.
The only way to do this currently is via a custom Handlebars helper.
You can provide custom helpers when creating the templating transformer during startup e.g.
WireMockServer wm = new WireMockServer(wireMockConfig()
.dynamicPort()
.extensions(new ResponseTemplateTransformer(
false,
Collections.singletonMap("myHelper", new MyHelper()))
)
);
Where MyHelper should extend the HandlebarsHelper abstract class.

Calling GCP Translate API within Dataproc pyspark map

I am trying to call the language detection method of the translate client api from pyspark for each row in a file.
I created a map method as the following but the job seems to just freeze with no error. If I remove the call to the translate API it executes fine. Is it possible to call Google client API methods within pySpark map ?
mapping method to do translation
def doTranslate(data):
translate_client = translate.Client()
# Get the message information
messageId = data[0]
messageContent = data[6]
detectedLang = translate_client.detect_language(messageContent)
r = []
r.append(detectedLang)
return r
Figured it out!! your question led me in the right direction. thanks!
Turns out I was getting an exception from the call because I was going past the default quota for sizes of messages. I added a try/except block and determined this was the problem. Then cutting the message size down (I am just testing so dont want to mess with the quota) fixed the issue.

Scala Netty is there any way to share a ReplayingDecoder

I am looking to open up multiple connections using a netty client bootstrap in order to parse messages coming from multiple sources. The messages all have the same format, however, due to the amount of data that needs to be processed, I must run each connection on separate threads (This is assuming netty creates a thread per client channel, which I couldn't find a reference for - if that's not the case, how would this be achieved?).
This is the code that I use to connect to the data server:
var b = new Bootstrap()
.group(group)
.channel(classOf[NioSocketChannel])
.handler(RawFeedChannelInitializer)
var ch1 = b.clone().connect(host, port).sync().channel();
var ch2 = b.clone().connect(host, port).sync().channel();
The initializer calls RawPacketDecoder, which extends ReplayingDecoder, and is defined here.
The code works well without #Sharable when opening a single connection, but for the purpose of my application I must connect to the same server multiple times.
This results in the runtime error #Sharable annotation is not allowed pointing to my RawPacketDecoder class.
I am not entirely sure on how to get past this issue, short of reimplementing in scala an instantiable class of ReplayingDecoder as my decoder based directly on ByteToMessageDecoder.
Any help would be greatly appreciated.
Note: I am using netty 4.0.32 Final
I found the solution in this StockExchange answer.
My issue was that I was using an object based ChannelInitializer (singleton), and ReplayingDecoder as well as ByteToMessageDecoder are not sharable.
My initializer was created as a scala object, and therefore a single instance allowed. Changing the initializer to a scala class and instantiating for each bootstrap clone solved the problem. I modified the bootstrap code above as follows:
var b = new Bootstrap()
.group(group)
.channel(classOf[NioSocketChannel])
//.handler(RawFeedChannelInitializer)
var ch1 = b.clone().handler(new RawFeedChannelInitializer()).connect(host, port).sync().channel();
var ch2 = b.clone().handler(new RawFeedChannelInitializer()).connect(host, port).sync().channel();
I am not sure whether this ensures multithreading as wanted but it does allow to split the data access into multiple connections to the feed server.
Edit Update: After performing additional research on the subject, I have determined that netty does in fact create a thread per channel; this was verified by printing to console after the creation of each channel:
println("No. of active threads: " + Thread.activeCount());
The output shows an incremental number as channels are created and associated with their respective threads.
By default NioEventLoopGroup uses 2*Num_CPU_cores threads as defined here:
DEFAULT_EVENT_LOOP_THREADS = Math.max(1, SystemPropertyUtil.getInt(
"io.netty.eventLoopThreads",
Runtime.getRuntime().availableProcessors() * 2));
This value can be overriden to something else by setting
val group = new NioEventLoopGroup(16)
and then using the group to create/setup the bootstrap.

What is timeout_id in GTK for?

I'm reading through the official PyGObject tutorial, and I found this (unexplained) line in one of the examples:
self.timeout_id = None
(it was within an __init__ function of a Gtk.Window-descendant class; the whole listing is here). I couldn't google it; what is it for?
You didn't see it being set and used further down in on_pulse_toggled ?
It is assigned the return value of GObject.timeout_add, which adds a function to be called at a later interval, possibly repeatedly (like in this case):
self.timeout_id = GObject.timeout_add(100, self.do_pulse, None)
When you want this timeout not be called anymore, you have to remove it, and to do so, you need the id of the timeout you created:
GObject.source_remove(self.timeout_id)

Symfony form gets messy when calling getObject() in form configuration

I have a Strain model that has a belongsTo relationship with a Sample model, i. e. a strain belongs to a sample.
I am configuring a hidden field in the StrainForm configure() method this way:
$defaultId = (int)$this->getObject()->getSample()->getTable()->getDefaultSampleId();
$this->setWidget('sample_id', new sfWidgetFormInputHidden(array('default' => $defaultId)));
Whenever I create a new Strain, the $form->save() fails. The debug toolbar revealed that it tries to save a Sample object first and I do not know why.
However, if I retrieve the default sample ID using the table it works like a charm:
$defaultId = (int)Doctrine_Core::getTable('Sample')->getDefaultSampleId();
$this->setWidget('sample_id', new sfWidgetFormInputHidden(array('default' => $defaultId)));
My question here is what can be happening with the getObject()->getSample()... sequence of methods that causes the StrainForm to think it has to save a Sample object instead of Strain.
I tried to debug with xdebug but I cannot came up with a clear conclusion.
Any thoughts?
Thanks!!
When you call getSample its creating a Sample instance. This is automatically attached to the Strain object, thus when you save you also save the Sample.
An altenrative to calling getSample would be to chain through Strain object to the Sample table since i assume youre only doing this so your not hardcodeing the Sample's name in related form:
// note Sample is the alias not necessarily the Model name
$defaultId = Doctrine_Core::getTable($this->getObject()->getTable()->getRelation('Sample')->getModel())->getDefaultId();
Your solution probably falls over because you can't use getObject() on a new form (as at that stage the object simply doesn't exist).
Edit: Why don't you pass the default Sample in via the options array and then access it from within the form class via $this->getOption('Sample') (if I remember correctly)?