Debug a custom Pipeline Transformer in Flink - scala

I am trying to implement a custom Transformer in Flink following indications in its documentation but when I try to executed it seems the fit operation is never being called. Here it is what I've done so far:
class InfoGainTransformer extends Transformer[InfoGainTransformer] {
import InfoGainTransformer._
private[this] var counts: Option[collection.immutable.Vector[Map[Key, Double]]] = None
// here setters for params, as Flink does
}
object InfoGainTransformer {
// ====================================== Parameters =============================================
// ...
// ==================================== Factory methods ==========================================
// ...
// ========================================== Operations =========================================
implicit def fitLabeledVectorInfoGain = new FitOperation[InfoGainTransformer, LabeledVector] {
override def fit(instance: InfoGainTransformer, fitParameters: ParameterMap, input: DataSet[LabeledVector]): Unit = {
val counts = collection.immutable.Vector[Map[Key, Double]]()
input.map {
v =>
v.vector.map {
case (i, value) =>
println("INSIDE!!!")
val key = Key(value, v.label)
val cval = counts(i).getOrElse(key, .0)
counts(i) + (key -> cval)
}
}
}
}
implicit def fitVectorInfoGain[T <: Vector] = new FitOperation[InfoGainTransformer, T] {
override def fit(instance: InfoGainTransformer, fitParameters: ParameterMap, input: DataSet[T]): Unit = {
input
}
}
implicit def transformLabeledVectorsInfoGain = {
new TransformDataSetOperation[InfoGainTransformer, LabeledVector, LabeledVector] {
override def transformDataSet(
instance: InfoGainTransformer,
transformParameters: ParameterMap,
input: DataSet[LabeledVector]): DataSet[LabeledVector] = input
}
}
implicit def transformVectorsInfoGain[T <: Vector : BreezeVectorConverter : TypeInformation : ClassTag] = {
new TransformDataSetOperation[InfoGainTransformer, T, T] {
override def transformDataSet(instance: InfoGainTransformer, transformParameters: ParameterMap, input: DataSet[T]): DataSet[T] = input
}
}
}
Then I tried to use it in two ways:
val scaler = StandardScaler()
val polyFeatures = PolynomialFeatures()
val mlr = MultipleLinearRegression()
val gain = InfoGainTransformer().setK(2)
// Construct the pipeline
val pipeline = scaler
.chainTransformer(polyFeatures)
.chainTransformer(gain)
.chainPredictor(mlr)
val r = pipeline.predict(dataSet map (_.vector))
r.print()
And only my transformer:
pipeline.fit(dataSet)
In both cases, when I set a breakpoint inside fitLabeledVectorInfoGain, for example in the line input.map, the debugger stops there, but if I also set a breakpoint inside the nested map, for example bellow println("INSIDE!!!"), it never stops there.
Does anyone knows how could I debug this custom transformer?

It seems its working now. I think what was happening was I wasn't implementing right the FitOperation because nothing was being saved in the instance state, this is the implementation now:
implicit def fitLabeledVectorInfoGain = new FitOperation[InfoGainTransformer, LabeledVector] {
override def fit(instance: InfoGainTransformer, fitParameters: ParameterMap, input: DataSet[LabeledVector]): Unit = {
// val counts = collection.immutable.Vector[Map[Key, Double]]()
val r = input.map {
v =>
v.vector.foldLeft(Map.empty[Key, Double]) {
case (m, (i, value)) =>
println("INSIDE fit!!!")
val key = Key(value, v.label)
val cval = m.getOrElse(key, .0) + 1.0
m + (key -> cval)
}
}
instance.counts = Some(r)
}
}
Now the debugger enters correctly in all breakpoints and the TransformOperation its also being called.

Related

Scala3 macro summon typeclass instance of a TypeTree (no type arg)

trait Show[T] {
def show(t: T): String
}
Give such Show typeclass, I want to generate show for case class like
def caseClassShow[A](using Type[A], Quotes): Expr[Show[A]] = {
import quotes.reflect._
def shows(caseClassExpr: Expr[A]): Expr[String] = {
val caseClassTerm = caseClassExpr.asTerm
val parts = TypeRepr.of[A].typeSymbol.caseFields.collect {
case cf if cf.isValDef =>
val valDefTree = cf.tree.asInstanceOf[ValDef]
val valType = valDefTree.tpt
val showCtor = TypeTree.of[Show[_]]
val valShowType = Applied(showCtor, List(valType))
val showInstance = Expr.summon[valShowType] // compile error, how to summon the instance here
val valuePart = Apply(Select.unique(showInstance, "show"), List(Select(caseClassTerm, cf)))
'{s"${Expr(cf.name)}:${valuePart}"}
}
val strParts = Expr.ofList(parts)
'{$strParts.mkString(",")}
}
'{
new Show[A] {
def show(a: A) = {
${shows('{a})}
}
}
}
}
But the showInstance part won't compile, so how to summon an implicit Show[X] here ?
Implicits.search
can be used to summon implicit instance if there is no type arg avaiable for Expr.summon
val valDefTree = cf.tree.asInstanceOf[ValDef]
val valType = valDefTree.tpt
val showCtor = TypeRepr.typeConstructorOf(classOf[Show[_]])
val valShowType = showCtor.appliedTo(valType.tpe)
Implicits.search(valShowType) match {
case si: ImplicitSearchSuccess =>
val siExpr: Expr[Show[Any]] = si.tree.asExpr.asInstanceOf[Expr[Show[Any]]]
val valueExpr = Select(caseClassTerm, cf).asExpr
'{$siExpr.show($valueExpr)}
}

Wrap higher order function into progress bar

I have an iteration module which can apply an arbitrary function (Build generic reusable iteration module from higher order function) and would love to wrap it into a progressbar.
val things = Range(1,10)
def iterationModule[A](
iterationItems: Seq[A],
functionToApply: A => Any
): Unit = {
iterationItems.foreach(functionToApply)
}
def foo(s:Int) = println(s)
iterationModule[Int](things, foo)
A basic progressbar could look like:
import me.tongfei.progressbar.ProgressBar
val pb = new ProgressBar("Test", things.size)
things.foreach(t=> {
println(t)
pb.step
})
But how can the function which is passed to the iterator module be intercepted and surrounded with a progressbar, i.e. call the pb.step?
An annoying possibility would be to pass the mutable pb object into each function (have it implement an interface).
But is it also possible to intercept and surround the function being passed by this stepping logic?
However, when looping with Seq().par.foreach, this might be problematic.
I need the code to work in Scala 2.11.
edit
A more complex example:
val things = Range(1,100).map(_.toString)
def iterationModule[A](
iterationItems: Seq[A],
functionToApply: A => Any,
parallel: Boolean = false
): Unit = {
val pb = new ProgressBar(functionToApply.toString(), iterationItems.size)
if (parallel) {
iterationItems.par.foreach(functionToApply)
} else {
iterationItems.foreach(functionToApply)
}
}
def doStuff(inputDay: String, inputConfigSomething: String): Unit = println(inputDay + "__"+ inputConfigSomething)
iterationModule[String](things, doStuff(_, "foo"))
The function should be able to take the iteration item and additional parameters.
edit 2
import me.tongfei.progressbar.ProgressBar
val things = Range(1,100).map(_.toString)
def doStuff(inputDay: String, inputConfigSomething: String): Unit = println(inputDay + "__"+ inputConfigSomething)
def iterationModulePb[A](items: Seq[A], f: A => Any, parallel: Boolean = false): Unit = {
val pb = new ProgressBar(f.toString, items.size)
val it = if (parallel) {
items.par.iterator
} else {
items.iterator
}
it.foreach { x =>
f(x)
pb.step()
}
}
iterationModulePb[String](things, doStuff(_, "foo"))
After a little discussion I figured out how to use a Seq with standard iterators.
For Scala 2.13 this would be the most general form.
import me.tongfei.progressbar.ProgressBar
def iterationModule[A](items: IterableOnce[A], f: A => Any): Unit = {
val (it, pb) =
if (items.knowSize != -1)
items.iterator -> new ProgressBar("Test", items.knowSize)
else {
val (iter1, iter2) = items.iterator.split
iter1 -> new ProgressBar("Test", iter2.size)
}
it.foreach { x =>
f(x)
pb.step()
}
}
Note: most of the changes are just to make the code more generic, but the general idea is just to create a function that wraps both the original function and the call to the ProgressBar.
Edit
A simplified solution for 2.11
def iterationModule[A](items: Seq[A], parallel: Boolean = false)
(f: A => Any): Unit = {
val pb = new ProgressBar("test", items.size)
val it = if (parallel) {
items.iterator.par
} else {
items.iterator
}
it.foreach { a =>
f(a)
pb.step()
}
}

Synthesizing syntax tree and variable tree to avoid cross stage evaluation

While creating a syntax tree, I want to synthesize variables in the syntax tree as a tree.
However, error in cross-stage evaluation.
Is it possible to create a single tree by compositing trees with different stages?
This is a sample to see errors.
object Sample {
case class Input(a: String, b: Int)
trait Scriber[T] {
def scribe(i: Input): T
}
trait Hoge[T] {
def exec(t: Input): Either[Throwable, T]
}
}
object Macro {
def exec[T](scribers: Seq[Scriber[_]]): Hoge[T] = macro SampleMacro.test[T]
}
class SampleMacro(val c: blackbox.Context) {
import c.universe._
def test[T: c.WeakTypeTag](scribers: c.Expr[Seq[Scriber[_]]]): c.Expr[Hoge[T]] = {
reify {
new Hoge[T] {
override def exec(t: Input): Either[Throwable, T] = {
val x = reify {
scribers.splice.map(_.scribe(t)) // <- cross stage evaluation.
}
Right(
c.Expr[T](
q"${weakTypeOf[T].typeSymbol.companion}.apply(..$x)"
).splice
)
}
}
}
}
}
In this case, t is cross stage.
This is a sample where cross stage evaluation errors will occur, but will not work even if resolved.
With quasiquotes try
def test[T: c.WeakTypeTag](scribers: c.Expr[Seq[Scriber[_]]]): c.Expr[Hoge[T]] = {
c.Expr[Hoge[T]](q"""
new Hoge[T] {
override def exec(t: Input): Either[Throwable, T] = {
val x = $scribers.map(_.scribe(t))
Right(
${weakTypeOf[T].typeSymbol.companion}.apply(x)
)
}
}""")
With reify/splice try
def test[T: c.WeakTypeTag](scribers: c.Expr[Seq[Scriber[_]]]): c.Expr[Hoge[T]] =
reify {
new Hoge[T] {
override def exec(t: Input): Either[Throwable, T] = {
val x = scribers.splice.map(_.scribe(t))
Right(
c.Expr[Seq[_] => T](
q"${weakTypeOf[T].typeSymbol.companion}"
).splice.apply(x)
)
}
}
}

Is it possible to pass values from two instances of the same Scala class

Say I have this situation
class Pipe {
var vel = 3.4
var V = 300
var a = 10.2
var in = ???
var TotV = V+in
var out = TotV*a/vel
}
val pipe1 = new Pipe
val pipe2 = new Pipe
The in variable is were my problem is, what i'd like to do is get the out variable from pipe1 and feed that in as the in variable for pipe 2 effectively to join the two pipes but I cant figure out if this is even possible in the same class. So I can do it manually but need to know if its possible to do in the class.
pipe2.in = pipe1.out
my attempted fix was to add an ID field then try and use that to reference an instance with a higher id field but that doesnt seem doable. ie
class Pipe(id:Int) {
var vel = 3.4
var V = 300
var a = 10.2
var in = Pipe(id+1).out //this is the sticking point, I want to reference instances of this class and use their out value as in value for instances with a lower ID
var TotV = V+in
var out = TotV*a/vel
}
any help would be appreciated
You can do this by defining a companion object for the class and passing in the upstream pipe as an optional parameter to the factory method, then extracting its in value and passing it to the class constructor, as follows:
object Pipe {
def apply(upstreamPipe: Option[Pipe]): Pipe = {
val inValue = upstreamPipe match {
case Some(pipe) => pipe.out
case None => 0 // or whatever your default value is
new Pipe(inValue)
}
You would then call
val pipe1 = Pipe(None)
val pipe2 = Pipe(Some(pipe1))
Unfortunately your question is not clear now. Under certain assumptions what you describe looks like what is now called "FRP" aka "Functional Reactive Programming". If you want to do it in a serious way, you probably should take a look at some mature library such as RxScala or Monix that handle many important in the real world details such as error handling or scheduling/threading and many others.
For a simple task you might roll out a simple custom implementation like this:
trait Observable {
def subscribe(subscriber: Subscriber): RxConnection
}
trait RxConnection {
def disconnect(): Unit
}
trait Subscriber {
def onChanged(): Unit
}
trait RxOut[T] extends Observable {
def currentValue: Option[T]
}
class MulticastObservable extends Observable with Subscriber {
private val subscribers: mutable.Set[Subscriber] = mutable.HashSet()
override def onChanged(): Unit = subscribers.foreach(s => s.onChanged())
override def subscribe(subscriber: Subscriber): RxConnection = {
subscribers.add(subscriber)
new RxConnection {
override def disconnect(): Unit = subscribers.remove(subscriber)
}
}
}
abstract class BaseRxOut[T](private var _lastValue: Option[T]) extends RxOut[T] {
private val multicast = new MulticastObservable()
protected def lastValue: Option[T] = _lastValue
protected def lastValue_=(value: Option[T]): Unit = {
_lastValue = value
multicast.onChanged()
}
override def currentValue: Option[T] = lastValue
override def subscribe(subscriber: Subscriber): RxConnection = multicast.subscribe(subscriber)
}
class RxValue[T](initValue: T) extends BaseRxOut[T](Some(initValue)) {
def value: T = this.lastValue.get
def value_=(value: T): Unit = {
this.lastValue = Some(value)
}
}
trait InputConnector[T] {
def connectInput(input: RxOut[T]): RxConnection
}
class InputConnectorImpl[T] extends BaseRxOut[T](None) with InputConnector[T] {
val inputHolder = new RxValue[Option[(RxOut[T], RxConnection)]](None)
private def updateValue(): Unit = {
lastValue = for {inputWithDisconnect <- inputHolder.value
value <- inputWithDisconnect._1.currentValue}
yield value
}
override def connectInput(input: RxOut[T]): RxConnection = {
val current = inputHolder.value
if (current.exists(iwd => iwd._1 == input))
current.get._2
else {
current.foreach(iwd => iwd._2.disconnect())
inputHolder.value = Some(input, input.subscribe(() => this.updateValue()))
updateValue()
new RxConnection {
override def disconnect(): Unit = {
if (inputHolder.value.exists(iwd => iwd._1 == input)) {
inputHolder.value.foreach(iwd => iwd._2.disconnect())
inputHolder.value = None
updateValue()
}
}
}
}
}
}
abstract class BaseRxCalculation[Out] extends BaseRxOut[Out](None) {
protected def registerConnectors(connectors: InputConnectorImpl[_]*): Unit = {
connectors.foreach(c => c.subscribe(() => this.recalculate()))
}
private def recalculate(): Unit = {
var newValue = calculateOutput()
if (newValue != lastValue) {
lastValue = newValue
}
}
protected def calculateOutput(): Option[Out]
}
case class RxCalculation1[In1, Out](func: Function1[In1, Out]) extends BaseRxCalculation[Out] {
private val conn1Impl = new InputConnectorImpl[In1]
def conn1: InputConnector[In1] = conn1Impl // show to the outer world only InputConnector
registerConnectors(conn1Impl)
override protected def calculateOutput(): Option[Out] = {
for {v1 <- conn1Impl.currentValue}
yield func(v1)
}
}
case class RxCalculation2[In1, In2, Out](func: Function2[In1, In2, Out]) extends BaseRxCalculation[Out] {
private val conn1Impl = new InputConnectorImpl[In1]
def conn1: InputConnector[In1] = conn1Impl // show to the outer world only InputConnector
private val conn2Impl = new InputConnectorImpl[In2]
def conn2: InputConnector[In2] = conn2Impl // show to the outer world only InputConnector
registerConnectors(conn1Impl, conn2Impl)
override protected def calculateOutput(): Option[Out] = {
for {v1 <- conn1Impl.currentValue
v2 <- conn2Impl.currentValue}
yield func(v1, v2)
}
}
// add more RxCalculationN if needed
And you can use it like this:
def test(): Unit = {
val pipe2 = new RxCalculation1((in: Double) => {
println(s"in = $in")
val vel = 3.4
val V = 300
val a = 10.2
val TotV = V + in
TotV * a / vel
})
val in1 = new RxValue(2.0)
println(pipe2.currentValue)
val conn1 = pipe2.conn1.connectInput(in1)
println(pipe2.currentValue)
in1.value = 3.0
println(pipe2.currentValue)
conn1.disconnect()
println(pipe2.currentValue)
}
which prints
None
in = 2.0
Some(905.9999999999999)
in = 3.0
Some(909.0)
None
Here your "pipe" is RxCalculation1 (or other RxCalculationN) which wraps a function and you can "connect" and "disconnect" other "pipes" or just "values" to various inputs and start a chain of updates.

Scala: assigning a variable once without val

This is what I want to do -
class A(some args) {
var v: SomeType = null
def method1(args) = {
v = something1
...
method3
}
def method2(args) = {
v = something2
...
method3
}
def method3 = {
// uses v
}
}
In this specific case method1 and 2 are mutually exclusive and either one of them is called exactly once in the lifetime of an instance of A. Also, v is assigned once. I would prefer making it a val. But since I need method2 or method3's context to initialize v, I can't do that in the constructor.
How can achieve this "val" behavior? I can think of modifying method1 and method2 to apply methods, but I don't like the idea. Moreover, method1 and 2 have the same argument signature (hence apply would need some more info to distinguish the 2 types of calls).
An important question is: what exactly do you call "val behavior"? To me "val behavior" is that is assigned once immediately when it is declared, which can be enforced statically. You seem to want to enforce that v is not assigned twice. You possibly also want to enforce it is never read before it is assigned. You could create a very small helper box for that:
final class OnceBox[A] {
private[this] var value: Option[A] = None
def update(v: A): Unit = {
if (value.isDefined)
throw new IllegalStateException("Value assigned twice")
value = Some(v)
}
def apply(): A = {
value.getOrElse {
throw new IllegalStateException("Value not yet assigned")
}
}
}
and now your snippet:
class A(some args) {
val v = new OnceBox[SomeType]
def method1(args) = {
v() = something1
...
method3
}
def method2(args) = {
v() = something2
...
method3
}
def method3 = {
// uses v
v()
}
}
Oh and, just kidding, but Ozma has single-assignment vals built-in :-p
Similar idea to the other answer, but instead of subtypes, a field.
scala> :pa
// Entering paste mode (ctrl-D to finish)
class A {
private[this] var context: Int = _
lazy val v: String =
context match {
case 1 => "one"
case 2 => "two"
case _ => ???
}
def m1() = { context = 1 ; v }
def m2() = { context = 2 ; v }
}
// Exiting paste mode, now interpreting.
defined class A
scala> val a = new A
a: A = A#62ce72ff
scala> a.m2
res0: String = two
scala> a.m1
res1: String = two
Something like this maybe:
class A private (mtdType: Int, ...) {
val v = mdtType match {
case 1 => method1(...)
case 2 => method2(...)
}
}
object A {
def withMethod1(...) = new A(1, ...)
def withMethod2(...) = new A(2, ...)
}
Or, another possibilty:
sealed trait A {
val v
def method3 = println(v)
}
class Method1(...) extends A {
val v = method1(...)
}
class Method2(...) extends A {
val v = method2(...)
}