How to use math.sqrt for DStream[(Double,Double)]? - scala

For the streaming data DStream[(Double, Double)], how do I estimate the root mean squared error? See my code below. The line math.sqrt(summse) is where I have a problem (the code does not compile):
def calculateRMSE(output: DStream[(Double, Double)], n: DStream[Long]): Double = {
  val summse = output.foreachRDD { rdd =>
    rdd.map {
      case pair: (Double, Double) =>
        val err = math.abs(pair._1 - pair._2);
        err*err
    }.reduce(_ + _)
  }
  math.sqrt(summse)
}
UPDATE:
The code doesn't compile: Cannot resolve reference sqrt with such signature. Expected: Double, Actual: Unit

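The method foreachRDD(...) returns Unit, so that error is expected. It is an output operation: the function you pass runs on each batch's RDD for its side effects, and whatever that function returns is discarded. A DStream is also an unbounded sequence of batches, so there is no single Double to return. I guess you'll have to apply sqrt per batch, either inside the foreachRDD block or by using map and reduce on the DStream itself, which gives you a DStream[Double] rather than a Double.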

Related

Flink Cogroup - value map is not a member of Object

I am trying to run the example Scala code for the CoGroup function from the Flink website, but it throws the error "value map is not a member of Object".
Here is my code
val iVals: DataSet[(String, Int)] = env.fromCollection(Seq(("a",1),("b",2),("c",3)))
val dVals: DataSet[(String, Int)] = env.fromCollection(Seq(("a",11),("b",22)))
val output = iVals.coGroup(dVals).where(0).equalTo(0) {
  (iVals, dVals, out: Collector[Double]) =>
    val ints = iVals map { _._2 } toSet
    for (dVal <- dVals) {
      for (i <- ints) {
        out.collect(dVal._2 * i)
      }
    }
}
output.print()
I don't know what causes the error. Is there a library I forgot to import? Thanks.
Have you tried adding the type annotations for iVals and dVals? It seems that Scala is inferring the type Object, hence the error. (Why, I don't know).
What I mean is:
(iVals: Iterator[(String, Int)], dVals: Iterator[(String, Int)], out: Collector[Double]) =>

Scala yield returning Try[Either[]] rather than Either

I am trying to get some hands-on practice with basic Scala operations and got stuck on the following sample code
def insuranceRateQuote(a: Int, tickets:Int) : Either[Exception, Double] = {
  // ... something
  Right(Double)
}
def parseInsuranceQuoteFromWebForm(age: String, numOfTickets: String) : Either[Exception, Double]= {
  try{
    val a = Try(age.toInt)
    val tickets = Try(numOfTickets.toInt)
    for{
      aa <- a
      t <- tickets
    } yield insuranceRateQuote(aa,t) // ERROR HERE
  } catch {
    case _ => Left(new Exception)
  }
}
The error I am getting says: found Try[Either[Exception,Double]]
I don't understand why the Either ends up wrapped in a Try.
PS - This is probably not the ideal way to do it in Scala, so feel free to post your sample code :)
The key thing to understand is that a for-comprehension can transform what is inside the wrapper but will not change the wrapper itself. That is because a for-comprehension desugars to map/flatMap calls on the wrapper chosen in the first step of the chain. For example, consider the following snippet
val result: Try[Int] = Try(41).map(v => v + 1)
// result: scala.util.Try[Int] = Success(42)
Note how we transformed the value inside the Try wrapper from 41 to 42; the wrapper itself remained unchanged. Alternatively, we could express the same thing using a for-comprehension
val result: Try[Int] = for { v <- Try(41) } yield v + 1
// result: scala.util.Try[Int] = Success(42)
Note how the effect is exactly the same. Now consider the following for comprehension which chains multiple steps
val result: Try[Int] =
  for {
    a <- Try(41) // first step determines the wrapper for all the other steps
    b <- Try(1)
  } yield a + b
// result: scala.util.Try[Int] = Success(42)
This expands to
val result: Try[Int] =
  Try(41).flatMap { (a: Int) =>
    Try(1).map { (b: Int) => a + b }
  }
// result: scala.util.Try[Int] = Success(42)
where again we see the result is the same: the value inside the wrapper is transformed, but the wrapper itself is unchanged.
Finally consider
val result: Try[Either[Exception, Int]] =
  for {
    a <- Try(41) // first step still determines the top-level wrapper
    b <- Try(1)
  } yield Right(a + b) // here we wrap inside `Either`
// result: scala.util.Try[Either[Exception,Int]] = Success(Right(42))
The principle remains the same: we did wrap a + b inside Either, but this does not affect the top-level wrapper, which is still Try.
Mario Galic's answer already explains the problem with your code, but I'd fix it differently.
Two points:
Either[Exception, A] (or rather, Either[Throwable, A]) is kind of equivalent to Try[A], with Left taking the role of Failure and Right the role of Success.
The outer try/catch is not useful because the exceptions should be captured by working in Try.
So you probably want something like
def insuranceRateQuote(a: Int, tickets:Int) : Try[Double] = {
  // ... something
  Success(someDouble)
}
def parseInsuranceQuoteFromWebForm(age: String, numOfTickets: String): Try[Double] = {
  val a = Try(age.toInt)
  val tickets = Try(numOfTickets.toInt)
  for{
    aa <- a
    t <- tickets
    q <- insuranceRateQuote(aa,t)
  } yield q
}
Slightly unfortunately, if you work out what the comprehension desugars to, it ends with a useless map(q => q), so you can write it more directly as
a.flatMap(aa => tickets.flatMap(t => insuranceRateQuote(aa,t)))
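If the Either return type from the question has to stay, one possible approach is to convert each Try with toEither (available since Scala 2.12) so that every step of the comprehension uses the same wrapper. The following is only a sketch under assumptions: the object name QuoteSketch and the pricing rule inside insuranceRateQuote are made up for illustration, and the left side is widened to Throwable because that is what toEither produces.

```scala
import scala.util.Try

object QuoteSketch {
  // Hypothetical stand-in for the question's insuranceRateQuote;
  // the age check and the formula are invented for this example.
  def insuranceRateQuote(age: Int, tickets: Int): Either[Throwable, Double] =
    if (age < 16) Left(new IllegalArgumentException("too young to insure"))
    else Right(100.0 + 25.0 * tickets)

  // Every generator is an Either, so the comprehension yields an Either
  // and no outer try/catch is needed.
  def parseInsuranceQuoteFromWebForm(age: String, numOfTickets: String): Either[Throwable, Double] =
    for {
      a <- Try(age.toInt).toEither          // Left(NumberFormatException) on bad input
      t <- Try(numOfTickets.toInt).toEither
      q <- insuranceRateQuote(a, t)
    } yield q
}
```

With these made-up numbers, QuoteSketch.parseInsuranceQuoteFromWebForm("30", "2") gives Right(150.0), while a non-numeric age gives a Left holding the NumberFormatException.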

Find count in WindowedStream - Flink

I am pretty new to the world of streams and I am running into some issues on my first try.
More specifically, I am trying to implement count and groupBy functionality in a sliding window using Flink.
I've done it on a plain DataStream, but I am not able to make it work on a WindowedStream.
Do you have any suggestions on how I can do it?
val parsedStream: DataStream[(String, Response)] = stream
  .mapWith(_.decodeOption[Response])
  .filter(_.isDefined)
  .map { record =>
    (
      s"${record.get.group.group_country}, ${record.get.group.group_state}, ${record.get.group.group_city}",
      record.get
    )
  }
val result: DataStream[((String, Response), Int)] = parsedStream
  .map((_, 1))
  .keyBy(_._1._1)
  .sum(1)
// The output of result is
// ((us, GA, Atlanta,Response()), 14)
// ((us, SA, Atlanta,Response()), 4)
result
  .keyBy(_._1._1)
  .timeWindow(Time.seconds(5))
  //the following part doesn't compile
  .apply(
    new WindowFunction[(String, Int), (String, Int), String, TimeWindow] {
      def apply(
        key: Tuple,
        window: TimeWindow,
        values: Iterable[(String, Response)],
        out: Collector[(String, Int)]
      ) {}
    }
  )
Compilation Error:
overloaded method value apply with alternatives:
[R](function: (String, org.apache.flink.streaming.api.windowing.windows.TimeWindow, Iterable[((String, com.flink.Response), Int)], org.apache.flink.util.Collector[R]) => Unit)(implicit evidence$28: org.apache.flink.api.common.typeinfo.TypeInformation[R])org.apache.flink.streaming.api.scala.DataStream[R] <and>
[R](function: org.apache.flink.streaming.api.scala.function.WindowFunction[((String, com.flink.Response), Int),R,String,org.apache.flink.streaming.api.windowing.windows.TimeWindow])(implicit evidence$27: org.apache.flink.api.common.typeinfo.TypeInformation[R])org.apache.flink.streaming.api.scala.DataStream[R]
cannot be applied to (org.apache.flink.streaming.api.functions.windowing.WindowFunction[((String, com.flink.Response), Int),(String, com.flink.Response),String,org.apache.flink.streaming.api.windowing.windows.TimeWindow]{def apply(key: String,window: org.apache.flink.streaming.api.windowing.windows.TimeWindow,input: Iterable[((String, com.flink.Response), Int)],out: org.apache.flink.util.Collector[(String, com.flink.Response)]): Unit})
.apply(
This is a simpler example that we can work on
val source: DataStream[(JsonField, Int)] = env.fromElements(("hello", 1), ("hello", 2))
val window2 = source
.keyBy(0)
.timeWindow(Time.minutes(1))
.apply(new WindowFunction[(JsonField, Int), Int, String, TimeWindow] {})
I tried your code and found the errors: the types declared for your WindowFunction are wrong.
The documentation gives the signature as WindowFunction[IN, OUT, KEY, W <: Window]. IN is the element type of the stream you are computing windows on. That stream has type ((String, Response), Int), not (String, Int) as declared in the code. The error message also shows org.apache.flink.streaming.api.functions.windowing.WindowFunction being passed where the Scala API expects org.apache.flink.streaming.api.scala.function.WindowFunction, so it is worth checking which one you import.
If you change the part that is not compiling to:
.apply(new WindowFunction[((String, Response), Int), (String, Response), String, TimeWindow] {
  override def apply(key: String, window: TimeWindow, input: Iterable[((String, Response), Int)], out: Collector[(String, Response)]): Unit = ???
})
EDIT: The second example fails for the same general reason. When you call keyBy on a stream of tuples there are two variants to choose from: keyBy(fields: Int*), which selects tuple fields by index (this is what you used), and keyBy(fun: T => K), where you provide a function that extracts the key.
There is one important difference between them: the index-based variant gives you the key as a Java Tuple, while the function-based variant gives you the key with its exact type.
So if you change String to Tuple in the KEY position of your simplified example, the types should line up.

Create a simple function for calculating root mean square error using data Seq[Seq[(Double,Double)]]

I need to create a simple function for calculating Root Mean Square Error using Seq[Seq[(Double,Double)]] as the input:
This is my attempt:
val getRMSE: (Seq[Seq[(Double, Double)]]) => Double = {
  (predictions) =>
    val mse = predictions
      .map {
        case (rating, prediction) =>
          val err = rating-prediction
          err*err
      }.mean()
    math.sqrt(mse)
}
The question is how to resolve the compilation error with err*err and rating-prediction. It says "Cannot resolve symbol *"
The type of your predictions is actually Seq[Seq[(Double, Double)]]. So when you call map on it you have to provide a function which takes parameter of type Seq[(Double, Double)], but you pass a function from (Double, Double).
case (rating, prediction)
is wrong, change it to
case seqOfPairs: Seq[(Double, Double)] // note: the type argument (Double, Double) is erased at runtime, so only Seq is actually checked
I hope this will get you on the right way.
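As one possible way to finish the function, here is a minimal sketch that flattens the nested sequences first, so the pattern (rating, prediction) really does match each element. It assumes the intended result is a single RMSE over all pairs from all inner sequences, which the question does not state explicitly. The object name RmseSketch is made up, and since the standard Seq has no mean() method, the mean is computed as sum divided by size.

```scala
object RmseSketch {
  // Computes sqrt(mean((rating - prediction)^2)) over every pair in every inner Seq.
  // For empty input this returns NaN (0.0 / 0); that behaviour is an assumption,
  // not something the question specifies.
  def getRMSE(predictions: Seq[Seq[(Double, Double)]]): Double = {
    val squaredErrors = predictions.flatten.map { case (rating, prediction) =>
      val err = rating - prediction
      err * err
    }
    math.sqrt(squaredErrors.sum / squaredErrors.size)
  }
}
```

For example, RmseSketch.getRMSE(Seq(Seq((0.0, 3.0)), Seq((0.0, 3.0)))) has squared errors 9 and 9, a mean of 9, and therefore an RMSE of 3.0.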

Getting a HashMap from Scala's HashMap.mapValues?

The example below is a self-contained example I've extracted from my larger app.
Is there a better way to get a HashMap after calling mapValues below? I'm new to Scala, so it's very likely that I'm going about this all wrong, in which case feel free to suggest a completely different approach. (An apparently obvious solution would be to move the logic in the mapValues to inside the accum but that would be tricky in the larger app.)
#!/bin/sh
exec scala "$0" "$@"
!#
import scala.collection.immutable.HashMap
case class Quantity(val name: String, val amount: Double)
class PercentsUsage {
  type PercentsOfTotal = HashMap[String, Double]
  var quantities = List[Quantity]()
  def total: Double = (quantities map { t => t.amount }).sum
  def addQuantity(qty: Quantity) = {
    quantities = qty :: quantities
  }
  def percentages: PercentsOfTotal = {
    def accum(m: PercentsOfTotal, qty: Quantity) = {
      m + (qty.name -> (qty.amount + (m getOrElse (qty.name, 0.0))))
    }
    val emptyMap = new PercentsOfTotal()
    // The `emptyMap ++` at the beginning feels clumsy, but it does the
    // job of giving me a PercentsOfTotal as the result of the method.
    emptyMap ++ (quantities.foldLeft(emptyMap)(accum(_, _)) mapValues (dollars => dollars / total))
  }
}
val pu = new PercentsUsage()
pu.addQuantity(new Quantity("A", 100))
pu.addQuantity(new Quantity("B", 400))
val pot = pu.percentages
println(pot("A")) // prints 0.2
println(pot("B")) // prints 0.8
Rather than building up your Map by hand with a fold, you can just use the Scala collections' built-in groupBy function. This creates a map from the grouping property to a list of the values in that group, which can then be aggregated, e.g. by taking a sum:
def percentages: Map[String, Double] = {
  val t = total
  quantities.groupBy(_.name).mapValues(_.map(_.amount).sum / t)
}
This pipeline transforms your List[Quantity] => Map[String, List[Quantity]] => Map[String, Double] giving you the desired result.
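One caveat on the snippet above: in Scala 2.12 mapValues returns a lazy view over the original map, and in Scala 2.13 it is deprecated and returns a MapView rather than a Map. A variation that builds a strict Map with a plain map call might look like the sketch below. The object name PercentagesSketch and the standalone percentages(quantities) signature are assumptions made to keep the example self-contained, not part of the original class.

```scala
object PercentagesSketch {
  final case class Quantity(name: String, amount: Double)

  // Groups by name, sums each group, and divides by the grand total.
  // A plain `map` over the grouped entries yields a strict immutable Map.
  def percentages(quantities: List[Quantity]): Map[String, Double] = {
    val total = quantities.map(_.amount).sum
    quantities
      .groupBy(_.name)
      .map { case (name, qs) => name -> (qs.map(_.amount).sum / total) }
  }
}
```

With the question's data, PercentagesSketch.percentages(List(Quantity("A", 100.0), Quantity("B", 400.0))) maps "A" to 0.2 and "B" to 0.8.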