Aggregate real-time data in KStreams - Scala

I want to sum one column's data based on the specified key. The stream looks like id (String) as the key and value (Long) as the value.
val aggtimelogs: KTable[String, java.lang.Long] = stream
  .groupByKey()
  .aggregate(
    () => 0L,
    (key: String, value: java.lang.Long, aggregate: java.lang.Long) => value + aggregate)
// Failing here
I am getting:
Unspecified value parameters: Materialized[K, VR, KeyValueStore[Bytes, Array[Byte]]]
How do I do this in Scala?
The Kafka version is:
compile "org.apache.kafka:kafka-clients:2.0.0"
compile(group: "org.apache.kafka", name: "kafka-streams", version: "2.0.0") {
  exclude group: "com.fasterxml.jackson.core"
}
I even tried this:
val reducer = new Reducer[java.lang.Long]() {
  def apply(value1: java.lang.Long, value2: java.lang.Long): java.lang.Long = value1 + value2
}

val agg = stream
  .groupByKey()
  .reduce(reducer)
and also this:
val reducer : Reducer[Long] = (value1: Long, value2: Long) => value1 + value2
It says:
StreamAggregation.scala:39: type mismatch;
found : (Long, Long) => Long
required: org.apache.kafka.streams.kstream.Reducer[Long]
val reducer : Reducer[Long] = (value1: Long, value2: Long) => value1 + value2

I did it like this:
val aggVal = streams.groupByKey().reduce(new Reducer[Double]() {
  def apply(val1: Double, val2: Double): Double = val1 + val2
})
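If you want to keep the aggregate call instead, the "Unspecified value parameters: Materialized" error just means aggregate also wants a Materialized argument. Below is a minimal sketch of passing it explicitly with the plain Java API; the state-store name "aggregate-store" is made up for the example. Alternatively, the org.apache.kafka:kafka-streams-scala artifact provides implicit serdes and Materialized instances that fill this parameter in for you.

import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.common.utils.Bytes
import org.apache.kafka.streams.kstream.{Aggregator, Initializer, KTable, Materialized}
import org.apache.kafka.streams.state.KeyValueStore

// Sketch: same aggregation, but with the missing Materialized parameter supplied.
// "aggregate-store" is a hypothetical state-store name.
val aggtimelogs: KTable[String, java.lang.Long] = stream
  .groupByKey()
  .aggregate(
    new Initializer[java.lang.Long] { def apply(): java.lang.Long = 0L },
    new Aggregator[String, java.lang.Long, java.lang.Long] {
      def apply(key: String, value: java.lang.Long, aggregate: java.lang.Long): java.lang.Long =
        value + aggregate
    },
    Materialized
      .as[String, java.lang.Long, KeyValueStore[Bytes, Array[Byte]]]("aggregate-store")
      .withKeySerde(Serdes.String())
      .withValueSerde(Serdes.Long())
  )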

Related

Scala Play Framework: cannot generate object from json with null values

I'm new to Scala and the Play Framework. I have written the following controller:
@Singleton
class MyController @Inject()(val controllerComponents: ControllerComponents) extends BaseController {
  implicit val newMeasurementJson: OFormat[MeasurementModel] = Json.format[MeasurementModel]

  def addMeasurement(): Action[AnyContent] = Action { implicit request =>
    val content = request.body
    val jsonObject: Option[JsValue] = content.asJson
    val measurement: Option[MeasurementModel] =
      jsonObject.flatMap(
        Json.fromJson[MeasurementModel](_).asOpt
      )
    ...
  }
  ...
}
Where the endpoint receives the following JSON:
{
  "sensor_id": "1029",
  "sensor_type": "BME280",
  "location": 503,
  "lat": 48.12,
  "lon": 11.488,
  "timestamp": "2022-04-05T00:34:24",
  "pressure": 94667.38,
  "altitude": null,
  "pressure_sealevel": null,
  "temperature": 3.91,
  "humidity": 65.85
}
MeasurementModel looks like this:
case class MeasurementModel(
  sensor_id: String,
  sensor_type: String,
  location: Int,
  lat: Float,
  lon: Float,
  timestamp: String,
  pressure: Float,
  altitude: Int,
  pressure_sealevel: Int,
  temperature: Float,
  humidity: Float)
Through testing I have seen that the null values in the JSON are causing the creation of the measurement object to be unsuccessful. How can I successfully handle null values and have them set in the generated MeasurementModel object?
The data types that can represent a null are Null and Option[].
Consider the following REPL code:
scala> val mightBeIntOrNull: Option[Int] = Option(1)
val mightBeIntOrNull: Option[Int] = Some(1)

scala> val mightBeIntOrNull: Option[Int] = null
val mightBeIntOrNull: Option[Int] = null
The Option wraps the Int value in Some, which can be extracted by pattern matching.
scala> val mightBeIntOrNull: Option[Int] = Option(1)
val mightBeIntOrNull: Option[Int] = Some(1)
scala> mightBeIntOrNull match {
| case Some(myIntVal) => println("This is an integer :" + myIntVal)
| case _ => println("This might be a null")
| }
This is an integer :1
scala> val mightBeIntOrNull: Option[Int] = null
val mightBeIntOrNull: Option[Int] = null
scala> mightBeIntOrNull match {
| case Some(myIntVal) => println("This is an integer :" + myIntVal)
| case _ => println("This might be a null")
| }
This might be a null
As Gaël J mentioned, you should add Option for the desired datatype in your case class.
So the solution is to wrap the datatype in Option wherever you expect a null in the case class, for example:
altitude: Option[Float],
sensor_type: Option[String],
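Putting it together for this model, here is a sketch of MeasurementModel with the two nullable fields made optional (the numeric types are an assumption based on the sample JSON). Play's Json.format maps a JSON null, or an absent key, to None for Option fields, so the original controller code works unchanged.

import play.api.libs.json.{Json, OFormat}

case class MeasurementModel(
  sensor_id: String,
  sensor_type: String,
  location: Int,
  lat: Float,
  lon: Float,
  timestamp: String,
  pressure: Float,
  altitude: Option[Float],          // null in the sample JSON
  pressure_sealevel: Option[Float], // null in the sample JSON
  temperature: Float,
  humidity: Float)

implicit val newMeasurementJson: OFormat[MeasurementModel] = Json.format[MeasurementModel]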

Convert Timestamp value to Double value in Scala

I have changed the field "t", which was a Timestamp, and now I have to convert it to Double. I defined this class:
case class RawData(sessionId: String,
                   t: Double,
                   channel: Int,
                   signalName: String,
                   physicalValue: Double,
                   messageId: Long,
                   vehicleId: String)
And I have a problem casting "t" to Double in this code:
def raw(): Unit = {
  import rawData.sqlContext.implicits._
  // TODO solve timestamp
  val datDMY = rawData
    .map(row => {
      cal.setTimeInMillis(row.t.toLong)
      RawDataExtended(
        row.sessionId,
        row.t,
        row.channel,
        row.signalName,
        row.physicalValue,
        row.messageId,
        cal.get(Calendar.YEAR),
        cal.get(Calendar.MONTH) + 1,
        cal.get(Calendar.DAY_OF_MONTH)
      )
    })
}
Maybe this could help you:
import java.sql.Timestamp
import java.time.Instant
import org.apache.spark.sql.types.DataTypes
import spark.implicits._ // assumes a SparkSession named `spark` is in scope

case class InputModel(id: Int, time: Timestamp)
case class Foo(id: Int, timeLong: Long, timeDouble: Double)

val xs = Seq((1, Timestamp.from(Instant.now())), (2, Timestamp.from(Instant.now()))).toDF("id", "time")

val ys = xs
  .select('id, 'time cast (DataTypes.LongType) as "timeLong", 'time cast (DataTypes.DoubleType) as "timeDouble")
  .as[Foo]

val zs = xs
  .as[InputModel]
  .map(row => {
    Foo(row.id, row.time.getTime.toLong, row.time.getTime.toDouble)
  })

xs.show(false)
ys.show(false)
zs.show(false)
Be careful with time zones, and also with precision when converting to Double; notice how the Timestamp is represented as a Long compared to a Double.
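To make the difference concrete, here is a small sketch, assuming Spark's usual cast semantics where a timestamp cast to Long/Double is epoch seconds, while java.sql.Timestamp.getTime returns epoch milliseconds:

import java.sql.Timestamp

val t = Timestamp.valueOf("2022-04-05 00:34:24.5")
val millis  = t.getTime.toDouble           // epoch milliseconds
val seconds = t.getTime.toDouble / 1000.0  // roughly what 'time cast DataTypes.DoubleType yields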

How to choose typeclass by dynamic input in Scala

I have a Slick table with columns:
def name: Rep[Option[String]] = ???
def email: Rep[String] = ???
def fraudScores: Rep[Int] = ???
There is also a typeclass to calculate a rate for different fields:
trait Rater[T] {
  def apply(rep: T): Result
}

object Rater {
  def getRater[T](t: T)(implicit rater: Rater[T]): Result = rater(t)

  implicit val int: Rater[Rep[Int]] = v => calculateRate(v, _)
  implicit val str: Rater[Rep[String]] = v => calculateRate(v, _)
  implicit val strOpt: Rater[Rep[Option[String]]] = v => calculateRate(v, _)
}
and map:
val map: Map[String, Rep[_ >: Option[String] with String with Int]] = Map(
  "name"   -> name,
  "email"  -> email,
  "scores" -> fraudScores
)
What I'd like to do is get the correct instance based on dynamic input, like:
val fname = "scores" // value getting from http request
map.get(fname).fold(default)(f => {
val rater = getRater(f)
rater(someVal)
})
but I get an error that there is no implicit for Rep[_ >: Option[String] with String with Int]. Is there some workaround for this?
I think your problem is that a Map is the wrong way to represent this.
Use a case class:
case class Foo(
  name: Rep[Option[String]],
  email: Rep[String],
  score: Rep[Int]
)

def applyRater[T: Rater](t: T) = implicitly[Rater[T]](t)

def rate(foo: Foo, what: String) = what match {
  case "name"  => applyRater(foo.name)
  case "email" => applyRater(foo.email)
  case "score" => applyRater(foo.score)
  case _       => default
}
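A hypothetical usage sketch, wrapping the three columns once and dispatching on the request parameter:

val columns = Foo(name = name, email = email, score = fraudScores)

val fname = "scores" // value coming from the HTTP request
val result = rate(columns, fname) // the "score" branch resolves Rater[Rep[Int]] at compile time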

Scala Seq GroupBy with Future

I have 2 case classes
case class First(firstId: Long, pt: Long, vt: Long)
case class Second(secondId: Int, vt: Long, a: Long, b: Long, c: Long, d: Long)
I have one collection (data: Seq[First]). There is one function which transforms this sequence into another Seq[Second] after applying groupBy and one Future operation. getFutureInt is some function that returns a Future[Int].
val output: Future[Seq[Second]] = Future.sequence(
  data.groupBy(d => (d.vt, getFutureInt(d.firstId))).map { case (k, v) =>
    k._2.map { si =>
      Second(si, k._1, v.minBy(_.pt).pt, v.maxBy(_.pt).pt, v.minBy(_.pt).pt, v.maxBy(_.pt).pt)
    }
  }.toSeq)
Is there any way to avoid multiple minBy, maxBy?
You can get away with just .min and .max if you define an Ordering for your class:
implicit val ordering = Ordering.by[First, Long](_.pt)

futures.map { case (k, v) =>
  k._2.map { si => Second(si, k._1, v.min.pt, v.max.pt, v.min.pt, v.max.pt) }
}
You could compute those only once:
val output: Future[Seq[Second]] = Future.sequence(
  data.groupBy(d => (d.vt, getFutureInt(d.firstId))).map { case (k, v) =>
    k._2.map { si =>
      val minV = v.minBy(_.pt)
      val maxV = v.maxBy(_.pt)
      Second(si, k._1, minV.pt, maxV.pt, minV.pt, maxV.pt)
    }
  }.toSeq)
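If you also want to avoid the two passes that minBy/maxBy (or min/max) each make over the group, a sketch of computing both endpoints in a single foldLeft over v:

// Sketch: one pass over the group for both the minimum and the maximum pt.
val (minPt, maxPt) = v.tail.foldLeft((v.head.pt, v.head.pt)) {
  case ((mn, mx), f) => (math.min(mn, f.pt), math.max(mx, f.pt))
}
// Second(si, k._1, minPt, maxPt, minPt, maxPt)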

group by with foldLeft in Scala

I have the following list as input:
val listInput1 =
  List(
    "itemA,CATs,2,4",
    "itemA,CATS,3,1",
    "itemB,CATQ,4,5",
    "itemB,CATQ,4,6",
    "itemC,CARC,5,10")
and I want to write a function in Scala using groupBy and foldLeft (just one function) in order to sum up the third and fourth columns for lines having the same title (the first column here). The wanted output is:
val listOutput1 =
  List(
    "itemA,CATS,5,5",
    "itemB,CATQ,8,11",
    "itemC,CARC,5,10"
  )
def sumIndex(listIn: List[String]): List[String] = {
  listIn.map(_.split(",")).groupBy(_(0)).map {
    case (title, label) =>
      "%s,%s,%d,%d".format(
        title,
        label.head.apply(1),
        label.map(_(2).toInt).sum,
        label.map(_(3).toInt).sum)
  }.toList
}
Kind regards
The logic in your code looks sound; here it is with a case class implemented, which handles edge cases more cleanly:
// represents a 'row' in the original list
case class Item(
  name: String,
  category: String,
  amount: Int,
  price: Int
)

// safely converts a row of strings into the case class, throws an exception otherwise
def stringsToItem(strings: Array[String]): Item = {
  if (strings.length != 4) {
    throw new Exception(s"Invalid row: ${strings.mkString(",")}; must contain only 4 entries!")
  } else {
    val n = strings.headOption.getOrElse("N/A")
    val cat = strings.lift(1).getOrElse("N/A")
    val amt = strings.lift(2).filter(_.matches("^[0-9]*$")).map(_.toInt).getOrElse(0)
    val p = strings.lastOption.filter(_.matches("^[0-9]*$")).map(_.toInt).getOrElse(0)
    Item(n, cat, amt, p)
  }
}
// original code with the case class and method above used
listInput1.map(_.split(","))
  .map(stringsToItem)
  .groupBy(_.name)
  .map { case (name, items) =>
    Item(
      name,
      category = items.head.category,
      amount = items.map(_.amount).sum,
      price = items.map(_.price).sum
    )
  }.toList
You can solve it with a single foldLeft, iterating the input list only once. Use a Map to aggregate the result.
listInput1.map(_.split(",")).foldLeft(Map.empty[String, Int]) {
  (acc: Map[String, Int], curr: Array[String]) =>
    val label: String = curr(0)
    val oldValue: Int = acc.getOrElse(label, 0)
    val newValue: Int = oldValue + curr(2).toInt + curr(3).toInt
    acc.updated(label, newValue)
}
result: Map(itemA -> 10, itemB -> 19, itemC -> 15)
If you have a list as
val listInput1 =
  List(
    "itemA,CATs,2,4",
    "itemA,CATS,3,1",
    "itemB,CATQ,4,5",
    "itemB,CATQ,4,6",
    "itemC,CARC,5,10")
Then you can write a general function that can be used with foldLeft and reduceLeft as
def accumulateLeft(x: Map[String, Tuple3[String, Int, Int]],
                   y: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] = {
  val key = y.keySet.toList(0)
  if (x.keySet.contains(key)) {
    val oldTuple = x(key)
    x.updated(key, (y(key)._1, oldTuple._2 + y(key)._2, oldTuple._3 + y(key)._3))
  } else {
    x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
  }
}
and you can call them as
foldLeft
listInput1
  .map(_.split(","))
  .map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
  .foldLeft(Map.empty[String, Tuple3[String, Int, Int]])(accumulateLeft)
  .map(x => x._1 + "," + x._2._1 + "," + x._2._2 + "," + x._2._3)
  .toList
//res0: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
reduceLeft
listInput1
  .map(_.split(","))
  .map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
  .reduceLeft(accumulateLeft)
  .map(x => x._1 + "," + x._2._1 + "," + x._2._2 + "," + x._2._3)
  .toList
//res1: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
Similarly you can just interchange the variables in the general function so that it can be used with foldRight and reduceRight as
def accumulateRight(y: Map[String, Tuple3[String, Int, Int]],
                    x: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] = {
  val key = y.keySet.toList(0)
  if (x.keySet.contains(key)) {
    val oldTuple = x(key)
    x.updated(key, (y(key)._1, oldTuple._2 + y(key)._2, oldTuple._3 + y(key)._3))
  } else {
    x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
  }
}
and calling the function would give you
foldRight
listInput1
  .map(_.split(","))
  .map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
  .foldRight(Map.empty[String, Tuple3[String, Int, Int]])(accumulateRight)
  .map(x => x._1 + "," + x._2._1 + "," + x._2._2 + "," + x._2._3)
  .toList
//res2: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
reduceRight
listInput1
  .map(_.split(","))
  .map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
  .reduceRight(accumulateRight)
  .map(x => x._1 + "," + x._2._1 + "," + x._2._2 + "," + x._2._3)
  .toList
//res3: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
So you don't really need a groupBy and can use any of the foldLeft, foldRight, reduceLeft or reduceRight functions to get your desired output.
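And if you do want the literal groupBy-plus-foldLeft combination the question asks for, here is a sketch that folds over each group once to accumulate the category and both sums, then rebuilds the output strings:

def sumIndex(listIn: List[String]): List[String] =
  listIn
    .map(_.split(","))
    .groupBy(_(0))
    .map { case (title, rows) =>
      // fold each group once, keeping the last seen category and the two running sums
      val (cat, s3, s4) = rows.foldLeft(("", 0, 0)) {
        case ((_, a, b), r) => (r(1), a + r(2).toInt, b + r(3).toInt)
      }
      s"$title,$cat,$s3,$s4"
    }
    .toList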