How to keep attribute after reduce?


Using the code below, I'm attempting to output the name and the sum of the ages for each person:
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
object CalculateMeanInStream extends App {
implicit val actorSystem = ActorSystem()
case class Person(name: String, age: Double)
val personSource = Source(List(Person("1", 30),Person("1", 20),Person("1", 20),Person("1", 30),Person("2", 2)))
val meanPrintSink = Sink.foreach[Double](println)
val printSink = Sink.foreach[Double](println)
def calculateMean(values: List[Double]): Double = {
values.sum / values.size
}
personSource.groupBy(maxSubstreams = 2 , s => s.name)
.map(m => m.age)
.reduce(_ + _ )
.mergeSubstreams
.runForeach(println)
}
The output is:
2.0
100.0
Is there a way to keep the person's name as part of the reduce, so that the following output is produced:
(2.0 , 2)
(100.0 , 1)
I've tried:
personSource.groupBy(maxSubstreams = 2 , s => s.name)
.reduce((x , y) => x.age + y.age)
.mergeSubstreams
.runForeach(println)
but it fails to compile with:
type mismatch;
found : Double
required: CalculateMeanInStream.Person
.reduce((x , y) => x.age + y.age)

Keep the whole Person in the reduce and build a new Person that carries the name along:
personSource
.groupBy(maxSubstreams = 2, s => s.name)
.reduce((person1, person2) => Person(person1.name, person1.age + person2.age))
.mergeSubstreams
.runForeach(println)
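To check the idea without an Akka dependency, here is a minimal plain-collections sketch, assuming Scala 2.13 and reusing the question's Person and data; the object name KeepNameWithReduce is hypothetical. Grouping by name and then reducing whole Person values is the collections analogue of groupBy + reduce on substreams: the name survives because every reduction step returns a Person.

```scala
// Plain-collections analogue of the Akka groupBy + reduce (no Akka needed).
object KeepNameWithReduce {
  case class Person(name: String, age: Double)

  val people: List[Person] =
    List(Person("1", 30), Person("1", 20), Person("1", 20), Person("1", 30), Person("2", 2))

  // Reducing whole Person values keeps the name: each step builds a new Person
  // with the first element's name and the summed age.
  val totals: Map[String, Person] =
    people.groupBy(_.name).map { case (name, group) =>
      name -> group.reduce((p1, p2) => Person(p1.name, p1.age + p2.age))
    }
}
```

Here totals("1") is Person("1", 100.0) and totals("2") is Person("2", 2.0).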

There might be a more elegant way, but I'd do it like this:
personSource
.groupBy(maxSubstreams = 2, s => s.name)
.map(x => x.name -> x.age)
.reduce { case ((a, b) , (_, d)) => (a, b + d) }
.mergeSubstreams
.runForeach(println)

You can use fold(), which lets you skip the map: replace the map and reduce lines with a single fold, as follows:
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
object CalculateMeanInStream extends App {
implicit val actorSystem: ActorSystem = ActorSystem()
case class Person(name: String, age: Double)
val personSource = Source(List(Person("1", 30),Person("1", 20),Person("1", 20),Person("1", 30),Person("2", 2)))
val meanPrintSink = Sink.foreach[Double](println)
val printSink = Sink.foreach[Double](println)
def calculateMean(values: List[Double]): Double = {
values.sum / values.size
}
val reducer: ((String, Double), (String, Double)) => (String, Double) =
(person, accPerson) => (person._1, person._2 + accPerson._2)
personSource.groupBy(maxSubstreams = 2 , s => s.name)
.fold((0D, "")){ case ((sum, _), x) => (sum + x.age, x.name )}
.mergeSubstreams
.runForeach(println)
.onComplete(_ => actorSystem.terminate())(actorSystem.dispatcher)
}
The output is:
(2.0,2)
(100.0,1)
as required.
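The same fold can be sketched with plain Scala collections (assuming Scala 2.13; the object name SumWithFold is hypothetical). It shows how the (sum, name) accumulator behaves: the zero value (0D, "") has an empty name, and every step overwrites it with the current element's name, which is the same within a group.

```scala
object SumWithFold {
  case class Person(name: String, age: Double)

  val people: List[Person] =
    List(Person("1", 30), Person("1", 20), Person("1", 20), Person("1", 30), Person("2", 2))

  // One foldLeft per name group with a (sum, name) accumulator, mirroring the Akka fold:
  // the zero value (0D, "") gets its name replaced by the element's name on every step.
  val totals: List[(Double, String)] =
    people.groupBy(_.name).values.toList
      .map(group => group.foldLeft((0D, "")) { case ((sum, _), p) => (sum + p.age, p.name) })
      .sortBy(_._2) // groupBy gives no ordering guarantee, so sort by name for a stable result
}
```

Sorting is only there to make the result deterministic; totals is List((100.0,"1"), (2.0,"2")).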

Related

How to convert digital information units (e.g. KB, MB, B) in Scala?

I'm trying to normalize digital information units in Scala. How can I use combineByKey() to operate on the values?
val avg = scores.combineByKey(
(v) => (v, 1),
(acc: (Float, Int), v) => (acc._1 + v, acc._2 + 1),
(acc1:(Float, Int), acc2:(Float, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
)
I tried to use this function, but I don't know how to separate the letters from the numbers.
e.g.
input:
(text1,3B)
(text1,45KB)
(text2,88MB)
(text2,98KB)
(text3,25B)
output:
(text1,List(3B,46080B))
(text2,List(92274688B,100352B))
(text3,List(25B))

Not sure if I understood your question, but from your example it looks like you are looking for something like this:
val input = Seq(
("text1","3B"),
("text1","45KB"),
("text2","88MB"),
("text2","98KB"),
("text3","25B")
)
val grouped = input.groupBy(_._1)
// grouped:Map[String, Seq[(String, String)]] = Map("text3" -> List(("text3", "25B")), "text2" -> List(("text2", "88MB"), ("text2", "98KB")), "text1" -> List(("text1", "3B"), ("text1", "45KB")))
val sizesByText = grouped.mapValues(_.map(_._2))
// sizesByText: Map[String, Seq[String]] = Map("text3" -> List("25B"), "text2" -> List("88MB", "98KB"), "text1" -> List("3B", "45KB"))
val PATTERN = """(\d+)(B|KB|MB)""".r
def byteSize(size: String) = {
val bytes = size match {
case PATTERN(amount, "B") => amount.toInt
case PATTERN(amount, "KB") => (amount.toInt * 1024)
case PATTERN(amount, "MB") => (amount.toInt * 1024 * 1024)
case _ => throw new IllegalArgumentException(s"invalid size: $size")
}
bytes + "B"
}
val output = sizesByText.mapValues(_.map(byteSize)).toSeq
//output: Seq[(String, Seq[String])] = Vector(("text3", List("25B")), ("text2", List("92274688B", "100352B")), ("text1", List("3B", "46080B")))
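One caveat with the answer's byteSize: amount.toInt * 1024 * 1024 overflows Int for sizes of 2048 MB and above. Below is a minimal sketch that does the arithmetic in Long and uses groupMap (Scala 2.13+) to group and convert in one pass. It assumes, like the answer, that only B, KB and MB occur and that 1 KB = 1024 B; the object ByteUnits and its methods are hypothetical names, and this is plain collections code, not a Spark combineByKey version.

```scala
object ByteUnits {
  // Assumption: only B, KB and MB occur, with binary multiples (1 KB = 1024 B).
  private val Pattern = """(\d+)(B|KB|MB)""".r

  // Long arithmetic avoids Int overflow for sizes of 2048 MB and above.
  def toBytes(size: String): Long = size match {
    case Pattern(amount, "B")  => amount.toLong
    case Pattern(amount, "KB") => amount.toLong * 1024L
    case Pattern(amount, "MB") => amount.toLong * 1024L * 1024L
    case _ => throw new IllegalArgumentException(s"invalid size: $size")
  }

  // groupMap groups by key and converts each value, keeping input order within a group.
  def normalize(input: Seq[(String, String)]): Map[String, List[String]] =
    input.groupMap(_._1)(p => s"${toBytes(p._2)}B").map { case (k, v) => k -> v.toList }
}
```

For example, ByteUnits.toBytes("4096MB") gives 4294967296, which would not fit in an Int.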

Merge two Seq to create a Map

I have an object such as:
case class Person(name: String, number: Int)
And two Sequences of this object:
Seq(("abc", 1), ("def", 2))
Seq(("abc", 300), ("xyz", 400))
I want to merge these two sequences into a single Map whose keys are the names and whose values are this separate object:
case class CombineObject(firstNumber: Option[Int], secondNumber: Option[Int])
So that my final map would look like:
Map(
"abc" -> CombineObject(Some(1), Some(300)),
"def" -> CombineObject(Some(2), None),
"xyz" -> CombineObject(None, Some(400))
)
All I can think of is running two for loops over the sequences to build the map. Is there a better way to solve this problem?

Turn each Seq into its own Map. After that it's pretty easy.
case class Person( name : String
, number : Int )
val s1 = Seq(Person("abc",1),Person("def",2))
val s2 = Seq(Person("abc",300),Person("xyz",400))
val m1 = s1.foldLeft(Map.empty[String,Int]){case (m,p) => m+(p.name->p.number)}
val m2 = s2.foldLeft(Map.empty[String,Int]){case (m,p) => m+(p.name->p.number)}
case class CombineObject( firstNumber : Option[Int]
, secondNumber : Option[Int] )
val res = (m1.keySet ++ m2.keySet).foldLeft(Map.empty[String,CombineObject]){
case (m,k) => m+(k -> CombineObject(m1.get(k),m2.get(k)))
}
//res: Map[String,CombineObject] = Map(abc -> CombineObject(Some(1),Some(300))
// , def -> CombineObject(Some(2),None)
// , xyz -> CombineObject(None,Some(400)))
This assumes that each Seq has no duplicate name entries. It's not obvious how that situation should be handled.
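The same approach can be written a little more compactly by building each Map with toMap instead of foldLeft. This is a sketch under the same no-duplicates assumption; the object name MergeSeqs is hypothetical. Like the foldLeft version, a name repeated within one Seq silently keeps its last number.

```scala
object MergeSeqs {
  case class Person(name: String, number: Int)
  case class CombineObject(firstNumber: Option[Int], secondNumber: Option[Int])

  // Index each Seq by name with toMap, then look every name up in both maps.
  // A name repeated within one Seq keeps its last number.
  def merge(s1: Seq[Person], s2: Seq[Person]): Map[String, CombineObject] = {
    val m1 = s1.map(p => p.name -> p.number).toMap
    val m2 = s2.map(p => p.name -> p.number).toMap
    (m1.keySet ++ m2.keySet).map(k => k -> CombineObject(m1.get(k), m2.get(k))).toMap
  }
}
```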

Another proposal, using a recursive function: it first sorts both lists by name, then walks through them together, like the merge step of merge sort.
case class Person(
name: String,
number: Int
)
case class CombineObject(
firstNumber : Option[Int],
secondNumber : Option[Int]
)
val left = List(Person("abc", 1), Person("def", 2))
val right = List(Person("abc", 300), Person("xyz", 400))
import scala.annotation.tailrec
def merge(left: List[Person], right: List[Person]): Map[String, CombineObject] = {
@tailrec
def doMerge(left: List[Person], right: List[Person], acc: Map[String, CombineObject] = Map.empty): Map[String, CombineObject] = {
(left, right) match {
case (Person(name1, number1) :: xs, Person(name2, number2) :: ys) =>
if (name1 == name2) {
doMerge(xs, ys, acc + (name1 -> CombineObject(Some(number1), Some(number2))))
} else if (name1 < name2) {
// name1 is missing from the right list: advance only the left list
doMerge(xs, right, acc + (name1 -> CombineObject(Some(number1), None)))
} else {
// name2 is missing from the left list: advance only the right list
doMerge(left, ys, acc + (name2 -> CombineObject(None, Some(number2))))
}
// one list is exhausted: the remaining names exist on one side only
case (Nil, Person(name2, number2) :: ys) =>
doMerge(Nil, ys, acc + (name2 -> CombineObject(None, Some(number2))))
case (Person(name1, number1) :: xs, Nil) =>
doMerge(xs, Nil, acc + (name1 -> CombineObject(Some(number1), None)))
case _ => acc
}
}
doMerge(left.sortBy(_.name), right.sortBy(_.name))
}
merge(left, right) //Map(abc -> CombineObject(Some(1),Some(300)), def -> CombineObject(Some(2),None), xyz -> CombineObject(None,Some(400)))
Looks kind of scary :)

Another potential variation
case class Person(name : String, number : Int)
case class CombineObject(firstNumber : Option[Int], secondNumber : Option[Int])
val s1 = Seq(Person("abc",1),Person("def",2))
val s2 = Seq(Person("abc",300),Person("xyz",400))
(s1.map(_-> 1) ++ s2.map(_ -> 2))
.groupBy { case (person, seqTag) => person.name }
.mapValues {
case List((Person(name1, number1), _), (Person(name2, number2), _)) => CombineObject(Some(number1), Some(number2))
case List((Person(name, number), seqTag)) => if (seqTag == 1) CombineObject(Some(number), None) else CombineObject(None, Some(number))
case Nil => CombineObject(None, None)
}
which outputs
res1: Map[String,CombineObject] = Map(abc -> CombineObject(Some(1),Some(300)), xyz -> CombineObject(None,Some(400)), def -> CombineObject(Some(2),None))

Yet another solution, probably arguable :) ...
import scala.collection.immutable.TreeMap
case class CombineObject(firstNumber : Option[Int], secondNumber : Option[Int])
case class Person(name : String,number : Int)
val seq1 = Seq(Person("abc",1),Person("def",2))
val seq2 = Seq(Person("abc",300),Person("xyz",400))
def toExhaustiveMap(seq1:Seq[Person], seq2:Seq[Person]) = TreeMap(
seq1.map { case Person(s, i) => s -> Some(i) }: _*
) ++ ((seq2.map(_.name) diff seq1.map(_.name)).map(_ -> None))
val result = (toExhaustiveMap(seq1,seq2) zip toExhaustiveMap(seq2,seq1)).map {
case ((name1, number1), (_, number2)) => name1 -> CombineObject(number1, number2)
}
println(result)
Map(abc -> CombineObject(Some(1),Some(300)), def -> CombineObject(Some(2),None), xyz -> CombineObject(None,Some(400)))
Hope it helps.

Yet another alternative, if performance isn't a priority:
// val seq1 = Seq(("abc", 1), ("def", 2))
// val seq2 = Seq(("abc", 300), ("xyz", 400))
(seq1 ++ seq2)
.toMap
.keys
.map(k => k -> CombineObject(
seq1.collectFirst { case (`k`, v) => v },
seq2.collectFirst { case (`k`, v) => v }
))
.toMap
// Map(
// "abc" -> CombineObject(Some(1), Some(300)),
// "def" -> CombineObject(Some(2), None),
// "xyz" -> CombineObject(None, Some(400))
// )

scala Map with Option/Some gives match error

The following code produces the runtime error below. Could someone explain why this error occurs?
Exception in thread "main" scala.MatchError: Some(Some(List(17))) (of class scala.Some)
at com.discrete.CountingSupp$.$anonfun$tuplesWithRestrictions1$1(CountingSupp.scala:43)
def tuplesWithRestrictions1(): (Int, Map[Int, Option[List[Int]]]) = {
val df = new DecimalFormat("#")
df.setMaximumFractionDigits(0)
val result = ((0 until 1000) foldLeft[(Int, Map[Int, Option[List[Int]]])] ((0, Map.empty[Int, Option[List[Int]]]))) {
(r: (Int, Map[Int, Option[List[Int]]]), x: Int) => {
val str = df.format(x).toCharArray
if (str.contains('7')) {
import scala.math._
val v = floor(log10(x)) - 1
val v1 = (pow(10, v)).toInt
val m: Map[Int, Option[List[Int]]] = (r._2).get(v1) match {
case None => r._2 + (v1 -> Some(List(x)))
case Some(xs: List[Int]) => r._2 updated(x, Some(x::xs))
}
val f = (r._1 + 1, m)
f
} else r
}
}
result
}

The return type of .get on a Map is
get(k: K): Option[V]
From the Scaladoc:
/** Optionally returns the value associated with a key.
*
* @param key the key value
* @return an option value containing the value associated with `key` in this map,
* or `None` if none exists.
*/
def get(key: K): Option[V]
Now, r._2.get(v1) returns an Option of the map's value type. Since that value type is itself Option[List[Int]], the result is an Option[Option[List[Int]]].
Your cases match on Option[List[Int]], but the actual value is an Option[Option[List[Int]]] such as Some(Some(List(17))). The inner Some(List(17)) is not a List[Int], so no case matches and you get the MatchError.
You could use r._2(v1) to extract the value and match on it, but that throws an exception when v1 is not in the map.
Instead, match inside map and provide a default value:
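r._2.get(v1).map {
case None => r._2 + (v1 -> Some(List(x))) // the key is stored with a None value
case Some(xs) => r._2.updated(v1, Some(x :: xs)) // keyed by v1, not x
}.getOrElse(r._2 + (v1 -> Some(List(x)))) // the key is not in the map yet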
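A different way to handle the nested Option, not used in the answer above, is Option.flatten: it collapses "key missing" and "key stored as None" into a single None, so one match with two cases covers everything. A minimal sketch; the object NestedOptionDemo and the helper prepend are hypothetical names.

```scala
object NestedOptionDemo {
  val m: Map[Int, Option[List[Int]]] = Map(1 -> Some(List(17)))

  // get wraps the stored Option[List[Int]] in another Option: Some(Some(List(17))).
  val nested: Option[Option[List[Int]]] = m.get(1)

  // flatten merges "key missing" and "key stored as None" into a single None.
  def prepend(map: Map[Int, Option[List[Int]]], key: Int, x: Int): Map[Int, Option[List[Int]]] =
    map.get(key).flatten match {
      case Some(xs) => map.updated(key, Some(x :: xs))
      case None     => map.updated(key, Some(List(x)))
    }
}
```

For example, prepend(m, 1, 27) yields Map(1 -> Some(List(27, 17))).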

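Alternatively, store List[Int] values directly instead of Option[List[Int]], so that get returns a single Option[List[Int]]. This version also drops the - 1 from the exponent and updates the entry under v1 rather than x:
import java.text.DecimalFormat
def tuplesWithRestrictions1(): (Int, Map[Int, List[Int]]) = {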
val df = new DecimalFormat("#")
df.setMaximumFractionDigits(0)
val result = ((0 until 1000) foldLeft[(Int, Map[Int, List[Int]])] ((0, Map.empty[Int, List[Int]]))) {
(r: (Int, Map[Int, List[Int]]), x: Int) => {
val str = df.format(x).toCharArray
if (str.contains('7')) {
import scala.math._
val v = floor(log10(x))
val v1 = (pow(10, v)).toInt
val m: Map[Int, List[Int]] = (r._2).get(v1) match {
case Some(xs: List[Int]) => r._2 updated(v1, x :: xs)
case None => r._2 + (v1 -> List(x))
}
val f = (r._1 + 1, m)
f
} else r
}
}
result
}
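The same result can also be sketched without foldLeft: filter the numbers that contain a 7, then groupBy their leading power of ten. This is an assumption-laden alternative, not the answer's code: the key is computed from the digit count with integer arithmetic instead of floor(log10(x)), toString is assumed to be an adequate replacement for the DecimalFormat("#") formatting of non-negative integers, and the lists come out in ascending order (the foldLeft version prepends, so its lists are descending). The object SevensByMagnitude is a hypothetical name.

```scala
object SevensByMagnitude {
  // Leading power of ten (1, 10, 100, ...) from the digit count, using integer arithmetic only.
  def magnitude(x: Int): Int = List.fill(x.toString.length - 1)(10).product

  // Count of numbers containing a 7, plus those numbers grouped by magnitude (ascending order).
  def tuplesWithRestrictions(limit: Int = 1000): (Int, Map[Int, List[Int]]) = {
    val withSeven = (0 until limit).filter(_.toString.contains('7')).toList
    (withSeven.size, withSeven.groupBy(magnitude))
  }
}
```

Below 1000 this finds 271 numbers: 1 one-digit, 18 two-digit and 252 three-digit.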

how to display value of case class in scala

import java.lang.reflect.Field
case class Keyword(id: Int = 0, words: String)
val my= Keyword(123, "hello")
val fields: Array[Field] = my.getClass.getDeclaredFields
for (i <- fields.indices) {
println(fields(i).getName +":"+ my.productElement(i))
}
id:123
words:hello
This works.
def outputCaseClass[A](obj:A){
val fields: Array[Field] = obj.getClass.getDeclaredFields
for (i <- fields.indices) {
println(fields(i).getName +":"+ obj.productElement(i))
}
}
outputCaseClass(my)
but this generic version does not compile.

import scala.reflect.runtime.{universe => ru}
def printCaseClassParams[C: scala.reflect.ClassTag](instance: C):Unit = {
val runtimeMirror = ru.runtimeMirror(instance.getClass.getClassLoader)
val instanceMirror = runtimeMirror.reflect(instance)
val tpe = instanceMirror.symbol.toType
tpe.members
.filter(member => member.asTerm.isCaseAccessor && member.asTerm.isMethod)
.map(member => {
val term = member.asTerm
val termName = term.name.toString
val termValue = instanceMirror.reflectField(term).get
termName + ":" + termValue
})
.toList
.reverse
.foreach(s => println(s))
}
// Now you can use it with any case class:
case class Keyword(id: Int = 0, words: String)
val my = Keyword(123, "hello")
printCaseClassParams(my)
// id:123
// words:hello

productElement is a method of the Product base trait, so the compiler only lets you call it when A is known to be a Product.
Try a method signature like this:
def outputCaseClass[A <: Product](obj:A){ .. }
However, this still won't work for inner case classes: getDeclaredFields also reports the $outer field, which productElement does not return, so the loop crashes with an IndexOutOfBoundsException.
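On Scala 2.13 or later, Product also provides productElementNames, so the Java reflection can be dropped entirely. Because the names come from the case class parameters rather than from getDeclaredFields, synthetic fields such as $outer should not appear. A minimal sketch, assuming Scala 2.13+; the object CaseClassPrinter and its caseClassFields helper are hypothetical names.

```scala
// Requires Scala 2.13+ (productElementNames was added to Product in 2.13).
object CaseClassPrinter {
  case class Keyword(id: Int = 0, words: String)

  // Pair each constructor parameter name with its value; no Java reflection involved.
  def caseClassFields(obj: Product): List[String] =
    obj.productElementNames
      .zip(obj.productIterator)
      .map { case (name, value) => s"$name:$value" }
      .toList

  def outputCaseClass(obj: Product): Unit =
    caseClassFields(obj).foreach(println)
}
```

Calling CaseClassPrinter.outputCaseClass(CaseClassPrinter.Keyword(123, "hello")) prints id:123 and words:hello.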

Error processing scala list

def trainBestSeller(events: RDD[BuyEvent], n: Int, itemStringIntMap: BiMap[String, Int]): Map[String, Array[(Int, Int)]] = {
val itemTemp = events
// map item from string to integer index
.flatMap {
case BuyEvent(user, item, category, count) if itemStringIntMap.contains(item) =>
Some((itemStringIntMap(item),category),count)
case _ => None
}
// cache to use for next times
.cache()
// top view with each category:
val bestSeller_Category: Map[String, Array[(Int, Int)]] = itemTemp.reduceByKey(_ + _)
.map(row => (row._1._2, (row._1._1, row._2)))
.groupByKey
.map { case (c, itemCounts) =>
(c, itemCounts.toArray.sortBy(_._2)(Ordering.Int.reverse).take(n))
}
.collectAsMap.toMap
// top view with all category => cateogory ALL
val bestSeller_All: Map[String, Array[(Int, Int)]] = itemTemp.reduceByKey(_ + _)
.map(row => ("ALL", (row._1._1, row._2)))
.groupByKey
.map {
case (c, itemCounts) =>
(c, itemCounts.toArray.sortBy(_._2)(Ordering.Int.reverse).take(n))
}
.collectAsMap.toMap
// merge 2 map bestSeller_All and bestSeller_Category
val bestSeller = bestSeller_Category ++ bestSeller_All
bestSeller
}

List processing
Your list processing looks fine. I did a quick check:
def main( args: Array[String] ) : Unit = {
case class JString(x: Int)
case class CompactBuffer(x: Int, y: Int)
val l = List( JString(2435), JString(3464))
val tuple: (List[JString], CompactBuffer) = ( List( JString(2435), JString(3464)), CompactBuffer(1,4) )
val result: List[(JString, CompactBuffer)] = tuple._1.map((_, tuple._2))
val result2: List[(JString, CompactBuffer)] = {
val l = tuple._1
val cb = tuple._2
l.map( x => (x,cb) )
}
println(result)
println(result2)
}
The result is (as expected):
List((JString(2435),CompactBuffer(1,4)), (JString(3464),CompactBuffer(1,4)))
Further analysis
If that does not solve your problem, more analysis is needed:
Where do the types JString (from org.json4s.JsonAST?) and CompactBuffer (from Spark, I suppose) come from?
What exactly does the code that creates the pair look like? What exactly are you doing? Please provide code excerpts!