Merge two Seqs to create a Map - Scala

I have an object such as:
case class Person(name: String, number: Int)
And two sequences of (name, number) pairs:
Seq(("abc", 1), ("def", 2))
Seq(("abc", 300), ("xyz", 400))
I want to merge the two sequences into a single Map whose keys are the names and whose values are instances of this separate class:
case class CombineObject(firstNumber: Option[Int], secondNumber: Option[Int])
So that my final map would look like:
Map(
  "abc" -> CombineObject(Some(1), Some(300)),
  "def" -> CombineObject(Some(2), None),
  "xyz" -> CombineObject(None, Some(400))
)
All I can think of here is to run two for loops over the sequences to create the map. Is there a better way to solve the problem?

Turn each Seq into its own Map. After that it's pretty easy.
case class Person(name: String, number: Int)

val s1 = Seq(Person("abc", 1), Person("def", 2))
val s2 = Seq(Person("abc", 300), Person("xyz", 400))

val m1 = s1.foldLeft(Map.empty[String, Int]) { case (m, p) => m + (p.name -> p.number) }
val m2 = s2.foldLeft(Map.empty[String, Int]) { case (m, p) => m + (p.name -> p.number) }

case class CombineObject(firstNumber: Option[Int], secondNumber: Option[Int])

val res = (m1.keySet ++ m2.keySet).foldLeft(Map.empty[String, CombineObject]) {
  case (m, k) => m + (k -> CombineObject(m1.get(k), m2.get(k)))
}
// res: Map[String,CombineObject] = Map(abc -> CombineObject(Some(1),Some(300)),
//                                      def -> CombineObject(Some(2),None),
//                                      xyz -> CombineObject(None,Some(400)))
This assumes that each Seq has no duplicate name entries. It's not obvious how that situation should be handled.
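If duplicates can occur, one illustrative option is to collect every number per name instead. This is only a sketch of one possible policy; CombineAll is a hypothetical variant of CombineObject, not something from the question:
// Sketch: keep all numbers per name rather than assuming uniqueness
// (CombineAll is a hypothetical variant of CombineObject)
case class CombineAll(firstNumbers: Seq[Int], secondNumbers: Seq[Int])
val g1 = s1.groupBy(_.name).mapValues(_.map(_.number))
val g2 = s2.groupBy(_.name).mapValues(_.map(_.number))
val all = (g1.keySet ++ g2.keySet).map { k =>
  k -> CombineAll(g1.getOrElse(k, Nil), g2.getOrElse(k, Nil))
}.toMap
// all: Map(abc -> CombineAll(List(1),List(300)), def -> CombineAll(List(2),List()), xyz -> CombineAll(List(),List(400)))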

Another proposal with a recursive function. First it sorts both lists by name, then processes them in lockstep.
import scala.annotation.tailrec

case class Person(name: String, number: Int)
case class CombineObject(firstNumber: Option[Int], secondNumber: Option[Int])

val left = List(Person("abc", 1), Person("def", 2))
val right = List(Person("abc", 300), Person("xyz", 400))

def merge(left: List[Person], right: List[Person]): Map[String, CombineObject] = {
  @tailrec
  def doMerge(left: List[Person], right: List[Person], acc: Map[String, CombineObject] = Map.empty): Map[String, CombineObject] = {
    (left, right) match {
      case (Person(name1, number1) :: xs, Person(name2, number2) :: ys) =>
        if (name1 == name2) {
          doMerge(xs, ys, acc + (name1 -> CombineObject(Some(number1), Some(number2))))
        } else if (name1 < name2) {
          // name1 has no partner on the right: emit it and advance the left list only
          doMerge(xs, right, acc + (name1 -> CombineObject(Some(number1), None)))
        } else {
          doMerge(left, ys, acc + (name2 -> CombineObject(None, Some(number2))))
        }
      // leftovers once one of the lists is exhausted
      case (Nil, Person(name2, number2) :: ys) =>
        doMerge(Nil, ys, acc + (name2 -> CombineObject(None, Some(number2))))
      case (Person(name1, number1) :: xs, Nil) =>
        doMerge(xs, Nil, acc + (name1 -> CombineObject(Some(number1), None)))
      case _ => acc
    }
  }
  doMerge(left.sortBy(_.name), right.sortBy(_.name))
}

merge(left, right)
// Map(xyz -> CombineObject(None,Some(400)), def -> CombineObject(Some(2),None), abc -> CombineObject(Some(1),Some(300)))
Looks kind of scary :)
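A quick sanity check of the leftover branches, using hypothetical extra data:
merge(List(Person("abc", 1)), List(Person("abc", 2), Person("zzz", 9)))
// Map(abc -> CombineObject(Some(1),Some(2)), zzz -> CombineObject(None,Some(9)))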

Another potential variation:
case class Person(name : String, number : Int)
case class CombineObject(firstNumber : Option[Int], secondNumber : Option[Int])
val s1 = Seq(Person("abc",1),Person("def",2))
val s2 = Seq(Person("abc",300),Person("xyz",400))
(s1.map(_ -> 1) ++ s2.map(_ -> 2))
  .groupBy { case (person, seqTag) => person.name }
  .mapValues {
    case List((Person(_, number1), _), (Person(_, number2), _)) =>
      CombineObject(Some(number1), Some(number2))
    case List((Person(_, number), seqTag)) =>
      if (seqTag == 1) CombineObject(Some(number), None) else CombineObject(None, Some(number))
    case Nil => CombineObject(None, None)
  }
which outputs
res1: Map[String,CombineObject] = Map(abc -> CombineObject(Some(1),Some(300)), xyz -> CombineObject(None,Some(400)), def -> CombineObject(Some(2),None))

Yet another solution, probably arguable :) ...
import scala.collection.immutable.TreeMap
case class CombineObject(firstNumber : Option[Int], secondNumber : Option[Int])
case class Person(name : String,number : Int)
val seq1 = Seq(Person("abc",1),Person("def",2))
val seq2 = Seq(Person("abc",300),Person("xyz",400))
def toExhaustiveMap(seq1: Seq[Person], seq2: Seq[Person]) = TreeMap(
  seq1.map { case Person(s, i) => s -> Some(i) }: _*
) ++ ((seq2.map(_.name) diff seq1.map(_.name)).map(_ -> None))

val result = (toExhaustiveMap(seq1, seq2) zip toExhaustiveMap(seq2, seq1)).map {
  case ((name1, number1), (_, number2)) => name1 -> CombineObject(number1, number2)
}
println(result)
Map(abc -> CombineObject(Some(1),Some(300)), def -> CombineObject(Some(2),None), xyz -> CombineObject(None,Some(400)))
Hope it helps.

Yet another alternative, if performance isn't a priority:
// val seq1 = Seq(("abc", 1), ("def", 2))
// val seq2 = Seq(("abc", 300), ("xyz", 400))
(seq1 ++ seq2)
  .toMap
  .keys
  .map(k => k -> CombineObject(
    seq1.collectFirst { case (`k`, v) => v },
    seq2.collectFirst { case (`k`, v) => v }
  ))
  .toMap
// Map(
// "abc" -> CombineObject(Some(1), Some(300)),
// "def" -> CombineObject(Some(2), None),
// "xyz" -> CombineObject(None, Some(400))
// )
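For larger inputs, here is a sketch of the same result with constant-time lookups (assuming, as above, no duplicate names within either Seq):
// Sketch: build one Map per Seq first, then probe each key once
val m1 = seq1.toMap
val m2 = seq2.toMap
(m1.keySet ++ m2.keySet)
  .map(k => k -> CombineObject(m1.get(k), m2.get(k)))
  .toMap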

Related

How to keep attribute after reduce?

Using the code below I'm attempting to output the name and the sum of the ages for each person:
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object CalculateMeanInStream extends App {
  implicit val actorSystem = ActorSystem()

  case class Person(name: String, age: Double)

  val personSource = Source(List(Person("1", 30), Person("1", 20), Person("1", 20), Person("1", 30), Person("2", 2)))
  val meanPrintSink = Sink.foreach[Double](println)
  val printSink = Sink.foreach[Double](println)

  def calculateMean(values: List[Double]): Double =
    values.sum / values.size

  personSource.groupBy(maxSubstreams = 2, s => s.name)
    .map(m => m.age)
    .reduce(_ + _)
    .mergeSubstreams
    .runForeach(println)
}
The output is:
2.0
100.0
Is there a way to keep the person's name as part of the reduce, so that the following is produced in the output:
(2.0, 2)
(100.0, 1)
I've tried:
personSource.groupBy(maxSubstreams = 2, s => s.name)
  .reduce((x, y) => x.age + y.age)
  .mergeSubstreams
  .runForeach(println)
but it throws a compiler error:
type mismatch;
 found   : Double
 required: CalculateMeanInStream.Person
  .reduce((x , y) => x.age + y.age)
personSource
  .groupBy(maxSubstreams = 2, s => s.name)
  .reduce((person1, person2) => Person(person1.name, person1.age + person2.age))
  .mergeSubstreams
  .runForeach(println)
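Note that this emits the summed Person values (e.g. Person(2,2.0)); if you want the (sum, name) tuples from the question, a map stage afterwards reshapes them (a small sketch):
personSource
  .groupBy(maxSubstreams = 2, s => s.name)
  .reduce((p1, p2) => Person(p1.name, p1.age + p2.age))
  .map(p => (p.age, p.name)) // reshape to the requested (sum, name) tuples
  .mergeSubstreams
  .runForeach(println)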
There might be a more elegant way, but I'd do it like this:
personSource
  .groupBy(maxSubstreams = 2, s => s.name)
  .map(x => x.name -> x.age)
  .reduce { case ((name, sum), (_, age)) => (name, sum + age) }
  .mergeSubstreams
  .runForeach(println)
You can use fold(), which lets you skip the map. Instead of the map and reduce lines, just write as follows:
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}

object CalculateMeanInStream extends App {
  implicit val actorSystem: ActorSystem = ActorSystem()

  case class Person(name: String, age: Double)

  val personSource = Source(List(Person("1", 30), Person("1", 20), Person("1", 20), Person("1", 30), Person("2", 2)))
  val meanPrintSink = Sink.foreach[Double](println)
  val printSink = Sink.foreach[Double](println)

  def calculateMean(values: List[Double]): Double =
    values.sum / values.size

  personSource.groupBy(maxSubstreams = 2, s => s.name)
    .fold((0d, "")) { case ((sum, _), x) => (sum + x.age, x.name) }
    .mergeSubstreams
    .runForeach(println)
    .onComplete(_ => actorSystem.terminate())(actorSystem.dispatcher)
}
Your output will be:
(2.0, 2)
(100.0, 1)
as required.

How do I remove an element from a list by value?

I am currently working on a function that takes a Map[String, List[String]] and a String as arguments. The map contains user IDs and the IDs of the films each user liked. I need to return a List[List[String]] containing the other movies that were liked by the users who liked the movie passed into the function.
The function declaration looks as follows:
def movies(m: Map[String, List[String]], mov: String) : List[List[String]]= {
}
So let's imagine the following:
val m1: Map[Int, List[String]] = Map(1 -> List("b", "a"), 2 -> List("y", "x"), 3 -> List("c", "a"))
val movieID = "a"
movies(m1, movieID)
This should return:
List(List("b"), List("c"))
I have tried using
m1.filter(x => x._2.contains(movieID))
so that only the lists containing movieID are kept in the map. My problem is that I need to remove movieID from every list it occurs in, and then return the result as a List[List[String]].
You could use collect:
val m = Map("1" -> List("b", "a"), "2" -> List("y", "x"), "3" -> List("c", "a"))
def movies(m: Map[String, List[String]], mov: String) = m.collect {
  case (_, l) if l.contains(mov) => l.filterNot(_ == mov)
}
movies(m, "a") //List(List(b), List(c))
The problem with this approach is that it iterates over every movie list twice, first with contains and then with filterNot. We could optimize it with a tail-recursive function that looks for the element and, if it is found, returns the list without it:
import scala.annotation.tailrec

def movies(m: Map[String, List[String]], mov: String) = {
  @tailrec
  def withoutElement[T](l: List[T], mov: T, acc: List[T] = Nil): Option[List[T]] = {
    l match {
      case x :: xs if x == mov => Some(acc.reverse ++ xs)
      case x :: xs             => withoutElement(xs, mov, x :: acc)
      case Nil                 => None
    }
  }
  m.values.flatMap(withoutElement(_, mov))
}
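For completeness, a quick check against the sample map (the result is an Iterable, so add .toList if you need the declared List[List[String]]):
val m = Map("1" -> List("b", "a"), "2" -> List("y", "x"), "3" -> List("c", "a"))
movies(m, "a").toList // List(List(b), List(c))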
The solution from Krzysztof is a good one. Here's an alternate way that traverses every list just once:
def movies(m: Map[String, List[String]], mov: String) =
  m.values.toList.flatMap { ss =>
    val (found, res) = ss.foldLeft((false, List.empty[String])) {
      case ((_, res), `mov`)  => (true, res)
      case ((keep, res), str) => (keep, str :: res)
    }
    if (found) Some(res) else None // note: the surviving elements come out reversed, since foldLeft prepends
  }
This should work for you:
object DemoAbc extends App {
  val m1 = Map(1 -> List("b", "a"), 2 -> List("y", "x"), 3 -> List("c", "a"))
  val movieID = "a"

  def movies(m: Map[Int, List[String]], mov: String): List[List[String]] =
    m.foldLeft(List.empty[List[String]]) { (acc, entry) =>
      if (entry._2.contains(mov)) entry._2.filter(_ != mov) :: acc
      else acc
    }

  print(movies(m1, movieID)) // List(List(c), List(b))
}

How to convert digital information units (e.g. KB, MB, B) in Scala?

I am trying to unify digital information units in Scala. How can I use combineByKey() to operate on the values? I started from this averaging example:
val avg = scores.combineByKey(
  (v) => (v, 1),
  (acc: (Float, Int), v) => (acc._1 + v, acc._2 + 1),
  (acc1: (Float, Int), acc2: (Float, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
)
I tried to adapt this function, but I don't know how to separate the letters from the numbers.
e.g.
input:
(text1,3B)
(text1,45KB)
(text2,88MB)
(text2,98KB)
(text3,25B)
output:
(text1,List(3B,46080B))
(text2,List(92274688B,100352B))
(text3,List(25B))
Not sure if I understood your question, but from your example it looks like you are looking for something like this:
val input = Seq(
  ("text1", "3B"),
  ("text1", "45KB"),
  ("text2", "88MB"),
  ("text2", "98KB"),
  ("text3", "25B")
)

val grouped = input.groupBy(_._1)
// grouped: Map[String, Seq[(String, String)]] = Map(text3 -> List((text3,25B)), text2 -> List((text2,88MB), (text2,98KB)), text1 -> List((text1,3B), (text1,45KB)))

val sizesByText = grouped.mapValues(_.map(_._2))
// sizesByText: Map[String, Seq[String]] = Map(text3 -> List(25B), text2 -> List(88MB, 98KB), text1 -> List(3B, 45KB))

val PATTERN = """(\d+)(B|KB|MB)""".r

def byteSize(size: String) = {
  val bytes = size match {
    case PATTERN(amount, "B")  => amount.toInt
    case PATTERN(amount, "KB") => amount.toInt * 1024
    case PATTERN(amount, "MB") => amount.toInt * 1024 * 1024
    case _ => throw new IllegalArgumentException(s"invalid size: $size")
  }
  bytes + "B"
}

val output = sizesByText.mapValues(_.map(byteSize)).toSeq
// output: Seq[(String, Seq[String])] = Vector((text3,List(25B)), (text2,List(92274688B, 100352B)), (text1,List(3B, 46080B)))
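Since the question asks about combineByKey() specifically, here is a sketch of the same idea on a Spark pair RDD; it assumes an existing SparkContext named sc and reuses byteSize from above:
// Sketch: assumes a SparkContext `sc`; byteSize is the function defined above
val rdd = sc.parallelize(input)
val outputRdd = rdd
  .mapValues(byteSize) // normalize each size to bytes first
  .combineByKey(
    (v: String) => List(v),                      // createCombiner: start a list
    (acc: List[String], v: String) => acc :+ v,  // mergeValue: append within a partition
    (a: List[String], b: List[String]) => a ++ b // mergeCombiners: merge across partitions
  )
// outputRdd: RDD[(String, List[String])], e.g. (text1,List(3B, 46080B))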

How to group objects using a classifier function in FS2?

I have a stream of unordered measurements that I'd like to group into batches of a fixed size, so that I can persist them efficiently later:
val measurements = for {
  id    <- Seq("foo", "bar", "baz")
  value <- 1 to 5
} yield (id, value)

fs2.Stream.emits(scala.util.Random.shuffle(measurements)).toVector
That is, instead of:
(bar,4)
(foo,5)
(baz,3)
(baz,5)
(baz,4)
(foo,2)
(bar,2)
(foo,4)
(baz,1)
(foo,1)
(foo,3)
(bar,1)
(bar,5)
(bar,3)
(baz,2)
I'd like to have the following structure for a batch size equal to 3:
(bar,[4,2,1])
(foo,[5,2,4])
(baz,[3,5,4])
(baz,[1,2])
(foo,[1,3])
(bar,[5,3])
Is there a simple, idiomatic way to achieve this in FS2? I know there's a groupAdjacentBy function, but this will take into account neighbouring items only.
I'm on 0.10.5 at the moment.
This can be achieved with fs2 Pull:
import cats.data.{NonEmptyList => Nel}
import fs2._

object GroupingByKey {
  def groupByKey[F[_], K, V](limit: Int): Pipe[F, (K, V), (K, Nel[V])] = {
    require(limit >= 1)

    def go(state: Map[K, List[V]]): Stream[F, (K, V)] => Pull[F, (K, Nel[V]), Unit] = _.pull.uncons1.flatMap {
      case Some(((key, num), tail)) =>
        val prev = state.getOrElse(key, Nil)
        if (prev.size == limit - 1) {
          val group = Nel.ofInitLast(prev.reverse, num)
          Pull.output1(key -> group) >> go(state - key)(tail)
        } else {
          go(state.updated(key, num :: prev))(tail)
        }
      case None =>
        // flush the partial groups that remain when the stream ends
        val chunk = Chunk.vector {
          state.toVector.collect { case (key, last :: revInit) =>
            val group = Nel.ofInitLast(revInit.reverse, last)
            key -> group
          }
        }
        Pull.output(chunk) >> Pull.done
    }

    go(Map.empty)(_).stream
  }
}
Usage:
import cats.data.{NonEmptyList => Nel}
import cats.implicits._
import cats.effect.{ExitCode, IO, IOApp}
import fs2._
import GroupingByKey.groupByKey

object Answer extends IOApp {
  type Key = String

  override def run(args: List[String]): IO[ExitCode] = {
    require {
      Stream('a -> 1).through(groupByKey(2)).compile.toList ==
        List('a -> Nel.one(1))
    }
    require {
      Stream('a -> 1, 'a -> 2).through(groupByKey(2)).compile.toList ==
        List('a -> Nel.of(1, 2))
    }
    require {
      Stream('a -> 1, 'a -> 2, 'a -> 3).through(groupByKey(2)).compile.toList ==
        List('a -> Nel.of(1, 2), 'a -> Nel.one(3))
    }

    val infinite = (for {
      prng <- Stream.eval(IO { new scala.util.Random() })
      keys <- Stream(Vector[Key]("a", "b", "c", "d", "e", "f", "g"))
      key = Stream.eval(IO {
        val i = prng.nextInt(keys.size)
        keys(i)
      })
      num = Stream.eval(IO { 1 + prng.nextInt(9) })
    } yield (key zip num).repeat).flatten

    infinite
      .through(groupByKey(3))
      .showLinesStdOut
      .compile
      .drain
      .as(ExitCode.Success)
  }
}

Error processing Scala list

def trainBestSeller(events: RDD[BuyEvent], n: Int, itemStringIntMap: BiMap[String, Int]): Map[String, Array[(Int, Int)]] = {
  val itemTemp = events
    // map item from string to integer index
    .flatMap {
      case BuyEvent(user, item, category, count) if itemStringIntMap.contains(item) =>
        Some(((itemStringIntMap(item), category), count))
      case _ => None
    }
    // cache to use for next times
    .cache()

  // top view within each category:
  val bestSeller_Category: Map[String, Array[(Int, Int)]] = itemTemp.reduceByKey(_ + _)
    .map(row => (row._1._2, (row._1._1, row._2)))
    .groupByKey
    .map { case (c, itemCounts) =>
      (c, itemCounts.toArray.sortBy(_._2)(Ordering.Int.reverse).take(n))
    }
    .collectAsMap.toMap

  // top view across all categories => category ALL
  val bestSeller_All: Map[String, Array[(Int, Int)]] = itemTemp.reduceByKey(_ + _)
    .map(row => ("ALL", (row._1._1, row._2)))
    .groupByKey
    .map { case (c, itemCounts) =>
      (c, itemCounts.toArray.sortBy(_._2)(Ordering.Int.reverse).take(n))
    }
    .collectAsMap.toMap

  // merge the two maps bestSeller_Category and bestSeller_All
  val bestSeller = bestSeller_Category ++ bestSeller_All
  bestSeller
}
List processing
Your list processing seems okay. I did a small recheck:
def main(args: Array[String]): Unit = {
  case class JString(x: Int)
  case class CompactBuffer(x: Int, y: Int)

  val l = List(JString(2435), JString(3464))

  val tuple: (List[JString], CompactBuffer) = (List(JString(2435), JString(3464)), CompactBuffer(1, 4))

  val result: List[(JString, CompactBuffer)] = tuple._1.map((_, tuple._2))

  val result2: List[(JString, CompactBuffer)] = {
    val l = tuple._1
    val cb = tuple._2
    l.map(x => (x, cb))
  }

  println(result)
  println(result2)
}
The result is (as expected):
List((JString(2435),CompactBuffer(1,4)), (JString(3464),CompactBuffer(1,4)))
Further analysis
More analysis is required if that does not solve your problem:
Where do the types JString (from org.json4s.JsonAST?) and CompactBuffer (Spark, I suppose) come from?
What exactly does the code that creates the pair look like? What exactly are you doing? Please provide code excerpts!