Error processing scala list - scala

def trainBestSeller(events: RDD[BuyEvent], n: Int, itemStringIntMap: BiMap[String, Int]): Map[String, Array[(Int, Int)]] = {
val itemTemp = events
// map item from string to integer index
.flatMap {
case BuyEvent(user, item, category, count) if itemStringIntMap.contains(item) =>
Some(((itemStringIntMap(item), category), count))
case _ => None
}
// cache to use for next times
.cache()
// top view with each category:
val bestSeller_Category: Map[String, Array[(Int, Int)]] = itemTemp.reduceByKey(_ + _)
.map(row => (row._1._2, (row._1._1, row._2)))
.groupByKey
.map { case (c, itemCounts) =>
(c, itemCounts.toArray.sortBy(_._2)(Ordering.Int.reverse).take(n))
}
.collectAsMap.toMap
// top view across all categories => category ALL
val bestSeller_All: Map[String, Array[(Int, Int)]] = itemTemp.reduceByKey(_ + _)
.map(row => ("ALL", (row._1._1, row._2)))
.groupByKey
.map {
case (c, itemCounts) =>
(c, itemCounts.toArray.sortBy(_._2)(Ordering.Int.reverse).take(n))
}
.collectAsMap.toMap
// merge 2 map bestSeller_All and bestSeller_Category
val bestSeller = bestSeller_Category ++ bestSeller_All
bestSeller
}

List processing
Your list processing seems okay. I did a small recheck:
def main( args: Array[String] ) : Unit = {
case class JString(x: Int)
case class CompactBuffer(x: Int, y: Int)
val l = List( JString(2435), JString(3464))
val tuple: (List[JString], CompactBuffer) = ( List( JString(2435), JString(3464)), CompactBuffer(1,4) )
val result: List[(JString, CompactBuffer)] = tuple._1.map((_, tuple._2))
val result2: List[(JString, CompactBuffer)] = {
val l = tuple._1
val cb = tuple._2
l.map( x => (x,cb) )
}
println(result)
println(result2)
}
Result is (as expected)
List((JString(2435),CompactBuffer(1,4)), (JString(3464),CompactBuffer(1,4)))
Further analysis
If that does not solve your problem, further analysis is required:
Where do the types JString (from org.json4s.JsonAST?) and CompactBuffer (Spark, I suppose) come from?
What exactly does the code that creates the pair look like? What exactly are you doing? Please provide code excerpts!

Related

how to get the index of the duplicate pair in a list using scala

I have a Scala list below :
val numList = List(1,2,3,4,5,1,2)
I want to get index of the same element pair in the list. The output should look like (0,5),(1,6)
How can I achieve using map?
def catchDuplicates(num : List[Int]) : (Int , Int) = {
val count = 0;
val emptyMap: HashMap[Int, Int] = HashMap.empty[Int, Int]
for (i <- num)
if (emptyMap.contains(i)) {
emptyMap.put(i, (emptyMap.get(i)) + 1) }
else {
emptyMap.put(i, 1)
}
}
Let's make the challenge a little more interesting.
val numList = List(1,2,3,4,5,1,2,1)
Now the result should be something like (0, 5, 7),(1, 6), which makes it pretty clear that returning one or more tuples is not going to be feasible. Returning a List of List[Int] would make much more sense.
def catchDuplicates(nums: List[Int]): List[List[Int]] =
nums.zipWithIndex //List[(Int,Int)]
.groupMap(_._1)(_._2) //Map[Int,List[Int]]
.values //Iterable[List[Int]]
.filter(_.lengthIs > 1)
.toList //List[List[Int]]
You might also add a .view in order to minimize the number of traversals and intermediate collections created.
def catchDuplicates(nums: List[Int]): List[List[Int]] =
nums.view
.zipWithIndex
.groupMap(_._1)(_._2)
.collect{case (_,vs) if vs.sizeIs > 1 => vs.toList}
.toList
How can I achieve using map?
You can't.
That is because you only want to return the indexes of the elements that appear twice, which is a very different kind of transformation from the one that map performs.
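To illustrate the point, map always produces exactly one output element per input element, so on its own it cannot drop the values that have no duplicate. A minimal sketch (the object and value names here are made up for illustration):

```scala
object MapKeepsSize {
  val numList = List(1, 2, 3, 4, 5, 1, 2)

  // map transforms each (value, index) pair but never drops or merges elements
  val mapped: List[String] =
    numList.zipWithIndex.map { case (n, i) => s"$n@$i" }

  // the result always has as many elements as the input
  val sameSize: Boolean = mapped.size == numList.size
}
```

Filtering or grouping has to come from another operation, such as groupMap, collect or a fold.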
You can use foldLeft, though.
object catchDuplicates {
final case class Result[A](elem: A, firstIdx: Int, secondIdx: Int)
private final case class State[A](seenElements: Map[A, Int], duplicates: List[Result[A]]) {
def next(elem: A, idx: Int): State[A] =
seenElements.get(key = elem).fold(
ifEmpty = this.copy(seenElements = this.seenElements + (elem -> idx))
) { firstIdx =>
State(
seenElements = this.seenElements.removed(key = elem),
duplicates = Result(elem, firstIdx, secondIdx = idx) :: this.duplicates
)
}
}
private object State {
def initial[A]: State[A] =
State(
seenElements = Map.empty,
duplicates = List.empty
)
}
def apply[A](data: List[A]): List[Result[A]] =
data.iterator.zipWithIndex.foldLeft(State.initial[A]) {
case (acc, (elem, idx)) =>
acc.next(elem, idx)
}.duplicates // You may add a reverse here if order is important.
}
Which can be used like this:
val numList = List(1,2,3,4,5,1,2)
val result = catchDuplicates(numList)
// result: List[Result[Int]] = List(Result(2,1,6), Result(1,0,5))
I think returning a tuple is not a good option; instead you could return a Map, like this:
import scala.annotation.tailrec
object FindIndexOfDupElement extends App {
val numList = List(1, 2, 3, 4, 5, 1, 2)
@tailrec
def findIndex(elems: List[Int], res: Map[Int, List[Int]] = Map.empty, index: Int = 0): Map[Int, List[Int]] = {
elems match {
case head :: rest =>
if (res.get(head).isEmpty) {
findIndex(rest, res ++ Map(head -> (index :: Nil)), index + 1)
} else {
val updatedMap: Map[Int, List[Int]] = res.map {
case (key, indexes) if key == head => (key, (indexes :+ index))
case (key, indexes) => (key, indexes)
}
findIndex(rest, updatedMap, index + 1)
}
case _ => res
}
}
println(findIndex(numList).filter(x => x._2.size > 1))
}
You can clearly see each number (key) and its respective indexes in the map:
HashMap(1 -> List(0, 5), 2 -> List(1, 6))

Improve Two sum problem using Map in scala

I am trying to solve Two sum problem using scala
val list = List(1,2,3,4,5)
val map = collection.mutable.Map.empty[Int, Int]
val sum = 9
for {
i <- 0 until list.size
} yield {
map.get(sum - list(i)) match {
case None => map += (list(i) -> i)
case Some(previousIndex) => println(s" Indexes $previousIndex $i")
}
}
Can anyone suggest an O(n) solution without using mutable map using scala
If you are trying to solve the "Two Sum" problem, meaning you need to find two numbers in a given list whose sum equals a given value, you can go with:
val list = List(1,2,3,4,5)
val sum = 9
val set = list.toSet
val solution = list.flatMap { item =>
val rest = sum - item
val min = Math.min(item, rest)
val max = Math.max(item, rest)
if (set(rest)) Some((min, max)) else None
}.toSet
println(solution)
Print result:
Set((4,5))
ScalaFiddle: https://scalafiddle.io/sf/LA6P3eh/0
UPDATE
The result is required to be indices rather than values:
val list = List(1,2,3,4,5)
val sum = 9
val inputMap = list.zipWithIndex.toMap
val solution = list.zipWithIndex.flatMap { case (item, itemIndex) =>
inputMap.get(sum - item).map { restIndex =>
val minIndex = Math.min(itemIndex, restIndex)
val maxIndex = Math.max(itemIndex, restIndex)
minIndex -> maxIndex
}
}.toSet
println(solution)
Printout: Set((3,4))
ScalaFiddle: https://scalafiddle.io/sf/LA6P3eh/1
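One caveat with the index version above: an element can be paired with itself (with sum = 10 the same input yields Set((4,4)), because 5 + 5 = 10), and toMap keeps only the last index of a repeated value. A possible sketch that avoids both, assuming pairs should be reported as (i, j) with i < j; the names TwoSumIndices and pairs are hypothetical:

```scala
object TwoSumIndices {
  // Returns all index pairs (i, j) with i < j and xs(i) + xs(j) == sum.
  def pairs(xs: List[Int], sum: Int): Set[(Int, Int)] = {
    // value -> every index at which it occurs (duplicates are kept)
    val indicesByValue: Map[Int, List[Int]] =
      xs.zipWithIndex.groupMap(_._1)(_._2)

    xs.zipWithIndex.flatMap { case (item, i) =>
      indicesByValue
        .getOrElse(sum - item, Nil)
        // i < j rules out pairing an element with itself and reports each pair once
        .collect { case j if i < j => (i, j) }
    }.toSet
  }
}
```

With many repeated values the work grows with the number of matching pairs, so this is only linear when duplicates are rare.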
You can try something as follows for the first result:
object Solution extends App {
def twoSums(xs: List[Int], target: Int): Option[(Int,Int)] = {
@annotation.tailrec def go(zipped: List[(Int,Int)], map: Map[Int,Int] = Map.empty): Option[(Int,Int)] = {
zipped match {
case Nil => None
case (ele, idx) :: tail =>
map.get(target - ele) match {
case Some(prevIdx) => Some((prevIdx, idx))
case None => go(tail, map + (ele -> idx))
}
}
}
go(xs.zipWithIndex)
}
val res = twoSums(List(1,2,3,4,5), 9)
println(res)
}
Or via foldLeft for all results:
object Solution extends App {
def twoSums(xs: List[Int], target: Int): List[(Int, Int)] = {
xs.zipWithIndex.foldLeft((Map.empty[Int,Int], List.empty[(Int,Int)])) {
case ((map, results), (ele, idx)) =>
map.get(target - ele) match {
case Some(prevIdx) =>(map, (prevIdx, idx) :: results)
case None => (map + (ele -> idx), results)
}
}
}._2
val res = twoSums(List(1,2,3,4,5), 9)
println(res)
}

How do I remove an element from a list by value?

I am currently working on a function that takes in a Map[String, List[String]] and a String as arguments. The map contains a user ID and the IDs of films that they liked. What I need to do is return a List[List[String]] which contains the other movies that were liked by the users who liked the movie that was passed into the function.
The function declaration looks as follows:
def movies(m: Map[String, List[String]], mov: String) : List[List[String]]= {
}
So let's imagine the following:
val m1: Map[Int, List[String]] = Map(1 -> List("b", "a"), 2 -> List("y", "x"), 3 -> List("c", "a"))
val movieID = "a"
movies(m1, movieID)
This should return:
List(List("b"), List("c"))
I have tried using
m1.filter(x => x._2.contains(movieID))
So that only Lists containing movieID are kept in the map, but my problem is that I need to remove movieID from every list it occurs in, and then return the result as a List[List[String]].
You could use collect:
val m = Map("1" -> List("b", "a"), "2" -> List("y", "x"), "3" -> List("c", "a"))
def movies(m: Map[String, List[String]], mov: String) = m.collect {
case (_, l) if l.contains(mov) => l.filterNot(_ == mov)
}
movies(m, "a") //List(List(b), List(c))
The problem with this approach is that it iterates over every movie list twice: the first time with contains and the second time with filterNot. We could optimize it with a tail-recursive function that looks for the element and, if found, returns the list without it:
import scala.annotation.tailrec
def movies(m: Map[String, List[String]], mov: String) = {
@tailrec
def withoutElement[T](l: List[T], mov: T, acc: List[T] = Nil): Option[List[T]] = {
l match {
case x :: xs if x == mov => Some(acc.reverse ++ xs)
case x :: xs => withoutElement(xs, mov, x :: acc)
case Nil => None
}
}
m.values.flatMap(withoutElement(_, mov))
}
The solution from Krzysztof is a good one. Here's an alternate way to traverse every List just once.
def movies(m: Map[String, List[String]], mov: String) =
m.values.toList.flatMap{ss =>
val tpl = ss.foldLeft((false, List.empty[String])){
case ((_,res), `mov`) => (true, res)
case ((keep,res), str) => (keep, str::res)
}
if (tpl._1) Some(tpl._2) else None
}
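Note that the foldLeft above prepends to its accumulator, so a list with more than one remaining film comes back in reverse order. If the original order matters, one possible sketch uses span instead; the helper name without is hypothetical, and unlike the fold it removes only the first occurrence of the film:

```scala
object RemoveFirstMatch {
  // Some(list without the first occurrence of elem), or None if elem is absent.
  def without[T](l: List[T], elem: T): Option[List[T]] =
    l.span(_ != elem) match {
      case (before, _ :: after) => Some(before ::: after) // found: drop the match, keep the rest in order
      case _                    => None                   // not found: this user did not like the film
    }

  def movies(m: Map[String, List[String]], mov: String): List[List[String]] =
    m.values.toList.flatMap(without(_, mov))
}
```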
This should work for you:
object DemoAbc extends App {
val m1 = Map(1 -> List("b", "a"), 2 -> List("y", "x"), 3 -> List("c", "a"))
val movieID = "a"
def movies(m: Map[Int, List[String]], mov: String): List[List[String]] = {
val ans = m.foldLeft(List.empty[List[String]])((a: List[List[String]], b: (Int, List[String])) => {
if (b._2.contains(mov))
b._2.filter(_ != mov) :: a
else a
})
ans
}
print(movies(m1, movieID))
}

groupby scala list of string

I am facing a problem to calculate the sum of elements in Scala having the same title (my key in this case).
Currently my input can be described as:
val listInput1 =
List(
"itemA,CATA,2,4 ",
"itemA,CATA,3,1 ",
"itemB,CATB,4,5",
"itemB,CATB,4,6"
)
val listInput2 =
List(
"itemA,CATA,2,4 ",
"itemB,CATB,4,5",
"itemC,CATC,1,2"
)
The required output for lists in input should be
val listoutput1 =
List(
"itemA,CATA,5,5 ",
"itemB,CATB,8,11"
)
val listoutput2 =
List(
"itemA , CATA, 2,4 ",
"itemB,CATB,4,5",
"itemC,CATC,1,2"
)
I wrote the following function:
def sumByTitle(listInput: List[String]): List[String] =
listInput.map(_.split(",")).groupBy(_(0)).map {
case (title, features) =>
"%s,%s,%d,%d".format(
title,
features.head.apply(1),
features.map(_(2).toInt).sum,
features.map(_(3).toInt).sum)}.toList
It doesn't give me the expected result as it changes the order of lines.
How can I fix that?
The ListMap is designed to preserve the order of items inserted to the Map.
import collection.immutable.ListMap
def sumByTitle(listInput: List[String]): List[String] = {
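val itemPttrn = raw"(.*,\s*)(\d+),(\d+)\s*".r // ending the first group at a comma stops it from swallowing leading digits of multi-digit numbers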
listInput.foldLeft(ListMap.empty[String, (Int,Int)].withDefaultValue((0,0))) {
case (lm, str) =>
val itemPttrn(k, a, b) = str //unsafe
val (x, y) = lm(k)
lm.updated(k, (a.toInt + x, b.toInt + y))
}.toList.map { case (k, (a, b)) => s"$k$a,$b" }
}
This is a bit unsafe in that it will throw if the input string doesn't match the regex pattern.
sumByTitle(listInput1)
//res0: List[String] = List(itemA,CATA,5,5, itemB,CATB,8,11)
sumByTitle(listInput2)
//res1: List[String] = List(itemA,CATA,2,4, itemB,CATB,4,5, itemC,CATC,1,2)
You'll note that the trailing space, if there is one, is not preserved.
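If throwing on malformed input is not acceptable, the regex can be used as a pattern inside the fold so that non-matching lines are handled explicitly. A rough sketch, assuming such lines should simply be skipped (that policy, the object name and the slightly more whitespace-tolerant pattern are my assumptions, not part of the answer above):

```scala
import scala.collection.immutable.ListMap

object SafeSumByTitle {
  // prefix ending in a comma, then the two numeric columns
  private val ItemPattern = raw"(.*,)\s*(\d+)\s*,\s*(\d+)\s*".r

  def sumByTitle(listInput: List[String]): List[String] =
    listInput
      .foldLeft(ListMap.empty[String, (Int, Int)]) {
        case (lm, ItemPattern(k, a, b)) =>
          val (x, y) = lm.getOrElse(k, (0, 0))
          lm.updated(k, (x + a.toInt, y + b.toInt))
        case (lm, _) =>
          lm // assumption: silently skip lines that do not match
      }
      .toList
      .map { case (k, (a, b)) => s"$k$a,$b" }
}
```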
If you are just interested in sorting you can simply return the sorted list:
val listInput1 =
List(
"itemA , CATA, 2,4 ",
"itemA , CATA, 3,1 ",
"itemB,CATB,4,5",
"itemB,CATB,4,6"
)
val listInput2 =
List(
"itemA , CATA, 2,4 ",
"itemB,CATB,4,5",
"itemC,CATC,1,2"
)
def sumByTitle(listInput: List[String]): List[String] =
listInput.map(_.split(",")).groupBy(_(0)).map {
case (title, features) =>
"%s,%s,%d,%d".format(
title,
features.head.apply(1),
features.map(_(2).trim.toInt).sum,
features.map(_(3).trim.toInt).sum)}.toList.sorted
println("LIST 1")
sumByTitle(listInput1).foreach(println)
println("LIST 2")
sumByTitle(listInput2).foreach(println)
As a side note, you may be interested in separating the serialization and deserialization from your business logic.
Here is a relatively naive approach as a first step towards separating those concerns:
def foldByTitle(listInput: List[String]): List[Item] =
listInput.map(Item.parseItem).foldLeft(List.empty[Item])(sumByTitle)
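val sumByTitle: (List[Item], Item) => List[Item] = (acc, curr) =>
if (acc.exists(_.name == curr.name))
// update the matching item in place so that its original position is kept
acc.map(i => if (i.name == curr.name) i.copy(num1 = i.num1 + curr.num1, num2 = i.num2 + curr.num2) else i)
else
acc :+ curr // first occurrence: append to keep the input order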
case class Item(name: String, category: String, num1: Int, num2: Int)
object Item {
def parseItem(serializedItem: String): Item = {
val itemTokens = serializedItem.split(",").map(_.trim)
Item(itemTokens.head, itemTokens(1), itemTokens(2).toInt, itemTokens(3).toInt)
}
}
This way the initial order of the elements is kept.

group by with foldleft scala

I have the following list in input:
val listInput1 =
List(
"itemA,CATs,2,4",
"itemA,CATS,3,1",
"itemB,CATQ,4,5",
"itemB,CATQ,4,6",
"itemC,CARC,5,10")
I want to write a single function in Scala, using groupBy and foldLeft, that sums the third and fourth columns for lines with the same title (the first column here). The wanted output is:
val listOutput1 =
List(
"itemA,CATS,5,5",
"itemB,CATQ,8,11",
"itemC,CARC,5,10"
)
def sumIndex (listIn:List[String]):List[String]={
listIn.map(_.split(",")).groupBy(_(0)).map{
case (title, label) =>
"%s,%s,%d,%d".format(
title,
label.head.apply(1),
label.map(_(2).toInt).sum,
label.map(_(3).toInt).sum)}.toList
}
The logic in your code looks sound. Here it is with a case class, which handles edge cases more cleanly:
// represents a 'row' in the original list
case class Item(
name: String,
category: String,
amount: Int,
price: Int
)
// converts a row of strings into the case class; throws an exception if the row does not have exactly 4 entries
def stringsToItem(strings: Array[String]): Item = {
if (strings.length != 4) {
throw new Exception(s"Invalid row: ${strings.mkString(",")}; must contain exactly 4 entries!")
} else {
val n = strings.headOption.getOrElse("N/A")
val cat = strings.lift(1).getOrElse("N/A")
val amt = strings.lift(2).filter(_.matches("^[0-9]+$")).map(_.toInt).getOrElse(0) // + instead of * so an empty string falls back to 0 rather than failing in toInt
val p = strings.lastOption.filter(_.matches("^[0-9]+$")).map(_.toInt).getOrElse(0)
Item(n, cat, amt, p)
}
}
// original code with case class and method above used
listInput1.map(_.split(","))
.map(stringsToItem)
.groupBy(_.name)
.map { case (name, items) =>
Item(
name,
category = items.head.category,
amount = items.map(_.amount).sum,
price = items.map(_.price).sum
)
}.toList
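Since groupBy returns a Map, the order of the resulting items is not guaranteed to follow the input. If the output should be strings in order of first appearance, as in the question, something along these lines could work. This is only a sketch: the parse helper and object name are hypothetical, and it assumes every row has exactly four comma-separated fields with numeric third and fourth columns.

```scala
object OrderedGroupBy {
  final case class Item(name: String, category: String, amount: Int, price: Int)

  // assumption: exactly four fields per row; anything else throws a MatchError
  def parse(row: String): Item = {
    val Array(n, c, a, p) = row.split(",").map(_.trim)
    Item(n, c, a.toInt, p.toInt)
  }

  def sumIndex(rows: List[String]): List[String] = {
    val items   = rows.map(parse)
    val grouped = items.groupBy(_.name)
    // walk the names in order of first appearance instead of iterating the Map
    items.map(_.name).distinct.map { name =>
      val group = grouped(name)
      s"$name,${group.head.category},${group.map(_.amount).sum},${group.map(_.price).sum}"
    }
  }
}
```

As in the code above, the category is taken from the first row of each group.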
You can solve it with a single foldLeft, iterating the input list only once. Use a Map to aggregate the result.
listInput1.map(_.split(",")).foldLeft(Map.empty[String, Int]) {
(acc: Map[String, Int], curr: Array[String]) =>
val label: String = curr(0)
val oldValue: Int = acc.getOrElse(label, 0)
val newValue: Int = oldValue + curr(2).toInt + curr(3).toInt
acc.updated(label, newValue)
}
result: Map(itemA -> 10, itemB -> 19, itemC -> 15)
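The fold above adds the third and fourth columns into one total per title. To keep the two sums separate, as the question asks, the accumulator can hold a tuple instead of a single Int. A minimal sketch under the same assumption of well-formed rows (the object name is made up):

```scala
object FoldBothColumns {
  def sumIndex(rows: List[String]): Map[String, (String, Int, Int)] =
    rows.map(_.split(",")).foldLeft(Map.empty[String, (String, Int, Int)]) { (acc, curr) =>
      val label = curr(0)
      // start from zero sums the first time a label is seen
      val (_, amount, price) = acc.getOrElse(label, (curr(1), 0, 0))
      // keep the category of the most recent row and add each column to its own running sum
      acc.updated(label, (curr(1), amount + curr(2).toInt, price + curr(3).toInt))
    }
}
```

Mapping each entry to s"$title,$category,$amount,$price" then gives strings in the requested format.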
If you have a list as
val listInput1 =
List(
"itemA,CATs,2,4",
"itemA,CATS,3,1",
"itemB,CATQ,4,5",
"itemB,CATQ,4,6",
"itemC,CARC,5,10")
Then you can write a general function that can be used with foldLeft and reduceLeft as
def accumulateLeft(x: Map[String, Tuple3[String, Int, Int]], y: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] ={
val key = y.keySet.toList(0)
if(x.keySet.contains(key)){
val oldTuple = x(key)
x.updated(key, (y(key)._1, oldTuple._2+y(key)._2, oldTuple._3+y(key)._3))
}
else{
x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
}
}
and you can call them as
foldLeft
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.foldLeft(Map.empty[String, Tuple3[String, Int, Int]])(accumulateLeft)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res0: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
reduceLeft
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.reduceLeft(accumulateLeft)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res1: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
Similarly you can just interchange the variables in the general function so that it can be used with foldRight and reduceRight as
def accumulateRight(y: Map[String, Tuple3[String, Int, Int]], x: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] ={
val key = y.keySet.toList(0)
if(x.keySet.contains(key)){
val oldTuple = x(key)
x.updated(key, (y(key)._1, oldTuple._2+y(key)._2, oldTuple._3+y(key)._3))
}
else{
x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
}
}
and calling the function would give you
foldRight
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.foldRight(Map.empty[String, Tuple3[String, Int, Int]])(accumulateRight)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res2: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
reduceRight
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.reduceRight(accumulateRight)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res3: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
So you don't really need a groupBy and can use any of the foldLeft, foldRight, reduceLeft or reduceRight functions to get your desired output.