Error processing scala list

Error processing scala list - scala

def trainBestSeller(events: RDD[BuyEvent], n: Int, itemStringIntMap: BiMap[String, Int]): Map[String, Array[(Int, Int)]] = {
val itemTemp = events
// map item from string to integer index
.flatMap {
case BuyEvent(user, item, category, count) if itemStringIntMap.contains(item) =>
Some((itemStringIntMap(item),category),count)
case _ => None
}
// cache to use for next times
.cache()
// top view with each category:
val bestSeller_Category: Map[String, Array[(Int, Int)]] = itemTemp.reduceByKey(_ + _)
.map(row => (row._1._2, (row._1._1, row._2)))
.groupByKey
.map { case (c, itemCounts) =>
(c, itemCounts.toArray.sortBy(_._2)(Ordering.Int.reverse).take(n))
}
.collectAsMap.toMap
// top view with all category => cateogory ALL
val bestSeller_All: Map[String, Array[(Int, Int)]] = itemTemp.reduceByKey(_ + _)
.map(row => ("ALL", (row._1._1, row._2)))
.groupByKey
.map {
case (c, itemCounts) =>
(c, itemCounts.toArray.sortBy(_._2)(Ordering.Int.reverse).take(n))
}
.collectAsMap.toMap
// merge 2 map bestSeller_All and bestSeller_Category
val bestSeller = bestSeller_Category ++ bestSeller_All
bestSeller
}

List processing
Your list processing seems okay. I did a small recheck
def main( args: Array[String] ) : Unit = {
case class JString(x: Int)
case class CompactBuffer(x: Int, y: Int)
val l = List( JString(2435), JString(3464))
val tuple: (List[JString], CompactBuffer) = ( List( JString(2435), JString(3464)), CompactBuffer(1,4) )
val result: List[(JString, CompactBuffer)] = tuple._1.map((_, tuple._2))
val result2: List[(JString, CompactBuffer)] = {
val l = tuple._1
val cb = tuple._2
l.map( x => (x,cb) )
}
println(result)
println(result2)
}
Result is (as expected)
List((JString(2435),CompactBuffer(1,4)), (JString(3464),CompactBuffer(1,4)))
Further analysis
Analysis is required, if that does not solve your problem:
Where are types JStream (from org.json4s.JsonAST ?) and CompactBuffer ( Spark I suppose ) from?
How exactly looks the code, that creates pair ? What exactly are you doing? Please provide code excerpts!

Related

how to get the index of the duplicate pair in the a list using scala

I have a Scala list below :
val numList = List(1,2,3,4,5,1,2)
I want to get index of the same element pair in the list. The output should look like (0,5),(1,6)
How can I achieve using map?
def catchDuplicates(num : List[Int]) : (Int , Int) = {
val count = 0;
val emptyMap: HashMap[Int, Int] = HashMap.empty[Int, Int]
for (i <- num)
if (emptyMap.contains(i)) {
emptyMap.put(i, (emptyMap.get(i)) + 1) }
else {
emptyMap.put(i, 1)
}
}

Let's make the challenge a little more interesting.
val numList = List(1,2,3,4,5,1,2,1)
Now the result should be something like (0, 5, 7),(1, 6), which makes it pretty clear that returning one or more tuples is not going to be feasible. Returning a List of List[Int] would make much more sense.
def catchDuplicates(nums: List[Int]): List[List[Int]] =
nums.zipWithIndex //List[(Int,Int)]
.groupMap(_._1)(_._2) //Map[Int,List[Int]]
.values //Iterable[List[Int]]
.filter(_.lengthIs > 1)
.toList //List[List[Int]]
You might also add a .view in order to minimize the number of traversals and intermediate collections created.
def catchDuplicates(nums: List[Int]): List[List[Int]] =
nums.view
.zipWithIndex
.groupMap(_._1)(_._2)
.collect{case (_,vs) if vs.sizeIs > 1 => vs.toList}
.toList

How can I achieve using map?
You can't.
Because you only want to return the indexes of the elements that appear twice; which is a very different kind of transformation than the one that map expects.
You can use foldLeft thought.
object catchDuplicates {
final case class Result[A](elem: A, firstIdx: Int, secondIdx: Int)
private final case class State[A](seenElements: Map[A, Int], duplicates: List[Result[A]]) {
def next(elem: A, idx: Int): State[A] =
seenElements.get(key = elem).fold(
ifEmpty = this.copy(seenElements = this.seenElements + (elem -> idx))
) { firstIdx =>
State(
seenElements = this.seenElements.removed(key = elem),
duplicates = Result(elem, firstIdx, secondIdx = idx) :: this.duplicates
)
}
}
private object State {
def initial[A]: State[A] =
State(
seenElements = Map.empty,
duplicates = List.empty
)
}
def apply[A](data: List[A]): List[Result[A]] =
data.iterator.zipWithIndex.foldLeft(State.initial[A]) {
case (acc, (elem, idx)) =>
acc.next(elem, idx)
}.duplicates // You may add a reverse here if order is important.
}
Which can be used like this:
val numList = List(1,2,3,4,5,1,2)
val result = catchDuplicates(numList)
// result: List[Result] = List(Result(2,1,6), Result(1,0,5))
You can see the code running here.

I think returning tuple is not a good option instead you should try Map like -
object FindIndexOfDupElement extends App {
val numList = List(1, 2, 3, 4, 5, 1, 2)
#tailrec
def findIndex(elems: List[Int], res: Map[Int, List[Int]] = Map.empty, index: Int = 0): Map[Int, List[Int]] = {
elems match {
case head :: rest =>
if (res.get(head).isEmpty) {
findIndex(rest, res ++ Map(head -> (index :: Nil)), index + 1)
} else {
val updatedMap: Map[Int, List[Int]] = res.map {
case (key, indexes) if key == head => (key, (indexes :+ index))
case (key, indexes) => (key, indexes)
}
findIndex(rest, updatedMap, index + 1)
}
case _ => res
}
}
println(findIndex(numList).filter(x => x._2.size > 1))
}
you can clearly see the number(key) and respective index in the map -
HashMap(1 -> List(0, 5), 2 -> List(1, 6))

Improve Two sum problem using Map in scala

I am trying to solve Two sum problem using scala
val list = List(1,2,3,4,5)
val map = collection.mutable.Map.empty[Int, Int]
val sum = 9
for {
i <- 0 until list.size
} yield {
map.get(sum - list(i)) match {
case None => map += (list(i) -> i)
case Some(previousIndex) => println(s" Indexes $previousIndex $i")
}
}
Can anyone suggest an O(n) solution without using mutable map using scala

If you are trying to solve "Two sum problem" - meaning you need from given list find two numbers which gives sum equal to given, can go with:
val list = List(1,2,3,4,5)
val sum = 9
val set = list.toSet
val solution = list.flatMap { item =>
val rest = sum - item
val min = Math.min(item, rest)
val max = Math.max(item, rest)
if (set(rest)) Some(min, max) else None
}.toSet
println(solution)
Print result:
Set((4,5))
ScalaFiddle: https://scalafiddle.io/sf/LA6P3eh/0
UPDATE
The result required to return indices not values:
val list = List(1,2,3,4,5)
val sum = 9
val inputMap = list.zipWithIndex.toMap
val solution = list.zipWithIndex.flatMap { case (item, itemIndex) =>
inputMap.get(sum - item).map { restIndex =>
val minIndex = Math.min(itemIndex, restIndex)
val maxIndex = Math.max(itemIndex, restIndex)
minIndex -> maxIndex
}
}.toSet
println(solution)
Printout: Set((3,4))
ScalaFiddle: https://scalafiddle.io/sf/LA6P3eh/1

You can try something as follows for the first result:
object Solution extends App {
def twoSums(xs: List[Int], target: Int): Option[(Int,Int)] = {
#annotation.tailrec def go(zipped: List[(Int,Int)], map: Map[Int,Int] = Map.empty): Option[(Int,Int)] = {
zipped match {
case Nil => None
case (ele, idx) :: tail =>
map.get(target - ele) match {
case Some(prevIdx) => Some((prevIdx, idx))
case None => go(tail, map + (ele -> idx))
}
}
}
go(xs.zipWithIndex)
}
val res = twoSums(List(1,2,3,4,5), 9)
println(res)
}
Or via foldLeft for all results:
object Solution extends App {
def twoSums(xs: List[Int], target: Int): List[(Int, Int)] = {
xs.zipWithIndex.foldLeft((Map.empty[Int,Int], List.empty[(Int,Int)])) {
case ((map, results), (ele, idx)) =>
map.get(target - ele) match {
case Some(prevIdx) =>(map, (prevIdx, idx) :: results)
case None => (map + (ele -> idx), results)
}
}
}._2
val res = twoSums(List(1,2,3,4,5), 9)
println(res)
}

How do I remove an element from a list by value?

I am currently working on a function that takes in a Map[String, List[String]] and a String as arguments. The map contains a user Id and the IDs of films that they liked. What I need to do is, to return a List[List[String]] which contains the other movies that where liked by the user who liked the movie that was passed into the function.
The function declaration looks as follows:
def movies(m: Map[String, List[String]], mov: String) : List[List[String]]= {
}
So lets imagine the following:
val m1 : [Map[Int, List[String]]] = Map(1 ‐> List("b", "a"), 2 ‐> List("y", "x"), 3 ‐> List("c", "a"))
val movieID = "a"
movies(m1, movieId)
This should return:
List(List("b"), List("c"))
I have tried using
m1.filter(x => x._2.contains(movieID))
So that only Lists containing movieID are kept in the map, but my problem is that I need to remove movieID from every list it occurs in, and then return the result as a List[List[String]].

You could use collect:
val m = Map("1" -> List("b", "a"), "2" -> List("y", "x"), "3" -> List("c", "a"))
def movies(m: Map[String, List[String]], mov: String) = m.collect {
case (_, l) if l.contains(mov) => l.filterNot(_ == mov)
}
movies(m, "a") //List(List(b), List(c))
Problem with this approach is, that it would iterate over every movie list twice, the first time with contains and the second time with filterNot. We could optimize it tail-recursive function, which would look for element and if found just return list without it:
import scala.annotation.tailrec
def movies(m: Map[String, List[String]], mov: String) = {
#tailrec
def withoutElement[T](l: List[T], mov: T, acc: List[T] = Nil): Option[List[T]] = {
l match {
case x :: xs if x == mov => Some(acc.reverse ++ xs)
case x :: xs => withoutElement(xs, mov, x :: acc)
case Nil => None
}
}
m.values.flatMap(withoutElement(_, mov))
}

The solution from Krzysztof is a good one. Here's an alternate way to traverse every List just once.
def movies(m: Map[String, List[String]], mov: String) =
m.values.toList.flatMap{ss =>
val tpl = ss.foldLeft((false, List.empty[String])){
case ((_,res), `mov`) => (true, res)
case ((keep,res), str) => (keep, str::res)
}
if (tpl._1) Some(tpl._2) else None
}

This should work for you:
object DemoAbc extends App {
val m1 = Map(1 -> List("b", "a"), 2 -> List("y", "x"), 3 -> List("c", "a"))
val movieID = "a"
def movies(m: Map[Int, List[String]], mov: String): List[List[String]] = {
val ans = m.foldLeft(List.empty[List[String]])((a: List[List[String]], b: (Int, List[String])) => {
if (b._2.contains(mov))
b._2.filter(_ != mov) :: a
else a
})
ans
}
print(movies(m1, movieID))
}

groupby scala list of string

I am facing a problem to calculate the sum of elements in Scala having the same title (my key in this case).
Currently my input can be described as:
val listInput1 =
List(
"itemA,CATA,2,4 ",
"itemA,CATA,3,1 ",
"itemB,CATB,4,5",
"itemB,CATB,4,6"
)
val listInput2 =
List(
"itemA,CATA,2,4 ",
"itemB,CATB,4,5",
"itemC,CATC,1,2"
)
The required output for lists in input should be
val listoutput1 =
List(
"itemA,CATA,5,5 ",
"itemB,CATB,8,11"
)
val listoutput2 =
List(
"itemA , CATA, 2,4 ",
"itemB,CATB,4,5",
"itemC,CATC,1,2"
)
I wrote the following function:
def sumByTitle(listInput: List[String]): List[String] =
listInput.map(_.split(",")).groupBy(_(0)).map {
case (title, features) =>
"%s,%s,%d,%d".format(
title,
features.head.apply(1),
features.map(_(2).toInt).sum,
features.map(_(3).toInt).sum)}.toList
It doesn't give me the expected result as it changes the order of lines.
How can I fix that?

The ListMap is designed to preserve the order of items inserted to the Map.
import collection.immutable.ListMap
def sumByTitle(listInput: List[String]): List[String] = {
val itemPttrn = raw"(.*)(\d+),(\d+)\s*".r
listInput.foldLeft(ListMap.empty[String, (Int,Int)].withDefaultValue((0,0))) {
case (lm, str) =>
val itemPttrn(k, a, b) = str //unsafe
val (x, y) = lm(k)
lm.updated(k, (a.toInt + x, b.toInt + y))
}.toList.map { case (k, (a, b)) => s"$k$a,$b" }
}
This is a bit unsafe in that it will throw if the input string doesn't match the regex pattern.
sumByTitle(listInput1)
//res0: List[String] = List(itemA,CATA,5,5, itemB,CATB,8,11)
sumByTitle(listInput2)
//res1: List[String] = List(itemA,CATA,2,4, itemB,CATB,4,5, itemC,CATC,1,2)
You'll note that the trailing space, if there is one, is not preserved.

If you are just interested in sorting you can simply return the sorted list:
val listInput1 =
List(
"itemA , CATA, 2,4 ",
"itemA , CATA, 3,1 ",
"itemB,CATB,4,5",
"itemB,CATB,4,6"
)
val listInput2 =
List(
"itemA , CATA, 2,4 ",
"itemB,CATB,4,5",
"itemC,CATC,1,2"
)
def sumByTitle(listInput: List[String]): List[String] =
listInput.map(_.split(",")).groupBy(_(0)).map {
case (title, features) =>
"%s,%s,%d,%d".format(
title,
features.head.apply(1),
features.map(_(2).trim.toInt).sum,
features.map(_(3).trim.toInt).sum)}.toList.sorted
println("LIST 1")
sumByTitle(listInput1).foreach(println)
println("LIST 2")
sumByTitle(listInput2).foreach(println)
You can find the code on Scastie for you to play around with.
As a side note, you may be interested in separating the serialization and deserialization from your business logic.
Here you can find another Scastie notebook with a relatively naive approach for a first step towards separating concerns.

def foldByTitle(listInput: List[String]): List[Item] =
listInput.map(Item.parseItem).foldLeft(List.empty[Item])(sumByTitle)
val sumByTitle: (List[Item], Item) => List[Item] = (acc, curr) =>
acc.find(_.name == curr.name).fold(curr :: acc) { i =>
acc.filterNot(_.name == curr.name) :+ i.copy(num1 = i.num1 + curr.num1, num2 = i.num2 + curr.num2)
}
case class Item(name: String, category: String, num1: Int, num2: Int)
object Item {
def parseItem(serializedItem: String): Item = {
val itemTokens = serializedItem.split(",").map(_.trim)
Item(itemTokens.head, itemTokens(1), itemTokens(2).toInt, itemTokens(3).toInt)
}
}
This way the initial order of the elements to kept.

group by with foldleft scala

I have the following list in input:
val listInput1 =
List(
"itemA,CATs,2,4",
"itemA,CATS,3,1",
"itemB,CATQ,4,5",
"itemB,CATQ,4,6",
"itemC,CARC,5,10")
and I want to write a function in scala using groupBy and foldleft ( just one function) in order to sum up third and fourth colum for lines having the same title(first column here), the wanted output is :
val listOutput1 =
List(
"itemA,CATS,5,5",
"itemB,CATQ,8,11",
"itemC,CARC,5,10"
)
def sumIndex (listIn:List[String]):List[String]={
listIn.map(_.split(",")).groupBy(_(0)).map{
case (title, label) =>
"%s,%s,%d,%d".format(
title,
label.head.apply(1),
label.map(_(2).toInt).sum,
label.map(_(3).toInt).sum)}.toList
}
Kind regards

The logic in your code looks sound, here it is with a case class implemented as that handles edge cases more cleanly:
// represents a 'row' in the original list
case class Item(
name: String,
category: String,
amount: Int,
price: Int
)
// safely converts the row of strings into case class, throws exception otherwise
def stringsToItem(strings: Array[String]): Item = {
if (strings.length != 4) {
throw new Exception(s"Invalid row: ${strings.foreach(print)}; must contain only 4 entries!")
} else {
val n = strings.headOption.getOrElse("N/A")
val cat = strings.lift(1).getOrElse("N/A")
val amt = strings.lift(2).filter(_.matches("^[0-9]*$")).map(_.toInt).getOrElse(0)
val p = strings.lastOption.filter(_.matches("^[0-9]*$")).map(_.toInt).getOrElse(0)
Item(n, cat, amt, p)
}
}
// original code with case class and method above used
listInput1.map(_.split(","))
.map(stringsToItem)
.groupBy(_.name)
.map { case (name, items) =>
Item(
name,
category = items.head.category,
amount = items.map(_.amount).sum,
price = items.map(_.price).sum
)
}.toList

You can solve it with a single foldLeft, iterating the input list only once. Use a Map to aggregate the result.
listInput1.map(_.split(",")).foldLeft(Map.empty[String, Int]) {
(acc: Map[String, Int], curr: Array[String]) =>
val label: String = curr(0)
val oldValue: Int = acc.getOrElse(label, 0)
val newValue: Int = oldValue + curr(2).toInt + curr(3).toInt
acc.updated(label, newValue)
}
result: Map(itemA -> 10, itemB -> 19, itemC -> 15)

If you have a list as
val listInput1 =
List(
"itemA,CATs,2,4",
"itemA,CATS,3,1",
"itemB,CATQ,4,5",
"itemB,CATQ,4,6",
"itemC,CARC,5,10")
Then you can write a general function that can be used with foldLeft and reduceLeft as
def accumulateLeft(x: Map[String, Tuple3[String, Int, Int]], y: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] ={
val key = y.keySet.toList(0)
if(x.keySet.contains(key)){
val oldTuple = x(key)
x.updated(key, (y(key)._1, oldTuple._2+y(key)._2, oldTuple._3+y(key)._3))
}
else{
x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
}
}
and you can call them as
foldLeft
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.foldLeft(Map.empty[String, Tuple3[String, Int, Int]])(accumulateLeft)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res0: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
reduceLeft
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.reduceLeft(accumulateLeft)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res1: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
Similarly you can just interchange the variables in the general function so that it can be used with foldRight and reduceRight as
def accumulateRight(y: Map[String, Tuple3[String, Int, Int]], x: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] ={
val key = y.keySet.toList(0)
if(x.keySet.contains(key)){
val oldTuple = x(key)
x.updated(key, (y(key)._1, oldTuple._2+y(key)._2, oldTuple._3+y(key)._3))
}
else{
x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
}
}
and calling the function would give you
foldRight
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.foldRight(Map.empty[String, Tuple3[String, Int, Int]])(accumulateRight)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res2: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
reduceRight
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.reduceRight(accumulateRight)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res3: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
So you don't really need a groupBy and can use any of the foldLeft, foldRight, reduceLeft or reduceRight functions to get your desired output.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Error processing scala list - scala

Related

how to get the index of the duplicate pair in the a list using scala

Improve Two sum problem using Map in scala

How do I remove an element from a list by value?

groupby scala list of string

group by with foldleft scala

Categories

Resources