padTo error inside a foldLeft - scala

I'm teaching myself Scala, and one of the small test applications I wrote just isn't working the way I expect it to. Can someone please help me understand why my test application is failing?
My small test application consists of a "decompress" method that does the following "decompression"
val testList = List(Tuple2(4, 'a'), Tuple2(1, 'b'), Tuple2(2, 'c'), Tuple2(2, 'a'), Tuple2(1, 'd'), Tuple2(4, 'e'))
require(decompress(testList) == List('a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e'))
In other words the Tuple2 objects should just be "decompressed" into a more verbose form. Yet all that I get back from the method is List('a', 'a', 'a', 'a') - the padTo statement works for the first Tuple2 but then it just suddenly stops working? If I however do the padding per element using a for loop - everything works...?
The full code:
object P12 extends App {
  def decompress(tList: List[Tuple2[Int,Any]]): List[Any] = {
    val startingList: List[Any] = List();
    val newList = tList.foldLeft(startingList)((b,a) => {
      val padCount = a._1;
      val padElement = a._2;
      println
      println(" Current list: " + b)
      println(" Current padCount: " + padCount)
      println(" Current padElement: " + padElement)
      println(" Padded using padTo: " + b.padTo(padCount, padElement))
      println
      // This doesn't work
      b.padTo(padCount, padElement)
      // // This works, yay
      // var tmpNewList = b;
      // for (i <- 1 to padCount)
      //   tmpNewList = tmpNewList :+ padElement
      // tmpNewList
    })
    newList
  }
  val testList = List(Tuple2(4, 'a'), Tuple2(1, 'b'), Tuple2(2, 'c'), Tuple2(2, 'a'), Tuple2(1, 'd'), Tuple2(4, 'e'))
  require(decompress(testList) == List('a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e'))
  println("Everything is okay!")
}
Any help appreciated - learning Scala, just can't figure out this problem on my own with my current Scala knowledge.

The problem is that padTo fills the list up to a given total size. So the first time it works, with 4 elements padded, but after that you have to add the length of the current list to the count, hence:
def decompress(tList: List[Tuple2[Int,Any]]): List[Any] = {
  val newList = tList.foldLeft(List[Any]())((b,a) => {
    b.padTo(a._1+b.length, a._2)
  })
  newList
}
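To see why the current length has to be added, it can help to look at padTo on its own. A minimal sketch (the object name PadToDemo and the value names are only for illustration):

```scala
object PadToDemo {
  val four: List[Char] = List('a', 'a', 'a', 'a')

  // The target length 1 is already met, so nothing is appended.
  val unchanged: List[Char] = four.padTo(1, 'b')

  // Target length = current length + 1, so exactly one 'b' is appended.
  val grown: List[Char] = four.padTo(four.length + 1, 'b')
}
```

The first call mirrors what happens in the second step of the original foldLeft: the accumulator already has 4 elements, so padding it to length 1 leaves it unchanged.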

You could do your decompress like this:
val list = List(Tuple2(4, 'a'), Tuple2(1, 'b'), Tuple2(2, 'c'), Tuple2(2, 'a'), Tuple2(1, 'd'), Tuple2(4, 'e'))
list.flatMap{case (times, value) => Seq.fill(times)(value)}

This works:
scala> testList.foldLeft(List[Char]()){ case (xs, (count, elem)) => xs ++ List(elem).padTo(count, elem)}
res7: List[Char] = List(a, a, a, a, b, c, c, a, a, d, e, e, e, e)
The actual problem is that b.padTo(padCount, padElement) pads the accumulated list b up to a total length of padCount; it does not append padCount elements. Because the first tuple already produces the longest list, nothing is added in the later steps of foldLeft. If you give the second tuple a larger count, you will see a change:
scala> val testList = List(Tuple2(3, 'a'), Tuple2(4, 'b'))
testList: List[(Int, Char)] = List((3,a), (4,b))
scala> testList.foldLeft(List[Char]()){ case (xs, (count, elem)) => xs.padTo(count, elem)}
res11: List[Char] = List(a, a, a, b)
Instead of foldLeft you can also use flatMap to generate the elements:
scala> testList flatMap { case (count, elem) => List(elem).padTo(count, elem) }
res8: List[Char] = List(a, a, a, a, b, c, c, a, a, d, e, e, e, e)
By the way, Tuple2(3, 'a') can also be written (3, 'a') or 3 -> 'a'.
Note that this padTo approach doesn't work as expected when the data contains a count <= 0, because List(elem) already holds one element:
scala> List(0 -> 'a') flatMap { case (count, elem) => List(elem).padTo(count, elem) }
res31: List[Char] = List(a)
Thus use the solution mentioned by Garret Hall:
def decompress[A](xs: Seq[(Int, A)]) =
xs flatMap { case (count, elem) => Seq.fill(count)(elem) }
scala> decompress(List(2 -> 'a', 3 -> 'b', 2 -> 'c', 0 -> 'd'))
res34: Seq[Char] = List(a, a, b, b, b, c, c)
scala> decompress(List(2 -> 0, 3 -> 1, 2 -> 2))
res35: Seq[Int] = List(0, 0, 1, 1, 1, 2, 2)
A generic type signature should be preferred so that the correct element type is always returned.

Related

Scala: How to merge lists by the first element of the tuple

Let's say I have a list:
[(A, a), (A, b), (A, c), (B, a), (B, d)]
How do I make that list into:
[(A, [a,b,c]), (B, [a,d])]
with a single function?
Thanks
The groupBy function allows you to achieve this:
scala> val list = List((1, 'a'), (1, 'b'), (1, 'c'), (2, 'a'), (2, 'd'))
list: List[(Int, Char)] = List((1,a), (1,b), (1,c), (2,a), (2,d))
scala> list.groupBy(_._1) // grouping by the first item in the tuple
res0: scala.collection.immutable.Map[Int,List[(Int, Char)]] = Map(2 -> List((2,a), (2,d)), 1 -> List((1,a), (1,b), (1,c)))
groupBy alone won't give you the format you want, so I suggest you write a custom method for this.
def groupTuples[A,B](seq: Seq[(A,B)]): List[(A, List[B])] = {
  seq.
    groupBy(_._1).
    mapValues(_.map(_._2).toList).toList
}
Then invoke it to get the desired result.
val t = Seq((1,"I"),(1,"AM"),(1, "Koby"),(2,"UP"),(2,"UP"),(2,"AND"),(2,"AWAY"))
groupTuples[Int, String](t)
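One thing to keep in mind is that the Map returned by groupBy does not promise any particular key order. If the keys should come out in the order they first appear, as in the question, something like the following sketch might work (groupInOrder and GroupTuplesDemo are hypothetical names, not library functions):

```scala
object GroupTuplesDemo {
  // Groups the second elements by the first element, keeping the keys
  // in order of first appearance in the input.
  def groupInOrder[A, B](pairs: Seq[(A, B)]): List[(A, List[B])] = {
    val grouped = pairs.groupBy(_._1)
    pairs.map(_._1).distinct.toList.map { key =>
      key -> grouped(key).map(_._2).toList
    }
  }

  val result: List[(String, List[Char])] =
    groupInOrder(List(("A", 'a'), ("A", 'b'), ("A", 'c'), ("B", 'a'), ("B", 'd')))
}
```

The key order is taken from the input via distinct, so the result does not depend on how the intermediate Map happens to iterate.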

scala intersection with count

I have a simple question, suppose I have 2 RDDs:
RDD1: [a,b,b,c,c,c,d] RDD2:[a,b,c,d]
and I want to find out how many a,b,c,d are there such that the returned results should be something like:
RDD:[(a,b,c,d),(1,2,3,1)]
It can be easily done using Lists, but in RDD, I seem to have to collect them first into Array and do something like:
count(_==string)
is there something easier that I could work with?
I have very little knowledge of RDDs or Spark, but in plain Scala you can try something like this:
val l1 = List('a', 'b', 'c', 'd')
val l2 = List('a', 'b', 'b', 'c', 'c', 'c', 'd')
def f(l1: List[Char], l2: List[Char]):(List[Char],List[Int]) = {
val count = l1.map {
x => l2.count(_ == x)
}.toList
(l1, count)
}
f(l1,l2)
Output at REPL :
res0: (List[Char], List[Int]) = (List(a, b, c, d),List(1, 2, 3, 1))
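The function above rescans l2 once for every element of l1. If that becomes a concern, the counts could be built in a single pass first. This is a sketch using plain Scala collections only (it is not Spark code, and the names counts and countFor are made up for the example):

```scala
object CountDemo {
  // Builds a map from element to number of occurrences in one pass over xs.
  def counts[A](xs: Seq[A]): Map[A, Int] =
    xs.foldLeft(Map.empty[A, Int]) { (acc, x) =>
      acc.updated(x, acc.getOrElse(x, 0) + 1)
    }

  // Looks up the count for each key; keys missing from data get 0.
  def countFor[A](keys: List[A], data: Seq[A]): (List[A], List[Int]) = {
    val c = counts(data)
    (keys, keys.map(k => c.getOrElse(k, 0)))
  }

  val result: (List[Char], List[Int]) =
    countFor(List('a', 'b', 'c', 'd'), List('a', 'b', 'b', 'c', 'c', 'c', 'd'))
}
```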

Scala - byte array of UTF8 strings

I have a byte array (or more precisely a ByteString) of UTF8 strings, which are prefixed by their length as 2-bytes (msb, lsb). For example:
val z = akka.util.ByteString(0, 3, 'A', 'B', 'C', 0, 5,
'D', 'E', 'F', 'G', 'H',0,1,'I')
I would like to convert this to a list of strings, so the result should look like List("ABC", "DEFGH", "I").
Is there an elegant way to do this?
(EDIT) These strings are NOT null terminated, the 0 you are seeing in the array is just the MSB. If the strings were long enough, the MSB would be greater than zero.
Edit: Updated based on clarification in comments that first 2 bytes define an int. So I converted it manually.
def convert(bs: List[Byte]) : List[String] = {
bs match {
case count_b1 :: count_b2 :: t =>
val count = ((count_b1 & 0xff) << 8) | (count_b2 & 0xff)
val (chars, leftover) = t.splitAt(count)
new String(chars.toArray, "UTF-8") :: convert(leftover)
case _ => List()
}
}
Call convert(z.toList)
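The recursion above is not tail-recursive, so a very long input could exhaust the stack. A possible tail-recursive variant over a plain Array[Byte] is sketched below. It assumes the same 2-byte big-endian length prefix; the names decodeAll and LengthPrefixDemo are illustrative, and a truncated final string is simply cut at the end of the input:

```scala
import java.nio.charset.StandardCharsets
import scala.annotation.tailrec

object LengthPrefixDemo {
  def decodeAll(bytes: Array[Byte]): List[String] = {
    @tailrec
    def loop(pos: Int, acc: List[String]): List[String] =
      if (pos + 2 > bytes.length) acc.reverse // no complete length prefix left
      else {
        // Combine MSB and LSB as unsigned values.
        val len = ((bytes(pos) & 0xff) << 8) | (bytes(pos + 1) & 0xff)
        val start = pos + 2
        val end = math.min(start + len, bytes.length)
        val s = new String(bytes, start, end - start, StandardCharsets.UTF_8)
        loop(end, s :: acc)
      }
    loop(0, Nil)
  }

  val sample: Array[Byte] =
    Array[Byte](0, 3) ++ "ABC".getBytes(StandardCharsets.UTF_8) ++
    Array[Byte](0, 5) ++ "DEFGH".getBytes(StandardCharsets.UTF_8) ++
    Array[Byte](0, 1) ++ "I".getBytes(StandardCharsets.UTF_8)

  val result: List[String] = decodeAll(sample)
}
```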
Consider multiSpan method as defined here which is a repeated application of span over a given list,
z.multiSpan(_ == 0).map( _.drop(2).map(_.toChar).mkString )
Here the spanning condition is whether an item equals 0; we then drop the two length-prefix bytes and convert the rest to a String. Note that this assumes 0 only ever appears as the MSB of a length, as in the example.
Note: when using multiSpan, remember to import annotation.tailrec.
Here is my answer with foldLeft.
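def convert(z : ByteString) = z.foldLeft((List() : List[String], ByteString(), 0, 0))((p, b : Byte) => {
  // State p: (decoded strings in reverse, bytes of the current string, bytes left to read, length MSB).
  // "Bytes left" of 0 means the next byte is the length MSB; -1 means it is the length LSB.
  p._3 match {
    // Bytes are masked with 0xff so that length bytes >= 128 are not read as negative.
    case 0 if p._2.nonEmpty => (p._2.utf8String :: p._1, ByteString(), -1, b & 0xff)
    case 0 => (p._1, p._2, -1, b & 0xff)
    case -1 => (p._1, p._2, (p._4 << 8) + (b & 0xff), 0)
    case _ => (p._1, p._2 :+ b, p._3 - 1, 0)
  }
})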
It works like this:
scala> val bs = ByteString(0, 3, 'A', 'B', 'C', 0, 5, 'D', 'E', 'F', 'G', 'H',0,1,'I')
scala> val k = convert(bs); (k._2.utf8String :: k._1).reverse
k: (List[String], akka.util.ByteString, Int, Int) = (List(DEFGH, ABC),ByteString(73),0,0)
res20: List[String] = List(ABC, DEFGH, I)

How does one replace the first matching item in a list in Scala?

Let's say you have:
List(('a', 1), ('b', 1), ('c', 1), ('b', 1))
and you want to replace the first ('b', 1) with ('b', 2), and you don't want it to (a) waste time evaluating past the first match and (b) update any further matching tuples.
Is there a relatively concise way of doing this in Scala (i.e., without taking the list apart and re-concatenating it). Something like an imaginary function mapFirst that returns the list with the first matching value incremented:
testList.mapFirst { case ('b', num) => ('b', num + 1) }
You don't have to take the whole List apart, I guess (only up to the point where the element is found).
def replaceFirst[A](a : List[A], repl : A, replwith : A) : List[A] = a match {
case Nil => Nil
case head :: tail => if(head == repl) replwith :: tail else head :: replaceFirst(tail, repl, replwith)
}
The call for example:
replaceFirst(List(('a', 1), ('b', 1), ('c', 1), ('b', 1)), ('b', 1), ('b', 2))
Result:
List((a,1), (b,2), (c,1), (b,1))
A way with a partial function and implicits (which looks more like your mapFirst):
implicit class MyRichList[A](val list: List[A]) {
def mapFirst(func: PartialFunction[A, A]) = {
def mapFirst2[A](a: List[A], func: PartialFunction[A, A]): List[A] = a match {
case Nil => Nil
case head :: tail => if (func.isDefinedAt(head)) func.apply(head) :: tail else head :: mapFirst2(tail, func)
}
mapFirst2(list, func)
}
}
And use it like this:
List(('a', 1), ('b', 1), ('c', 1), ('b', 1)).mapFirst {case ('b', num) => ('b', num + 1)}
You can emulate such a function relatively easily. The quickest (implementation-wise, not necessarily performance-wise) I could think of was something like this:
def replaceFirst[A](a:List[A], condition: (A)=>Boolean, transform:(A)=>(A)) = {
  val cutoff = a.indexWhere(condition)
  if (cutoff < 0) a // no element matches: return the list unchanged
  else {
    val (h,t) = a.splitAt(cutoff)
    h ++ (transform(t.head) :: t.tail)
  }
}
scala> replaceFirst(List(1,2,3,4,5),{x:Int => x%2==0}, { x:Int=> x*2 })
res4: List[Int] = List(1, 4, 3, 4, 5)
scala> replaceFirst(List(('a',1),('b',2),('c',3),('b',4)), {m:(Char,Int) => m._1=='b'},{m:(Char,Int) => (m._1,m._2*2)})
res6: List[(Char, Int)] = List((a,1), (b,4), (c,3), (b,4))
Use span to find only the first matching element. It shouldn't throw an exception even when no case is satisfied. Needless to say, you can specify as many cases as you like.
implicit class MyRichieList[A](val l: List[A]) {
def mapFirst(pf : PartialFunction[A, A]) =
l.span(!pf.isDefinedAt(_)) match {
case (x, Nil) => x
case (x, y :: ys) => (x :+ pf(y)) ++ ys
}
}
val testList = List(('a', 1), ('b', 1), ('c', 1), ('b', 1))
testList.mapFirst {
case ('b', n) => ('b', n + 1)
case ('a', 9) => ('z', 9)
}
// result --> List((a,1), (b,2), (c,1), (b,1))

In Scala, is it possible to zip two lists of differing sizes?

For example suppose I have
val letters = ('a', 'b', 'c', 'd', 'e')
val numbers = (1, 2)
Is it possible to produce a list
(('a',1), ('b',2), ('c',1),('d',2),('e',1))
Your letters and numbers are tuples, not lists. So let's fix that
scala> val letters = List('a', 'b', 'c', 'd', 'e')
letters: List[Char] = List(a, b, c, d, e)
scala> val numbers = List(1,2)
numbers: List[Int] = List(1, 2)
Now, if we zip them we don't get the desired result
scala> letters zip numbers
res11: List[(Char, Int)] = List((a,1), (b,2))
But that suggests that if numbers were repeated infinitely then the problem would be solved
scala> letters zip (Stream continually numbers).flatten
res12: List[(Char, Int)] = List((a,1), (b,2), (c,1), (d,2), (e,1))
Unfortunately, that's based on knowledge that numbers is shorter than letters. So to fix it all up
scala> ((Stream continually letters).flatten zip (Stream continually numbers).flatten take (letters.size max numbers.size)).toList
res13: List[(Char, Int)] = List((a,1), (b,2), (c,1), (d,2), (e,1))
The shorter of the lists needs to be repeated indefinitely. In this case it's obvious that numbers is shorter, but in case you need it to work in general, here is how you can do it:
def zipLongest[T](list1 : List[T], list2 : List[T]) : Seq[(T, T)] =
if (list1.size < list2.size)
Stream.continually(list1).flatten zip list2
else
list1 zip Stream.continually(list2).flatten
val letters = List('a', 'b', 'c', 'd', 'e')
val numbers = List(1, 2)
println(zipLongest(letters, numbers))
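The same repeat-the-shorter-list idea can also be sketched with iterators instead of Stream, which avoids the check for which list is shorter and lets the two lists have different element types. The names cycle, zipCycled and CycleZipDemo are hypothetical, and returning Nil when either list is empty is a choice made for this example:

```scala
object CycleZipDemo {
  // Endless iterator that repeats xs; callers must ensure xs is non-empty.
  private def cycle[A](xs: List[A]): Iterator[A] =
    Iterator.continually(xs).flatMap(_.iterator)

  // Cycles both lists and stops at the length of the longer one.
  def zipCycled[A, B](as: List[A], bs: List[B]): List[(A, B)] =
    if (as.isEmpty || bs.isEmpty) Nil
    else cycle(as).zip(cycle(bs)).take(math.max(as.size, bs.size)).toList

  val result: List[(Char, Int)] =
    zipCycled(List('a', 'b', 'c', 'd', 'e'), List(1, 2))
}
```

The empty-list guard matters here: cycling an empty list would never produce an element, so the iterator would loop forever.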
You could do a simple one liner, using the map method
val letters = List('a', 'b', 'c', 'd', 'e')
val numbers = List(1, 2)
val longZip1 = letters.zipWithIndex.map( x => (x._1, numbers(x._2 % numbers.length)) )
// or, using a for comprehension
//for (x <- letters.zipWithIndex) yield (x._1, numbers(x._2 % numbers.size))
If your lists are much longer, convert them to arrays first so that each index lookup is constant-time:
val letters = List('a', 'b', 'c', 'd', 'e' /* 'f', ...*/)
val numbers = List(1, 2 /* 3, ... */)
val (longest, shortest) = (letters.toArray, numbers.toArray)
val longZip1 = longest
.zipWithIndex
.map(x => (x._1, shortest(x._2 % shortest.length)))
If, however, you do not want to reuse any of the list data, you will need to know ahead of time what values to fill the gaps with:
val result = (0 to (Math.max(list1.size, list2.size) - 1)) map { index =>
(list1.lift(index).getOrElse(valWhen1Empty),list2.lift(index).getOrElse(valWhen2Empty))
}
I doubt this will work well with infinite lists or streams of course...
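For finite lists, the standard library's zipAll fills the gaps in the same way. In the sketch below, the fill values '?' and 0 are placeholders chosen only for illustration:

```scala
object ZipAllDemo {
  val letters: List[Char] = List('a', 'b', 'c', 'd', 'e')
  val numbers: List[Int] = List(1, 2)

  // '?' would fill in for missing letters, 0 for missing numbers.
  val padded: List[(Char, Int)] = letters.zipAll(numbers, '?', 0)
}
```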