Sorting a List of Lists alphabetically - scala

I had an exercise where I had to group the duplicate elements of a List into individual lists. Everything works, but how can I now order the result alphabetically, so that it starts with List(a, a, a, a, a, a) rather than List(e, e, e, e)?
I tried using sortBy at the end, but no combination worked for me.
My code:
val list2 = List('a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e')
def sortedSublist(l: List[Char]): List[List[Char]] = {
  l.groupBy(identity).map { case (key, values) => values }.toList
}
println(sortedSublist(list2))
Current result is:
List(List(e, e, e, e), List(a, a, a, a, a, a), List(b), List(c, c), List(d))

mck suggested a good answer.
You can also convert the grouped map to a list and sort it by key after groupBy:
l.groupBy(identity).toList.sortBy(_._1).map(_._2)
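As a minimal sketch of how the one-liner above fits into the original function (the object name GroupSorted is only a hypothetical wrapper for illustration), each step can be spelled out like this:

```scala
object GroupSorted {
  // Groups equal characters together, then orders the groups by their key.
  def sortedSublist(l: List[Char]): List[List[Char]] =
    l.groupBy(identity) // Map[Char, List[Char]]; iteration order is unspecified
      .toList           // List[(Char, List[Char])]
      .sortBy(_._1)     // order the pairs by the grouping key
      .map(_._2)        // keep only the grouped values
}
```

The sort has to happen after toList because the Map returned by groupBy makes no promise about the order of its keys.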

Related

list to match another nested list

second list sorted according to the first
[['E', 'E', 'C'], ['B', 'C', 'A'], ['E', 'B', 'F'], ['D', 'F', 'E']]
ref_list = ['c','b','a']
# sort [1,2,3] according to ref_list... viz: ['c','a','b'] => [3,1,2]
ordering = sorted(range(len(ref_list)), key=lambda i: ref_list[i])
for j in range(len(list2)):
    list2[j] = [list2[j][i] for i in ordering]
Here is some quick code I wrote. It works, but it may need some refactoring:
def getListIndexes(some_list):
    return [x for x in enumerate(some_list)]
list1 = [['C', 'B', 'A']]
list2 = [['C', 'E', 'E'], ['C', 'B', 'A'], ['F', 'B', 'E'], ['E', 'F', 'D']]
values_before_1 = getListIndexes(list1[0])
values_after_1 = getListIndexes(list(sorted(list1[0])))
mapping = dict.fromkeys(list(range(len(list1[0]))))
for item in values_before_1:
    before, then = [(item[0],x[0]) for i, x in enumerate(values_after_1) if item[1]==x[1]][0]
    mapping[then] = before
results = [[None, None, None] for l in range(len(list2))]
for i, item in enumerate(list2):
    values = getListIndexes(item)
    for j in range(len(item)):
        results[i][mapping[j]] = list2[i][j]
print("[*] list1 with content: {} has been sorted and now is like: {}\n".format(list1[0], sorted(list1[0])))
print("[*] list2 with content:\n\n{}\n".format(list2))
print("...has been sorted based on list1 sorting and now looks like this...\n")
print(results)
Output:
[*] list1 with content: ['C', 'B', 'A'] has been sorted and now is like: ['A', 'B', 'C']
[*] list2 with content:
[['C', 'E', 'E'], ['C', 'B', 'A'], ['F', 'B', 'E'], ['E', 'F', 'D']]
...has been sorted based on list1 sorting and now looks like this...
[['E', 'E', 'C'], ['A', 'B', 'C'], ['E', 'B', 'F'], ['D', 'F', 'E']]

scala intersection with count

I have a simple question. Suppose I have 2 RDDs:
RDD1: [a,b,b,c,c,c,d] RDD2:[a,b,c,d]
and I want to find out how many times each of a, b, c, d occurs, so that the returned result looks something like:
RDD:[(a,b,c,d),(1,2,3,1)]
This is easy to do with Lists, but with RDDs I seem to have to collect them into an Array first and then do something like:
count(_==string)
Is there something easier that I could work with?
I have very little knowledge of RDDs or Spark, but in plain Scala you can try something like this:
val l1 = List('a', 'b', 'c', 'd')
val l2 = List('a', 'b', 'b', 'c', 'c', 'c', 'd')
def f(l1: List[Char], l2: List[Char]): (List[Char], List[Int]) = {
  val count = l1.map {
    x => l2.count(_ == x)
  }.toList
  (l1, count)
}
f(l1,l2)
Output at REPL :
res0: (List[Char], List[Int]) = (List(a, b, c, d),List(1, 2, 3, 1))
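The version above rescans l2 once for every key. A possible alternative, sketched here with plain Scala collections only (no Spark API is used, and the names CountOccurrences and countIn are hypothetical), builds a frequency table in one pass and then looks each key up:

```scala
object CountOccurrences {
  // Returns the keys together with how often each key occurs in data.
  def countIn(keys: List[Char], data: List[Char]): (List[Char], List[Int]) = {
    // One pass over data: element -> number of occurrences.
    val freq: Map[Char, Int] =
      data.groupBy(identity).map { case (k, v) => k -> v.size }
    // Keys that never occur in data get a count of 0.
    (keys, keys.map(k => freq.getOrElse(k, 0)))
  }
}
```

For the lists in the answer, countIn(l1, l2) should give the same pair as f(l1, l2).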

Scala - byte array of UTF8 strings

I have a byte array (or more precisely a ByteString) of UTF8 strings, which are prefixed by their length as 2-bytes (msb, lsb). For example:
val z = akka.util.ByteString(0, 3, 'A', 'B', 'C', 0, 5,
'D', 'E', 'F', 'G', 'H',0,1,'I')
I would like to convert this to a list of strings, so the result should look like List("ABC", "DEFGH", "I").
Is there an elegant way to do this?
(EDIT) These strings are NOT null terminated, the 0 you are seeing in the array is just the MSB. If the strings were long enough, the MSB would be greater than zero.
Edit: Updated based on the clarification in the comments that the first 2 bytes define an int, so I convert them manually.
def convert(bs: List[Byte]) : List[String] = {
  bs match {
    case count_b1 :: count_b2 :: t =>
      val count = ((count_b1 & 0xff) << 8) | (count_b2 & 0xff)
      val (chars, leftover) = t.splitAt(count)
      new String(chars.toArray, "UTF-8") :: convert(leftover)
    case _ => List()
  }
}
Call convert(z.toList)
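The same length-prefix parsing can also be sketched iteratively on a plain Array[Byte] with java.nio.ByteBuffer, which avoids deep recursion on long inputs. This is only an illustration under the assumption that the ByteString has first been copied into an array; the names LengthPrefixed and decode are hypothetical:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object LengthPrefixed {
  // Decodes a sequence of UTF-8 strings, each prefixed by a 2-byte big-endian length.
  def decode(bytes: Array[Byte]): List[String] = {
    val buf = ByteBuffer.wrap(bytes) // ByteBuffer is big-endian by default
    val out = List.newBuilder[String]
    while (buf.remaining() >= 2) {
      // getShort() is signed, so mask it to get an unsigned length.
      val len = buf.getShort() & 0xffff
      // If the input is truncated, take only the bytes that are left.
      val chunk = new Array[Byte](math.min(len, buf.remaining()))
      buf.get(chunk)
      out += new String(chunk, StandardCharsets.UTF_8)
    }
    out.result()
  }
}
```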
Consider the multiSpan method as defined here, which is a repeated application of span over a given list:
z.multiSpan(_ == 0).map( _.drop(2).map(_.toChar).mkString )
Here the spanning condition is whether an item equals 0, then we drop the first two prefixing bytes, and convert the remaining to a String.
Note: when using multiSpan, remember to import annotation.tailrec.
Here is my answer with foldLeft.
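def convert(z : ByteString) = z.foldLeft((List() : List[String], ByteString(), 0, 0))((p, b : Byte) => {
  // State: (decoded strings in reverse order, bytes of the current string, bytes remaining, length MSB).
  // The length bytes are masked with 0xff so that values above 127 are not sign-extended.
  p._3 match {
    case 0 if p._2.nonEmpty => (p._2.utf8String :: p._1, ByteString(), -1, b & 0xff)
    case 0 => (p._1, p._2, -1, b & 0xff)
    case -1 => (p._1, p._2, (p._4 << 8) + (b & 0xff), 0)
    case _ => (p._1, p._2 :+ b, p._3 - 1, 0)
  }
})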
It works like this:
scala> val bs = ByteString(0, 3, 'A', 'B', 'C', 0, 5, 'D', 'E', 'F', 'G', 'H',0,1,'I')
scala> val k = convert(bs); (k._2.utf8String :: k._1).reverse
k: (List[String], akka.util.ByteString, Int, Int) = (List(DEFGH, ABC),ByteString(73),0,0)
res20: List[String] = List(ABC, DEFGH, I)

padTo error inside a foldLeft

I'm teaching myself Scala, and one of the small test applications I wrote isn't working the way I expect it to. Can someone please help me understand why my test application is failing?
My small test application consists of a "decompress" method that does the following "decompression":
val testList = List(Tuple2(4, 'a'), Tuple2(1, 'b'), Tuple2(2, 'c'), Tuple2(2, 'a'), Tuple2(1, 'd'), Tuple2(4, 'e'))
require(decompress(testList) == List('a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e'))
In other words, the Tuple2 objects should just be "decompressed" into a more verbose form. Yet all that I get back from the method is List('a', 'a', 'a', 'a'): the padTo statement works for the first Tuple2, but then it suddenly stops working. If I instead do the padding per element using a for loop, everything works.
The full code:
object P12 extends App {
  def decompress(tList: List[Tuple2[Int,Any]]): List[Any] = {
    val startingList: List[Any] = List()
    val newList = tList.foldLeft(startingList)((b,a) => {
      val padCount = a._1
      val padElement = a._2
      println()
      println(" Current list: " + b)
      println(" Current padCount: " + padCount)
      println(" Current padElement: " + padElement)
      println(" Padded using padTo: " + b.padTo(padCount, padElement))
      println()
      // This doesn't work
      b.padTo(padCount, padElement)
      // // This works, yay
      // var tmpNewList = b;
      // for (i <- 1 to padCount)
      //   tmpNewList = tmpNewList :+ padElement
      // tmpNewList
    })
    newList
  }
  val testList = List(Tuple2(4, 'a'), Tuple2(1, 'b'), Tuple2(2, 'c'), Tuple2(2, 'a'), Tuple2(1, 'd'), Tuple2(4, 'e'))
  require(decompress(testList) == List('a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e'))
  println("Everything is okay!")
}
Any help is appreciated. I'm learning Scala and can't figure out this problem on my own with my current knowledge.
The problem is that padTo fills the list up to a given total size. So the first time it works, with 4 elements padded, but on each following step you have to add the current length of the list to the count, hence:
def decompress(tList: List[Tuple2[Int,Any]]): List[Any] = {
  val newList = tList.foldLeft(List[Any]())((b,a) => {
    b.padTo(a._1+b.length, a._2)
  })
  newList
}
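To make the padTo behaviour described above concrete, here is a small sketch (the object name PadToDemo and its value names are hypothetical) showing that the first argument is a target total length, not a number of elements to append:

```scala
object PadToDemo {
  val base = List('a', 'a', 'a', 'a')

  // The list already has 4 elements, so a target length of 1 appends nothing.
  val unchanged: List[Char] = base.padTo(1, 'b')

  // Adding the current length to the count appends exactly one element.
  val extended: List[Char] = base.padTo(base.length + 1, 'b')
}
```

This is why the original fold returns only the four 'a' elements: no later tuple asks for a total length greater than 4.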
You could do your decompress like this:
val list = List(Tuple2(4, 'a'), Tuple2(1, 'b'), Tuple2(2, 'c'), Tuple2(2, 'a'), Tuple2(1, 'd'), Tuple2(4, 'e'))
list.flatMap{case (times, value) => Seq.fill(times)(value)}
This works:
scala> testList.foldLeft(List[Char]()){ case (xs, (count, elem)) => xs ++ List(elem).padTo(count, elem)}
res7: List[Char] = List(a, a, a, a, b, c, c, a, a, d, e, e, e, e)
The actual problem is that b.padTo(padCount, padElement) pads the accumulated list b up to a total length of padCount. Because the first tuple produces the most elements, nothing is added in the following steps of foldLeft. If you change the second tuple, you will see a difference:
scala> val testList = List(Tuple2(3, 'a'), Tuple2(4, 'b'))
testList: List[(Int, Char)] = List((3,a), (4,b))
scala> testList.foldLeft(List[Char]()){ case (xs, (count, elem)) => xs.padTo(count, elem)}
res11: List[Char] = List(a, a, a, b)
Instead of foldLeft you can also use flatMap to generate the elements:
scala> testList flatMap { case (count, elem) => List(elem).padTo(count, elem) }
res8: List[Char] = List(a, a, a, a, b, c, c, a, a, d, e, e, e, e)
By the way, Tuple2(3, 'a') can be written as (3, 'a') or 3 -> 'a'.
Note that padTo doesn't work as expected when you have data with a count of <= 0:
scala> List(0 -> 'a') flatMap { case (count, elem) => List(elem).padTo(count, elem) }
res31: List[Char] = List(a)
Thus use the solution mentioned by Garret Hall:
def decompress[A](xs: Seq[(Int, A)]) =
xs flatMap { case (count, elem) => Seq.fill(count)(elem) }
scala> decompress(List(2 -> 'a', 3 -> 'b', 2 -> 'c', 0 -> 'd'))
res34: Seq[Char] = List(a, a, b, b, b, c, c)
scala> decompress(List(2 -> 0, 3 -> 1, 2 -> 2))
res35: Seq[Int] = List(0, 0, 1, 1, 1, 2, 2)
A generic type signature should be preferred so that the correct element type is always returned.

In Scala, is it possible to zip two lists of differing sizes?

For example suppose I have
val letters = ('a', 'b', 'c', 'd', 'e')
val numbers = (1, 2)
Is it possible to produce a list
(('a',1), ('b',2), ('c',1),('d',2),('e',1))
Your letters and numbers are tuples, not lists. So let's fix that:
scala> val letters = List('a', 'b', 'c', 'd', 'e')
letters: List[Char] = List(a, b, c, d, e)
scala> val numbers = List(1,2)
numbers: List[Int] = List(1, 2)
Now, if we zip them we don't get the desired result
scala> letters zip numbers
res11: List[(Char, Int)] = List((a,1), (b,2))
But that suggests that if numbers were repeated infinitely then the problem would be solved
scala> letters zip (Stream continually numbers).flatten
res12: List[(Char, Int)] = List((a,1), (b,2), (c,1), (d,2), (e,1))
Unfortunately, that relies on knowing that numbers is shorter than letters. To handle either case:
scala> ((Stream continually letters).flatten zip (Stream continually numbers).flatten take (letters.size max numbers.size)).toList
res13: List[(Char, Int)] = List((a,1), (b,2), (c,1), (d,2), (e,1))
The shorter of the lists needs to be repeated indefinitely. In this case it's obvious that numbers is shorter, but in case you need it to work in general, here is how you can do it:
def zipLongest[T](list1 : List[T], list2 : List[T]) : Seq[(T, T)] =
  if (list1.size < list2.size)
    Stream.continually(list1).flatten zip list2
  else
    list1 zip Stream.continually(list2).flatten
val letters = List('a', 'b', 'c', 'd', 'e')
val numbers = List(1, 2)
println(zipLongest(letters, numbers))
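As a variation on the idea above, the following sketch uses two type parameters so that the element types of both lists are kept, and Iterator.continually instead of Stream. It also guards against empty input, since cycling an empty list would never terminate. The names CycleZip and zipCycle are hypothetical:

```scala
object CycleZip {
  // Zips two lists, repeating the shorter one until the longer one is exhausted.
  def zipCycle[A, B](as: List[A], bs: List[B]): List[(A, B)] =
    if (as.isEmpty || bs.isEmpty) Nil // nothing to repeat, and cycling Nil would loop forever
    else {
      val n = math.max(as.size, bs.size)
      // Both sides are cycled lazily; take(n) cuts the result at the longer length.
      val cycledAs = Iterator.continually(as).flatten
      val cycledBs = Iterator.continually(bs).flatten
      cycledAs.zip(cycledBs).take(n).toList
    }
}
```

With the lists from the question, CycleZip.zipCycle(letters, numbers) has the static type List[(Char, Int)] rather than a list of pairs of a common supertype.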
You could do it as a simple one-liner using the map method:
val letters = List('a', 'b', 'c', 'd', 'e')
val numbers = List(1, 2)
val longZip1 = letters.zipWithIndex.map( x => (x._1, numbers(x._2 % numbers.length)) )
//or, using a for loop
//for (x <- letters.zipWithIndex) yield (x._1, numbers(x._2 % numbers.size))
And if your lists are much longer, you can convert them to arrays first, so that each indexed lookup takes constant time:
val letters = List('a', 'b', 'c', 'd', 'e' /* 'f', ...*/)
val numbers = List(1, 2 /* 3, ... */)
val (longest, shortest) = (letters.toArray, numbers.toArray)
val longZip1 = longest
.zipWithIndex
.map(x => (x._1, shortest(x._2 % shortest.length)))
If you do not want to reuse any of the list data, however, you will need to know ahead of time what the gaps are to be filled with:
val result = (0 to (Math.max(list1.size, list2.size) - 1)) map { index =>
  (list1.lift(index).getOrElse(valWhen1Empty), list2.lift(index).getOrElse(valWhen2Empty))
}
I doubt this will work well with infinite lists or streams of course...
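For finite lists, the standard library's zipAll method does this kind of gap filling directly. A minimal sketch, where the object name ZipAllDemo and the filler values '?' and 0 are placeholders chosen only for illustration:

```scala
object ZipAllDemo {
  val letters = List('a', 'b', 'c', 'd', 'e')
  val numbers = List(1, 2)

  // zipAll(that, thisElem, thatElem): thisElem fills gaps when letters is shorter,
  // thatElem fills gaps when numbers is shorter.
  val padded: List[(Char, Int)] = letters.zipAll(numbers, '?', 0)
}
```

Here numbers is the shorter list, so the last three pairs carry the filler 0.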