Scala - byte array of UTF8 strings

I have a byte array (or more precisely a ByteString) of UTF8 strings, each prefixed by its length as 2 bytes (MSB, LSB). For example:
val z = akka.util.ByteString(0, 3, 'A', 'B', 'C', 0, 5, 'D', 'E', 'F', 'G', 'H', 0, 1, 'I')
I would like to convert this to a list of strings, so the result should be similar to List("ABC", "DEFGH", "I").
Is there an elegant way to do this?
(EDIT) These strings are NOT null-terminated; the 0 you see in the array is just the MSB. If the strings were long enough, the MSB would be greater than zero.

Edit: Updated based on the clarification in the comments that the first 2 bytes define an int, so I now convert them manually.
def convert(bs: List[Byte]): List[String] = {
  bs match {
    case count_b1 :: count_b2 :: t =>
      // Combine MSB and LSB; & 0xff treats each byte as unsigned
      val count = ((count_b1 & 0xff) << 8) | (count_b2 & 0xff)
      val (chars, leftover) = t.splitAt(count)
      new String(chars.toArray, "UTF-8") :: convert(leftover)
    case _ => List()
  }
}
Call it with convert(z.toList).
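The recursive convert above is not tail-recursive, so a very long input could overflow the stack. A minimal stack-safe sketch, assuming the data is available as a plain Array[Byte] (for an akka ByteString, z.toArray should give one), walks the array with an offset instead. The names decodeLengthPrefixed and encodeLengthPrefixed are hypothetical; the encoder exists only to build sample input.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.annotation.tailrec

// Decodes strings that are each prefixed by a 2-byte big-endian length (MSB, LSB).
def decodeLengthPrefixed(bytes: Array[Byte]): List[String] = {
  @tailrec
  def loop(pos: Int, acc: List[String]): List[String] =
    if (pos + 2 > bytes.length) acc.reverse // no complete length header left
    else {
      // & 0xff reinterprets each signed Byte as an unsigned value 0..255
      val len = ((bytes(pos) & 0xff) << 8) | (bytes(pos + 1) & 0xff)
      val start = pos + 2
      val end = math.min(start + len, bytes.length) // tolerate a truncated last entry
      loop(end, new String(bytes, start, end - start, UTF_8) :: acc)
    }
  loop(0, Nil)
}

// Hypothetical inverse (assumes each string is shorter than 65536 bytes), used to build input.
def encodeLengthPrefixed(strings: Seq[String]): Array[Byte] =
  strings.flatMap { s =>
    val b = s.getBytes(UTF_8)
    (b.length >> 8).toByte +: b.length.toByte +: b.toSeq
  }.toArray

val z = Array[Byte](0, 3, 65, 66, 67, 0, 5, 68, 69, 70, 71, 72, 0, 1, 73)
println(decodeLengthPrefixed(z)) // List(ABC, DEFGH, I)
```

Working on an index rather than splitting lists avoids copying, and the masking keeps lengths of 128 bytes or more from turning negative.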

Consider the multiSpan method as defined here, which repeatedly applies span over a given list:
z.multiSpan(_ == 0).map( _.drop(2).map(_.toChar).mkString )
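Here the spanning condition is whether an item equals 0; we then drop the two prefix bytes of each chunk and convert the remaining bytes to a String. This relies on every MSB being 0 (strings shorter than 256 bytes) and on no other byte being 0, and _.toChar maps each byte to one character instead of decoding UTF-8, so it only suits short ASCII strings.
Note: when using multiSpan, remember to import scala.annotation.tailrec.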

Here is my answer with foldLeft.
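import akka.util.ByteString

// State: (decoded strings in reverse, bytes of the current string,
//         0 = next byte is an MSB / -1 = next byte is an LSB / n > 0 = bytes left to read,
//         MSB of the current length)
def convert(z: ByteString) = z.foldLeft((List(): List[String], ByteString(), 0, 0))((p, b: Byte) => {
  p._3 match {
    case 0 if p._2.nonEmpty => (p._2.utf8String :: p._1, ByteString(), -1, b & 0xff)
    case 0 => (p._1, p._2, -1, b & 0xff)
    case -1 => (p._1, p._2, (p._4 << 8) + (b & 0xff), 0) // & 0xff: treat length bytes as unsigned
    case _ => (p._1, p._2 :+ b, p._3 - 1, 0)
  }
})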
It works like this:
scala> val bs = ByteString(0, 3, 'A', 'B', 'C', 0, 5, 'D', 'E', 'F', 'G', 'H',0,1,'I')
scala> val k = convert(bs); (k._2.utf8String :: k._1).reverse
k: (List[String], akka.util.ByteString, Int, Int) = (List(DEFGH, ABC),ByteString(73),0,0)
res20: List[String] = List(ABC, DEFGH, I)

Related

Reduce sequence by parts

I have a sequence Seq[T] and I want to do a partial reduce. For a Seq[Int], for instance, I want to get a Seq[Int] consisting of the sums of the longest monotonic regions. For example:
val s = Seq(1, 2, 4, 3, 2, -1, 0, 6, 8)
groupMonotionic(s) = Seq(1 + 2 + 4, 3 + 2 + (-1), 0 + 6 + 8)
I was looking for something like a conditional fold with the signature fold(z: B)((B, T) => B, (T, T) => Boolean), where the predicate determines where to terminate the current sum aggregation, but there seems to be nothing like that in Seq's trait hierarchy.
What would be a solution using Scala Collection API and without using mutable variables?
Here is one way amongst many to do this (using Scala 2.13's List.unfold):
// val items = Seq(1, 2, 4, 3, 2, -1, 0, 6, 8)
items match {
  case first :: _ :: _ => // If there are at least 2 items
    List
      .unfold(items.sliding(2).toList) { // Slide over the items to work on pairs of consecutive items
        case Nil => // No more items to unfold
          None // None signals the end of the unfold
        case rest @ Seq(a, b) :: _ => // Span based on the sign of a - b
          Some(rest.span(x => (x.head - x.last).signum == (a - b).signum))
      }
      .map(_.map(_.last)) // Back from slid pairs to single items
      match { case head :: rest => (first :: head) :: rest }
  case _ => // If there are 0 or 1 items
    items.map(List(_))
}
// List(List(1, 2, 4), List(3, 2, -1), List(0, 6, 8))
List.unfold iterates as long as the unfolding function returns Some. It starts from an initial state, here the list of consecutive pairs. At each iteration, we span the state (the remaining pairs) based on the sign of the difference within the first pair. The unfolded element is the leading run of pairs sharing the same monotonicity, and the new state is the remaining pairs.
List#span splits a list into a tuple: the first part is the longest prefix of elements satisfying the predicate, and the second part contains the rest. Wrapped in Some, this fits the expected return type of List.unfold's unfolding function, Option[(A, S)] (here Option[(List[Seq[Int]], List[Seq[Int]])]).
Int#signum returns -1, 0 or 1 depending on the sign of the integer it is applied to.
Note that the first item has to be added back to the result, because each pair is mapped to its last element and the first item is never the last element of a pair (match { case head :: rest => (first :: head) :: rest }).
To apply the reducing function (in this case a sum), we can map the final result: .map(_.sum)
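The interplay of List.unfold and span described above can be seen in isolation in a smaller sketch. It splits a list into runs of numbers with the same sign, a simpler criterion than monotonicity chosen only for illustration; sameSignRuns is a hypothetical name, and List.unfold requires Scala 2.13+.

```scala
// Each unfold step spans the remaining list into (run sharing the head's sign, rest).
// The run is emitted and the rest becomes the next state; Nil ends the unfold.
def sameSignRuns(xs: List[Int]): List[List[Int]] =
  List.unfold(xs) {
    case Nil => None
    case rest @ (h :: _) => Some(rest.span(x => math.signum(x) == math.signum(h)))
  }

println(sameSignRuns(List(3, 5, -1, -2, 4)))            // List(List(3, 5), List(-1, -2), List(4))
println(sameSignRuns(List(3, 5, -1, -2, 4)).map(_.sum)) // List(8, -3, 4)
```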
This version works in Scala 2.13+ with cats:
import scala.util.chaining._
import cats.data._
import cats.implicits._
val s = List(1, 2, 4, 3, 2, -1, 0, 6, 8)
def isLocalExtrema(a: List[Int]) =
  a.max == a(1) || a.min == a(1)
implicit class ListOps[T](ls: List[T]) {
  def multiSpanUntil(f: T => Boolean): List[List[T]] = ls.span(f) match {
    case (h, Nil) => List(h)
    case (h, t) => (h ::: t.take(1)) :: t.tail.multiSpanUntil(f)
  }
}
def groupMonotionic(groups: List[Int]) = groups match {
  case Nil => Nil
  case x if x.length < 3 => List(groups.sum)
  case _ =>
    groups
      .sliding(3).toList
      .map(isLocalExtrema)
      .pipe(false :: _ ::: List(false))
      .zip(groups)
      .multiSpanUntil(!_._1)
      .pipe(Nested.apply)
      .map(_._2)
      .value
      .map(_.sum)
}
println(groupMonotionic(s))
//List(7, 4, 14)
Here's one way using foldLeft to traverse the numeric list with a Tuple3 accumulator (listOfLists, prevElem, prevTrend) that stores the previous element and previous trend to conditionally assemble a list of lists in the current iteration:
val list = List(1, 2, 4, 3, 2, -1, 0, 6, 8) // needs at least 2 elements
val isUpward = (a: Int, b: Int) => a < b
val initTrend = isUpward(list.head, list.tail.head)
// Seed the accumulator with the first element so lol.head is never called on an empty list
val monotonicLists = list.tail.foldLeft((List(List(list.head)), list.head, initTrend)) {
  case ((lol, prev, prevTrend), curr) =>
    val currTrend = isUpward(prev, curr)
    if (currTrend == prevTrend)
      ((curr :: lol.head) :: lol.tail, curr, currTrend)
    else
      (List(curr) :: lol, curr, currTrend)
}._1.reverse.map(_.reverse)
// monotonicLists: List[List[Int]] = List(List(1, 2, 4), List(3, 2, -1), List(0, 6, 8))
To sum the individual nested lists:
monotonicLists.map(_.sum)
// res1: List[Int] = List(7, 4, 14)
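The question asked for something like a conditional fold. As a sketch (foldRuns and sameTrend are hypothetical names, not part of the Scala collections API), such a helper can be built on foldLeft: it starts a new run whenever the predicate says the next element does not continue the current run, then folds each run with the given operator. The predicate receives the current run with its most recent element first.

```scala
// Splits xs into runs and folds each run with op, starting from z.
// continues(run, next) decides whether next extends the current run (run is stored newest-first).
def foldRuns[T, B](xs: List[T])(z: B)(op: (B, T) => B)(continues: (List[T], T) => Boolean): List[B] = {
  val runs = xs.foldLeft(List.empty[List[T]]) {
    case (current :: done, x) if continues(current, x) => (x :: current) :: done
    case (acc, x) => List(x) :: acc // first element, or the run is broken
  }
  runs.reverse.map(run => run.reverse.foldLeft(z)(op))
}

// A run stays monotonic while the next step has the same sign as the previous step
val sameTrend: (List[Int], Int) => Boolean = {
  case (last :: beforeLast :: _, next) =>
    math.signum(next - last) == math.signum(last - beforeLast)
  case _ => true // a run of 0 or 1 elements can absorb any next element
}

println(foldRuns(List(1, 2, 4, 3, 2, -1, 0, 6, 8))(0)(_ + _)(sameTrend)) // List(7, 4, 14)
```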

Scala: concise syntax for constructing `Map` that assigns int-values to characters?

I want to define a Map that assigns values to letters like so:
'A', 'B', 'C' should be assigned value 1
'D', 'E', 'F' should be assigned value 2
etc.
Here is what I tried:
def lettersAndValues = Map(
1 -> Set('A', 'B', 'C'),
2 -> Set('D', 'E', 'F'),
).flatMap {case (value, letters) => letters.map(letter =>(letter, value))}
Now I want to use the values of the letters to compute a score for words; for instance, the value of "ABCD" should be 1+1+1+2 = 5. How can I define the score function? Are there other, more concise ways to assign values to letters for calculations?
If your goal is to quickly define values of many letters, and then define a score function, here is a shorter way to do this:
val letterToValue = List(
"ABC" -> 1,
"DEF" -> 2
).flatMap{
case (letters, value) => letters.map(letter => (letter, value))
}.toMap
def score(word: String) = word.map(letterToValue).sum
println(score("BED"))
println(score("BAD"))
println(score("CAFEBABE"))
It prints:
5
4
11
A for-comprehension approach:
val generatorMap = Map(
"ABC" -> 1,
"DEF" -> 2
)
val letterToValue: Map[Char, Int] = for {
(ls, v) <- generatorMap
l <- ls
} yield {
(l, v)
}
def score(word: String) = word.map(letterToValue).sum
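If the pattern continues through the whole alphabet in groups of three (an assumption; the question only shows A to F and "etc."), the map can be generated rather than written out. This sketch also uses getOrElse, so characters without a value, such as spaces, count as 0 instead of throwing the NoSuchElementException that Map#apply raises for a missing key.

```scala
// A-C -> 1, D-F -> 2, ..., Y-Z -> 9 (assumes consecutive groups of three letters)
val letterToValue: Map[Char, Int] =
  ('A' to 'Z').grouped(3).zipWithIndex.flatMap {
    case (letters, i) => letters.map(letter => letter -> (i + 1))
  }.toMap

// Characters missing from the map contribute 0 rather than failing
def score(word: String): Int = word.map(c => letterToValue.getOrElse(c, 0)).sum

println(score("ABCD")) // 5
```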

scala intersection with count

I have a simple question. Suppose I have 2 RDDs:
RDD1: [a,b,b,c,c,c,d] RDD2:[a,b,c,d]
and I want to find out how many a, b, c, d there are, such that the returned result is something like:
RDD:[(a,b,c,d),(1,2,3,1)]
This is easy with Lists, but with RDDs I seem to have to collect them into an Array first and then do something like:
count(_==string)
Is there something easier I could use?
I have very little knowledge of RDDs or Spark, but in plain Scala you can try something like this:
val l1 = List('a', 'b', 'c', 'd')
val l2 = List('a', 'b', 'b', 'c', 'c', 'c', 'd')
def f(l1: List[Char], l2: List[Char]):(List[Char],List[Int]) = {
val count = l1.map {
x => l2.count(_ == x)
}.toList
(l1, count)
}
f(l1,l2)
Output in the REPL:
res0: (List[Char], List[Int]) = (List(a, b, c, d),List(1, 2, 3, 1))
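The version above scans l2 once per element of l1. A single-pass sketch with groupMapReduce (Scala 2.13+) counts every value once and then looks the counts up, reporting 0 for values that never occur; countOccurrences is a hypothetical name. For an actual RDD, the analogous Spark operations would be rdd.countByValue(), which returns the counts to the driver as a Map, or map(x => (x, 1)).reduceByKey(_ + _), which keeps them distributed; those are not exercised here.

```scala
def countOccurrences[A](keys: List[A], data: List[A]): (List[A], List[Int]) = {
  // One pass over data: value -> number of occurrences
  val counts: Map[A, Int] = data.groupMapReduce(identity)(_ => 1)(_ + _)
  (keys, keys.map(k => counts.getOrElse(k, 0)))
}

val l1 = List('a', 'b', 'c', 'd')
val l2 = List('a', 'b', 'b', 'c', 'c', 'c', 'd')
println(countOccurrences(l1, l2)) // (List(a, b, c, d),List(1, 2, 3, 1))
```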

padTo error inside a foldLeft

I'm teaching myself Scala, and one of the small test applications I wrote just isn't working the way I expect. Can someone please help me understand why it is failing?
My small test application consists of a "decompress" method that does the following "decompression":
val testList = List(Tuple2(4, 'a'), Tuple2(1, 'b'), Tuple2(2, 'c'), Tuple2(2, 'a'), Tuple2(1, 'd'), Tuple2(4, 'e'))
require(decompress(testList) == List('a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e'))
In other words, the Tuple2 objects should just be "decompressed" into a more verbose form. Yet all I get back from the method is List('a', 'a', 'a', 'a'): the padTo statement works for the first Tuple2 but then seems to suddenly stop working. If I instead do the padding per element using a for loop, everything works. Why?
The full code:
object P12 extends App {
  def decompress(tList: List[Tuple2[Int,Any]]): List[Any] = {
    val startingList: List[Any] = List();
    val newList = tList.foldLeft(startingList)((b,a) => {
      val padCount = a._1;
      val padElement = a._2;
      println
      println(" Current list: " + b)
      println(" Current padCount: " + padCount)
      println(" Current padElement: " + padElement)
      println(" Padded using padTo: " + b.padTo(padCount, padElement))
      println
      // This doesn't work
      b.padTo(padCount, padElement)
      // // This works, yay
      // var tmpNewList = b;
      // for (i <- 1 to padCount)
      //   tmpNewList = tmpNewList :+ padElement
      // tmpNewList
    })
    newList
  }
  val testList = List(Tuple2(4, 'a'), Tuple2(1, 'b'), Tuple2(2, 'c'), Tuple2(2, 'a'), Tuple2(1, 'd'), Tuple2(4, 'e'))
  require(decompress(testList) == List('a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e'))
  println("Everything is okay!")
}
Any help appreciated - learning Scala, just can't figure out this problem on my own with my current Scala knowledge.
The problem is that padTo fills the list up to a given total size. So the first time it works, padding to 4 elements, but in later steps you have to add the current length of the list, hence:
def decompress(tList: List[Tuple2[Int,Any]]): List[Any] = {
val newList = tList.foldLeft(List[Any]())((b,a) => {
b.padTo(a._1+b.length, a._2)
})
newList
}
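A quick check of padTo's semantics makes the difference concrete: the first argument is the target total length, not the number of elements to add.

```scala
val acc = List('a', 'a', 'a', 'a')

// The target length 1 is already exceeded, so the list comes back unchanged
println(acc.padTo(1, 'b')) // List(a, a, a, a)

// Adding the current length turns "pad to n" into "append n copies"
println(acc.padTo(acc.length + 1, 'b')) // List(a, a, a, a, b)
```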
You could do your decompress like this:
val list = List(Tuple2(4, 'a'), Tuple2(1, 'b'), Tuple2(2, 'c'), Tuple2(2, 'a'), Tuple2(1, 'd'), Tuple2(4, 'e'))
list.flatMap{case (times, value) => Seq.fill(times)(value)}
This works:
scala> testList.foldLeft(List[Char]()){ case (xs, (count, elem)) => xs ++ List(elem).padTo(count, elem)}
res7: List[Char] = List(a, a, a, a, b, c, c, a, a, d, e, e, e, e)
The problem is that b.padTo(padCount, padElement) pads the accumulator b up to a total length of padCount rather than appending padCount elements. Because the first tuple already produces the longest list (4 elements) and no later count exceeds 4, nothing is added in the later steps of foldLeft. If you change the second tuple so that its count exceeds the current length, you will see a change:
scala> val testList = List(Tuple2(3, 'a'), Tuple2(4, 'b'))
testList: List[(Int, Char)] = List((3,a), (4,b))
scala> testList.foldLeft(List[Char]()){ case (xs, (count, elem)) => xs.padTo(count, elem)}
res11: List[Char] = List(a, a, a, b)
Instead of foldLeft you can also use flatMap to generate the elements:
scala> testList flatMap { case (count, elem) => List(elem).padTo(count, elem) }
res8: List[Char] = List(a, a, a, a, b, c, c, a, a, d, e, e, e, e)
By the way, Tuple2(3, 'a') can be written as (3, 'a') or 3 -> 'a'.
Note that padTo doesn't work as expected when you have data with a count of <= 0:
scala> List(0 -> 'a') flatMap { case (count, elem) => List(elem).padTo(count, elem) }
res31: List[Char] = List(a)
Thus use the solution mentioned by Garret Hall:
def decompress[A](xs: Seq[(Int, A)]) =
xs flatMap { case (count, elem) => Seq.fill(count)(elem) }
scala> decompress(List(2 -> 'a', 3 -> 'b', 2 -> 'c', 0 -> 'd'))
res34: Seq[Char] = List(a, a, b, b, b, c, c)
scala> decompress(List(2 -> 0, 3 -> 1, 2 -> 2))
res35: Seq[Int] = List(0, 0, 1, 1, 1, 2, 2)
A generic type signature is preferable here, because it always returns a collection of the correct element type (Seq[Char], Seq[Int]) instead of Seq[Any].

In Scala, is it possible to zip two lists of differing sizes?

For example, suppose I have
val letters = ('a', 'b', 'c', 'd', 'e')
val numbers = (1, 2)
Is it possible to produce a list
(('a',1), ('b',2), ('c',1),('d',2),('e',1))
Your letters and numbers are tuples, not lists, so let's fix that:
scala> val letters = List('a', 'b', 'c', 'd', 'e')
letters: List[Char] = List(a, b, c, d, e)
scala> val numbers = List(1,2)
numbers: List[Int] = List(1, 2)
Now, if we zip them, we don't get the desired result:
scala> letters zip numbers
res11: List[(Char, Int)] = List((a,1), (b,2))
But that suggests that if numbers were repeated infinitely, the problem would be solved:
scala> letters zip (Stream continually numbers).flatten
res12: List[(Char, Int)] = List((a,1), (b,2), (c,1), (d,2), (e,1))
Unfortunately, that relies on knowing that numbers is shorter than letters. To make it work either way:
scala> ((Stream continually letters).flatten zip (Stream continually numbers).flatten take (letters.size max numbers.size)).toList
res13: List[(Char, Int)] = List((a,1), (b,2), (c,1), (d,2), (e,1))
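Stream has been deprecated since Scala 2.13 in favor of LazyList. A sketch of the same cycling idea with Iterator.continually, which does not memoize the cycled elements, also guards against an empty list: flattening an endless repetition of an empty list would search forever for a next element. The name zipCycled is hypothetical.

```scala
def zipCycled[A, B](as: List[A], bs: List[B]): List[(A, B)] =
  if (as.isEmpty || bs.isEmpty) Nil // cycling an empty list would never yield an element
  else {
    val n = as.length max bs.length
    // Cycle both lists and take as many pairs as the longer list has elements
    Iterator.continually(as).flatten.zip(Iterator.continually(bs).flatten).take(n).toList
  }

println(zipCycled(List('a', 'b', 'c', 'd', 'e'), List(1, 2)))
// List((a,1), (b,2), (c,1), (d,2), (e,1))
```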
The shorter of the lists needs to be repeated indefinitely. In this case it's obvious that numbers is shorter, but in case you need it to work in general, here is how you can do it:
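def zipLongest[A, B](list1: List[A], list2: List[B]): List[(A, B)] =
  if (list1.isEmpty || list2.isEmpty) Nil // cycling an empty list would never terminate
  else if (list1.size < list2.size)
    (Stream.continually(list1).flatten zip list2).toList
  else
    list1 zip Stream.continually(list2).flatten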
val letters = List('a', 'b', 'c', 'd', 'e')
val numbers = List(1, 2)
println(zipLongest(letters, numbers))
You could write a simple one-liner using map:
val letters = List('a', 'b', 'c', 'd', 'e')
val numbers = List(1, 2)
val longZip1 = letters.zipWithIndex.map( x => (x._1, numbers(x._2 % numbers.length)) )
//or, using a for loop
//for (x <- letters.zipWithIndex) yield (x._1, numbers(x._2 % numbers.size))
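And if your lists are much longer, convert them to arrays first, since indexing into a List takes linear time while array indexing is constant time: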
val letters = List('a', 'b', 'c', 'd', 'e' /* 'f', ...*/)
val numbers = List(1, 2 /* 3, ... */)
val (longest, shortest) = (letters.toArray, numbers.toArray)
val longZip1 = longest
.zipWithIndex
.map(x => (x._1, shortest(x._2 % shortest.length)))
If, however, you do not want to reuse any of the list data, you need to know ahead of time what to fill the gaps with:
val result = (0 to (Math.max(list1.size, list2.size) - 1)) map { index =>
(list1.lift(index).getOrElse(valWhen1Empty),list2.lift(index).getOrElse(valWhen2Empty))
}
This will not work with infinite lists or streams, of course, since it needs their sizes.
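The standard library already covers this padding case: zipAll(that, thisElem, thatElem) pairs the two collections and fills whichever side runs out with the given placeholder. The placeholders '-' and 0 below are arbitrary choices for illustration.

```scala
val letters = List('a', 'b', 'c', 'd', 'e')
val numbers = List(1, 2)

// Missing numbers are filled with 0; missing letters (none here) would be '-'
val padded = letters.zipAll(numbers, '-', 0)
println(padded) // List((a,1), (b,2), (c,0), (d,0), (e,0))
```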