Scala library to convert numbers (Int, Long, Double) to/from Array[Byte]

As the title says, is there any Scala library that exports functions to convert, preferably fluently, a byte array to an Int, to a Long or to a Double?
I need something compatible with 2.9.1 and FOSS.
If you happen to know exactly what I need and where to find it, a line for SBT and a line for an example will be enough! :)
If there's no such thing as what I'm looking for, the closest thing in Java will also work...

You can use Java NIO's ByteBuffer:
import java.nio.ByteBuffer
ByteBuffer.wrap(Array[Byte](1, 2, 3, 4)).getInt                // 16909060 (0x01020304, big-endian)
ByteBuffer.wrap(Array[Byte](1, 2, 3, 4, 5, 6, 7, 8)).getDouble // 8.20788039913184E-304
ByteBuffer.wrap(Array[Byte](1, 2, 3, 4, 5, 6, 7, 8)).getLong   // 72623859790382856
No extra dependencies required.
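Going the other direction is symmetric; a minimal sketch using the same API:
import java.nio.ByteBuffer
// Int -> Array[Byte] (ByteBuffer is big-endian by default)
ByteBuffer.allocate(4).putInt(16909060).array()   // Array(1, 2, 3, 4)
// Long and Double work the same way with 8-byte buffers
ByteBuffer.allocate(8).putLong(72623859790382856L).array()
ByteBuffer.allocate(8).putDouble(12.34).array()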

You can also use BigInt from the Scala standard library.
import scala.math.BigInt
val bytearray = BigInt(1337).toByteArray
val int = BigInt(bytearray).toInt // BigInt(bytearray) gives a BigInt, so convert back explicitly
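One caveat: toByteArray produces a minimal-length, big-endian, two's-complement array, so the width varies with the value and a sign byte can appear:
BigInt(1337).toByteArray // Array(5, 57): two bytes, not four
BigInt(255).toByteArray  // Array(0, -1): leading sign byte
BigInt(-1).toByteArray   // Array(-1)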

Java's nio.ByteBuffer is the way to go for now:
val bb = java.nio.ByteBuffer.allocate(4)
val i = 5
bb.putInt(i)
bb.flip // now can read instead of writing
val j = bb.getInt
bb.clear // ready to go again
You can also put arrays of bytes, etc.
Keep in mind the little/big-endian thing. bb.order(java.nio.ByteOrder.nativeOrder) is probably what you want.
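For example, to fix the byte order explicitly (the default is big-endian):
import java.nio.{ByteBuffer, ByteOrder}
val bb = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
bb.putInt(5)
bb.flip()         // switch from writing to reading
val j = bb.getInt // 5, as long as reader and writer agree on the order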

For Double <-> ByteArray, you can use java.lang.Double.doubleToLongBits and java.lang.Double.longBitsToDouble.
def doubleToByteArray(x: Double) = {
  val l = java.lang.Double.doubleToLongBits(x)
  val a = Array.fill(8)(0.toByte)
  for (i <- 0 to 7) a(i) = ((l >> ((7 - i) * 8)) & 0xff).toByte
  a
}
def byteArrayToDouble(x: Array[Byte]) = {
  var res = 0L
  for (i <- 0 to 7) {
    res += ((x(i) & 0xff).toLong << ((7 - i) * 8))
  }
  java.lang.Double.longBitsToDouble(res)
}
scala> val x = doubleToByteArray(12.34)
x: Array[Byte] = Array(64, 40, -82, 20, 122, -31, 71, -82)
scala> val y = byteArrayToDouble(x)
y: Double = 12.34
Or ByteBuffer can be used:
import java.nio.ByteBuffer
def doubleToByteArray(x: Double) = {
  val l = java.lang.Double.doubleToLongBits(x)
  ByteBuffer.allocate(8).putLong(l).array()
}
def byteArrayToDouble(x: Array[Byte]) = ByteBuffer.wrap(x).getDouble
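Since ByteBuffer also has putDouble/getDouble, the doubleToLongBits round-trip can be skipped entirely:
import java.nio.ByteBuffer
def doubleToByteArray(x: Double): Array[Byte] =
  ByteBuffer.allocate(8).putDouble(x).array()
def byteArrayToDouble(x: Array[Byte]): Double =
  ByteBuffer.wrap(x).getDouble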

The following worked for me using Scala:
import org.apache.kudu.client.Bytes
Bytes.getFloat(valueToConvert)

You can also use:
Bytes.toInt(byteArray)
Worked like a charm!
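For reference, Bytes comes from the Apache Kudu client, so this needs that dependency on the classpath; the SBT line should look something like the following (the version is an assumption, check for the latest release):
libraryDependencies += "org.apache.kudu" % "kudu-client" % "1.17.0"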

Related

How to convert an Array of Array containing Integer to a Scala Spark List/Seq?

I am new to Scala and Spark. I have an array-like string "[[1,2,100], [1, 2, 111]]" and I don't know how to convert it into a Scala List or Seq; I could not find a solution.
I tried circe's parse method, but it did not help me:
val e = parse(json_string).getOrElse(Json.Null)
e.asArray.foreach(l => {
  println(l)
})
val r = "\\[(:?\\d+,? *)+\\]".r
r.findAllMatchIn(s).map { m =>
val s = m.toString
s.substring(1, s.length - 1).split(", *").map(_.toInt)
}.toArray
For your example, it produces:
res26: Array[Array[Int]] = Array(Array(1, 2, 100), Array(1, 2, 111))
Not sure what you want to do with the result after extracting it
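Since you already tried circe, it's worth noting that circe can decode the nested structure directly; a minimal sketch, assuming circe-parser is on the classpath:
import io.circe.parser.decode
val json_string = "[[1,2,100], [1, 2, 111]]"
// circe derives a Decoder[List[List[Int]]] from its built-in instances
val result = decode[List[List[Int]]](json_string)
// result: Either[io.circe.Error, List[List[Int]]] = Right(List(List(1, 2, 100), List(1, 2, 111)))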

Scala hex string to bytes

Is there a neat way in Scala to convert a hexadecimally encoded String to a protobuf ByteString (and back again)?
You can use DatatypeConverter (no additional dependencies on Java 8, though note that javax.xml.bind was removed from the JDK in Java 11) as:
import com.google.protobuf.ByteString
import javax.xml.bind.DatatypeConverter
val hexString: String = "87C2D268483583714CD5"
val byteString: ByteString = ByteString.copyFrom(
  DatatypeConverter.parseHexBinary(hexString)
)
val originalString = DatatypeConverter.printHexBinary(byteString.toByteArray)
You can use java.math.BigInteger to parse the String, get the Array[Byte], and from there turn it into a ByteString. Here's the first step:
import java.math.BigInteger
val s = "f263575e7b00a977a8e9a37e08b9c215feb9bfb2f992b2b8f11e"
val bs = new BigInteger(s, 16).toByteArray
The content of bs is now:
Array(0, -14, 99, 87, 94, 123, 0, -87, 119, -88, -23, -93, 126, 8, -71, -62, 21, -2, -71, -65, -78, -7, -110, -78, -72, -15, 30)
Note the leading 0: BigInteger.toByteArray adds a sign byte when the high bit of the value is set, so you may need to strip it. You can then use (for example) ByteString.copyFrom to turn it into a ByteString.
Starting with Java 17, you can use the standard java.util.HexFormat API for parsing hex strings into a byte array.
import java.util.HexFormat
HexFormat.of.parseHex("d719af")
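The same class handles the reverse direction:
HexFormat.of.formatHex(Array[Byte](-41, 25, -81)) // "d719af"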
Since the title of the question doesn't mention protobuf: if anyone is looking for a solution without any dependencies that converts a hex String to Seq[Byte] for arrays of any size (don't forget to add input validation as necessary):
val zeroChar: Byte = '0'.toByte
val aChar: Byte = 'a'.toByte
def toHex(bytes: Seq[Byte]): String = bytes.map(b => f"$b%02x").mkString
def toBytes(hex: String): Seq[Byte] = {
  val lowerHex = hex.toLowerCase
  val (result: Array[Byte], startOffset: Int) =
    if (lowerHex.length % 2 == 1) {
      // Odd
      val r = new Array[Byte]((lowerHex.length >> 1) + 1)
      r(0) = toNum(lowerHex(0))
      (r, 1)
    } else {
      // Even
      (new Array[Byte](lowerHex.length >> 1), 0)
    }
  var inputIndex = startOffset
  var outputIndex = startOffset
  while (outputIndex < result.length) {
    val byteValue = (toNum(lowerHex(inputIndex)) * 16) +
      toNum(lowerHex(inputIndex + 1))
    result(outputIndex) = byteValue.toByte
    inputIndex += 2
    outputIndex += 1
  }
  result
}
def toNum(lowerHexChar: Char): Byte =
  (if (lowerHexChar < 'a') lowerHexChar.toByte - zeroChar
   else 10 + lowerHexChar.toByte - aChar).toByte
https://scalafiddle.io/sf/PZPHBlT/2
A simple solution without any dependency or intermediate object could be:
def toBytes(hex: String): Seq[Byte] = {
  assert(hex.length % 2 == 0) // only handle the canonical even-length case
  hex.sliding(2, 2).map(Integer.parseInt(_, 16).toByte).toSeq
}
assert(toBytes("1234") == Seq[Byte](18, 52))

How to count the number of occurrences of an element with Scala/Spark?

I had a file that contained a list of elements like this
00|905000|20160125204123|79644809999||HGMTC|1||22|7905000|56321647569|||34110|I||||||250995210056537|354805064211510||56191|||38704||A|||11|V|81079681404134|5||||SE|||G|144|||||||||||||||Y|b00534589.huawei_anadyr.20151231184912||1|||||79681404134|0|||+##+1{79098509982}2{2}3{2}5{79644809999}6{0000002A7A5AC635}7{79681404134}|20160125|
Through a series of steps, I managed to convert it to a list of elements like this
(902996760100000,CompactBuffer(6, 5, 2, 2, 8, 6, 5, 3))
Where 905000 and 902996760100000 are keys and 6, 5, 2, 2, 8, 6, 5, 3 are values. Values can be numbers from 1 to 8. Is there any way to count the number of occurrences of these values using Spark, so the result looks like this?
(902996760100000, 0_1, 2_2, 1_3, 0_4, 2_5, 2_6, 0_7, 1_8)
I could do it with if-else blocks and such, but that won't be pretty, so I wondered whether there are any instruments in Scala/Spark I could use.
This is my code.
class ScalaJob(sc: SparkContext) {
  def run(cdrPath: String): RDD[(String, Iterable[String])] = {
    // read the file
    val fileCdr = sc.textFile(cdrPath)
    // find values in every raw CDR line
    val valuesCdr = fileCdr.map { dataRaw =>
      val p = dataRaw.split("[|]", -1)
      (p(1), ScalaJob.processType(ScalaJob.processTime(p(2)) + "_" + p(32)))
    }
    val x = valuesCdr.groupByKey()
    return x
  }
}
Any advice on optimizing it would be appreciated. I'm really new to scala/spark.
First, Scala is a type-safe language and so is Spark's RDD API - so it's highly recommended to use the type system instead of going around it by "encoding" everything into Strings.
So I'll suggest a solution that creates an RDD[(String, Seq[(Int, Int)])] (with the second item in each tuple being a sequence of (ID, count) pairs), rather than an RDD[(String, Iterable[String])], which seems less useful.
Here's a simple function that counts the occurrences of 1 to 8 in a given Iterable[Int]:
def countValues(l: Iterable[Int]): Seq[(Int, Int)] = {
  (1 to 8).map(i => (i, l.count(_ == i)))
}
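Applied to the sample values from the question:
countValues(Seq(6, 5, 2, 2, 8, 6, 5, 3))
// Vector((1,0), (2,2), (3,1), (4,0), (5,2), (6,2), (7,0), (8,1))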
You can use mapValues with this function (place the function in the object for serializability, like you did with the rest) on an RDD[(String, Iterable[Int])] to get the result:
valuesCdr.groupByKey().mapValues(ScalaJob.countValues)
The entire solution can then be simplified a bit:
// assuming DateTime/DateTimeFormat come from joda-time, as in the original code
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

class ScalaJob(sc: SparkContext) {
  import ScalaJob._
  def run(cdrPath: String): RDD[(String, Seq[(Int, Int)])] = {
    val valuesCdr = sc.textFile(cdrPath)
      .map(_.split("\\|"))
      .map(p => (p(1), processType(processTime(p(2)), p(32))))
    valuesCdr.groupByKey().mapValues(countValues)
  }
}

object ScalaJob {
  val dayParts = Map((6 to 11) -> 1, (12 to 18) -> 2, (19 to 23) -> 3, (0 to 5) -> 4)

  def processTime(s: String): Int = {
    val hour = DateTime.parse(s, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getHourOfDay
    dayParts.filterKeys(_.contains(hour)).values.head
  }

  def processType(dayPart: Int, s: String): Int = s match {
    case "S" => 2 * dayPart - 1
    case "V" => 2 * dayPart
  }

  def countValues(l: Iterable[Int]): Seq[(Int, Int)] = {
    (1 to 8).map(i => (i, l.count(_ == i)))
  }
}

Reading in a .csv file to scala while preserving some of the structure

I have just started Scala and came from Python.
I would like to read in a '|' delimited file and preserve the structure of the tables. Say I have a file that contains something like this:
1|2|3|4
5|6|7|8
9|10|11|12
I would like a function that would return a structure like this:
List(List(1, 2, 3, 4), List(5, 6, 7, 8), List(9, 10, 11, 12))
My code thus far (doesn't work because of type mismatch):
import scala.io.Source
def CSVReader(absPath: String, delimiter: String): List[List[Any]] = {
  println("Now reading... " + absPath)
  val MasterList = Source.fromFile(absPath).getLines().toList
  return MasterList
}
var ALHCorpus = "//Users//grant//devel//Scala-codes//ALHCorpusList"
var delimiter = "|"
var CSVContents = CSVReader(ALHCorpus, delimiter)
I would just use a CSV library for this sort of thing. When I had to do something similar, I used scala-csv.
If you do not want to do that though, couldn't you simply split by your delimiter? I.e.,
import scala.io.Source
def CSVReader(absPath: String, delimiter: String): List[List[Any]] = {
  println("Now reading... " + absPath)
  val MasterList = Source.fromFile(absPath).getLines().toList map {
    // String#split() takes a regex, thus the escaping.
    // (java.util.regex.Pattern.quote(delimiter) would be more robust.)
    _.split("""\""" + delimiter).toList
  }
  return MasterList
}
var ALHCorpus = "//Users//grant//devel//Scala-codes//ALHCorpusList"
var delimiter = "|" // I changed your delimiter to pipe since that's what's in your sample data.
var CSVContents = CSVReader(ALHCorpus, delimiter)
To start with I would try to let the type be inferred by not specifying a return type. Once you get the proper results then start constraining the return type and adjusting what CSVContents returns accordingly. This will fix the type error.
def CSVReader(absPath:String, delimiter:String) = { ...}
CSVContents then returns this:
scala> CSVContents
res0: List[String] = List(1|2|3|4, 5|6|7|8, 9|10|11|12)
One way to go from res0 to List[List[Any]] is with a regular expression to greedily extract digits. The regular expression for this is simply "\\d+".r in Scala.
val digitRegex = "\\d+".r
var CSVContents = CSVReader(ALHCorpus, delimiter).map(x => digitRegex.findAllIn(x).toList)
Now CSVContents becomes this:
CSVContents: List[List[String]] = List(List(1, 2, 3, 4), List(5, 6, 7, 8), List(9, 10, 11, 12))
Assuming a Seq of tuples would be acceptable (and looking at your comments this is probably what you want), you can do this with product-collections, which uses opencsv internally.
scala> CsvParser[Int,Int,Int,Int].parseFile("x", delimiter="|")
res2: org.catch22.collections.immutable.CollSeq4[Int,Int,Int,Int] =
CollSeq((1,2,3,4),
(5,6,7,8),
(9,10,11,12))

How does Scala's mutable Map update [map(key) = newValue] syntax work?

I'm working through Cay Horstmann's Scala for the Impatient book where I came across this way of updating a mutable map.
scala> val scores = scala.collection.mutable.Map("Alice" -> 10, "Bob" -> 3, "Cindy" -> 8)
scores: scala.collection.mutable.Map[String,Int] = Map(Bob -> 3, Alice -> 10, Cindy -> 8)
scala> scores("Alice") // retrieve the value of type Int
res2: Int = 10
scala> scores("Alice") = 5 // Update the Alice value to 5
scala> scores("Alice")
res4: Int = 5
It looks like scores("Alice") hits apply in MapLike.scala. But this only returns the value, not something that can be updated.
Out of curiosity I tried the same syntax on an immutable map and was presented with the following error,
scala> val immutableScores = Map("Alice" -> 10, "Bob" -> 3, "Cindy" -> 8)
immutableScores: scala.collection.immutable.Map[String,Int] = Map(Alice -> 10, Bob -> 3, Cindy -> 8)
scala> immutableScores("Alice") = 5
<console>:9: error: value update is not a member of scala.collection.immutable.Map[String,Int]
immutableScores("Alice") = 5
^
Based on that, I'm assuming that scores("Alice") = 5 is transformed into scores update ("Alice", 5) but I have no idea how it works, or how it is even possible.
How does it work?
This is an example of Scala's apply/update syntax.
When you call map("Something") this calls map.apply("Something") which in turn calls get.
When you call map("Something") = "SomethingElse" this calls map.update("Something", "SomethingElse") which in turn calls put.
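In other words, the compiler rewrites application and assignment syntactically; a quick illustration:
val scores = scala.collection.mutable.Map("Alice" -> 10)
scores("Alice")          // desugars to scores.apply("Alice")
scores("Alice") = 5      // desugars to scores.update("Alice", 5)
scores.update("Bob", 3)  // calling update directly also works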
You can also try something like this to update a Map of Lists:
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._
import scala.collection.concurrent

val map: concurrent.Map[String, List[String]] = new ConcurrentHashMap[String, List[String]].asScala

def updateMap(key: String, map: concurrent.Map[String, List[String]], value: String): Unit = {
  map.get(key) match {
    case Some(list: List[String]) =>
      val new_list = value :: list
      map.put(key, new_list)
    case None => map += (key -> List(value))
  }
}
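For example:
updateMap("k1", map, "a")
updateMap("k1", map, "b")
map.get("k1") // Some(List(b, a)): new values are prepended
Note that the get-then-put sequence here is not atomic, so two concurrent callers can still lose an update even on a concurrent map.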
The problem is that you're trying to update a map that has no update method (the same error appears for immutable maps). I had the same error message when my map was declared as
var m = new java.util.HashMap[String, Int]
But when I replaced the definition with
var m = new scala.collection.mutable.HashMap[String, Int]
the m.update call worked.