How can I determine if Akka bytestring contains a given substring? - scala

Given a sample text file, how can one use Akka ByteStrings and either convert it to plain text or run a "find" on the ByteString itself?
val file = new File("sample.txt")
val fileSource = SynchronousFileSource(file, 4096)
val messageStream = fileSource.map(chunk => sendMessage(chunk.toString()))
messageStream.to(Sink.foreach(println(_))).run
The "toString()" functionality above literally spits out a string containing the text "ByteString", followed by bytes represented as integers. For example:
chunk.toString() ==> "ByteString(111, 112, 119, 111)"

You can use containsSlice to check whether one ByteString contains another.
scala> import akka.util.ByteString;
import akka.util.ByteString
scala> val target = ByteString("hello world");
target: akka.util.ByteString = ByteString(104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100)
scala> val sub = ByteString("world")
sub: akka.util.ByteString = ByteString(119, 111, 114, 108, 100)
scala> target.containsSlice(sub)
res0: Boolean = true
If you want to convert an akka.util.ByteString to a String, you can use decodeString:
scala> ByteString("hello").decodeString("UTF-8")
res3: String = hello
See the doc for more detail: http://doc.akka.io/api/akka/2.3.13/index.html#akka.util.ByteString
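One thing the streamed example in the question may run into: the file is read in 4096-byte chunks, so a match can straddle two chunks and a per-chunk containsSlice would miss it. The sketch below carries the last few bytes of each chunk into the next one. It uses plain Array[Byte] chunks in place of Akka ByteStrings so that it runs without Akka; ChunkSearch and containsAcrossChunks are hypothetical names, not part of any library.

```scala
object ChunkSearch {
  /** True if `needle` occurs anywhere in the concatenation of `chunks`. */
  def containsAcrossChunks(chunks: Iterator[Array[Byte]], needle: Array[Byte]): Boolean = {
    if (needle.isEmpty) return true
    val overlap = needle.length - 1
    var carry = Array.empty[Byte] // the last `overlap` bytes seen so far
    while (chunks.hasNext) {
      // Prepend the carried bytes so a match split across two chunks is visible.
      val window = carry ++ chunks.next()
      if (window.toSeq.containsSlice(needle.toSeq)) return true
      carry = window.takeRight(overlap)
    }
    false
  }
}
```

For example, searching for "world" in the chunks "hello wo" and "rld" succeeds even though neither chunk contains the word on its own.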

Related

What is the error "not found: type Score"?

main.scala:
import java.time.LocalDate
object OptionAnswer {
def main(args: Array[String]): Unit = {
case class Score(
name: String, // student's name
english: Int, // English score
math: Int, // math score
science: Int, // science score
date: LocalDate // exam date
)
val scoreOfAlice = Score(name = "Alice", english = 77, math = 74, science = 26, date = LocalDate.of(2020, 1, 30))
val scoreOfBob = Score(name = "Bob", english = 100, math = 74, science = 14, date = LocalDate.of(2020, 1, 26))
val scoreOfCharlie = Score(name = "Charlie", english = 100, math = 74, science = 99, date = LocalDate.of(2020, 1, 26))
val scoreOfDave = Score(name = "Dave", english = 50, math = 81, science = 88, date = LocalDate.of(2020, 1, 30))
val scoreSeq: Seq[Score] = List(scoreOfAlice,scoreOfBob,scoreOfCharlie,scoreOfDave)
println(getTotalRanking(scoreSeq))
}
def getTotalRanking(scoreSeq: Seq[Score]): Seq[String] = {
val p: Seq[Score] = scoreSeq.sortBy(score => score.english+score.math+score.science)(Ordering.Int.reverse)
p.map(score=>score.english+score.math+score.science)
}
}
error:
not found: type Score
[error] def getTotalRanking(scoreSeq: Seq[Score]): Seq[String] = {
[error]
not found: type Score
[error] val p: Seq[Score] = scoreSeq.sortBy(score => score.english+score.math+score.science)(Ordering.Int.reverse)
[error] ^
[error] two errors found
How do I resolve this?
The case class Score is defined inside the main method, so it is not visible outside that method. Move the definition out of main, for example to the level of the object.
You can read about scopes in Scala. Likewise, if you define val i inside a method, the name i is not visible outside that method either.
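A minimal sketch of that change, assuming the rest of the program stays as in the question. Score now lives at object level, so getTotalRanking can see it. The helper total and the "name: total" output format are assumptions: the original maps to Int values while declaring Seq[String], which would be the next compile error.

```scala
import java.time.LocalDate

object OptionAnswer {
  // Defined at object level, so every method of OptionAnswer can refer to it.
  case class Score(name: String, english: Int, math: Int, science: Int, date: LocalDate)

  // Hypothetical helper: the sum of the three subject scores.
  def total(score: Score): Int = score.english + score.math + score.science

  // Highest total first; each entry is rendered as "name: total".
  def getTotalRanking(scoreSeq: Seq[Score]): Seq[String] =
    scoreSeq
      .sortBy(s => total(s))(Ordering.Int.reverse)
      .map(s => s"${s.name}: ${total(s)}")

  def main(args: Array[String]): Unit = {
    val scores = List(
      Score("Alice", 77, 74, 26, LocalDate.of(2020, 1, 30)),
      Score("Bob", 100, 74, 14, LocalDate.of(2020, 1, 26))
    )
    println(getTotalRanking(scores))
  }
}
```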

Http Load Testing

I am trying to create a simple microservice that can handle 1 million requests at a time, but I am getting a connection reset error on the client side. Please correct me if I am making a mistake.
Server Code
1. Listener :
object Collection {
case class calculate(values: String) // the request body is passed on as a String
}
object EngineController{
import Collection._
def main(args: Array[String]): Unit = {
try {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher
val requestHandler = system.actorOf(RoundRobinPool(3).props(RequestHandler.props), "round-robin-pool")
val route: Route = {
implicit val timeout = Timeout(100.seconds)
path("aggregate") {
log.info("REQUEST RECEIVED")
post {
entity(as[String]) { values =>
onSuccess(requestHandler ? calculate(values)) {
case result: Double =>
log.info("Response Sent -" + result)
complete(s"${result}")
}
}
}
}
}
val routeBinding :Future[ServerBinding] = Http().bindAndHandle(route, "localhost", 8080)
log.info("Connection Established! Waiting for Request")
routeBinding.failed.foreach { ex =>
log.error(ex, "Failed to bind to {}:{}!", "localhost", 8080)
}
StdIn.readLine()
routeBinding.flatMap(_.unbind())
system.terminate()
}
catch {
case ex: Exception =>
log.error(ex, ex)
}
}
}
RequestHandler: this actor returns the maximum of the numbers.
object RequestHandler extends App {
def props: Props = {
Props(classOf[RequestHandler])
}
class RequestHandler extends Actor {
var doubleArray: Array[Double] = Array.empty
val system1 = ActorSystem("system2")
var routees: List[ActorRef] = _
override def preStart() = {
routees = List.fill(5)(
context.actorOf(AggregateCalculator.props)
)
}
val aggregateActor = system1.actorOf(AggregateCalculator.props.withRouter(RandomPool(100)), "ag")
//CONVERT STRING TO ARRAY
def stringToArray(values: String): Array[Double] = {
return values.split(",").map(x => x.toDouble)
}
override def receive: Receive = {
case calculate(values) =>
doubleArray = stringToArray(values)
sender() ! doubleArray.max
}
}
}
Client Code :
package Test
import scalaj.http.{Http, HttpResponse}
import org.apache.log4j.Logger
//libraryDependencies += "org.scalaj" % "scalaj-http_2.11" % "2.3.0"
object RequestSender1 {
val log = Logger.getLogger(getClass.getName)
def main(args: Array[String]): Unit = {
try {
val str = "22.78, -1.23, 50, 60, 3, 32, 11, 54, 72, 78, 99, 70, 19, 47, 90, 81, 50, 69, 69, 72, 83, 14.7, 8, 41, 65, 73, 48, 63, 47, 17, 55, 39, 50, 87, 76, 8, 67, 51, 55, 94, 75, 14, 91, 35, 87, 36, 42, 74, 70, 81, 18, 14, 50, 22, 16, 55, 71, 17, 39, 44, 58, 61, 16, 4, 74, 61, 37, 31, 62, 36, 53, 30, 82, 72, 89, 96, 28, 36, 77, 89, 30, 2, 31, 79, 50, 34, 81, 39, 91, 85, 94, 25, 68, 98, 46, 42,14,14"
var result: HttpResponse[String] = null
var counter = 0
for (i <- 1 to 300) {
for (i <- 1 to 13000) {
val thread = new Thread {
override def run {
while (counter < 1000024) {
try {
counter += 1
result = scalaj.http.Http("http://localhost:8080/aggregate").postData(str).timeout(1200000, 120000000) //192.168.0.157:8089
.header("Content-Type", "text/plain").asString
println("Thread Count----" + java.lang.Thread.activeCount())
j += 1
println(result.body + " " + j)
} catch {
case ex:Exception =>
log.error(ex)
}
}
}
}
thread.start
// slow the loop down a bit
}
println("Sent request with --" + i)
Thread.sleep(1)
// slow the loop down a bit
}
}
catch {
case ex:Exception =>
println("Exception"+">>>>>>>>>>")
}
}
}
Error 1 :
09:58:07 ERROR [Thread-12803] - Test.RequestSender1$.run 32 - java.net.BindException: Address already in use: connect
Error 2 :
Thread-12418" java.net.SocketException: Connection reset
Sorry for the bad alignment.
I think this has nothing to do with Scala or HTTP. It is a limitation of TCP itself.
Unfortunately, you can't open 1 million outbound TCP connections from a single client address to the same server address and port. A TCP connection is identified by two (IP address, port) pairs, one for the client and one for the server. The server can accept all connections on the same port and tell them apart by the client's address and port. On the client side, however, any typical TCP/IP implementation assigns each outbound connection its own local port (and even with a custom implementation you could not have two connections from the same client address and port to the same server address and port, because they would be indistinguishable). A TCP port is just a 16-bit number, so there are only 65,536 ports altogether, far fewer than the 1 million you want. To make 1 million connections you will need many machines (or at least virtual machines) running the test at the same time.
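The arithmetic behind that estimate can be sketched as follows. PortBudget and clientsNeeded are hypothetical names, and the 16,384-port figure is only an assumption about the ephemeral range (it matches the IANA dynamic range 49152 to 65535); the range actually configured on a given OS may differ.

```scala
object PortBudget {
  val totalPorts: Int = 1 << 16         // a 16-bit port field has 65536 values
  val usablePorts: Int = totalPorts - 1 // port 0 is reserved

  /** Client addresses needed for `connections` simultaneous connections to one server address and port. */
  def clientsNeeded(connections: Long, portsPerClient: Int = usablePorts): Long =
    (connections + portsPerClient - 1) / portsPerClient // ceiling division
}
```

With every port usable, PortBudget.clientsNeeded(1000000) gives 16. With an assumed 16,384-port ephemeral range it gives 62. The "Address already in use: connect" error in the question is consistent with the client running out of local ports.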

Scala hex string to bytes

Is there a neat way in Scala to convert a hexadecimally encoded String to a protobuf ByteString (and back again)?
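On Java 8 you can use DatatypeConverter without additional dependencies (javax.xml.bind was removed from the JDK in Java 11, so newer versions need the JAXB API as a dependency):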
import com.google.protobuf.ByteString
import javax.xml.bind.DatatypeConverter
val hexString: String = "87C2D268483583714CD5"
val byteString: ByteString = ByteString.copyFrom(
DatatypeConverter.parseHexBinary(hexString)
)
val originalString = DatatypeConverter.printHexBinary(byteString.toByteArray)
You can use java.math.BigInteger to parse a String, get the Array[Byte] and from there turn it into a ByteString. Here would be the first step:
import java.math.BigInteger
val s = "f263575e7b00a977a8e9a37e08b9c215feb9bfb2f992b2b8f11e"
val bs = new BigInteger(s, 16).toByteArray
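The content of bs is now as follows. Note the leading 0: it is a sign byte that BigInteger adds because the top bit of the first byte (0xf2) is set, so drop it if you need exactly the bytes encoded in the string:
Array(0, -14, 99, 87, 94, 123, 0, -87, 119, -88, -23, -93, 126, 8, -71, -62, 21, -2, -71, -65, -78, -7, -110, -78, -72, -15, 30)
You can then use (for example) the copyFrom method to turn it into a ByteString.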
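Because BigInteger treats the string as a number, the resulting array can be one byte longer than the input (an extra sign byte) or shorter (leading 00 pairs are dropped). A minimal sketch that normalizes the length, assuming an even number of hex digits and no further validation. HexViaBigInteger and hexToBytes are hypothetical names.

```scala
import java.math.BigInteger

object HexViaBigInteger {
  /** Decodes `hex` into exactly hex.length / 2 bytes. */
  def hexToBytes(hex: String): Array[Byte] = {
    require(hex.length % 2 == 0, "expected an even number of hex digits")
    if (hex.isEmpty) return Array.empty[Byte]
    val expected = hex.length / 2
    val raw = new BigInteger(hex, 16).toByteArray
    if (raw.length > expected) raw.takeRight(expected)      // drop the sign byte
    else Array.fill[Byte](expected - raw.length)(0) ++ raw  // restore leading zeros
  }
}
```

The result can then be passed to ByteString.copyFrom as described above.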
Starting with Java 17, you can use a standard API to parse hex strings into a byte array:
import java.util.HexFormat
HexFormat.of.parseHex("d719af")
Since the title of the question doesn't mention Protobuf: if anyone is looking for a solution that needs no dependencies and converts a hex String of any length to Seq[Byte] (don't forget to add input validation as necessary):
val zeroChar: Byte = '0'.toByte
val aChar: Byte = 'a'.toByte
def toHex(bytes: Seq[Byte]): String = bytes.map(b => f"$b%02x").mkString
def toBytes(hex: String): Seq[Byte] = {
val lowerHex = hex.toLowerCase
val (result: Array[Byte], startOffset: Int) =
if (lowerHex.length % 2 == 1) {
// Odd
val r = new Array[Byte]((lowerHex.length >> 1) + 1)
r(0) = toNum(lowerHex(0))
(r, 1)
} else {
// Even
(new Array[Byte](lowerHex.length >> 1), 0)
}
var inputIndex = startOffset
var outputIndex = startOffset
while (outputIndex < result.length) {
val byteValue = (toNum(lowerHex(inputIndex)) * 16) +
toNum(lowerHex(inputIndex + 1))
result(outputIndex) = byteValue.toByte
inputIndex += 2
outputIndex += 1
}
result
}
def toNum(lowerHexChar: Char): Byte =
(if (lowerHexChar < 'a') lowerHexChar.toByte - zeroChar else 10 +
lowerHexChar.toByte - aChar).toByte
https://scalafiddle.io/sf/PZPHBlT/2
A simple solution without any dependency or intermediate object could be
def toBytes(hex: String): Seq[Byte] = {
assert(hex.length % 2 == 0) // only manage canonical case
hex.sliding(2, 2).map(Integer.parseInt(_, 16).toByte).toSeq
}
assert(toBytes("1234") == Seq[Byte](18,52))

How to correctly generate SHA-256 checksum for a string in scala?

I need to generate a SHA-256 checksum from a string that will be sent as a GET parameter.
I found this link to generate the checksum.
I am generating the checksum like so:
val digest = MessageDigest.getInstance("SHA-256");
private def getCheckSum() = {
println(new String(digest.digest(("Some String").getBytes(StandardCharsets.UTF_8))))
}
prints checksum similar to this:
*║┼¼┬]9AòdJb:#↓o6↓T╞B5C♀¼O~╟╙àÿG
The API that we need to send this to says the checksum should look like this:
45e00158bc8454049b7208e76670466d49a5dfb2db4196
What am I doing wrong?
Please advise.
Thanks.
Equivalent, but a bit more efficient:
MessageDigest.getInstance("SHA-256")
.digest("some string".getBytes("UTF-8"))
.map("%02x".format(_)).mkString
java.security.MessageDigest#digest gives a byte array.
scala> import java.security.MessageDigest
scala> import java.math.BigInteger
scala> MessageDigest.getInstance("SHA-256").digest("some string".getBytes("UTF-8"))
res1: Array[Byte] = Array(97, -48, 52, 71, 49, 2, -41, -38, -61, 5, -112, 39, 112, 71, 31, -43, 15, 76, 91, 38, -10, -125, 26, 86, -35, -112, -75, 24, 75, 60, 48, -4)
To create the hex string, use String.format with a width of 64, since a SHA-256 digest is 32 bytes, i.e. 64 hex characters:
scala> val hash = String.format("%064x", new BigInteger(1, MessageDigest.getInstance("SHA-256").digest("some string".getBytes("UTF-8"))))
hash: String = 61d034473102d7dac305902770471fd50f4c5b26f6831a56dd90b5184b3c30fc
You can verify the hash with a command-line tool on Linux or Unix:
$ echo -n "some string" | openssl dgst -sha256
61d034473102d7dac305902770471fd50f4c5b26f6831a56dd90b5184b3c30fc
NOTE:
BigInteger drops leading zero bytes, so if the format width is too small the hex string can come out shorter than 64 characters (e.g. 39). A width of 64 in the format string left-pads it with zeros:
def hash64(data: String): String =
String.format(
"%064x",
new BigInteger(1, MessageDigest.getInstance("SHA-256").digest(data.getBytes("UTF-8")))
)
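The padding concern disappears if each byte is formatted on its own, because every byte then contributes exactly two hex digits. A minimal sketch under that approach. Sha256Hex, toHex and sha256Hex are hypothetical names, and only JDK classes are used.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object Sha256Hex {
  // Mask with 0xff so negative (signed) bytes format as two hex digits.
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => "%02x".format(b & 0xff)).mkString

  // Lowercase hex SHA-256 of the UTF-8 encoding of `text`.
  def sha256Hex(text: String): String =
    toHex(MessageDigest.getInstance("SHA-256").digest(text.getBytes(StandardCharsets.UTF_8)))
}
```

The result is always 64 characters long. For instance, Sha256Hex.toHex(Array[Byte](0, 15, -1)) keeps the leading zero byte and yields "000fff".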
You can use DatatypeConverter.printHexBinary.
Something like:
DatatypeConverter.printHexBinary(
MessageDigest
.getInstance(algorithm)
.digest("some string".getBytes("UTF-8")))
Since JDK 17, we can use java.util.HexFormat:
import java.security.MessageDigest
import java.util.HexFormat
val bytes = MessageDigest.getInstance("SHA-256")
.digest("any string".getBytes("UTF-8"))
val sha256 = HexFormat.of().formatHex(bytes)
// 1e57a452a094728c291bc42bf2bc7eb8d9fd8844d1369da2bf728588b46c4e75
val another = HexFormat.ofDelimiter(":").withUpperCase().formatHex(bytes)
// 1E:57:A4:52:A0:94:72:8C:29:1B:C4:2B:F2:BC:7E:B8:D9:FD:88:44:D1:36:9D:A2:BF:72:85:88:B4:6C:4E:75

How to evaluate binary key-value?

I am writing an external merge sort in Scala for big binary input files.
I generate input using gensort and evaluate output using valsort from this website: http://www.ordinal.com/gensort.html
I read 100 bytes at a time: the first 10 bytes are the Key (List[Byte]) and the remaining 90 bytes are the Value (List[Byte]).
After sorting, valsort reports that my output is wrong.
But when I use ASCII input, my output is right.
So I wonder how to sort binary input the right way?
valsort says that my first unordered record is 56; here is what I printed out:
50 --> Key(List(-128, -16, 5, -10, -83, 23, -107, -109, 42, -11))
51 --> Key(List(-128, -16, 5, -10, -83, 23, -107, -109, 42, -11))
52 --> Key(List(-128, -10, -10, 68, -94, 37, -103, 30, 90, 16))
53 --> Key(List(-128, -10, -10, 68, -94, 37, -103, 30, 90, 16))
54 --> Key(List(-128, -10, -10, 68, -94, 37, -103, 30, 90, 16))
55 --> Key(List(-128, -10, -10, 68, -94, 37, -103, 30, 90, 16))
56 --> Key(List(-128, 0, -27, -4, -82, -82, 121, -125, -22, 99))
57 --> Key(List(-128, 0, -27, -4, -82, -82, 121, -125, -22, 99))
58 --> Key(List(-128, 0, -27, -4, -82, -82, 121, -125, -22, 99))
59 --> Key(List(-128, 0, -27, -4, -82, -82, 121, -125, -22, 99))
60 --> Key(List(-128, 7, -65, 118, 121, -12, 48, 50, 59, -8))
61 --> Key(List(-128, 7, -65, 118, 121, -12, 48, 50, 59, -8))
62 --> Key(List(-128, 7, -65, 118, 121, -12, 48, 50, 59, -8))
This is my external sorting code:
package externalsorting
import java.io.{BufferedOutputStream, File, FileOutputStream}
import java.nio.channels.FileChannel
import java.util.Calendar
import scala.collection.mutable
import readInput._
import scala.collection.mutable.ListBuffer
/**
* Created by hminle on 12/5/2016.
*/
object ExternalSortingExample extends App{
val dir: String = "C:\\ShareUbuntu\\testMerge"
val listFile: List[File] = Utils.getListOfFiles(dir)
listFile foreach(x => println(x.getName))
var fileChannelsInput: List[(FileChannel, Boolean)] = listFile.map{input => (Utils.getFileChannelFromInput(input), false)}
val tempDir: String = dir + "/tmp/"
val tempDirFile: File = new File(tempDir)
val isSuccessful: Boolean = tempDirFile.mkdir()
if(isSuccessful) println("Create temp dir successfully")
else println("Create temp dir failed")
var fileNameCounter: Int = 0
val chunkSize = 100000
// Split big input files into small chunks
while(!fileChannelsInput.isEmpty){
if(Utils.estimateAvailableMemory() > 400000){
val fileChannel = fileChannelsInput(0)._1
val (chunks, isEndOfFileChannel) = Utils.getChunkKeyAndValueBySize(chunkSize, fileChannel)
if(isEndOfFileChannel){
fileChannel.close()
fileChannelsInput = fileChannelsInput.drop(1)
} else {
val sortedChunk: List[(Key, Value)] = Utils.getSortedChunk(chunks)
val fileName: String = tempDir + "partition-" + fileNameCounter
Utils.writePartition(fileName, sortedChunk)
fileNameCounter += 1
}
} else {
println(Thread.currentThread().getName +"There is not enough available free memory to continue processing" + Utils.estimateAvailableMemory())
}
}
val listTempFile: List[File] = Utils.getListOfFiles(tempDir)
val start = Calendar.getInstance().getTime
val tempFileChannels: List[FileChannel] = listTempFile.map(Utils.getFileChannelFromInput(_))
val binaryFileBuffers: List[BinaryFileBuffer] = tempFileChannels.map(BinaryFileBuffer(_))
binaryFileBuffers foreach(x => println(x.toString))
val pq1: ListBuffer[BinaryFileBuffer] = ListBuffer.empty
binaryFileBuffers.filter(!_.isEmpty()).foreach(pq1.append(_))
val outputDir: String = dir + "/mergedOutput"
val bos = new BufferedOutputStream(new FileOutputStream(outputDir))
// Start merging temporary files
while(pq1.length > 0){
val pq2 = pq1.toList.sortWith(_.head()._1 < _.head()._1)
val buffer: BinaryFileBuffer = pq2.head
val keyVal: (Key, Value) = buffer.pop()
val byteArray: Array[Byte] = Utils.flattenKeyValue(keyVal).toArray[Byte]
bos.write(byteArray)
if(buffer.isEmpty()){
buffer.close()
pq1 -= buffer
}
count+=1
}
bos.close()
}
This is BinaryFileBuffer.scala --> which is just a wrapper
package externalsorting
import java.nio.channels.FileChannel
import readInput._
/**
* Created by hminle on 12/5/2016.
*/
object BinaryFileBuffer{
def apply(fileChannel: FileChannel): BinaryFileBuffer = {
val buffer: BinaryFileBuffer = new BinaryFileBuffer(fileChannel)
buffer.reload()
buffer
}
}
class BinaryFileBuffer(fileChannel: FileChannel) extends Ordered[BinaryFileBuffer] {
private var cache: Option[(Key, Value)] = _
def isEmpty(): Boolean = cache == None
def head(): (Key, Value) = cache.get
def pop(): (Key, Value) = {
val answer = head()
reload()
answer
}
def reload(): Unit = {
this.cache = Utils.get100BytesKeyAndValue(fileChannel)
}
def close(): Unit = fileChannel.close()
def compare(that: BinaryFileBuffer): Int = {
this.head()._1.compare(that.head()._1)
}
}
This is my Utils.scala:
package externalsorting
import java.io.{BufferedOutputStream, File, FileOutputStream}
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.Paths
import readInput._
import scala.annotation.tailrec
import scala.collection.mutable.ListBuffer
/**
* Created by hminle on 12/5/2016.
*/
object Utils {
def getListOfFiles(dir: String): List[File] = {
val d = new File(dir)
if(d.exists() && d.isDirectory){
d.listFiles.filter(_.isFile).toList
} else List[File]()
}
def get100BytesKeyAndValue(fileChannel: FileChannel): Option[(Key, Value)] = {
val size = 100
val buffer = ByteBuffer.allocate(size)
buffer.clear()
val numOfByteRead = fileChannel.read(buffer)
buffer.flip()
if(numOfByteRead != -1){
val data: Array[Byte] = new Array[Byte](numOfByteRead)
buffer.get(data, 0, numOfByteRead)
val (key, value) = data.splitAt(10)
Some(Key(key.toList), Value(value.toList))
} else {
None
}
}
def getFileChannelFromInput(file: File): FileChannel = {
val fileChannel: FileChannel = FileChannel.open(Paths.get(file.getPath))
fileChannel
}
def estimateAvailableMemory(): Long = {
System.gc()
val runtime: Runtime = Runtime.getRuntime
val allocatedMemory: Long = runtime.totalMemory() - runtime.freeMemory()
val presFreeMemory: Long = runtime.maxMemory() - allocatedMemory
presFreeMemory
}
def writePartition(dir: String, keyValue: List[(Key, Value)]): Unit = {
val byteArray: Array[Byte] = flattenKeyValueList(keyValue).toArray[Byte]
val bos = new BufferedOutputStream(new FileOutputStream(dir))
bos.write(byteArray)
bos.close()
}
def flattenKeyValueList(keyValue: List[(Key,Value)]): List[Byte] = {
keyValue flatten {
case (Key(keys), Value(values)) => keys:::values
}
}
def flattenKeyValue(keyVal: (Key, Value)): List[Byte] = {
keyVal._1.keys:::keyVal._2.values
}
def getChunkKeyAndValueBySize(size: Int, fileChannel: FileChannel): (List[(Key, Value)], Boolean) = {
val oneKeyValueSize = 100
val countMax = size / oneKeyValueSize
var isEndOfFileChannel: Boolean = false
var count = 0
val chunks: ListBuffer[(Key, Value)] = ListBuffer.empty
do{
val keyValue = get100BytesKeyAndValue(fileChannel)
if(keyValue.isDefined) chunks.append(keyValue.get)
isEndOfFileChannel = !keyValue.isDefined
count += 1
}while(!isEndOfFileChannel && count < countMax)
(chunks.toList, isEndOfFileChannel)
}
def getSortedChunk(oneChunk: List[(Key, Value)]): List[(Key, Value)] = {
oneChunk.sortWith((_._1 < _._1))
}
}
How I define Key and Value:
case class Key(keys: List[Byte]) extends Ordered[Key] {
def isEmpty(): Boolean = keys.isEmpty
def compare(that: Key): Int = {
compare_aux(this.keys, that.keys)
}
private def compare_aux(keys1: List[Byte], keys2: List[Byte]): Int = {
(keys1, keys2) match {
case (Nil, Nil) => 0
case (list, Nil) => 1
case (Nil, list) => -1
case (hd1::tl1, hd2::tl2) => {
if(hd1 > hd2) 1
else if(hd1 < hd2) -1
else compare_aux(tl1, tl2)
}
}
}
}
case class Value(values: List[Byte])
I've found the answer. Binary and ASCII inputs have to be handled differently.
In what order should the sorted file be?
For binary records (GraySort or MinuteSort), the 10-byte keys should be ordered as arrays of unsigned bytes. The memcmp() library routine can be used for this purpose.
To sort binary input, I need to compare the key bytes as unsigned values, because Byte on the JVM is signed (-128 to 127).
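A minimal sketch of such a comparison, written as a stand-in for the Key class in the question. UnsignedBytes is a hypothetical helper name. Masking each byte with 0xff maps it to 0..255 before comparing. This also explains the printout above: record 55 has second byte -10 (246 unsigned) and record 56 has 0, so the signed order puts 55 first while the unsigned order expects the opposite.

```scala
object UnsignedBytes {
  /** Lexicographic comparison treating each byte as unsigned (0..255), like memcmp. */
  @scala.annotation.tailrec
  def compare(a: List[Byte], b: List[Byte]): Int = (a, b) match {
    case (Nil, Nil) => 0
    case (_, Nil)   => 1
    case (Nil, _)   => -1
    case (x :: xs, y :: ys) =>
      val c = Integer.compare(x & 0xff, y & 0xff) // & 0xff drops the sign
      if (c != 0) c else compare(xs, ys)
  }
}

case class Key(keys: List[Byte]) extends Ordered[Key] {
  def isEmpty(): Boolean = keys.isEmpty
  def compare(that: Key): Int = UnsignedBytes.compare(this.keys, that.keys)
}
```

With this ordering, Key(List[Byte](-128, 0)) sorts before Key(List[Byte](-128, -10)), which is the order valsort expects for binary records.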