scala priority queue not ordering properly? - scala

I'm seeing some strange behavior with Scala's collection.mutable.PriorityQueue. I'm performing an external sort and testing it with 1M records. Each time I run the test and verify the results between 10-20 records are not sorted properly. I replace the scala PriorityQueue implementation with a java.util.PriorityQueue and it works 100% of the time. Any ideas?
Here's the code (sorry it's a bit long...). I test it using the tools gensort -a 1000000 and valsort from http://sortbenchmark.org/
def externalSort(inFileName: String, outFileName: String)
(implicit ord: Ordering[String]): Int = {
val MaxTempFiles = 1024
val TempBufferSize = 4096
val inFile = new java.io.File(inFileName)
/** Partitions input file and sorts each partition */
def partitionAndSort()(implicit ord: Ordering[String]):
List[java.io.File] = {
/** Gets block size to use */
def getBlockSize: Long = {
var blockSize = inFile.length / MaxTempFiles
val freeMem = Runtime.getRuntime().freeMemory()
if (blockSize < freeMem / 2)
blockSize = freeMem / 2
else if (blockSize >= freeMem)
System.err.println("Not enough free memory to use external sort.")
blockSize
}
/** Sorts and writes data to temp files */
def writeSorted(buf: List[String]): java.io.File = {
// Create new temp buffer
val tmp = java.io.File.createTempFile("external", "sort")
tmp.deleteOnExit()
// Sort buffer and write it out to tmp file
val out = new java.io.PrintWriter(tmp)
try {
for (l <- buf.sorted) {
out.println(l)
}
} finally {
out.close()
}
tmp
}
val blockSize = getBlockSize
var tmpFiles = List[java.io.File]()
var buf = List[String]()
var currentSize = 0
// Read input and divide into blocks
for (line <- io.Source.fromFile(inFile).getLines()) {
if (currentSize > blockSize) {
tmpFiles ::= writeSorted(buf)
buf = List[String]()
currentSize = 0
}
buf ::= line
currentSize += line.length() * 2 // 2 bytes per char
}
if (currentSize > 0) tmpFiles ::= writeSorted(buf)
tmpFiles
}
/** Merges results of sorted partitions into one output file */
def mergeSortedFiles(fs: List[java.io.File])
(implicit ord: Ordering[String]): Int = {
/** Temp file buffer for reading lines */
class TempFileBuffer(val file: java.io.File) {
private val in = new java.io.BufferedReader(
new java.io.FileReader(file), TempBufferSize)
private var curLine: String = ""
readNextLine() // prep first value
def currentLine = curLine
def isEmpty = curLine == null
def readNextLine() {
if (curLine == null) return
try {
curLine = in.readLine()
} catch {
case _: java.io.EOFException => curLine = null
}
if (curLine == null) in.close()
}
override protected def finalize() {
try {
in.close()
} finally {
super.finalize()
}
}
}
val wrappedOrd = new Ordering[TempFileBuffer] {
def compare(o1: TempFileBuffer, o2: TempFileBuffer): Int = {
ord.compare(o1.currentLine, o2.currentLine)
}
}
val pq = new collection.mutable.PriorityQueue[TempFileBuffer](
)(wrappedOrd)
// Init queue with item from each file
for (tmp <- fs) {
val buf = new TempFileBuffer(tmp)
if (!buf.isEmpty) pq += buf
}
var count = 0
val out = new java.io.PrintWriter(new java.io.File(outFileName))
try {
// Read each value off of queue
while (pq.size > 0) {
val buf = pq.dequeue()
out.println(buf.currentLine)
count += 1
buf.readNextLine()
if (buf.isEmpty) {
buf.file.delete() // don't need anymore
} else {
// re-add to priority queue so we can process next line
pq += buf
}
}
} finally {
out.close()
}
count
}
mergeSortedFiles(partitionAndSort())
}

My tests don't show any bugs in PriorityQueue.
import org.scalacheck._
import Prop._
object PriorityQueueProperties extends Properties("PriorityQueue") {
def listToPQ(l: List[String]): PriorityQueue[String] = {
val pq = new PriorityQueue[String]
l foreach (pq +=)
pq
}
def pqToList(pq: PriorityQueue[String]): List[String] =
if (pq.isEmpty) Nil
else { val h = pq.dequeue; h :: pqToList(pq) }
property("Enqueued elements are dequeued in reverse order") =
forAll { (l: List[String]) => l.sorted == pqToList(listToPQ(l)).reverse }
property("Adding/removing elements doesn't break sorting") =
forAll { (l: List[String], s: String) =>
(l.size > 0) ==>
((s :: l.sorted.init).sorted == {
val pq = listToPQ(l)
pq.dequeue
pq += s
pqToList(pq).reverse
})
}
}
scala> PriorityQueueProperties.check
+ PriorityQueue.Enqueued elements are dequeued in reverse order: OK, passed
100 tests.
+ PriorityQueue.Adding/removing elements doesn't break sorting: OK, passed
100 tests.
If you could somehow reduce the input enough to make a test case, it would help.

I ran it with five million inputs several times, output matched expected always. My guess from looking at your code is that your Ordering is the problem (i.e. it's giving inconsistent answers.)

Related

How to define multiple Custom Delimiter for input file in spark?

The default input file delimiter while reading a file via Spark is newline character(\n). It is possible to define a custom delimiter by using "textinputformat.record.delimiter" property.
But, Is it possible to specify multiple delimiter for the same file ?
Suppose a file has following content :
COMMENT,A,B,C
COMMENT,D,E,
F
LIKE,I,H,G
COMMENT,J,K,
L
COMMENT,M,N,O
I want to read this file with delimiter as COMMENT and LIKE instead of newline character.
Although, i came up with an alternative if multiple delimiters are not allowed in spark.
val ss = SparkSession.builder().appName("SentimentAnalysis").master("local[*]").getOrCreate()
val sc = ss.sparkContext
sc.hadoopConfiguration.set("textinputformat.record.delimiter", "COMMENT")
val rdd = sc.textFile("<filepath>")
val finalRdd = rdd.flatmap(f=>f.split("LIKE"))
But still, i think it will better to have multiple custom delimiter. Is it possible in spark ? or do i have to use the above alternative ?
Solved the above issue by creating a custom TextInputFormat class that splits on two types delimiter strings. The post pointed by #puhlen in the comments was a great help. Find below the code snippet which i used :
class CustomInputFormat extends TextInputFormat {
override def createRecordReader(inputSplit: InputSplit, taskAttemptContext: TaskAttemptContext): RecordReader[LongWritable, Text] = {
return new ParagraphRecordReader();
}
}
class ParagraphRecordReader extends RecordReader[LongWritable, Text] {
var end: Long = 0L;
var stillInChunk = true;
var key = new LongWritable();
var value = new Text();
var fsin: FSDataInputStream = null;
val buffer = new DataOutputBuffer();
val tempBuffer1 = MutableList[Int]();
val tempBuffer2 = MutableList[Int]();
val endTag1 = "COMMENT".getBytes();
val endTag2 = "LIKE".getBytes();
#throws(classOf[IOException])
#throws(classOf[InterruptedException])
override def initialize(inputSplit: org.apache.hadoop.mapreduce.InputSplit, taskAttemptContext: org.apache.hadoop.mapreduce.TaskAttemptContext) {
val split = inputSplit.asInstanceOf[FileSplit];
val conf = taskAttemptContext.getConfiguration();
val path = split.getPath();
val fs = path.getFileSystem(conf);
fsin = fs.open(path);
val start = split.getStart();
end = split.getStart() + split.getLength();
fsin.seek(start);
if (start != 0) {
readUntilMatch(endTag1, endTag2, false);
}
}
#throws(classOf[IOException])
override def nextKeyValue(): Boolean = {
if (!stillInChunk) return false;
val status = readUntilMatch(endTag1, endTag2, true);
value = new Text();
value.set(buffer.getData(), 0, buffer.getLength());
key = new LongWritable(fsin.getPos());
buffer.reset();
if (!status) {
stillInChunk = false;
}
return true;
}
#throws(classOf[IOException])
#throws(classOf[InterruptedException])
override def getCurrentKey(): LongWritable = {
return key;
}
#throws(classOf[IOException])
#throws(classOf[InterruptedException])
override def getCurrentValue(): Text = {
return value;
}
#throws(classOf[IOException])
#throws(classOf[InterruptedException])
override def getProgress(): Float = {
return 0;
}
#throws(classOf[IOException])
override def close() {
fsin.close();
}
#throws(classOf[IOException])
def readUntilMatch(match1: Array[Byte], match2: Array[Byte], withinBlock: Boolean): Boolean = {
var i = 0;
var j = 0;
while (true) {
val b = fsin.read();
if (b == -1) return false;
if (b == match1(i)) {
tempBuffer1.+=(b)
i = i + 1;
if (i >= match1.length) {
tempBuffer1.clear()
return fsin.getPos() < end;
}
} else if (b == match2(j)) {
tempBuffer2.+=(b)
j = j + 1;
if (j >= match2.length) {
tempBuffer2.clear()
return fsin.getPos() < end;
}
} else {
if (tempBuffer1.size != 0)
tempBuffer1.foreach { x => if (withinBlock) buffer.write(x) }
else if (tempBuffer2.size != 0)
tempBuffer2.foreach { x => if (withinBlock) buffer.write(x) }
tempBuffer1.clear()
tempBuffer2.clear()
if (withinBlock) buffer.write(b);
i = 0;
j = 0;
}
}
return false;
}
Use the following class in while reading file from filesystem and your file will get read with two delimiters as required. :)
val rdd = sc.newAPIHadoopFile("<filepath>", classOf[ParagraphInputFormat], classOf[LongWritable], classOf[Text], sc.hadoopConfiguration)

Split a list in several files scala

I have the following list,
10,44,22
10,47,12
15,38,3
15,41,30
16,44,15
16,47,18
22,38,21
22,41,42
34,44,40
34,47,36
40,38,39
40,41,42
45,38,27
45,41,30
46,44,45
46,47,48
Then I am creating one file with it is content with the following code:
val fstream:FileWriter = new FileWriter("patSPO.csv")
var out:BufferedWriter = new BufferedWriter(fstream)
val sl = listSPO.sortBy(l => (l.sub, l.pre))
for ( a <- 0 to listSPO.size-1){
out.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
}
out.close()
However I want to divide the content in a n files, then I try the following for 4 files:
val fstream:FileWriter = new FileWriter("patSPO.csv")
val fstream1:FileWriter = new FileWriter("patSPO1.csv")
val fstream2:FileWriter = new FileWriter("patSPO2.csv")
val fstream3:FileWriter = new FileWriter("patSPO3.csv")
val fstream4:FileWriter = new FileWriter("patSPO4.csv")
var out:BufferedWriter = new BufferedWriter(fstream)
var out1:BufferedWriter = new BufferedWriter(fstream1)
var out2:BufferedWriter = new BufferedWriter(fstream2)
var out3:BufferedWriter = new BufferedWriter(fstream3)
var out4:BufferedWriter = new BufferedWriter(fstream4)
val b :Int = listSPO.size/4
val sl = listSPO.sortBy(l => (l.sub, l.pre))
for ( a <- 0 to listSPO.size-1){
out.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
}
for ( a <- 0 to b-1){
out1.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
}
for ( a <- b to (b*2)-1){
out2.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
}
for ( a <- b*2 to (b*3)-1){
out3.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
}
for ( a <- b*3 to (b*4)-1){
out4.write(sl(a).sub.toString+","+sl(a).pre.toString+","+sl(a).obj.toString+"\n")
}
out.close()
out1.close()
out2.close()
out3.close()
out4.close()
Then my question is if exist a general code where I put the number of files to generate, for example 32, and not to write 32 times the out, the for and the fstream?
First, let's make some utilities to eliminate ugly boilerplate:
def open(
number: Int = 1,
prefix: String,
ext: String
) = (0 to n-1)
.iterator
.map { i =>
"%s%s%s".format(prefix, if(i == 0) "" else i.toString, postfix)
}.map(new FileWriter(_))
.map(new BufferedWriter(_))
def toString(l: WhateverTheType) = Seq(
l.sub,
l.pre,
l.obj
).mkString(",") + "\n"
And now to the implementation:
writeToFiles(
listSPO: List[WhateverTheType],
numFiles: Int = 1,
prefix: String = "pasSPO",
ext: String = ".csv"
) = listSPO
.grouped((listSPO.size+1)/numFiles)
.zip(open(numFiles, prefix, ext))
.foreach { case (input, file) =>
try {
input.foreach(file.write(toString(input)))
} finally {
file.close
}
}
Assuming the objects in the list have this type (otherwise, relpace SPO in the below code with your type):
case class SPO(sub: Int, pre: Int, obj: Int)
This should do it:
val sl = listSPO.sortBy(l => (l.sub, l.pre))
val files = 5 // whatever number you want
// helper function to write a single list - similar to your implementation but shorter
def writeListToFile(name: String, list: List[SPO]): Unit = {
val writer = new BufferedWriter(new FileWriter(name))
list.foreach(spo => writer.write(s"${spo.sub},${spo.pre},${spo.obj}\n"))
writer.close()
}
sl.grouped(sl.size / files) // split into sublists
.zipWithIndex // add index of sublists for file name
.foreach {
case (sublist, 0) => writeListToFile(s"pasSPO.csv", sublist) // if indeed you want the first file name NOT to include the index
case (sublist, index) => writeListToFile(s"pasSPO$index.csv", sublist)
}

Scala: Haar Wavelet Transform

I am trying to implement Haar Wavelet Transform in Scala. I am using this Python Code for reference Github Link to Python implementation of HWT
I am also giving here my Scala code version. I am new to Scala so forgive me for not-so-good-code.
/**
* Created by vipul vaibhaw on 1/11/2017.
*/
import scala.collection.mutable.{ListBuffer, MutableList,ArrayBuffer}
object HaarWavelet {
def main(args: Array[String]): Unit = {
var samples = ListBuffer(
ListBuffer(1,4),
ListBuffer(6,1),
ListBuffer(0,2,4,6,7,7,7,7),
ListBuffer(1,2,3,4),
ListBuffer(7,5,1,6,3,0,2,4),
ListBuffer(3,2,3,7,5,5,1,1,0,2,5,1,2,0,1,2,0,2,1,0,0,2,1,2,0,2,1,0,0,2,1,2)
)
for (i <- 0 to samples.length){
var ubound = samples(i).max+1
var length = samples(i).length
var deltas1 = encode(samples(i), ubound)
var deltas = deltas1._1
var avg = deltas1._2
println( "Input: %s, boundary = %s, length = %s" format(samples(i), ubound, length))
println( "Haar output:%s, average = %s" format(deltas, avg))
println("Decoded: %s" format(decode(deltas, avg, ubound)))
}
}
def wrap(value:Int, ubound:Int):Int = {
(value+ubound)%ubound
}
def encode(lst1:ListBuffer[Int], ubound:Int):(ListBuffer[Int],Int)={
//var lst = ListBuffer[Int]()
//lst1.foreach(x=>lst+=x)
var lst = lst1
var deltas = new ListBuffer[Int]()
var avg = 0
while (lst.length>=2) {
var avgs = new ListBuffer[Int]()
while (lst.nonEmpty) {
// getting first two element from the list and removing them
val a = lst.head
lst -= 1 // removing index 0 element from the list
val b = lst.head
lst -= 1 // removing index 0 element from the list
if (a<=b) {
avg = (a + b)/2
}
else{
avg = (a+b+ubound)/2
}
var delta = wrap(b-a,ubound)
avgs += avg
deltas += delta
}
lst = avgs
}
(deltas, avg%ubound)
}
def decode(deltas:ListBuffer[Int],avg:Int,ubound:Int):ListBuffer[Int]={
var avgs = ListBuffer[Int](avg)
var l = 1
while(deltas.nonEmpty){
for(i <- 0 to l ){
val delta = deltas.last
deltas -= -1
val avg = avgs.last
avgs -= -1
val a = wrap(math.ceil(avg-delta/2.0).toInt,ubound)
val b = wrap(math.ceil(avg+delta/2.0).toInt,ubound)
}
l*=2
}
avgs
}
def is_pow2(n:Int):Boolean={
(n & -n) == n
}
}
But Code gets stuck at "var deltas1 = encode(samples(i), ubound)" and doesn't give any output. How can I improve my implementation? Thanks in advance!
Your error is on this line:
lst -= 1 // removing index 0 element from the list.
This doesn't remove index 0 from the list. It removes the element 1 (if it exists). This means that the list never becomes empty. The while-loop while (lst.nonEmpty) will therefore never terminate.
To remove the first element of the list, simply use lst.remove(0).

Scala: Save result in a toDf temp table

I'm trying save some analyses in toDF TempTable, but a receive the following error ":215: error: value toDF is not a member of Double".
I'm reading data of a Cassandra table, and i'm doing some calculations. I want save these results in a temp table.
I'm new in scala, somebody, can help me please?
my code
case class Consumo(consumo:Double, consumo_mensal: Double, mes: org.joda.time.DateTime,ano: org.joda.time.DateTime, soma_pf: Double,empo_gasto: Double);
object Analysegridata{
val conf = new SparkConf(true)
.set("spark.cassandra.connection.host","127.0.0.1").setAppName("LiniarRegression")
.set("spark.cassandra.connection.port", "9042")
.set("spark.driver.allowMultipleContexts", "true")
.set("spark.streaming.receiver.writeAheadLog.enable", "true");
val sc = new SparkContext(conf);
val ssc = new StreamingContext(sc, Seconds(1))
val sqlContext = new org.apache.spark.sql.SQLContext(sc);
val checkpointDirectory = "/var/lib/cassandra/data"
ssc.checkpoint(checkpointDirectory) // set checkpoint directory
// val context = StreamingContext.getOrCreate(checkpointDirectory)
import sqlContext.implicits._
JavaSparkContext.fromSparkContext(sc);
def rddconsumo(rddData: Double): Double = {
val rddData: Double = {
implicit val data = conf
val grid = sc.cassandraTable("smartgrids", "analyzer").as((r:Double) => (r)).collect
def goto(cs: Array[Double]): Double = {
var consumo = 0.0;
var totaldias = 0;
var soma_pf = 0.0;
var somamc = 0.0;
var tempo_gasto = 0.0;
var consumo_mensal = 0.0;
var i=0
for (i <- 0 until grid.length) {
val minutos = sc.cassandraTable("smartgrids","analyzer_temp").select("timecol", "MINUTE");
val horas = sc.cassandraTable("smartgrids","analyzer_temp").select("timecol","HOUR_OF_DAY");
val dia = sc.cassandraTable("smartgrids","analyzer_temp").select("timecol", "DAY_OF_MONTH");
val ano = sc.cassandraTable("smartgrids","analyzer_temp").select("timecol", "YEAR");
val mes = sc.cassandraTable("smartgrids","analyzer_temp").select("timecol", "MONTH");
val potencia = sc.cassandraTable("smartgrids","analyzer_temp").select("n_pf1", "n_pf2", "n_pf3")
def convert_minutos (minuto : Int) : Double ={
minuto/60
}
dia.foreach (i => {
def adSum(potencia: Array[Double]) = {
var i=0;
while (i < potencia.length) {
soma_pf += potencia(i);
i += 1;
soma_pf;
println("Potemcia =" + soma_pf)
}
}
def tempo(minutos: Array[Int]) = {
var i=0;
while (i < minutos.length) {
somamc += convert_minutos(minutos(i))
i += 1;
somamc
}
}
def tempogasto(horas: Array[Int]) = {
var i=0;
while (i < horas.length) {
tempo_gasto = horas(i) + somamc;
i += 1;
tempo_gasto;
println("Temo que o aparelho esteve ligado =" + tempo_gasto)
}
}
def consumof(dia: Array[Int]) = {
var i=0;
while (i < dia.length) {
consumo = soma_pf * tempo_gasto;
i += 1;
consumo;
println("Consumo diario =" + consumo)
}
}
})
mes.foreach (i => {
def totaltempo(dia: Array[Int]) = {
var i = 0;
while(i < dia.length){
totaldias += dia(i);
i += 1;
totaldias;
println("Numero total de dias =" + totaldias)
}
}
def consumomensal(mes: Array[Int]) = {
var i = 0;
while(i < mes.length){
consumo_mensal = consumo * totaldias;
i += 1;
consumo_mensal;
println("Consumo Mensal =" + consumo_mensal);
}
}
})
}
consumo;
totaldias;
consumo_mensal;
soma_pf;
tempo_gasto;
somamc
}
rddData
}
rddData.toDF().registerTempTable("rddData")
}
ssc.start()
ssc.awaitTermination()
error: value toDF is not a member of Double"
It's rather unclear what you're trying to do exactly (too much code, try providing a minimal example), but there are a few apparent issues:
rddData has type Double: seems like it should be RDD[Double] (which is a distributed collection of Double values). Trying to save a single Double value as a table doesn't make much sense, and indeed - doesn't work (toDF can be called on an RDD, not any type, specifically not on Double, as the compiler warns).
you collect() the data: if you want to load an RDD, transform it using some manipulation, and then save it as a table - collect() should probably not be called on that RDD. collect() sends all the data (distributed across the cluster) into the single "driver" machine (the one running this code) - after which you're not taking advantage of the cluster, and again not using the RDD data structure so you can't convert the data into a DataFrame using toDF.

Reading lines and raw bytes from the same source in scala

I need to write code that does the following:
Connect to a tcp socket
Read a line ending in "\r\n" that contains a number N
Read N bytes
Use those N bytes
I am currently using the following code:
val socket = new Socket(InetAddress.getByName(host), port)
val in = socket.getInputStream;
val out = new PrintStream(socket.getOutputStream)
val reader = new DataInputStream(in)
val baos = new ByteArrayOutputStream
val buffer = new Array[Byte](1024)
out.print(cmd + "\r\n")
out.flush
val firstLine = reader.readLine.split("\\s")
if(firstLine(0) == "OK") {
def read(written: Int, max: Int, baos: ByteArrayOutputStream): Array[Byte] = {
if(written >= max) baos.toByteArray
else {
val count = reader.read(buffer, 0, buffer.length)
baos.write(buffer, 0, count)
read(written + count, max, baos)
}
}
read(0, firstLine(1).toInt, baos)
} else {
// RAISE something
}
baos.toByteArray()
The problem with this code is that the use of DataInputStream#readLine raises a deprecation warning, but I can't find a class that implements both read(...) and readLine(...). BufferedReader for example, implements read but it reads Chars and not Bytes. I could cast those chars to bytes but I don't think it's safe.
Any other ways to write something like this in scala?
Thank you
be aware that on the JVM a char has 2 bytes, so "\r\n" is 4 bytes. This is generally not true for Strings stored outside of the JVM.
I think the safest way would be to read your file in raw bytes until you reache your Binary representation of "\r\n", now you can create a Reader (makes bytes into JVM compatible chars) on the first bytes, where you can be shure that there is Text only, parse it, and contiue safely with the rest of the binary data.
You can achive the goal to use read(...) and readLine(...) in one class. The idea is use BufferedReader.read():Int. The BufferedReader class has buffered the content so you can read one byte a time without performance decrease.
The change can be: (without scala style optimization)
import java.io.BufferedInputStream
import java.io.BufferedReader
import java.io.ByteArrayOutputStream
import java.io.PrintStream
import java.net.InetAddress
import java.net.Socket
import java.io.InputStreamReader
object ReadLines extends App {
val host = "127.0.0.1"
val port = 9090
val socket = new Socket(InetAddress.getByName(host), port)
val in = socket.getInputStream;
val out = new PrintStream(socket.getOutputStream)
// val reader = new DataInputStream(in)
val bufIns = new BufferedInputStream(in)
val reader = new BufferedReader(new InputStreamReader(bufIns, "utf8"));
val baos = new ByteArrayOutputStream
val buffer = new Array[Byte](1024)
val cmd = "get:"
out.print(cmd + "\r\n")
out.flush
val firstLine = reader.readLine.split("\\s")
if (firstLine(0) == "OK") {
def read(written: Int, max: Int, baos: ByteArrayOutputStream): Array[Byte] = {
if (written >= max) {
println("get: " + new String(baos.toByteArray))
baos.toByteArray()
} else {
// val count = reader.read(buffer, 0, buffer.length)
var count = 0
var b = reader.read()
while(b != -1){
buffer(count) = b.toByte
count += 1
if (count < max){
b = reader.read()
}else{
b = -1
}
}
baos.write(buffer, 0, count)
read(written + count, max, baos)
}
}
read(0, firstLine(1).toInt, baos)
} else {
// RAISE something
}
baos.toByteArray()
}
for test, below is a server code:
object ReadLinesServer extends App {
val serverSocket = new ServerSocket(9090)
while(true){
println("accepted a connection.")
val socket = serverSocket.accept()
val ops = socket.getOutputStream()
val printStream = new PrintStream(ops, true, "utf8")
printStream.print("OK 2\r\n") // 1 byte for alpha-number char
printStream.print("ab")
}
}
Seems this is the best solution I can find:
val reader = new BufferedReader(new InputStreamReader(in))
val buffer = new Array[Char](1024)
out.print(cmd + "\r\n")
out.flush
val firstLine = reader.readLine.split("\\s")
if(firstLine(0) == "OK") {
def read(readCount: Int, acc: List[Byte]): Array[Byte] = {
if(readCount <= 0) acc.toArray
else {
val count = reader.read(buffer, 0, buffer.length)
val asBytes = buffer.slice(0, count).map(_.toByte)
read(readCount - count, acc ++ asBytes)
}
}
read(firstLine(1).toInt, List[Byte]())
} else {
// RAISE
}
That is, use buffer.map(_.toByte).toArray to transform a char Array into a Byte Array without caring about the encoding.