Scala BufferedInputStream reader

I have a scala program as below
object TestApp {
def main(args: Array[String]): Unit = {
val in = new BufferedInputStream(new FileInputStream("testBytes.txt"))
val buffer = Array.ofDim[Byte](15)
while (in.read(buffer)>0) {
println(new String(buffer))
}
}
}
The input file contains "AAAAAAAAAABBBBBBBBBB"
When I run this program I'm getting the below results
AAAAAAAAAABBBBB
BBBBBAAAAABBBBB
I'm confused about why the buffer still holds data from the previous read. Is there any way to avoid this?
I'm expecting something like this
AAAAAAAAAABBBBB
BBBBB

The read method returns the number of bytes it actually read; the rest of the buffer still holds whatever the previous read left there. Use only that many bytes from the buffer, e.g.:
var len = 0
while ({len = in.read(buffer); len} > 0) {
println(new String(buffer, 0, len))
}
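The same idea as a self-contained sketch that can be run without a file: the 20 input bytes come from an in-memory stream instead of testBytes.txt, and the helper name `readChunks` is made up for illustration.

```scala
import java.io.{BufferedInputStream, ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets

object ChunkedRead {
  /** Reads `in` in chunks of at most `size` bytes and decodes only the bytes each read returned. */
  def readChunks(in: InputStream, size: Int): List[String] = {
    val buffer = new Array[Byte](size)
    val chunks = List.newBuilder[String]
    var len = in.read(buffer)
    while (len > 0) {
      // bytes beyond `len` are leftovers from an earlier read, so they are skipped
      chunks += new String(buffer, 0, len, StandardCharsets.US_ASCII)
      len = in.read(buffer)
    }
    chunks.result()
  }

  def demo(): List[String] = {
    val data = "AAAAAAAAAABBBBBBBBBB".getBytes(StandardCharsets.US_ASCII)
    readChunks(new BufferedInputStream(new ByteArrayInputStream(data)), 15)
  }
}
```

`ChunkedRead.demo()` yields the two chunks the question expects, `AAAAAAAAAABBBBB` and `BBBBB`.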

Related

Why does the "Inflater" algorithm fail for UTF-8 encoding?

I wrote the following code in order to decompress messages that were compressed using the Deflater algorithm:
def decompressMsg[V: StringDecoder](msg: String): Try[V] = {
if (msg.startsWith(CompressionHeader)) {
logger.debug(s"Message before decompression is: ${msg}")
val compressedByteArray =
msg.drop(CompressionHeader.length).getBytes(StandardCharsets.UTF_8)
val inflaterInputStream = new InflaterInputStream(
new ByteArrayInputStream(compressedByteArray)
)
val decompressedByteArray = readDataFromInflaterInputStream(inflaterInputStream)
StringDecoder.decode[V](new String(decompressedByteArray, StandardCharsets.UTF_8).tap {
decompressedMsg => logger.info(s"Message after decompression is: ${decompressedMsg}")
})
} else {
StringDecoder.decode[V](msg)
}
}
private def readDataFromInflaterInputStream(
inflaterInputStream: InflaterInputStream
): Array[Byte] = {
val outputStream = new ByteArrayOutputStream
val buffer = new Array[Byte](BufferSize)
var len = inflaterInputStream.read(buffer) // ERROR IS THROWN FROM THIS LINE!!
// read returns -1 at end of stream; a short read alone does not mean the data is finished
while (len != -1) {
outputStream.write(buffer, 0, len)
len = inflaterInputStream.read(buffer)
}
outputStream.toByteArray
}
The input argument 'msg' was compressed using the Deflater. The above code fails with the error message:
invalid stored block lengths java.util.zip.ZipException: invalid stored block lengths
After I saw this thread, I changed StandardCharsets.UTF_8 to StandardCharsets.ISO_8859_1 and surprisingly, the code passed and returned the desired behaviour.
I don't want to work with an encoding different than UTF_8. Do you have an idea how to make my code work with UTF_8 encoding?
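Some background that may help here: Deflater output is arbitrary binary data, and not every byte sequence is valid UTF-8. Decoding invalid bytes as UTF-8 substitutes replacement characters, so getBytes(UTF_8) does not give back the original bytes. ISO-8859-1 maps every byte to exactly one char, which is why it round-trips. A common alternative is to transport the compressed bytes as Base64 text and keep UTF-8 only for the message itself. The following is a minimal sketch under the assumption that the sending side can be changed as well; the object and method names are hypothetical and not taken from the code above.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets
import java.util.Base64
import java.util.zip.{DeflaterOutputStream, InflaterInputStream}

object CompressedText {
  /** Deflates the UTF-8 bytes of `text` and returns them as Base64 (plain ASCII). */
  def compress(text: String): String = {
    val bytes = new ByteArrayOutputStream
    val deflater = new DeflaterOutputStream(bytes)
    deflater.write(text.getBytes(StandardCharsets.UTF_8))
    deflater.close() // finishes the deflate stream
    Base64.getEncoder.encodeToString(bytes.toByteArray)
  }

  /** Reverses `compress`: Base64 decode, inflate, then decode the result as UTF-8. */
  def decompress(encoded: String): String = {
    val in = new InflaterInputStream(
      new ByteArrayInputStream(Base64.getDecoder.decode(encoded)))
    val out = new ByteArrayOutputStream
    val buffer = new Array[Byte](1024)
    var len = in.read(buffer)
    while (len != -1) { // -1 signals the end of the inflated data
      out.write(buffer, 0, len)
      len = in.read(buffer)
    }
    in.close()
    new String(out.toByteArray, StandardCharsets.UTF_8)
  }
}
```

With this scheme the decompressed message is still decoded as UTF-8; only the compressed payload travels as Base64.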

Scala- Some bytes are missing because of 'hasNext' function?

Listing below is my code:
import scala.io.Source
object grouped_list {
def encrypt(file: String) = {
val text = Source.fromFile(file)
val list = text.toList
val blocks = list.grouped(501)
while (blocks.hasNext) {
val block0 = blocks.next()
val stringBlock = block0.mkString
println(stringBlock)
}
}
def main (args: Array[String]): Unit = {
val blockcipher = encrypt("E:\\test.txt")
}
}
The file ("E:\test.txt", https://pan.baidu.com/s/1jHKuWb8) is about 300KB. However, the 'stringBlock' output (https://pan.baidu.com/s/1pLygsyR), which is supposed to be the same as the file, is only about 80KB. Where is the rest? When I don't use 'while (blocks.hasNext)', the first 501 bytes of 'stringBlock' are the same as the file's first 501 bytes. After I add 'while (blocks.hasNext)', the first 501 bytes of 'stringBlock' are no longer the same as the file's. Does my problem have something to do with the 'hasNext' function?
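One thing worth checking, although it is only a guess since the linked files cannot be inspected here: Source.fromFile decodes the file into characters using a charset, so for non-text content the number of characters need not match the number of bytes, and println adds a line terminator after every block. If the goal is 501-byte blocks, reading raw bytes avoids both effects. A minimal sketch, with the hypothetical helper name `blocks`:

```scala
import java.nio.file.{Files, Path}

object ByteBlocks {
  /** Reads the whole file as raw bytes and splits it into blocks of at most `size` bytes. */
  def blocks(path: Path, size: Int): List[Array[Byte]] =
    Files.readAllBytes(path).grouped(size).toList
}
```

Concatenating the blocks gives back exactly the file's bytes; only the last block may be shorter than `size`.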

How to write a string to Scala Process?

I start a Scala process and keep it running.
val dir = "/path/to/working/dir/"
val stockfish = Process(Seq("wine", dir + "stockfish_8_x32.exe"))
val logger = ProcessLogger(printf("Stdout: %s%n", _))
val stockfishProcess = stockfish.run(logger, connectInput = true)
The process reads from and writes to standard IO (console). How can I send a string command to the process if it's been already started?
The Scala process API has ProcessBuilder, which in turn has a bunch of useful methods. But ProcessBuilder is used before a process starts, to compose complex shell commands. Scala also has ProcessIO to handle input and output, but I don't think I need that either. I just need to send a message to my process.
In Java I would do something like this.
String dir = "/path/to/working/dir/";
ProcessBuilder builder = new ProcessBuilder("wine", dir + "stockfish_8_x32.exe");
Process process = builder.start();
OutputStream stdin = process.getOutputStream();
InputStream stdout = process.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(stdout));
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(stdin));
new Thread(() -> {
try {
String line;
while ((line = reader.readLine()) != null) {
System.out.println("Stdout: " + line);
}
} catch (IOException e) {
e.printStackTrace();
}
}).start();
Thread.sleep(5000); // it's just for example
writer.write("quit"); // send to the process command to stop working
writer.newLine();
writer.flush();
It works quite well. I start my process, get InputStream and OutputStream from it, and use the streams to interact with the process.
It appears that the Scala Process trait provides no way to write to a process. ProcessBuilder is useless once the process is running, and ProcessIO seems to be only for capturing and handling IO.
Is there any way to write to a running Scala process?
UPDATE:
I don't see how I may use ProcessIO to pass a string to running process.
I did the following.
import scala.io.Source
import scala.sys.process._
object Sample extends App {
def out = (output: java.io.OutputStream) => {
output.flush()
output.close()
}
def in = (input: java.io.InputStream) => {
println("Stdout: " + Source.fromInputStream(input).mkString)
input.close()
}
def go = {
val dir = "/path/to/working/dir/"
val stockfishSeq = Seq("wine", dir + "/stockfish_8_x32.exe")
val pio = new ProcessIO(out, in, err => {})
val stockfish = Process(stockfishSeq)
stockfish.run(pio)
Thread.sleep(5000)
System.out.write("quit\n".getBytes)
pio.writeInput(System.out) // "writeInput" is function "out" which I have passed to conforming ProcessIO instance. I can invoke it from here. It takes OutputStream but where can I obtain it? Here I just pass System.out for example.
}
go
}
Of course it does not work, and I failed to understand how to implement the same functionality as in my Java snippet above. It would be great to have some advice or a snippet of Scala code that clears up my issue.
I think the documentation around Scala processes (specifically the usage and semantics of ProcessIO) could use some improvement. The first time I tried using this API, I also found it very confusing, and it took some trial and error to get my subprocess i/o working correctly.
I think seeing a simple example is probably all you really need. I'll do something really simple: invoking bc as a subprocess to do some trivial computations, and then printing the answers to my stdout. My goal is to do something like this (but from Scala rather than from my shell):
$ printf "1+2\n3+4\n" | bc
3
7
Here's how I'd do it in Scala:
import scala.io.Source
import scala.sys.process._
object SimpleProcessExample extends App {
def out = (output: java.io.OutputStream) => {
output.flush()
output.close()
}
def in = (input: java.io.InputStream) => {
println("Stdout: " + Source.fromInputStream(input).mkString)
input.close()
}
// limit scope of any temporary variables
locally {
val calcCommand = "bc"
// strings are implicitly converted to ProcessBuilder
// via scala.sys.process.ProcessImplicits.stringToProcess(_)
val calcProc = calcCommand.run(new ProcessIO(
// Handle subprocess's stdin
// (which we write via an OutputStream)
in => {
val writer = new java.io.PrintWriter(in)
writer.println("1 + 2")
writer.println("3 + 4")
writer.close()
},
// Handle subprocess's stdout
// (which we read via an InputStream)
out => {
val src = scala.io.Source.fromInputStream(out)
for (line <- src.getLines()) {
println("Answer: " + line)
}
src.close()
},
// We don't want to use stderr, so just close it.
_.close()
))
// Using ProcessBuilder.run() will automatically launch
// a new thread for the input/output routines passed to ProcessIO.
// We just need to wait for it to finish.
val code = calcProc.exitValue()
println(s"Subprocess exited with code $code.")
}
}
Notice that you don't actually call any of the methods of the ProcessIO object directly because they're automatically called by the ProcessBuilder.
Here's the result:
$ scala SimpleProcessExample
Answer: 3
Answer: 7
Subprocess exited with code 0.
If you wanted interaction between the input and output handlers to the subprocess, you can use standard thread communication tools (e.g., have both close over an instance of BlockingQueue).
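As a rough sketch of that idea: the stdin handler below sends one line at a time and waits on a BlockingQueue until the stdout handler has seen the reply before sending the next. It assumes a `cat` executable is on the PATH (it simply echoes each line back); the names `QueueProcessExample` and `echoAll` are made up for illustration.

```scala
import java.io.PrintWriter
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue}
import scala.io.Source
import scala.sys.process._

object QueueProcessExample {
  /** Sends each command to `cat` only after the reply to the previous one arrived. */
  def echoAll(commands: List[String]): List[String] = {
    val replies = new LinkedBlockingQueue[String]()
    val collected = List.newBuilder[String]
    val done = new CountDownLatch(1)
    val io = new ProcessIO(
      stdin => {
        val writer = new PrintWriter(stdin, true) // autoflush on println
        commands.foreach { c =>
          writer.println(c)
          collected += replies.take() // blocks until the stdout handler reports a line
        }
        writer.close() // closing stdin lets the subprocess terminate
        done.countDown()
      },
      stdout => {
        val src = Source.fromInputStream(stdout)
        src.getLines().foreach(line => replies.put(line))
        src.close()
      },
      _.close() // stderr is not used
    )
    val proc = Process("cat").run(io)
    proc.exitValue() // wait for the subprocess to finish
    done.await()     // make sure the stdin handler has stored all replies
    collected.result()
  }
}
```

This only works with a subprocess that answers every input line with exactly one output line and flushes it promptly; other protocols need a different hand-off between the two handlers.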
Here is an example of obtaining input and output streams from a process, which you can write to and read from after the process starts:
object demo {
import scala.sys.process._
def getIO = {
// create piped streams that can attach to process streams:
val procInput = new java.io.PipedOutputStream()
val procOutput = new java.io.PipedInputStream()
val io = new ProcessIO(
// attach to the process's internal input stream
{ in =>
val istream = new java.io.PipedInputStream(procInput)
val buf = Array.fill(100)(0.toByte)
var br = 0
while (br >= 0) {
br = istream.read(buf)
if (br > 0) { in.write(buf, 0, br) }
}
in.close()
},
// attach to the process's internal output stream
{ out =>
val ostream = new java.io.PipedOutputStream(procOutput)
val buf = Array.fill(100)(0.toByte)
var br = 0
while (br >= 0) {
br = out.read(buf)
if (br > 0) { ostream.write(buf, 0, br) }
}
out.close()
},
// ignore stderr
{ err => () }
)
// run the command with the IO object:
val cmd = List("awk", "{ print $1 + $2 }")
val proc = cmd.run(io)
// wrap the raw streams in formatted IO objects:
val procO = new java.io.BufferedReader(new java.io.InputStreamReader(procOutput))
val procI = new java.io.PrintWriter(procInput, true)
(procI, procO)
}
}
Here's a short example of using the input and output objects. Note that it's hard to guarantee that the process will receive its input until you close the input streams/objects, since everything is piped, buffered, etc.
scala> :load /home/eje/scala/input2proc.scala
Loading /home/eje/scala/input2proc.scala...
defined module demo
scala> val (procI, procO) = demo.getIO
procI: java.io.PrintWriter = java.io.PrintWriter@7e809b79
procO: java.io.BufferedReader = java.io.BufferedReader@5cc126dc
scala> procI.println("1 2")
scala> procI.println("3 4")
scala> procI.println("5 6")
scala> procI.close()
scala> procO.readLine
res4: String = 3
scala> procO.readLine
res5: String = 7
scala> procO.readLine
res6: String = 11
scala>
In general, if you are managing both input and output simultaneously in the same thread, there is the potential for deadlock, since either read or write can block waiting for the other. It is safest to run input logic and output logic in their own threads. With these threading concerns in mind, it is also possible to just put the input and output logic directly into the definitions { in => ... } and { out => ... }, as these are both run in separate threads automatically.
I haven't actually tried this, but the documentation says that you can use an instance of ProcessIO to handle the Process's input and output in a manner similar to what you would do in Java.
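import java.io.OutputStream
import java.util.concurrent.LinkedBlockingQueue
import scala.io.Source
import scala.sys.process._
// The stdin handler runs on another thread, so the stream is handed over through a
// blocking queue; a plain var could still be unset when the main thread reads it.
val stdinQueue = new LinkedBlockingQueue[OutputStream](1)
val io = new ProcessIO(
{ outputStream =>
stdinQueue.put(outputStream)
},
Source.fromInputStream(_).getLines().foreach(println),
Source.fromInputStream(_).getLines().foreach(println)
)
command run io
val out = stdinQueue.take() // waits until the process has started
out.write("test".getBytes())
out.flush()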
You can get an InputStream in the same way.

Creating serializable objects from Scala source code at runtime

To embed Scala as a "scripting language", I need to be able to compile text fragments to simple objects, such as Function0[Unit] that can be serialised to and deserialised from disk and which can be loaded into the current runtime and executed.
How would I go about this?
Say for example, my text fragment is (purely hypothetical):
Document.current.elements.headOption.foreach(_.open())
This might be wrapped into the following complete text:
package myapp.userscripts
import myapp.DSL._
object UserFunction1234 extends Function0[Unit] {
def apply(): Unit = {
Document.current.elements.headOption.foreach(_.open())
}
}
What comes next? Should I use IMain to compile this code? I don't want to use the normal interpreter mode, because the compilation should be "context-free" and not accumulate requests.
What I need to get hold of from the compilation is, I guess, the binary class file? In that case, serialisation is straightforward (byte array). How would I then load that class into the runtime and invoke the apply method?
What happens if the code compiles to multiple auxiliary classes? The example above contains a closure _.open(). How do I make sure I "package" all those auxiliary things into one object to serialize and class-load?
Note: Given that Scala 2.11 is imminent and the compiler API probably changed, I am happy to receive hints as how to approach this problem on Scala 2.11
Here is one idea: use a regular Scala compiler instance. Unfortunately it seems to require the use of hard disk files both for input and output. So we use temporary files for that. The output will be zipped up in a JAR which will be stored as a byte array (that would go into the hypothetical serialization process). We need a special class loader to retrieve the class again from the extracted JAR.
The following assumes Scala 2.10.3 with the scala-compiler library on the class path:
import scala.tools.nsc
import java.io._
import scala.annotation.tailrec
Wrapping user provided code in a function class with a synthetic name that will be incremented for each new fragment:
val packageName = "myapp"
var userCount = 0
def mkFunName(): String = {
val c = userCount
userCount += 1
s"Fun$c"
}
def wrapSource(source: String): (String, String) = {
val fun = mkFunName()
val code = s"""package $packageName
|
|class $fun extends Function0[Unit] {
| def apply(): Unit = {
| $source
| }
|}
|""".stripMargin
(fun, code)
}
A function to compile a source fragment and return the byte array of the resulting jar:
/** Compiles a source code consisting of a body which is wrapped in a `Function0`
* apply method, and returns the function's class name (without package) and the
* raw jar file produced in the compilation.
*/
def compile(source: String): (String, Array[Byte]) = {
val set = new nsc.Settings
val d = File.createTempFile("temp", ".out")
d.delete(); d.mkdir()
set.d.value = d.getPath
set.usejavacp.value = true
val compiler = new nsc.Global(set)
val f = File.createTempFile("temp", ".scala")
val out = new BufferedOutputStream(new FileOutputStream(f))
val (fun, code) = wrapSource(source)
out.write(code.getBytes("UTF-8"))
out.flush(); out.close()
val run = new compiler.Run()
run.compile(List(f.getPath))
f.delete()
val bytes = packJar(d)
deleteDir(d)
(fun, bytes)
}
def deleteDir(base: File): Unit = {
base.listFiles().foreach { f =>
if (f.isFile) f.delete()
else deleteDir(f)
}
base.delete()
}
Note: Doesn't handle compiler errors yet!
The packJar method uses the compiler output directory and produces an in-memory jar file from it:
// cf. http://stackoverflow.com/questions/1281229
def packJar(base: File): Array[Byte] = {
import java.util.jar._
val mf = new Manifest
mf.getMainAttributes.put(Attributes.Name.MANIFEST_VERSION, "1.0")
val bs = new java.io.ByteArrayOutputStream
val out = new JarOutputStream(bs, mf)
def add(prefix: String, f: File): Unit = {
val name0 = prefix + f.getName
val name = if (f.isDirectory) name0 + "/" else name0
val entry = new JarEntry(name)
entry.setTime(f.lastModified())
out.putNextEntry(entry)
if (f.isFile) {
val in = new BufferedInputStream(new FileInputStream(f))
try {
val buf = new Array[Byte](1024)
@tailrec def loop(): Unit = {
val count = in.read(buf)
if (count >= 0) {
out.write(buf, 0, count)
loop()
}
}
loop()
} finally {
in.close()
}
}
out.closeEntry()
if (f.isDirectory) f.listFiles.foreach(add(name, _))
}
base.listFiles().foreach(add("", _))
out.close()
bs.toByteArray
}
A utility function that takes the byte array found in deserialization and creates a map from class names to class byte code:
def unpackJar(bytes: Array[Byte]): Map[String, Array[Byte]] = {
import java.util.jar._
import scala.annotation.tailrec
val in = new JarInputStream(new ByteArrayInputStream(bytes))
val b = Map.newBuilder[String, Array[Byte]]
@tailrec def loop(): Unit = {
val entry = in.getNextJarEntry
if (entry != null) {
if (!entry.isDirectory) {
val name = entry.getName
// cf. http://stackoverflow.com/questions/8909743
val bs = new ByteArrayOutputStream
var i = 0
while (i >= 0) {
i = in.read()
if (i >= 0) bs.write(i)
}
val bytes = bs.toByteArray
b += mkClassName(name) -> bytes
}
loop()
}
}
loop()
in.close()
b.result()
}
def mkClassName(path: String): String = {
require(path.endsWith(".class"))
path.substring(0, path.length - 6).replace("/", ".")
}
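To check the pack/unpack idea in isolation, without the compiler on the class path, here is a small in-memory round trip. It is a simplified sketch rather than the functions above: entries come from a Map instead of a directory, no manifest is written, and entry contents are read in chunks instead of byte by byte. The names `JarRoundTrip`, `pack` and `unpack` are hypothetical.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.jar.{JarEntry, JarInputStream, JarOutputStream}

object JarRoundTrip {
  /** Writes each (entry name -> content) pair into an in-memory jar. */
  def pack(entries: Map[String, Array[Byte]]): Array[Byte] = {
    val bs = new ByteArrayOutputStream
    val out = new JarOutputStream(bs)
    entries.foreach { case (name, data) =>
      out.putNextEntry(new JarEntry(name))
      out.write(data)
      out.closeEntry()
    }
    out.close()
    bs.toByteArray
  }

  /** Reads all entries of an in-memory jar back into a map. */
  def unpack(bytes: Array[Byte]): Map[String, Array[Byte]] = {
    val in = new JarInputStream(new ByteArrayInputStream(bytes))
    val b = Map.newBuilder[String, Array[Byte]]
    val buf = new Array[Byte](1024)
    var entry = in.getNextJarEntry
    while (entry != null) {
      val data = new ByteArrayOutputStream
      var n = in.read(buf) // returns -1 at the end of the current entry
      while (n != -1) {
        data.write(buf, 0, n)
        n = in.read(buf)
      }
      b += entry.getName -> data.toByteArray
      entry = in.getNextJarEntry
    }
    in.close()
    b.result()
  }
}
```

The byte array returned by `pack` is the kind of value that would go through the serialization step and be handed to `unpack` after deserialization.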
A suitable class loader:
class MemoryClassLoader(map: Map[String, Array[Byte]]) extends ClassLoader {
override protected def findClass(name: String): Class[_] =
map.get(name).map { bytes =>
println(s"defineClass($name, ...)")
defineClass(name, bytes, 0, bytes.length)
} .getOrElse(super.findClass(name)) // throws exception
}
And a test case which contains additional classes (closures):
val exampleSource =
"""val xs = List("hello", "world")
|println(xs.map(_.capitalize).mkString(" "))
|""".stripMargin
def test(fun: String, cl: ClassLoader): Unit = {
val clName = s"$packageName.$fun"
println(s"Resolving class '$clName'...")
val clazz = Class.forName(clName, true, cl)
println("Instantiating...")
val x = clazz.newInstance().asInstanceOf[() => Unit]
println("Invoking 'apply':")
x()
}
locally {
println("Compiling...")
val (fun, bytes) = compile(exampleSource)
val map = unpackJar(bytes)
println("Classes found:")
map.keys.foreach(k => println(s" '$k'"))
val cl = new MemoryClassLoader(map)
test(fun, cl) // should call `defineClass`
test(fun, cl) // should find cached class
}

Reading lines and raw bytes from the same source in scala

I need to write code that does the following:
Connect to a tcp socket
Read a line ending in "\r\n" that contains a number N
Read N bytes
Use those N bytes
I am currently using the following code:
val socket = new Socket(InetAddress.getByName(host), port)
val in = socket.getInputStream;
val out = new PrintStream(socket.getOutputStream)
val reader = new DataInputStream(in)
val baos = new ByteArrayOutputStream
val buffer = new Array[Byte](1024)
out.print(cmd + "\r\n")
out.flush
val firstLine = reader.readLine.split("\\s")
if(firstLine(0) == "OK") {
def read(written: Int, max: Int, baos: ByteArrayOutputStream): Array[Byte] = {
if(written >= max) baos.toByteArray
else {
val count = reader.read(buffer, 0, buffer.length)
baos.write(buffer, 0, count)
read(written + count, max, baos)
}
}
read(0, firstLine(1).toInt, baos)
} else {
// RAISE something
}
baos.toByteArray()
The problem with this code is that the use of DataInputStream#readLine raises a deprecation warning, but I can't find a class that implements both read(...) and readLine(...). BufferedReader for example, implements read but it reads Chars and not Bytes. I could cast those chars to bytes but I don't think it's safe.
Any other ways to write something like this in scala?
Thank you
Be aware that on the JVM a char takes 2 bytes, so "\r\n" is 4 bytes in memory. This is generally not true for strings stored outside the JVM.
I think the safest way would be to read your input as raw bytes until you reach the binary representation of "\r\n". You can then create a Reader (which turns bytes into JVM-compatible chars) over those first bytes, where you can be sure that there is only text, parse it, and continue safely with the rest of the binary data.
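A minimal sketch of this approach, assuming the header line is plain ASCII and has the "OK N" shape from the question. The names `readCrLfLine` and `readResponse` are made up for illustration, and the error handling (`require`, EOFException) is just one possible choice for the "RAISE something" branch.

```scala
import java.io.{ByteArrayOutputStream, DataInputStream, EOFException, InputStream}
import java.nio.charset.StandardCharsets

object LineThenBytes {
  private val CR = 13
  private val LF = 10

  /** Consumes bytes up to and including "\r\n" and returns the line without the terminator. */
  def readCrLfLine(in: InputStream): String = {
    val line = new ByteArrayOutputStream
    var prev = -1
    var cur = in.read()
    while (!(prev == CR && cur == LF)) {
      if (cur == -1) throw new EOFException("stream ended before \\r\\n")
      line.write(cur)
      prev = cur
      cur = in.read()
    }
    // `line` holds the text plus the trailing '\r', which is dropped here
    new String(line.toByteArray, 0, line.size - 1, StandardCharsets.US_ASCII)
  }

  /** Reads the "OK N" header line, then exactly N raw payload bytes from the same stream. */
  def readResponse(in: InputStream): Array[Byte] = {
    val data = new DataInputStream(in)
    val fields = readCrLfLine(data).split("\\s")
    require(fields(0) == "OK", s"unexpected response: ${fields.mkString(" ")}")
    val payload = new Array[Byte](fields(1).toInt)
    data.readFully(payload) // blocks until all N bytes have arrived
    payload
  }
}
```

Because no Reader sits between the socket and the code, nothing is decoded or buffered ahead of the payload. For a real socket, wrapping `socket.getInputStream` in a single BufferedInputStream and passing that to `readResponse` keeps the byte-at-a-time header read cheap.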
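You can achieve the goal of using read(...) and readLine(...) in one class. The idea is to use BufferedReader.read(): Int. BufferedReader buffers the content, so you can read one character at a time without a performance penalty. Note that read() returns a decoded character, which corresponds to exactly one byte only for single-byte data such as ASCII.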
The change can be: (without scala style optimization)
import java.io.BufferedInputStream
import java.io.BufferedReader
import java.io.ByteArrayOutputStream
import java.io.PrintStream
import java.net.InetAddress
import java.net.Socket
import java.io.InputStreamReader
object ReadLines extends App {
val host = "127.0.0.1"
val port = 9090
val socket = new Socket(InetAddress.getByName(host), port)
val in = socket.getInputStream;
val out = new PrintStream(socket.getOutputStream)
// val reader = new DataInputStream(in)
val bufIns = new BufferedInputStream(in)
val reader = new BufferedReader(new InputStreamReader(bufIns, "utf8"));
val baos = new ByteArrayOutputStream
val buffer = new Array[Byte](1024)
val cmd = "get:"
out.print(cmd + "\r\n")
out.flush
val firstLine = reader.readLine.split("\\s")
if (firstLine(0) == "OK") {
def read(written: Int, max: Int, baos: ByteArrayOutputStream): Array[Byte] = {
if (written >= max) {
println("get: " + new String(baos.toByteArray))
baos.toByteArray()
} else {
// val count = reader.read(buffer, 0, buffer.length)
var count = 0
var b = reader.read()
while(b != -1){
buffer(count) = b.toByte
count += 1
if (count < math.min(buffer.length, max - written)){ // stay within the buffer and the remaining payload
b = reader.read()
}else{
b = -1
}
}
baos.write(buffer, 0, count)
read(written + count, max, baos)
}
}
read(0, firstLine(1).toInt, baos)
} else {
// RAISE something
}
baos.toByteArray()
}
For testing, here is some server code:
import java.io.PrintStream
import java.net.ServerSocket
object ReadLinesServer extends App {
val serverSocket = new ServerSocket(9090)
while(true){
val socket = serverSocket.accept()
println("accepted a connection.")
val ops = socket.getOutputStream()
val printStream = new PrintStream(ops, true, "utf8")
printStream.print("OK 2\r\n") // 1 byte per alphanumeric char
printStream.print("ab")
}
}
Seems this is the best solution I can find:
val reader = new BufferedReader(new InputStreamReader(in))
val buffer = new Array[Char](1024)
out.print(cmd + "\r\n")
out.flush
val firstLine = reader.readLine.split("\\s")
if(firstLine(0) == "OK") {
def read(readCount: Int, acc: List[Byte]): Array[Byte] = {
if(readCount <= 0) acc.toArray
else {
val count = reader.read(buffer, 0, buffer.length)
val asBytes = buffer.slice(0, count).map(_.toByte)
read(readCount - count, acc ++ asBytes)
}
}
read(firstLine(1).toInt, List[Byte]())
} else {
// RAISE
}
That is, use buffer.map(_.toByte).toArray to transform a char Array into a Byte Array without caring about the encoding.
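One caveat on the char-to-byte mapping: it only gives back the original bytes if the reader decoded them with a charset that maps each byte to exactly one char, such as ISO-8859-1. `new InputStreamReader(in)` uses the platform default charset, which is often UTF-8. The sketch below compares the two; it decodes with `new String(bytes, charset)` as a stand-in for the reader, and the name `charsToBytes` is hypothetical.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object CharsToBytes {
  /** Decodes `bytes` with `charset`, then maps every char back to a byte as in the answer above. */
  def charsToBytes(bytes: Array[Byte], charset: Charset): Array[Byte] =
    new String(bytes, charset).toCharArray.map(_.toByte)

  // every possible byte value once
  val allBytes: Array[Byte] = Array.tabulate(256)(_.toByte)

  def survivesLatin1: Boolean =
    charsToBytes(allBytes, StandardCharsets.ISO_8859_1).sameElements(allBytes)

  def survivesUtf8: Boolean =
    charsToBytes(allBytes, StandardCharsets.UTF_8).sameElements(allBytes)
}
```

`survivesLatin1` is true and `survivesUtf8` is false, so passing StandardCharsets.ISO_8859_1 explicitly to the InputStreamReader seems the safer choice if this approach is used for binary payloads.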