Interact (i/o) with an external process in Scala

I'm looking for a simple way to start an external process and then write strings to its input and read its output.
In Python, this works:
mosesProcess = subprocess.Popen([mosesBinPath, '-f', mosesModelPath], stdin = subprocess.PIPE, stdout = subprocess.PIPE, stderr = subprocess.PIPE);
# ...
mosesProcess.stdin.write(aRequest);
mosesAnswer = mosesProcess.stdout.readline().rstrip();
# ...
mosesProcess.stdin.write(anotherRequest);
mosesAnswer = mosesProcess.stdout.readline().rstrip();
# ...
mosesProcess.stdin.close();
I think in Scala this should be done with scala.sys.process.ProcessBuilder and scala.sys.process.ProcessIO but I don't get how they work (especially the latter).
EDIT:
I have tried things like:
val inputStream = new scala.concurrent.SyncVar[java.io.OutputStream];
val outputStream = new scala.concurrent.SyncVar[java.io.InputStream];
val errStream = new scala.concurrent.SyncVar[java.io.InputStream];
val cmd = "myProc";
val pb = process.Process(cmd);
val pio = new process.ProcessIO(stdin => inputStream.put(stdin),
stdout => outputStream.put(stdout),
stderr => errStream.put(stderr));
pb.run(pio);
inputStream.get.write(("request1" + "\n").getBytes);
println(outputStream.get.read); // It is blocked here
inputStream.get.write(("request2" + "\n").getBytes);
println(outputStream.get.read);
inputStream.get.close()
But the execution gets stuck.

Granted, attrib below is not a great example on the write side of things. I have an EchoServer that would input/output
import scala.sys.process._
import java.io._
object EchoClient{
def main(args: Array[String]) {
var bContinue=true
var cmd="C:\\\\windows\\system32\\attrib.exe"
println(cmd)
val process = Process (cmd)
val io = new ProcessIO (
writer,
out => {scala.io.Source.fromInputStream(out).getLines.foreach(println)},
err => {scala.io.Source.fromInputStream(err).getLines.foreach(println)})
while (bContinue) {
process run io
var answer = readLine("Run again? (y/n)? ")
if (answer=="n" || answer=="N")
bContinue=false
}
}
def reader(input: java.io.InputStream) = {
// read here
}
def writer(output: java.io.OutputStream) = {
// write here
//
}
// TODO: implement an error logger
}
output below :
C:\\windows\system32\attrib.exe
A C:\dev\EchoClient$$anonfun$1.class
A C:\dev\EchoClient$$anonfun$2$$anonfun$apply$1.class
A C:\dev\EchoClient$$anonfun$2.class
A C:\dev\EchoClient$$anonfun$3$$anonfun$apply$2.class
A C:\dev\EchoClient$$anonfun$3.class
A C:\dev\EchoClient$.class
A C:\dev\EchoClient.class
A C:\dev\EchoClient.scala
A C:\dev\echoServer.bat
A C:\dev\EchoServerChg$$anonfun$main$1.class
A C:\dev\EchoServerChg$.class
A C:\dev\EchoServerChg.class
A C:\dev\EchoServerChg.scala
A C:\dev\ScannerTest$$anonfun$main$1.class
A C:\dev\ScannerTest$.class
A C:\dev\ScannerTest.class
A C:\dev\ScannerTest.scala
Run again? (y/n)?

Scala API for ProcessIO:
new ProcessIO(in: (OutputStream) ⇒ Unit, out: (InputStream) ⇒ Unit, err: (InputStream) ⇒ Unit)
I suppose you should provide at least two arguments, 1 outputStream function (writing to the process), 1 inputStream function (reading from the process).
For instance:
def readJob(in: InputStream) {
// do something with in
}
def writeJob(out: OutputStream) {
// do something with out
}
def errJob(err: InputStream) {
// do something with err
}
val process = new ProcessIO(writeJob, readJob, errJob)
Please keep in mind that the streams are Java streams so you will have to check Java API.
Edit: the package page provides examples, maybe you could take a look at them.
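
A likely reason the attempt in the question blocks is that nothing is flushed after each write: the stream handed to the ProcessIO input function is usually buffered, so the request may never reach the child. The following is a minimal sketch, not a definitive implementation, of a line-by-line exchange built on ProcessIO. The names InteractiveProcess and converse are made up for illustration, and it assumes the child answers each input line with exactly one output line and flushes it straight away (as `cat` does).

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader, OutputStream}
import java.util.concurrent.LinkedBlockingQueue
import scala.sys.process._

// Hypothetical helper, not part of scala.sys.process.
object InteractiveProcess {
  // Sends each request as one line and reads back one reply line per request.
  def converse(cmd: Seq[String], requests: Seq[String]): List[String] = {
    // One-element queues hand the streams from the ProcessIO threads to the caller.
    val stdinSlot  = new LinkedBlockingQueue[OutputStream](1)
    val stdoutSlot = new LinkedBlockingQueue[InputStream](1)

    val io = new ProcessIO(
      (in: OutputStream) => stdinSlot.put(in),
      (out: InputStream) => stdoutSlot.put(out),
      (err: InputStream) =>
        scala.io.Source.fromInputStream(err).getLines().foreach(System.err.println)
    )
    val process = cmd.run(io)

    val toProcess   = stdinSlot.take()
    val fromProcess = new BufferedReader(new InputStreamReader(stdoutSlot.take(), "UTF-8"))
    try {
      requests.toList.map { request =>
        toProcess.write((request + "\n").getBytes("UTF-8"))
        toProcess.flush() // without this the child may never see the request
        fromProcess.readLine()
      }
    } finally {
      toProcess.close()   // EOF on stdin; assumes the child exits when it sees EOF
      process.exitValue() // waits for the child to terminate
    }
  }
}
```

For example, `InteractiveProcess.converse(Seq("cat"), Seq("request1", "request2"))` should return the two requests echoed back. A child that buffers its own stdout when it is not attached to a terminal can still make readLine block, whatever the Scala side does.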

ProcessIO is the way to go for low level control and input and output interaction. There even is an often overlooked helper object BasicIO that assists with creating common ProcessIO instances for reading, connecting in/out streams with utility functions. You can look at the source for BasicIO.scala to see what it is doing internally in creating the ProcessIO Instances.
You can sometimes find inspiration from test cases or tools created for the class itself by the project. In the case of Scala, have a look at the source on GitHub. We are fortunate in that there is a detailed example of ProcessIO being used for the scala GraphViz Dot process runner DotRunner.scala!
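
As a rough illustration of the BasicIO helpers mentioned above, here is a sketch that feeds some lines to a command and collects its output lines. BasicIoSketch and collect are hypothetical names; the sketch relies only on BasicIO.standard, BasicIO.processFully and ProcessIO.withOutput, and it assumes a Unix-like system for the usage shown afterwards.

```scala
import java.io.OutputStream
import scala.collection.mutable.ListBuffer
import scala.sys.process._

// Hypothetical helper built on BasicIO.
object BasicIoSketch {
  // Runs `cmd`, writes `input` to its stdin, returns (exit code, stdout lines).
  def collect(cmd: Seq[String], input: Seq[String]): (Int, List[String]) = {
    val lines = ListBuffer.empty[String]
    val io = BasicIO
      .standard { (stdin: OutputStream) =>
        // Write every line, then close so the child sees end-of-input.
        try input.foreach(l => stdin.write((l + "\n").getBytes("UTF-8")))
        finally stdin.close()
      }
      // Replace the default "copy to console" handler with a line collector.
      .withOutput(BasicIO.processFully(lines += _))
    // exitValue() waits for the process and for the output thread to finish.
    val exit = cmd.run(io).exitValue()
    (exit, lines.toList)
  }
}
```

With `cat` as the command, `BasicIoSketch.collect(Seq("cat"), Seq("a", "b"))` should give back exit code 0 and the same two lines. Standard error still goes to the console, because BasicIO.standard wires it that way.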

Related

How to handle asynchronous API response in scala

I have an API that I need to query in scala. API returns a code that would be equal to 1 when results are ready.
I thought about an until loop to handle as the following:
var code= -1
while(code!=1){
var response = parse(Http(URL).asString.body)
code = response.get("code").get.asInstanceOf[BigInt].toInt
}
println(response)
But this code returns:
error: not found: value response
So I thought about doing the following:
var code = -1
var res = null.asInstanceOf[Map[String, Any]]
while(code!=1){
var response = parse(Http(URL).asString.body)
code = response.get("code").get.asInstanceOf[BigInt].toInt
res = response
}
println(res)
And it works. But I would like to know if this is really the best scala-friendly way to do so ?
How can I properly use a variable outside of the loop in which it is assigned?
When you say API, do you mean you use a http api and you're using a http library in scala, or do you mean there's some class/api written up in scala? If you have to keep checking then you have to keep checking I suppose.
If you're using a Scala framework like Akka or Play, it will have its own facilities for polling asynchronously or scheduling background jobs, which you can read about.
If you're writing a Scala script, then from a design perspective I would either run the script every minute and, instead of looping, have it simply exit whenever the code is not yet 1, or else do essentially what you've done.
Other libraries that could help a Scala script are fs2 or ZIO, which let you set up tasks that poll periodically.
This appears to be a very open question about designing apps which do polling. A specific answer is hard to give.
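
If the loop stays in a plain script, one possible refinement is to bound the number of attempts and pause between them, so that a result that never arrives does not spin forever. This is only a sketch: Polling, pollUntil, maxAttempts and delayMs are names invented here, and `fetch` stands in for the HTTP call from the question.

```scala
// Hypothetical polling helper using only the standard library.
object Polling {
  // Calls `fetch` up to `maxAttempts` times, sleeping `delayMs` after each
  // unsuccessful attempt. Returns the first result accepted by `done`, if any.
  @annotation.tailrec
  def pollUntil[A](maxAttempts: Int, delayMs: Long)(fetch: () => A)(done: A => Boolean): Option[A] =
    if (maxAttempts <= 0) None
    else {
      val result = fetch()
      if (done(result)) Some(result)
      else {
        Thread.sleep(delayMs)
        pollUntil(maxAttempts - 1, delayMs)(fetch)(done)
      }
    }
}
```

The caller gets an Option back and no longer needs a var initialised to null outside the loop.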
You can just use simple recursion:
def runUntil[A](block: => A)(cond: A => Boolean): A = {
@annotation.tailrec
def loop(current: A): A =
if (cond(current)) current
else loop(current = block)
loop(current = block)
}
// Which can be used like:
val response = runUntil {
parse(Http(URL).asString.body)
} { res =>
res.getCode == 1
}
println(response)
And if your real code uses some kind of effect type like IO or Future:
// Actually cats already provides this, is called iterateUntil
def runUntil[A](program: IO[A])(cond: A => Boolean): IO[A] = {
def loop: IO[A] =
program.flatMap { a =>
if (cond(a)) IO.pure(a)
else loop
}
loop
}
// Used like:
val request = IO {
parse(Http(URL).asString.body)
}
val response = runUntil(request) { res =>
res.getCode == 1
}
response.flatMap(IO.println)
Note, for Future you would need to use () => Future instead to be able to re-execute the operation.
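
A sketch of what that Future variant might look like, under the assumption that the request can be wrapped in a `() => Future[A]` thunk. FuturePolling and demo are illustrative names, and the fake request inside demo is a stand-in for the real HTTP call.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FuturePolling {
  // `program` is a thunk so that every iteration starts a fresh Future;
  // a plain Future value would only ever run once.
  def runUntil[A](program: () => Future[A])(cond: A => Boolean)(
      implicit ec: ExecutionContext): Future[A] =
    program().flatMap { a =>
      if (cond(a)) Future.successful(a)
      else runUntil(program)(cond)
    }

  // Hypothetical stand-in for the HTTP call: the "code" becomes 1 on the third request.
  def demo(): Int = {
    import ExecutionContext.Implicits.global
    val calls = new AtomicInteger(0)
    val fakeRequest = () => Future { if (calls.incrementAndGet() >= 3) 1 else -1 }
    Await.result(runUntil(fakeRequest)(_ == 1), 5.seconds)
  }
}
```

The recursion happens inside flatMap callbacks rather than on the call stack, so long polling sequences should not overflow the stack.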

Concurrency with scala.sys.process.ProcessBuilder

I'm using sPdf's run method to render HTML as a PDF file.
run uses scala.sys.process.ProcessBuilder and its ! method:
/** Starts the process represented by this builder, blocks until it exits, and
* returns the exit code. Standard output and error are sent to the console.
*/
def ! : Int
My controller is using a Future to asynchronously execute the conversion but won't spdf block all other interim executions?
Future { pdf.run(sourceUrl, outputStream) } map { exitCode =>
outputStream.toByteArray
}
UPDATE
Thanks for your answer, Paul. I did a little test and yes, it looks to be that way :)
If I update sPdf's run like so:
def run[A, B](sourceDocument: A, destinationDocument: B)(implicit sourceDocumentLike: SourceDocumentLike[A], destinationDocumentLike: DestinationDocumentLike[B]): Int = {
println("start/ " + System.currentTimeMillis)
< ... code removed ... >
val result = (sink compose source)(process).!
println("finish/ " + System.currentTimeMillis)
result
}
and I execute three consecutive requests the stdout prints
start 1461288779013
start 1461288779014
start 1461288779014
finish 1461288781018
finish 1461288781020
finish 1461288781020
Which looks like asynchronous execution.
This is Pdf#run:
def run[A, B](sourceDocument: A, destinationDocument: B)(implicit sourceDocumentLike: SourceDocumentLike[A], destinationDocumentLike: DestinationDocumentLike[B]): Int = {
val commandLine = toCommandLine(sourceDocument, destinationDocument)
val process = Process(commandLine)
def source = sourceDocumentLike.sourceFrom(sourceDocument) _
def sink = destinationDocumentLike.sinkTo(destinationDocument) _
(sink compose source)(process).!
}
There's no synchronization that prevents parallel execution. Assuming that the ExecutionContext has enough available threads,
Future { pdf.run(sourceUrl, outputStream) } map { exitCode =>
outputStream.toByteArray
}
Future { pdf.run(sourceUrl, outputStream) } map { exitCode =>
outputStream.toByteArray
}
will execute in parallel.
If instead, the run method had been, say, surrounded by a synchronized block, only one invocation would execute per Pdf instance. But there's no reason to prevent concurrency here, so the author didn't.
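
The same behaviour can be reproduced without sPdf. The sketch below starts two external processes with `!` from separate Futures and waits for both. ParallelProcesses and runBoth are made-up names, and the `sleep` command is an assumption about the host (a Unix-like system). Wrapping the call in `blocking` is optional; it only hints to the global ExecutionContext that the thread will be parked for a while.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.sys.process._

object ParallelProcesses {
  // Starts two "sleep 1" processes at once and returns their exit codes.
  def runBoth(): Seq[Int] = {
    val jobs = Seq.fill(2) {
      // `!` blocks only the thread running this Future, not the other jobs.
      Future(blocking(Seq("sleep", "1").!))
    }
    Await.result(Future.sequence(jobs), 10.seconds)
  }
}
```

If enough threads are available, the whole call should take roughly one second rather than two, which matches the interleaved start/finish timestamps in the question.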

How can I integrate MongoDB Scala Async driver with Akka Streams?

I'm migrating my old Casbah Mongo drivers to the new async Scala drivers and I'm trying to use this in an Akka stream, and the stream is getting stuck.
I have a GraphStage with createLogic() defined. The code is below. This worked fine with Casbah and I'd hoped the async nature of the new mongo drivers would be a great fit, but here what happens...
If I stream 2 records through this code, the first record flows through and triggers the next step. See output below ('HERE IN SEND' confirms it got through). The second record seems to go through the right steps in BlacklistFilter but Akka never flows to the SEND step.
Any ideas why this is not working with the new drivers?
object BlacklistFilter {
type FilterShape = FanOutShape2[QM[RenderedExpression], QM[RenderedExpression], QM[Unit]]
}
import BlacklistFilter._
case class BlacklistFilter(facilities: Facilities, helloConfig: HelloConfig)(implicit asys: ActorSystem) extends GraphStage[FilterShape] {
val outPass: Outlet[QM[RenderedExpression]] = Outlet("Pass")
val outFail: Outlet[QM[Unit]] = Outlet("Fail")
val reIn: Inlet[QM[RenderedExpression]] = Inlet("Command")
override val shape: FilterShape = new FanOutShape2(reIn, outPass, outFail)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) {
override def preStart(): Unit = pull(reIn)
setHandler(reIn, new InHandler {
override def onPush(): Unit = {
val cmd = grab(reIn)
val re: RenderedExpression = cmd.body
val check = re.recipient.contacts(re.media).toString
// NEW NON-BLOCKING CODE
//-------------------------------------
facilities.withMongo(helloConfig.msgDB, helloConfig.blacklistColl) { coll =>
var found: Option[Document] = None
coll.find(Document("_id" -> check)).first().subscribe(
(doc: Document) => {
found = Some(doc)
println("BLACKLIST FAIL! " + check)
emit(outFail, cmd)
// no pull() here as this happens on complete below
},
(e: Throwable) => {
// Log something here!
emit(outFail, cmd)
pull(reIn)
},
() => {
if (found.isEmpty) {
println("BLACKLIST OK. " + check)
emit(outPass, cmd)
}
pull(reIn)
println("Pulled reIn...")
}
)
}
// OLD BLOCKING CASBAH CODE THAT WORKED
//-------------------------------------
// await(facilities.mongoAccess().mongo(helloConfig.msgDB, helloConfig.blacklistColl)(_.findOne(MongoDBObject("_id" -> check)))) match {
// case Some(_) => emit(outFail, cmd)
// case None => emit(outPass, cmd)
// }
// pull(reIn)
}
override def onUpstreamFinish(): Unit = {} // necessary for some reason!
})
setHandler(outPass, eagerTerminateOutput)
setHandler(outFail, eagerTerminateOutput)
}
}
Output:
BLACKLIST OK. jsmith@yahoo.com
Pulled reIn...
HERE IN SEND (TemplateRenderedExpression)!!!
ACK!
BLACKLIST OK. 919-919-9119
Pulled reIn...
You can see from the output that the first record flowed nicely to the SEND/ACK steps. The second record printed the BLACKLIST message, meaning it emitted outPass then called pull on reIn... but then nothing happens downstream.
Anyone know why this would work differently in Akka Streams than the Casbah version that worked fine (code shown commented out)?
(I could just convert the Mongo call to a Future and Await on it, and that should work like the old code, but that kinda defeats the whole point of going async!)
Well then... "never mind"! :-)
The code above seemed like it should work. I then noticed the Akka guys have just released a new version (2.0.1). I'm not sure what tweaks lay within, but whatever it was, the code above now works as I'd hoped w/o modification.
Left this post up just in case anyone hits a similar problem.

Stream input to external process in Scala

I have an Iterable[String] and I want to stream that to an external Process and return an Iterable[String] for the output.
I feel like this should work as it compiles
import scala.sys.process._
object PipeUtils {
implicit class IteratorStream(s: TraversableOnce[String]) {
def pipe(cmd: String) = s.toStream.#>(cmd).lines
def run(cmd: String) = s.toStream.#>(cmd).!
}
}
However, Scala tries to execute the contents of s instead of pass them in to standard in. Can anyone please tell me what I'm doing wrong?
UPDATE:
I think that my original problem was that the s.toStream was being implicitly converted to a ProcessBuilder and then executed. This is incorrect as it's the input to the process.
I have come up with the following solution. This feels very hacky and wrong but it seems to work for now. I'm not writing this as an answer because I feel like the answer should be one line and not this gigantic thing.
object PipeUtils {
/**
* This class feels wrong. I think that for the pipe command it actually loads all of the output
* into memory. This could blow up the machine if used wrong, however, I cannot figure out how to get it to
* work properly. Hopefully http://stackoverflow.com/questions/28095469/stream-input-to-external-process-in-scala
* will get some good responses.
* @param s
*/
implicit class IteratorStream(s: TraversableOnce[String]) {
val in = (in: OutputStream) => {
s.foreach(x => in.write((x + "\n").getBytes))
in.close
}
def pipe(cmd: String) = {
val output = ListBuffer[String]()
val io = new ProcessIO(in,
out => {Source.fromInputStream(out).getLines.foreach(output += _)},
err => {Source.fromInputStream(err).getLines.foreach(println)})
cmd.run(io).exitValue
output.toIterable
}
def run(cmd: String) = {
cmd.run(BasicIO.standard(in)).exitValue
}
}
}
EDIT
The motivation for this comes from using Spark's .pipe function on an RDD. I want this exact same functionality on my local code.
Assuming Scala 2.11+, you should use lineStream as suggested by @edi. The reason is that you get a streaming response as it becomes available instead of a batched response. Let's say I have a shell script echo-sleep.sh:
#!/usr/bin/env bash
# echo-sleep.sh
while read line; do echo $line; sleep 1; done
and we want to call it from scala using code like the following:
import scala.sys.process._
import scala.language.postfixOps
import java.io.ByteArrayInputStream
implicit class X(in: TraversableOnce[String]) {
// Don't do the BAOS construction in real code. Just for illustration.
def pipe(cmd: String) =
cmd #< new ByteArrayInputStream(in.mkString("\n").getBytes) lineStream
}
Then if we do a final call like:
1 to 10 map (_.toString) pipe "echo-sleep.sh" foreach println
a number in the sequence appears on STDOUT every 1 second. If you buffer, and convert to an Iterable as in your example, you will lose this responsiveness.
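
On Scala 2.13 and later, lineStream is deprecated in favour of lazyLines, which returns a LazyList[String]. A sketch of the same idea for those versions follows. LazyLinesPipe is an illustrative name, and, as in the snippet above, the whole input is materialised in memory before being handed to the process.

```scala
import java.io.ByteArrayInputStream
import scala.sys.process._

// Hypothetical helper for Scala 2.13+.
object LazyLinesPipe {
  // Feeds `in` to `cmd` line by line and exposes the output lazily.
  def pipe(in: Iterable[String], cmd: Seq[String]): LazyList[String] = {
    // Terminate every line so the last one is also newline-delimited.
    val bytes = in.map(_ + "\n").mkString.getBytes("UTF-8")
    (cmd #< new ByteArrayInputStream(bytes)).lazyLines
  }
}
```

For example, `LazyLinesPipe.pipe(Seq("a", "b"), Seq("cat")).foreach(println)` prints each line as soon as the process emits it. Note that lazyLines throws at the end of the stream if the process exits with a non-zero code.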
Here's a solution demonstrating how to write the process code so that it streams both the input and output. The key is to produce a java.io.PipedInputStream that is passed to the input of the process. This stream is filled from the iterator asynchronously via a java.io.PipedOutputStream. Obviously, feel free to change the input type of the implicit class to an Iterable.
Here's an iterator used to show this works.
/**
* An iterator with pauses used to illustrate data streaming to the process to be run.
*/
class PausingIterator[A](zero: A, until: A, pauseMs: Int)(subsequent: A => A)
extends Iterator[A] {
private[this] var current = zero
def hasNext = current != until
def next(): A = {
if (!hasNext) throw new NoSuchElementException
val r = current
current = subsequent(current)
Thread.sleep(pauseMs)
r
}
}
Here's the actual code you want
import java.io.PipedOutputStream
import java.io.PipedInputStream
import java.io.InputStream
import java.io.PrintWriter
// For process stuff
import scala.sys.process._
import scala.language.postfixOps
// For asynchronous stream writing.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
/**
* A streaming version of the original class. This does not block to wait for the entire
* input or output to be constructed. This allows the process to get data ASAP and allows
* the process to return information back to the scala environment ASAP.
*
* NOTE: Don't forget about error handling in the final production code.
*/
implicit class X(it: Iterator[String]) {
def pipe(cmd: String) = cmd #< iter2is(it) lineStream
/**
* Convert an iterator to an InputStream for use in the pipe function.
* @param it an iterator to convert
*/
private[this] def iter2is[A](it: Iterator[A]): InputStream = {
// What is written to the output stream will appear in the input stream.
val pos = new PipedOutputStream
val pis = new PipedInputStream(pos)
val w = new PrintWriter(pos, true)
// Scala 2.11 (scala 2.10, use 'future'). Executes asynchronously.
// Fill the stream, then close.
Future {
it foreach w.println
w.close
}
// Return possibly before pis is fully written to.
pis
}
}
The final call will display 0 through 9, pausing for 3 seconds between numbers (a 2 second pause on the Scala side, a 1 second pause on the shell script side).
// echo-sleep.sh is the same script as in my previous post
new PausingIterator(0, 10, 2000)(_ + 1)
.map(_.toString)
.pipe("echo-sleep.sh")
.foreach(println)
Output
0 [ pause 3 secs ]
1 [ pause 3 secs ]
...
8 [ pause 3 secs ]
9 [ pause 3 secs ]

sys.process to wrap a process as a function

I have an external process that I would like to treat as a
function from String=>String. Given a line of input, it will respond with a single line of output. It seems that I should use
scala.sys.process, which is clearly an elegant library that makes many
shell operations easily accessible from within scala. However, I
can't figure out how to perform this simple use case.
If I write a single line to the process' stdin, it prints the result
in a single line. How can I use sys.process to create a wrapper so I
can use the process interactively? For example, if I had an
implementation for ProcessWrapper, here is a program and its output:
// abstract definition
class ProcessWrapper(executable: String) {
def apply(line: String): String
}
// program using an implementation
val process = new ProcessWrapper("cat -b")
println(process("foo"))
println(process("bar"))
println(process("baz"))
Output:
1 foo
2 bar
3 baz
It is important that the process is not reloaded for each call to process because there is a significant initialization step.
So - after my comment - this would be my solution
import java.io.BufferedReader
import java.io.File
import java.io.InputStream
import java.io.InputStreamReader
import scala.annotation.tailrec
class ProcessWrapper(cmdLine: String, lineListenerOut: String => Unit, lineListenerErr: String => Unit,
finishHandler: => Unit,
lineMode: Boolean = true, envp: Array[String] = null, dir: File = null) {
class StreamRunnable(val stream: InputStream, listener: String => Unit) extends Runnable {
def run() {
try {
val in = new BufferedReader(new InputStreamReader(this.stream));
@tailrec
def readLines {
val line = in.readLine
if (line != null) {
listener(line)
readLines
}
}
readLines
}
finally {
this.stream.close
finishHandler
}
}
}
val process = Runtime.getRuntime().exec(cmdLine, envp, dir);
val outThread = new Thread(new StreamRunnable(process.getInputStream, lineListenerOut), "StreamHandlerOut")
val errThread = new Thread(new StreamRunnable(process.getErrorStream, lineListenerErr), "StreamHandlerErr")
val sendToProcess = process.getOutputStream
outThread.start
errThread.start
def apply(txt: String) {
sendToProcess.write(txt.getBytes)
if (lineMode)
sendToProcess.write('\n')
sendToProcess.flush
}
}
object ProcessWrapper {
def main(args: Array[String]) {
val process = new ProcessWrapper("python -i", txt => println("py> " + txt),
err => System.err.println("py err> " + err), System.exit(0))
while (true) {
process(readLine)
}
}
}
The main part is the StreamRunnable, where the process is read in a thread and the received line is passed on to a "LineListener" (a simple String => Unit - function).
The main is just a sample implementation - calling python ;)
I'm not sure, but do you want something like this?
case class ProcessWrapper(executable: String) {
import java.io.ByteArrayOutputStream
import scala.concurrent.duration.Duration
import java.util.concurrent.TimeUnit
lazy val process = sys.runtime.exec(executable)
def apply(line: String, blockedRead: Boolean = true): String = {
process.getOutputStream().write(line.getBytes())
process.getOutputStream().flush()
val r = new ByteArrayOutputStream
if (blockedRead) {
r.write(process.getInputStream().read())
}
while (process.getInputStream().available() > 0) {
r.write(process.getInputStream().read())
}
r.toString()
}
def close() = process.destroy()
}
val process = ProcessWrapper("cat -b")
println(process("foo\n"))
println(process("bar\n"))
println(process("baz\n"))
println(process("buz\n"))
println(process("puz\n"))
process.close
Result :
1 foo
2 bar
3 baz
4 buz
5 puz
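
Reading with available(), as above, depends on timing: if the process has not produced its whole reply by the time the loop checks, the reply is cut short. When the protocol really is one line in, one line out, a line-oriented variant may be more robust. The sketch below is one way to do it; LineProcess is a hypothetical name, it uses java.lang.ProcessBuilder directly, and it assumes the child flushes each reply line as soon as it is written.

```scala
import java.io.{BufferedReader, BufferedWriter, EOFException, InputStreamReader, OutputStreamWriter}

// Hypothetical String => String wrapper around one long-lived process.
class LineProcess(command: String*) extends (String => String) with AutoCloseable {
  // The process is started once; stderr is passed through to this JVM's stderr.
  private val process = new ProcessBuilder(command: _*)
    .redirectError(ProcessBuilder.Redirect.INHERIT)
    .start()
  private val toProcess =
    new BufferedWriter(new OutputStreamWriter(process.getOutputStream, "UTF-8"))
  private val fromProcess =
    new BufferedReader(new InputStreamReader(process.getInputStream, "UTF-8"))

  // Sends one line and blocks until one reply line arrives.
  def apply(line: String): String = synchronized {
    toProcess.write(line + "\n")
    toProcess.flush()
    val reply = fromProcess.readLine()
    if (reply == null) throw new EOFException("process closed its output")
    reply
  }

  // Closes stdin so the child sees EOF, then waits for it to exit.
  def close(): Unit = {
    toProcess.close()
    process.waitFor()
  }
}
```

Used as `val process = new LineProcess("cat", "-b")`, each call such as `process("foo")` should return the numbered line without restarting the program. A child that block-buffers its stdout when writing to a pipe will still stall the readLine call.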
I think that PlayCLI is a better way.
http://blog.greweb.fr/2013/01/playcli-play-iteratees-unix-pipe/ came across this today and looks exactly like what you want
How about using an Akka actor. The actor can have state and thus a reference to an open program (in a thread). You can send messages to that actor.
ProcessWrapper might be a typed actor itself or just something that converts the calls of a function to a call of an actor. If you only have 'process' as method name, then wrapper ! "message" would be enough.
Having a program open and ready to receive commands sounds like an actor that receives messages.
Edit: Probably I got the requirements wrong. You want to send multiple lines to the same process. That's not possible with the below solution.
One possibility would be to add an extension method to the ProcessBuilder that allows for taking the input from a string:
implicit class ProcessBuilderWithStringInput(val builder: ProcessBuilder) extends AnyVal {
// TODO: could use an implicit for the character set
def #<<(s: String) = builder.#<(new ByteArrayInputStream(s.getBytes))
}
You can now use the method like this:
scala> ("bc":ProcessBuilder).#<<("3 + 4\n").!!
res9: String =
"7
"
Note that the type annotation is necessary, because we need two conversions (String -> ProcessBuilder -> ProcessBuilderWithStringInput), and Scala will only apply one conversion automatically.