Launch specific external process in Scala

I've been struggling to launch a specific external process in Scala. It works for most programs, but when I try
Array("gnome-terminal", "--working-directory",
"/mydir", "-x", "bash", "-c", "tmux attach")
it fails. I tried the same thing using Python's subprocess.Popen and it worked perfectly. Any suggestions?
My code is:
import scala.io.Source
import scala.sys.process._

object Test {
  def main(args: Array[String]): Unit = {
    val source = Source.fromFile("file.txt")
    val lines = source.getLines
    lines.toList.map(raw => {
      val programAndOptions = raw.split('$')
      // here I get the Array mentioned above
      Process(programAndOptions).run
    })
    source.close
  }
}
Update
My file.txt is something like this:
evince$/path/to/my/pdf
evince$/path/to/other/pdf
nautilus$/path/to/my/working/directory
gnome-terminal$--working-directory$/mydir$-x$bash$-c$tmux attach
Update 2
I ran the same code again to test and try some other things and it worked 'as is'.

'run' is non-blocking, so it executes the program and continues with the rest of the code, which in this case is source.close and exiting. Exiting the JVM probably kills the app.
Try using the '!' function instead of 'run', or do something like:
val p = Process(programAndOptions).run
p.exitValue
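For example, here is a minimal sketch of the question's loop with that fix applied, blocking on each process before closing the file (the file name and the '$' separator come from the question; everything else is plain scala.sys.process usage):

import scala.io.Source
import scala.sys.process._

object Test {
  def main(args: Array[String]): Unit = {
    val source = Source.fromFile("file.txt")
    try {
      source.getLines.toList.foreach { raw =>
        val programAndOptions = raw.split('$')
        val p = Process(programAndOptions.toSeq).run()
        // exitValue blocks until the external program terminates,
        // so the JVM stays alive while each program runs
        println(s"${programAndOptions.head} exited with ${p.exitValue()}")
      }
    } finally {
      source.close()
    }
  }
}

If you do not want to wait for each program to finish, keeping the JVM alive some other way (or using '!' only where blocking is acceptable) achieves the same effect.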

Related

Debugging SCollection contents when running tests

Is there any way to view the contents of an SCollection when running a unit test (PipelineSpec)?
When running in production across many machines there would be no way to see the entire collection on one machine, but is there a way to view the contents of an SCollection, for example when running a unit test in debug mode in IntelliJ?
If you want to print debug statements to the console, you can use the debug method, which is part of SCollection. Sample code is shown below:
val stdOutMock = new MockedPrintStream
Console.withOut(stdOutMock) {
  runWithContext { sc =>
    val r = sc.parallelize(1 to 3).debug(prefix = "===")
    r should containInAnyOrder(Seq(1, 2, 3))
  }
}
stdOutMock.message.filterNot(_ == "\n") should contain theSameElementsAs
  Seq("===1", "===2", "===3")

How do I work with a Scala process interactively?

I'm writing a bot in Scala for a game that uses text input and output. I want to work with the process interactively: my code receives output from the process, works with it, and only then sends its next input to the process. That means I want to give a function access to the input streams and the output stream simultaneously.
This doesn't seem to fit into any of the factories in scala.sys.process.BasicIO or the constructor for scala.sys.process.ProcessIO (three functions, each of which has access to only one stream).
Here's how I'm doing it at the moment.
import java.io.{InputStream, OutputStream, PrintWriter}
import java.util.Scanner
import scala.sys.process._

private var rogue_input: OutputStream = _
private var rogue_output: InputStream = _
private var rogue_error: InputStream = _

Process("python3 /home/robin/IdeaProjects/Rogomatic/python/rogue.py --rogomatic").run(
  // each callback simply captures its stream in the corresponding var
  new ProcessIO(rogue_input = _, rogue_output = _, rogue_error = _)
)

try {
  val rogue_scanner = new Scanner(rogue_output)
  val rogue_writer = new PrintWriter(rogue_input, true)
  // Play the game
} finally {
  rogue_input.close()
  rogue_output.close()
  rogue_error.close()
}
This works, but it doesn't feel very Scala-like. Is there a more idiomatic way to do this?
So I want to work with a process interactively - that is, my code receives output from the process, works with it, and only then sends its next input to the process.
In general, this is traditionally solved by expect. There exist libraries and tools inspired by expect for various languages, including for Scala: https://github.com/Lasering/scala-expect.
The README of the project gives various examples. While I don't know exactly what your rogue.py expects in terms of stdin/stdout interactions, here's a quick "hello world" example showing how you could interact with a Python interpreter (using the Ammonite REPL, which conveniently has library-importing capabilities):
import $ivy.`work.martins.simon::scala-expect:6.0.0`

import work.martins.simon.expect.core._
import work.martins.simon.expect.core.actions._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

val timeout = 5.seconds
val e = new Expect("python3 -i -", defaultValue = "?")(
  new ExpectBlock(
    new StringWhen(">>> ")(
      Sendln("""print("hello, world")""")
    )
  ),
  new ExpectBlock(
    new RegexWhen("""(.*)\n>>> """.r)(
      ReturningWithRegex(_.group(1).toString)
    )
  )
)
e.run(timeout).onComplete(println)
What the code above does is "expect" >>> to be sent to stdout; when it finds that, it sends print("hello, world") followed by a newline. From there, it reads and returns everything up to the next prompt (>>>) using a regex.
Amongst other debug information, the above should result in Success(hello, world) being printed to your console.
The library has various other styles, and there may also exist other similar libraries out there. My main point is that an expect-inspired library is likely what you're looking for.

Unable to get testname while calling pytest execution from python or subprocess

I am trying to create a test runner Python file that executes pytest.exe in a particular testcase folder and sends the results via email.
Here is my code:
test_runner.py:
import os

try:
    command = "pytest.exe {app} > {log}".format(app=app_folder, log=log_name)
    os.system(command)
except:
    send_mail()
I use the following code in conftest.py to add screenshots to the pytest-html report.
In conftest.py:
import pytest

@pytest.mark.hookwrapper
def pytest_runtest_makereport(item, call):
    pytest_html = item.config.pluginmanager.getplugin('html')
    outcome = yield
    report = outcome.get_result()
    extra = getattr(report, 'extra', [])
    if pytest_html:
        xfail = hasattr(report, 'wasxfail')
        if (report.skipped and xfail) or (report.failed and not xfail):
            test_case = str(item._testcase).strip(")")
            function_name = test_case.split(" ")[0]
            file_and_class_name = ((test_case.split(" ")[1]).split("."))[-2:]
            file_name = ".".join(file_and_class_name) + "." + function_name
The issue is: when I run the command "pytest.exe app_folder" in the Windows command prompt, it is able to discover the test cases, execute them, and get the results. But when I call the command from the .py file, either using os.system or subprocess, it fails with the following exception:
\conftest.py", line 85, in pytest_runtest_makereport
INTERNALERROR> test_case = str(item._testcase).strip(")")
INTERNALERROR> AttributeError: 'TestCaseFunction' object has no attribute
'_testcase'
Can anyone please help me understand what's happening here, or suggest another way to get the testcase name?
Update:
To overcome this issue, I alternatively used the TestResult object from the pytest_runtest_makereport hook to get the test case details.
@pytest.mark.hookwrapper
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
In the above example, the report variable contains the TestResult object. It can be inspected to get the testcase/class/module name.
You can use the shell=True option with subprocess to get the desired result:
from subprocess import Popen

command = 'pytest.exe app_folder'  # you can paste the whole command that you run in cmd
p1 = Popen(command, shell=True)
This should serve your purpose.

How can commands be run from D?

I'm coming from Java and I'd like to use a small D script to start a server with a bunch of parameters. So, instead of typing
java -someargs... -jar really-long-jar-name.jar
I would like to just click on the executable.
Is there any equivalent to Runtime#exec in D?
You can use std.process.executeShell or std.process.execute to achieve this:
import std.process : executeShell;
auto res = executeShell("java -jar my_program.jar");
if(res.status != 0)
{
...
}

Spark: run an external process in parallel

Is it possible with Spark to "wrap" and run an external process managing its input and output?
The process is represented by a normal C/C++ application that usually runs from the command line. It accepts a plain text file as input and generates another plain text file as output. Since I need to integrate the flow of this application with something bigger (still within Spark), I was wondering if there is a way to do this.
The process can easily be run in parallel (at the moment I use GNU Parallel) just by splitting its input into (for example) 10 part files, running 10 instances of it in memory, and re-joining the final 10 output part files into one file.
The simplest thing you can do is to write a simple wrapper which takes data from standard input, writes it to a file, executes the external program, and outputs the results to standard output. After that, all you have to do is use the pipe method:
rdd.pipe("your_wrapper")
The only serious consideration is IO performance. If possible, it would be better to adjust the program you want to call so it can read and write data directly, without going through disk.
Alternatively, you can use mapPartitions combined with process and standard IO tools to write to a local file, call your program and read the output, as in the sketch below.
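Here is a minimal sketch of that mapPartitions approach, assuming rdd is an RDD[String] and a hypothetical line-oriented command /usr/local/bin/my_tool that takes an input file path and an output file path (the tool name and paths are placeholders, not from the question):

import java.nio.file.Files
import scala.collection.JavaConverters._
import scala.sys.process._

val result = rdd.mapPartitions { iter =>
  // Write this partition's records to a temporary input file on the executor
  val in  = Files.createTempFile("part-in-", ".txt")
  val out = Files.createTempFile("part-out-", ".txt")
  Files.write(in, iter.toSeq.asJava)

  // Run the external program on the local files; ! blocks until it exits
  val exitCode = Seq("/usr/local/bin/my_tool", in.toString, out.toString).!
  require(exitCode == 0, s"my_tool failed with exit code $exitCode")

  // Read the output file back and return its lines as the new partition
  val lines = Files.readAllLines(out).asScala.toList
  Files.delete(in)
  Files.delete(out)
  lines.iterator
}

Each partition gets its own pair of temporary files on the executor's local disk, which mirrors the split/run/re-join scheme described in the question.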
If you ended up here based on the question title from a Google search, but you don't have the OP's restriction that the external program needs to read from a file (i.e., if your external program can read from stdin), here is a solution. For my use case, I needed to call an external decryption program for each input file.
import org.apache.commons.io.IOUtils
import sys.process._
import scala.collection.mutable.ArrayBuffer

val showSampleRows = true
val bfRdd = sc.binaryFiles("/some/files/*,/more/files/*")
val rdd = bfRdd.flatMap { case (file, pds) => { // pds is a PortableDataStream
  val rows = new ArrayBuffer[Array[String]]()
  var errors = List[String]()
  val io = new ProcessIO(
    in => { // "in" is an OutputStream; write the encrypted contents of the
            // input file (pds) to this stream
      IOUtils.copy(pds.open(), in) // open() returns a DataInputStream
      in.close
    },
    out => { // "out" is an InputStream; read the decrypted data off this stream.
      // Even though this runs in another thread, we can write to rows, since it
      // is part of the closure for this function
      for (line <- scala.io.Source.fromInputStream(out).getLines) {
        // ...decode line here... for my data, it was pipe-delimited
        rows += line.split('|')
      }
      out.close
    },
    err => { // "err" is an InputStream; read any errors off this stream
      // errors is part of the closure for this function
      errors = scala.io.Source.fromInputStream(err).getLines.toList
      err.close
    }
  )
  val cmd = List("/my/decryption/program", "--decrypt")
  val exitValue = cmd.run(io).exitValue // blocks until subprocess finishes
  println(s"-- Results for file $file:")
  if (exitValue != 0) {
    // TBD write to string accumulator instead, so driver can output errors
    // string accumulator from #zero323: https://stackoverflow.com/a/31496694/215945
    println(s"exit code: $exitValue")
    errors.foreach(println)
  } else {
    // TBD, you'll probably want to move this code to the driver, otherwise
    // unless you're using the shell, you won't see this output
    // because it will be sent to stdout of the executor
    println(s"row count: ${rows.size}")
    if (showSampleRows) {
      println("6 sample rows:")
      rows.slice(0, 6).foreach(row => println("  " + row.mkString("|")))
    }
  }
  rows
}}
scala> :paste "test.scala"
Loading test.scala...
...
rdd: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[62] at flatMap at <console>:294
scala> rdd.count // action, causes Spark code to actually run
-- Results for file hdfs://path/to/encrypted/file1: // this file had errors
exit code: 255
ERROR: Error decrypting
my_decryption_program: Bad header data[0]
-- Results for file hdfs://path/to/encrypted/file2:
row count: 416638
sample rows:
<...first row shown here ...>
...
<...sixth row shown here ...>
...
res43: Long = 843039
References:
https://www.scala-lang.org/api/current/scala/sys/process/ProcessIO.html
https://alvinalexander.com/scala/how-to-use-closures-in-scala-fp-examples#using-closures-with-other-data-types