Unable to execute nested Unix commands in Spark Scala

I'm trying to list a folder in AWS S3 and extract only the file names. The nested Unix commands do not execute in spark-shell and throw an error. I know there are other ways to do this, such as importing org.apache.hadoop.fs._
The commands I'm trying are:
import sys.process._
var cmd_exec = "aws s3 ls s3://<bucket-name>/<folder-name>/"
cmd_exec !!
If I pipe the output of ls into cut, it throws an error:
import sys.process._
var cmd_exec = "aws s3 ls s3://<bucket-name>/<folder-name>/ | cut -d' ' -f9-"
cmd_exec !!
Error message: Unknown options: |,cut,-d',',-f9-
java.lang.RuntimeException: Nonzero exit value: 255
Any suggestions, please?

As far as I know, this is expected.
import scala.sys.process._
val returnValue: Int = Process("cat mycsv.csv | grep -i Lazio")!
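The code above won't work either.
| is the shell's pipe operator, which feeds the output of one command into another. scala.sys.process does not run the string through a shell, so | and everything after it are passed to the first command as plain arguments.
Instead, capture the output of the first command and process it in a second step.
See also the article "A Scala shell script example", which shows how a Scala program can be executed as a shell script. It might be useful.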
TIY!
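As a rough sketch of two ways around this: scala.sys.process provides the #| operator for building a pipeline without a shell, and the whole line can alternatively be handed to sh -c so that the shell interprets the pipe. The object name PipeDemo and the use of echo as a stand-in for the aws s3 ls call are assumptions made only for illustration. The cut arguments are passed as a Seq so that the space delimiter survives as its own argument.

```scala
import scala.sys.process._

// Hypothetical demo object; echo stands in for "aws s3 ls ...".
object PipeDemo {
  // Option 1: build the pipeline with #|, no shell involved.
  def viaProcessPipe(line: String): String =
    (Seq("echo", line) #| Seq("cut", "-d", " ", "-f2-")).!!.trim

  // Option 2: let sh interpret the pipe; the input is passed as $1.
  def viaShell(line: String): String =
    Seq("sh", "-c", "echo \"$1\" | cut -d' ' -f2-", "sh", line).!!.trim

  def main(args: Array[String]): Unit = {
    println(viaProcessPipe("one two three"))
    println(viaShell("one two three"))
  }
}
```

Option 1 keeps everything inside the JVM process API, while option 2 is closer to what would be typed in a terminal but depends on sh being available.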

Related

How to force pytest to return an error code

I have the following structure:
A Koholo job calls a Python script, and the script returns an exit code (1 = failed, 0 = passed) when it ends. Koholo waits for the exit code before continuing to the next job step (the next script).
Now, instead of a Python script, I'm running pytest scripts (with the command: python -m pytest test_name), but pytest is not returning an exit code, so the Koholo job times out.
Please let me know if there is a way to make pytest return an exit code when it finishes.
For example, you can pass any pytest argument that you would normally pass on the CLI; I am just using markers as an example:
import sys
import pytest
results = pytest.main(["-m", "my_marker"])
sys.exit(results)
For more details, see
https://docs.pytest.org/en/7.1.x/reference/exit-codes.html
When pytest finishes, it calls the pytest_sessionfinish(session, exitstatus) hook.
Try adding sys.exit(exitstatus) to this hook:
import sys
def pytest_sessionfinish(session, exitstatus):
    """Called after the whole test run finishes."""
    sys.exit(exitstatus)
You can also check the exit code by running this script:
start /wait python -m pytest test_name
echo %errorlevel%

Use a variable in a shell command in a Scala program (not REPL)

Inside a program (not the REPL), is it possible to use a string variable to represent the shell command to be executed?
import sys.process._
val npath = opath.substring(0,opath.lastIndexOf("/"))
s"rm -rf $npath/*" !
s"mv $tmpName/* $npath/" !
The compiler says:
:103: error: type mismatch;
found : String
required: scala.sys.process.ProcessLogger
s"mv $tmpName/* $npath/" !
^
Note that in the REPL this can be fixed by using
:power
But we're not in the REPL here.
I found a useful workaround that mostly preserves the intended structure:
Use the
Seq[String].!
syntax. By using spaces as the delimiter, we can still write the command out in a kind of WYSIWYG way:
import sys.process._
val npath = opath.substring(0,opath.lastIndexOf("/"))
s"rm -rf $npath/*".split(" ").toSeq.!
s"mv $tmpName/* $npath/".split(" ").toSeq.!
The limitation here is that embedded spaces in the command will not work; they would require an explicit Seq with each portion of the command as a separate element.
Here is a slightly nicer version for when there is a set of commands to run:
Seq(s"rm -rf $npath/*", s"mv $tmpName/* $npath/").foreach { cmd =>
  println(cmd)
  cmd.split(" ").toSeq.!
}
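One caveat worth sketching: scala.sys.process starts the program directly rather than through a shell, so a * in a split command is passed to the program literally instead of being expanded. The sketch below, with the hypothetical names GlobDemo, direct, and sh, contrasts the split approach with handing the whole line to sh -c. It assumes a Unix-like system and a path without spaces.

```scala
import java.nio.file.Files
import scala.sys.process._

// Hypothetical helper object, for illustration only.
object GlobDemo {
  // Split on spaces and run directly: no shell, so globs are not expanded.
  def direct(commandLine: String): Int = commandLine.split(" ").toSeq.!

  // Run through sh so that globs, pipes and redirects are interpreted.
  def sh(commandLine: String): Int = Seq("sh", "-c", commandLine).!

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("globdemo")
    Files.createFile(dir.resolve("a.txt"))
    // ls receives a literal "*" and reports a missing file (nonzero exit).
    println(direct(s"ls $dir/*"))
    // The shell expands the glob before rm runs.
    println(sh(s"rm -rf $dir/*"))
  }
}
```

With this approach the command string is written exactly as it would be in a terminal, at the cost of depending on sh and on correct shell quoting of any interpolated values.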

how to execute a command in scala?

I want to execute the command "dot -Tpng overview.dot > overview.png", which uses Graphviz to generate an image.
The code in Scala:
Process(Seq("dot -Tpng overview.dot > overview.png"))
It does not work.
I also want to open this image from Scala. I work on Ubuntu, where images are opened by the image viewer by default. But when I type "eog overview.png" in a terminal, it reports an error:
** (eog:18371): WARNING **: The connection is closed
So I do not know how to make Scala open this image.
Thanks in advance.
You can't redirect stdout using > in a command string. Use the #> and #| operators instead. See the examples in the scala.sys.process package documentation.
This writes test into test.txt:
import scala.sys.process._
import java.io.File
// use scala.bat instead of scala on Windows
val cmd = Seq("scala", "-e", """println(\"test\")""") #> new File("test.txt")
cmd.!
In your case:
val cmd = "dot -Tpng overview.dot" #> new File("overview.png")
cmd.!
Or just this (since dot accepts output file name as -ooutfile):
"dot -Tpng overview.dot -ooverview.png".!

How do you write a Scala script that will react to file changes

I would like to convert the following shell script to Scala (just for fun). However, the script must keep running and listen for changes to the *.mkd files. If any file changes, the script should regenerate the affected document. File I/O has always been my Achilles heel...
#!/bin/sh
for file in *.mkd
do
  pandoc --number-sections $file -o "${file%%.*}.pdf"
done
Any ideas for a good approach to this would be appreciated.
The following code, taken from my answer to "Watch for project files", watches a directory and executes a specific command when something changes:
#!/usr/bin/env scala
import java.nio.file._
import scala.sys.process._
// args(0): directory to watch, args(1): command to run on every change
val file = Paths.get(args(0))
val cmd = args(1)
val watcher = FileSystems.getDefault.newWatchService
file.register(
  watcher,
  StandardWatchEventKinds.ENTRY_CREATE,
  StandardWatchEventKinds.ENTRY_MODIFY,
  StandardWatchEventKinds.ENTRY_DELETE
)
// Start the command, connecting its stdin to ours
def exec = cmd run true
@scala.annotation.tailrec
def watch(proc: Process): Unit = {
  val key = watcher.take // blocks until something changes
  val events = key.pollEvents
  val newProc =
    if (!events.isEmpty) {
      // Restart the command on any change
      proc.destroy()
      exec
    } else proc
  if (key.reset) watch(newProc)
  else println("aborted")
}
watch(exec)
Usage:
watchr.scala markdownFolder/ "echo \"Something changed!\""
The script would have to be extended to inject file names into the command. For now, this snippet should be regarded as a building block for the actual answer.
Modifying the script to handle the *.mkd wildcard would be non-trivial, as you would have to search for the files manually and register a watch on each of them. Reusing the script above and placing all files in one directory has the added advantage of picking up new files when they are created.
As you can see, it gets big and messy quickly when relying only on the Scala and Java APIs. You would be better off using alternative libraries, or just sticking to bash with inotify.
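One possible way to inject the file name, sketched under some assumptions: the names MkdWatcher and pandocCommand are hypothetical, the output name is derived by simply replacing a trailing .mkd with .pdf (slightly different from the ${file%%.*} expansion in the question), and the lambda passed to forEach assumes Scala 2.12 or later. The watcher reacts only to events whose file name ends in .mkd.

```scala
import java.nio.file.{FileSystems, Path, Paths}
import java.nio.file.StandardWatchEventKinds._
import scala.sys.process._

// Hypothetical sketch, not the original answer's script.
object MkdWatcher {
  // Builds the pandoc invocation for one file, e.g. notes.mkd -> notes.pdf.
  def pandocCommand(file: Path): Seq[String] = {
    val name = file.toString
    val pdf = name.stripSuffix(".mkd") + ".pdf"
    Seq("pandoc", "--number-sections", name, "-o", pdf)
  }

  def main(args: Array[String]): Unit = {
    val dir = Paths.get(if (args.nonEmpty) args(0) else ".")
    val watcher = FileSystems.getDefault.newWatchService()
    dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY)
    while (true) {
      val key = watcher.take() // blocks until something changes
      key.pollEvents().forEach { event =>
        event.context() match {
          // The event context is the path relative to the watched directory.
          case p: Path if p.toString.endsWith(".mkd") =>
            pandocCommand(dir.resolve(p)).!
          case _ => // ignore other files and overflow events
        }
      }
      key.reset()
    }
  }
}
```

Keeping the command construction in a separate pure function makes it possible to check the file-name mapping without pandoc installed or a watcher running.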

How can I debug a funcargs function?

How can I drop into pdb inside a funcargs function? And how can I see output from print statements in funcargs functions?
My original question included the following, but it turns out I was simply instrumenting the wrong funcarg. Sigh.
I tried:
print "hi from inside funcargs"
invoking with and without -s.
I tried:
import pytest
pytest.set_trace()
And:
import pdb
pdb.set_trace()
And:
raise "hi from inside funcargs"
None produced any output or caused a test failure.
The first thing that comes to mind is py.test -s.
But by default, funcargs give you tracebacks and output/error. What plugins are you using? Something is clearly hiding it.
For example, for the program
def pytest_funcarg__foo(request):
    print 'hi'
    raise IOError
def test_fun(foo):
    pass
a py.test call gives me both: a traceback from the funcarg function and the printed text.
To debug a funcarg:
def pytest_funcarg__myfuncarg(request):
    import pytest
    pytest.set_trace()
    ...
def test_function(myfuncarg):
    ...
Then:
python -m pytest test_function.py
As Ronny answered, to see output from a funcarg, pytest -s works.