Printing to console in a Scalding script

I am trying to display some content on the console in a Scalding script. When I run the same logic in the Scalding shell I get the desired output, but when I run the script I get an error:
scripttest.scala:4: error: value dump is not a member of com.twitter.scalding.typed.TypedPipe[String]
The script is:
import com.twitter.scalding._

class scripttest(args: Args) extends Job(args) {
  val hello = TypedPipe.from(TextLine("tutorial/data/hello.txt"))
  hello.dump
}
When I ran the same logic in the console, it ran successfully.
The output in the console:
Hello world
Goodbye world
Please explain why this occurs and how to print to console in a scalding script.

Looking closely at the documentation, you will see in section "2.6 REPL Reference", subsection "2.6.1 Enrichments available on TypedPipe/Grouped/CoGrouped objects":
.dump: Print the contents to stdout (uses .toIterator)
Hence, dump is available only in the REPL.
I don't see a "scalding way" to write to the console, nor do I think it would make sense: you are running a pipeline, so the only guaranteed milestone is the end of the pipeline, where you can simply write your results to a file, as all the tutorial scripts do.
If it's just a matter of printing "hello, my job started", remember it's just a Scala file and use println (for more advanced logging, Logback is your friend).
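For example, a minimal sketch of such a job (the class name and output path are illustrative, not from the tutorials):

import com.twitter.scalding._

class ScriptTest(args: Args) extends Job(args) {
  println("hello, my job started") // plain println, shows up on the console

  // Instead of .dump, end the pipeline by writing to a sink:
  TypedPipe.from(TextLine("tutorial/data/hello.txt"))
    .write(TypedTsv[String]("tutorial/data/output.tsv"))
}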
To run the script locally, after having cloned the repository:
> ./sbt assembly
> ./scripts/scald.rb --local MyScript.scala
The first command runs the tests and builds the assembly jar used by scald.rb, the script the second command uses to run your Scalding script.

Related

Execute the scala script through spark-shell in silent mode

I need to execute a Scala script through spark-shell in silent mode. When I use spark-shell -i "file.scala", after the execution I end up in the Scala interactive mode, and I don't want that.
I have tried executing spark-shell -i "file.scala", but I don't know how to run the script in silent mode.
spark-shell -i "file.scala"
After execution, I get into
scala>
and I don't want to end up in that scala> mode.
Update (October 2019) for a script that terminates
This question is also about running a script that terminates, that is, a "Scala script" run by spark-shell -i script.scala > output.txt that stops by itself (an internal System.exit(0) instruction terminates the script). See this question for a good example.
It also needs a "silent mode": it is expected not to pollute output.txt.
Assume Spark v2.2+.
PS: there are a lot of cases (typically small tools and module/algorithm tests) where the Spark interpreter can be better than the compiler... Please, "let's compile!" is not an answer here.
spark-shell -i file.scala keeps the interpreter open at the end, so System.exit(0) is required at the end of your script. The most appropriate solution is to place your code in a try {} block and put System.exit(0) in the finally {} section.
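A minimal sketch of such a script (the file contents are illustrative; spark is the session that spark-shell provides):

// file.scala, run with: spark-shell -i file.scala > output.txt
try {
  val lines = spark.read.textFile("/tmp/input.txt") // hypothetical input path
  println(s"line count: ${lines.count()}")
} finally {
  System.exit(0) // always exit instead of falling back to the scala> prompt
}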
If logging is required, you can use something like this:
spark-shell < file.scala > test.log 2>&1 &
If you have limitations on editing the file and you can't add System.exit(0), use:
echo :quit | spark-shell -i file.scala
Update
If you want to suppress everything in the output except printlns, you have to turn off logging for spark-shell. A sample of the configs is here. Disabling all logging in $SPARK_HOME/conf/log4j.properties should let you see only printlns. But I would not follow this approach with printlns: general logging with log4j should be used instead. You can configure it to obtain the same results as with printlns; it boils down to configuring a pattern. This answer provides an example of a pattern that solves your issue.
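As a rough sketch of that approach (the logger name is illustrative; the pattern and levels live in $SPARK_HOME/conf/log4j.properties), Spark 2.x ships with log4j 1.x, so from inside the script:

import org.apache.log4j.{Level, Logger}

val log = Logger.getLogger("myscript") // hypothetical logger name
log.setLevel(Level.INFO)
log.info("this message goes through the configured appender instead of println")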
The best way is definitely to compile your Scala code to a jar and use spark-submit, but if you're simply looking for a quick iteration loop, you can simply issue a :quit after your Scala code has been parsed:
echo :quit | spark-shell -i yourfile.scala
Adding onto @rluta's answer, you can place the call to the spark-shell command inside a shell script, say:
spark-shell < yourfile.scala
But this requires you to keep each statement on a single line, since input piped this way is read line by line.
OR
echo :quit | spark-shell -i yourfile.scala
This should work as well.

Executing sbt in command line via python script and outputting to file

I have a set of JSON files in the directory /Desktop/jsons, and I have a Scala script which takes in a JSON file and outputs stuff. I can run it manually in the terminal by cd-ing into the directory of the Scala script (/Me/dev/scalastuff) and running
sbt --error "run /Desktop/jsons/jsonExample.json",
which outputs the stuff I want in the terminal.
I want to write a Python script which does this automatically and additionally outputs a JSON file with the "stuff" that's output by the Scala script.
My issues right now are using subprocessing. When I try to run
BASEDIR = '/Me/dev/scalastuff'
p = subprocess.Popen(['sbt --error "run /Desktop/jsons/jsonExample.json"'], cwd = BASEDIR, stdout = subprocess.PIPE)
out = p.stdout.read()
print out
I get OSError: [Errno 2] No such file or directory.
I'm completely stumped as to why this is occurring. I'm new to subprocess, so be light on me!
Popen in Python takes a list of shell arguments, and you're passing only one!
So it's trying to execute a file literally named 'sbt --error "run /Me/Desktop/jsons/jsonExample.json"'.
Obviously, this doesn't work.
If you use Popen, just pass a simple array; you needn't care about escaping:
subprocess.Popen(['sbt', '--error', 'run /Me/Desktop/...'], cwd = BASEDIR, stdout = subprocess.PIPE)

Execute external command

I do not know whether this is a Scala or a Play! question. I want to execute an external command from my Play application, get the output of the command, and show a report to the user based on that output. Can anyone help?
For example, when I enter my-command in a shell it shows output like below, which I want to capture and show on the web:
Id Name IP
====================
1 A x.y.z.a
2 B p.q.r.s
Please do not worry about the format and parsing of the output. Functionally, I am looking for something like PHP's exec. I know about Java's Runtime.getRuntime().exec("command"), but is there a Scala/Play version that serves the purpose?
The method !! of the Scala process package does what you need: it executes the command and captures its text output. For example:
import scala.sys.process._
val cmd = "uname -a" // Your command
val output = cmd.!! // Captures the output
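And to turn tabular output like the one in the question into rows for a report, a rough sketch (my-command stands in for the real command):

import scala.sys.process._

val raw = "my-command".!!               // capture stdout as a single string
val rows = raw.trim.split("\n").drop(2) // skip the header and ==== lines
rows.foreach(println)                   // or pass them to your Play template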
scala> import scala.sys.process._
scala> Process("cat temp.txt")!
This assumes there is a temp.txt file in your home directory. The ! triggers the actual execution of the command. See scala.sys.process for more info.
You can use the Process library: for instance
import scala.sys.process.Process
Process("ls").!!
to get the list of files in the folder as a string. The !! gets the output of the command.

Standard for feeding test data to a Nagios plugin?

I'm developing a Nagios plugin in Perl (no Nagios::Plugin, just plain Perl). The error condition I'm checking for normally comes from the output of a command called inside the plugin. However, it would be very inconvenient to create the error condition, so I'm looking for a way to feed test output to the plugin to see if it works correctly.
The easiest way I have found so far would be a command-line option that optionally reads input from a file instead of calling the command:
if ($opt_f) {
    open(FILE, $opt_f) or die "Cannot open $opt_f: $!";
    @output = <FILE>;
    close FILE;
}
else {
    @output = `my_command`;
}
Are there other, better ways to do this?
Build a command-line switch into your plugin; if you set -t on the command line, you use your test command at /path/to/test/command, else you run the 'production' command at /path/to/production/command.
The default action is production; only test if the switch indicating test mode is present.
Or you could have a test version of the command that returns various statuses for you to test (via a command-line argument, perhaps).
You put the test version of my_command in some test directory (/my/nagios/tests/bin).
Then you manipulate the PATH environment variable on the command line that runs the test:
$ env PATH=/my/nagios/tests/bin:$PATH nagios_plugin.pl
The change to $PATH will only last for as long as that one command executes; the change is localized to the subshell that is spawned to run the plugin.
The backticks used to execute the command will cause the shell to use the PATH to locate it, and that will be the test version of the command, which lives in the directory that is now first on the search path.
Let me know if I wasn't clear.

"scala" command terminates batch scripts

During my work here I ran into a somewhat peculiar problem. It's possible that there is a highly simple explanation for this behaviour, but to me it just doesn't make much sense.
Here's the situation:
I wrote a batch file "test.bat" that, right now, looks like this:
echo 1
scala myProgram
echo 2
When I open the command prompt in the corresponding directory and run test.bat, it starts by echoing 1, then runs myProgram (which also prints certain outputs to the console, so the Scala program myProgram works properly) - and then stops. 2 does not appear in the console, and the console waits for me to input another command.
Why this behaviour? Is it a malfunction of the console? Or of the scala command? Or not a malfunction at all, and it is actually meant to behave this way?
What I was actually trying to do is redirect the output of "scala myProgram" to a file (which works well) and rename this file after the Scala program has terminated, so my batch file originally looked somewhat like this:
scala myProgram > log.txt 2>&1
ren "log.txt" "log2.txt"
And I was confused about the fact that "log2.txt" was never created.
Your answers are greatly appreciated, thank you.
Adding -nc to the scala command worked for me:
$ scala -nc /tmp/2.scala
Hello world
So I guess the issue has something to do with the compilation daemon:
-nc no compilation daemon: do not use the fsc offline compiler
Could you try that?