Running Hive Using Scala As a Script - scala

I was going through the ProcessBuilder API of Scala in order to run shell commands the way we run them in a shell script. I could run a few commands, but I'm having an issue with one type of Hive query execution.
When I run the commands below they run successfully, but one format fails:
Running a shell command (successful):
scala> import sys.process._
scala> "ls -lrt /home/cloudera/Desktop".!
total 164
-rwxrwxr-x 1 cloudera cloudera 237 Apr 5 2016 Parcels.desktop
-rwxrwxr-x 1 cloudera cloudera 238 Apr 5 2016 Kerberos.desktop
-rwxrwxr-x 1 cloudera cloudera 259 Apr 5 2016 Express.desktop
Running a Hive query with the -f (file) option (successful):
scala> "hive -f /home/cloudera/hi.hql" !!
warning: there was one feature warning; re-run with -feature for details
ls: cannot access /usr/lib/spark/lib/spark-assembly-*.jar: No such file or directory
2017-09-03 23:20:34,392 WARN [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present. Continuing without it.
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 0.913 seconds, Fetched: 2 row(s)
res20: String =
"100 Amit 12000 10
101 Allen 22000 20 .
"
Running a Hive query with the -e option (failed):
If I run the query on the terminal, I can run it in the following format:
bash$ hive -e "select * from staging.employee_canada;"
The problem is that running the same query from the Scala REPL fails because of the double quotes around the select query. How can I escape them and run it successfully? I tried triple quotes as well as the "\" escape sequence, but it still failed to execute.
scala> "hive -e select * staging.from employee_canada; "!!
Below is the error:
FAILED: ParseException line 1:6 cannot recognize input near '<EOF>' '<EOF>' '<EOF>' in select clause
java.lang.RuntimeException: Nonzero exit value: 64
  at scala.sys.package$.error(package.scala:27)
  at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.slurp
  at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang$bang(ProcessBuilderImpl.scala:102)
  ... 32 elided
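One approach that should work here (a minimal sketch, assuming hive is on the PATH and the table really is staging.employee_canada): pass the program and its arguments as a Seq instead of a single string, so Scala does not re-split the quoted query on whitespace and the whole select statement reaches Hive as the single -e argument.
import sys.process._

// Each Seq element becomes exactly one argument to the hive binary,
// so no shell-style quoting or escaping is needed around the query text.
// (Assumes hive is on the PATH; the table name is taken from the question.)
val result: String = Seq("hive", "-e", "select * from staging.employee_canada;").!!
println(result)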

Related

MissingOutputException in snakemake

I'm trying to run a peak-calling tool within a conda environment using snakemake.
The Snakefile looks as follows (I only added the rules connected to the problem):
rule all:
    input:
        expand('{project}/{organism}/{mapper}/seacr/{pattern}.auc.threshold.bed', pattern = PATTERN, sample = IDS, organism = config['org'], project = config['project'], mapper = config['mapper'])

# SEACR - run the peak calling
rule seacr_run:
    input:
        IP = '{project}/{organism}/{mapper}/seacr/IP_{PATTERN}.bedgraph',
        IgG = '{project}/{organism}/{mapper}/seacr/IgG_{PATTERN}.bedgraph',
    output:
        bed1 = '{project}/{organism}/{mapper}/seacr/{PATTERN}.auc.threshold.bed',
    shell:
        '''
        bash /fs/home/yeroslaviz/SEACR/SEACR_1.3.sh {input.IP} 0.01 non stringent {output.bed1}
        '''
When running the -nps dry run of the snakemake command, I get the correct command printed to STDOUT:
> snakemake -nps /fs/pool/pool-bcfngs/scripts/P193.ChipSeq.Snakemake -j 100
...
Building DAG of jobs...
Job counts:
count jobs
1 all
1 seacr_run
2
[Tue Mar 3 13:56:19 2020]
rule seacr_run:
input: P193/Mmu.GrCm38/bowtie2/seacr/IP_H3K4m3.bedgraph, P193/Mmu.GrCm38/bowtie2/seacr/IgG_H3K4m3.bedgraph
output: P193/Mmu.GrCm38/bowtie2/seacr/H3K4m3.auc.threshold.bed
jobid: 22
wildcards: project=P193, organism=Mmu.GrCm38, mapper=bowtie2, PATTERN=H3K4m3
bash /fs/home/yeroslaviz/SEACR/SEACR_1.3.sh P193/Mmu.GrCm38/bowtie2/seacr/IP_H3K4m3.bedgraph 0.01 non stringent P193/Mmu.GrCm38/bowtie2/seacr/H3K4m3.auc.threshold.bed
[Tue Mar 3 13:56:19 2020]
localrule all:
...
Job counts:
count jobs
1 all
1 seacr_run
2
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
When running the command above on the command line, the tool works without problems. But when I try to run it within the snakemake workflow I get the following error:
Waiting at most 5 seconds for missing files.
MissingOutputException in line 67 of /fs/pool/pool-bcfngs/scripts/P193.ChipSeq.Snakemake:
Missing files after 5 seconds:
P193/Mmu.GrCm38/bowtie2/seacr/H3K4m3.auc.threshold.bed
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Can anyone explain what is happening?
Thanks

PyTest Suppress Results Debug Statement

I am using PyTest with the following options: -s, -v, and --resultlog=results.txt. This suppresses print statements from my test, but prints the test names and results as they are run and logs the results to results.txt.
However, if any tests fail, I also get a spew of information containing traceback, debug, etc. Since I am logging this to a file anyway, I don't want it printed to the screen, cluttering up my output.
Is there any way to disable the printing of just these debug statements, but still have it logged to my results file?
Visual example:
Currently, I see something like this:
$ py.test -sv --resultlog=results.txt test.py
=============================== test session starts =========================
platform darwin -- Python 2.7.10, pytest-2.9.1, py-1.4.31, pluggy-0.3.1 -- /...
cachedir: .cache
rootdir: /Users/jdinkel/Documents, inifile:
plugins: profiling-1.1.1, session2file-0.1.9
collected 3 items
test.py::TestClass::test1 PASSED
test.py::TestClass::test2 PASSED
test.py::TestClass::test3 FAILED
===================================== FAILURES ==============================
__________________________________ TestClass.test3 __________________________
self = <test.TestClass instance at 0x10beb5320>
def test3(self):
> assert 0
E assert 0
test.py:7: AssertionError
========================== 1 failed, 2 passed in 0.01 seconds ===============
But I would like to see this:
$ py.test -sv --resultlog=results.txt test.py
=============================== test session starts =========================
platform darwin -- Python 2.7.10, pytest-2.9.1, py-1.4.31, pluggy-0.3.1 -- /...
cachedir: .cache
rootdir: /Users/jdinkel/Documents, inifile:
plugins: profiling-1.1.1, session2file-0.1.9
collected 3 items
test.py::TestClass::test1 PASSED
test.py::TestClass::test2 PASSED
test.py::TestClass::test3 FAILED
========================== 1 failed, 2 passed in 0.01 seconds ===============
With no change to the results.txt file.
You should use the --tb switch to control traceback output.
e.g.
pytest tests/ -sv --tb=no --disable-warnings
--disable-warnings disables occasional pytest warnings, which I assume you don't want either.
From pytest help:
--tb=style traceback print mode (auto/long/short/line/native/no).
In addition to the answer of @SilentGuy, -rN suppresses the summary of failed test cases.
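Putting the two answers together with the original invocation might look like this (a sketch; exact flag support depends on the pytest version, and --resultlog should still record the failure details in the file):
py.test -sv --tb=no -rN --resultlog=results.txt test.py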

erlang os:cmd() command with UTF8 binary

I'm trying to get an Erlang function to execute a bash command containing unicode characters. For example, I want to execute the equivalent of:
touch /home/jani/ჟანიweł
I put that command in variable D, and printing it shows the text correctly:
io:fwrite("~ts", [list_to_binary(D)]).
touch /home/jani/ჟანიweł
ok
but after I execute:
os:cmd(D)
I get a file called á??á??á??á??weÅ?. How can I fix it?
Similarly, executing
os:cmd(binary_to_list(unicode:characters_to_binary("touch /home/jani/编程"))).
creates a file named ��, while running the equivalent touch command directly in a terminal creates the file with the correct name.
That's because Erlang reads your source files as latin1 by default, but on newer versions of Erlang you can declare your files to use unicode:
%% coding: utf-8
-module(test).
-compile(export_all).

test() ->
    COMMAND = "touch ჟანიweł",
    os:cmd(COMMAND).
and then compiling and executing the module works fine
rorra-air:~ > erl
Erlang/OTP 17 [erts-6.4] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Eshell V6.4 (abort with ^G)
1> c(test).
{ok,test}
2> test:test().
[]
and it created the file on my filesystem
rorra-air:~ > ls -lta
total 144
-rw-r--r-- 1 rorra staff 0 Jun 9 15:18 ჟანიweł

How to get shell expansion on scala sys.process invocations

Consider a simple attempt at shell expansion:
Simple/direct approach:
scala> ("/bin/ls /tmp/*" run BasicIO(false, sb, None)).exitValue
ls: /tmp/*: No such file or directory
res18: Int = 1
I have tried a number of combinations of ProcessIO, BasicIO, Process, etc., and cannot figure out how to get shell expansion to work.
bash -c:
scala> ("bash -c \"/bin/ls /tmp/*\"" run BasicIO(false, sb, None)).exitValue
/tmp/*": -c: line 0: unexpected EOF while looking for matching `"'
/tmp/*": -c: line 1: syntax error: unexpected end of file
res19: Int = 2
Pipe to bash -s:
scala> ("echo \"/bin/ls /tmp/*\" | bash -s") run BasicIO(false, sb, None)).exitValue
<console>:1: error: ';' expected but ')' found.
("echo \"/bin/ls /tmp/*\" | bash -s") run BasicIO(false, sb, None)).exitValue
By the way, that last one in the shell looks like the following (and works):
21:28:32/dstat $echo "/bin/ls /tmp/*" | bash -s
/tmp/OSL_PIPE_501_SingleOfficeIPC_a974a3af70d46eaeed927022833718b7
/tmp/oobelib.log
/tmp/spark-steve-org.apache.spark.deploy.master.Master-1.pid
/tmp/spark-steve-org.apache.spark.deploy.worker.Worker-1.pid
/tmp/KSOutOfProcessFetcher.501.qQkpPp2uZLdVc5pukHmfJMR4bkM=:
Shell escaping would also be of interest to understand with respect to the Scala Process classes: an example covering both expansion and escaping would be ideal.
UPDATE: I found this JIRA ticket - it may be relevant. I hope not, because that would mean there is little or no hope for the functionality described here:
sys.process._ is so restrictive in what it can accept that it is not usable https://issues.scala-lang.org/browse/SI-7027
Another update: I found an old email thread involving the esteemed Daniel Sobral, who mentions:
Because the quotes are not delimiting the parameter passed to bash --
it is Scala who decides how the arguments break up, and it simply
splits on spaces, without any quotation facility. If you try the
following, instead, it will work:
Seq("bash", "-c", "ls *.scala").!
It is looking ever more grim for a "fire and forget" way of running a shell command. I yearn for Ruby's
%x{whatever shell string you want goes here}
e.g.
echo 'print %x{ls -lrtad /etc/* 2>&1}' | ruby
Which most satisfyingly returns:
-r--r--r-- 1 root wheel 69836 May 28 2013 /etc/php.ini.default-5.2-previous~orig
lrwxr-xr-x 1 root wheel 30 Oct 22 2013 /etc/localtime -> /usr/share/zoneinfo/US/Pacific
-rw-r--r-- 1 root wheel 102 Nov 5 2013 /etc/hostconfig
-rw-r--r-- 1 root wheel 1286 Nov 5 2013 /etc/my.cnf
-rw-r--r-- 1 root wheel 4161 Dec 18 2013 /etc/sshd_config~previous
-rw-r--r-- 1 root wheel 199 Feb 7 2014 /etc/shells
-rw-r--r-- 1 root wheel 0 Sep 9 2014 /etc/xtab
-rw-r--r-- 1 root wheel 1316 Sep 9 2014 /etc/ttys
.. etc
but it looks like that is not happening with Scala.
I fear you took my answer the wrong way. Ruby is doing the same thing, and I know this without looking at their code, because what Java (and, by extension, Scala) presents as an API is a direct translation of the Unix API.
It is the shell that does shell expansions, and while it's perfectly possible for any other software to emulate them, it would be a losing game. Wildcards are often provided, but they are a small part of shell expansion.
My answer has everything you need, but let me expand on that:
import scala.sys.process._

implicit class RubyX(val sc: StringContext) extends AnyVal {
  def x(args: Any*): ProcessBuilder = {
    val strings = sc.parts.iterator
    val expressions = args.iterator
    val buf = new StringBuffer(strings.next)
    while (strings.hasNext) {
      buf append expressions.next
      buf append strings.next
    }
    Process(Seq("bash", "-c", buf.toString))
  }
}
then you can do
x"ls *.scala".!
I returned a ProcessBuilder, so you can pick whatever is the best form of execution for you, such as ! above that returns the exit code and echoes everything else to stdout.
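For example, to capture the expanded listing as a String instead of echoing it to stdout (a small usage sketch; the path is just a placeholder, and anything bash accepts, including pipes and redirects, passes through unchanged):
// Run the interpolated command through bash -c and capture its output.
val listing: String = x"ls -lrt /tmp/*".!!
println(listing)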

JMeter Command Line Output

I'm running a JMeter test plan from command line and it's currently outputting something along the lines of:
Created the tree successfully using C:\*****\TestPlan.jmx
Starting the test @ Thu Oct 11 10:20:43 EDT 2012 (1349965243947)
Waiting for possible shutdown message on port 4445
Tidying up ... @ Thu Oct 11 10:20:46 EDT 2012 (1349965246384)
... end of run
Is there any way to turn off this output and have the plan execute 'silently'?
I found a way to do this by following this article on output redirection: http://www.robvanderwoude.com/battech_redirection.php
and appending > NUL to the command:
jmeter -n -t C:\***\TestPlan.jmx -Jhostname=%1 > NUL
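On Linux or macOS the same idea should work with /dev/null in place of the Windows NUL device (the test plan path here is just a placeholder):
jmeter -n -t TestPlan.jmx > /dev/null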