is it possible to qsub a command instead of a script? - hpc

For example from this page:
https://bioinformatics.mdc-berlin.de/intro2UnixandSGE/sun_grid_engine_for_beginners/how_to_submit_a_job_using_qsub.html
The site saves the following to a file called runBowtie.sh:
#!/bin/bash
#$ -N run_bowtie2
#$ -cwd
#$ -pe smp 6
#$ -l h_vmem=6G
infile=/data/bioinfo/READS2/R1_001.fastq.gz
outfile=/data/bioinfo/READS2/aln/R1_001.sam
btindex=/data/bioinfo/genome_data/Caenorhabditis_elegans/UCSC/ce10/Sequence/BowtieIndex/genome
gzip -dc $infile | bowtie --chunkmbs 300 --best -m 1 -p 6 --phred33 -q $btindex - -S $outfile
and then do
qsub runBowtie.sh
I am wondering if it's possible to just
qsub -pe smp 33 gzip -dc /data/bioinfo/READS2/R1_001.fastq.gz | bowtie --chunkmbs 300 --best -m 1 -p 6 --phred33 -q /data/bioinfo/genome_data/Caenorhabditis_elegans/UCSC/ce10/Sequence/BowtieIndex/genome - -S /data/bioinfo/READS2/aln/R1_001.sam
thank you
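For what it's worth, Grid Engine's qsub can take a command line directly when given -b y, and it can also read a job script from stdin, so a script file isn't strictly required. A sketch under those assumptions (the pipeline is wrapped in bash -c so it runs on the compute node rather than the submit host, and the resource requests stay as qsub options):
qsub -b y -N run_bowtie2 -cwd -pe smp 6 -l h_vmem=6G bash -c 'gzip -dc /data/bioinfo/READS2/R1_001.fastq.gz | bowtie --chunkmbs 300 --best -m 1 -p 6 --phred33 -q /data/bioinfo/genome_data/Caenorhabditis_elegans/UCSC/ce10/Sequence/BowtieIndex/genome - -S /data/bioinfo/READS2/aln/R1_001.sam'
or, keeping the script but never saving it to a file:
echo 'gzip -dc ... | bowtie ...' | qsub -N run_bowtie2 -cwd -pe smp 6 -l h_vmem=6G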

Related

xargs lines containing -e and -n processed differently

When running the following command with xargs (GNU findutils) 4.7.0
xargs -n1 <<<"-d -e -n -o"
I get this output
-d
-o
Why are -e and -n not present in the output?
From man xargs:
[...] and executes the command (default is /bin/echo) [...]
So it runs:
echo -d
echo -e
echo -n
echo -o
But from man echo:
-n do not output the trailing newline
-e enable interpretation of backslash escapes
And echo -n outputs nothing, while echo -e outputs a single empty line, which is what you see in the output.
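One way to check that all four tokens really are handed to the command is to use a command that does not treat them as options, e.g. printf instead of the default echo (a quick test, not from the original post):
xargs -n1 printf '%s\n' <<<"-d -e -n -o"
-d
-e
-n
-o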

Ctrl+Z Character and EOF Issues With Pipes

I have a huge file provided by a third party, which appears to have been generated in a Windows/DOS-like environment. The last line of the file contains a ^Z character. I noticed this when I looked at the processed file and the last line contained a ^Z. I added some logic to skip this line from the input and it was working fine until I changed my code to take the input from stdin as opposed to a file.
Here is a simpler illustration of this issue. When I do a line count on a single file stream with and without ^Z skipping, it reports the correct values:
unzip -j -p -qq file1.zip | perl -nle 'print' | wc -l
3451
unzip -j -p -qq file2.zip | perl -nle 'print' | wc -l
3451
unzip -j -p -qq file1.zip | perl -nle 'next if /^\cZ/; print' | wc -l
3450
unzip -j -p -qq file2.zip | perl -nle 'next if /^\cZ/; print' | wc -l
3450
Now when I try to process both files at once, I lose one record. I am guessing this is something to do with the ^Z character but I cannot figure out what I can do about it:
unzip -j -p -qq '*.zip' | perl -nle 'print' | wc -l
6901 ## this should have been 6902
unzip -j -p -qq '*.zip' | perl -nle 'next if /^\cZ/; print' | wc -l
6899 ## this should have been 6900
These files are huge (each 20+GB) and they are to be read in groups of 3-6 files so I wanted to avoid processing them one by one and then concatenate later. Any thoughts on how to avoid the ^Z character without running into the above issue?
I am on a Linux machine. Btw, opening the file in vim does not display the last record (i.e., ^Z) and setting set ff=unix did not change this either. So vim reports 3450 lines for the single unzipped file and 6900 for the combined unzipped files.
Thanks!
Since the ^Z isn't followed by a line ending, unzip is producing
file1:1
file1:2
file1:3
^Zfile2:1
file2:2
file2:3
^Z
so you delete the first line of the second file. You could simply remove the ^Z instead of the entire line.
perl -pe's/^\cZ//'
That said, unzip -a is designed for exactly this situation. Not only will it strip the ^Z for you, it will also fix the line endings if necessary.
$ unzip -j -p -qq z.zip a.txt | od -c
0000000 a b c \r \n d e f \r \n 032
0000013
$ unzip -j -p -qq z.zip b.txt | od -c
0000000 g h i \r \n j k l \r \n 032
0000013
$ unzip -j -p -qq z.zip | od -c
0000000 a b c \r \n d e f \r \n 032 g h i \r \n
0000020 j k l \r \n 032
0000026
$ unzip -j -p -qq -a z.zip | od -c
0000000 a b c \n d e f \n g h i \n j k l \n
0000020

Unable to grep GET information in Terminal

I'm unable to see the GET requests in the terminal using grep.
This command used to work on Lion, but on Mavericks the GET lines don't show...
sudo tcpdump -i en1 -n -s 0 -w - | grep -a -o -E "Host\:\ .*|GET\ \/.*"
Any help or suggestions maybe?
Try:
sudo tcpdump -s 0 -A | egrep --color=never -a -o "Host\: .*|GET\ \/.*"
The -w - writes the raw packets, whereas -A decodes them to ASCII; handy for web pages (per the man page).
I found that if grep was outputting color, the Host: lines were output as empty lines.
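If the lines show up late or not at all while the capture is still running, buffering may also be a factor; a variant worth trying, assuming the interface is still en1 (tcpdump -l line-buffers its output and --line-buffered does the same for grep):
sudo tcpdump -i en1 -l -n -s 0 -A | grep --line-buffered -a -o -E "Host: .*|GET /.*"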

How to tell if my program is being piped to another (Perl)

"ls" behaves differently when its output is being piped:
> ls ???
bar foo
> ls ??? | cat
bar
foo
How does it know, and how would I do this in Perl?
In Perl, the -t file test operator indicates whether a filehandle
(including STDIN) is connected to a terminal.
There is also the -p test operator to indicate whether a filehandle
is attached to a pipe.
$ perl -e 'printf "term:%d, pipe:%d\n", -t STDIN, -p STDIN'
term:1, pipe:0
$ perl -e 'printf "term:%d, pipe:%d\n", -t STDIN, -p STDIN' < /tmp/foo
term:0, pipe:0
$ echo foo | perl -e 'printf "term:%d, pipe:%d\n", -t STDIN, -p STDIN'
term:0, pipe:1
File test operator documentation at perldoc -f -X.
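Note that ls in the question is reacting to its output being piped, not its input, so for that behaviour you would apply -t to STDOUT instead; a quick sketch of what I would expect:
$ perl -e 'printf "stdout is a tty:%d\n", -t STDOUT'
stdout is a tty:1
$ perl -e 'printf "stdout is a tty:%d\n", -t STDOUT' | cat
stdout is a tty:0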
use IO::Interactive qw(is_interactive);
is_interactive() or warn "Being piped\n";

Is `xargs -t` output stderr or stdout, and can you control it?

Say I have a directory with hi.txt and blah.txt, and I execute the following command on a Linux-ish command line:
ls *.* | xargs -t -i{} echo {}
the output you will see is
echo blah.txt
blah.txt
echo hi.txt
hi.txt
I'd like to redirect the stderr output (say 'echo blah.txt' fails...), leaving only the output from the xargs -t command written to stdout, but it looks as if that goes to stderr as well.
ls *.* | xargs -t -i{} echo {} 2> /dev/null
Is there a way to control it, to make it output to stdout?
Use:
ls | xargs -t -i{} echo {} 2>&1 >/dev/null
The 2>&1 sends the standard error from xargs to where standard output is currently going; the >/dev/null sends the original standard output to /dev/null. So, the net result is that standard output contains the echo commands, and /dev/null contains the file names. We can debate about spaces in file names and whether it would be easier to use a sed script to put 'echo' at the front of each line (with no -t option), or whether you could use:
ls | xargs -i{} echo echo {}
(Tested: Solaris 10, Korn Shell ; should work on other shells and Unix platforms.)
If you don't mind seeing the inner workings of the commands, I did manage to segregate the error output from xargs and the error output of the command executed.
al * zzz | xargs -t 2>/tmp/xargs.stderr -i{} ksh -c "ls -dl {} 2>&1"
The (non-standard) command al lists its arguments one per line:
for arg in "$@"; do echo "$arg"; done
The first redirection (2>/tmp/xargs.stderr) sends the error output from xargs to the file /tmp/xargs.stderr. The command executed is 'ksh -c "ls -dl {} 2>&1"', which uses the Korn shell to run ls -ld on the file name with any error output going to standard output.
The output in /tmp/xargs.stderr looks like:
ksh -c ls -dl x1 2>&1
ksh -c ls -dl x2 2>&1
ksh -c ls -dl xxx 2>&1
ksh -c ls -dl zzz 2>&1
I used 'ls -ld' in place of echo to ensure I was testing errors - the files x1, x2, and xxx existed, but zzz did not.
The output on standard output looked like:
-rw-r--r-- 1 jleffler rd 1020 May 9 13:05 x1
-rw-r--r-- 1 jleffler rd 1069 May 9 13:07 x2
-rw-r--r-- 1 jleffler rd 87 May 9 20:42 xxx
zzz: No such file or directory
When run without the command wrapped in 'ksh -c "..."', the I/O redirection was passed as an argument to the command ('ls -ld'), and it therefore reported that it could not find the file '2>&1'. That is, xargs did not itself use the shell to do the I/O redirection.
It would be possible to arrange for various other redirections, but the basic problem is that xargs makes no provision for separating its own error output from that of the commands it executes, so it is hard to do.
The other rather obvious option is to use xargs to write a shell script, and then have the shell execute it. This is the option I showed before:
ls | xargs -i{} echo echo {} >/tmp/new.script
You can then see the commands with:
cat /tmp/new.script
You can run the commands to discard the errors with:
sh /tmp/new.script 2>/dev/null
And, if you don't want to see the standard output from the commands either, append 1>&2 to the end of the command.
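Following that suggestion literally, the command with both streams discarded would be:
sh /tmp/new.script 2>/dev/null 1>&2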
So I believe what you want to have on stdout is:
the stdout from the utility that xargs executes
the listing of commands generated by xargs -t
You want to ignore the stderr stream generated by the
executed utility.
Please correct me if I'm wrong.
First, let's create a better testing utility:
% cat myecho
#!/bin/sh
echo STDOUT "$@"
echo STDERR "$@" 1>&2
% chmod +x myecho
% ./myecho hello world
STDOUT hello world
STDERR hello world
% ./myecho hello world >/dev/null
STDERR hello world
% ./myecho hello world 2>/dev/null
STDOUT hello world
%
So now we have something that actually outputs to both stdout and stderr, so we
can be sure we're only getting what we want.
A tangential way to do this is not to use xargs, but rather, make. Echoing a command
and then doing it is kind of what make does. That's its bag.
% cat Makefile
all: $(shell ls *.*)
$(shell ls): .FORCE
./myecho $@ 2>/dev/null
.FORCE:
% make
./myecho blah.txt 2>/dev/null
STDOUT blah.txt
./myecho hi.txt 2>/dev/null
STDOUT hi.txt
% make >/dev/null
%
If you're tied to using xargs, then you need to modify your utility that
xargs uses so it suppresses stderr. Then you can use the 2>&1 trick others
have mentioned to move the command listing generated by xargs -t from stderr
to stdout.
% cat myecho2
#!/bin/sh
./myecho "$@" 2>/dev/null
% chmod +x myecho2
% ./myecho2 hello world
STDOUT hello world
% ls *.* | xargs -t -i{} ./myecho2 {} 2>&1
./myecho2 blah.txt
STDOUT blah.txt
./myecho2 hi.txt
STDOUT hi.txt
% ls *.* | xargs -t -i{} ./myecho2 {} 2>&1 | tee >/dev/null
%
So this approach works, and collapses everything you want to stdout (leaving out what you don't want).
If you find yourself doing this a lot, you can write a general utility to suppress stderr:
% cat surpress_stderr
#!/bin/sh
"$@" 2>/dev/null
% ./surpress_stderr ./myecho hello world
STDOUT hello world
% ls *.* | xargs -t -i{} ./surpress_stderr ./myecho {} 2>&1
./surpress_stderr ./myecho blah.txt
STDOUT blah.txt
./surpress_stderr ./myecho hi.txt
STDOUT hi.txt
%
xargs -t echoes the commands to be executed to stderr before executing them. If you want them to go to stdout instead, you can redirect stderr to stdout with the 2>&1 construct:
ls *.* | xargs -t -i{} echo {} 2>&1
It looks like xargs -t goes to stderr, and there's not much you can do about it.
You could do:
ls | xargs -t -i{} echo "Foo: {}" 2>&1 >/dev/null | tee stderr.txt
to display only the stderr data on your terminal as your command runs, and then grep through stderr.txt after to see if anything unexpected occurred, along the lines of grep -v Foo: stderr.txt
Also note that on Unix, ls *.* isn't how you display everything. If you want to see all the files, just run ls on its own.
As I understand your problem, using GNU Parallel (http://www.gnu.org/software/parallel/) would do the right thing:
ls *.* | parallel -v echo {} 2> /dev/null
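Unlike xargs -t, parallel -v writes the command line to stdout along with the command's own output, so everything survives a further pipe. Roughly what I would expect for the hi.txt/blah.txt example (add -k if you need the original input order preserved):
ls *.* | parallel -v echo {} 2>/dev/null | cat
echo blah.txt
blah.txt
echo hi.txt
hi.txt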