Samtools/HPC/truncated file

I tried to submit the script below to the HPC cluster:
#!/bin/bash
#PBS -N bwa_mem_tumor
#PBS -q batch
#PBS -l walltime=02:00:00
#PBS -l nodes=2:ppn=2
#PBS -j oe
sample=x
ref=absolute/path/GRCh38.p13.genome.fa
fwd=absolutepath/forward_read.fq.gz
rev=absolutepath/reverse_read.fq.gz
module load bio/samtools/1.9
bwa mem "$ref" "$fwd" "$rev" > "$sample.tumor.sam" \
    && samtools view -S -b "$sample.tumor.sam" > "$sample.tumor.bam" \
    && samtools sort "$sample.tumor.bam" > "$sample.tumor.sorted.bam"
However, the only output I get is $sample.tumor.sam, and the log file says:
Lmod has detected the following error: The following module(s) are unknown:
"bio/samtools/1.9"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore-cache load "bio/samtools/1.9"
Also make sure that all modulefiles written in TCL start with the string
#%Module
However, when I run module avail, bio/samtools/1.9 is shown in the list.
Also, when I use module --ignore-cache load "bio/samtools/1.9",
the result is the same.
If I try to continue working with the SAM file and run the command manually:
samtools view -b RS0107.tumor.sam > RS0107.tumor.bam
it shows
[W::sam_read1] Parse error at line 200943
[main_samview] truncated file.
What could be wrong with the samtools module, or with the script?
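samtools prints "truncated file" when it fails to parse a record, and in a SAM file written by a job that was interrupted this is often the last, partially written line. A rough way to check that before re-running anything is to look at the end of the file: a complete SAM file ends with a newline, and every alignment line has at least 11 tab-separated mandatory fields. The function below is a heuristic sketch (the name sam_tail_ok is made up for this example); it cannot detect a file that happens to be cut exactly at a line boundary.

```shell
# sam_tail_ok FILE  (hypothetical helper, heuristic only)
# Returns 0 if FILE is non-empty, ends with a newline, and its last line is
# either a header line (@...) or has at least 11 tab-separated fields.
sam_tail_ok() {
  local file=$1 last fields
  [[ -s $file ]] || return 1                  # missing or empty file
  # tail -c 1 prints the final byte; command substitution strips a trailing
  # newline, so an empty result means the file ends with "\n".
  [[ -z $(tail -c 1 "$file") ]] || return 1
  last=$(tail -n 1 "$file")
  [[ $last == @* ]] && return 0               # header-only file
  fields=$(awk -F'\t' '{ print NF }' <<< "$last")
  (( fields >= 11 ))                          # 11 mandatory SAM columns
}
```

For example: sam_tail_ok RS0107.tumor.sam || echo "RS0107.tumor.sam looks truncated"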


Can you get full command line from process ID (including command line arguments, etc)?

This question is a follow-up to the question asked here: https://unix.stackexchange.com/questions/163145/how-to-get-whole-command-line-from-a-process. On my system, the following command starts a process with a PID (as expected):
CUDA_VISIBLE_DEVICES=4,5 python3 main.py 1> out.txt 2> err.txt &
The answers at the Stack Exchange link above offer many solutions. However, when I try them, I only get the following:
python3 main.py
Is there a way to return the entire command line "CUDA_VISIBLE_DEVICES=4,5 python3 main.py 1> out.txt 2> err.txt &", not just the portion "python3 main.py"?
No.
Assuming you're on a Linux system, you can find the individual pieces, but you can't put them back together.
Assume also that the process's PID is in $pid.
The CUDA_VISIBLE_DEVICES=4,5 variable gets added to the environment of the python3 command. You can find it in /proc/$pid/environ, but you can't tell which of those variables were specified on the command line: the user could have written
export CUDA_VISIBLE_DEVICES=4,5
python3 main.py 1> out.txt 2> err.txt &
The file redirections are available in /proc/$pid/fd:
/proc/$pid/fd/1 is a symbolic link to out.txt
/proc/$pid/fd/2 is a symbolic link to err.txt
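On Linux these pieces can be inspected directly. The sketch below (the function name show_proc_info is hypothetical) prints the argument vector from /proc/$pid/cmdline, which holds the NUL-separated arguments and therefore shows only python3 main.py, plus the targets of the stdout and stderr links. Reading another process's fd links normally requires being the same user or root.

```shell
# show_proc_info PID  (hypothetical helper, Linux only)
# Prints the process's argv and where its stdout/stderr point.
show_proc_info() {
  local pid=$1 argv
  # cmdline is NUL-separated; turn the NULs into spaces and drop the last one.
  argv=$(tr '\0' ' ' < "/proc/$pid/cmdline")
  printf 'argv:   %s\n' "${argv% }"
  # fd/1 and fd/2 are symlinks to the files stdout and stderr were opened on.
  printf 'stdout: %s\n' "$(readlink "/proc/$pid/fd/1")"
  printf 'stderr: %s\n' "$(readlink "/proc/$pid/fd/2")"
}
```

For the command in the question, show_proc_info "$pid" would list python3 main.py as argv and the absolute paths of out.txt and err.txt; the environment variable and the trailing & are still not recoverable from this.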
I don't know how to tell if a process is running in the background.
Since you're mainly interested in the environment, you can read it with bash:
# Read the NUL-separated NAME=value pairs of process $pid into an associative array.
declare -A environ
while IFS='=' read -r -d '' var value; do
    environ["$var"]="$value"
done < "/proc/$pid/environ"
echo "process has CUDA_VISIBLE_DEVICES value ${environ[CUDA_VISIBLE_DEVICES]}"

Running perl files from a text file

There are multiple Perl scripts that are run from a Cygwin terminal. For example:
$ perl IdGeneratorTool.pl JSmith -i userInfo.adb -o JSmith.txt
This is just an example: based on the input parameter JSmith, it reads a database file, generates an ID and writes it to a text file.
The number of these commands keeps growing, and they are collected in a text file like the one below:
$ perl IdGeneratorTool.pl JSmith -i userInfo.adb -o JSmith.txt
$ perl IdGeneratorTool.pl PTesk -i userInfo.adb -o PTesk.txt
$ perl IdGeneratorTool.pl CMorris -i userInfo.adb -o CMorris.txt
$ perl IdGeneratorTool.pl JLawrence -i userInfo.adb -o JLawrence.txt
$ perl IdGeneratorTool.pl TCruise -i userInfo.adb -o TCruise.txt
...
And the list keeps growing.
I would like to know whether there's a way to run all of the commands in this text file in one go.
I'm new to Perl and don't know much about the options.
Ideally, I'd like a tool where I can open this text file, click an execute button, and have it run all the scripts and write the *.txt files into the same directory.
Or maybe a simple perl script that can do it.
Put the commands into a file called makeall (or whatever you want to call it), without the leading "$ " prompt.
Add #!/bin/bash as the first line of the file.
In Cygwin, run chmod +x makeall
In Cygwin, run ./makeall
This creates a bash script that makes all of your calls to the Perl script.
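If the commands already live in a text file copied from the terminal, these steps can be automated. A minimal sketch, assuming the file is called commands.txt (a hypothetical name) and each line starts with the "$ " prompt shown in the question; it also strips Windows line endings, which can appear when the file is edited outside Cygwin:

```shell
# make_script COMMANDS_FILE SCRIPT  (hypothetical helper)
# Turns a file of "$ command" lines into an executable bash script.
make_script() {
  local commands_file=$1 script=$2
  {
    printf '#!/bin/bash\n'
    # Remove a trailing CR (CRLF files) and the leading "$ " prompt,
    # otherwise bash would try to run "$" as a command.
    sed -e 's/\r$//' -e 's/^\$ //' "$commands_file"
  } > "$script"
  chmod +x "$script"
}
```

Usage: make_script commands.txt makeall && ./makeall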
Another option would be to put all the user information into a CSV file and read that file in order to call your script.
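A sketch of the CSV option, assuming a hypothetical users.csv where each line has the form user,dbfile,outfile (for example JSmith,userInfo.adb,JSmith.txt). The command to run is passed as arguments, so the function itself does not depend on IdGeneratorTool.pl:

```shell
# run_from_csv CSV_FILE COMMAND...  (hypothetical helper)
# For each "user,dbfile,outfile" line, runs: COMMAND user -i dbfile -o outfile
run_from_csv() {
  local csv=$1 user db out
  shift
  # "|| [[ -n $user ]]" also processes a last line without a trailing newline.
  while IFS=, read -r user db out || [[ -n $user ]]; do
    out=${out%$'\r'}                              # tolerate CRLF line endings
    [[ -z $user || $user == \#* ]] && continue    # skip blank and comment lines
    "$@" "$user" -i "$db" -o "$out"
  done < "$csv"
}
```

Usage: run_from_csv users.csv perl IdGeneratorTool.pl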
WAIT! Even easier!
Put this into the makeall script:
#!/bin/bash
for user in \
JSmith \
PTesk \
CMorris \
JLawrence \
TCruise \
; do
perl IdGeneratorTool.pl "$user" -i userInfo.adb -o "$user".txt
done
Now you just need to add any additional users the same way I did for your examples.
Without seeing the source for IdGeneratorTool.pl it's hard to give specific advice, but it is generally not hard to turn something like
do_stuff($ARGV[0], $opt_i, $opt_o);
into
while (<>) {
    chomp;
    # One tab-separated line per user: name, .adb file, output file.
    my ($user, $adb, $outputfile) = split /\t/;
    do_stuff($user, $adb, $outputfile);
}
to read the input from a tab-delimited file instead of from command-line arguments.
You can create a text file with the list of users (one per line), for example user_list.txt:
JSmith
PTesk
CMorris
JLawrence
TCruise
Then create a bash script process_list.sh with the following content in the same directory:
#!/bin/bash
# Read user_list.txt one line at a time; quoting keeps each name intact.
while IFS= read -r user; do
    perl IdGeneratorTool.pl "$user" -i userInfo.adb -o "${user}.txt"
done < user_list.txt
Now make the script executable with chmod +x process_list.sh, and it is ready to run.
Whenever you need to add a new user, add one more line to user_list.txt.

WEKA - Get InfoGainAttribute selection output using command line

When I use the Weka Explorer to select attributes with the InfoGainAttributeEval evaluator, I get all features ranked in the Attribute selection output panel.
But now I need to do the same operation from the command line, and I don't know how to get the attribute selection output there.
This is my command line:
java -classpath weka.jar weka.filters.supervised.attribute.AttributeSelection -E weka.attributeSelection.InfoGainAttributeEval -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1" -i input.arff -o output.arff
The result of this operation is the output.arff file, with the same content as input.arff.
I need the Ranked Attributes.
When you look at the Explorer's log (click the Log button in the bottom-right corner), you will see something like this in the output:
12:13:46: Started weka.attributeSelection.InfoGainAttributeEval
12:13:46: Command: weka.attributeSelection.InfoGainAttributeEval -s "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1"
12:13:46: Filter command: weka.filters.supervised.attribute.AttributeSelection -E "weka.attributeSelection.InfoGainAttributeEval " -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1"
12:13:46: Meta-classifier command: weka.classifiers.meta.AttributeSelectedClassifier -E "weka.attributeSelection.InfoGainAttributeEval " -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1" -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
12:13:46: Finished weka.attributeSelection.InfoGainAttributeEval weka.attributeSelection.Ranker
To get the output you see in the Explorer, use Command; to filter your data, use Filter command; and to train a classifier, use Meta-classifier command.
In your case, the command would therefore be something like this (no output file!):
java -classpath weka.jar weka.attributeSelection.InfoGainAttributeEval -s "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1" -i input.arff
Getting the same data in output.arff as in input.arff is expected, as the Ranker only ranks the attributes and doesn't actually modify the data itself.
Rule of thumb: SubsetEval schemes will modify the dataset, AttributeEval schemes can be used for ranking.

Torque qsub -o option does not work

I made a test script test.qsub:
#!/bin/bash
#PBS -q batch
#PBS -o output.txt
#PBS -e Error.err
echo "hello world"
When I run qsub test.qsub, it generates neither output.txt nor Error.err. I suspect the other options don't work either. I'd appreciate your help! I've read that you should configure torque.cfg, but in my installation that file was never generated and is not in /var/spool/torque.
Try "#PBS -k oe". This tells PBS to keep stdout and stderr.

How to retrieve a Perl file using wget and execute it with a one-liner?

I'm looking to use wget to retrieve a perl file and execute it in one line. Does anyone know if this is possible/how I would go about doing this?
In order to use wget for this purpose, you would use the -O flag and give it the '-' character as an argument. From the manpage:
-O file
--output-document=file
Giving '-' as the "file" argument to -O tells wget to send its output to stdout, which can then be piped into the perl command.
You can provide the -q flag as well to turn off wget's own warning and message output:
-q
--quiet
Turn off Wget's output.
This will make things look cleaner in the shell.
So you would end up with something like:
wget -qO - http://127.0.0.1/myscript.pl | perl -
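With the pipe, perl runs whatever it received before end of input, and the pipeline's exit status is perl's, so a failed or interrupted download is easy to miss. If that matters, a variant is to download into a temporary file first and run it only when wget succeeds. This is a sketch; the function name fetch_and_run is made up, and it uses the same -q and -O flags as above:

```shell
# fetch_and_run URL [ARGS...]  (hypothetical helper)
# Downloads URL completely, then runs it with perl; a failed download is
# never executed. Extra ARGS are passed to the script.
fetch_and_run() {
  local url=$1 tmp status
  shift
  tmp=$(mktemp) || return 1
  if wget -q -O "$tmp" "$url"; then
    status=0
    perl "$tmp" "$@" || status=$?
  else
    status=1                      # download failed: do not run anything
  fi
  rm -f "$tmp"
  return "$status"
}
```

Usage: fetch_and_run http://127.0.0.1/myscript.pl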
For more information on I/O redirection take a look at this:
http://www.tldp.org/LDP/abs/html/io-redirection.html
Or just download with curl and pipe to perl:
curl -L http://your_location.pl | perl -
You'll sometimes see this pattern used to install tools such as cpanm.