WEKA - Get InfoGainAttribute selection output using the command line

When I use the Weka Explorer to Select Attributes with the InfoGainAttribute evaluator, I get all features ranked in the Attribute selection output panel.
But now I need to do the same operation from the command line, and the problem is that I don't know how to get the Attribute selection output that way.
This is my command line:
java -classpath weka.jar weka.filters.supervised.attribute.AttributeSelection -E weka.attributeSelection.InfoGainAttributeEval -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1" -i input.arff -o output.arff
The result of this operation is the output.arff file, which has the same content as the input.arff file.
I need the ranked attributes.

When you look into the Explorer's log (click the button in the bottom right corner), you will see something like this in the output:
12:13:46: Started weka.attributeSelection.InfoGainAttributeEval
12:13:46: Command: weka.attributeSelection.InfoGainAttributeEval -s "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1"
12:13:46: Filter command: weka.filters.supervised.attribute.AttributeSelection -E "weka.attributeSelection.InfoGainAttributeEval " -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1"
12:13:46: Meta-classifier command: weka.classifiers.meta.AttributeSelectedClassifier -E "weka.attributeSelection.InfoGainAttributeEval " -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1" -W weka.classifiers.trees.J48 -- -C 0.25 -M 2
12:13:46: Finished weka.attributeSelection.InfoGainAttributeEval weka.attributeSelection.Ranker
If you want to get the same output as the Explorer, use Command; if you want to filter your data, use Filter command; and if you want to train a classifier, use Meta-classifier command.
In your case, the command would therefore be something like this (note: no output file!):
java -classpath weka.jar weka.attributeSelection.InfoGainAttributeEval -s "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1" -i input.arff
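If you want to keep that ranking for later, plain shell redirection should capture it (ranking.txt is just an example name):
java -classpath weka.jar weka.attributeSelection.InfoGainAttributeEval -s "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N -1" -i input.arff > ranking.txt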
It is expected that you received the same data in output.arff as you provided via input.arff: the Ranker only ranks the attributes, it doesn't actually modify the data itself.
Rule of thumb: SubsetEval schemes will modify the dataset, AttributeEval schemes can be used for ranking.
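For example, a subset evaluator such as CfsSubsetEval combined with a search method like GreedyStepwise should actually reduce the attributes in the output file (a sketch; exact options may differ between Weka versions):
java -classpath weka.jar weka.filters.supervised.attribute.AttributeSelection -E weka.attributeSelection.CfsSubsetEval -S weka.attributeSelection.GreedyStepwise -i input.arff -o reduced.arff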

Related

Samtools/HPC/truncated file

I have tried to submit the script below to an HPC cluster:
#!/bin/bash
#PBS -N bwa_mem_tumor
#PBS -q batch
#PBS -l walltime=02:00:00
#PBS -l nodes=2:ppn=2
#PBS -j oe
sample=x
ref=absolute/path/GRCh38.p13.genome.fa
fwd=absolutepath/forward_read.fq.gz
rev=absolutepath/reverse_read.fq.gz
module load bio/samtools/1.9
bwa mem $ref $fwd $rev > $sample.tumor.sam \
  && samtools view -S $sample.tumor.sam -b > $sample.tumor.bam \
  && samtools sort $sample.tumor.bam > $sample.tumor.sorted.bam
However as an output I can get only the $sample.tumor.sam and log file says that
Lmod has detected the following error: The following module(s) are unknown:
"bio/samtools/1.9"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore-cache load "bio/samtools/1.9"
Also make sure that all modulefiles written in TCL start with the string
#%Module
However, when I run module avail, it shows that bio/samtools/1.9 is on the list.
Also, when I use the option module --ignore-cache load "bio/samtools/1.9",
the result is the same.
If I try to continue working with the SAM file and run the following command manually:
samtools view -b RS0107.tumor.sam > RS0107.tumor.bam
it shows
[W::sam_read1] Parse error at line 200943
[main_samview] truncated file.
What could possibly be wrong with the samtools module or with the script?

Determining throughput from pcap containing flow records

I have a single packet capture (acquired via tcpdump) that contains flow records between an exporter and a collector.
I want to determine throughput across a given interface using the bytes (octets) field in the v9 record. I have filtered down to the network that I want like so:
tshark -r input.pcap -Y "ip.src == X.X.X.X" -F pcap -w filtered.pcap
I further filtered to the interface that I needed like so:
tshark -r filtered.pcap -Y "cflow.inputint == Y" -F pcap -w filtered2.pcap
I'm lost after that. Is there a better tool to aggregate across the flows to get throughput?
Any help would be greatly appreciated!
You may try printing the NetFlow fields and then processing the results.
For example:
tshark -r filtered2.pcap -T fields -e cflow.version -e cflow.srcaddr -e cflow.dstaddr -e cflow.octets -e cflow.timedelta -e cflow.abstimestart
Field names are visible in the Wireshark status bar when you select packet details.
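To turn those fields into a throughput estimate, you could sum the byte counts, for example (a sketch; it assumes cflow.octets comes out as plain numbers, and splits on commas since a single packet may carry several flow records):
tshark -r filtered2.pcap -T fields -e cflow.octets | tr ',' '\n' | awk '{sum += $1} END {print sum " bytes"}'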
A better option:
Install or compile https://github.com/phaag/nfdump with the --enable-readpcap flag.
Process your pcap: nfcapd -f <path to your pcap file> -l <path to output directory> -T all
Count statistics: nfdump -o extended -R <path to output directory>
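From there, nfdump's statistics options can aggregate across flows, e.g. the top flows by byte count (a sketch; check your nfdump version's man page for the exact statistic names):
nfdump -R <path to output directory> -s record/bytes -n 10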

Log bjob info immediately after bsub

I'm looking for a way to log information to a file about a submitted job immediately after it starts.
Normally all the job status is appended to the log file after a job has completed, but I'd like to know the information it has when it starts.
I know there's the -B flag but I want it in a file, and I could also do something like:
bsub -J jobby -o run_job.log bjobs -l -J jobby > jobby.log; run_job
but maybe someone knows of a funkier way of doing this.
There are some subtle variations that essentially accomplish the same thing:
You can use a pre-exec to do a similar thing instead of doing the
bjobs as part of the command:
bsub -J jobby -E "bjobs -l -J jobby > jobby.log" run_job
You can use the job's environment to get your own jobid instead of
using -J if you write your submission as a script:
#!/bin/sh
#BSUB -o run_job.log
bjobs -l $LSB_JOBID > $LSB_JOBID.log
run_job
Then submit your job like this:
bsub < jobscript.sh
You can do some combination of the above: use $LSB_JOBID in a
pre-execution script, for example:
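A sketch of that combination (note the single quotes, so $LSB_JOBID is expanded when the pre-exec runs on the execution host, not at submission time):
bsub -E 'bjobs -l $LSB_JOBID > $LSB_JOBID.log' -o run_job.log run_job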
That's about as 'funky' as it gets AFAIK :)

Torque qsub -o option does not work

I made a test script test.qsub:
#!/bin/bash
#PBS -q batch
#PBS -o output.txt
#PBS -e Error.err
echo "hello world"
When running qsub test.qsub, it generates neither the output.txt file nor the Error.err file. I also believe that the other options do not work either; I'd appreciate your help! It is said that you should configure torque.cfg, but in my installation that file was never generated, and it is not in /var/spool/torque.
Try "#PBS -k oe". This directs pbs to keep stdout and stderr.

Regex with wget?

I'm using wget to download a useful website:
wget -k -m -r -q -t 1 http://www.web.com/
but I want to replace some bad words with words of my own choice (like the Yahoo Pipes Regex module).
If you want to regexp out words from within the page you are fetching with wget, you should pipe the output through sed.
For example:
wget -k -m -r -q -t 1 -O - http://www.web.com/ | sed 's/cat/dog/g' > output.html
Use the -O - flag to write the output to stdout, and the -q flag to make wget run in quiet mode.
Haven't got a shell atm to check my syntax but that should set you on the right path!
You can use sed -i.
find www.web.com -type f -exec sed -i 's/word1\|word2\|word3//ig' {} +
word1, word2, word3, etc. are the words to delete.