How to automate LSF waiting based on job name in Perl

I have Perl code that submits a few jobs in parallel via the LSF bsub command, and once all of them finish I want to submit a final job.
For example, I have these three bsub commands: the first two submit jobs t1 and t2, and the third waits for t1 and t2 to finish by using the -w argument.
system("bsub -E \"xyz\" -n 1 -q queueType -J t1 sleep 30");
system("bsub -E \"xyz\" -n 1 -q queueType -J t2 sleep 30");
system("bsub -E \"xyz\" -n 1 -q queueType -J t3 -w \"done(t1)&&done(t2)\" sleep 30");
To automate the -w argument I tried this:
my $count=2;
my $i;
system(" bsub -E "xyz" -n 1 -q queueType -J t3 \"foreach my $i (0..$count) {print " done(t_$i)&&";}\" sleep 30 ")
I get this error:
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `bsub -E "/pkg/ice/sysadmin/bin/linux-pre-exec" -n 1 -q short -J t3 -w "foreach (0..7) {print \"done(t)&&\";}" sleep 30'
EDIT: Yes I am using system command to submit these jobs from perl

If you want to generate the done(...)&&done(...) string dynamically, you can use
my $count = 7;
my $done_all = join '&&', map "done(t$_)", 1 .. $count;
That is, for each number in the range 1 .. 7, produce a string "done(t$_)", which gives a list "done(t1)", "done(t2)", ... "done(t7)". The elements of this list are then join'd together with a separator of &&, producing "done(t1)&&done(t2)&&...&&done(t7)".
To run an external command, you can use system. Passing a list to system avoids going through the shell, which avoids all kinds of nasty quoting issues:
system('bsub', '-E', 'xyz', '-n', '1', '-q', 'queueType', '-J', 't3', '-w', $done_all, 'sleep', '30');
# or alternatively:
system(qw(bsub -E xyz -n 1 -q queueType -J t3 -w), $done_all, qw(sleep 30));
Your code tries to pass Perl code to bsub, but that won't work. You have to generate the command string beforehand and pass the result to bsub.
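If the submission happens from a shell script rather than Perl, the same dependency expression can be assembled in bash. This is only a sketch: build_done_expr is a hypothetical helper name, the job names are assumed to follow the t1..tN pattern used above, and the bsub call is left as a comment because it needs a real LSF cluster.

```shell
#!/usr/bin/env bash
# Build "done(t1)&&done(t2)&&...&&done(tN)" for use with bsub -w.
# build_done_expr is a hypothetical helper, not part of LSF.
build_done_expr() {
  local count=$1 expr="" i
  for ((i = 1; i <= count; i++)); do
    # Prepend "&&" only when something is already in the expression.
    expr+="${expr:+&&}done(t$i)"
  done
  printf '%s\n' "$expr"
}

build_done_expr 3
# On a cluster the result would be passed as a single quoted argument:
# bsub -E xyz -n 1 -q queueType -J t3 -w "$(build_done_expr 2)" sleep 30
```

Quoting the command substitution keeps the parentheses and && away from the shell, which is the same problem the list form of system avoids in Perl.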

Related

Validate if a text file contains identical records at specific line's number?

my command looks like:
for i in *.fasta ; do
parallel -j 10 python script.py $i > $i.out
done
I want to add a test condition to this loop where it only executes the parallel python script if there are no identical lines in the .fasta file
an example .fasta file below:
>ref2
GGTTAGGGCCGCCTGTTGGTGGGCGGGAATCAAGCAGCATTTTGGAATTCCCTACAATCC
CCAAAGTCAAGGAGTAGTAGAATCTATGCGGAAAGAATTAAAGAAAATTATAGGACAGGT
AAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAAATGGC
>mut_1_2964_0
AAAAAAAAACGCCTGTTGGTGGGCGGGAATCAAGCAGGTATTTGGAATTCCCTACAATCC
CCAAAGTCAAGGAGTAGTAGAATCTATGTTGAAAGAATTAAAGAAAATTATAGGACAGGT
AAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAAATGGC
an example .fasta file that I would like excluded because lines 2 and 4 are identical.
>ref2
GGTTAGGGCCGCCTGTTGGTGGGCGGGAATCAAGCAGCATTTTGGAATTCCCTACAATCC
CCAAAGTCAAGGAGTAGTAGAATCTATGCGGAAAGAATTAAAGAAAATTATAGGACAGGT
AAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAAATGGC
>mut_1_2964_0
GGTTAGGGCCGCCTGTTGGTGGGCGGGAATCAAGCAGCATTTTGGAATTCCCTACAATCC
CCAAAGTCAAGGAGTAGTAGAATCTATGCGGAAAGAATTAAAGAAAATTATAGGACAGGT
AAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAAATGGC
The input files always have 4 lines exactly, and lines 2 and 4 are always the lines to be compared.
I've been using sort file.fasta | uniq -c to see if there are identical lines, but I don't know how to incorporate this into my bash loop.
EDIT:
command:
for i in read_00.fasta ; do lines=$(awk 'NR % 4 == 2' $i | sort | uniq -c | awk '$1 > 1'); if [ -z "$lines" ]; then echo $i >> not.identical.txt; fi;
read_00.fasta:
>ref
GGTGCCCACACTAATGATGTAAAACAATTAACAGAGGCAGTGCAAAAAATAACCACAGAAAGCATAGTAATATGGGGAAAGACTCCTAAATTTAAACTGCCCATACAAAAGGAAACATGGGAAACATGGTGGACAGAGTATTGGCAAGCCACCTGGATTCCTGAGTGGGAGTTTGTTAATACCCCTCCCTTAGTGAAATTATGGTACCAGTTAGA
>mut_1_2964_0
GGTGCCCACACTAATGATGTAAAACAATTAACAGAGGCAGTGCAAAAAATAACCACAGAAAGCATAGTAATATGGGGAAAGACTCCTAAATTTAAACTGCCCATACAAAAGGAAACATGGGAAACATGGTGGACAGAGTATTGGCAAGCCACCTGGATTCCTGAGTGGGAGTTTGTTAATACCCCTCCCTTAGTGAAATTATGGTACCAGTTAGA
Compare those specific lines with the awk command below, which exits with failure when the lines are identical and with success otherwise (instead of exiting, you can print or do whatever else you need):
awk 'NR==2{ prev=$0 } NR==4{ if(prev==$0) exit 1; else exit }' "./$yourFile"
Or, to print the file name when the 2nd and 4th lines differ (FNR and nextfile make the check restart for each file instead of stopping after the first one):
awk 'FNR==2{ prev=$0 } FNR==4{ if(prev!=$0) print FILENAME; nextfile }' ./*.fasta
You can then use the exit status of the first command to decide whether to run the second one, like:
for file in ./*.fasta; do
awk 'NR==2{ prev=$0 } NR==4{ if(prev==$0) exit 1; else exit }' "$file" &&
{ parallel -j 10 python script.py "$file" > "$file.out"; }
done
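To try the exit-status check without touching real data, a small self-contained sketch can wrap the awk test in a function and run it on two throwaway files. The function name lines_differ and the temporary file names are made up for illustration, and the echo stands in for the real parallel call.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) when lines 2 and 4 of the given file differ,
# fails (exit status 1) when they are identical.
lines_differ() {
  awk 'NR==2 { prev = $0 } NR==4 { exit (prev == $0) }' "$1"
}

tmp=$(mktemp -d)
printf '>ref\nACGT\n>mut\nACGA\n' > "$tmp/differ.fasta"
printf '>ref\nACGT\n>mut\nACGT\n' > "$tmp/same.fasta"

for f in "$tmp"/*.fasta; do
  if lines_differ "$f"; then
    echo "run: ${f##*/}"     # here the real script would be started
  else
    echo "skip: ${f##*/}"
  fi
done
rm -r "$tmp"
```

With these two sample files the loop reports differ.fasta as runnable and skips same.fasta.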

Run a for loop using a comma separated bash variable

I have a list of collections as a comma-separated variable in Bash, like below
list_collection=$collection_1,$collection_2,$collection_2,$collection_4
I want to connect to Mongodb and run some commands on these collections
I have tried the following, but I can't get the loop to work
${Mongo_Home}/mongo ${mongo_host}/${mongo_db} -u ${mongo_user} -p ${mongo_password} <<EOF
use ${mongo_db};for i in ${list_collection//,/ }
do
db.${i}.reIndex();
db.${i}.createIndex({
"recon_type":1.0,
"account_name":1.0,
"currency":1.0,
"funds":1.0,
"recon_status":1.0,
"transaction_date":1.0},
{name:"index_def"});
if [ $? -ne 0 ] ; then
echo "Mongo Query to reindex ${i} failed"
exit 200
fi
done
EOF
What am I doing wrong?
What is the correct way?
It's hard to guess what your desired behavior is from a bunch of code that doesn't exhibit that behavior, but to take a shot at it, the following will run mongo once per item in list_collection, with a different heredoc each time:
#!/usr/bin/env bash
# read your string into a single array
IFS=, read -r -a listItems <<<"$list_collection"
# iterate over items in that array
for i in "${listItems[@]}"; do
{ # this brace group lets the redirection apply to the whole complex command
"${Mongo_Home}/mongo" "${mongo_host}/${mongo_db}" \
-u "${mongo_user}" -p "${mongo_password}" ||
{ echo "Mongo query to reindex $i failed" >&2; exit 200; }
} <<EOF
use ${mongo_db};
db.${i}.reIndex();
db.${i}.createIndex({
"recon_type":1.0,
"account_name":1.0,
"currency":1.0,
"funds":1.0,
"recon_status":1.0,
"transaction_date":1.0
}, {name:"index_def"});
EOF
done
Alternatively, running mongo just once (at the cost of no longer knowing which collection a failure happened for) might look like:
#!/usr/bin/env bash
# read your string into a single array
IFS=, read -r -a listItems <<<"$list_collection"
buildMongoCommand() {
printf '%s\n' "use $mongo_db;"
for i in "${listItems[@]}"; do
cat <<EOF
db.${i}.reIndex();
db.${i}.createIndex({
"recon_type":1.0,
"account_name":1.0,
"currency":1.0,
"funds":1.0,
"recon_status":1.0,
"transaction_date":1.0
}, {name:"index_def"});
EOF
done
}
"${Mongo_Home}/mongo" "${mongo_host}/${mongo_db}" \
-u "${mongo_user}" -p "${mongo_password}" \
< <(buildMongoCommand) \
|| { echo "Mongo query failed" >&2; exit 200; }
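Before pointing either variant at a real database, it can help to check the splitting and the generated commands on their own. The sketch below does a dry run that prints the commands instead of piping them to mongo; the collection names and the build_script helper are hypothetical and only there for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical collection names, standing in for $collection_1 and so on.
list_collection="users,orders,payments"

# Split the comma-separated string into an array.
IFS=, read -r -a listItems <<<"$list_collection"

# Print one reIndex command per collection name passed as an argument.
build_script() {
  local c
  for c in "$@"; do
    printf 'db.%s.reIndex();\n' "$c"
  done
}

# Dry run: show what would be sent to the mongo shell.
build_script "${listItems[@]}"
```

Once the printed commands look right, the output of the function can be fed to mongo with a process substitution, as in the second variant above.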

Generate random date in specific range in spark scala

To test one of my applications I need a large amount of data in Parquet format, which I don't have. I have written a shell script, but it is slow, so I want to use Spark instead. How can I generate random data using Spark with Scala?
Each field has to be in a specific range: id should be between 1 and 10, date can be any date from 2010 to 2018, and start and end time can be anything.
My shell script:
#!/bin/bash
if [ $# -eq 2 ]; then
LIMIT=40 # to generate 40KB file
for((i=0;i<$2;i++))
{
FILE_NAME="$1$i.csv"
echo "id,date,start_time,end_time,distance,amount,persons,longitude,latitude" >> "$FILE_NAME"
while [ $(du -k $FILE_NAME | cut -f 1) -le $LIMIT ]
do
start_time=`date -d "$(date +%H:%M:%S) + $(shuf -i 0-24 -n 1) hours $(shuf -i 0-60 -n 1) minutes $(shuf -i 0-60 -n 1) seconds" +'%H:%M:%S'`
echo "`shuf -i 1-10 -n 1`,`date -d "2011-01-01 + $(shuf -i 1-2557 -n 1) days" +'%m-%d-%Y'`,$start_time,`date -d "$start_time + $(shuf -i 1-6 -n 1) hours $(shuf -i 0-60 -n 1) minutes $(shuf -i 0-60 -n 1) seconds" +'%H:%M:%S'`,`shuf -i 1-60 -n 1`,`shuf -i 100-1500 -n 1`,`shuf -i 1-6 -n 1`,`shuf -i 10-99 -n 1`.`shuf -i 100000-999999 -n 1`,`shuf -i 10-99 -n 1`.`shuf -i 100000-999999 -n 1`" >> "$FILE_NAME"
done
}
else
printf "Usage: sh GenerateCSV.sh <filename without extension> <No of files to generate> \nThe files will be generated with .csv extension\n"
fi
I want the data to look like this, in Parquet format:
2,20-10-2010,23:18:10,23:40:40
How can I do this in Spark?
You can try this approach.
First get the Unix timestamps for the two boundary dates (the values in the comments depend on the session time zone; these correspond to UTC+05:30):
val ss = SparkSession.builder().appName("local").master("local[*]").getOrCreate()
ss.sqlContext.sql("SELECT unix_timestamp('2010-01-01', 'yyyy-MM-dd')") // 1262284200
ss.sqlContext.sql("SELECT unix_timestamp('2018-12-31', 'yyyy-MM-dd')") // 1546194600
Then generate a random number between those two values:
val r = new scala.util.Random
val x = 1262284200 + r.nextInt(1546194600 - 1262284200)
Finally, convert the generated value x into a date between the two boundaries:
ss.sqlContext.sql(s"SELECT FROM_UNIXTIME($x)")

Colorize running log after marker

Often I need to analyze large logs in console.
I use the following command to colorize important keywords:
echo "string1\nerror\nsuccess\nstring2\nfail" | perl -p -e 's/(success)/\e[1;32;10m$&\e[0m/g;' -e 's/(error|fail)/\e[0;31;10m$&\e[0m/g'
It will colorize "success" with green, and error messages with red and keeps others lines unchanged (as they contain some useful info).
But in some cases I need to colorize values after some marker, but not marker itself, i.e. in these lines
Marker1: value1
Marker2: value2
I need to highlight only value1 and value2, identified by their known markers.
I'm looking for a way to extend my current one-liner to do this.
I also tried the following solution, which I like less:
#!/bin/bash
default=$(tput op)
red=$(tput setaf 1 || tput AF 1)
green=$(tput setaf 2 || tput AF 2)
sed -u -r "s/(Marker1: )(.+)$/\1${red}\2${default}/
s/(Marker2: )(.+)$/\1${green}\2${default}/" "$@"
But it has a buffering problem: it works for a static file, but a log that is continuously being written is not displayed at all.
UPDATE:
Found a solution with help of some perl guru.
echo -e "string1\nerror\nsuccess\nstring2\nfail\nMaker1: value1\nMaker2: value2" | \
perl -p \
-e 's/(success)/\e[32m$&\e[0m/g;' \
-e 's/(error|fail)/\e[31m$&\e[0m/g;' \
-e 's/(Maker1:) (.*)/$1 \e[36m$2\e[0m/m;' \
-e 's/(Maker2:) (.*)/$1 \e[01;34m$2\e[0m/m;'
#!/bin/bash
default=$(tput op)
red=$(tput setaf 1 || tput AF 1)
green=$(tput setaf 2 || tput AF 2)
#default='e[0m'
#red='e[0;31;10m'
#green='e[1;32;10m'
# automatically use the files passed as arguments, or stdin if there are none
sed -u -r \
"/success/ s//${green}&${default}/
/error|fail/ s//${red}&${default}/
/^Marker1:/ {s//&${red}/;s/$/${default}/;}
/^Marker2:/ {s//&${green}/;s/$/${default}/;}" \
$( [ ${#@} -gt 0 ] && echo ${@} )
For a one-liner:
keep only the sed command and drop the other lines
replace the newlines inside the sed script with ;
use the terminal escape codes directly instead of the variables
drop the last line if you pipe the input or name a specific file instead
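Put together, those steps might look like the sketch below. It is an assumption-laden illustration rather than the exact one-liner: colorize is a made-up function name, the escape character is generated with printf so the script does not depend on tput, and red and green are hard-coded as ANSI codes 31 and 32.

```shell
#!/usr/bin/env bash
# Literal ESC character, used to build ANSI color sequences.
esc=$(printf '\033')

# Color only the value after each known marker, leaving the marker itself
# and every other line unchanged. Reads stdin, writes stdout.
colorize() {
  sed -E \
    -e "s/^(Marker1: )(.*)$/\1${esc}[31m\2${esc}[0m/" \
    -e "s/^(Marker2: )(.*)$/\1${esc}[32m\2${esc}[0m/"
}

printf 'Marker1: value1\nplain line\nMarker2: value2\n' | colorize
```

For a log that is still being written, GNU sed's -u option (as used above) can be added so that each line is flushed as soon as it is processed.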

Creating autocomplete script with sub commands

I'm trying to create an autocomplete script for use with fish; I'm porting over a bash completion script for the same program.
The program has three top-level commands, say foo, bar, and baz, and each has some subcommands, say a, b, and c for each.
What I'm seeing is that the top-level commands autocomplete fine: if I type f, foo is completed. But if I hit tab again to see its subcommands, I see foo, bar, baz, a, b, c when it should just be a, b, c.
I am using the git completion script as a reference, since it seems to work right, along with the git flow script.
I think this is handled in the git completion script by:
function __fish_git_needs_command
set cmd (commandline -opc)
if [ (count $cmd) -eq 1 -a $cmd[1] = 'git' ]
return 0
end
return 1
end
Which makes sense, you can only use the completion if there is a single arg to the command, the script itself; if you use that as the condition (-n) for the call to complete on the top level commands, I think the right thing would happen.
However, what I'm seeing is not the case. I copied that function over to my script, changed "git" appropriately, and did not have any luck.
The trimmed down script is as follows:
function __fish_prog_using_command
set cmd (commandline -opc)
set subcommands $argv
if [ (count $cmd) = (math (count $subcommands) + 1) ]
for i in (seq (count $subcommands))
if not test $subcommands[$i] = $cmd[(math $i + 1)]
return 1
end
end
return 0
end
return 1
end
function __fish_git_needs_command
set cmd (commandline -opc)
set startsWith (echo "$cmd[1]" | grep -E 'prog$')
# there's got to be a better way to do this regex, fish newb alert
if [ (count $cmd) = 1 ]
# Is this always false? Is this the problem?
if [ $cmd[1] -eq $cmd[1] ]
return 1
end
end
return 0
end
complete --no-files -c prog -a bar -n "__fish_git_needs_command"
complete --no-files -c prog -a foo -n "__fish_git_needs_command"
complete --no-files -c prog -a a -n "__fish_prog_using_command foo"
complete --no-files -c prog -a b -n "__fish_prog_using_command foo"
complete --no-files -c prog -a c -n "__fish_prog_using_command foo"
complete --no-files -c prog -a baz -n "__fish_git_needs_command"
Any suggestions on how to make this work are much appreciated.
I guess you are aware that return 0 means true and that return 1 means false?
From your output it looks like your needs_command function is not working properly, thus showing bar even when it has subcommands.
I just tried the following code and it works as expected:
function __fish_prog_needs_command
set cmd (commandline -opc)
if [ (count $cmd) -eq 1 -a $cmd[1] = 'prog' ]
return 0
end
return 1
end
function __fish_prog_using_command
set cmd (commandline -opc)
if [ (count $cmd) -gt 1 ]
if [ $argv[1] = $cmd[2] ]
return 0
end
end
return 1
end
complete -f -c prog -n '__fish_prog_needs_command' -a bar
complete -f -c prog -n '__fish_prog_needs_command' -a foo
complete -f -c prog -n '__fish_prog_using_command foo' -a a
complete -f -c prog -n '__fish_prog_using_command foo' -a b
complete -f -c prog -n '__fish_prog_using_command foo' -a c
complete -f -c prog -n '__fish_prog_needs_command' -a baz
Output from completion:
➤ prog <Tab>
bar baz foo
➤ prog foo <Tab>
a b c
➤ prog foo
Is this what you want?