I have the following program snippet
my $nfdump_command = "nfdump -M /data/nfsen/profiles-data/live/upstream1 -T -R ${syear}/${smonth}/${sday}/nfcapd.${syear}${smonth}${sday}0000:${eyear}/${emonth}/${eday}/nfcapd.${eyear}${emonth}${eday}2355 -n 100 -s ip/bytes -N -o csv -q | awk 'BEGIN { FS = \",\" } ; { if (NR > 1) print \$5, \$10 }'";
syslog("info", $nfdump_command);
my %args;
Nfcomm::socket_send_ok ($socket, \%args);
my @nfdump_output = `$nfdump_command`;
my %domain_name_to_bytes;
my %domain_name_to_ip_addresses;
syslog("info", Dumper(\#nfdump_output));
foreach my $a_line (@nfdump_output) {
syslog("info", "LINE: " . $a_line);
}
Bug: @nfdump_output is empty.
The $nfdump_command is correct: it prints output when run individually.
This program was working for some time and then it broke, and I couldn't figure out why. After moving my development setup to another virtual machine, I found out that using an absolute path to nfdump fixes it.
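A daemon such as NfSen typically runs with a much smaller PATH than an interactive shell, so a command that works when run by hand can fail silently inside the plugin. A quick way to find the absolute path to hard-code, run from a shell where the command works (/usr/bin/nfdump below is just an illustration; yours may differ):
command -v nfdump    # prints e.g. /usr/bin/nfdump
Then use that absolute path at the start of $nfdump_command.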
Related
my command looks like:
for i in *.fasta ; do
parallel -j 10 python script.py $i > $i.out
done
I want to add a test condition to this loop where it only executes the parallel python script if there are no identical lines in the .fasta file
an example .fasta file below:
>ref2
GGTTAGGGCCGCCTGTTGGTGGGCGGGAATCAAGCAGCATTTTGGAATTCCCTACAATCCCCAAAGTCAAGGAGTAGTAGAATCTATGCGGAAAGAATTAAAGAAAATTATAGGACAGGTAAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAAATGGC
>mut_1_2964_0
AAAAAAAAACGCCTGTTGGTGGGCGGGAATCAAGCAGGTATTTGGAATTCCCTACAATCCCCAAAGTCAAGGAGTAGTAGAATCTATGTTGAAAGAATTAAAGAAAATTATAGGACAGGTAAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAAATGGC
an example .fasta file that I would like excluded, because lines 2 and 4 are identical:
>ref2
GGTTAGGGCCGCCTGTTGGTGGGCGGGAATCAAGCAGCATTTTGGAATTCCCTACAATCCCCAAAGTCAAGGAGTAGTAGAATCTATGCGGAAAGAATTAAAGAAAATTATAGGACAGGTAAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAAATGGC
>mut_1_2964_0
GGTTAGGGCCGCCTGTTGGTGGGCGGGAATCAAGCAGCATTTTGGAATTCCCTACAATCCCCAAAGTCAAGGAGTAGTAGAATCTATGCGGAAAGAATTAAAGAAAATTATAGGACAGGTAAGAGATCAGGCTGAACATCTTAAGACAGCAGTACAAATGGC
The input files always have 4 lines exactly, and lines 2 and 4 are always the lines to be compared.
I've been using sort file.fasta | uniq -c to see if there are identical lines, but I don't know how to incorporate this into my bash loop.
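A minimal sketch of one way to wire that in, using uniq -d (which prints only duplicated lines, so an empty result means no identical lines; the file and script names are the ones from the question):
for i in *.fasta ; do
    if [ -z "$(sort "$i" | uniq -d)" ]; then
        parallel -j 10 python script.py "$i" > "$i.out"
    fi
done
The answers below refine this to compare only lines 2 and 4.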
EDIT:
command:
for i in read_00.fasta ; do lines=$(awk 'NR % 2 == 0' $i | sort | uniq -c | awk '$1 > 1'); if [ -z "$lines" ]; then echo $i >> not.identical.txt; fi; done
read_00.fasta:
>ref
GGTGCCCACACTAATGATGTAAAACAATTAACAGAGGCAGTGCAAAAAATAACCACAGAAAGCATAGTAATATGGGGAAAGACTCCTAAATTTAAACTGCCCATACAAAAGGAAACATGGGAAACATGGTGGACAGAGTATTGGCAAGCCACCTGGATTCCTGAGTGGGAGTTTGTTAATACCCCTCCCTTAGTGAAATTATGGTACCAGTTAGA
>mut_1_2964_0
GGTGCCCACACTAATGATGTAAAACAATTAACAGAGGCAGTGCAAAAAATAACCACAGAAAGCATAGTAATATGGGGAAAGACTCCTAAATTTAAACTGCCCATACAAAAGGAAACATGGGAAACATGGTGGACAGAGTATTGGCAAGCCACCTGGATTCCTGAGTGGGAGTTTGTTAATACCCCTCCCTTAGTGAAATTATGGTACCAGTTAGA
Verify those specific lines' content with the awk below, exiting with failure when the lines are identical and with success otherwise (instead of exit, you can print/do whatever you want):
awk 'NR==2{ prev=$0 } NR==4{ if(prev==$0) exit 1; else exit }' "./$yourFile"
or, to output the filename instead when the 2nd and 4th lines differ:
awk 'NR==2{ prev=$0 } NR==4{ if(prev!=$0) print FILENAME; exit }' ./*.fasta
Using the exit status of the first command, you can then easily chain your second command, like:
for file in ./*.fasta; do
awk 'NR==2{ prev=$0 } NR==4{ if(prev==$0) exit 1; else exit }' "$file" &&
{ parallel -j 10 python script.py "$file" > "$file.out"; }
done
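Note that invoking parallel once per file inside the loop runs only one job at a time. To actually get 10 concurrent jobs, you can filter first and feed the surviving names to a single parallel call (a sketch, assuming GNU parallel):
for file in ./*.fasta; do
    awk 'NR==2{ prev=$0 } NR==4{ if(prev==$0) exit 1; else exit }' "$file" && printf '%s\n' "$file"
done | parallel -j 10 'python script.py {} > {}.out'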
I can track the rsync progress via Whiptail, using Awk to parse the rsync output; however, I'm puzzled by why the Perl counterpart doesn't work (the Whiptail gauge stays stuck at 0).
This is the working Awk commandline:
rsync --info=progress2 --no-inc-recursive --human-readable <source> <destination> |
stdbuf -o0 awk -v RS='\r' '$2 ~ /%$/ { print substr($2, 0, length($2) - 1) }' |
whiptail --gauge Syncing 20 80 0
This is the Perl (I assume) equivalent:
rsync --info=progress2 --no-inc-recursive --human-readable <source> <destination> |
stdbuf -o0 perl -lne 'BEGIN { $/ = "\r" } print /(\d+)%/' |
whiptail --gauge Syncing 20 80 0
If I remove the Whiptail command from the Perl version, the percentage numbers are printed as expected.
How do I need to modify the Perl version?
You may be suffering from buffering. Try setting autoflush on STDOUT.
BEGIN { $/ = "\r"; $|++ }
or, if your Perl is at least version 5.14 (on older versions, add the -MIO::Handle switch), you can be more explicit:
BEGIN { $/ = "\r"; *STDOUT->autoflush }
I'm having a problem dealing with some files. I need to perform a column count for every line in a file and, depending on the number of columns, append several ',' to the end of each line. All lines should have 36 columns separated by ','.
This line solves my problem, but how do I run it on a folder with several files in an automated way?
awk ' BEGIN { FS = "," } ;
{if (NF == 32) { print $0",,,," } else if (NF==31) { print $0",,,,," }
}' <SOURCE_FILE> > <DESTINATION_FILE>
Thank you for all your support
R&P
The answer depends on your OS, which you haven't told us. On UNIX and assuming you want to modify each original file, it'd be:
for file in *
do
awk '...' "$file" > tmp$$ && mv tmp$$ "$file"
done
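With the awk body from the question dropped in, that becomes (a sketch; a default print is added so lines that already have 36 columns pass through unchanged, which the original snippet silently dropped):
for file in *
do
    awk 'BEGIN { FS = "," }
         { if (NF == 32) { print $0",,,," }
           else if (NF == 31) { print $0",,,,," }
           else { print } }' "$file" > tmp$$ && mv tmp$$ "$file"
done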
Also, in general, to get all records in a file to have the same number of fields, you can do this without needing to specify what that number of fields is (though you can if appropriate):
$ cat tst.awk
BEGIN { FS=OFS=","; ARGV[ARGC++] = ARGV[ARGC-1] }
NR==FNR { nf = (NF > nf ? NF : nf); next }
{
tail = sprintf("%*s",nf-NF,"")
gsub(/ /,OFS,tail)
print $0 tail
}
$
$ cat file
a,b,c
a,b
a,b,c,d,e
$
$ awk -f tst.awk file
a,b,c,,
a,b,,,
a,b,c,d,e
$
$ awk -v nf=10 -f tst.awk file
a,b,c,,,,,,,
a,b,,,,,,,,
a,b,c,d,e,,,,,
It's a short one-liner with Perl:
perl -i.bak -F, -alpe '$_ .= "," x (36-@F)' *
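Here -F, with -a autosplits each line into @F, @F in numeric context is the number of fields, and -i.bak rewrites each file in place while keeping a .bak backup. To apply it recursively rather than to a single directory, something like this should work (the find invocation is a sketch):
find /path/to/files -type f -not -name '*.bak' -exec perl -i.bak -F, -alpe '$_ .= "," x (36-@F)' {} +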
If this is only a single folder without subfolders, use:
for oldfile in /path/to/files/*
do
newfile="${oldfile}.new"
awk '...' "${oldfile}" > "${newfile}"
done
If you also want to include subdirectories recursively, it's probably easiest to put the awk plus redirection into a small shell script, like this:
#!/bin/bash
oldfile=$1
newfile="${oldfile}.new"
awk '...' "${oldfile}" > "${newfile}"
and then run this script (let's call it runawk.sh, made executable and reachable via ./ or $PATH) via find:
find /path/to/files/ -type f -not -name "*.new" -exec runawk.sh \{\} \;
I am executing some shell commands via a perl script and capturing output, like this,
$commandOutput = `cat /path/to/file | grep "some text"`;
I also check if the command ran successfully or not like this,
if(!$commandOutput)
{
# command not run!
}
else
{
# further processing
}
This usually works and I get the output correctly. The problem is that in some cases the command itself does not produce any output. For instance, sometimes the text I am trying to grep will not be present in the target file, so no output will be produced as a result. In this case, my script detects this as "command not run", which is not true.
What is the correct way to differentiate between these 2 cases in perl?
You can use this to know whether the command failed or the command returned nothing:
$val = `cat text.txt | grep -o '[0-9]*'`;
print "command failed" if (!$?);
print "empty string" if(! length($val) );
print "val = $val";
Assume that text.txt contains "123ab", from which you want to extract the numbers only.
Use $? to check if the command executed successfully: see backticks do not return any value in perl for an example.
If you're not piping through grep (i.e. you run grep directly on the file), you can check $? for a more specific exit status:
my $commandOutput = `grep "some text" /path/to/file`;
if ($? < 0)
{
# command not run!
}
elsif ($? >> 8 > 1)
{
# grep error (e.g. file not found)
}
else
{
# further processing
}
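For reference, grep exits 0 when it finds a match, 1 when it finds none, and 2 on an error such as an unreadable file; those are the values that $? >> 8 decodes above. A quick check from the shell:
grep foo /dev/null; echo $?       # 1: no match
grep foo /no/such/file; echo $?   # 2: error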
Could someone help out with this?
I want to print all lines between the search patterns (START and END) to different files (new_file_name can be any incremental name provided).
But the search pattern repeats in the file, so each time it finds the pattern it should dump the lines between them into a different file.
The file is something like this
START --- ./body1/b1
##########################
123body1
abcbody1
##########################
END --- ./body1/b1
START --- ./body2/b2
##########################
123body2
defbody2
##########################
END --- ./body2/b2
Perl solution:
perl -MFile::Basename -MFile::Path -ne '
($a) = /^START.+?(\S+)$/;                           # capture the path from a START line
$b = /^END/;                                        # true on an END line
$a..$b or next;                                     # flip-flop: skip lines outside a block
if ($a){ mkpath(dirname $a); open STDOUT,">",$a; }  # new block: create its directory, redirect output
$a||$b or print;                                    # print only the body lines, not START/END
' file
Here is my awk solution:
# print_between_patterns.awk
/^START/ { filename = $NF ; next } # On START, use the last field as file name
/^END/ { next } # On END, skip
{ print > filename } # For the rest of the lines, print to file
Assuming your data file is called data.txt, the following will do what you want:
awk -f print_between_patterns.awk data.txt
Discussion
After the script runs, you will have ./body1/b1, ./body2/b2, and so on (note that print > filename needs the target directories to exist already; a later answer below handles that with mkdir -p).
If you don't want to skip the START and END lines, remove the next statements.
Update
If you want to control the output filename in a sequential way:
/^START/ { filename = sprintf("out%04d.txt", ++count) ; next }
/^END/ { next }
{ print > filename }
To get automatically generated incremental file names:
awk '
/^END/ { inBlock=0 }
inBlock { print > outfile }
/^START/ { inBlock=1; outfile = "outfile" ++count }
' file
To use the file names from your input:
awk '
/^END/ { inBlock=0 }
inBlock { print > outfile }
/^START/ {
inBlock=1
outdir = outfile = $NF
sub(/\/[^\/]+$/,"",outdir)
system("mkdir -p \"" outdir "\"")
}
' file
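One caveat with both scripts above: every block opens a new output file, and some non-gawk awks abort with a "too many open files" error on inputs with many blocks. Closing each file when its END line is seen avoids that (the first script again, with one added call):
awk '
/^END/ { inBlock=0; close(outfile) }
inBlock { print > outfile }
/^START/ { inBlock=1; outfile = "outfile" ++count }
' file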
The problem @JamesBond was having below was that I wasn't escaping the "/" within the character list in the sub(), so I've updated my answer above to do that now. There's absolutely no reason why that should need to be escaped, but apparently both nawk and /usr/xpg4/bin/awk require it:
$ cat file
the
quick/brown
dog
$ gawk '/[/]/' file
quick/brown
$ nawk '/[/]/' file
nawk: nonterminated character class [
source line number 1
context is
>>> /[/ <<< ]/
$ /usr/xpg4/bin/awk '/[/]/' file
/usr/xpg4/bin/awk: /[/: [ ] imbalance or syntax error Context is:
>>> /[/ <<<
and gawk doesn't care either way:
$ gawk --lint --posix '/[/]/' file
quick/brown
$ gawk --lint '/[/]/' file
quick/brown
$ gawk --lint --posix '/[\/]/' file
quick/brown
$ gawk --lint '/[\/]/' file
quick/brown
They all work just fine if I escape the backslash without putting it in a character list:
$ /usr/xpg4/bin/awk '/\//' file
quick/brown
$ nawk '/\//' file
quick/brown
$ gawk '/\//' file
quick/brown
So I guess that's something worth remembering for portability in future!
Using awk:
awk 'sub(/^START/, ""){out=sprintf("out%d", c++); p=1; next}
sub(/^END/, ""){p=0; next}
p{print > out}' file
This will find and store the lines between each START and END pair in separate files named out0, out1, etc.
This is one way to do it in Bash.
#!/bin/bash
[ -n "$BASH_VERSION" ] || {
echo "You need Bash to run this script."
exit 1
}
shopt -s extglob || {
echo "Unable to enable extglob shell option."
exit 1
}
IFS=$' \t\n' ## Use default.
while read KEY DASH FILENAME; do
if [[ $KEY == START && $DASH == --- && -n $FILENAME ]]; then
CURRENT_FILENAME=$FILENAME
DIRNAME=${FILENAME%%+([^/])}
if [[ -n $DIRNAME ]]; then
mkdir -p "$DIRNAME" || {
echo "Unable to create directory $DIRNAME."
exit 1
}
fi
exec 4>"$CURRENT_FILENAME" || {
echo "Unable to open $CURRENT_FILENAME for output."
exit 1
}
for (( ;; )); do
IFS= read -r LINE || {
echo "End of file reached finding END block of $CURRENT_FILENAME."
exec 4>&-
exit 1
}
read -r KEY DASH FILENAME <<< "$LINE"
if [[ $KEY == END && $DASH == --- && $FILENAME == "$CURRENT_FILENAME" ]]; then
break
else
echo "$LINE" >&4
fi
done
exec 4>&-
fi
done
Make sure you save the script in UNIX file format then run it as bash script.sh < file.
I guess you need to see this.
perl -lne 'print if((/START/../END/) and ($_!~/START/ and $_!~/END/))' your_file
Tested below:
> cat temp
START --- ./body1
##########################
123body1
abcbody1
##########################
END --- ./body1
START --- ./body2
##########################
123body2
defbody2
##########################
END --- ./body2
> perl -lne 'print if((/START/../END/) and ($_!~/START/ and $_!~/END/))' temp
##########################
123body1
abcbody1
##########################
##########################
123body2
defbody2
##########################
>
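For reference, /START/../END/ is Perl's range (flip-flop) operator in scalar context: it becomes true on a line matching START and stays true through the next line matching END, and the two extra $_!~ tests then drop the delimiter lines themselves.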
This might work for you:
csplit -z file '/^START/' '{*}'
The files will be named xx00, xx01, and so on; -z suppresses output files that would be empty, and '{*}' repeats the pattern as many times as it matches.
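If you want friendlier names than the xx prefix, GNU csplit lets you choose the prefix and the suffix format (a sketch, assuming GNU csplit):
csplit -z -f part -b '%02d.txt' file '/^START/' '{*}'
This produces part00.txt, part01.txt, and so on.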