Zero-proof division with perl or awk

I have to add a field showing the difference in percentage between 2 fields in a file like:
BI,1266,908
BIL,494,414
BKC,597,380
BOOM,2638,654
BRER,1453,1525
BRIG,1080,763
DCLE,0,775
The output should be:
BI,1266,908,-28.3%
BIL,494,414,-16.2%
BKC,597,380,-36.35%
BOOM,2638,654,-75.2%
BRER,1453,1525,5%
BRIG,1080,763,-29.4%
DCLE,0,775,-
Note the zero in the last row. Either of these fields could be zero. If a zero is present in either field, N/A or - is acceptable.
What I'm trying --
Perl:
perl -F, -ane 'if ($F[2] > 0 || $F[3] > 0){print $F[0],",",$F[1],",",$F[2],100*($F[2]/$F[3])}' file
I get Illegal division by zero at -e line 1, <> line 2. If I change the || to && it prints nothing.
In awk:
awk '$2>0{$4=sprintf("%d(%.2f%)", $3, ($3/$2)*100)}1' file
Just prints the file.

$ awk -F, '$2 == 0 || $3 == 0 { printf("%s,-\n", $0); next }
{ printf("%s,%.2f%%\n", $0, 100 * ($3 / $2) - 100) }' input.csv
BI,1266,908,-28.28%
BIL,494,414,-16.19%
BKC,597,380,-36.35%
BOOM,2638,654,-75.21%
BRER,1453,1525,4.96%
BRIG,1080,763,-29.35%
DCLE,0,775,-
How it works: if the second or third column is 0, append a - field to the line. Otherwise, calculate the percentage difference and append that.

Your perl's main issue was confusing awk's 1-based column indexes with perl's 0-based column indexes.
perl -F, -ane 'print "$1," if /(.+)/;if ($F[1] > 0 && $F[2] > 0){printf("%.2f%%", (100*$F[2]/$F[1])-100) } else {print "-"};print "\n"' file
The $1 here refers to the capture group (.+) which means "The whole line but the linefeed". The rest is probably self-explanatory if you understand the awk.

You're not telling awk that the fields are separated by commas, so it assumes the default separator, whitespace. Each line then contains only one whitespace-separated field, $2 is empty, and $2>0 is never true. Change it to:
$ awk 'BEGIN{FS=OFS=","} $2>0{$4=sprintf("%d(%.2f%%)", $3, ($3/$2)*100)}1' file
BI,1266,908,908(71.72%)
BIL,494,414,414(83.81%)
BKC,597,380,380(63.65%)
BOOM,2638,654,654(24.79%)
BRER,1453,1525,1525(104.96%)
BRIG,1080,763,763(70.65%)
DCLE,0,775
and then tweak it for your desired output:
$ awk 'BEGIN{FS=OFS=","} {$4=($2 && $3 ? sprintf("%.2f%%", (($3/$2)-1)*100) : "N/A")} 1' file
BI,1266,908,-28.28%
BIL,494,414,-16.19%
BKC,597,380,-36.35%
BOOM,2638,654,-75.21%
BRER,1453,1525,4.96%
BRIG,1080,763,-29.35%
DCLE,0,775,N/A
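A quick way to check the field-separator diagnosis is to compare what awk sees with and without -F. The sketch below is illustrative only: the pct_diff function name is made up here, and it assumes a POSIX awk. Adding 0 to each field forces a numeric comparison, so an empty field is also reported as N/A instead of triggering a division by zero.

```shell
# Hypothetical helper (the name is an assumption): append the percentage
# difference between fields 2 and 3, or N/A when either one is zero.
pct_diff() {
  awk 'BEGIN { FS = OFS = "," }
       { $4 = ($2 + 0 != 0 && $3 + 0 != 0) ? sprintf("%.2f%%", ($3 / $2 - 1) * 100) : "N/A" }
       1' "$@"
}

# Without -F the whole line is a single field, so $2 is empty.
echo 'BI,1266,908' | awk '{ print NF }'      # prints 1
echo 'BI,1266,908' | awk -F, '{ print NF }'  # prints 3

printf 'BI,1266,908\nDCLE,0,775\n' | pct_diff
```

With no file arguments the function reads standard input; file names can be passed instead, as in pct_diff input.csv.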

Related

Arrange columns using awk or sed?

I have a file that has around 500 rows and 480K columns, I am required to move columns 2,3 and 4 at the end. My file is a comma separated file, is there a quicker way to arrange this using awk or sed?
You can try below solution -
perl -F"," -lane 'print "@F[0]"," ","@F[4..$#F]"," ","@F[1..3]"' input.file
You can copy the columns easily, moving will take too long for 480K columns.
$ awk 'BEGIN{FS=OFS=","} {print $0,$2,$3,$4}' input.file > output.file
what kind of a data format is this?
Another technique, just bash:
while IFS=, read -r a b c d e; do
echo "$a,$e,$b,$c,$d"
done < file
Testing with 5 fields:
$ cat foo
1,2,3,4,5
a,b,c,d,e
$ cat program.awk
{
$6=$2 OFS $3 OFS $4 OFS $1 # copy fields to the end and $1 too
sub(/^([^,]*,){4}/,"") # remove 4 first columns (fields of any length)
$1=$5 OFS $1 # catenate current $5 (was $1) to $1
NF=4 # reduce NF
} 1 # print
Run it:
$ awk -f program.awk FS=, OFS=, foo
1,5,2,3,4
a,e,b,c,d
So theoretically this should work:
{
$480001=$2 OFS $3 OFS $4 OFS $1
sub(/^([^,]*,){4}/,"")
$1=$480000 OFS $1
NF=479999
} 1
EDIT: It did work.
Perhaps perl:
perl -F, -lane 'print join(",", @F[0,4..$#F,1,2,3])' file
or
perl -F, -lane '@x = splice @F, 1, 3; print join(",", @F, @x)' file
Another approach: regular expressions
perl -lpe 's/^([^,]+)(,[^,]+,[^,]+,[^,]+)(.*)/$1$3$2/' file
Timing it with a 500-line file, each line containing 480,000 fields:
$ time perl -F, -lane 'print join(",", @F[0,4..$#F,1,2,3])' file.csv > file2.csv
40.13user 1.11system 0:43.92elapsed 93%CPU (0avgtext+0avgdata 67960maxresident)k
0inputs+3172752outputs (0major+16088minor)pagefaults 0swaps
$ time perl -F, -lane '@x = splice @F, 1, 3; print join(",", @F, @x)' file.csv > file2.csv
34.82user 1.18system 0:38.47elapsed 93%CPU (0avgtext+0avgdata 52900maxresident)k
0inputs+3172752outputs (0major+12301minor)pagefaults 0swaps
And pure text manipulation is the winner
$ time perl -lpe 's/^([^,]+)(,[^,]+,[^,]+,[^,]+)(.*)/$1$3$2/' file.csv > file2.csv
4.63user 1.36system 0:20.81elapsed 28%CPU (0avgtext+0avgdata 20612maxresident)k
0inputs+3172752outputs (0major+149866minor)pagefaults 0swaps
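Since the question asked for awk or sed, the same text-only idea can also be written for sed. This is an untimed sketch (no benchmark was run for it), assuming a sed that accepts -E for extended regular expressions; move_cols is a name invented here. Unlike the perl regex above, it uses [^,]* so that empty fields are tolerated.

```shell
# Hypothetical helper: move fields 2-4 of each comma-separated line to the end.
move_cols() {
  # \1 = field 1, \2 = fields 2-4 with their leading commas,
  # \4 = everything after field 4 (\3 is only the inner repeat group).
  sed -E 's/^([^,]*)((,[^,]*){3})(.*)$/\1\4\2/' "$@"
}

printf '1,2,3,4,5\na,b,c,d,e\n' | move_cols
```

Lines with fewer than four fields do not match the pattern and pass through unchanged.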

Get values from two rows and convert to variables

I have values from two rows, want to get all values and make them to variables.
Output is from emc storage:
Bus 0 Enclosure 0 Disk 0
State: Enabled
Bus 0 Enclosure 0 Disk 1
State: Enabled
Expected result:
Bus:0|Enclosure:0|Disk:0|State:Enabled
Or just need somebody to give me direction how to get the last row ...
This might work for you (GNU sed):
sed '/^Bus/!d;N;s/[0-9]\+/:&|/g;s/\s//g' file
To get only the last row:
sed '/^Bus/{N;h};$!d;x;s/[0-9]\+/:&|/g;s/\s//g' file
Try this awk:
$ awk '/^Bus/{for(i=1;i<=NF;i+=2) printf "%s:%s|", $i,$(i+1)}/^State/{printf "%s%s\n", $1, $2}' file
Bus:0|Enclosure:0|Disk:0|State:Enabled
Bus:0|Enclosure:0|Disk:1|State:Enabled
To handle multiple words in the last field, you can do:
$ awk '/^Bus/{for(i=1;i<=NF;i+=2) printf "%s:%s|", $i,$(i+1)}/^State/{printf "%s", $1; for (i=2;i<=NF;i++) printf "%s ", $i; print ""}' file
Bus:0|Enclosure:0|Disk:0|State:Enabled
Bus:0|Enclosure:0|Disk:1|State:hot space
perl -00anE 's/:// for @F; say join "|", map { $_%2 ? () : "$F[$_]:$F[$_+1]" } 0..$#F' file
output
Bus:0|Enclosure:0|Disk:0|State:Enabled
Bus:0|Enclosure:0|Disk:1|State:Enabled
With GNU awk you could do:
$ awk 'NR>1{$6=$6$7;NF--;print RS,$0}' RS='Bus' OFS='|' file
Bus|0|Enclosure|0|Disk|0|State:Enabled
Bus|0|Enclosure|0|Disk|1|State:Enabled
And for the last row only:
$ awk 'END{$6=$6$7;NF--;print RS,$0}' RS='Bus' OFS='|' file
Bus|0|Enclosure|0|Disk|1|State:Enabled
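Since the question also asks for the values as variables, one possible follow-up is to let awk emit one pipe-separated record per disk and have the shell read each record into variables. This is a sketch, assuming every Bus line is followed by its State line; disk_records and the variable names are hypothetical.

```shell
# Hypothetical helper: print "bus|enclosure|disk|state" for each disk.
disk_records() {
  awk '/^Bus/    { bus = $2; enc = $4; disk = $6 }
       /^State:/ { sub(/^State:[ \t]*/, ""); print bus "|" enc "|" disk "|" $0 }' "$@"
}

sample='Bus 0 Enclosure 0 Disk 0
State: Enabled
Bus 0 Enclosure 0 Disk 1
State: Enabled'

printf '%s\n' "$sample" | disk_records |
while IFS='|' read -r bus enc disk state; do
  echo "bus=$bus enclosure=$enc disk=$disk state=$state"
done
```

Because the loop is on the right-hand side of a pipe, the variables typically exist only inside the loop body. Piping through tail -n 1 before the loop would restrict it to the last row.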

Keeping first character in string, in a specific single field

I am trying to remove all but the first character of a specific field in a .tab file. I want to keep only first character in fields 10 and 11.
Normally the fields have 35 characters in them, so I used:
awk '{gsub("..................................$","",$10);print}' file
However, some fields have fewer than 35 characters, and these were ignored by the replace. I tried using substring, but I cannot figure out how to make it field-specific. I believe there is a way to use perl inside awk so that I can use the function
perl -pe 's/(.).*/$1/g'
but I am not sure how to do that and use the field as the input value, so the file comes out identical except for the altered field.
is there a way to do the perl equivalent with gsub, or the awk equivalent with perl?
help is appreciated!
One way using awk:
awk '{ for (i=10;i<=11;i++) { $i = substr( $i, 1, 1) } } { print }' infile
Another way using gensub function of gawk
gawk '{ for (i=10;i<=11;i++) { $i = gensub(/(.).*/ , "\\1", "g" , $i) } }1' infile
The shortest awk version I could come up with:
awk '($10=substr($10,1,1))&&$11=substr($11,1,1)' infile
If the 10th and/or 11th field is not existing then the line is not printed.
Similar version in perl
perl -ane '$F[9]=~s/(.).*/$1/;$F[10]=~s/(.).*/$1/;print "@F\n"' infile
This prints the line even if 10th and/or 11th field is not defined.
Another way with perl:
perl -pe '$c=0; s/(\S+)/(++$c < 10 || $c > 11) ? $1 : substr($1,0,1)/eg' filename
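One caveat with the awk answers above: assigning to a field makes awk rebuild the line with OFS, which defaults to a single space, so a tab-delimited file comes out space-delimited. The following sketch keeps the tabs, assuming the file really is tab-separated; first_char_10_11 is a made-up name.

```shell
# Hypothetical helper: keep only the first character of fields 10 and 11,
# reading and writing tab-separated lines.
first_char_10_11() {
  awk 'BEGIN { FS = OFS = "\t" }
       { for (i = 10; i <= 11 && i <= NF; i++) $i = substr($i, 1, 1) }
       1' "$@"
}

printf 'a\tb\tc\td\te\tf\tg\th\ti\tJKLMN\tOPQ\tr\n' | first_char_10_11
```

The i <= NF guard leaves shorter lines untouched instead of padding them with empty fields.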

Add column to middle of tab-delimited file (sed/awk/whatever)

I'm trying to add a column (with the content '0') to the middle of a pre-existing tab-delimited text file. I imagine sed or awk will do what I want. I've seen various solutions online that do approximately this but they're not explained simply enough for me to modify!
I currently have this content:
Affx-11749850 1 555296 CC
I need this content
Affx-11749850 1 0 555296 CC
Using the command awk '{$3=0}1' filename messes up my formatting AND replaces column 3 with a 0, rather than adding a third column with a 0.
Any help (with explanation!) so I can solve this problem, and future similar problems, much appreciated.
Using the implicit { print } rule and appending the 0 to the second column:
awk '$2 = $2 FS "0"' file
Or with sed, assuming single space delimiters:
sed 's/ / 0 /2' file
Or perl:
perl -lane '$, = " "; $F[1] .= " 0"; print @F' file
awk '{$2=$2" "0; print }' your_file
tested below:
> echo "Affx-11749850 1 555296 CC"|awk '{$2=$2" "0;print}'
Affx-11749850 1 0 555296 CC
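Because the file is described as tab-delimited, it may help to set both the input and output separators to a tab so the formatting survives. A sketch under that assumption; add_zero_col is a hypothetical name. Appending OFS and "0" to $2 makes awk rebuild the line with an extra column after the second one, so the old third column shifts right instead of being overwritten.

```shell
# Hypothetical helper: insert a "0" column after column 2 of a tab-separated file.
add_zero_col() {
  awk 'BEGIN { FS = OFS = "\t" }
       { $2 = $2 OFS "0" }   # field 2 becomes "1<TAB>0"; the remaining fields follow as before
       1' "$@"
}

printf 'Affx-11749850\t1\t555296\tCC\n' | add_zero_col
```

The trailing 1 is a pattern that is always true with no action, which awk treats as "print the current line".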

Perform action on line range in sed/awk

How can I extract certain variables from a specific range of lines in sed/awk?
Example: I want to extract the host and port from .tnsnames.ora
from this section that starts at line 105.
DB_CONNECTION=
(description=
(address=
(protocol=tcp)
(host=127.0.0.1)
(port=1234)
)
(connect_data=
(sid=ABCD)
(sdu=4321)
)
gawk can use a regular expression as the field separator (FS).
'$0=$2' evaluates as true here because $2 is non-empty on the matching lines, so the script prints $2 automatically.
$ gawk -F'[()]' 'NR>105&&NR<115&&(/host/||/port/)&&$0=$2' .tnsnames.ora
use:
sed '105,$< whatever sed code you want here >'
If you specifically want the host and the port you can do something like:
sed -n '105,115p' .tnsnames.ora | grep -e 'host=' -e 'port='
You can use address ranges to specify to which section to apply the regular expressions. If you leave the end line address out (keep the comma) it will match to the end of file. You can also 'chain' multiple expressions by using '-e' multiple times. The following expression will just print the port and host value to standard out. It uses back references (\1) in order to just print the matching parts.
sed -n -e '105,115s/(port=\([0-9].*\))/\1/p' -e '105,115s/(host=\([0-9\.].*\))/\1/p' .tnsnames.ora
@lk, to address the answer you posted:
You can write awk code like C, but it's more succinctly expressed as "pattern {action}" pairs.
If you have gawk or nawk, the field separator is an ERE as Hirofumi Saito said
gawk -F'[()=]' '
NR < 105 {next}
NR > 115 {exit}
$2 == "host" || $2 == "port" {
# do stuff with $2 and $3
print $2 "=" $3
}
' .tnsnames.ora
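The "pattern {action}" form can be wrapped in a small shell function so the line range is not hard-coded. This is only a sketch: host_port, first and last are names invented here, and the sample file is made up for the demonstration.

```shell
# Hypothetical helper. Usage: host_port FILE FIRST_LINE LAST_LINE
host_port() {
  awk -F'[()=]' -v first="$2" -v last="$3" '
    NR < first { next }    # skip everything before the range
    NR > last  { exit }    # stop reading once past the range
    $2 == "host" || $2 == "port" { print $2 "=" $3 }
  ' "$1"
}

tmp=$(mktemp)
cat > "$tmp" <<'EOF'
OTHER=
  (host=10.0.0.1)
DB_CONNECTION=
  (description=
    (address=
      (protocol=tcp)
      (host=127.0.0.1)
      (port=1234)
    )
  )
EOF
host_port "$tmp" 3 10
rm -f "$tmp"
```

The output lines have the form name=value, so they could be fed to a while IFS== read -r name value loop if shell variables are needed.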