sorting some data - perl

I am trying to sort some data in bash. Data looks like below.
20110724.gz 1347
20110724.gz 2128
20110725.gz 1315
20110725.gz 2334
20110726.gz 808
20110726.gz 1088
After sorting, it should look like
20110724.gz 3475
20110725.gz 3649
20110726.gz 1896
Basically, for a given date, the data are summed up. Can somebody help? Thanks.
hmm, hopefully I figure it out in a few days.

Here's a quick and dirty perl oneliner:
$ perl -e 'my %h = (); while (<>) { chomp; my ($fname, $count) = split; $h{$fname} += $count;} foreach my $k (sort keys %h) {print $k, " ", $h{$k}, "\n"}' < datafile

Here's a perl solution.
Usage: script.pl input.txt > output.txt
Code:
use warnings;
use strict;
use ARGV::readonly;
my %sums;
while (<>) {
my ($date, $num) = split;
$sums{$date} += $num;
}
for my $date (sort keys %sums) {
print "$date $sums{$date}\n";
}
Or as a one-liner:
$ perl -we 'my %h; while(<>) { ($d,$n)=split; $h{$d}+=$n; } print "$_ $h{$_}\n" for sort keys %h;' data2.txt
In case you do need a numerical sort on the dates:
sort { substr($a,0,8) <=> substr($b,0,8) } keys %sums;

You don't need perl for doing that. Some shell trickery will help :)
sort -n -k1,8 <file | while true ; do
if ! read line ; then
test -n "$accfile" && echo $accfile $value
break
fi
line=$(echo $line | tr -s ' ' )
curfile=$(echo $line | cut -d\ -f1)
curvalue=$(echo $line | cut -d\ -f2)
if [ $curfile != "$accfile" ] ; then
# new file, output the last if not empty
test -n "$accfile" && echo $accfile $value
accfile=$curfile
value=$curvalue
else
value=$(expr $value \+ $curvalue)
fi
done
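The -k option tells sort which fields make up the sort key; -k1,8 spans fields 1 through 8, which here is the whole line starting at the date. Because the dates are written as YYYYMMDD, a numeric sort (-n) puts them in chronological order.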
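For comparison, the same per-date totals can be sketched as a single awk pass wrapped in a shell function. This is only a minimal sketch and not part of the answers above: the function name sum_by_key is made up for illustration, and it assumes every input line holds exactly one key and one integer separated by whitespace.

```shell
#!/bin/sh
# sum_by_key (hypothetical name): read "key value" lines on stdin and
# print one "key total" line per distinct key, sorted by key.
sum_by_key() {
    # awk walks its array in arbitrary order, so the result is piped to sort.
    awk '{ sum[$1] += $2 }
         END { for (k in sum) print k, sum[k] }' | sort
}

# Sample data from the question.
printf '%s\n' \
    '20110724.gz 1347' '20110724.gz 2128' \
    '20110725.gz 1315' '20110725.gz 2334' \
    '20110726.gz 808'  '20110726.gz 1088' | sum_by_key
```

Unlike the read loop above, this version does not need sorted input, because the totals are kept in an array until the end.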

Related

awk output to variable and change directory

In the script below, I am not able to change the directory. For every filesystem that is more than 70% full, I need to see which entries inside its directory consume the most space.
#!/usr/bin/perl
use strict;
use warnings;
my $test=qx("df -h |awk \+\$5>=70 {print \$6} ");
chdir($test) or die "$!";
print $test;
system("du -sh * | grep 'G'");
No need to call awk in your case because Perl is quite good at splitting and printing certain lines itself. Your code has some issues:
The code qx("df -h |awk \+\$5>=70 {print \$6} ") tries to execute the string "df -h | awk ..." as a command which fails because there is no such command called "df -h | awk". When I run that code I get sh: 1: df -h |awk +>=70 {print } : not found. You can fix that by dropping the quotes " because qx() already is quoting. The variable $test is empty afterwards, so the chdir changes to your $HOME directory.
Then you'll see the next error: awk: line 1: syntax error at or near end of line, because it calls awk +\$5>=70 {print \$6}. Correct would be awk '+\$5>=70 {print \$6}', i.e. with ticks ' around the awk scriptlet.
As stated in a comment, df -h splits long lines into two lines. Example:
Filesystem 1K-blocks Used Available Use% Mounted on
/long/and/possibly/remote/file/system
10735331328 10597534720 137796608 99% /local/directory
Use df -hP to get guaranteed column order and one line output.
The last system call shows the directory usage (space) for all lines containing the letter G. I reckon that's not exactly what you want.
I suggest the following Perl script:
#!/usr/bin/env perl
use strict;
use warnings;
foreach my $line ( qx(df -hP) ) {
my ($fs, $size, $used, $avail, $use, $target) = split(/\s+/, $line);
next unless ($use =~ /^\d+\s*\%$/); # skip header line
# now $use is e.g. '90%' and we drop the '%' sign:
$use =~ s/\%$//;
if ($use > 70) {
print "almost full: $target; top 5 directories:\n";
# no need to chdir here. Simply use $target/* as search pattern,
# reverse-sort by "human readable" numbers, and show the top 5:
system("du -hs $target/* 2>/dev/null | sort -hr | head -5");
print "\n\n";
}
}
#!/usr/bin/perl
use strict;
use warnings;
my @bigd = map { my @f = split " "; $f[5] }
grep { my @f = split " "; $f[4] =~ /^(\d+)/ && $1 >= 70}
split "\n", `df -hP`;
print "big directories: $_\n" for @bigd;
for my $bigd (@bigd) {
chdir($bigd);
my @bigsubd = grep { my @f = split " "; $f[0] =~ /G/ }
split "\n", `du -sh *`;
print "big subdirectories in $bigd:\n";
print "$_\n" for @bigsubd;
}
}
I believe you wanted to do something like this.
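The filtering step that both scripts perform (keep only the mount points at or above a usage limit) can also be sketched in plain shell with awk. This is an illustration under stated assumptions, not a replacement for the answers: the function name over_threshold is invented, the input is assumed to be in df -P layout with the percentage in column 5 and the mount point in column 6, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# over_threshold (hypothetical name): read df -P style output on stdin and
# print the mount point of every filesystem whose usage is at or above the
# limit given as the first argument (default 70).
over_threshold() {
    awk -v limit="${1:-70}" '
        NR == 1 { next }                 # skip the header line
        {
            use = $5
            sub(/%$/, "", use)           # "99%" -> "99"
            if (use + 0 >= limit + 0) print $6
        }'
}

# Canned sample in df -P layout (made-up numbers), so the result is reproducible.
over_threshold 70 <<'EOF'
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 100 50 50 50% /
/dev/sdb1 100 99 1 99% /local/directory
tmpfs 100 70 30 70% /run
EOF
```

On a real system the function would be fed with df -P | over_threshold 70, and each printed mount point could then be passed to du.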

Perl command line - assume while loop around

Can anyone explain the difference in output of the two perl (using cygwin) commands below:
$ echo abc | perl -n -e 'if ($_ =~ /a/) {print 1;}'
prints :
1
$ echo abc | perl -e 'if ($_ =~ /a/) {print 1;}'
The first prints '1' while second one outputs blank?
Thanks
The -n switch wraps your code in a while loop that reads input line by line, so in the first command $_ is populated from standard input. In the second command there is no loop, so $_ is left undefined and the match fails.
Using Deparse, you can ask perl to show how your code is parsed:
perl -MO=Deparse -n -e 'if ($_ =~ /a/) {print 1;}'
LINE: while (defined($_ = <ARGV>)) {
if ($_ =~ /a/) {
print 1;
}
}
perl -MO=Deparse -e 'if ($_ =~ /a/) {print 1;}'
if ($_ =~ /a/) {
print 1;
}

How can I sum up the exponent value in bash shell?

here are the example values
2.31312e+06
4.34234234e+07
4.578362e+06
3.213124124e+06
how can I add them?
Numbers are args:
perl -le'$s += $_ for @ARGV; END { print $s }'
Numbers on STDIN or file named as argument (one per line):
perl -nle'$s += $_; END { print $s }'
Use printf '%e\n', $s instead of print $s if you want the result in exponent notation.
You could use awk. The following assumes that every number in the file is on a separate line:
awk '{a+=$0}END{print a}' filename
For your input, it'd produce:
5.3528e+07
If all the numbers in the file are on the same line, say:
awk '{for(i=1;i<=NF;++i) a+=$i}END{print a}' filename
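The earlier hint about printf '%e' carries over to awk. As a rough sketch (the function name sum_sci is made up, and one number per line is assumed), the total can be printed in exponent notation like this:

```shell
#!/bin/sh
# sum_sci (hypothetical name): add up one number per line from stdin and
# print the total in exponent notation via printf "%e".
sum_sci() {
    awk '{ total += $1 } END { printf "%e\n", total }'
}

# The four values from the question.
printf '%s\n' 2.31312e+06 4.34234234e+07 4.578362e+06 3.213124124e+06 | sum_sci
```

Plain print in awk formats the result with the default "%.6g", which is why the earlier output shows fewer digits; printf "%e" always prints six digits after the decimal point.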
Here is a Perl version:
#!/usr/bin/perl
use warnings;
use strict;
my $sum = 0;
while (<DATA>) {
$sum += $_;
}
print "$sum\n";
__DATA__
2313120
43423423.4
4578362
3213124.124
Here is the one-liner version, if you prefer this style:
perl -ne ' $s += $_; END { print "$s\n" } ' datafile

Awk Equivalent in perl

I am trying to learn Perl on the command line.
What are the Perl equivalents of the following awk commands?
awk '{for(i=1;i<=NF;i++)printf i < NF ? $i OFS : $i RS}' file
awk '!x[$0]++' file
awk 'FNR==NR{A[$0];next}($0 in A)' file1 file2
awk 'FNR==NR{A[$1]=$5 OFS $6;next}($1 in A){print $0,A[$1];delete A[$1]}' file1 file1
Please someone help me...
Try the awk to perl translator. For example:
$ echo awk '!x[$0]++' file | a2p
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
while (<>) {
chomp; # strip record separator
print $_ if $awk;print $_ if !($X{$_}++ . $file);
}
You can ignore the boilerplate at the beginning and see the meat of the perl in the while loop. The translation is seldom perfect (even in this simple example, the perl code omits newlines), but it usually provides a reasonable approximation.
Another example (the one Peter is having trouble with in the comments):
$ echo '{for(i=1;i<=NF;i++)printf( i < NF ? ( $i OFS ) : ($i RS))}' | a2p
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if $running_under_some_shell;
# this emulates #! processing on NIH machines.
# (remove #! line above if indigestible)
eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
# process any FOO=bar switches
$, = ' '; # set output field separator
while (<>) {
chomp; # strip record separator
@Fld = split(' ', $_, -1);
for ($i = 1; $i <= ($#Fld+1); $i++) {
printf (($i < ($#Fld+1) ? ($Fld[$i] . $,) : ($Fld[$i] . $/)));
}
}

Perl : sort <=> not working

I am not sure why this Perl sorting is not working.
Please suggest how to resolve this.
while (<>) {
chomp;
if (/VIOLATE/) {
@lines = split " ", $_;
#print "$lines[-2]\n"; ## Print last but one column
my @viol = "$lines[-2]\n";
@sorted = sort {$a <=> $b} @viol;
print "@sorted";
}
};
Command : perl test.pl test.log
test.log :
0.98 2.04 -1.106 VIOLATE
0.98 2.04 3.06
0.98 2.04 -11.06 VIOLATE
0.98 2.04 -1.06 VIOLATE
0.98 2.04 1.06
0.98 2.04 -0.226 VIOLATE
0.98 2.04 -2.06 VIOLATE
Are you trying to match any line with VIOLATE in it, put the result in an array then sort all the violations? If so you need to declare and sort @viol outside the loop:
use strict;
use warnings; # Don't forget these!
my @viol;
while (<>) {
chomp;
if (/VIOLATE/) {
my @lines = split(/\s+/); # Split on one or more whitespace characters.
push @viol, $lines[-2];
}
}
# sort and print
my @sorted = sort {$a <=> $b} @viol;
print "@sorted";
This outputs: -11.06 -2.06 -1.106 -1.06 -0.226
Your sort works just fine. The only problem is that your array only ever has one element: right above the sort, you assign a single value to it on every iteration.
If you want this to work, you need to fill your array before you sort.
This is also a one-liner:
perl -lanwe 'push(@a, $F[-2]) if /VIOLATE/ }{ print for sort { $a <=> $b } @a'
Note the use of the "Eskimo Kiss" operator, }{. It works in a way similar to an END { ... } block, in that whatever comes after it is executed at the end of the input.
For the curious: The "Eskimo Kiss" works because the switch -n adds a while(<>) { ... } loop around the -e program string, in a very literal way. Deparsed, it looks like this, with comments for clarity:
perl -MO=Deparse -lanwe 'push(@a, $F[-2]) if /VIOLATE/ }{ print for sort { $a <=> $b } @a'
BEGIN { $^W = 1; } # warnings enabled by -w
BEGIN { $/ = "\n"; $\ = "\n"; } # line endings enabled by -l
LINE: while (defined($_ = <ARGV>)) { # while(<>) loop added by -n
chomp $_; # chomp added by -l
our(@F) = split(' ', $_, 0); # autosplit enabled by -a
push @a, $F[-2] if /VIOLATE/; # our code
} # eskimo kiss close
{ # eskimo kiss open
print $_ foreach (sort {$a <=> $b} @a); # our END code
} # closing bracket added by -n
-e syntax OK