Sum of tagged lines in Perl/shell script

I have input that looks like
5 X
8 Y
3 Z
9 X
I want output that sums the numerical values for each 'tag'; e.g.
14 X
8 Y
3 Z
I'm wondering if there is a slick one-liner I can use (along the lines of the awk ones for summing a list of integers).

Something like this should do the trick:
perl -ne '$table{$2} += $1 if /(\d+)\s+(.+)/; END {print "$table{$_} $_\n" for keys %table}'
or to use auto-splitting:
perl -ane '$table{$F[1] or next} += $F[0]; END {print "$table{$_} $_\n" for keys %table}'

As slick as I can make it:
perl -alne 'END{print"$X{$_} $_"for sort{$X{$b}<=>$X{$a}}keys%X}$X{$F[1]}+=$F[0]'

Tried to make it as little obfuscated as possible :)
Sorts output by 'tag'.
perl -alne '$counts{$F[1]} += $F[0]; END { print "$counts{$_} $_" for sort(keys %counts) }'

Output in random order
perl -alne'$t{$F[1]}+=$F[0]}{print"$t{$_} $_"for keys%t'
alphabetically sorted by tag
perl -alne'$t{$F[1]}+=$F[0]}{print"$t{$_} $_"for sort keys%t'
sorted by value
perl -alne'$t{$F[1]}+=$F[0]}{print"$t{$_} $_"for sort{$t{$b}<=>$t{$a}}keys%t'

gawk '{count[$2]+=$1}END{for(i in count)print count[i],i}' 1.t
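A minimal sketch of the same awk tally, assuming any POSIX awk (not only gawk) and `sort`; the helper name `sum_by_tag_awk` is hypothetical. In a Unix shell the program needs single quotes so the shell does not expand `$1` and `$2`, and because `for (i in count)` visits keys in no defined order, the output is piped through `sort`.

```shell
#!/bin/sh
# Hypothetical helper: tally column 1 per tag in awk, then sort by the tag
# (field 2) because awk's "for (i in count)" order is unspecified.
sum_by_tag_awk() {
    awk '{ count[$2] += $1 } END { for (i in count) print count[i], i }' | sort -k2,2
}

result=$(printf '5 X\n8 Y\n3 Z\n9 X\n' | sum_by_tag_awk)
printf '%s\n' "$result"
```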


Can "perl -a" somehow re-join @F using the original whitespace?

My input has a mix of tabs and spaces for readability. I want to modify a field using perl -a, then print out the line in its original form. (The data is from findup, showing me a count of duplicate files and the space they waste.) Input is:
2 * 4096 backup/photos/photo.jpg photos/photo.jpg
2 * 111276032 backup/books/book.pdf book.pdf
The output would convert field 3 to kilobytes, like this:
2 * 4 KB backup/photos/photo.jpg photos/photo.jpg
2 * 108668 KB backup/books/book.pdf book.pdf
In my dream world, this would be my code, since I could just will perl to automatically recombine #F and preserve the original whitespace:
perl -lanE '$F[2]=int($F[2]/1024)." KB"; print;'
In real life, joining with a single space seems like my only option:
perl -lanE '$F[2]=int($F[2]/1024)." KB"; print join(" ", @F);'
Is there any automatic variable which remembers the delimiters? If I had a magic array like that, the code would be:
perl -lanE 'BEGIN{use List::Util "reduce";} $F[2]=int($F[2]/1024)." KB"; print reduce { $a . shift(@magic) . $b } @F;'
No, there is no such magic object. You can do it by hand, though:
perl -wnE'@p = split /(\s+)/; $p[4] = int($p[4]/1024) . " KB"; print @p' input.txt
The capturing parentheses in split's pattern mean that the separators are returned as well, so you keep the exact whitespace. Since the separators are now in the array, the third field is at index 4.
As it turns out, -F has this same property (thanks to Сухой27), so this works too:
perl -F'(\s+)' -lanE'$F[4] = int($F[4]/1024) . " KB"; say @F' input.txt
Note: as of Perl 5.20.0, "-F now implies -a and -a implies -n" (thanks to ysth).
You could just find the correct part of the line and modify it:
perl -wpE's/^\s*+(?>\S+\s+){2}\K(\S+)/int($1\/1024) . " KB"/e'

How do you select a column from a text file using Perl?

I want to subtract the values in one column from those in another column and add up the differences. How do I do this in Perl? I am new to Perl, so I am unable to figure out how to go about it. Kindly help me.
The first thing is to separate the data into columns. In this case, the columns are separated by a space. split(/ /) will return a list of the columns.
To subtract one from the other, pull the two values out of the list and subtract them.
Finally, add each difference to a running sum as you loop over the data.
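#!/usr/bin/perl
use strict;
use warnings;
my $sum = 0;
while (<DATA>) {
    chomp;                          # drop the trailing newline
    my @vals = split(/ /);          # columns are separated by a single space
    my $diff = $vals[1] - $vals[0]; # second column minus first
    $sum += $diff;                  # running total of the differences
}
print $sum, "\n";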
__DATA__
1 3
3 5
5 7
This prints 6, i.e. (3 - 1) + (5 - 3) + (7 - 5).
FYI, if you combine the autosplit (-a), loop (-n) and command-line program (-e) switches (see perlrun), you can shorten this to a one-liner, much like awk:
perl -lane '$sum += $F[1] - $F[0]; END { print $sum }' filename

Need a Perl one-liner to extract specific content from a line and possibly average it

I have a file with many lines that contain "x_y=XXXX", where XXXX can be a number from 0 to some N.
Now,
a) I would like to get only the XXXX part of every such line.
b) I would like to get the average.
Possibly both of these as one-liners.
I am trying out something like
cat filename.txt | grep x_y | (this needs to be filled in)
but I am not sure what to fill in.
In the past I have used commands like
perl -pi -e 's/x_y/m_n/g'
to replace all the instances of x_y.
But now, I would like to match for x_y=XXXX and get the XXXX out and then possibly average it out for the entire file.
Any help on this will be greatly appreciated. I am fairly new to perl and regexes.
TIMTOWTDI (there's more than one way to do it), as usual:
perl -nE '$s+=$1, ++$n if /x_y=(\d+)/; END { say "avg:", $s/$n }' data.txt
The following should do:
... | grep 'x_y=' | perl -ne '$x += (split /=/, $_)[1]; $y++ }{ print $x/$y, "\n"'
The }{ is colloquially known as the "Eskimo operator". It works because of the loop that -n wraps around the -e code (see perldoc perlrun): the }{ closes that loop early and opens a block that runs once, after all input has been read.
Using awk, with the following program saved as cnt.awk:
/^[^_]+_[^=]+=[0-9]+$/ {sum=sum+$2; cnt++}
END {
print "sum:", sum, "items:", cnt, "avg:", sum/cnt
}
$ awk -F= -f cnt.awk data.txt
sum: 55 items: 10 avg: 5.5
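For a self-contained check of those numbers, the sketch below inlines the same awk program and generates ten lines, x_y=1 through x_y=10. That data is an assumption (the question's data.txt isn't shown), but it reproduces the sum, count and average above; it assumes `seq`, `sed` and a POSIX awk.

```shell
#!/bin/sh
# Same logic as cnt.awk; -F= splits "x_y=7" into $1="x_y" and $2="7".
# Sample data (assumed): x_y=1 .. x_y=10.
result=$(seq 1 10 | sed 's/^/x_y=/' | awk -F= '
    /^[^_]+_[^=]+=[0-9]+$/ { sum = sum + $2; cnt++ }
    END { print "sum:", sum, "items:", cnt, "avg:", sum / cnt }')
printf '%s\n' "$result"
```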
Bash solution, saved as cnt.sh (bc does the division, since bash arithmetic is integer-only):
#!/bin/bash
sum=0
cnt=0
while IFS='=' read -r str num
do
if [[ $str == *_* ]]
then
sum=$((sum + num))
cnt=$((cnt + 1))
fi
done < data.txt
echo "scale=4; $sum/$cnt" | bc
Output:
$ ./cnt.sh
5.5000
As a one-liner, split up with comments:
perl -nlwe '
push @a, /x_y=(\d+)/g           # push all matches onto an array
}{                              # eskimo operator, the rest is evaluated last
$sum += $_ for @a;              # get the sum
print "Average: ", $sum / @a;   # divide by the size of the array
' input.txt
This extracts every match on a line, if there is more than one.
Paste version:
perl -nlwe 'push @a, /x_y=(\d+)/g }{ $sum += $_ for @a; print "Average: ", $sum / @a;' input.txt

How do I increment a value with leading zeroes in Perl?

It's the same question as this one, but using Perl!
I would like to iterate over a value with just one leading zero.
The equivalent in shell would be:
for i in $(seq -w 01 99) ; do echo $i ; done
Since the leading zero is significant, presumably you want to use these as strings, not numbers. In that case, there is a different solution that does not involve sprintf:
for my $i ("00" .. "99") {
print "$i\n";
}
Try something like this:
foreach (1 .. 99) {
my $s = sprintf("%02d",$_);
print "$s\n";
}
The .. is called the range operator and can do different things depending on its context. We're using it here in list context, so it counts up by one from the left value to the right value. Here's a simpler example of it in use; this code:
@list = 1 .. 10;
print "@list";
has this output:
1 2 3 4 5 6 7 8 9 10
The sprintf function allows us to format output. The format string %02d is broken down as follows:
% - start of the format specifier
0 - use leading zeroes
2 - at least two characters wide
d - format value as a signed integer.
So %02d is what turns 2 into 02.
printf("%02d\n",$_) foreach (1..20)
print "$_\n" foreach ("001" .. "099")
foreach $i (1..99) {printf "%02d\n", $i;}
I would consider using sprintf to format $i according to your requirements. E.g. printf '<%06s>', 12; prints <000012>.
Check the Perl documentation for sprintf (perldoc -f sprintf) if you are unsure.
Well, if we're golfing, why not:
say for "01".."99"
(assuming you're using 5.10 and have done a use 5.010 at the top of your program, of course.)
And if you do it straight from the shell, it'd be:
perl -E "say for '01'..'99'"

What's the quickest way to get the mean of a set of numbers from the command line?

Using any tools you would expect to find on a *nix system (MS-DOS is fine too, if you want), what is the easiest/fastest way to calculate the mean of a set of numbers, assuming you have them one per line in a stream or file?
awk ' { n += $1 }; END { print n / NR }'
This accumulates the sum in n, then divides by the number of items (NR = Number of Records).
Works for integers or reals.
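A minimal sketch of the awk approach as a reusable helper, assuming a POSIX awk and `seq` for sample data; the name `mean` is hypothetical. The NR > 0 check is an addition, not part of the answer above: it avoids a division-by-zero error on empty input.

```shell
#!/bin/sh
# Hypothetical helper: mean of one number per line; prints nothing for
# empty input instead of dividing by zero.
mean() {
    awk '{ n += $1 } END { if (NR > 0) print n / NR }'
}

result=$(seq 1 10 | mean)
empty=$(printf '' | mean)
printf '%s\n' "$result"
```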
Awk
awk '{total += $1; count++ } END {print total/count}'
Using Num-Utils for UNIX:
average 1 2 3 4 5 6 7 8 9
perl -e 'while (<>) { $sum += $_; $count++ } print $sum / $count, "\n"';
Using "st" (https://github.com/nferraz/st):
$ st numbers.txt
N min max sum mean sd
10.00 1.00 10.00 55.00 5.50 3.03
Specify an option to see individual stats:
$ st numbers.txt --mean
5.5
(DISCLAIMER: I wrote this tool :))
In PowerShell, it would be
get-content .\meanNumbers.txt | measure-object -average
Of course, that's the verbose syntax. Using aliases, it becomes:
gc .\meanNumbers.txt | measure-object -a
Perl.
my @a = <STDIN>;
my $sum = 0;
for (my $i = 0; $i < @a; $i++)
{
$sum += $a[$i];
}
print $sum / @a, "\n";
Caveat Emptor: My syntax may be a little whiffly.
Ruby one liner
cat numbers.txt | ruby -ne 'BEGIN{$sum=0}; $sum=$sum+$_.to_f; END{puts $sum/$.}'