Perl: add the contents of a column of a file

Column A | Column B | Column C | Column D
35627799100 8 8 2
35627788000 60 34 45
35627799200 10 21 21
35627780000 60 5 8
Basically, I have a file as shown above and would like to add up the contents of Column B, i.e. 8+60+10+60. To be frank, I'm not sure whether I need to remove the first line, since it is text, or whether I can use the split function and put the values in a hash, something along these lines:
my %hash = map {split/\s+/,$_,4} <$file>;
Thanks in advance for the help.

If you just want to sum up the second column, a hash is overkill. You can do something like this and calculate the sum directly in a single statement-modifier loop:
my $sum;
$sum += (split /\s+/, $_)[1] while <$file>;
Edit: If you have header rows or other rows with non-numeric values in column 2, then as the comments below indicate, you will run into problems. You can avoid this by trading split for a regular expression, like so:
my $sum = 0;
while (<STDIN>)
{
$sum += $1 if $_ =~ /^\S+\s+(\d+)/;
}
If it's possible that column 1 has no text (i.e. the line starts with a single blank and the first non-blank field represents the second column), then change the first part of the pattern from ^\S+ to ^\S*.
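As a rough sketch of the same idea using split instead of a regular expression: the helper below skips any row whose value in the chosen column is not an unsigned integer, so the header line is ignored without special handling. The name sum_column and the in-memory filehandle are illustrative assumptions, not part of the original answers.

```perl
use strict;
use warnings;

# Hypothetical helper: sum one whitespace-separated column (0-based index),
# ignoring rows whose value in that column is not an unsigned integer.
sub sum_column {
    my ($fh, $index) = @_;
    my $sum = 0;
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;    # split ' ' also drops leading whitespace
        next unless defined $fields[$index] && $fields[$index] =~ /^\d+$/;
        $sum += $fields[$index];
    }
    return $sum;
}

# Demo on the question's data, read through an in-memory filehandle.
my $data = <<'END';
Column A | Column B | Column C | Column D
35627799100 8 8 2
35627788000 60 34 45
35627799200 10 21 21
35627780000 60 5 8
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print sum_column($fh, 1), "\n";    # column B
close $fh;
```

For a real file, open it with open my $fh, '<', $filename and pass that handle instead.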

This is an example based on your data:
use strict;
use warnings;
my $sum_column_b = 0;
<DATA>; #drop header
while( my $line = <DATA>) {
$sum_column_b += $1 if $line =~ m/^\S+\s+(\d+)/; #add the second column value only when the match succeeds
}
print $sum_column_b, "\n"; #<-- prints: 138
__DATA__
Column A | Column B | Column C | Column D
35627799100 8 8 2
35627788000 60 34 45
35627799200 10 21 21
35627780000 60 5 8

Related

how do you select column from a text file using perl

I want to subtract the values in one column from those in another column and add up the differences. How do I do this in Perl? I am new to Perl, so I am unable to figure out how to go about it. Kindly help me.
The first thing is to separate the data into columns. In this case, the columns are separated by a space, so split(/ /) will return a list of the columns.
To subtract one from the other, pull the two values out of the list and subtract them.
Finally, add each difference to the running sum as you loop over the data.
#!/usr/bin/perl
use strict;
my $sum = 0;
while(<DATA>) {
my @vals = split(/ /);
my $diff = $vals[1] - $vals[0];
$sum += $diff;
}
print $sum,"\n";
__DATA__
1 3
3 5
5 7
This will print out 6 --- (3 - 1) + (5 - 3) + (7 - 5)
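FYI, if you combine the autosplit (-a), loop (-n) and command-line program (-e) switches (see perlrun), you can shorten this to a one-liner, much like awk. In a Unix shell, use single quotes so that the shell does not expand $sum and $F itself (on Windows cmd.exe, use double quotes instead):
perl -ane '$sum += $F[1] - $F[0]; END { print $sum }' filename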
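The autosplit switch splits on whitespace in the same way as split ' ', which tolerates leading blanks and runs of spaces, whereas split(/ /) does not. A minimal sketch of the running-sum loop written that way follows; the helper name sum_differences is an assumption for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: sum of (second column - first column) over all lines.
sub sum_differences {
    my @lines = @_;
    my $sum = 0;
    for my $line (@lines) {
        # split ' ' ignores leading whitespace and treats runs of blanks as one separator
        my ($first, $second) = split ' ', $line;
        next unless defined $second;    # skip blank or one-column lines
        $sum += $second - $first;
    }
    return $sum;
}

# Same values as the example above, with deliberately irregular spacing.
print sum_differences("1 3\n", "3   5\n", "  5 7\n"), "\n";
```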

Delete the first character of array elements in Perl

I would like to remove the first character from the elements of an array in a Perl script.
I have this line of script:
@dash = split /\s+/, $dash;
The variable "dash" is read from a particular row of my file: Example
21 A10 A11 A12 A13 ..
Then I tried to push these values onto my array called "flowers":
for $i (1..$#dash) {
push(@flowers, $line[$i]);
}
This seems to work for what I need in my subsequent lines of script, but I have found that $dash contains an unwanted character in front of each value:
A10 A11 A12 A13 ..
instead of
10 11 12 13 .....
but I wanted @flowers to contain:
10 11 12 13 ....
How can I delete the first character before I push it onto my array (@flowers)?
chop(@flowers);
could have worked, but it only chops off the last character. When I tried to use
substr($dash, 0, 2)
it does produce 10, but the rest of the values (A11 A12 A13) are no longer in my @flowers.
Any help is appreciated.
This will operate on each element of the @dash array:
@dash = split /\s+/, $dash;
shift @dash;
@dash = map { substr($_, 1) } @dash;
Your substr($dash, 0, 2) was operating on the line as one string, not each element of it.
And, unless you need the index for some other operation:
push @flowers, @dash;
That will push all elements of @dash onto @flowers, which looks like what you're doing.
Why not just change the regex in the split?
split /\s+\D?/, $dash;
Adding them to @flowers this way if you want:
push( @flowers, split(/\s+\D?/, $dash) );
You need some kind of loop, since you want to do something to each element of @dash other than the first. map is convenient here.
my @flowers = map substr($dash[$_], 1), 1..$#dash;
which is the short way of writing
my @flowers;
for (1..$#dash) {
push @flowers, substr($dash[$_], 1);
}
I suggest that you just pull out all the digit sequences from $dash, like this:
my $dash = '21 A10 A11 A12 A13 .. ';
my @flowers = $dash =~ /\d+/g;
shift @flowers;
print "@flowers";
output
10 11 12 13
This is a possible solution:
use strict;
use warnings;
my $dash = "21 A10 A11 A12 A13"; #test data
my @dash = split /\s+/, $dash; #split into @dash array
shift @dash; #delete first array value
$_ = substr($_,1) for @dash; #for each item in array, remove the first character
print "@dash\n"; #prints: 10 11 12 13
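The substr approaches above remove the first character unconditionally. If some fields might arrive without a letter prefix, one possible variation is to strip a leading character only when it is a non-digit. This is a sketch under that assumption; the helper name strip_prefixes is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first field of a row, then strip a single
# leading non-digit character from each remaining field.
sub strip_prefixes {
    my ($row) = @_;
    my @fields = split ' ', $row;
    shift @fields;            # discard the leading count/identifier field
    s/^\D// for @fields;      # $_ aliases each element, so this edits @fields in place
    return @fields;
}

# The third field deliberately has no letter prefix and is left untouched.
my @flowers = strip_prefixes("21 A10 A11 12 A13");
print "@flowers\n";
```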

Perl Hash Script

My question in Perl is like this:
Read a series of employee numbers and daily working hours from standard input, one set per line. The employee number and the hours worked should be separated by a space. Use hashes and calculate the total number of hours worked and the average number of hours per work period. Print out a report by sorted employee number, the number of work periods, the total hours worked, and the average number of hours per work period. Assume that some of the employees are on a part-time schedule and do not work the same number of days or hours as regular employees.
My script is:
#!/usr/bin/perl
use strict;
use warnings;
my @series = qw(41234 9 67845 8 32543 10 84395 7 57543 9 23545 11);
my $workper = 3;
my %empwork;
while (my $series = shift @series) {
my $nums = shift @series;
$empwork{$series} += $nums;
}
my $tot;
foreach (sort keys %empwork) {
$tot += $empwork{$_};
}
my $avg = $tot/$workper;
print "Sorted Employee Numbers:\n";
foreach my $empnum(sort keys %empwork) {
print "$empnum\n";
}
print "The number of work periods is $workper\n";
print "Total number of hours is $tot\n";
print "Average number of hours per work period is $avg\n";
My Output is:
Sorted Employee Numbers:
23545
32543
41234
57543
67845
84395
The number of work periods is 3
Total number of hours is 54
Average number of hours per work period is 18
Can anyone please tell me whether I have done anything wrong in the script? If so, please help. Thanks in advance.
If I loop through %empwork only once, like this:
foreach my $empnum(sort keys %empwork) {
$tot += $empwork{$_};
print "$empnum\n";
}
Then I will get the output as:
Sorted Employee Numbers:
23545
32543
41234
57543
67845
84395
The number of work periods is 3
Total number of hours is 0
Average number of hours per work period is 0
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
I tried the program as below, but it's not working.
#!/usr/bin/perl
use strict;
use warnings;
my @series = qw(41234 9 67845 8 32543 10 84395 7 57543 9 23545 11 23545 1 23545 2 23545 6);
my $total_periods = 0;
my $total_hours = 0;
my %empwork;
while (my $series = shift @series)
{
my $nums = shift @series;
$empwork{$series} += $nums;
}
print "Sorted Employee Numbers:\n";
foreach my $empnum(sort keys %empwork)
{
my $periods=0;
$periods++;
my $hours = 0;
$hours += $empwork{$empnum};
my $avg = $hours/$periods;
$total_periods += $periods;
$total_hours += $hours;
print "$empnum\n$periods periods\n$hours hours\n$avg average\n\n";
}
my $grand_avg = $total_hours/$total_periods;
print "The number of work periods is $total_periods\n";
print "Total number of hours is $total_hours\n";
print "Average number of hours per work period is $grand_avg\n";
Where am I going wrong?
This snippet of code has a problem:
foreach my $empnum(sort keys %empwork) {
$tot += $empwork{$_};
print "$empnum\n";
}
You are using $empnum as the loop iterator variable, but then referencing $empwork{$_}. Because $_ is not set inside that loop, the lookup uses an undefined key, which is why you get the warnings and a total of 0. Simply replace it with $empwork{$empnum} and you will be fine.
The rest of the code that you show above works fine. However, a few suggestions:
Will there be duplicate employee numbers in your source array? The sample data doesn't show any. If there are no duplicates, you can simply do this to populate the hash, and do away with your while loop:
%empwork = @series;
Also, in this portion:
foreach (sort keys %empwork) {
$tot += $empwork{$_};
}
There is no reason to sort the keys when you are not doing something order dependent. It just makes the interpreter do unnecessary work. In this case, you don't even need the keys; you are only interested in adding up the values. So, you could do this, which is more efficient:
foreach (values %empwork)
{
$tot += $_;
}
(Of course, you could combine the two loops instead).
Update: here is the complete corrected code that I believe will meet all of your requirements.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/sum/;
my @series = qw(41234 9 67845 8 32543 10 84395 7 57543 9 23545 11 23545 1 23545 2 23545 7);
my $total_periods = 0;
my $total_hours = 0;
my %empwork;
while (my $series = shift @series) {
#For each employee, save a list of the hours worked in each period
push @{$empwork{$series}}, shift @series;
}
print "Sorted Employee Numbers:\n";
foreach my $empnum(sort keys %empwork) {
my $periods = @{ $empwork{$empnum} };
my $hours = sum(@{ $empwork{$empnum} });
my $avg = $hours/$periods;
$total_periods += $periods;
$total_hours += $hours;
print "$empnum\n$periods periods\n$hours hours\n$avg average\n\n";
}
my $grand_avg = $total_hours/$total_periods;
print "The number of work periods is $total_periods\n";
print "Total number of hours is $total_hours\n";
print "Average number of hours per work period is $grand_avg\n";
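The assignment asks for the data to be read from standard input, while the scripts above use a hard-coded list. A minimal sketch of the reading step is shown below, assuming one "employee hours" pair per line; the helper name read_hours and the sample input are illustrative assumptions.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: read "employee hours" lines from a filehandle and
# return a hash reference mapping each employee number to a list of hours.
sub read_hours {
    my ($fh) = @_;
    my %hours_for;
    while (my $line = <$fh>) {
        my ($emp, $hours) = split ' ', $line;
        next unless defined $hours;    # skip blank or malformed lines
        push @{ $hours_for{$emp} }, $hours;
    }
    return \%hours_for;
}

# Demo with an in-memory filehandle; pass \*STDIN to read real standard input.
my $input = "41234 9\n23545 11\n23545 1\n\n67845 8\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $hours_for = read_hours($fh);
close $fh;

for my $emp (sort { $a <=> $b } keys %$hours_for) {
    my @periods = @{ $hours_for->{$emp} };
    my $total   = sum(@periods);
    printf "%s: %d periods, %d hours, %.2f average\n",
        $emp, scalar @periods, $total, $total / @periods;
}
```

Sorting with { $a <=> $b } compares employee numbers numerically, which only differs from the default string sort when the numbers have different lengths.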

How do I split an output file into multiple blocks based on a pattern

I have an output file with the following content. I want to split it into blocks based on "pattern" and store in a array.
Sample output:
100 pattern
line 1
line 2
line 3
101 pattern
line 4
102 pattern
line 5
line 6
...
Content between nth and (n+1)th occurrence of "pattern" is a block:
Block 1:
100 pattern
line 1
line 2
line 3
Block 2:
101 pattern
line 4
Block 3:
102 pattern
line 5
line 6
Basically I am searching for a pattern across lines and storing the content in between into an array.
Please let me know how do I achieve in perl
Assuming that your pattern lines are full lines containing the word pattern (and normal lines do not) and you want array elements to be entire blocks:
my @array;
my $i = 0;
for my $line ( <DATA> ) {
$i++ if ( $line =~ /pattern/ );
$array[$i] .= $line;
}
shift @array unless defined $array[0]; # if the first line matched the pattern
I know you have accepted an answer, but I wanted to show how you might do it by reading in the data and using a regular expression to split it.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $input = do { local $/; <DATA> };
my @input = split /^(?=\d+ pattern)/m, $input; # anchor at line start so the split cannot fall between the digits of a number
foreach (0 .. $#input) {
say "Record $_ is: $input[$_]";
}
__DATA__
100 pattern
line 1
line 2
line 3
101 pattern
line 4
102 pattern
line 5
line 6
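For comparison, the line-by-line approach can be wrapped in a small function that starts a new block at each header line. This is only a sketch: the name split_blocks and the header regex are assumptions, and any lines that precede the first header are collected into a leading block of their own.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines into blocks, starting a new block at
# every line that matches $start_re.
sub split_blocks {
    my ($start_re, @lines) = @_;
    my @blocks;
    for my $line (@lines) {
        # Open a new block on a header line, or for the very first line.
        push @blocks, '' if $line =~ $start_re || !@blocks;
        $blocks[-1] .= $line;
    }
    return @blocks;
}

my @lines  = ("100 pattern\n", "line 1\n", "line 2\n", "101 pattern\n", "line 4\n");
my @blocks = split_blocks(qr/^\d+ pattern$/, @lines);

print scalar(@blocks), " blocks\n";
print "Block ", $_ + 1, ":\n$blocks[$_]" for 0 .. $#blocks;
```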

How do I calculate the difference between each element in two arrays?

I have a text file with numbers which I have grouped as follows, separated by a blank line:
42.034 41.630 40.158 26.823 26.366 25.289 23.949
34.712 35.133 35.185 35.577 28.463 28.412 30.831
33.490 33.839 32.059 32.072 33.425 33.349 34.709
12.596 13.332 12.810 13.329 13.329 13.569 11.418
Note: the groups are always of equal length and can be arranged in more than one line long, if the group is large, say 500 numbers long.
I was thinking of putting the groups in arrays and iterate along the length of the file.
My first question is: how should I subtract the first element of array 2 from array 1, array 3 from array 2, similarly for the second element and so on till the end of the group?
i.e.:
34.712-42.034,35.133-41.630,35.185-40.158 ...till the end of each group
33.490-34.712,33.839-35.133 ..................
and then save the differences of the first element in one group (second question: how ?) till the end
i.e.:
34.712-42.034 ; 33.490-34.712 ; and so on in one group
35.133-41.630 ; 33.839-35.133 ; ........
I am a beginner so any suggestions would be helpful.
Assuming you have your file opened, the following is a quick sketch
use List::MoreUtils qw<pairwise>;
...
my @list1 = split ' ', <$file_handle>;
my @list2 = split ' ', <$file_handle>;
my @diff = pairwise { $a - $b } @list1, @list2;
pairwise is the simplest way.
Otherwise there is the old standby:
# construct a list using each index in @list1 ( 0..$#list1 )
# consisting of the difference at each slot.
my @diff = map { $list1[$_] - $list2[$_] } 0..$#list1;
Here's the rest of the infrastructure to make Axeman's code work:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw<pairwise>;
my (@prev_line, @this_line, @diff);
while (<>) {
next if /^\s+$/; # skip leading blank lines, if any
@prev_line = split;
last;
}
# get the rest of the lines, calculating and printing the difference,
# then saving this line's values in the previous line's values for the next
# set of differences
while (<>) {
next if /^\s+$/; # skip embedded blank lines
@this_line = split;
@diff = pairwise { $a - $b } @this_line, @prev_line;
print join(" ; ", @diff), "\n\n";
@prev_line = @this_line;
}
So given the input:
1 1 1
3 2 1
2 2 2
You'll get:
2 ; 1 ; 0
-1 ; 0 ; 1
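The code above treats every non-blank line as one group, but the question notes that a large group may span several lines. One possible way to handle that, sketched here as an assumption rather than as part of the original answers, is Perl's paragraph mode ($/ = ''), which returns each blank-line-separated group in a single read. The helper name read_groups is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated groups of numbers, where a
# group may be spread over several lines. Returns a list of array references.
sub read_groups {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode: each read returns one whole group
    my @groups;
    while (my $paragraph = <$fh>) {
        push @groups, [ split ' ', $paragraph ];
    }
    return @groups;
}

# Same values as the example input above, with two groups wrapped over two lines.
my $text = "1 1\n1\n\n3 2 1\n\n\n2 2\n2\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @groups = read_groups($fh);
close $fh;

# Difference between each group and the one before it, element by element.
for my $i (1 .. $#groups) {
    my @diff = map { $groups[$i][$_] - $groups[$i - 1][$_] } 0 .. $#{ $groups[$i] };
    print join(' ; ', @diff), "\n";
}
```

This sketch uses a plain map over the indices instead of pairwise, so it needs no modules outside core Perl.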