Perl Hash Script - perl

My question in Perl is like this:
Read a series of employee numbers and daily working hours from standard input, one set per line. The employee number and the hours worked should be separated by a space. Use hashes and calculate the total number of hours worked and the average number of hours per work period. Print out a report by sorted employee number, the number of work periods, the total hours worked, and the average number of hours per work period. Assume that some of the employees are on a part-time schedule and do not work the same number of days or hours as regular employees.
My script is:
#!/usr/bin/perl
use strict;
use warnings;
my @series = qw(41234 9 67845 8 32543 10 84395 7 57543 9 23545 11);
my $workper = 3;
my %empwork;
while (my $series = shift @series) {
my $nums = shift @series;
$empwork{$series} += $nums;
}
my $tot;
foreach (sort keys %empwork) {
$tot += $empwork{$_};
}
my $avg = $tot/$workper;
print "Sorted Employee Numbers:\n";
foreach my $empnum(sort keys %empwork) {
print "$empnum\n";
}
print "The number of work periods is $workper\n";
print "Total number of hours is $tot\n";
print "Average number of hours per work period is $avg\n";
My Output is:
Sorted Employee Numbers:
23545
32543
41234
57543
67845
84395
The number of work periods is 3
Total number of hours is 54
Average number of hours per work period is 18
Can anyone please tell me whether I have done anything wrong in the script? If so, please help. Thanks in advance.
If I loop through %empwork only once, like this:
foreach my $empnum(sort keys %empwork) {
$tot += $empwork{$_};
print "$empnum\n";
}
Then I will get the output as:
Sorted Employee Numbers:
23545
32543
41234
57543
67845
84395
The number of work periods is 3
Total number of hours is 0
Average number of hours per work period is 0
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
Use of uninitialized value in hash element at /tmp/135043087931085.pl line 16.
Use of uninitialized value in addition (+) at /tmp/135043087931085.pl line 16.
I tried the program below, but it's not working.
#!/usr/bin/perl
use strict;
use warnings;
my @series = qw(41234 9 67845 8 32543 10 84395 7 57543 9 23545 11 23545 1 23545 2 23545 6);
my $total_periods = 0;
my $total_hours = 0;
my %empwork;
while (my $series = shift @series)
{
my $nums = shift @series;
$empwork{$series} += $nums;
}
print "Sorted Employee Numbers:\n";
foreach my $empnum(sort keys %empwork)
{
my $periods=0;
$periods++;
my $hours = 0;
$hours += $empwork{$empnum};
my $avg = $hours/$periods;
$total_periods += $periods;
$total_hours += $hours;
print "$empnum\n$periods periods\n$hours hours\n$avg average\n\n";
}
my $grand_avg = $total_hours/$total_periods;
print "The number of work periods is $total_periods\n";
print "Total number of hours is $total_hours\n";
print "Average number of hours per work period is $grand_avg\n";
Where am I going wrong?

This snippet of code has a problem:
foreach my $empnum(sort keys %empwork) {
$tot += $empwork{$_};
print "$empnum\n";
}
You are using $empnum as the loop variable, but then referencing $empwork{$_}. Because the loop names its own variable, $_ is never set, which is why you get the "uninitialized value" warnings and a total of 0. Simply replace that with $empwork{$empnum} and you will be fine.
The rest of the code that you show above works fine. However, a few suggestions:
Will there be duplicate employee numbers in your source array? The sample data doesn't show any. If there are no duplicates, you can simply do this to populate the hash, and do away with your while loop:
%empwork = @series;
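To see why the question about duplicates matters, here is a small sketch. The data is made up for illustration, with one employee number repeated: list assignment keeps only the last value for a repeated key, while a loop with += adds the values up.

```perl
use strict;
use warnings;

# Hypothetical data: employee 23545 appears twice.
my @series = qw(41234 9 23545 11 23545 1);

# List assignment: a repeated key keeps only its last value.
my %assigned = @series;

# Pairwise loop with +=: values for a repeated key are summed.
my %summed;
my @copy = @series;
while (@copy) {
    my ($emp, $hours) = splice @copy, 0, 2;
    $summed{$emp} += $hours;
}

print "assigned: $assigned{23545}, summed: $summed{23545}\n";
# prints "assigned: 1, summed: 12"
```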
Also, in this portion:
foreach (sort keys %empwork) {
$tot += $empwork{$_};
}
There is no reason to sort the keys when you are not doing something order dependent. It just makes the interpreter do unnecessary work. In this case, you don't even need the keys; you are only interested in adding up the values. So, you could do this, which is more efficient:
foreach (values %empwork)
{
$tot += $_;
}
(Of course, you could combine the two loops instead).
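A minimal sketch of the combined loop, using a subset of the question's data as a stand-in for the real hash. Here the sort is needed because the employee numbers are printed in order, and the total is accumulated in the same pass:

```perl
use strict;
use warnings;

# Sample subset of the question's data, for illustration only.
my %empwork = (41234 => 9, 67845 => 8, 32543 => 10);
my $tot = 0;

print "Sorted Employee Numbers:\n";
foreach my $empnum (sort keys %empwork) {
    $tot += $empwork{$empnum};   # accumulate the total while printing
    print "$empnum\n";
}
print "Total number of hours is $tot\n";
```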
Update: here is the complete corrected code that I believe will meet all of your requirements.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/sum/;
my @series = qw(41234 9 67845 8 32543 10 84395 7 57543 9 23545 11 23545 1 23545 2 23545 7);
my $total_periods = 0;
my $total_hours = 0;
my %empwork;
while (my $series = shift @series) {
# For each employee, save the hours worked in each period
push @{$empwork{$series}}, shift @series;
}
print "Sorted Employee Numbers:\n";
foreach my $empnum(sort keys %empwork) {
my $periods = @{ $empwork{$empnum} };
my $hours = sum(@{ $empwork{$empnum} });
my $avg = $hours/$periods;
$total_periods += $periods;
$total_hours += $hours;
print "$empnum\n$periods periods\n$hours hours\n$avg average\n\n";
}
my $grand_avg = $total_hours/$total_periods;
print "The number of work periods is $total_periods\n";
print "Total number of hours is $total_hours\n";
print "Average number of hours per work period is $grand_avg\n";
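The assignment asks for input from standard input, whereas the code above hard-codes the data. Here is a minimal sketch of reading it from a filehandle, assuming one "employee hours" pair per line. The sub name read_hours and the sample string are made up for this illustration; passing \*STDIN instead of the in-memory handle would read standard input.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Read "employee_number hours" pairs, one per line, from a filehandle.
# Returns a hash of employee number => array ref of hours per period.
sub read_hours {
    my ($fh) = @_;
    my %empwork;
    while (my $line = <$fh>) {
        my ($emp, $hours) = split ' ', $line;
        next unless defined $hours;          # skip blank or malformed lines
        push @{ $empwork{$emp} }, $hours;
    }
    return %empwork;
}

# Demo input held in a string; pass \*STDIN to read standard input instead.
my $sample = "41234 9\n23545 11\n23545 1\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my %empwork = read_hours($fh);
close $fh;

foreach my $emp (sort keys %empwork) {
    my @h = @{ $empwork{$emp} };
    printf "%s: %d periods, %d hours, %.2f average\n",
        $emp, scalar @h, sum(@h), sum(@h) / @h;
}
```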

Related

How to count the odd number of occurrences in Perl?

I have a program in Perl that is supposed to count the number of times an element appears in an array, and prints out the value of the element if the number of times it appears is odd.
Here is my code.
#!/usr/bin/perl
use strict;
use warnings;
sub FindOddCount($)
{
my @arraynumber = @_;
my $Even = 0;
my $i = 0;
my $j = 0;
my $array_length = scalar(@_);
for ($i = 0; $i <= $array_length; $i++)
{
my $IntCount = 0;
for ($j = 0; $j <= $array_length; $j++)
{
if ($arraynumber[$i] == $arraynumber[$j])
{
$IntCount++;
print($j);
}
}
$Even = $IntCount % 2;
if ($Even != 0)
{
return $arraynumber[$i];
}
}
if ($Even == 0)
{
return "none";
}
}
my @array1 = (1,1,2,2,3,3,4,4,5,5,6,7,7,7,7);
my @array2 = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my @array3 = (6,6,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10.10);
my @array4 = (10,10,7,7,2,2,3,3,4,4,5,5,7,7,7,7,10,10,6);
my @array5 = (6,6);
my @array6 = (1);
my $return_value1 = FindOddCount(@array1);
my $return_value2 = FindOddCount(@array2);
my $return_value3 = FindOddCount(@array3);
my $return_value4 = FindOddCount(@array4);
my $return_value5 = FindOddCount(@array5);
my $return_value6 = FindOddCount(@array6);
print "The Odd value for the first array is $return_value1\n";
print "The Odd value for the 2nd array is $return_value2\n ";
print "The Odd value for the 3rd array is $return_value3\n ";
print "The Odd value for the 4th array is $return_value4\n ";
print "The Odd value for the 5th array is $return_value5\n ";
print "The Odd value for the sixth array is $return_value6\n ";
Here are my results.
The Odd value for the first array is 15
The Odd value for the first array is 21
The Odd value for the first array is 21
The Odd value for the first array is 19
The Odd value for the first array is 2
The Odd value for the first array is 1
In case it isn't obvious, it is printing the number of elements in each array instead of returning the element that occurs an odd number of times. In addition, I get this warning:
Use of uninitialized value in numeric eq (==) at OddCount.pl line 17.
Line 17 is where the 1st array and the 2nd array are compared. Yet the values are clearly instantiated and they work when I print them out. What is the issue?
Build a frequency hash for the array, then go through it to see which elements have odd counts:
use warnings;
use strict;
use feature 'say';
my @ary = qw(7 o1 7 o2 o1 z z o1); # o1,o2 appear odd number of times
my %freq;
++$freq{$_} for @ary;
foreach my $key (sort keys %freq) {
say "$key => $freq{$key}" if $freq{$key} & 1;
}
This is far simpler than the code in the question, although that code is easily fixed, too; see below.
Some notes
++$freq{$_} increments the value for the key $_ in the hash %freq by 1; if the key doesn't exist yet, it is added to the hash (by autovivification) and its value is set to 1. So once the array has been iterated over with this code, the hash %freq has the array elements as its keys and their counts as its values.
The test $n & 1 uses the bitwise AND: it is true if $n has its lowest bit set, that is, if $n is odd.
The ++$freq{$_} for @ary; uses a statement modifier, which runs the statement for each element of @ary, with the current element aliased to the variable $_.
This prints
o1 => 3
o2 => 1
The odd-frequency elements (if any) are printed sorted alphabetically, which is an arbitrary choice. Please change this to whatever order is needed, or let me know.
Comments on the code in the question, which is correct with two simple fixes.
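It uses a prototype in a way that is wrong for the purpose, in sub FindOddCount($). The ($) prototype imposes scalar context on the argument, so FindOddCount(@array1) receives the number of elements in the array rather than the elements themselves, which is why the output shows 15, 21, and so on. I suspect the prototype isn't needed, so let's not dwell on it: just drop it and write sub FindOddCount.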
The index in loops includes the length of the array (<=) so in the last iteration they attempt to index into the array past its last element. Off-by-one error. That can be fixed by changing the condition into < $array_length (instead of <=), but read on
There is no reason to use C-style loops, not even to iterate over the index. (Needed here since the position in the array is used.) Scripting languages provide for cleaner ways†
foreach my $i1 (0 .. $#arraynumber) {
my $IntCount = 0;
foreach my $i2 (0 .. $#arraynumber) {
if ( $arraynumber[$i1] == $arraynumber[$i2] ) {
...
That 0..N is the range operator, which creates the list of numbers within that range. The syntax $#array_name is the index of the last element in the array @array_name. Exactly what's needed. So there is no need for the array length.
Multiple (six) arrays, used to check the code, can be manipulated in far better and easier ways by using references; see the tutorial for complex data structures perldsc, and in particular the page perllol, for array-of-arrays
In short: when you remove the prototype and fix off-by-one error your code seems to be correct.
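As a sketch of what the fixes described above could look like when put together: the sub below drops the prototype, iterates over 0 .. $#arraynumber, and also leaves out the debugging print($j) from the question. The @tests array of arrays is a hypothetical stand-in for the six separate test arrays, in the style of perllol.

```perl
use strict;
use warnings;

# The question's sub with the two fixes applied: no prototype,
# and loops that stop at the last valid index.
sub FindOddCount {
    my @arraynumber = @_;
    foreach my $i1 (0 .. $#arraynumber) {
        my $IntCount = 0;
        foreach my $i2 (0 .. $#arraynumber) {
            $IntCount++ if $arraynumber[$i1] == $arraynumber[$i2];
        }
        # Return the first element whose count is odd.
        return $arraynumber[$i1] if $IntCount % 2;
    }
    return "none";
}

# Test arrays kept together in one array of arrays.
my @tests = (
    [1,1,2,2,3,3,4,4,5,5,6,7,7,7,7],
    [6,6],
    [1],
);
print "Odd value: ", FindOddCount(@$_), "\n" for @tests;
```

Like the original, this returns only the first element with an odd count; the frequency-hash approach reports all of them.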
† And not only scripting ones -- for example, C++11 introduced the range-based for loop
for (auto var: container) ... // really const auto&, or auto&, or auto&&
and the link (a standard reference) says
Used as a more readable equivalent to the traditional for loop [...]
Count the number of occurrences in a for loop using a hash. Then print the desired elements using grep, like so:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw( say );
my @array = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my %cnt;
# Count each element of the array:
$cnt{$_}++ for @array;
# Print only the array elements that occurred an odd number of times,
# separated by ", ":
say join q{, }, grep { $cnt{$_} % 2 } @array;
# 6, 6, 6
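Because grep runs over the original array, an element with an odd count is printed once per occurrence. If each such element should appear only once, one option (a variation on the code above, not part of the original answer) is to grep over the hash keys instead:

```perl
use strict;
use warnings;

my @array = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my %cnt;
$cnt{$_}++ for @array;

# Take each distinct value once, sorted numerically, and keep odd counts only.
my @odd = grep { $cnt{$_} % 2 } sort { $a <=> $b } keys %cnt;
print join(', ', @odd), "\n";
# prints "6"
```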

need to use pop function twice to remove last element from the array (perl)

I've just started to learn Perl and joined Euler project to practice coding. This is the first exercise I did. The task was: "If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000." My solution:
use strict;
use warnings;
my @numbers = (1..1000);
my $counter = 0;
my @all_array = ();
my $total = 0;
foreach $counter (@numbers) {
if (($numbers[$counter] % 3 == 0) or ($numbers[$counter] % 5 == 0)) {
push (@all_array, $numbers[$counter]);
}
}
pop (@all_array); #after that the last digit is still in place
pop (@all_array); # only now the number 1000 is removed
my $tot = eval join '+', @all_array; #returns correct value
print $tot;
The final element of the array is 1000. It seems as if it is followed by a space, so to remove the number and get the correct result I have to use the pop function twice. Using local $"='' changes nothing. Besides, I'm getting a message: Use of uninitialized value within @numbers in modulus (%) at C:\Users\Greg\Documents\perl\unt.pl line 10.
What am I doing wrong and how to fix it?
Let's go through your code:
@numbers is an array with the numbers from 1 to 1000
Why do you include 1000 in the list when the exercise says "less than N"?
the for loop
assigns each of the numbers to $counter, i.e. 1, 2, ...
you use $counter as index into @numbers
why do you do that when $counter is already the number you are looking for?
Perl arrays start at index 0, so you have an off-by-one error
you never check 1 because your first number will be $numbers[1] == 2 (OK, doesn't cause an incorrect result for the task at hand...)
you access one element behind the array, i.e. $numbers[1000] == undef
calculating with undef will cause a warning
undef % 3 == 0 is true, hence...
the first pop() will remove undef (from $counter == 1000)
the second pop() will remove 1000 (from $counter == 999)
then you use eval on a string 3 + 5 + 6 + ... a very inefficient way to do a sum :-)
Wouldn't it be simpler to calculate the sum while iterating over the numbers from 1 to N-1? For example:
#!/usr/bin/perl
use strict;
use warnings;
foreach my $arg (@ARGV) {
my $sum = 0;
foreach my $number (1..$arg - 1) {
$sum += $number
if ($number % 3 == 0) || ($number % 5 == 0);
}
print "${arg}: ${sum}\n";
}
exit 0;
Test run:
$ perl dummy.pl 10 100 1000 10000
10: 23
100: 2318
1000: 233168
10000: 23331668
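For comparison, the same sums can be computed without a loop. This is a different technique from the answer above: it uses the arithmetic-series formula together with inclusion-exclusion, because multiples of 15 are counted under both 3 and 5. The sub names are made up for this sketch.

```perl
use strict;
use warnings;

# Sum of all positive multiples of $k strictly below $limit,
# using the arithmetic-series formula k * m * (m + 1) / 2.
sub sum_multiples_below {
    my ($k, $limit) = @_;
    my $m = int(($limit - 1) / $k);   # how many multiples of $k are below $limit
    return $k * $m * ($m + 1) / 2;
}

# Multiples of 15 are counted under both 3 and 5, so subtract them once.
sub euler1 {
    my ($limit) = @_;
    return sum_multiples_below(3, $limit)
         + sum_multiples_below(5, $limit)
         - sum_multiples_below(15, $limit);
}

print "$_: ", euler1($_), "\n" for 10, 1000;
# prints "10: 23" and "1000: 233168"
```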

Perfect matching is not working

I have a problem with exact matching. I want to get the sum of the positive integers and the sum of the negative integers in a file. I also want to count the distinct dates in the file.
My File:
Hello -12, 3.4 and 32. Where did you
go on 01/01/2013 ? On 01/01/2013, we
went home. -4 plus 5 makes 1.
03/02/2013
Results I should be getting:
-16 //the sum of negative integers.
38 //the sum of positive integers.
2 //total number of unique dates :)
My code is:
$sum=0;
$summ=0;
while (<>) {
foreach ($_=~ /-\d+/g)
{
$sum+=$_;
}
foreach ($poz=~ /^\d+?$/g) {
$summ+=$poz;
}
foreach (/\d{2}\/\d{2}\/\d{4}/) {
$count++;
}
}
print "$sum\n";
print "$summ\n";
print "$count\n";
The output I am getting is:
-16
0
2
I can not get the value of the sum of positive numbers. Could you please help me?
First of all, always use use strict; use warnings;. It would have found your first error: The use of $poz without ever giving it a value. Twice!
A positive integer is a sequence
Not preceded by -.
Not preceded by a digit.
Not preceded by ..
Not preceded by /.
Consists of digits
Not followed by . plus digits. (Well, you might consider 4.0 an integer, but I doubt it.)
Not followed by a digit.
Not followed by /.
(?<![\-\d./])\d+(?![\d/])(?!\.\d)
A negative integer is a sequence
Consists of - followed by digits
Not followed by . plus digits. (Well, you might consider 4.0 an integer, but I doubt it.)
Not followed by a digit.
-\d+(?!\d)(?!\.\d)
So,
use strict;
use warnings;
my $sum_p = 0;
my $sum_n = 0;
my %dates;   # each distinct date is stored once as a hash key
while (<>) {
$sum_p += $_ for /(?<![\-\d.\/])\d+(?![\d\/])(?!\.\d)/g;
$sum_n += $_ for /-\d+(?!\d)(?!\.\d)/g;
$dates{$_}++ for /\d{2}\/\d{2}\/\d{4}/g;
}
print "$sum_n\n";
print "$sum_p\n";
print scalar(keys %dates), "\n";   # number of unique dates
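To make the two integer patterns easier to read and check, here is a sketch that writes them with the /x modifier, one condition per line, and runs them over the sample text from the question. The variable names and the here-document are choices made for this illustration.

```perl
use strict;
use warnings;

# The same patterns spelled out with /x so each condition is visible.
my $positive = qr{
    (?<! [-\d./] )   # not preceded by a minus sign, digit, dot or slash
    \d+              # the digits themselves
    (?! [\d/] )      # not followed by a digit or slash
    (?! \.\d )       # not followed by a decimal part
}x;
my $negative = qr{
    - \d+            # a minus sign followed by digits
    (?! \d )         # not followed by another digit
    (?! \.\d )       # not followed by a decimal part
}x;

my $text = <<'END';
Hello -12, 3.4 and 32. Where did you
go on 01/01/2013 ? On 01/01/2013, we
went home. -4 plus 5 makes 1.
03/02/2013
END

my ($sum_p, $sum_n) = (0, 0);
$sum_p += $_ for $text =~ /$positive/g;
$sum_n += $_ for $text =~ /$negative/g;
print "$sum_n\n$sum_p\n";
# prints -16 and then 38
```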

fast way to compare rows in a dataset

I asked this question in R and got a lot of answers, but all of them either crash my computer (4 GB of RAM) after running for a few hours or take a very long time to finish.
faster way to compare rows in a data frame
Some people said that it's not a job to be done in R. As I don't know C and I'm a little bit fluent in Perl, I'll ask here.
I'd like to know if there is a fast way to compare each row of a large dataset with the other rows, identifying the rows with a specific degree of homology. Let's say for the simple example below that I want homology >= 3.
data:
sample_1,10,11,10,13
sample_2,10,11,10,14
sample_3,10,10,8,12
sample_4,10,11,10,13
sample_5,13,13,10,13
The output should be something like:
output
sample duplicate matches
1 sample_1 sample_2 3
2 sample_1 sample_4 4
3 sample_2 sample_4 3
A match is counted when both lines have the same number in the same position:
perl -F',' -lane'
$k = shift @F;
for my $kk (@o) {
$m = grep { $h{$kk}[$_] == $F[$_] } 0 .. $#F;
$m >=3 or next;
print ++$i, " $kk $k $m";
}
push @o, $k;
$h{$k} = [ @F ];
' file
output,
1 sample_1 sample_2 3
2 sample_1 sample_4 4
3 sample_2 sample_4 3
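For readers who find the one-liner dense, the same pairwise comparison can be sketched as a strict-safe sub. The name find_matches and the list-of-strings interface are assumptions made for this illustration; the logic mirrors the one-liner, comparing each row with all earlier rows.

```perl
use strict;
use warnings;

# Compare every row with all earlier rows and report pairs that agree
# in at least $min positions. Rows are "name,v1,v2,..." strings.
sub find_matches {
    my ($min, @lines) = @_;
    my (@seen, @report);              # @seen holds [name, values] of earlier rows
    for my $line (@lines) {
        my ($name, @vals) = split /,/, $line;
        for my $prev (@seen) {
            my ($pname, $pvals) = @$prev;
            # grep in scalar context counts the positions that agree
            my $m = grep { $pvals->[$_] == $vals[$_] } 0 .. $#vals;
            push @report, "$pname $name $m" if $m >= $min;
        }
        push @seen, [ $name, \@vals ];
    }
    return @report;
}

my @rows = ('sample_1,10,11,10,13', 'sample_2,10,11,10,14', 'sample_3,10,10,8,12',
            'sample_4,10,11,10,13', 'sample_5,13,13,10,13');
print "$_\n" for find_matches(3, @rows);
```

The work still grows with the square of the number of rows, so this is a readability aid rather than a speed-up.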
This solution provides an alternative to direct comparison, which will be slow for large amounts of data.
Basic idea is to build an inverted index while reading the data.
This makes comparison faster if there are a lot of different values per column.
For each row, you look up the index and count the matches - this way you only consider the samples where this value actually occurs.
You might still have a memory problem because the index gets as large as your data.
To overcome that, you can shorten the sample name and use a persistent index (using DB_File, for example).
use strict;
use warnings;
use 5.010;
my @h;
my $LIMIT_HOMOLOGY = 3;
while(my $line = <>) {
chomp $line;   # keep the trailing newline out of the last column's value
my @arr = split /,/, $line;
my $sample_no = shift @arr;
my %sim;
foreach my $i (0..$#arr) {
my $value = $arr[$i];
our $l;
*l = \$h[$i]->{$value};
foreach my $s (@$l) {
$sim{$s}++;
}
push @$l, $sample_no;
}
foreach my $s (keys %sim) {
if ($sim{$s}>=$LIMIT_HOMOLOGY) {
say "$sample_no: $s. Matches: $sim{$s}";
}
}
}
For 25000 rows with 26 columns of random integer values between 1 and 100, the program took 69 seconds to finish on my MacBook Air.
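The inverted-index idea can also be sketched with plain references instead of the glob aliasing used above. This is a variation, not the answer's exact code: the sub name index_matches and the list-of-strings interface are assumptions for this illustration, and it is checked against the five sample rows from the question.

```perl
use strict;
use warnings;

# Inverted index: $index[$col]{$value} is an array ref of the sample
# names seen so far that have $value in column $col.
sub index_matches {
    my ($limit, @lines) = @_;
    my (@index, @report);
    for my $line (@lines) {
        my ($sample, @vals) = split /,/, $line;
        my %sim;                                  # earlier sample => shared positions
        for my $i (0 .. $#vals) {
            my $bucket = $index[$i]{ $vals[$i] } ||= [];
            $sim{$_}++ for @$bucket;              # every earlier sample with this value
            push @$bucket, $sample;
        }
        for my $s (sort keys %sim) {
            push @report, "$sample: $s. Matches: $sim{$s}" if $sim{$s} >= $limit;
        }
    }
    return @report;
}

my @rows = ('sample_1,10,11,10,13', 'sample_2,10,11,10,14', 'sample_3,10,10,8,12',
            'sample_4,10,11,10,13', 'sample_5,13,13,10,13');
print "$_\n" for index_matches(3, @rows);
```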

how do you select column from a text file using perl

I want to subtract the values in one column from those in another column and add up the differences. How do I do this in Perl? I am new to Perl, so I am unable to figure out how to go about it. Kindly help me.
The first thing is to separate the data into columns. In this case, the columns are separated by a space. split(/ /) will return a list of the columns.
To subtract one from the other, pull the values out of the list and subtract them.
Then add the difference to a running sum as you loop over the data.
#!/usr/bin/perl
use strict;
my $sum = 0;
while(<DATA>) {
my @vals = split(/ /);
my $diff = $vals[1] - $vals[0];
$sum += $diff;
}
print $sum,"\n";
__DATA__
1 3
3 5
5 7
This will print out 6 --- (3 - 1) + (5 - 3) + (7 - 5)
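FYI, if you combine the autosplit (-a), loop (-n) and command-line program (-e) switches (see perlrun), you can shorten this to a one-liner, much like awk. The single quotes are for a Unix shell, where double quotes would let the shell expand $sum and $F; on Windows cmd.exe, use double quotes instead:
perl -ane '$sum += $F[1] - $F[0]; END { print $sum }' filename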
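If the columns to compare are not always the first two, or the separator may be tabs or several spaces, the same idea can be wrapped in a small sub. This is a sketch under those assumptions: the name sum_of_differences and its parameters are made up here, and the columns are 0-based.

```perl
use strict;
use warnings;

# Sum of (column $minuend_col - column $subtrahend_col) over all lines.
sub sum_of_differences {
    my ($minuend_col, $subtrahend_col, @lines) = @_;
    my $sum = 0;
    for my $line (@lines) {
        my @vals = split ' ', $line;    # ' ' splits on any run of whitespace
        # Skip lines that do not have both columns.
        next unless @vals > $minuend_col && @vals > $subtrahend_col;
        $sum += $vals[$minuend_col] - $vals[$subtrahend_col];
    }
    return $sum;
}

print sum_of_differences(1, 0, "1 3", "3  5", "5\t7"), "\n";
# prints 6
```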