How do I calculate the difference between each element in two arrays? - perl

I have a text file with numbers which I have grouped as follows, seperated by blank line:
42.034 41.630 40.158 26.823 26.366 25.289 23.949
34.712 35.133 35.185 35.577 28.463 28.412 30.831
33.490 33.839 32.059 32.072 33.425 33.349 34.709
12.596 13.332 12.810 13.329 13.329 13.569 11.418
Note: the groups are always of equal length and can be arranged in more than one line long, if the group is large, say 500 numbers long.
I was thinking of putting the groups in arrays and iterate along the length of the file.
My first question is: how should I subtract the first element of array 2 from array 1, array 3 from array 2, similarly for the second element and so on till the end of the group?
i.e.:
34.712-42.034,35.133-41.630,35.185-40.158 ...till the end of each group
33.490-34.712,33.839-35.133 ..................
and then save the differences of the first element in one group (second question: how ?) till the end
i.e.:
34.712-42.034 ; 33.490-34.712 ; and so on in one group
35.133-41.630 ; 33.839-35.133 ; ........
I am a beginner so any suggestions would be helpful.

Assuming you have your file opened, the following is a quick sketch
use List::MoreUtils qw<pairwise>;
...
my #list1 = split ' ', <$file_handle>;
my #list2 = split ' ', <$file_handle>;
my #diff = pairwise { $a - $b } #list1, #list2;
pairwise is the simplest way.
Otherwise there is the old standby:
# construct a list using each index in #list1 ( 0..$#list1 )
# consisting of the difference at each slot.
my #diff = map { $list1[$_] - $list2[$_] } 0..$#list1;

Here's the rest of the infrastructure to make Axeman's code work:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw<pairwise>;
my (#prev_line, #this_line, #diff);
while (<>) {
next if /^\s+$/; # skip leading blank lines, if any
#prev_line = split;
last;
}
# get the rest of the lines, calculating and printing the difference,
# then saving this line's values in the previous line's values for the next
# set of differences
while (<>) {
next if /^\s+$/; # skip embedded blank lines
#this_line = split;
#diff = pairwise { $a - $b } #this_line, #prev_line;
print join(" ; ", #diff), "\n\n";
#prev_line = #this_line;
}
So given the input:
1 1 1
3 2 1
2 2 2
You'll get:
2 ; 1 ; 0
-1 ; 0 ; 1

Related

How to count the odd number of occurrences in Perl?

I have a program in Perl that is supposed to count the number of times an element appears in an array, and prints out the value of the element if the number of times it appears is odd.
Here is my code.
#!/usr/bin/perl
use strict;
use warnings;
sub FindOddCount($)
{
my #arraynumber = #_;
my $Even = 0;
my $i = 0;
my $j = 0;
my $array_length = scalar(#_);
for ($i = 0; $i <= $array_length; $i++)
{
my $IntCount = 0;
for ($j = 0; $j <= $array_length; $j++)
{
if ($arraynumber[$i] == $arraynumber[$j])
{
$IntCount++;
print($j);
}
}
$Even = $IntCount % 2;
if ($Even != 0)
{
return $arraynumber[$i];
}
}
if ($Even == 0)
{
return "none";
}
}
my #array1 = (1,1,2,2,3,3,4,4,5,5,6,7,7,7,7);
my #array2 = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my #array3 = (6,6,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10.10);
my #array4 = (10,10,7,7,2,2,3,3,4,4,5,5,7,7,7,7,10,10,6);
my #array5 = (6,6);
my #array6 = (1);
my $return_value1 = FindOddCount(#array1);
my $return_value2 = FindOddCount(#array2);
my $return_value3 = FindOddCount(#array3);
my $return_value4 = FindOddCount(#array4);
my $return_value5 = FindOddCount(#array5);
my $return_value6 = FindOddCount(#array6);
print "The Odd value for the first array is $return_value1\n";
print "The Odd value for the 2nd array is $return_value2\n ";
print "The Odd value for the 3rd array is $return_value3\n ";
print "The Odd value for the 4th array is $return_value4\n ";
print "The Odd value for the 5th array is $return_value5\n ";
print "The Odd value for the sixth array is $return_value6\n ";
Here are my results.
The Odd value for the first array is 15
The Odd value for the first array is 21
The Odd value for the first array is 21
The Odd value for the first array is 19
The Odd value for the first array is 2
The Odd value for the first array is 1
If you can't tell. It is printing the count of all of the elements of the array instead of returning the element that occurs an odd number of times. In addition I get this error.
Use of uninitialized value in numeric eq (==) at OddCount.pl line 17.
Line 17 is where the 1st array and the 2nd array are compared. Yet the values are clearly instantiated and they work when I print them out. What is the issue?
Build a frequency hash for an array then go through it to see which elements have odd counts
use warnings;
use strict;
use feature 'say';
my #ary = qw(7 o1 7 o2 o1 z z o1); # o1,o2 appear odd number of times
my %freq;
++$freq{$_} for #ary;
foreach my $key (sort keys %freq) {
say "$key => $freq{$key}" if $freq{$key} & 1;
}
This is far simpler than the code in the question -- but which is easily fixed, too. See below.
Some notes
++$freq{$_} increments the value for the key $_ in the hash %freq by 1, or it adds the key to the hash if it doesn't exist (by autovivification) and sets its value to one. So when an array is iterated over with this code in the end the hash %freq contains for keys the array elements and for their values the elements' counts
Test $n & 1 uses the bitwise AND -- it is true if $n has the lowest bit set, so if it is odd
That ++$freq{$_} for #ary; is a Statement Modifier, running the statement for each element of #ary where the current element is aliased by $_ variable
This prints
o1 => 3
o2 => 1
This printing of odd-frequency elements (if any) is sorted alphabetically in elements, just so. Please change to any particular order that may be needed, or let me know.
Comments on the code in the question, which is correct with two simple fixes.
It uses prototypes in a wrong way for the purpose, in sub FindOddCount($). I suspect that this isn't needed so let's not dwell on it -- just drop that and make it sub FindOddCount
The index in loops includes the length of the array (<=) so in the last iteration they attempt to index into the array past its last element. Off-by-one error. That can be fixed by changing the condition into < $array_length (instead of <=), but read on
There is no reason to use C-style loops, not even to iterate over the index. (Needed here since the position in the array is used.) Scripting languages provide for cleaner ways†
foreach my $i1 (0 .. $#arraynumber) {
my $IntCount = 0;
foreach my $i2 (0 .. $#arraynumber) {
if ( $arraynumber[$i1] == $arraynumber[$i2] ) {
...
That 0..N is the range operator, which creates the list of numbers within that range. The syntax $#array_name is the index of the last element in the array #array_name. Exactly what's needed. So there is no need for the array length
Multiple (six) arrays, used to check the code, can be manipulated in far better and easier ways by using references; see the tutorial for complex data structures perldsc, and in particular the page perllol, for array-of-arrays
In short: when you remove the prototype and fix off-by-one error your code seems to be correct.
† And not only scripting ones -- for example, C++11 introduced the range-based for loop
for (auto var: container) ... // really const auto&, or auto&, or auto&&
and the link (a standard reference) says
Used as a more readable equivalent to the traditional for loop [...]
Count the number of occurrences in a for loop using a hash. Then print the desired elements using grep, like so:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw( say );
my #array = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my %cnt;
# Count each element of the array:
$cnt{$_}++ for #array;
# Print only the array elements that occurred an odd number of times,
# separated by ", ":
say join q{, }, grep { $cnt{$_} % 2 } #array;
# 6, 6, 6

how do you select column from a text file using perl

I want to subtract values in one column from another column and add the differences.How do I do this in perl? I am new to perl.Hence I am unable to figure out how to go about it. Kindly help me.
The first thing is to separate the data into columns. In this case, the columns are separated by a space. split(/ /) will return a list of the columns.
To subtract one from the other, its pulling the values out of the the list and subtracting them.
At the end, you add the difference to the running sum and then loop over the data.
#!/usr/bin/perl
use strict;
my $sum = 0;
while(<DATA>) {
my #vals = split(/ /);
my $diff = $vals[1] - $vals[0];
$sum += $diff;
}
print $sum,"\n";
__DATA__
1 3
3 5
5 7
This will print out 6 --- (3 - 1) + (5 - 3) + (7 - 5)
FYI, if you combine the autosplit (-a), loop (n) and command-line program (-e) arguments (see perlrun), you can shorten this to a one-liner, much like awk:
perl -ane "$sum += $F[1] - $F[0]; END { print $sum }" filename

Finding equal lines in file with Perl

I have a CSV file which contains duplicated items in different rows.
x1,y1
x2,y2
y1,x1
x3,y3
The two rows containing x1,y1 and y1,x1 are a match as they contain the same data in a diffrent order.
I need your help to find an algorithm to search for such lines in a 12MB file.
If you can define some ordering and equality relations between fields, you could store a normalized form and test your lines for equality against that.
As an example, we will use string comparision for your fields, but after lowercasing them. We can then sort the parts according to this relation, and create a lookup table via a nested hash:
use strict; use warnings;
my $cache; # A hash of hashes. Will be autovivified later.
while (<DATA>) {
chomp;
my #fields = split;
# create the normalized representation by lowercasing and sorting the fields
my #normalized_fields = sort map lc, #fields;
# find or create the path in the lookup
my $pointer = \$cache;
$pointer = \${$pointer}->{$_} for #normalized_fields;
# if this is an unknow value, make it known, and output the line
unless (defined $$pointer) {
$$pointer = 1; # set some defined value
print "$_\n"; # emit the unique line
}
}
__DATA__
X1 y1
X2 y2
Y1 x1
X3 y3
In this example I used the scalar 1 as value of the lookup data structure, but in more complex scenarios the original fields or the line number could be stored here. For the sake of the example, I used space-seperated values here, but you could replace the split with a call to Text::CSV or something.
This hash-of-hashes approach has sublinear space complexity, and worst case linear space complexity. The lookup time only depends on the number (and size) of fields in a record, not on the total number of records.
Limitation: All records must have the same number of fields, or some shorter records could be falsely considered “seen”. To circumvent these problems, we can use more complex nodes:
my $pointer = \$cache;
$pointer = \$$pointer->[0]{$_} for #normalized_fields;
unless (defined $$pointer->[1]) {
$$pointer->[1] = 1; ...
}
or introduce a default value for nonexistant field (e.g. the seperator of the original file). Here an example with the NUL character:
my $fields = 3;
...;
die "record too long" if #fields > $fields;
...; # make normalized fields
push #normalized_fields, ("\x00") x ($fields - #normalized_fields);
...; # do the lookup
A lot depends on what you want to know about duplicate lines once they have been found. This program uses a simple hash to list the line numbers of those lines that are equivalent.
use strict;
use warnings;
my %data;
while (<DATA>) {
chomp;
my $key = join ',', sort map lc, split /,/;
push #{$data{$key}}, $.;
}
foreach my $list (values %data) {
next unless #$list > 1;
print "Lines ", join(', ', #$list), " are equivalent\n";
}
__DATA__
x1,y1
x2,y2
y1,x1
x3,y3
output
Lines 1, 3 are equivalent
Make two hash tables A and B
Stream through your input one line at a time
For the first line pair x and y, use each as key and the other as value for both hash tables (e.g., $A->{x} = y; $B->{y} = x;)
For the second and subsequent line pairs, test if the second field's value exists as a key for either A or B — if it does, you have a reverse match — if not, then repeat the addition process from step 3 to add it to the hash tables
To do a version of amon's answer without a hash table, if your data are numerical, you could:
Stream through input line by line, sorting fields one and two by numerical ordering
Pipe result to UNIX sort on first and second fields
Stream through sorted output line by line, checking if current line matches the previous line (reporting a reverse match, if true)
This has the advantage of using less memory than hash tables, but may take more time to process.
amon already provided the answer I would've provided, so please enjoy this bad answer:
#! /usr/bin/perl
use common::sense;
my $re = qr/(?!)/; # always fails
while (<DATA>) {
warn "Found duplicate: $_" if $_ =~ $re;
next unless /^(.*),(.*)$/;
die "Unexpected input at line $.: $_" if "$1$2" =~ tr/,//;
$re = qr/^\Q$2,$1\E$|$re/
}
__DATA__
x1,y1
x2,y2
y1,x1
x3,y3

Perl grep not returning expected value

I have the following code:
#!/usr/bin/perl
# splits.pl
use strict;
use warnings;
use diagnostics;
my $pivotfile = "myPath/Internal_Splits_Pivot.txt";
open PIVOTFILE, $pivotfile or die $!;
while (<PIVOTFILE>) { # loop through each line in file
next if ($. == 1); # skip first line (contains business segment code)
next if ($. == 2); # skip second line (contains transaction amount text)
my #fields = split('\t',$_); # split fields for line into an array
print scalar(grep $_, #fields), "\n";
}
Given that the data in the text file is this:
4 G I M N U X
Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount
0000-13-I21 600
0001-8V-034BLA 2,172 2,172
0001-8V-191GYG 13,125 4,375
0001-9W-GH5B2A -2,967.09 2,967.09 25.00
I would expect the output from the perl script to be: 2 3 3 4 given the amount of defined elements in each line. The file is a tab delimited text file with 8 columns.
Instead I get 3 4 3 4 and I have no idea why!
For background, I am using Counting array elements in Perl as the basis for my development, as I am trying to count the number of elements in the line to know if I need to skip that line or not.
I suspect you have spaces mixed with the tabs in some places, and your grep test will consider " " true.
What does:
use Data::Dumper;
$Data::Dumper::Useqq=1;
print Dumper [<PIVOTFILE>];
show?
The problem should be in this line:
my #fields = split('\t',$_); # split fields for line into an array
The tab character doesn't get interpolated. And your file doesn't seem to be tab-only separated, at least here on SO. I changed the split regex to match arbitrary whitespace, ran the code on my machine and got the "right" result:
my #fields = split(/\s+/,$_); # split fields for line into an array
Result:
2
3
3
4
As a side note:
For background, I am using Counting array elements in Perl as the basis for my development, as I am trying to count the number of elements in the line to know if I need to skip that line or not.
Now I understand why you use grep to count array elements. That's important when your array contains undefined values like here:
my #a;
$a[1] = 42; # #a contains the list (undef, 42)
say scalar #a; # 2
or when you manually deleted entries:
my #a = split /,/ => 'foo,bar'; # #a contains the list ('foo', 'bar')
delete $a[0]; # #a contains the list (undef, 'bar')
say scalar #a; # 2
But in many cases, especially when you're using arrays to just store list without operating on single array elements, scalar #a works perfectly fine.
my #a = (1 .. 17, 1 .. 25); # (1, 2, ..., 17, 1, 2, .., 25)
say scalar #a; # 42
It's important to understand, what grep does! In your case
print scalar(grep $_, #fields), "\n";
grep returns the list of true values of #fields and then you print how many you have. But sometimes this isn't what you want/expect:
my #things = (17, 42, 'foo', '', 0); # even '' and 0 are things
say scalar grep $_ => #things # 3!
Because the empty string and the number 0 are false values in Perl, they won't get counted with that idiom. So if you want to know how long an array is, just use
say scalar #array; # number of array entries
If you want to count true values, use this
say scalar grep $_ => #array; # number of true values
But if you want to count defined values, use this
say scalar grep defined($_) => #array; # number of defined values
I'm pretty sure you already know this from the other answers on the linked page. In hashes, the situation is a little bit more complex because setting something to undef is not the same as deleteing it:
my %h = (a => 0, b => 42, c => 17, d => 666);
$h{c} = undef; # still there, but undefined
delete $h{d}; # BAM! $h{d} is gone!
What happens when we try to count values?
say scalar grep $_ => values %h; # 1
because 42 is the only true value in %h.
say scalar grep defined $_ => values %h; # 2
because 0 is defined although it's false.
say scalar grep exists $h{$_} => qw(a b c d); # 3
because undefined values can exist. Conclusion:
know what you're doing instead of copy'n'pasting code snippets :)
There are not only tabs, but there are spaces as well.
trying out with splitting by space works
Look below
#!/usr/bin/perl
# splits.pl
use strict;
use warnings;
use diagnostics;
while (<DATA>) { # loop through each line in file
next if ($. == 1); # skip first line (contains business segment code)
next if ($. == 2); # skip second line (contains transaction amount text)
my #fields = split(" ",$_); # split fields by SPACE
print scalar(#fields), "\n";
}
__DATA__
4 G I M N U X
Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount
0000-13-I21 600
0001-8V-034BLA 2,172 2,172
0001-8V-191GYG 13,125 4,375
0001-9W-GH5B2A -2,967.09 2,967.09 25.00
Output
2
3
3
4
Your code works for me. The problem may be that the input file contains some "hidden" whitespace fields (eg. other whitespace than tabs). For instance
A<tab><space><CR> gives two fields, A and <space><CR>
A<tab>B<tab><CR> gives three, A, B, <CR> (remember, the end of line is part of the input!)
I suggest you to chomp every line you use; other than that, you will have to clean the array from whitespace-only fields. Eg.
scalar(grep /\S/, #fields)
should do it.
A lot of great help on this question, and quickly too!
After a long, drawn-out learning process, this is what I came up with that worked quite well, with intended results.
#!/usr/bin/perl
# splits.pl
use strict;
use warnings;
use diagnostics;
my $pivotfile = "myPath/Internal_Splits_Pivot.txt";
open PIVOTFILE, $pivotfile or die $!;
while (<PIVOTFILE>) { # loop through each line in file
next if ($. == 1); # skip first line (contains business segment code)
next if ($. == 2); # skip second line (contains transaction amount text)
chomp $_; # clean line of trailing \n and white space
my #fields = split(/\t/,$_); # split fields for line into an array
print scalar(grep $_, #fields), "\n";
}

How do I replace row identifiers with sequential numbers?

I have written a perl script that splits 3 columns into scalars and replaces various values in the second column using regex. This part works fine, as shown below. What I would like to do, though, is change the first column ($item_id into a series of sequential numbers that restart when the original (numeric) value of $item_id changes.
For example:
123
123
123
123
2397
2397
2397
2397
8693
8693
8693
8693
would be changed to something like this (in a column):
1
2
3
4
1
2
3
4
1
2
3
4
This could either replace the first column or be a new fourth column.
I understand that I might do this through a series of if-else statements and tried this, but that doesn't seem to play well with the while procedure I've already got working for me. - Thanks, Thom Shepard
open(DATA,"< text_to_be_processed.txt");
while (<DATA>)
{
chomp;
my ($item_id,$callnum,$data)=split(/\|/);
$callnum=~s/110/\%I/g;
$callnum=~s/245/\%T/g;
$callnum=~s/260/\%U/g;
print "$item_id\t$callnum\t$data\n";
} #End while
close DATA;
The basic steps are:
Outside of the loop declare the counter and a variable holding the previous $item_id.
Inside the loop you do three things:
reset the counter to 1 if the current $item_id differs from the previous one, otherwise increase it
use that counter, e.g. print it
remember the previous value
With code this could look something similar to this (untested):
my ($counter, $prev_item_id) = (0, '');
while (<DATA>) {
# do your thing
$counter = $item_id eq $prev_item_id ? $counter + 1 : 1;
$prev_item_id = $item_id;
print "$item_id\t$counter\t...\n";
}
This goes a little further than just what you asked...
Use lexical filehandles
[autodie] makes open throw an error automatically
Replace the call nums using a table
Don't assume the data is sorted by item ID
Here's the code.
use strict;
use warnings;
use autodie;
open(my $fh, "<", "text_to_be_processed.txt");
my %Callnum_Map = (
110 => '%I',
245 => '%T',
260 => '%U',
);
my %item_id_count;
while (<$fh>) {
chomp;
my($item_id,$callnum,$data) = split m{\|};
for my $search (keys %Callnum_Map) {
my $replace = $Callnum_Map{$search};
$callnum =~ s{$search}{$replace}g;
}
my $item_count = ++$item_id_count{$item_id};
print "$item_id\t$callnum\t$data\t$item_count\n";
}
By using a hash, it does not presume the data is sorted by item ID. So if it sees...
123|foo|bar
456|up|down
123|left|right
789|this|that
456|black|white
123|what|huh
It will produce...
1
1
2
1
2
3
This is more robust, assuming you want a count of how many times you've seen an item id in the whole file. If you want how many times its been seen consecutively, use Mortiz's solution.
Is this what you are looking for?
open(DATA,"< text_to_be_processed.txt");
my $counter = 0;
my $prev;
while (<DATA>)
{
chomp;
my ($item_id,$callnum,$data)=split(/\|/);
$callnum=~s/110/\%I/g;
$callnum=~s/245/\%T/g;
$callnum=~s/260/\%U/g;
++$counter;
$item_id = $counter;
#reset counter if $prev is different than $item_id
$counter = 0 if ($prev ne $item_id );
$prev = $item_id;
print "$item_id\t$callnum\t$data\n";
} #End while
close DATA;