How to subtract values in 2 different arrays in Perl?

Hi, I have two arrays, each holding rows of 3 columns, and I want to subtract each value in array2 from the value in the same row and column of array1 (column 1 of array2 from column 1 of array1, column 2 of array2 from column 2 of array1, and so on). Example:
my @array1=(4.3,0.2,7,2.2,0.2,2.4);
my @array2=(2.2,0.6,5,2.1,1.3,3.2);
so the required output is
2.1 -0.4 2 # [4.3-2.2] [0.2-0.6] [7-5]
0.1 -1.1 -0.8 # [2.2-2.1] [0.2-1.3] [2.4-3.2]
For this the code I used is
my @diff = map {$array1[$_] - $array2[$_]} (0..2);
print OUT join(' ', @diff), "\n";
and the output I am getting now is
2.1 -0.4 2
2.2 -1.1 3.8
The second row is again computed from the first row of array1 instead of its second row. How can I overcome this problem?
I need the output in rows of 3 columns, as shown above, which is why I filled my arrays in rows of 3 values.

This will produce the requested output. However, I suspect (based on your comments), that we could produce a better solution if you simply showed your raw input.
use strict;
use warnings;
my @a1 = (4.3,0.2,7,2.2,0.2,2.4);
my @a2 = (2.2,0.6,5,2.1,1.3,3.2);
my @out = map { $a1[$_] - $a2[$_] } 0 .. $#a1;
print "@out[0..2]\n";
print "@out[3..$#a1]\n";

First of all, your code doesn't even compile. Perl list elements aren't separated by spaces; you need commas, or qw(), to build those arrays. I'm not sure how you got your results.
Perl doesn't have 2D arrays. 2.2 is NOT column 1 of row 1 of @array1; it's the element at index 3 of @array1. As far as Perl is concerned, your newline is just another whitespace separator, NOT something that magically turns a 1-D array into a table as you seem to think.
To get the result you want (processing those 6 elements as two 3-element rows), you can either store them in an array of arrayrefs (Perl's equivalent of C multidimensional arrays):
my @array1 = ( [ 4.3, 0.2, 7   ],
               [ 2.2, 0.2, 2.4 ] );
my @array2 = ( [ 2.2, 0.6, 5   ],
               [ 2.1, 1.3, 3.2 ] );
for (my $idx1 = 0; $idx1 < 2; $idx1++) {        # rows
    for (my $idx2 = 0; $idx2 < 3; $idx2++) {    # columns
        print $array1[$idx1]->[$idx2] - $array2[$idx1]->[$idx2];
        print " ";
    }
    print "\n";
}
or you can simply fake it by using offsets, the same way pointer arithmetic works in C's multidimensional arrays:
my @array1 = ( 4.3, 0.2, 7,      # index 0..2
               2.2, 0.2, 2.4 );  # index 3..5
my @array2 = ( 2.2, 0.6, 5,      # index 0..2
               2.1, 1.3, 3.2 );  # index 3..5
for (my $idx1 = 0; $idx1 < 2; $idx1++) {        # rows
    for (my $idx2 = 0; $idx2 < 3; $idx2++) {    # columns
        print $array1[$idx1 * 3 + $idx2] - $array2[$idx1 * 3 + $idx2];
        print " ";
    }
    print "\n";
}
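The offset idea can also be written without hard-coded loop bounds. The following is only a sketch: diff_rows is a hypothetical helper name, and it assumes both arrays have the same length and that the row width (3 here) is known to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: subtract @$y from @$x element-wise, then cut the
# flat list of differences into rows of $width values each.
sub diff_rows {
    my ($x, $y, $width) = @_;
    die "arrays differ in length\n" unless @$x == @$y;
    my @diff = map { $x->[$_] - $y->[$_] } 0 .. $#$x;
    my @rows;
    push @rows, [ splice @diff, 0, $width ] while @diff;
    return @rows;
}

my @array1 = (4.3, 0.2, 7, 2.2, 0.2, 2.4);
my @array2 = (2.2, 0.6, 5, 2.1, 1.3, 3.2);

# sprintf '%g' trims floating-point noise such as 0.10000000000000009
print join(' ', map { sprintf '%g', $_ } @$_), "\n"
    for diff_rows(\@array1, \@array2, 3);
```

Because splice removes each row from @diff as it is taken, the loop ends on its own when the differences run out, so the same call works for any number of rows.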

Related

How to count the odd number of occurrences in Perl?

I have a program in Perl that is supposed to count the number of times an element appears in an array, and prints out the value of the element if the number of times it appears is odd.
Here is my code.
#!/usr/bin/perl
use strict;
use warnings;
sub FindOddCount($)
{
my @arraynumber = @_;
my $Even = 0;
my $i = 0;
my $j = 0;
my $array_length = scalar(@_);
for ($i = 0; $i <= $array_length; $i++)
{
my $IntCount = 0;
for ($j = 0; $j <= $array_length; $j++)
{
if ($arraynumber[$i] == $arraynumber[$j])
{
$IntCount++;
print($j);
}
}
$Even = $IntCount % 2;
if ($Even != 0)
{
return $arraynumber[$i];
}
}
if ($Even == 0)
{
return "none";
}
}
my @array1 = (1,1,2,2,3,3,4,4,5,5,6,7,7,7,7);
my @array2 = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my @array3 = (6,6,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10.10);
my @array4 = (10,10,7,7,2,2,3,3,4,4,5,5,7,7,7,7,10,10,6);
my @array5 = (6,6);
my @array6 = (1);
my $return_value1 = FindOddCount(@array1);
my $return_value2 = FindOddCount(@array2);
my $return_value3 = FindOddCount(@array3);
my $return_value4 = FindOddCount(@array4);
my $return_value5 = FindOddCount(@array5);
my $return_value6 = FindOddCount(@array6);
print "The Odd value for the first array is $return_value1\n";
print "The Odd value for the 2nd array is $return_value2\n ";
print "The Odd value for the 3rd array is $return_value3\n ";
print "The Odd value for the 4th array is $return_value4\n ";
print "The Odd value for the 5th array is $return_value5\n ";
print "The Odd value for the sixth array is $return_value6\n ";
Here are my results.
The Odd value for the first array is 15
The Odd value for the first array is 21
The Odd value for the first array is 21
The Odd value for the first array is 19
The Odd value for the first array is 2
The Odd value for the first array is 1
In case it isn't obvious: it prints the number of elements in each array instead of returning the element that occurs an odd number of times. In addition, I get this warning:
Use of uninitialized value in numeric eq (==) at OddCount.pl line 17.
Line 17 is the comparison $arraynumber[$i] == $arraynumber[$j]. Yet the values are clearly initialized and they print fine. What is the issue?
Build a frequency hash for an array then go through it to see which elements have odd counts
use warnings;
use strict;
use feature 'say';
my @ary = qw(7 o1 7 o2 o1 z z o1); # o1,o2 appear odd number of times
my %freq;
++$freq{$_} for @ary;
foreach my $key (sort keys %freq) {
say "$key => $freq{$key}" if $freq{$key} & 1;
}
This is far simpler than the code in the question, though that code is easily fixed too; see below.
Some notes
++$freq{$_} increments the value for the key $_ in the hash %freq by 1; if the key doesn't exist yet, it is added to the hash (by autovivification) and its value is set to one. So once the array has been iterated over with this code, the hash %freq has the array elements as its keys and the elements' counts as its values
The test $n & 1 uses the bitwise AND: it is true if $n has the lowest bit set, that is, if $n is odd
That ++$freq{$_} for @ary; uses a statement modifier, running the statement for each element of @ary, with the current element aliased by the $_ variable
This prints
o1 => 3
o2 => 1
The odd-frequency elements (if any) are printed sorted alphabetically, just for tidiness. Please change that to any particular order that may be needed, or let me know.
Comments on the code in the question, which is correct after two simple fixes.
It uses a prototype in a way that is wrong for the purpose, in sub FindOddCount($). I suspect that this isn't needed, so let's not dwell on it: just drop it and make it sub FindOddCount
The loop conditions include the length of the array (<=), so in the last iteration the loops attempt to index into the array past its last element. This is an off-by-one error. It can be fixed by changing the condition to < $array_length (instead of <=), but read on
There is no reason to use C-style loops, not even to iterate over the index (which is needed here since the position in the array is used). Scripting languages provide cleaner ways†
foreach my $i1 (0 .. $#arraynumber) {
my $IntCount = 0;
foreach my $i2 (0 .. $#arraynumber) {
if ( $arraynumber[$i1] == $arraynumber[$i2] ) {
...
That 0..N is the range operator, which creates the list of numbers within that range. The syntax $#array_name is the index of the last element in the array @array_name. Exactly what's needed. So there is no need for the array length
Multiple (six) arrays, used to check the code, can be manipulated in far better and easier ways by using references; see the tutorial for complex data structures perldsc, and in particular the page perllol, for array-of-arrays
In short: once you remove the prototype and fix the off-by-one error, your code seems to be correct.
† And not only scripting ones -- for example, C++11 introduced the range-based for loop
for (auto var: container) ... // really const auto&, or auto&, or auto&&
and the link (a standard reference) says
Used as a more readable equivalent to the traditional for loop [...]
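For reference, here is a sketch of the question's sub with the two fixes from this answer applied (no prototype, index ranges that stop at the last element). The ($) prototype matters because it puts the argument in scalar context, so the sub received the element count of each array instead of its elements, which is why the counts 15, 21, and so on were printed. The early return and the statement modifiers are stylistic choices of this sketch, not part of the original code.

```perl
use strict;
use warnings;

# No prototype here, so the whole list arrives in @_.
sub FindOddCount {
    my @arraynumber = @_;
    foreach my $i (0 .. $#arraynumber) {
        my $IntCount = 0;
        foreach my $j (0 .. $#arraynumber) {
            $IntCount++ if $arraynumber[$i] == $arraynumber[$j];
        }
        # Return the first element that occurs an odd number of times.
        return $arraynumber[$i] if $IntCount % 2;
    }
    return "none";
}

my @array1 = (1,1,2,2,3,3,4,4,5,5,6,7,7,7,7);
print "The Odd value for the first array is ", FindOddCount(@array1), "\n";
```

This keeps the quadratic comparison of the original; the frequency hash shown earlier does the same job in a single pass.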
Count the number of occurrences in a for loop using a hash. Then print the desired elements using grep, like so:
#!/usr/bin/env perl
use warnings;
use strict;
use feature qw( say );
my @array = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
my %cnt;
# Count each element of the array:
$cnt{$_}++ for @array;
# Print only the array elements that occurred an odd number of times,
# separated by ", ":
say join q{, }, grep { $cnt{$_} % 2 } @array;
# 6, 6, 6
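Because grep runs over the original array, a value with an odd count is printed once per occurrence (hence "6, 6, 6"). If each such value should appear only once, a second hash can filter repeats. This is a minimal sketch; odd_values is a hypothetical helper name, and keeping first-appearance order is an assumption about what is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: values that occur an odd number of times,
# each listed once, in order of first appearance.
sub odd_values {
    my %cnt;
    $cnt{$_}++ for @_;
    my %seen;
    return grep { $cnt{$_} % 2 && !$seen{$_}++ } @_;
}

my @array = (10,10,7,7,6,6,2,2,3,3,4,4,5,5,6,7,7,7,7,10,10);
print join(', ', odd_values(@array)), "\n";
```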

cosine similarity between strings perl

i have a file contain for example this text:
perl java python php scala
java pascal perl ruby ada
ASP awk php java perl
C# ada python java scala
I found a module which calculates cosine similarity, http://search.cpan.org/~wollmers/Bag-Similarity-0.019/lib/Bag/Similarity/Cosine.pm
I did a simple test in the beginning:
my $cosine = Bag::Similarity::Cosine->new;
my $similarity = $cosine->similarity(['perl','java','python','php','scala'],['java','pascal','perl','ruby','ada']);
print $similarity;
The result was 0.4.
The problem is that when I read from the file and calculate the cosine between each pair of lines, the results are different. This is the code:
open(F,"/home/ahmed/FILE.txt") or die "Problem opening the file";
my @data; # contains each line of the file, one per element
while(<F>) {
chomp;
push @data, $_;
}
#print join " ", @data;
my $cosine = Bag::Similarity::Cosine->new;
for my $i ( 0 .. $#data-1 ) {
for my $j ( $i + 1 .. $#data ) {
my $similarity = $cosine->similarity($data[$i],$data[$j]);
print "line $i has a similarity of $similarity with line $j\n";
$i + 1,
$j + 1;
}
}
the results :
line 0 has a similarity of 0.933424735647156 with line 1
line 0 has a similarity of 0.953945734121021 with line 2
line 0 has a similarity of 0.939759036144578 with line 3
line 1 has a similarity of 0.917585834612093 with line 2
line 1 has a similarity of 0.945092544842746 with line 3
line 2 has a similarity of 0.908826679128811 with line 3
The similarity between the first two lines (line 0 and line 1) should be 0.4.
I changed the file like this:
['perl','java','python','php','scala']
['java','pascal','perl','ruby','ada']
['ASP','awk','php','java','perl']
['C#','ada','python','java','scala']
but I get the same result.
Thank you.
There is a syntax error in your program. Were you trying to use printf and used print by mistake? In any case, the code below works fine for me.
#!/usr/bin/perl
use strict;
use warnings;
use Bag::Similarity::Cosine;
my $cosine = Bag::Similarity::Cosine->new;
my @data;
while ( <DATA> ) {
push @data, { map { $_ => 1 } split };
}
for my $i ( 0 .. $#data-1 ) {
for my $j ( $i + 1 .. $#data ) {
my $similarity = $cosine->similarity($data[$i],$data[$j]);
print "line $i has a similarity of $similarity with line $j\n";
}
}
__DATA__
perl java python php scala
java pascal perl ruby ada
ASP awk php java perl
C# ada python java scala
Output:
line 0 has a similarity of 0.4 with line 1
line 0 has a similarity of 0.6 with line 2
line 0 has a similarity of 0.6 with line 3
line 1 has a similarity of 0.4 with line 2
line 1 has a similarity of 0.4 with line 3
line 2 has a similarity of 0.2 with line 3
I know nothing at all about this module. But I can read the documentation.
It looks to me like the module has two methods. similarity() is used for comparing two strings and from_bags() is used to compare two references to arrays containing strings. I expect that when you call similarity passing it two array references, then what gets compared is actually the stringification of the two references.
Try switching to from_bags() and see if that's any better.
Update: On investigating further, I see that similarity() will compare any kind of input (strings, array refs or hash refs).
This demonstrates using similarity() to compare the lines as text and as arrays of words.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Bag::Similarity::Cosine;
chomp(my @data = <DATA>);
my $cos = Bag::Similarity::Cosine->new;
for my $i (0 .. $#data - 1) {
for my $j (1 .. $#data) {
next if $i == $j;
say "$i -> $j: strings ", $cos->similarity($data[$i], $data[$j]);
say "$i -> $j: array refs ", $cos->similarity([split /\s+/, $data[$i]], [split /\s+/, $data[$j]]);
}
}
__DATA__
perl java python php scala
java pascal perl ruby ada
ASP awk php java perl
C# ada python java scala
And it gives this output:
$ perl similar
0 -> 1: strings 0.88602000346543
0 -> 1: array refs 0.4
0 -> 2: strings 0.89566858950296
0 -> 2: array refs 0.6
0 -> 3: strings 0.852802865422442
0 -> 3: array refs 0.6
1 -> 2: strings 0.872356744289958
1 -> 2: array refs 0.4
1 -> 3: strings 0.884721984738799
1 -> 3: array refs 0.4
2 -> 1: strings 0.872356744289958
2 -> 1: array refs 0.4
2 -> 3: strings 0.753778361444409
2 -> 3: array refs 0.2
I don't know which version gives you the information you want. I suspect it might be the array reference version.
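To see where the 0.4 for lines 0 and 1 comes from, the cosine can be computed by hand on word counts: the two lines share 2 words (perl, java) and each has 5 distinct words, so the result is 2 / (sqrt(5) * sqrt(5)) = 0.4. The following plain-Perl sketch illustrates that arithmetic only; bag_cosine is a hypothetical helper and is not the module's implementation.

```perl
use strict;
use warnings;

# Cosine similarity of two word lists treated as bags (multisets):
# dot product of the count vectors divided by the product of their lengths.
sub bag_cosine {
    my ($words_x, $words_y) = @_;
    my (%count_x, %count_y);
    $count_x{$_}++ for @$words_x;
    $count_y{$_}++ for @$words_y;

    my $dot = 0;
    $dot += $count_x{$_} * ($count_y{$_} // 0) for keys %count_x;

    my ($norm_x, $norm_y) = (0, 0);
    $norm_x += $_ ** 2 for values %count_x;
    $norm_y += $_ ** 2 for values %count_y;

    return 0 unless $norm_x && $norm_y;    # an empty bag has no direction
    return $dot / sqrt($norm_x * $norm_y);
}

my @line0 = split ' ', 'perl java python php scala';
my @line1 = split ' ', 'java pascal perl ruby ada';
print bag_cosine(\@line0, \@line1), "\n";
```

The key point is that each line must be split into words first; passing the unsplit line makes the comparison operate on something other than the word list.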

how do you select column from a text file using perl

I want to subtract the values in one column from those in another column and add up the differences. How do I do this in Perl? I am new to Perl, so I am unable to figure out how to go about it. Kindly help me.
The first thing is to separate the data into columns. In this case, the columns are separated by a space. split(/ /) will return a list of the columns.
To subtract one from the other, pull the values out of the list and subtract them.
Finally, add the difference to the running sum, and repeat for every line of the data.
#!/usr/bin/perl
use strict;
my $sum = 0;
while(<DATA>) {
my @vals = split(/ /);
my $diff = $vals[1] - $vals[0];
$sum += $diff;
}
print $sum,"\n";
__DATA__
1 3
3 5
5 7
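This will print 6, that is, (3 - 1) + (5 - 3) + (7 - 5).
FYI, if you combine the autosplit (-a), loop (-n) and command-line program (-e) switches (see perlrun), you can shorten this to a one-liner, much like awk. The double quotes suit the Windows command prompt; in a Unix shell use single quotes so the shell does not expand $sum and $F itself:
perl -ane "$sum += $F[1] - $F[0]; END { print $sum }" filename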
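If the columns may be separated by tabs or by several spaces, split ' ' is more forgiving than split(/ /), since it splits on any run of whitespace and ignores leading whitespace. The sketch below wraps the same loop in a hypothetical helper, sum_column_diff, that takes the two zero-based column numbers as parameters; the in-memory filehandle is only there to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: sum of (column $j - column $i) over all lines
# read from filehandle $fh; column numbers are zero-based.
sub sum_column_diff {
    my ($fh, $i, $j) = @_;
    my $sum = 0;
    while (my $line = <$fh>) {
        my @vals = split ' ', $line;            # any run of whitespace separates columns
        next unless @vals > $i && @vals > $j;   # skip blank or short lines
        $sum += $vals[$j] - $vals[$i];
    }
    return $sum;
}

my $text = "1 3\n3   5\n\n5\t7\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print sum_column_diff($fh, 0, 1), "\n";
```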

How to get the array of all ngrams in Perl Text::Ngrams

The Perl module Text::Ngrams performs n-gram analysis. It has the following function to retrieve the array of n-grams and their frequencies:
get_ngrams(orderby=>'ngram|frequency|none',onlyfirst=>NUMBER,out=>filename|handle,normalize=>1)
But it returns only the last (largest) size of n-grams.
For example, the following code does not give both unigrams and bigrams:
my $ng3 = Text::Ngrams->new( windowsize => 2, type=>'byte');
my $text = "test teXT TESTtexT";
$text =~ s/ +/ /g; # replace multiple spaces to single
$text = uc $text; # uppercase all
$ng3->process_text($text);
my @ngramsarray = $ng3->get_ngrams(orderby=>'frequency', onlyfirst=>10, normalize=>0 );
foreach(@ngramsarray)
{
print "$_\n";
}
output:
T E
4
E X
2
_ T
2
E S
2
S T
2
X T
2
T _
2
T T
1
However by using the function
to_string(orderby=>'ngram|frequency|none',onlyfirst=>NUMBER,out=>filename|handle,normalize=>1,spartan=>1)
it shows both sizes of n-grams, but it only displays the result. I need the result in an array.
How can I get all n-grams (unigrams and bigrams) at the same time in this array?
You can't get all the different sizes of n-grams at the same time, but you can get them all using multiple calls to get_ngrams. There is an undocumented parameter n to get_ngrams that says the size of the n-grams you want listed.
In your code, if you say
my @ngramsarray = $ng3->get_ngrams(
    n => 1,
    orderby => 'frequency',
    onlyfirst => 10,
    normalize => 0);
you get this list
('T', 8, 'E', 4, 'X', 2, '_', 2, 'S', 2)
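Since each call returns a flat list that alternates n-gram and frequency, the results of separate calls (one per n-gram size) can be merged into a single hash. The sketch below does not call Text::Ngrams at all: the two lists are copied by hand from the outputs shown in this thread, and merge_ngram_lists is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: merge flat (ngram, count, ngram, count, ...) lists,
# passed as array refs, into one hash keyed by n-gram.
sub merge_ngram_lists {
    my %count;
    for my $list (@_) {
        my @flat = @$list;                  # copy, so the caller's list is untouched
        while (my ($ngram, $freq) = splice @flat, 0, 2) {
            $count{$ngram} = $freq;
        }
    }
    return %count;
}

# Hand-copied sample data; a real run would take these from get_ngrams.
my @unigrams = ('T', 8, 'E', 4, 'X', 2, '_', 2, 'S', 2);
my @bigrams  = ('T E', 4, 'E X', 2, '_ T', 2);

my %all = merge_ngram_lists(\@unigrams, \@bigrams);
print "$_ => $all{$_}\n" for sort keys %all;
```

Unigram and bigram keys cannot collide because they differ in length, so a single hash holds both without ambiguity.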

How do I calculate the difference between each element in two arrays?

I have a text file with numbers which I have grouped as follows, separated by blank lines:
42.034 41.630 40.158 26.823 26.366 25.289 23.949
34.712 35.133 35.185 35.577 28.463 28.412 30.831
33.490 33.839 32.059 32.072 33.425 33.349 34.709
12.596 13.332 12.810 13.329 13.329 13.569 11.418
Note: the groups are always of equal length and can be arranged in more than one line long, if the group is large, say 500 numbers long.
I was thinking of putting the groups in arrays and iterate along the length of the file.
My first question is: how should I subtract the first element of array 2 from array 1, array 3 from array 2, similarly for the second element and so on till the end of the group?
i.e.:
34.712-42.034,35.133-41.630,35.185-40.158 ...till the end of each group
33.490-34.712,33.839-35.133 ..................
and then save the differences of the first element in one group (second question: how ?) till the end
i.e.:
34.712-42.034 ; 33.490-34.712 ; and so on in one group
35.133-41.630 ; 33.839-35.133 ; ........
I am a beginner so any suggestions would be helpful.
Assuming you have your file opened, the following is a quick sketch
use List::MoreUtils qw<pairwise>;
...
my @list1 = split ' ', <$file_handle>;
my @list2 = split ' ', <$file_handle>;
my @diff = pairwise { $a - $b } @list1, @list2;
pairwise is the simplest way.
Otherwise there is the old standby:
# construct a list using each index in @list1 ( 0..$#list1 )
# consisting of the difference at each slot.
my @diff = map { $list1[$_] - $list2[$_] } 0..$#list1;
Here's the rest of the infrastructure to make Axeman's code work:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw<pairwise>;
my (@prev_line, @this_line, @diff);
while (<>) {
    next if /^\s+$/; # skip leading blank lines, if any
    @prev_line = split;
    last;
}
# get the rest of the lines, calculating and printing the difference,
# then saving this line's values in the previous line's values for the next
# set of differences
while (<>) {
    next if /^\s+$/; # skip embedded blank lines
    @this_line = split;
    @diff = pairwise { $a - $b } @this_line, @prev_line;
    print join(" ; ", @diff), "\n\n";
    @prev_line = @this_line;
}
}
So given the input:
1 1 1
3 2 1
2 2 2
You'll get:
2 ; 1 ; 0
-1 ; 0 ; 1
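The question also mentions that a group may span several lines and that the differences should be collected per element position. One way to handle both, sketched here under the assumption that groups are separated by truly empty lines, is to read the file in paragraph mode ($/ = "") and then regroup the differences by position. The names read_groups and diffs_by_position are hypothetical helpers, not part of the answers above.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated groups from $fh and
# return one array ref of numbers per group.
sub read_groups {
    my ($fh) = @_;
    local $/ = "";                  # paragraph mode: one record per block
    my @groups;
    while (my $block = <$fh>) {
        my @nums = split ' ', $block;   # a group may span several lines
        push @groups, \@nums if @nums;
    }
    return @groups;
}

# Differences between consecutive groups, collected per element position:
# $by_pos[$k] holds (group2[k] - group1[k], group3[k] - group2[k], ...).
sub diffs_by_position {
    my @groups = @_;
    my @by_pos;
    for my $g (1 .. $#groups) {
        for my $k (0 .. $#{ $groups[$g] }) {
            push @{ $by_pos[$k] }, $groups[$g][$k] - $groups[$g - 1][$k];
        }
    }
    return @by_pos;
}

my $text = "1 1\n1\n\n3 2 1\n\n2 2 2\n";   # first group is split over two lines
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print join(' ; ', @$_), "\n" for diffs_by_position(read_groups($fh));
```

Each printed line then holds all the differences for one element position, which is the layout asked for in the second part of the question.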