Perl script to check another array values depending on current array index - perl

I'm working on a Perl assignment that has three arrays, @array_A, @array_B and @array_C, with some values in them. I grep for the string "CAT" in @array_A and fetch its indices too:
my @index = grep { $@array_A[$_] =~ 'CAT' } 0..$#array_A;
print "Index : @index\n";
Output: Index : 2 5
I have to take this as input, check the values of the other two arrays at indices 2 and 5, and print them to a file.
The trick is that the position of the string "CAT" varies (the indices might be 5, 7 and 9).
I'm not quite getting the logic here and am looking for some help with it.

Here's an overly verbose example of how to extract the values you want, written so as to show what's happening while hopefully leaving some room for you to investigate further. Note that it's idiomatic Perl to use regex delimiters when using =~, e.g. $name =~ /steve/.
use warnings;
use strict;
my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);
# note the difference in the next line... no @ symbol...
my @indexes = grep { $a1[$_] =~ /CAT/ } 0..$#a1;
for my $index (@indexes){
    my $a2_value = $a2[$index];
    my $a3_value = $a3[$index];
    print "a1 index: $index\n" .
          "a2 value: $a2_value\n" .
          "a3 value: $a3_value\n" .
          "\n";
}
Output:
a1 index: 2
a2 value: c
a3 value: 3
a1 index: 5
a2 value: f
a3 value: 6
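The question also asks for the values to be printed to a file. A minimal sketch of that step follows, assuming the same sample arrays as above; the helper name report_lines and the output file name matches.txt are hypothetical and only there for illustration.

```perl
use strict;
use warnings;

my @a1 = qw(AT SAT CAT BAT MAT CAT SLAT);
my @a2 = qw(a b c d e f g);
my @a3 = qw(1 2 3 4 5 6 7);

# Hypothetical helper: for every index where @$haystack matches $pattern,
# build one line holding the index and the values of the other arrays there.
sub report_lines {
    my ($pattern, $haystack, @others) = @_;
    my @hits = grep { $haystack->[$_] =~ $pattern } 0 .. $#$haystack;
    return map { my $i = $_; join(' ', $i, map { $_->[$i] } @others) } @hits;
}

my @lines = report_lines(qr/CAT/, \@a1, \@a2, \@a3);

# 'matches.txt' is an assumed file name; adjust it to your assignment.
open my $out, '>', 'matches.txt' or die "Cannot open matches.txt: $!";
print {$out} "$_\n" for @lines;
close $out or die "Cannot close matches.txt: $!";
```

Because the indices are recomputed by grep on every run, it does not matter whether "CAT" sits at 2 and 5 or at 5, 7 and 9.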

Related

Declaring sets of anonymous arrays in perl

I'm trying to generate arrays for each of the aggregations of cells in sudoku. I seem to have fixed the problem, but don't understand what my earlier alternatives are doing.
I get the answer I expected if I write, for instance:
@row = ( [], [], [], [], [], [], [], [], [] ) ;
I had expected
@row = ( [] ) x 9 ;
to behave the same. I also tried the following, which did better:
@row = ( [] x 9 ) ;
Only the first element comes out strange with this, in two of the arrays. With the first rejected form I get all 81 elements in each array
I did wonder if the last form was actually legal?
# prob.pl - Show problem with repeat anonymous arrays
#
# ###################################################
@row = ( [] x 9 ) ;
@col = ( [] x 9 ) ;
@box = ( [] x 9 ) ;
for ( $i = 0 ; $i < 81 ; $i ++ ) {
    push( @{$row[ row($i) ]}, $i ) ;
    push( @{$col[ col($i) ]}, $i ) ;
    push( @{$box[ box($i) ]}, $i ) ;
}
for ( $i = 0 ; $i < 9 ; $i ++ ) {
    print STDERR "\@{\$row[$i]} = @{$row[$i]}\n" ;
    print STDERR "\@{\$col[$i]} = @{$col[$i]}\n" ;
    print STDERR "\@{\$box[$i]} = @{$box[$i]}\n" ;
}
sub row {
    my( $i ) = @_ ;
    int( $i / 9 ) ;
}
sub col {
    my( $i ) = @_ ;
    $i % 9 ;
}
sub box {
    my( $i ) = @_ ;
    int( col( $i ) / 3 ) + 3 * int( row( $i ) / 3 ) ;
}
Answer in two parts: first part is a simple view of what is happening and what you should do to fix it. Second part tries to explain this weird behavior you're getting.
Part 1 - simple explanations
All forms are legal; they are just not equivalent and probably don't do what you expect. In such case, Data::Dumper or Data::Printer are your friends:
use Data::Printer;
my @a1 = ( [] x 9 );
p @a1;
Prints something like
[
[0] "ARRAY(0x1151f30)ARRAY(0x1151f30)ARRAY(0x1151f30)ARRAY(0x1151f30)ARRAY(0x1151f30)ARRAY(0x1151f30)ARRAY(0x1151f30)ARRAY(0x1151f30)ARRAY(0x1151f30)"
]
Quoting the doc of x (the "repetition operator", if you need to search for it):
In scalar context, or if the left operand is neither enclosed in parentheses nor a qw// list, it performs a string repetition.
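(in ( [] x 9 ) the x is in list context, but its left operand [] is not itself enclosed in parentheses, so x performs a string repetition)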
On the other hand, when you do ([]) x 9, you get something like:
[
[0] [],
[1] var[0],
[2] var[0],
[3] var[0],
[4] var[0],
[5] var[0],
[6] var[0],
[7] var[0],
[8] var[0]
]
Quoting the doc again:
If the x is in list context, and the left operand is either enclosed in parentheses or a qw// list, it performs a list repetition. In that case it supplies list context to the left operand, and returns a list consisting of the left operand list repeated the number of times specified by the right operand.
What happens here, is that [] is evaluated before x is applied. It creates an arrayref, and x 9 then duplicates it 9 times.
Correct ways to achieve what you want would be either your first solution, or maybe, if you prefer something more concise (but still readable):
my @row = map { [] } 1 .. 9;
(since the body of the map is evaluated at each iteration, it does indeed create 9 distinct references)
Or, you could just not initialize @row, @col and @box and let autovivification create the arrays when needed.
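The difference between the shared reference and the distinct references can be seen in a small sketch. The variable names @shared and @distinct are made up for this illustration.

```perl
use strict;
use warnings;

my @shared   = ( [] ) x 3;          # one array reference, repeated three times
my @distinct = map { [] } 1 .. 3;   # three separate anonymous arrays

# Push into the first slot of each structure only.
push @{ $shared[0] },   'a';
push @{ $distinct[0] }, 'a';

# Count the elements reachable through every slot.
my @shared_sizes   = map { scalar @$_ } @shared;
my @distinct_sizes = map { scalar @$_ } @distinct;

print "shared:   @shared_sizes\n";     # every slot sees the pushed element
print "distinct: @distinct_sizes\n";   # only the first slot does
```

With the shared form, every slot reports one element, because all three slots point at the same array. That is the same effect as getting all 81 cells in each row of the sudoku program.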
Part 2 - advanced explanation
You have some weird behavior with your program when you use ([] x 9). For simplicity, let me reproduce it with a simpler example:
use feature 'say';
@x = ([] x 5);
@y = ([] x 5);
@z = ([] x 5);
push @{$x[0]}, 1;
push @{$y[0]}, 1;
push @{$z[0]}, 1;
say "@{$x[0]}";
say "@{$y[0]}";
say "@{$z[0]}";
This program outputs:
1 1
1
1 1
Interestingly, removing the definition of @y (@y = ([] x 5)) from this program produces the output:
1
1
1
Something fishy is going on. I'll explain it with two points.
First, let's consider the following example:
use Data::Printer;
use feature 'say';
say "Before:";
@x = "abcd";
p @x;
say "@{$x[0]}";
say "After:";
push @{$x[0]}, 5;
p @x;
say "@{$x[0]}";
say $abcd[0];
Which outputs
Before:
[
[0] "abcd"
]
After:
[
[0] "abcd"
]
5
5
When we do push @{$x[0]}, 5, @{$x[0]} becomes @{"abcd"}, which creates the array @abcd and pushes 5 into it. $x[0] is still a string (abcd), but this string is also the name of an array.
Second*, let's look at the following program:
use Data::Printer;
@x = ([] x 5);
@y = ([] x 5);
@z = ([] x 5);
p @x;
p @y;
p @z;
We get the output:
[
[0] "ARRAY(0x19aff30)ARRAY(0x19aff30)ARRAY(0x19aff30)ARRAY(0x19aff30)ARRAY(0x19aff30)"
]
[
[0] "ARRAY(0x19b0188)ARRAY(0x19b0188)ARRAY(0x19b0188)ARRAY(0x19b0188)ARRAY(0x19b0188)"
]
[
[0] "ARRAY(0x19aff30)ARRAY(0x19aff30)ARRAY(0x19aff30)ARRAY(0x19aff30)ARRAY(0x19aff30)"
]
@x and @z contain the same reference. While this is surprising, I think that this is explainable: on line 1, [] x 5 creates an arrayref, then converts it into a string to do x 5, and then it doesn't use the arrayref anymore. This means that the garbage collector is free to reclaim its memory, and Perl is free to reallocate something else at this address. For some reason, this doesn't happen right away (@y doesn't contain the same thing as @x), but only when allocating @z. This is probably just the result of the implementation of the garbage collector / optimizer, and I suspect it might change from one version to another.
In the end, what happens is this: @x and @z each contain a single element, a string, and the two strings are identical. When you dereference $x[0] and $z[0], you therefore get the same array. Therefore, pushing into either $x[0] or $z[0] pushes into the same array.
This would have been caught with use strict, which would have said something like:
Can't use string ("ARRAY(0x2339f30)ARRAY(0x2339f30)"...) as an ARRAY ref while "strict refs" in use at repl1.pl line 11.
* note that for this second part, I am not sure that this is what happens, and this is only my (somewhat educated) guess. Please, don't take my word for it, and feel free to correct me if you know better.
The second form creates 9 copies of the same array reference. The third form creates a single array element consisting of a string like "ARRAY(0x221f488)" concatenated together 9 times. Either create 9 individual arrays with e.g. push @row, [] for 1..9; or rely on autovivification.

Perl: read file and re-arrange into columns

I have a file that I want to read in, which has the following structure:
EDIT: I made the example a bit more specific to clarify what I need
HEADER
MORE HEADER
POINTS 2
x1 y1 z1
x2 y2 z2
VECTORS velocities
u1 v1 w1
u2 v2 w2
VECTORS displacements
a1 b1 c1
a2 b2 c2
The number of blocks containing some data is arbitrary, so is their order.
I want to read only the data under "POINTS" and under "VECTORS displacements" and rearrange them in the following format:
x1 y1 z1 a1 b1 c1
x2 y2 z2 a2 b2 c2
I managed to read the xyz and abc blocks into separate arrays, but my problem is combining them into one.
I should mention that I am a Perl newbie. Can somebody help me?
This is made very simple using the range operator. The expression
/DATA-TO-READ/ .. /DATA-NOT-TO-READ/
evaluates to 1 on the first line of the range (the DATA-TO-READ line), 2 on the second etc. On the last line (the DATA-NOT-TO-READ line) E0 is appended to the count so that it evaluates to the same numeric value but can also be tested for being the last line. On lines outside the range it evaluates to a false value.
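The values the range operator returns in scalar context can be seen in a small sketch. The markers START and END and the sample lines are hypothetical stand-ins for the two patterns.

```perl
use strict;
use warnings;

# Hypothetical sample input; START and END stand in for the two marker patterns.
my @lines = ('header', 'START', 'x1 y1 z1', 'x2 y2 z2', 'END', 'footer');

my @values;
for (@lines) {
    # In scalar context .. is the flip-flop operator, not a list range.
    my $n = /START/ .. /END/;
    push @values, $n;
    printf "%-10s => '%s'\n", $_, $n;
}
```

Outside the range the value is the empty string, inside it counts up from 1, and the closing line yields 4E0. That is why the program below tests for $index > 1 and for the absence of an E.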
This program accumulates the data in array #output and prints it when the end of the input is reached. It expects the path to the input file as a parameter on the command line.
use strict;
use warnings;
my (@output, $i);
while (<>) {
    my $index = /DATA-TO-READ/ .. /DATA-NOT-TO-READ/;
    if ($index and $index > 1 and $index !~ /E/) {
        push @{ $output[$index-2] }, split;
    }
}
print "@$_\n" for @output;
output
x1 y1 z1 a1 b1 c1
x2 y2 z2 a2 b2 c2
I only used 1 array to remember the first 3 columns. You can output directly when processing the second part of the data.
#!/usr/bin/perl
use strict;
use warnings;
my @first; # To store the first 3 columns.
my $reading; # Flag: are we reading the data?
while (<>) {
next unless $reading or /DATA-TO-READ/; # Skip the header.
$reading = 1, next unless $reading; # Skip the DATA-TO-READ line, enter the
# reading mode.
last if /DATA-NOT-TO-READ/; # End of the first part.
chomp; # Remove a newline.
push @first, $_; # Remember the line.
}
undef $reading; # Restore the flag.
while (<>) {
next unless $reading or /DATA-TO-READ/;
$reading = 1, next unless $reading;
last if /DATA-NOT-TO-READ/;
print shift @first, " $_"; # Print the remembered columns + current line.
}
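Both answers use the DATA-TO-READ markers from the original wording of the question. A rough sketch adapted to the edited POINTS / VECTORS layout might look like the following. It assumes that every block header starts with an upper-case letter and every data row does not, which holds for the sample but is only a guess about the real file. The name combine_blocks is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the rows under POINTS and under
# "VECTORS displacements", then join them row by row.
sub combine_blocks {
    my @lines = @_;
    my (%rows, $current);
    for my $line (@lines) {
        if ($line =~ /^[A-Z]/) {
            # Assumed: a header line starts with an upper-case letter.
            $current = $line =~ /^POINTS\b/                   ? 'points'
                     : $line =~ /^VECTORS\s+displacements\b/ ? 'disp'
                     :                                          undef;
            next;
        }
        push @{ $rows{$current} }, $line if defined $current;
    }
    my @points = @{ $rows{points} // [] };
    my @disp   = @{ $rows{disp}   // [] };
    return map { "$points[$_] $disp[$_]" } 0 .. $#points;
}

my @input = (
    'HEADER', 'MORE HEADER', 'POINTS 2',
    'x1 y1 z1', 'x2 y2 z2',
    'VECTORS velocities', 'u1 v1 w1', 'u2 v2 w2',
    'VECTORS displacements', 'a1 b1 c1', 'a2 b2 c2',
);
my @combined = combine_blocks(@input);
print "$_\n" for @combined;
```

Storing the rows per block name, rather than printing as they are read, means the result does not depend on whether POINTS or the displacements block comes first in the file.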

How to get the array of all ngrams in Perl Text::Ngrams

As you know the module Text::Ngrams in Perl can give Ngrams analysis. There is the following function to retrieve the array of Ngrams and frequencies.
get_ngrams(orderby=>'ngram|frequency|none',onlyfirst=>NUMBER,out=>filename|handle,normalize=>1)
But it gives only the last Ngrams.
For example the following code does not give both Uni-Gram and Bi-Gram:
my $ng3 = Text::Ngrams->new( windowsize => 2, type=>'byte');
my $text = "test teXT TESTtexT";
$text =~ s/ +/ /g; # replace multiple spaces to single
$text = uc $text; # uppercase all
$ng3->process_text($text);
my @ngramsarray = $ng3->get_ngrams(orderby=>'frequency', onlyfirst=>10, normalize=>0 );
foreach(@ngramsarray)
{
print "$_\n";
}
output:
T E
4
E X
2
_ T
2
E S
2
S T
2
X T
2
T _
2
T T
1
However by using the function
to_string(orderby=>'ngram|frequency|none',onlyfirst=>NUMBER,out=>filename|handle,normalize=>1,spartan=>1)
it shows both Ngrams. But it only displays the result; I need the result in an array.
How to get all Ngrams (Unigram and Bigram) at the same time by this array?
You can't get all the different sizes of n-grams at the same time, but you can get them all using multiple calls to get_ngrams. There is an undocumented parameter n to get_ngrams that says the size of the n-grams you want listed.
In your code, if you say
my @ngramsarray = $ng3->get_ngrams(
    n => 1,
    orderby => 'frequency',
    onlyfirst => 10,
    normalize => 0);
you get this list
('T', 8, 'E', 4, 'X', 2, '_', 2, 'S', 2)
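Since the returned list alternates n-gram and count, it can be assigned straight to a hash for lookups. The sketch below uses the literal list shown above in place of a real get_ngrams call, so it runs without Text::Ngrams. The hash name %frequency is made up for the example.

```perl
use strict;
use warnings;

# Stand-in for the list returned by get_ngrams (copied from the output above).
my @ngramsarray = ('T', 8, 'E', 4, 'X', 2, '_', 2, 'S', 2);

# A flat (key, value, key, value, ...) list becomes a hash by assignment.
my %frequency = @ngramsarray;

# Print by descending count, breaking ties alphabetically.
for my $ngram (sort { $frequency{$b} <=> $frequency{$a} || $a cmp $b } keys %frequency) {
    print "$ngram\t$frequency{$ngram}\n";
}
```

One hash per n-gram size, filled from one call each, would then hold the unigrams and bigrams side by side.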

How to subtract values in 2 different arrays in perl?

Hi I have two arrays containing 4 columns and I want to subtract the value in column1 of array2 from value in column1 of array1 and value of column2 of array2 from column2 of array1 so on..example:
my @array1=(4.3,0.2,7,2.2,0.2,2.4);
my @array2=(2.2,0.6,5,2.1,1.3,3.2);
so the required output is
2.1 -0.4 2 # [4.3-2.2] [0.2-0.6] [7-5]
0.1 -1.1 -0.8 # [2.2-2.1] [0.2-1.3] [2.4-3.2]
For this the code I used is
my @diff = map {$array1[$_] - $array2[$_]} (0..2);
print OUT join(' ', @diff), "\n";
and the output I am getting now is
2.1 -0.4 2
2.2 -1.1 3.8
Again the first row is used from array one and not the second row, how can I overcome this problem?
I need output in rows of 3 columns like the way i have shown above so just i had filled in my array in row of 3 values.
This will produce the requested output. However, I suspect (based on your comments), that we could produce a better solution if you simply showed your raw input.
use strict;
use warnings;
my @a1 = (4.3,0.2,7,2.2,0.2,2.4);
my @a2 = (2.2,0.6,5,2.1,1.3,3.2);
my @out = map { $a1[$_] - $a2[$_] } 0 .. $#a1;
print "@out[0..2]\n";
print "@out[3..$#a1]\n";
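If the number of rows is not fixed at two, the same idea can be written with splice, which removes a chunk of elements from the front of the list on each pass. This is only a sketch; the variable $columns and the array @rows are names introduced here.

```perl
use strict;
use warnings;

my @a1 = (4.3, 0.2, 7, 2.2, 0.2, 2.4);
my @a2 = (2.2, 0.6, 5, 2.1, 1.3, 3.2);
my $columns = 3;    # assumed row width

# Element-wise differences over the whole flat list.
my @diff = map { $a1[$_] - $a2[$_] } 0 .. $#a1;

# Take $columns values at a time until @diff is empty.
my @rows;
while (my @chunk = splice @diff, 0, $columns) {
    push @rows, join ' ', @chunk;
}
print "$_\n" for @rows;
```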
First of all, your code doesn't even compile. Perl arrays aren't space separated - you need a qw() to turn those into arrays. Not sure how you got your results.
Perl doesn't have 2D arrays. 2.2 is NOT column 1 of row 1 of @array1; it's the element with index 3 of @array1. As far as Perl is concerned, your newline is just another whitespace separator, NOT something that magically turns a 1-d array into a table as you seem to think.
To get the result you want (process those 6 elements as 2 3-element arrays), you can either store them in an array of arrayrefs (Perl's implementation of C multidimensional arrays):
my @array1=( [ 4.3, 0.2, 7 ],
             [ 2.2, 0.2, 2.4] );
my @array2=( [ 2.2, 0.6, 5 ],
             [ 2.1, 1.3, 3.2] );
for(my $idx1=0; $idx1 < 2; $idx1++) {
    for(my $idx2=0; $idx2 < 3; $idx2++) {
        print $array1[$idx1]->[$idx2] - $array2[$idx1]->[$idx2];
        print " ";
    }
    print "\n";
}
or, you can simply fake it by using offsets, the same way pointer arithmetic works in C's multidimensional arrays:
my @array1=( 4.3, 0.2, 7,     # index 0..2
             2.2, 0.2, 2.4);  # index 3..5
my @array2=( 2.2, 0.6, 5,     # index 0..2
             2.1, 1.3, 3.2);  # index 3..5
for(my $idx1=0; $idx1 < 2; $idx1++) {
    for(my $idx2=0; $idx2 < 3; $idx2++) {
        print $array1[$idx1 * 3 + $idx2] - $array2[$idx1 * 3 + $idx2];
        print " ";
    }
    print "\n";
}

How do I calculate the difference between each element in two arrays?

I have a text file with numbers which I have grouped as follows, separated by a blank line:
42.034 41.630 40.158 26.823 26.366 25.289 23.949
34.712 35.133 35.185 35.577 28.463 28.412 30.831
33.490 33.839 32.059 32.072 33.425 33.349 34.709
12.596 13.332 12.810 13.329 13.329 13.569 11.418
Note: the groups are always of equal length and can be arranged in more than one line long, if the group is large, say 500 numbers long.
I was thinking of putting the groups in arrays and iterate along the length of the file.
My first question is: how should I subtract the first element of array 2 from array 1, array 3 from array 2, similarly for the second element and so on till the end of the group?
i.e.:
34.712-42.034,35.133-41.630,35.185-40.158 ...till the end of each group
33.490-34.712,33.839-35.133 ..................
and then save the differences of the first element in one group (second question: how ?) till the end
i.e.:
34.712-42.034 ; 33.490-34.712 ; and so on in one group
35.133-41.630 ; 33.839-35.133 ; ........
I am a beginner so any suggestions would be helpful.
Assuming you have your file opened, the following is a quick sketch
use List::MoreUtils qw<pairwise>;
...
my @list1 = split ' ', <$file_handle>;
my @list2 = split ' ', <$file_handle>;
my @diff = pairwise { $a - $b } @list1, @list2;
pairwise is the simplest way.
Otherwise there is the old standby:
# construct a list using each index in @list1 ( 0..$#list1 )
# consisting of the difference at each slot.
my @diff = map { $list1[$_] - $list2[$_] } 0..$#list1;
Here's the rest of the infrastructure to make Axeman's code work:
#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw<pairwise>;
my (@prev_line, @this_line, @diff);
while (<>) {
    next if /^\s+$/; # skip leading blank lines, if any
    @prev_line = split;
    last;
}
# get the rest of the lines, calculating and printing the difference,
# then saving this line's values in the previous line's values for the next
# set of differences
while (<>) {
    next if /^\s+$/; # skip embedded blank lines
    @this_line = split;
    @diff = pairwise { $a - $b } @this_line, @prev_line;
    print join(" ; ", @diff), "\n\n";
    @prev_line = @this_line;
}
So given the input:
1 1 1
3 2 1
2 2 2
You'll get:
2 ; 1 ; 0
-1 ; 0 ; 1
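The second part of the question, grouping the differences by element position rather than by pair of groups, is not covered above. One possible sketch collects each difference into a per-column array as it is computed. It uses only core Perl, takes the sample input above as a hard-coded list, and the name @by_column is made up for the illustration.

```perl
use strict;
use warnings;

# The sample groups from above, one array reference per group.
my @input = ( [1, 1, 1], [3, 2, 1], [2, 2, 2] );

# $by_column[$col] collects the successive differences for element $col.
my @by_column;
for my $i (1 .. $#input) {
    my ($prev, $this) = @input[$i - 1, $i];
    for my $col (0 .. $#$this) {
        push @{ $by_column[$col] }, $this->[$col] - $prev->[$col];
    }
}

print join(' ; ', @$_), "\n" for @by_column;
```

Each printed line then holds the differences of one element position across all consecutive groups, which matches the "34.712-42.034 ; 33.490-34.712" layout asked for.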