Weird subroutine interaction in perl while drawing pyramids out of stars - perl

I wasn't completely sure what to title this, but here is the issue. My goal is to create a subroutine called drawPyramids that will draw a pyramid out of *, with the number of rows depending on the command line parameter. With each row, the number of * increments by 2, so it goes 1-3-5-7 and so on. Here's what I have so far:
sub drawRow {
my $space = $_[0];
my $star = $_[1];
for ($i=0;$i<$space;$i++) {
print " ";
}
for ($i=0;$i<$star;$i++) {
print "*";
}
}
sub drawPyramid {
my $rows = $_[0];
my $x = 1;
for ($i=1;$i<=$rows;$i++) {
drawRow($rows-$i,$x);
print "\n";
$x+=2;
}
}
if(#ARGV == 0) { #check if user entered parameter by checking size of array
die "ERROR: Please supply command-line parameter\n";
}
foreach $a(#ARGV) { #check if number is negative
if ($a < 0) {
print "ERROR: Number must be non-negative\n";
}
}
$sp = #ARGV[0];
$st = #ARGV[1];
drawPyramid($sp);
Lets say I run it as perl pyramid.pl 5 in my CMD. The expected result is:
*
***
*****
*******
*********
with 4 spaces before the first star on the first row, 3 spaces...and so on. However, this is what I get:
*
***
*****
The third row with 5 stars should have 2 spaces before the stars start, and the program doesn't even print the last two lines (which would have 7 and 9 stars).
What is wrong with the program? Any help is appreciated.

The problem is that both subroutines are reading and modifying the same global variable $i (also known as $::i or $main::i), so they're interfering with each other.
To fix this, you should instead use local variables, declared with my; that is, change this:
for ($i = 1; $i <= ...; $i++) {
to this:
for (my $i = 1; $i <= ...; $i++) {
throughout.

Related

using for loop, finding prime number between 1-100

I'd tried lots of ways to get rid of this problem but... I can't find what's the problem of this code.
I use Perl and I want to find the prime number between 1-100.
use strict;
my $i=0;
my $j=0;
my $count=0;
for ($i=0; $i<101; $i++)
{
for ($j=2; $j<$i; $j++)
{
if ($i%$j==0)
{
$count+=1;
}
}
if ($count==0)
{
print "$i\n";
}
}
There are a few things to think about as you use the Sieve of Eratosthenes. Armali already pointed out that you were reusing the value in $count because you had it in a higher scope, so it didn't reset for each number you wanted to check.
But, I reformatted your code a bit:
use v5.10;
NUM: for( my $i=1; $i < 101; $i++ ) {
DIVISOR: for( my $j=2; $j < $i; $j++ ) {
next NUM if $i%$j == 0;
}
say $i;
}
Instead of using a flag variable ($count) to figure out what to do, you can use loop controls. If you find any divisor, you know that you have found a non-prime and there's no need to continue. That is, you don't need to count divisors.
When you find one, stop and move on to the next number. To do that, I've labeled the looping constructs. That way, in the inner loop I can skip to the next iteration of the outer loop. And, usually, once I label one loop I label them all but you don't need to do that.
Once you figure that part out, you don't need to do so much work. Aside from 2, you know that all the even numbers are not prime. You don't need to check those. So, instead of being clever, I'll just break out 2 as a special case:
use v5.10;
say 2;
NUM: for( my $i = 3; $i < 101; $i += 2 ) {
DIVISOR: for( my $j=2; $j < $i; $j++ ) {
next NUM if $i%$j == 0;
}
say $i;
}
The inner loop is doing too much work too. None of the numbers that you are checking are even, so you don't need to check any even divisors (or those ending 5 once you choose 5). And, you only have to go half way, so you can stop when you get to the square root of the number.
#!perl
use v5.10;
say 2;
NUM: for( my $i = 3; $i < 101; $i += 2 ) {
my $stop_at = int sqrt $i;
DIVISOR: for( my $j=3; $j <= $stop_at; $j += 2 ) {
next NUM if $i % $j == 0;
}
say $i;
}
And, for a final flourish, I'll take the top number from the command-line arguments but default to 100. With that, the comparison in the outer loop changes to <=:
#!perl
use v5.10;
my $limit = $ARGV[0] // 100;
say 2;
NUM: for( my $i = 3; $i <= $limit; $i += 2 ) {
my $stop_at = int sqrt $i;
DIVISOR: for( my $j=3; $j <= $stop_at; $j += 2 ) {
next NUM if $i % $j == 0;
}
say $i;
}
But, ikegami notes in a comment that for my $x (0..$n-1) is more idiomatic. That doesn't easily handle step sizes larger than 1. You can do various things to multiply that number to get the candidate number, or ways to generate the list ahead of time (but that means you have the list all at once). I'll switch to a while instead, and assume that these other bits do their work properly.
The $get_number is some magic subroutine that always gives us back the next number, and the is_prime does what it does to make the determination:
while( my $n = $get_number->() ) {
say $n if is_prime($n);
}
Here's one way that might work. First, there's a nifty Perl regex trick to determine primes. It doesn't matter that I'm using that because you can change it to whatever you like because it's hidden behind is_prime. The biggest benefit here is that it's short (and a bit of a show off):
#!perl
use v5.10;
my $get_number = generate_sub( $ARGV[0] // 100 );
while( my $n = $get_number->() ) {
say $n if is_prime($n);
}
sub is_prime { ( '1' x $_[0] ) !~ /\A(11+?)\1+\z/ }
sub generate_sub {
my( $limit ) = #_;
sub {
state $queue = [ 2, 3 ];
return if $queue->[0] > $limit;
push $queue->#*, $queue->[-1] + 2;
shift $queue->#*;
}
}
The generate_sub is a bit more tricky. First, the 2 makes is a bit tricky. Second, Perl doesn't have a yield like Python or Ruby (would be nice). To get around that, I'll see a queue with the first two numbers then add the next number based on the last on (so, adding 2 to 3 gets 5, and so on). That gets around the unique interval from 2 to 3. This stops if the next number in the queue is above the one that you want.
But, that's a bit complicated and only there to handle the special case of 2. I've been playing with a different idiom lately although I'm not convinced its desirable.
The state is a way to declare a persistent lexical variable. It runs only on the first execution. We'll use a state to return the first 2 right away. Then, the next time we come around, that $rc statement doesn't run and $next has 3. From there, I get the current number (0+$next so it's not the same data), and increment $next in a list, but only return the first in that list. That's just a trick that condenses the if-else:
sub generate_sub {
my( $limit ) = #_;
sub {
state $rc = do { return 2 };
state $next = 3;
return $next <= $limit ? ( 0+$next, $next += 2 )[0] : ();
}
}
I don't recommend this for your problem, but you should consider a way to generate the list of numbers so it's not tightly coupled to the problem. That way, you can get rid of the looping constructs.
But, that's much more than you needed to know.
You initialized my $count=0; outside instead of inside the outer for loop.
Besides that, $i should start from 2 rather than 0.

Scoping in Perl

As a biology student, I'm trying to extend my programming knowledge and I ran into a problem with Perl.
I'm trying to create a program that generates random DNA strings and performs analysis work on the generated data.
In the first part of the program, I am able to print out the strings stored in the array, but the second part I cannot retrieve all but one of the elements of the array.
Could this be part of the scoping rules of Perl?
#!usr/bin/perl
# generate a random DNA strings and print it to file specified by the user.
$largearray[0] = 0;
print "How many nucleotides for the string?\n";
$n = <>;
$mylong = $n;
print "how many strings?\n";
$numstrings = <>;
# #largearray =();
$j = 0;
while ( $j < $numstrings ) {
$numstring = ''; # start with the empty string;
$dnastring = '';
$i = 0;
while ( $i < $n ) {
$numstring = int( rand( 4 ) ) . $numstring; # generate a new random integer
# between 0 and 3, and concatenate
# it with the existing $numstring,
# assigning the result to $numstring.
$i++; # increase the value of $i by one.
}
$dnastring = $numstring;
$dnastring =~ tr/0123/actg/; # translate the numbers to DNA characters.
#print $dnastring;
#print "\n";
$largearray[j] = $dnastring; #append generated string to end of array
#print $largearray[j];
#print $j;
#IN HERE THERE ARE GOOD ARRAY VALUES
#print "\n";
$j++;
}
# ii will be used to continuously take the next couple of strings from largearray
# for LCS matching.
$mytotal = 0;
$ii = 0;
while ( $ii < $numstrings ) {
$line = $largearray[ii];
print $largearray[ii]; #CANNOT RETRIEVE ARRAY VALUES
print "\n";
$ii++;
#string1 = split( //, $line );
$line = $largearray[ii];
#print $largearray[ii];
#print "\n";
$ii++;
chomp $line;
#string2 = split( //, $line );
$n = #string1; #assigning a list to a scalar just assigns the
#number of elements in the list to the scalar.
$m = #string2;
$v = 1;
$Cm = 0;
$Im = 0;
$V[0][0] = 0; # Assign the 0,0 entry of the V matrix
for ( $i = 1; $i <= $n; $i++ ) { # Assign the column 0 values and print
# String 1 See section 5.2 of Johnson
# for loops
$V[$i][0] = -$Im * $i;
}
for ( $j = 1; $j <= $m; $j++ ) { # Assign the row 0 values and print String 2
$V[0][$j] = -$Im * $j;
}
for ( $i = 1; $i <= $n; $i++ ) { # follow the recurrences to fill in the V matrix.
for ( $j = 1; $j <= $m; $j++ ) {
# print OUT "$string1[$i-1], $string2[$j-1]\n"; # This is here for debugging purposes.
if ( $string1[ $i - 1 ] eq $string2[ $j - 1 ] ) {
$t = 1 * $v;
}
else {
$t = -1 * $Cm;
}
$max = $V[ $i - 1 ][ $j - 1 ] + $t;
# print OUT "For $i, $j, t is $t \n"; # Another debugging line.
if ( $max < $V[$i][ $j - 1 ] - 1 * $Im ) {
$max = $V[$i][ $j - 1 ] - 1 * $Im;
}
if ( $V[ $i - 1 ][$j] - 1 * $Im > $max ) {
$max = $V[ $i - 1 ][$j] - 1 * $Im;
}
$V[$i][$j] = $max;
}
} #outer for loop
print $V[$n][$m];
$mytotal += $V[$n][$m]; # append current result to the grand total
print "\n";
} # end while loop
print "the average LCS value for length ", $mylong, " strings is: ";
print $mytotal/ $numstrings;
This isn't a scoping issue. You have declared none of your variables, which has the effect of implicitly making them all global and accessible everywhere in your code
I reformatted your Perl program so that I could read it, and then added this to the top of your program
use strict;
use warnings 'all';
which are essential in every Perl program you write
Then I added
no strict 'vars';
which is a very bad idea, and lets you get away without declaring any variables
The result is this
Argument "ii" isn't numeric in array element at E:\Perl\source\dna.pl line 60.
Argument "ii" isn't numeric in array element at E:\Perl\source\dna.pl line 61.
Argument "ii" isn't numeric in array element at E:\Perl\source\dna.pl line 67.
Argument "j" isn't numeric in array element at E:\Perl\source\dna.pl line 42.
Bareword "ii" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 60.
Bareword "ii" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 61.
Bareword "ii" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 67.
Bareword "j" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 42.
Execution of E:\Perl\source\dna.pl aborted due to compilation errors.
Line 42 (of my reformatted version) is
$largearray[j] = $dnastring
and lines 60, 61 and 67 are
$line = $largearray[ii];
print $largearray[ii]; #CANNOT RETRIEVE ARRAY VALUES
and
$line = $largearray[ii];
You are using j and ii as array indexes. Those are Perl subroutine calls, not variables. Adding use strict would have stopped this from compiling unless you had also declared sub ii and sub j
You might get away with it if you just change j and ii to $j and $ii, but you are certain to get into further problems
Please make the same changes to your own code, and declare every variable that you need using my as close as possible to the first place they are used
You should also improve your variable naming. Things like #largearray are pointless: the # says that it's an array, and whether it's large or not is relative, and of little use in understanding your code. If you have no better description of its purpose then #table or #data are probably a little better
Likewise, please avoid capital letters and most single-letter names. #V, $Cm and $Im are meaningless, and you would need fewer comments if those names were better
You certainly wouldn't need comments like # end while loop and # outer for loop if you had indented your blocks properly and kept them short enough so that both the beginning and the end can be seen on the screen at the same time, and the fewer comments you can get away with the better, because they badly clutter the code structure
Finally, it's worth noting that the C-style for loop is rarely the best choice in Perl. Your
for ( $i = 1; $i <= $n; $i++ ) { ... }
is much clearer as
for my $i ( 1 .. $n ) { ... }
and declaring the control variable at that point makes it unnecessary to invent new names like $ii for each new loop
I think you have a typo in your code:
ii => must be $ii
don't forget to put this at the beginning of your code:
use strict;
use warnings;
in order to avoid this (and others) kind of errors

Perl: subroutine throwing error with $i

Good morning,
I'm having trouble using a subroutine - if I put certain code into the subroutine, it throws an error of "use of uninitialised value $i in array element".
I have a very long script, so I will only post the bit I believe is relevant.
The subroutine I am calling is commented out underneath &exon_positive_strand (saves you scrolling down). When I remove the subroutine and uncomment the code, I get no errors. I can only imagine it is something to do with $i but I don't know what...
Any advice would be greatly appreciated.
Many thanks,
Ellie
my ($value, $col, $col2, $l_o_b, $left, $matchedID, $diff_three_prime, $diff_five_prime, $sequence, #three_prime_ss, #five_prime_ss, #reverse_five, #reverse_three);
my $i = 0;
open (EXONS_five, '>fasta_exons_five_non');
open (EXONS_three, '>fasta_exons_three_non');
foreach my $match(#exonic_matches) { ## works out exon from boundary relative to correct strand direction ##
if ($exon_ID[$i] !~ m/unknown/ && $dupmatches[$i] == 0)
{
$sequence = '';
$value = $exon_ID[$i];
$col = $exon_left{$value};
$col2 = $exon_right{$value};
#three_prime_ss = split(",", $col); ##splits left column into subcolumns
#five_prime_ss = split(",", $col2); ## splits right columnn into subcolumns
#reverse_three = reverse(#three_prime_ss);
#reverse_five = reverse(#five_prime_ss);
shift(#reverse_five);
if ($strands{$value} =~ m/\+/) {
&exon_positive_strand;
# $diff_three_prime = $LBP[$i] - $three_prime_ss[$exons2{$value} - 1]; ## minus numbers denote a difference to the left (i.e. upsteam)
# $diff_five_prime = $LBP[$i] - $five_prime_ss[$exons2{$value} - 1]; ## minus numbers denote a difference to the left (i.e. upsteam)
# $matchedID = $ID{$LBP[$i]};
# if ($diff_three_prime !~ m/\-/ && $diff_three_prime <= 3) {
# $BP{$LBP[$i]} =~ s/\[[ACTG]\]/$ref[$i]/i; ## putting variant into 50BP seq
# $l_o_b = 20;
# ##$right_of_boundary = 3;
# $l_o_b = $l_o_b + $diff_three_prime;
# $left = 51 - $l_o_b;
# $sequence = substr($BP{$LBP[$i]}, $left, 23);
# }
# elsif ($diff_five_prime =~ m/\-/ && $diff_five_prime >= -3) {
# $BP{$LBP[$i]} =~ s/\[[ACTG]\]/$ref[$i]/i; ## putting variant into 50BP seq
# $l_o_b = 3;
# ##$right_of_boundary = 6;
# $l_o_b = $l_o_b + $diff_five_prime;
# $left = 51 - $l_o_b;
# $sequence = substr( $BP{$LBP[$i]}, $left, 9);
}
}
my $seq_length = length($sequence);
if ($seq_length == 9) {
print EXONS_five (">" . "$match_exon{$col_exon_no[$i]}" . "\n", lc($sequence),"\n");
}
elsif ($seq_length == 23) {
print EXONS_three (">" . "$match_exon{$col_exon_no[$i]}" . "\n", lc($sequence),"\n");
}
$i++;
}
close (EXONS_five);
close (EXONS_three);
"Use of uninitialized value in array element" is not an error, it's a warning. Diagnostics can tell you what it means:
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.
To help you figure out what was undefined, perl will try to tell you the
name of the variable (if any) that was undefined. In some cases it cannot
do this, so it also tells you what operation you used the undefined value
in. Note, however, that perl optimizes your program and the operation
displayed in the warning may not necessarily appear literally in your
program. For example, "that $foo" is usually optimized into "that "
. $foo, and the warning will refer to the concatenation (.) operator,
even though there is no . in your program.
You need to pass the $i variable to the subroutine:
exon_positive_strand($i);
and
sub exon_positive_strand {
my $i = shift;
...

A little help with loops on perl

I am having trouble specifiying the correct algorithm. I am iterating over an input file with loops. The issue that I have is on the last loop.
#!/usr/bin/perl
# Lab #4
# Judd Bittman
# http://www-users.cselabs.umn.edu/classes/Spring-2011/csci3003/index.php?page=labs
# this site has what needs to be in the lab
# lab4 is the lab instructions
# yeast protein is the part that is being read
use warnings;
use strict;
my $file = "<YeastProteins.txt";
open(my $proteins, $file);
my #identifier;
my #qualifier;
my #molecularweight;
my #pi;
while (my $line1 = <$proteins>) {
#print $line1;
chomp($line1);
my #line = split(/\t/, $line1);
push(#identifier, $line[0]);
push(#qualifier, $line[1]);
push(#molecularweight, $line[2]);
push(#pi, $line[3]);
}
my $extreme = 0;
my $ex_index = 0;
for (my $index = 1; $index < 6805; $index++) {
if ( defined($identifier[$index])
&& defined($qualifier[$index])
&& defined($molecularweight[$index])
&& defined($pi[$index])) {
# print"$identifier[$index]\t:\t$qualifier[$index]:\t$molecularweight[$index]:\n$pi[$index]";
}
if ( defined($identifier[$index])
&& defined($qualifier[$index])
&& defined($pi[$index])) {
if (abs($pi[$index] - 7) > $extreme && $qualifier[$index] eq "Verified")
{
$extreme = abs($pi[$index] - 7);
$ex_index = $identifier[$index];
print $extreme. " " . $ex_index . "\n";
}
}
}
print $extreme;
print "\n";
print $ex_index;
print "\n";
# the part above does part b of the assignment
# YLR204W,its part of the Mitochondrial inner membrane protein as well as a processor.
my $exindex = 0;
my $high = 0;
# two lines above and below is part c
# there is an error and I know there is something wrong
for (my $index = 1; $index < 6805; $index++) {
if ( defined($qualifier[$index])
&& ($qualifier[$index]) eq "Verified"
&& defined($molecularweight[$index])
&& (abs($molecularweight[$index]) > $high)) {
$high = (abs($molecularweight[$index]) > $high); # something wrong on this line, I know I wrote something wrong
$exindex = $identifier[$index];
}
}
print $high;
print "\n";
print $exindex;
print "\n";
close($proteins);
exit;
On the final loop I want my loop to hold on to the protein that is verified and has the highest molecular mass. This is in the input file. What code can I use to tell the program that I want to hold the highest number and its name? I feel like I am very close but I can't put my finger on it.
First, a note about perl - in general, it's more common to use foreach style loops rather than c-style indexed loops. For example:
for my $protein (#proteins) {
#do something with $p
}
(Your situation might require it, I just thought I'd mention this)
To address your specific issue though:
$high = (abs($molecularweight[$index])>$high);
$high is being set to the result of the boolean test being performed. Remove the >$high part (which is being tested in your if statement) and you'll likely end up with what you intended.
You likely want a more complex data structure, such as a nested hash. It's hard to give a solid example without more knowledge of the data, but, say your first identifier were abc, the second one was def, etc:
my %protein_entries = (
abc => {
qualifier => 'something',
molecular_weight => 1234,
pi => 'something',
},
def => {
qualifier => 'something else',
molecular_weight => 5678,
pi => 'something else',
},
# …
);
Then, rather than having several different arrays and keeping track of which belongs to which, you get at the elements like so:
Then, if you want to get at the highest by molecular weight, you can sort the identifiers by their molecular weight, then slice off the highest one:
my $highest = (sort {
$protein_entries{$a}{molecular_weight}
<=>
$protein_entries{$b}{molecular_weight}
} keys %protein_entries)[1];
You're having problem with your algorithm because you're not structuring your data properly, basically.
In this example, $highest will hold def, then later you can go back and fetch $protein_entries{def}{molecular_weight} or any of the other keys in the anonymous hash referenced by $protein_entries{def}, thus being easily able to recall any relevant associated data.
Just change:
$high = (abs($molecularweight[$index]) > $high);
To this:
$high = abs($molecularweight[$index]) if (abs($molecularweight[$index]) > $high);
At the end of the loop, $high will be the highest value in $molecularweight array.

find extra, missing, invalid strings when comparing two lists in perl

List-1 List-2
one one
two three
three three
four four
five six
six seven
eight eighttt
nine nine
Looking to output
one | one PASS
two | * FAIL MISSING
three | three PASS
* | three FAIL EXTRA
four | four PASS
five | * FAIL MISSING
six | six PASS
* | seven FAIL EXTRA
eight | eighttt FAIL INVALID
nine | nine PASS
Actually the return from my current solution is a reference to the two modified lists and a reference to a "fail" list describing the failure for the index as either "no fail", "missing", "extra", or "invalid" which is also (obviously) fine output.
My current solution is:
sub compare {
local $thisfound = shift;
local $thatfound = shift;
local #thisorig = #{ $thisfound };
local #thatorig = #{ $thatfound };
local $best = 9999;
foreach $n (1..6) {
local $diff = 0;
local #thisfound = #thisorig;
local #thatfound = #thatorig;
local #fail = ();
for (local $i=0;$i<scalar(#thisfound) || $i<scalar(#thatfound);$i++) {
if($thisfound[$i] eq $thatfound[$i]) {
$fail[$i] = 'NO_FAIL';
next;
}
if($n == 1) { # 1 2 3
next unless __compare_missing__();
next unless __compare_extra__();
next unless __compare_invalid__();
} elsif($n == 2) { # 1 3 2
next unless __compare_missing__();
next unless __compare_invalid__();
next unless __compare_extra__();
} elsif($n == 3) { # 2 1 3
next unless __compare_extra__();
next unless __compare_missing__();
next unless __compare_invalid__();
} elsif($n == 4) { # 2 3 1
next unless __compare_extra__();
next unless __compare_invalid__();
next unless __compare_missing__();
} elsif($n == 5) { # 3 1 2
next unless __compare_invalid__();
next unless __compare_missing__();
next unless __compare_extra__();
} elsif($n == 6) { # 3 2 1
next unless __compare_invalid__();
next unless __compare_extra__();
next unless __compare_missing__();
}
push #fail,'INVALID';
$diff += 1;
}
if ($diff<$best) {
$best = $diff;
#thisbest = #thisfound;
#thatbest = #thatfound;
#failbest = #fail;
}
}
return (\#thisbest,\#thatbest,\#failbest)
}
sub __compare_missing__ {
my $j;
### Does that command match a later this command? ###
### If so most likely a MISSING command ###
for($j=$i+1;$j<scalar(#thisfound);$j++) {
if($thisfound[$j] eq $thatfound[$i]) {
$diff += $j-$i;
for ($i..$j-1) { push(#fail,'MISSING'); }
#end = #thatfound[$i..$#thatfound];
#thatfound = #thatfound[0..$i-1];
for ($i..$j-1) { push(#thatfound,'*'); }
push(#thatfound,#end);
$i=$j-1;
last;
}
}
$j == scalar(#thisfound);
}
sub __compare_extra__ {
my $j;
### Does this command match a later that command? ###
### If so, most likely an EXTRA command ###
for($j=$i+1;$j<scalar(#thatfound);$j++) {
if($thatfound[$j] eq $thisfound[$i]) {
$diff += $j-$i;
for ($i..$j-1) { push(#fail,'EXTRA'); }
#end = #thisfound[$i..$#thisfound];
#thisfound = #thisfound[0..$i-1];
for ($i..$j-1) { push (#thisfound,'*'); }
push(#thisfound,#end);
$i=$j-1;
last;
}
}
$j == scalar(#thatfound);
}
sub __compare_invalid__ {
my $j;
### Do later commands match? ###
### If so most likely an INVALID command ###
for($j=$i+1;$j<scalar(#thisfound);$j++) {
if($thisfound[$j] eq $thatfound[$j]) {
$diff += $j-$i;
for ($i..$j-1) { push(#fail,'INVALID'); }
$i=$j-1;
last;
}
}
$j == scalar(#thisfound);
}
But this isn't perfect ... who wants to simplify and improve? Specifically ... within a single data set, one order of searching is better for a subset and another order is better for a different subset.
If the arrays contain duplicate values, the answer is quite a bit more complicated than that.
See e.g. Algorithm::Diff or read about Levenshtein distance.
From perlfaq4's answer to How can I tell whether a certain element is contained in a list or array?:
(portions of this answer contributed by Anno Siegel and brian d foy)
Hearing the word "in" is an indication that you probably should have used a hash, not a list or array, to store your data. Hashes are designed to answer this question quickly and efficiently. Arrays aren't.
That being said, there are several ways to approach this. In Perl 5.10 and later, you can use the smart match operator to check that an item is contained in an array or a hash:
use 5.010;
if( $item ~~ #array )
{
say "The array contains $item"
}
if( $item ~~ %hash )
{
say "The hash contains $item"
}
With earlier versions of Perl, you have to do a bit more work. If you are going to make this query many times over arbitrary string values, the fastest way is probably to invert the original array and maintain a hash whose keys are the first array's values:
#blues = qw/azure cerulean teal turquoise lapis-lazuli/;
%is_blue = ();
for (#blues) { $is_blue{$_} = 1 }
Now you can check whether $is_blue{$some_color}. It might have been a good idea to keep the blues all in a hash in the first place.
If the values are all small integers, you could use a simple indexed array. This kind of an array will take up less space:
#primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
#is_tiny_prime = ();
for (#primes) { $is_tiny_prime[$_] = 1 }
# or simply #istiny_prime[#primes] = (1) x #primes;
Now you check whether $is_tiny_prime[$some_number].
If the values in question are integers instead of strings, you can save quite a lot of space by using bit strings instead:
#articles = ( 1..10, 150..2000, 2017 );
undef $read;
for (#articles) { vec($read,$_,1) = 1 }
Now check whether vec($read,$n,1) is true for some $n.
These methods guarantee fast individual tests but require a re-organization of the original list or array. They only pay off if you have to test multiple values against the same array.
If you are testing only once, the standard module List::Util exports the function first for this purpose. It works by stopping once it finds the element. It's written in C for speed, and its Perl equivalent looks like this subroutine:
sub first (&#) {
my $code = shift;
foreach (#_) {
return $_ if &{$code}();
}
undef;
}
If speed is of little concern, the common idiom uses grep in scalar context (which returns the number of items that passed its condition) to traverse the entire list. This does have the benefit of telling you how many matches it found, though.
my $is_there = grep $_ eq $whatever, #array;
If you want to actually extract the matching elements, simply use grep in list context.
my #matches = grep $_ eq $whatever, #array;
sub compare {
local #d = ();
my $this = shift;
my $that = shift;
my $distance = _levenshteindistance($this, $that);
my #thisorig = #{ $this };
my #thatorig = #{ $that };
my $s = $#thisorig;
my $t = $#thatorig;
#this = ();
#that = ();
#fail = ();
while($s>0 || $t>0) {
# deletion, insertion, substitution
my $min = _minimum($d[$s-1][$t],$d[$s][$t-1],$d[$s-1][$t-1]);
if($min == $d[$s-1][$t-1]) {
unshift(#this,$thisorig[$s]);
unshift(#that,$thatorig[$t]);
if($d[$s][$t] > $d[$s-1][$t-1]) {
unshift(#fail,'INVALID');
} else {
unshift(#fail,'NO_FAIL');
}
$s -= 1;
$t -= 1;
} elsif($min == $d[$s][$t-1]) {
unshift(#this,'*');
unshift(#that,$thatorig[$t]);
unshift(#fail,'EXTRA');
$t -= 1;
} elsif($min == $d[$s-1][$t]) {
unshift(#this,$thisorig[$s]);
unshift(#that,'*');
unshift(#fail,'MISSING');
$s -= 1;
} else {
die("Error! $!");
}
}
return(\#this, \#that, \#fail);
}
sub _minimum {
my $ret = 2**53;
foreach $in (#_) {
$ret = $ret < $in ? $ret : $in;
}
$ret;
}
sub _levenshteindistance {
my $s = shift;
my $t = shift;
my #s = #{ $s };
my #t = #{ $t };
for(my $i=0;$i<scalar(#s);$i++) {
$d[$i] = ();
}
for(my $i=0;$i<scalar(#s);$i++) {
$d[$i][0] = $i # deletion
}
for(my $j=0;$j<scalar(#t);$j++) {
$d[0][$j] = $j # insertion
}
for(my $j=1;$j<scalar(#t);$j++) {
for(my $i=1;$i<scalar(#s);$i++) {
if ($s[$i] eq $t[$j]) {
$d[$i][$j] = $d[$i-1][$j-1];
} else {
# deletion, insertion, substitution
$d[$i][$j] = _minimum($d[$i-1][$j]+1,$d[$i][$j-1]+1,$d[$i-1][$j-1]+1);
}
}
}
foreach $a (#d) {
#a = #{ $a };
foreach $b (#a) {
printf STDERR "%2d ",$b;
}
print STDERR "\n";
}
return $d[$#s][$#t];
}
The trick in Perl (and similar languages) is the hash, which doesn't care about order.
Suppose that the first array is the one that hold the valid elements. Construct a hash with those values as keys:
my #valid = qw( one two ... );
my %valid = map { $_, 1 } #valid;
Now, to find the invalid elements, you just have to find the ones not in the %valid hash:
my #invalid = grep { ! exists $valid{$_} } #array;
If you want to know the array indices of the invalid elements:
my #invalid_indices = grep { ! exists $valid{$_} } 0 .. $#array;
Now, you can expand that to find the repeated elements too. Not only do you check the %valid hash, but also keep track of what you have already seen:
my %Seen;
my #invalid_indices = grep { ! exists $valid{$_} && ! $Seen{$_}++ } 0 .. $#array;
The repeated valid elements are the ones with a value in %Seen that is greater than 1:
my #repeated_valid = grep { $Seen{$_} > 1 } #valid;
To find the missing elements, you look in %Seen to check what isn't in there.
my #missing = grep { ! $Seen{$_ } } #valid;
From perlfaq4's answer to How do I compute the difference of two arrays? How do I compute the intersection of two arrays?:
Use a hash. Here's code to do both and more. It assumes that each element is unique in a given array:
#union = #intersection = #difference = ();
%count = ();
foreach $element (#array1, #array2) { $count{$element}++ }
foreach $element (keys %count) {
push #union, $element;
push #{ $count{$element} > 1 ? \#intersection : \#difference }, $element;
}
Note that this is the symmetric difference, that is, all elements in either A or in B but not in both. Think of it as an xor operation.