Why does my Perl for loop exit early? - perl

I am trying to get a perl loop to work that is working from an array that contains 6 elements. I want the loop to pull out two elements from the array, perform certain functions, and then loop back and pull out the next two elements from the array until the array runs out of elements. Problem is that the loop only pulls out the first two elements and then stops. Some help here would be greatly apperaciated.
my open(infile, 'dnadata.txt');
my #data = < infile>;
chomp #data;
#print #data; #Debug
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
my $i=0;
my $j=0;
my #matrix =();
for(my $i=0; $i<2; $i++){
for( my $j=0; $j<$aalen; $j++){
$matrix[$i][$j] = 0;
}
}
The guidelines for this program states that the program should ignore the presence of gaps in the program. which means that DNA code that is matched up with a gap should be ignored. So the code that is pushed through needs to have alignments linked with gaps removed.
I need to modify the length of the array by two since I am comparing two sequence in this part of the loop.
#$lemseqcomp = $lenarray / 2;
#print $lenseqcomp;
#I need to initialize these saclar values.
$junk1 = " ";
$junk2 = " ";
$seq1 = " ";
$seq2 = " ";
This is the loop that is causeing issues. I belive that the first loop should move back to the array and pull out the next element each time it loops but it doesn't.
for($i=0; $i<$lenarray; $i++){
#This code should remove the the last value of the array once and
#then a second time. The sequences should be the same length at this point.
my $last1 =pop(#data1);
my $last2 =pop(#data1);
for($i=0; $i<length($last1); $i++){
my $letter1 = substr($last1, $i, 1);
my $letter2 = substr($last2, $i, 1);
if(($letter1 eq '-')|| ($letter2 eq '-')){
#I need to put the sequences I am getting rid of somewhere. Here is a good place as any.
$junk1 = $letter1 . $junk1;
$junk2 = $letter1 . $junk2;
}
else{
$seq1 = $letter1 . $seq1;
$seq2 = $letter2 . $seq2;
}
}
}
print "$seq1\n";
print "$seq2\n";
print "#data1\n";
I am actually trying to create a substitution matrix from scratch and return the data. The reason why the code looks weird, is because it isn't actually finished yet and I got stuck.
This is the test sequence if anyone is curious.
YFRFR
YF-FR
FRFRFR
ARFRFR
YFYFR-F
YFRFRYF

First off, if you're going to work with sequence data, use BioPerl. Life will be so much easier. However...
Since you know you'll be comparing the lines from your input file as pairs, it makes sense to read them into a datastructure that reflects that. As elsewhere suggested, an array like #data[[line1, line2],[line3,line4]) ensures that the correct pairs of lines are always together.
What I'm not clear on what you're trying to do is:
a) are you generating a consensus
sequence where the 2 sequences are
difference only by gaps
b) are your 2 sequences significantly
different and you're trying to
exclude the non-aligning parts and
then generate a consensus?
So, does the first pair represent your data, or is it more like the second?
ATCG---AAActctgGGGGG--taGC
ATCGcccAAActctgGGGGGTTtaGC
ATCG---AAActctgGGGGG--taGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
ATCGcccAAActctgGGGGGTTtaGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

The problem is that you're using $i as the counter variable for both your loops, so the inner loop modifies the counter out from under the outer loop. Try changing the inner loop's counter to $j, or using my to localize them properly.

Don't store your values as an array, store as a two-dimensional array:
my #dataset = ([$val1, $val2], [$val3, $val4]);
or
my #dataset;
push (#dataset, [$val_n1, $val_n2]);
Then:
for my $value (#dataset) {
### Do stuff with $value->[0] and $value->[1]
}

There are lots of strange things in your code: you are initializing a matrix then not using it; reading a whole file into an array; scanning a string C style but then not doing anything with the unmatched values; and finally, just printing the two last processed values (which, in your case, are the two first elements of your array, since you are using pop.)
Here's a guess.
use strict;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
# Preparing a regular expression. This is kind of useful if processing large
# amounts of data. This will match anything that is not in the string above.
my $regex = qr([^$aminoacids]);
# Our work function.
sub do_something {
my ($a, $b) = #_;
$a =~ s/$regex//g; # removing unwanted characters
$b =~ s/$regex//g; # ditto
# Printing, saving, whatever...
print "Something: $a - $b\n";
return ($a, $b);
}
my $prev;
while (<>) {
chomp;
if ($prev) {
do_something($prev, $_);
$prev = undef;
} else {
$prev = $_;
}
}
print STDERR "Warning: trailing data: $prev\n"
if $prev;

Since you are a total Perl/programming newbie, I am going to show a rewrite of your first code block, then I'll offer you some general advice and links.
Let's look at your first block of sample code. There is a lot of stuff all strung together, and it's hard to follow. I, personally, am too dumb to remember more than a few things at a time, so I chop problems into small pieces that I can understand. This is (was) known as 'chunking'.
One easy way to chunk your program is use write subroutines. Take any particular action or idea that is likely to be repeated or would make the current section of code long and hard to understand, and wrap it up into a nice neat package and get it out of the way.
It also helps if you add space to your code to make it easier to read. Your mind is already struggling to grok the code soup, why make things harder than necessary? Grouping like things, using _ in names, blank lines and indentation all help. There are also conventions that can help, like making constant values (values that cannot or should not change) all capital letters.
use strict; # Using strict will help catch errors.
use warnings; # ditto for warnings.
use diagnostics; # diagnostics will help you understand the error messages
# Put constants at the top of your program.
# It makes them easy to find, and change as needed.
my $AMINO_ACIDS = 'ARNDCQEGHILKMFPSTWYV';
my $AMINO_COUNT = length($AMINO_ACIDS);
my $DATA_FILE = 'dnadata.txt';
# Here I am using subroutines to encapsulate complexity:
my #data = read_data_file( $DATA_FILE );
my #matrix = initialize_matrix( 2, $amino_count, 0 );
# now we are done with the first block of code and can do more stuff
...
# This section down here looks kind of big, but it is mostly comments.
# Remove the didactic comments and suddenly the code is much more compact.
# Here are the actual subs that I abstracted out above.
# It helps to document your subs:
# - what they do
# - what arguments they take
# - what they return
# Read a data file and returns an array of dna strings read from the file.
#
# Arguments
# data_file => path to the data file to read
sub read_data_file {
my $data_file = shift;
# Here I am using a 3 argument open, and a lexical filehandle.
open( my $infile, '<', $data_file )
or die "Unable to open dnadata.txt - $!\n";
# I've left slurping the whole file intact, even though it can be very inefficient.
# Other times it is just what the doctor ordered.
my #data = <$infile>;
chomp #data;
# I return the data array rather than a reference
# to keep things simple since you are just learning.
#
# In my code, I'd pass a reference.
return #data;
}
# Initialize a matrix (or 2-d array) with a specified value.
#
# Arguments
# $i => width of matrix
# $j => height of matrix
# $value => initial value
sub initialize_matrix {
my $i = shift;
my $j = shift;
my $value = shift;
# I use two powerful perlisms here: map and the range operator.
#
# map is a list contsruction function that is very very powerful.
# it calls the code in brackets for each member of the the list it operates against.
# Think of it as a for loop that keeps the result of each iteration,
# and then builds an array out of the results.
#
# The range operator `..` creates a list of intervening values. For example:
# (1..5) is the same as (1, 2, 3, 4, 5)
my #matrix = map {
[ ($value) x $i ]
} 1..$j;
# So here we make a list of numbers from 1 to $j.
# For each member of the list we
# create an anonymous array containing a list of $i copies of $value.
# Then we add the anonymous array to the matrix.
return #matrix;
}
Now that the code rewrite is done, here are some links:
Here's a response I wrote titled "How to write a program". It offers some basic guidelines on how to approach writing software projects from specification. It is aimed at beginners. I hope you find it helpful. If nothing else, the links in it should be handy.
For a beginning programmer, beginning with Perl, there is no better book than Learning Perl.
I also recommend heading over to Perlmonks for Perl help and mentoring. It is an active Perl specific community site with very smart, friendly people who are happy to help you. Kind of like Stack Overflow, but more focused.
Good luck!

Instead of using a C-style for loop, you can read data from an array two elements at a time using splice inside a while loop:
while (my ($letter1, $letter2) = splice(#data, 0, 2))
{
# stuff...
}
I've cleaned up some of your other code below:
use strict;
use warnings;
open(my $infile, '<', 'dnadata.txt');
my #data = <$infile>;
close $infile;
chomp #data;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
# initialize a 2 x 21 array for holding the amino acid data
my $matrix;
foreach my $i (0 .. 1)
{
foreach my $j (0 .. $aalen-1)
{
$matrix->[$i][$j] = 0;
}
}
# Process all letters in the DNA data
while (my ($letter1, $letter2) = splice(#data, 0, 2))
{
# do something... not sure what?
# you appear to want to look up the letters in a reference table, perhaps $aminoacids?
}

Related

How to find common parts in a paths with perl?

Having several paths, like:
1: /abc/def/some/common/part/xyz/file1.ext
2: /other/path/to/7433/qwe/some/common/part/anotherfile.ext
3: /misc/path/7433/qwe/some/common/part/filexx.ext
4: /2443/totally/different/path/file9988.ext
5: /abc/another/same/path/to/ppp/thisfile.ext
6: /deep1/deep2/another/same/path/to/diffone/filename.ext
I need find the common parts - each possible ones, eg. in the above if possible to found common parts:
/some/common/part/ - in the paths 1,2,3
/another/same/path/to/ - in the 5,6
/path/to/ - in the 2,5,6
/path/ - 2,3,4,5,6
etc..
I simply absoulutely haven't any idea how to solve this - what approach is good one
string based - somewhat find common parts of a string
list based - splitting all path into lists and somewhat compare arrays for common elements
tree-graph - somewhat find a common parts of a graph
other?
When i get some direction how to solve this problem, I'm (probably) able code it myself - so don't want free programmming service - but need some guiding how to start.
I'm sure than here is already some CPAN module what could help me, but I'm really have'nt idea how to find the right useful module from the list of 30k modules for the above problem. :(
EDIT - For what i need this:
Having approx. 200k files, in 10k directories and many of them "belong together", like:
/u/some/path/project1/subprojct/file1
/u/backup/of/work/date/project1/subproject/file2
/u/backup_of_backup/of/work/date/project1/subproject/file2
/u/new/addtions/to/projec1/subproject/file3
The files are dirrerent kind (pdf, images, doc, txt and so on), several are identical (like above file2 - easy to filter with Digest::MD5), but the only way "group them together" is based on "common parts" of a path - e.g. "project1/subproject" and so on..
Another files HAS the same MD5, so can filter out duplicates, but they are in different trees, like
/u/path/some/file
/u/path/lastest_project/menu/file
/u/path/jquery/menu/file
/u/path/example/solution/jquery/menu/file
so, the files are the same, (identical md5) but need somewhat move one copy to the right place (and delete others) and need somewhat determine the "most used" common paths, and collect tags... (old path elements are tags)
The idea behind is:
if the same md5 files are mostly stored under some common path - I can make a decision where to move one copy...
And it is more complicated, but for explanation is enough the above ;)
Simply need lowering the entropy on my HDD ;)
There is some discussion about finding the longest common consecutive substrings in this thread: http://www.nntp.perl.org/group/perl.fwp/2002/02/msg1662.html
The "winner" appears to be the following code, but there are a few other things in there you could try:
#!/usr/bin/perl
use strict;
use warnings;
sub lcs {
my $this = shift;
my $that = shift;
my $str = join "\0", $this, $that;
my $len = 1;
my $lcs;
while ($str =~ m{ ([^\0]{$len,}) (?= [^\0]* \0 [^\0]*? \1 ) }xg) {
$lcs = $1;
$len = 1 + length($1);
}
if ($len == 1) { print("No common substring\n"); }
else {
print("Longest common substring of length $len: \"");
print("$lcs");
print("\"\n");
}
}
Keep in mind you would have to adjust it a little bit to account for the fact that you only want entire subdirectories that match... ie, change if ($len == 1) to something like if ($len == 1 or $lcs !~ /^\// or $lcs !~ /\/$/)
You would also have to add some bookkeeping to keep track of which ones match. When I ran this code on your examples above it also found the /abc/ match in lines 1 & 5.
One thing that may or may not be a problem is that the following two lines:
/abc/another/same/path/to/ppp/thisfile.ext
/abc/another/different/path/to/ppp/otherfile.ext
Would match on:
/abc/another/
But not on:
/path/to/ppp/
But -- here's the bad news -- you will have to do O(n^2) comparisons with n=200,000 files. That could take an obscene amount of time.
Another solution would be to go through each path in your list, add all of its possible directory paths as keys to a hash and push the file itself to the hash (so that the value is an array of files that have this path in it). Something like this:
use strict;
use warnings;
my %links;
open my $fh, "<", 'filename' or die "Can't open $!";
while (my $line = <$fh>) {
chomp($line);
my #dirs = split /\//, $line;
for my $i (0..$#dirs) {
if ($i == $#dirs) {
push(#{ $links{$dirs[$i]} }, $line);
}
for my $j ($i+1..$#dirs) {
push(#{ $links{join("/",#dirs[$i..$j])} }, $line);
#PROCESS THIS if length of array is > 1
}
}
}
Of course, this would take an obscene amount of memory. With 200,000 files to process, you might have a hard time no matter what you try, but maybe you can break it up into more manageable chunks. Hopefully, this will give you a starting point.
To solve this problem, you need the correct data structure. A hash that counts the partial paths works well:
use File::Spec;
my %Count_of = ();
while( <DATA> ){
my #names = File::Spec->splitdir( $_ );
# remove file
pop #names;
# if absolute path, remove empty names at start
shift #names while length( $names[0] ) == 0;
# don't count blank lines
next unless #names;
# move two cursor thru the names,
# and count the partial parts
# created from one to the other
for my $i ( 0 .. $#names ){
for my $j ( $i .. $#names ){
my $partial_path = File::Spec->catdir( #names[ $i .. $j ] );
$Count_of{ $partial_path } ++;
}
}
}
# now display the results
for my $path ( sort { $Count_of{$b} <=> $Count_of{$a} || $a cmp $b } keys %Count_of ){
# skip if singleton.
next if $Count_of{ $path } <= 1;
printf "%3d : %s\n", $Count_of{ $path }, $path;
}
__DATA__
/abc/def/some/common/part/xyz/file1.ext
/other/path/to/7433/qwe/some/common/part/anotherfile.ext
/misc/path/7433/qwe/some/common/part/filexx.ext
/2443/totally/different/path/file9988.ext
/abc/another/same/path/to/ppp/thisfile.ext
/deep1/deep2/another/same/path/to/diffone/filename.ext

Syntax errors at line 24 and 26. I don't know why?

syntax error at bioinfo2.pl line 24, near ");"
syntax error at bioinfo2.pl line 26, near "}"
Execution of bioinfo2.pl aborted due to compilation errors.
print "Enter file name......\n\n";
chomp($samplefile = <STDIN>);
open(INFILE,"$samplefile") or die "Could not open $samplefile";
#residue_name= ();
#residue_count= ();
while($newline = <INFILE>)
{
if ($newline =~ /^ATOM/)
{
chomp $newline;
#columns = split //, $newline;
$res = join '', $columns[17], $columns[18], $columns[19];
splice #columns,0;
$flag=0
for ($i = 0; $i<scalar(#residue_name); $i++;)
{
if (#residue_name[i] == $res)
{
#residue_count[i] = #residue_count[i] + 1;
$flag=1;
}
}
if($flag==0)
{
push(#residue_name, $res);
}
for ($i = 0; $i<scalar(#residue_name); $i++)
{
print (#residue_name[i], "-------", #residue_count[i], "\n");
}
}
}
It might be advisable to use strict; use warnings. That forces you to declare your variables (you can do so with my), and rules out many possible errors.
Here are a few things that I noticed:
In Perl5 v10 and later, you can use the say function (use 5.010 or use feature 'say'). This works like print but adds a newline at the end.
Never use the two-arg form of open. This opens some security issues. Provide an explicit open mode. Also, you can use scalars as filehandles; this provides nice features like auto-closing of files.
open my $INFILE, '<', $samplefile or die "Can't open $samplefile: $!";
The $! variable contains the reason why the open failed.
If you want to retrieve a list of elements from an array, you can use a slice (multiple subscripts):
my $res = join '', #columns[17 .. 19]; # also, range operator ".."
Note that the sigil is now an #, because we take multiple elems.
The splice #columns, 0 is a fancy way of saying “delete all elements from the array, and return them”. This is not neccessary (you don't read from that variable later). If you use lexical variables (declared with my), then each iteration of the while loop will receive a new variable. If you really want to remove the contents, you can undef #columns. This should be more efficient.
Actual error: You require a semicolon after $flag = 0 to terminate the statement before you can begin a loop.
Actual error: A C-style for-loop contains three expressions contained in parens. Your last semicolon divides them into 4 expressions, this is an error. Simply remove it, or look at my next tip:
C-style loops (for (foo; bar; baz) {}) are painful and error-prone. If you only iterate over a range (e.g. of indices), then you can use the range operator:
for my $i (0 .. $#residue_name) { ... }
The $# sigil gives the last index of an array.
When subscripting arrays (accessing array elements), then you have to include the sigil of the index:
$residue_name[$i]
Note that the sigil of the array is $, because we access only one element.
The pattern $var = $var + 1 can be shortened to $var++. This uses the increment operator.
The $flag == 0 could be abbreviated to !$flag, as all numbers except zero are considered true.
Here is a reimplementation of the script. It takes the filename as a command line argument; this is more flexible than prompting the user.
#!/usr/bin/perl
use strict; use warnings; use 5.010;
my $filename = $ARGV[0]; # #ARGV holds the command line args
open my $fh, "<", $filename or die "Can't open $filename: $!";
my #residue_name;
my #residue_count;
while(<$fh>) { # read into "$_" special variable
next unless /^ATOM/; # start a new iteration if regex doesn't match
my $number = join "", (split //)[17 .. 19]; # who needs temp variables?
my $push_number = 1; # self-documenting variable names
for my $i (0 .. $#residue_name) {
if ($residue_name[$i] == $number) {
$residue_count[$i]++;
$push_number = 0;
}
}
push #residue_name, $number if $push_number;
# are you sure you want to print this after every input line?
# I'd rather put this outside the loop.
for my $i (0 .. $#residue_name) {
say $residue_name[$i], ("-" x 7), $residue_count[$i]; # "x" repetition operator
}
}
And here is an implementation that may be faster for large input files: We use hashes (lookup tables), instead of looping through arrays:
#!/usr/bin/perl
use strict; use warnings; use 5.010;
my $filename = $ARGV[0]; # #ARGV holds the command line args
open my $fh, "<", $filename or die "Can't open $filename: $!";
my %count_residue; # this hash maps the numbers to counts
# automatically guarantees that every number has one count only
while(<$fh>) { # read into "$_" special variable
next unless /^ATOM/; # start a new iteration if regex doesn't match
my $number = join "", (split //)[17 .. 19]; # who needs temp variables?
if (exists $count_residue{$number}) {
# if we already have an entry for that number, we increment:
$count_residue{$number}++;
} else {
# We add the entry, and initialize to zero
$count_residue{$number} = 0;
}
# The above if/else initializes new numbers (seen once) to zero.
# If you want to count starting with one, replace the whole if/else by
# $count_residue{$number}++;
# print out all registered residues in numerically ascending order.
# If you want to sort them by their count, descending, then use
# sort { $count_residue{$b} <=> $count_residue{$a} } ...
for my $num (sort {$a <=> $b} keys %count_residue) {
say $num, ("-" x 7), $count_residue{$num};
}
}
It took me a while to chance down all the various errors. As others have said, use use warnings; and use strict;
Rule #1: Whenever you see syntax error pointing to a perfectly good line, you should always see if the line before is missing a semicolon. You forgot the semicolon after $flag=0.
In order to track down all the issues, I've rewritten your code into a more modern syntax:
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;
print "Enter file name......\n\n";
chomp (my $samplefile = <STDIN>);
open my $input_file, '<:crlf', $samplefile;
my #residue_name;
my #residue_count;
while ( my $newline = <$input_file> ) {
chomp $newline;
next if $newline !~ /^ATOM/; #Eliminates the internal `if`
my #columns = split //, $newline;
my $res = join '', $columns[17], $columns[18], $columns[19];
my $flag = 0;
for my $i (0..$#residue_name) {
if ( $residue_name[$i] == $res ) {
$residue_count[$i]++;
$flag = 1;
}
}
if ( $flag == 0 ) {
push #residue_name, $res;
}
for my $i (0..$#residue_name) {
print "$residue_name[$i] ------- $residue_count[$i]\n";
}
}
close $input_file;
Here's a list of changes:
Lines 2 & 3: Always use use strict; and use warnings;. These will help you track down about 90% of your program errors.
Line 4: Use use autodie;. This will eliminate the need for checking whether a file opened or not.
Line 7 (and others): Using use strict; requires you to predeclare variables. Thus, you'll see my whenever a variable is first used.
Line 8: Use the three parameter open and use local variables for file handles instead of globs (i.e. $file_handle vs. FILE_HANDLE). The main reasons is that local variables are easier to pass into subroutines than globs.
Lines 9 & 10: No need to initialize the arrays, just declare them is enough.
Line 13: Always chomp as soon as you read in.
Line 14: Doing this eliminates an entire inner if statement that's embraces your entire while loop. Code blocks (such as if, while, and for) get hard to figure out when they get too long and too many embedded inside each other. Using next in this way allows me to eliminate the if block.
Line 17: Here's where you missed the semicolon which gave you your first syntax error. The main thing is I eliminated the very confusing splice command. If you want to zero out your array, you could have simply said #columns = (); which is much clearer. However, since #columns is now in scope only in the while loop, I no longer have to blank it out since it will be redefined for each line of your file.
Line 18: This is a much cleaner way of looping through all lines of your array. Note that $#residue_name gives you the last index of $#residue_name while scalar #resudue_name gives you the number of elements. This is a very important distinction! If I have an #array = (0, 1, 2, 3, 4), $#array will be 4, but scalar #array will be 5. Using the C style for loop can be a bit confusing when doing this. Should you use > or >=? Using (0..$#residue) name is obvious and eliminate the chance of errors which included the extra semi-colon inside your C style for statement. Because of the chance of errors and the complexity of the syntax, The developers who created Python have decided not allow for C style for loops.
Line 19 (and others): Using warnings pointed out that you did #residue_name[i] and it had several issues. First of all, you should use $residue_name[...] when indexing an array, and second of all, i is not an integer. You meant $i. Thus #residue_name[i] becomes $residue_name[$i].
Line 20: If you're incrementing a variable, use $foo++; or $foo += 1; and not $foo = $foo + 1;. The first two make it easier to see that you're incrementing a variable and not recalculating it's value.
Line 29: One of the great features of Perl is that variables can be interpolated inside quotes. You can put everything inside a single set of quotes. By the way, you should use . and not , if you do break up a print statement into multiple pieces. The , is a list operation. This means that what you print out is dependent upon the value of $,. The $, is a Perl variable that says what to print out between each item of a list when you interpolate a list into a string.
Please don't take this as criticism of your coding abilities. Many Perl books that teach Perl, and many course that teach Perl seem to teach Perl as it was back in the Perl 3.0 days. When I first learned Perl, it was at Perl 3.0, and much of my syntax would have looked like yours. However, Perl 5.x has been out for quite a while and contains many features that made programming easier and cleaner to read.
It took me a while to get out of Perl 3.0 habits and into Perl 4.0 and later Perl 5.0 habits. You learn by looking at what others do, and asking questions on forums like Stack Overflow.
I still can't say your code will work. I don't have your input, so I can't test it against that. However, by using this code as the basis of your program, debugging these errors should be pretty easy.

Perl need the right grep operator to match value of variable

I want to see if I have repeated items in my array, there are over 16.000 so will automate it
There may be other ways but I started with this and, well, would like to finish it unless there is a straightforward command. What I am doing is shifting and pushing from one array into another and this way, check the destination array to see if it is "in array" (like there is such a command in PHP).
So, I got this sub routine and it works with literals, but it doesn't with variables. It is because of the 'eq' or whatever I should need. The 'sourcefile' will contain one or more of the words of the destination array.
// Here I just fetch my file
$listamails = <STDIN>;
# Remove the newlines filename
chomp $listamails;
# open the file, or exit
unless ( open(MAILS, $listamails) ) {
print "Cannot open file \"$listamails\"\n\n";
exit;
}
# Read the list of mails from the file, and store it
# into the array variable #sourcefile
#sourcefile = <MAILS>;
# Close the handle - we've read all the data into #sourcefile now.
close MAILS;
my #destination = ('hi', 'bye');
sub in_array
{
my ($destination,$search_for) = #_;
return grep {$search_for eq $_} #$destination;
}
for($i = 0; $i <=100; $i ++)
{
$elemento = shift #sourcefile;
if(in_array(\#destination, $elemento))
{
print "it is";
}
else
{
print "it aint there";
}
}
Well, if instead of including the $elemento in there I put a 'hi' it does work and also I have printed the value of $elemento which is also 'hi', but when I put the variable, it does not work, and that is because of the 'eq', but I don't know what else to put. If I put == it complains that 'hi' is not a numeric value.
When you want distinct values think hash.
my %seen;
#seen{ #array } = ();
if (keys %seen == #array) {
print "\#array has no duplicate values\n";
}
It's not clear what you want. If your first sentence is the only one that matters ("I want to see if I have repeated items in my array"), then you could use:
my %seen;
if (grep ++$seen{$_} >= 2, #array) {
say "Has duplicates";
}
You said you have a large array, so it might be faster to stop as soon as you find a duplicate.
my %seen;
for (#array) {
if (++$seen{$_} == 2) {
say "Has duplicates";
last;
}
}
By the way, when looking for duplicates in a large number of items, it's much faster to use a strategy based on sorting. After sorting the items, all duplicates will be right next to each other, so to tell if something is a duplicate, all you have to do is compare it with the previous one:
#sorted = sort #sourcefile;
for (my $i = 1; $i < #sorted; ++$i) { # Start at 1 because we'll check the previous one
print "$sorted[$i] is a duplicate!\n" if $sorted[$i] eq $sorted[$i - 1];
}
This will print multiple dupe messages if there are multiple dupes, but you can clean it up.
As eugene y said, hashes are definitely the way to go here. Here's a direct translation of the code you posted to a hash-based method (with a little more Perlishness added along the way):
my #destination = ('hi', 'bye');
my %in_array = map { $_ => 1 } #destination;
for my $i (0 .. 100) {
$elemento = shift #sourcefile;
if(exists $in_array{$elemento})
{
print "it is";
}
else
{
print "it aint there";
}
}
Also, if you mean to check all elements of #sourcefile (as opposed to testing the first 101 elements) against #destination, you should replace the for line with
while (#sourcefile) {
Also also, don't forget to chomp any values read from a file! Lines read from a file have a linebreak at the end of them (the \r\n or \n mentioned in comments on the initial question), which will cause both eq and hash lookups to report that otherwise-matching values are different. This is, most likely, the reason why your code is failing to work correctly in the first place and changing to use sort or hashes won't fix that. First chomp your input to make it work, then use sort or hashes to make it efficient.

Simple multi-dimensional array with loop in perl

I'm trying to use an array and a loop to print out the following (basically for each letter of the alphabet, print each letter of the alphabet after it and then move on to the next letter). I'm new to perl, anyone have any quick words of :
aa
ab
ac
ad
...
ba
bb
bc
bd
...
ca
cb
...
Currently I have this, but it only prints a single character alphabet...
#arr = ("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z");
$i = #arr;
while ($i)
{
print $arr[$i];
$i--;
}
Using the range operator and the ranges you want to target:
use strict;
use warnings;
my #elements = ("aa" .. "zz");
for my $combo (#elements)
{
print "$combo\n";
}
You can utilize the initial 2 letters till the ending 2 letters you want as ending and the for will take care of everything.
This really isn't multi-dimensional array work, if it were you'd be working with stuff like:
my #foo = (
[1,2,3],
[4,7,8,1,2,3],
[2,3],
);
This is really a very basic how do I make a nested loop that iterates over the same array. I'll bet this is homework.
So, I'll let you figure out the nesting bits, but give some help with Perl's loop operators.
!! for/foreach
for (the each is optional) is the real heavy hitter for looping in perl. Use it like so:
for my $var ( #array ) {
#do stuff with $var
}
Each element in #array will be aliased to the $var variable, and the block of code will be executed. The fact that we are aliasing, rather than copying means that if alter the value of $var, #array will be changed as well. The stuff between the parenthesis may be any expression. The expression will be evaluated in list context. So if you put a file handle in the parens, the entire file will be read into memory and processed.
You can also leave off naming the loop variable, and $_ will be used instead. In general, DO NOT DO THIS.
!! C-Style for
Every once in a while you need to keep track of indexes as you loop over an array. This is when a C style for loop comes in handy.
for( my $i=0; $i<#array; $i++ ) {
# do stuff with $array[$i]
}
!! While/Until
While and until operate with boolean loop conditions. That means that the loop will repeat as long as the appropriate boolean value if found for the condition ( TRUE for while, and FALSE for until). In addition to the obvious cases where you are looking for a particular condition, while is great for processing a file one line at a time.
while ( my $line = <$fh> ) {
# Do stuff with $line.
}
!! map
map is an amazingly useful bit of functional programming kung-fu. It is used to turn one list into another. You pass an anonymous code reference that is used to enact the transformation.
# Multiply all elements of #old by two and store them in #new.
my #new = map { $_ * 2 } #old;
So how do you solve your particular problem? There are many ways. Which is best depends on how you want to use the results. If you want to create a new array of the letter pairs, use map. If you are interested primarily in a side effect (say printing a variable) use for. If you need to work with really big lists that come from sort of interator (like lines from a filehandle) use while.
Here's a solution. I wouldn't turn it in to your professor until you understand how it works.
print map { my $letter=$_; map "$letter$_\n", "a".."z" } "a".."z";
Look at perldoc articles, perlsyn for info on the looping constructs, perlfunc for info on map and look at perlop for info on the range operator (..).
Good luck.
Use the range operator (..) for your initialization. The range operator basically grabs a range of values such as numbers or characters.
Then use a nested loop to go through the array one time per character for a total of 26^2 iterations.
Rather than a while loop I've used a foreach loop to go through each item in the array. You could also put 'a' .. 'z' instead of declared #arr as the argument to the foreach loop. The foreach loops below set $char or $char2 to each value in #arr in turn.
my #arr = ('a' .. 'z');
for my $char (#arr) {
for my $char2 (#arr) {
print "$char$char2\n";
}
}
If all you really want to do is print the 676 strings you describe, then:
#!/usr/bin/perl
use warnings;
use strict;
my $str = 'aa';
while (length $str < 3) {
print $str++, "\n";
}
But I smell an "XY problem"...

What does it mean when you try to print an array or hash using Perl and you get, Array(0xd3888)?

What does it mean when you try to print an array or hash and you see the following; Array(0xd3888) or HASH(0xd3978)?
EXAMPLE
CODE
my #data = (
['1_TEST','1_T','1_TESTER'],
['2_TEST','2_T','2_TESTER'],
['3_TEST','3_T','3_TESTER'],
['4_TEST','4_T','4_TESTER'],
['5_TEST','5_T','5_TESTER'],
['6_TEST','6_T','^_TESTER']
);
foreach my $line (#data) {
chomp($line);
#random = split(/\|/,$line);
print "".$random[0]."".$random[1]."".$random[2]."","\n";
}
RESULT
ARRAY(0xc1864)
ARRAY(0xd384c)
ARRAY(0xd3894)
ARRAY(0xd38d0)
ARRAY(0xd390c)
ARRAY(0xd3948)
It's hard to tell whether you meant it or not, but the reason why you're getting array references is because you're not printing what you think you are.
You started out right when iterating over the 'rows' of #data with:
foreach my $line (#data) { ... }
However, the next line is a no-go. It seems that you're confusing text strings with an array structure. Yes, each row contains strings, but Perl treats #data as an array, not a string.
split is used to convert strings to arrays. It doesn't operate on arrays! Same goes for chomp (with an irrelevant exception).
What you'll want to do is replace the contents of the foreach loop with the following:
foreach my $line (#data) {
print $line->[0].", ".$line->[1].", ".$line->[2]."\n";
}
You'll notice the -> notation, which is there for a reason. $line refers to an array. It is not an array itself. The -> arrows deference the array, allowing you access to individual elements of the array referenced by $line.
If you're not comfortable with the idea of deferencing with arrows (and most beginners usually aren't), you can create a temporary array as shown below and use that instead.
foreach my $line (#data) {
my #random = #{ $line };
print $random[0].", ".$random[1].", ".$random[2]."\n";
}
OUTPUT
1_TEST, 1_T, 1_TESTER
2_TEST, 2_T, 2_TESTER
3_TEST, 3_T, 3_TESTER
4_TEST, 4_T, 4_TESTER
5_TEST, 5_T, 5_TESTER
6_TEST, 6_T, ^_TESTER
A one-liner might go something like print "#$_\n" for #data; (which is a bit OTT), but if you want to just print the array to see what it looks like (say, for debugging purposes), I'd recommend using the Data::Dump module, which pretty-prints arrays and hashes for you without you having to worry about it too much.
Just put use Data::Dump 'dump'; at beginning of your script, and then dump #data;. As simple as that!
It means you do not have an array; you have a reference to an array.
Note that an array is specified with round brackets - as a list; when you use the square bracket notation, you are creating a reference to an array.
foreach my $line (#data)
{
my #array = #$line;
print "$array[0] - $array[1] - $array[2]\n";
}
Illustrating the difference:
my #data = (
['1_TEST','1_T','1_TESTER'],
['2_TEST','2_T','2_TESTER'],
['3_TEST','3_T','3_TESTER'],
['4_TEST','4_T','4_TESTER'],
['5_TEST','5_T','5_TESTER'],
['6_TEST','6_T','^_TESTER']
);
# Original print loop
foreach my $line (#data)
{
chomp($line);
#random = split(/\|/,$line);
print "".$random[0]."".$random[1]."".$random[2]."","\n";
}
# Revised print loop
foreach my $line (#data)
{
my #array = #$line;
print "$array[0] - $array[1] - $array[2]\n";
}
Output
ARRAY(0x62c0f8)
ARRAY(0x649db8)
ARRAY(0x649980)
ARRAY(0x649e48)
ARRAY(0x649ec0)
ARRAY(0x649f38)
1_TEST - 1_T - 1_TESTER
2_TEST - 2_T - 2_TESTER
3_TEST - 3_T - 3_TESTER
4_TEST - 4_T - 4_TESTER
5_TEST - 5_T - 5_TESTER
6_TEST - 6_T - ^_TESTER
You're printing a reference to the hash or array, rather than the contents OF that.
In the particular code you're describing I seem to recall that Perl automagically makes the foreach looping index variable (my $line in your code) into an "alias" (a sort of reference I guess) of the value at each stage through the loop.
So $line is a reference to #data[x] ... which is, at each iteration, some array. To get at one of the element of #data[0] you'd need the $ sigil (because the elements of the array at #data[0] are scalars). However $line[0] is a reference to some package/global variable that doesn't exist (use warnings; use strict; will tell you that, BTW).
[Edited after Ether pointed out my ignorance]
#data is a list of anonymous array references; each of which contains a list of scalars. Thus you have to use the sort of explicit de-referencing I describe below:
What you need is something more like:
print ${$line}[0], ${$line}[1], ${$line}[2], "\n";
... notice that the ${xxx}[0] is ensuring that the xxx is derefenced, then indexing is performed on the result of the dereference, which is then extracted as a scalar.
I testing this as well:
print $$line[0], $$line[1], $$line[2], "\n";
... and it seems to work. (However, I think that the first form is more clear, even if it's more verbose).
Personally I chalk this up to yet another gotchya in Perl.
[Further editorializing] I still count this as a "gotchya." Stuff like this, and the fact that most of the responses to this question have been technically correct while utterly failing to show any effort to actually help the original poster, has once again reminded me why I shifted to Python so many years ago. The code I posted works, of course, and probably accomplishes what the OP was attempting. My explanation was wholly wrong. I saw the word "alias" in the `perlsyn` man page and remembered that there were some funky semantics somewhere there; so I totally missed the part that [...] is creating an anonymous reference. Unless you drink from the Perl Kool-Aid in deep drafts then even the simplest code cannot be explained.