How to fix uninitialized value within @array in regexp compilation - perl

I am trying to run these lines of code:
@checkpoint = split (/\s+/,$array_lcp[0]);
$i=scalar @checkpoint;
print NAME " $checkpoint[0] ";
for ($k=0; $k<=$i; $k++)
{
if ($array_ARGVTEMP[$d] =~ m/$checkpoint[$k]/i)
{
@array = split (/\s+/,$array_ARGVTEMP[$d]);
print NAME " $checkpoint[$k]| $k||
$checkpoint[0]||| $checkpoint[1] ||||$checkpoint[2]||||| ";
} }
but the warnings in the output say:
"Use of uninitialized value within @checkpoint in regexp compilation at new3.pl line 64 (#2)" and line 64 is " if($array_ARGVTEMP[$d] =~ m/$checkpoint[$k]/i) "
Please help me. Thank you.

@checkpoint has $i elements numbered 0 to $i-1, but you're accessing the element at index $i.
for (my $k=0; $k<=$i; $k++)
should be
for (my $k=0; $k<$i; $k++)
Actually, it should be
for my $k (0..$#checkpoints)
Actually, it should be
for my $checkpoint (@checkpoints)

Update: The question changed. Originally the loop went to $k<=2 (the array size wasn't mentioned), which is what this answer addressed. The main point remains, and it is now clear that the loop runs up to the index equal to the array size, which is one past the last element; the limit should be $k < $i. Thus the "value of $k for which there is no element" mentioned below is the last one looped over.
The $checkpoint[$k] that draws the warning in the regex refers to elements of the array @checkpoint with indices 0, 1, 2, the values that $k takes in the loop.
The "uninitialized value within..." means that the array @checkpoint doesn't actually have all those elements, so for a value of $k for which there is no element the regex interpolates an undefined value and Perl warns.
The first split likely returned fewer than three elements. Print out @checkpoint to see.
A few more comments
Please always have use warnings; and use strict; at the beginning of a program
Use lexical filehandles, so open files with open my $name, ... (not open NAME, ...)
To loop over numbers from a range a nice way is
for my $k (0..2) { ... }
(update) ... but the question changed, with the loop (intended to be) over all array elements and then there is no reason to use the index. Iterate directly over elements
foreach my $checkpoint (@checkpoints) { .... }
Whenever you use \s+ for the separator pattern in split most likely you should be using a special pattern ' ', which splits by \s+ and also disregards leading and trailing space.
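To make the difference between the two separator patterns concrete, here is a minimal sketch. The input string is made up for illustration; only split itself comes from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical input line with leading and trailing whitespace.
my $line = "  alpha beta  gamma  ";

# /\s+/ keeps an empty leading field, because the string starts with a separator.
my @by_regex = split /\s+/, $line;

# The special pattern ' ' skips leading whitespace first, so there is no empty field.
my @by_space = split ' ', $line;

printf "regex: %d fields, first is '%s'\n", scalar @by_regex, $by_regex[0];
printf "space: %d fields, first is '%s'\n", scalar @by_space, $by_space[0];
```

With /\s+/ the first field is an empty string, which is the kind of element that can end up interpolated into a regex unexpectedly.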

Related

Handling two warnings about @ARGV

Small debug question I can't solve for some reason. Consider the following code:
use warnings;
my $flag = 0;
foreach my $i (0..scalar(@ARGV)) {
$data{$OPTION} .= $ARGV[$i]." " if($flag);
$flag = 1 if($ARGV[$i] =~ /$OPTION/);
undef $ARGV[$i] if($flag);
}
I get the following two warnings:
Use of uninitialized value within @ARGV in concatenation (.) or string at line 4
Use of uninitialized value in pattern match (m//) at line 5
I gather the reason is that I undefine some values of @ARGV and then the code tries to check them.
The reason I do it this way is that I would like to 'cut' some of the data out of @ARGV before using the GetOpt module (which uses this array).
How to solve it?
Let's expand on those comments a bit.
Imagine @ARGV contains four elements. They will have the indexes 0, 1, 2 and 3 (as arrays in Perl are zero-based).
And your loop looks like this:
foreach my $i (0..scalar(@ARGV)) {
You want to visit each element in @ARGV, so you use the range operator (..) to generate a list of all those indexes. But scalar @ARGV returns the number of elements in @ARGV and that's 4. So your range is 0 .. 4. And there's no value at $ARGV[4], so you get an "uninitialized value" warning (as you're trying to read past the end of an array).
A better way to do this is to use $#ARGV instead of scalar @ARGV. For every array variable in Perl (say @foo) you also get a variable (called $#foo) which contains the last index number in the array. In our case, that's 3, so your range (0 .. $#ARGV) now contains the integers 0 .. 3. You no longer try to read past the end of the array and you don't get the "uninitialized value" warnings.
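The two ranges can be compared directly in a short sketch. The array @args below is a hypothetical stand-in for @ARGV, so the example runs without command-line arguments.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @ARGV with four elements.
my @args = qw(-a one -b two);

my @too_far  = (0 .. scalar(@args));  # 0,1,2,3,4: index 4 does not exist
my @in_range = (0 .. $#args);         # 0,1,2,3: stops at the last index

print "0 .. scalar(\@args): @too_far\n";
print "0 .. \$#args:        @in_range\n";
print "element 4 is ", (defined $args[4] ? "defined" : "undef"), "\n";
```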
There's one other improvement I would suggest. Inside your loop, you only ever use $i to access an element from @ARGV. It's only used in expressions like $ARGV[$i]. In this case, it's probably better to skip the middle man and to iterate across the elements in the array, not the indexes.
I mean you can write your code like this:
foreach my $arg (@ARGV) {
$data{$OPTION} .= $arg . " " if($flag);
$flag = 1 if($arg =~ /$OPTION/);
undef $arg if($flag);
}
I think that's a little easier to follow.

In Perl, how can I tell split not to strip empty trailing fields?

Was trying to count the number of lines in a string of text (including empty lines). A little surprised by the behavior of split. Had expected the following to output 2 but it printed 1 on my perl 5.14.2.
$str = "hello\nworld\n\n";
@a = split(/\n/, $str);
print $#a, "\n";
Seems that split() is insensitive to consecutive \n (adding more \n's at the end of the string does not increase the printout). The only way I can get sort of close to the number of lines is
$str = "hello\nworld\n\n";
@a = split(/(\n)/, $str);
printf "%d\n", ($#a + 1)/2, "\n";
But it looks more like a workaround than a straight solution. Any ideas?
perldoc -f split:
If LIMIT is negative, it is treated as if it were instead
arbitrarily large; as many fields as possible are produced.
If LIMIT is omitted (or, equivalently, zero), then it is usually
treated as if it were instead negative but with the exception that
trailing empty fields are stripped (empty leading fields are
always preserved); if all fields are empty, then all fields are
considered to be trailing (and are thus stripped in this case).
$ perl -E 'my $x = "1\n2\n\n"; my @x = split /\n/, $x, -1; say $#x'
3
Perhaps the problem is that you are using $#a when scalar @a is what you are actually looking for?
I apologize if you are already aware of this or if this is not the issue, but $#a returns the index of the last element of @a and (scalar @a) returns the number of elements that @a contains. Since array indexing starts at 0, $#a is one less than scalar @a.
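Both points from the answers above (the negative LIMIT, and $#a versus scalar @a) can be seen side by side in a small sketch that uses the string from the question:

```perl
use strict;
use warnings;

my $str = "hello\nworld\n\n";

my @default = split /\n/, $str;       # trailing empty fields are stripped
my @keep    = split /\n/, $str, -1;   # negative LIMIT keeps trailing empty fields

printf "default:  %d fields, last index %d\n", scalar @default, $#default;
printf "limit -1: %d fields, last index %d\n", scalar @keep,    $#keep;
```

Note that with a limit of -1 the final field is the empty string after the last \n, so the field count is one more than the number of newline characters.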

Perl significance of $#_ variable

I see when I loop through elements of an array, and test $#_ , I get -1 for each element. I am hoping someone can explain what this variable does, and what it is used for most often.
Just like $#foo is the last existing index of array @foo, $#_ is the last existing index of array @_. If @_ is empty, $#_ is -1.
It sounds like you mean to use $_. $_ is aliased by foreach, map and grep loops to the element currently being processed. while (<>) also sets $_ (as it gets rewritten to while (defined($_ = <>))). As a result, $_ is used as the default argument by many builtins (e.g. say).
# Print each element on its own line
say for @a;
is short for
# Print each element on its own line
say $_ for @a;
which is the terse form of
# Print each element on its own line
for my $ele (@a) {
say $ele;
}
I believe you mean $_, which is a special variable in Perl. It holds the current element while looping through a list. For instance, the code below will print out each element of @foo, one at a time.
foreach (@foo) {
print $_;
}
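As a rough illustration of why the question saw -1 on every iteration, the sketch below contrasts $#_ and $_ . The subroutine names last_arg_index and demo_loop are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last index of its own argument list @_.
sub last_arg_index { return $#_ }

# Hypothetical helper: loops over a list and records $_ next to $#_.
sub demo_loop {
    my @items = qw(a b c);
    my @seen;
    for (@items) {
        # $_ is the current element; $#_ still refers to @_, not to @items.
        push @seen, join(':', $_, $#_);
    }
    return @seen;
}

my @seen = demo_loop();              # called with no arguments, so @_ is empty
print "@seen\n";
print last_arg_index(qw(x y z)), "\n";   # three arguments, so the last index is 2
```

Inside the loop $#_ never changes, because it describes the argument list of the enclosing subroutine rather than the list being iterated.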

PERL -- Regex incl all hash keys (sorted) + deleting empty fields from $_ in file read

I'm working on a program and I have a couple of questions, hope you can help:
First I need to access a file and retrieve specific information according to an index that is obtained from a previous step, in which the indexes to retrieve are found and store in a hash.
I've been looking for a way to include all array elements in a regex that I can use in the file search, but I haven't been able to make it work. Eventually I've found a way that works:
my @atoms = ();
my $natoms=0;
foreach my $atomi (keys %{$atome}){
push (@atoms,$atomi);
$natoms++;
}
@atoms = sort {$b cmp $a} @atoms;
and then I use it as a regex this way:
while (<IN_LIG>){
if (!$natoms) {last;}
......
if ($_ =~ m/^\s*$atoms[$natoms-1]\s+/){
$natoms--;
.....
}
Is there any way to create a regex expression that would include all hash keys? They are numeric and must be sorted. The keys refer to the line index in IN_LIG, whose content is something like this:
8 C5 9.9153 2.3814 -8.6988 C.ar 1 MLK -0.1500
The key is to be found in column 0 (8). I have added ^ and \s+ to make sure it refers only to the first column.
My second problem is that input files are not always identical and they may contain white space before the index, so when I create an array from $_ I get column0 = " " instead of column0=8
I don't understand why this "empty column" is not eliminated by the split command and I'm having some trouble removing it. This is what I have done:
@info = split (/[\s]+/,$_);
if ($info[0] eq " ") {splice (@info, 0,1);} # also tried $info[0] =~ m/\s+/
and when I print the array @info I get this:
Array:
Array: 8
Array: C5
Array: 9.9153
Array: 2.3814
.....
How can I get rid of the empty column?
Many thanks for your help
Merche
There is a special form of split where it will remove both leading and trailing spaces. It looks like this, try it:
my $line = ' begins with spaces and ends with spaces ';
my @tokens = split ' ', $line;
# This prints |begins:with:spaces:and:ends:with:spaces|
print "|", join(':', @tokens), "|\n";
See the documentation for split at http://p3rl.org/split (or with perldoc split)
Also, the first part of your program might be simpler as:
my @atoms = sort {$b cmp $a} keys %$atome;
my $natoms = @atoms;
But, what is your ultimate goal with the atoms? If you simply want to verify that the atoms you're given are indeed in the file, then you don't need to sort them, nor to count them:
my @atoms = keys %$atome;
while (<IN_LIG>){
# The atom ID on this line
my ($atom_id) = split ' ';
# Is this atom ID in the array of atom IDs that we are looking for
if (grep { $_ eq $atom_id } @atoms) { # exact comparison, so that 8 does not also match 18 or 80
# This line of the file has an atom that was in the array: $atom_id
}
}
Let's warm up by refining and correcting some of your code:
# If these are all numbers, do a numerical sort: <=> not cmp
my @atoms = ( sort { $b <=> $a } keys %{$atome} );
my $natoms = scalar @atoms;
No need to loop through the keys, you can insert them into the array right away. You can also sort them right away, and if they are numbers, the sort must be numerical, otherwise you will get a sort like: 1, 11, 111, 2, 22, 222, ...
$natoms can be assigned directly by the count of values in @atoms.
while(<IN_LIG>) {
last unless $natoms;
my $key = (split)[0]; # split splits on whitespace and $_ by default
$natoms-- if ($key == $atoms[$natoms - 1]);
}
I'm not quite sure what you are doing here, and if it is the best way, but this code should work, whereas your regex would not. Inside a regex, [] are meta characters. Split by default splits $_ on whitespace, so you need not be explicit about that. This split will also definitely remove all whitespace. Your empty field is most likely an empty string, '', and not a space ' '.
The best way to compare two numbers is not by a regex, but with the equality operator ==.
Your empty field should be gone by splitting on whitespace. The default for split is split ' '.
Also, if you are not already doing it, you should use:
use strict;
use warnings;
It will save you a lot of headaches.
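The points about numeric sorting and the default split can be sketched as follows. The index values and the sample line are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical atom indexes.
my @ids = (2, 111, 22, 1, 11);

my @as_strings = sort { $a cmp $b } @ids;   # string order: 1 11 111 2 22
my @as_numbers = sort { $a <=> $b } @ids;   # numeric order: 1 2 11 22 111

print "cmp: @as_strings\n";
print "<=>: @as_numbers\n";

# split with no arguments works on $_ and behaves like split ' ',
# so leading whitespace does not produce an empty first field.
local $_ = "    8 C5 9.9153 2.3814";
my $key = (split)[0];
print "key: $key\n";
```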
for your second question you could use this line:
@info = $_ =~ m{^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)}xms;
in order to capture 9 items from each line (assuming they do not contain whitespace).
The first question I do not understand.
Update: I would read all the lines of the file and use them in a hash with $info[0] as the key and [@info[1..8]] as the value. Then you can look up the entries by your index.
my %details;
while (<IN_LIG>) {
@info = $_ =~ m{^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)}xms;
$details{ $info[0] } = [ @info[1..$#info] ];
}
Later you can lookup details for the indices you are interested in and process as needed. This assumes the index is unique (has the property of keys).
thanks for all your replies. I tried the split form with ' ' and it saved me several lines of code. thanks!
As for the regex, I found something that could make all keys as part of the string expression with join and quotemeta, but I couldn't make it work. Nevertheless I found an alternative that works, but I liked the join/quotemeta solution better
The atom indexes are obtained from a text file according to some energy threshold. Later, in the IN_LIG loop, I need to access the molecule file to obtain more information about the atoms selected, thus I use the atom "index" in the molecule to identify which lines of the file I have to read and process. This is a subroutine to which I send a hash with the atom index and some other information.
I tried this for the regex:
my $strings = join "|" map quotemeta,
sort { $hash->{$b} <=> $hash->{$a}} keys %($hash);
but I did something wrong cos it wouldn't take all keys
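For reference, one way the join/quotemeta idea could be written is sketched below. This is not the poster's final code: the hash contents and the second and third sample lines are hypothetical, and the keys are sorted numerically by key rather than by value. Note the comma after the separator passed to join and the %{$hash} dereference.

```perl
use strict;
use warnings;

# Hypothetical hash of atom indexes; the values stand in for the other information.
my $hash = { 8 => 'C5', 12 => 'N1', 80 => 'O2' };

# Build an alternation of all keys, escaping any regex metacharacters.
my $alternation = join '|', map { quotemeta($_) } sort { $a <=> $b } keys %{$hash};

# Anchor to the first column so that 8 does not match inside 18 or 80.
my $re = qr/^\s*(?:$alternation)\s/;

my @lines = (
    "  8 C5 9.9153 2.3814 -8.6988 C.ar 1 MLK -0.1500",
    " 18 C6 1.0000 2.0000 3.0000 C.ar 1 MLK 0.0000",
    " 80 O2 4.0000 5.0000 6.0000 O.3 1 MLK 0.0000",
);

my @matched = grep { $_ =~ $re } @lines;
print "pattern: $alternation\n";
print scalar(@matched), " of ", scalar(@lines), " lines matched\n";
```

The trailing \s in the pattern is what keeps a key such as 8 from matching the line that starts with 80.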

What perl code samples can lead to undefined behaviour?

These are the ones I'm aware of:
The behaviour of a "my" statement modified with a statement modifier conditional or loop construct (e.g. "my $x if ...").
Modifying a variable twice in the same statement, like $i = $i++;
sort() in scalar context
truncate(), when LENGTH is greater than the length of the file
Using 32-bit integers, "1 << 32" is undefined. Shifting by a negative number of bits is also undefined.
Non-scalar assignment to "state" variables, e.g. state @a = (1..3).
One that is easy to trip over is prematurely breaking out of a loop while iterating through a hash with each.
#!/usr/bin/perl
use strict;
use warnings;
my %name_to_num = ( one => 1, two => 2, three => 3 );
find_name(2); # works the first time
find_name(2); # but fails this time
exit;
sub find_name {
my($target) = @_;
while( my($name, $num) = each %name_to_num ) {
if($num == $target) {
print "The number $target is called '$name'\n";
return;
}
}
print "Unable to find a name for $target\n";
}
Output:
The number 2 is called 'two'
Unable to find a name for 2
This is obviously a silly example, but the point still stands - when iterating through a hash with each you should either never last or return out of the loop; or you should reset the iterator (with keys %hash) before each search.
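A minimal sketch of the second option, resetting the iterator before each search, might look like this. It reuses the hash from the example above; the version of find_name here returns the name instead of printing it.

```perl
use strict;
use warnings;

my %name_to_num = ( one => 1, two => 2, three => 3 );

sub find_name {
    my ($target) = @_;
    keys %name_to_num;   # calling keys in void context resets the each iterator
    while ( my ($name, $num) = each %name_to_num ) {
        return $name if $num == $target;   # early return is safe after the reset
    }
    return;
}

for my $attempt (1, 2) {
    my $name = find_name(2);
    print "attempt $attempt: ", (defined $name ? $name : 'not found'), "\n";
}
```

Because the iterator is reset at the top of every call, the second lookup starts from the beginning of the hash instead of resuming where the first one stopped.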
These are just variations on the theme of modifying a structure that is being iterated over:
map, grep and sort where the code reference modifies the list of items to sort.
Another issue with sort arises where the code reference is not idempotent (in the comp sci sense)--sort_func($a, $b) must always return the same value for any given $a and $b.