Perl Global variable uninitialized - perl

I'm new to perl so please bear with me.
I have script that is parsing a CSV file. To make things easier to debug I am using a state machine FSA::Rules (works great love it).
Every thing is going well only now I need to make my logs make sense, as part of this I need to record line numbers so my program looks some thing like this.
my $line = '';
my $lineCount = 0;
sub do {
...
#CSV opened
...
#State machine stuff happens here
readLine;
if ($line =~ m/.*Pattern*/){
#do stuff
}
}
sub readLine{
$line = <CSV>;
$lineCount ++;
}
But I get the following error
Use of uninitialized value $line in pattern match (m//) at
Any one know why $line would not be initialized?
Thanks.

When you reach end of file, $line = <CSV> will assign the undefined value to $line. The usual idiom is to check whether the readline function (which is implicitly called by the <> operator) returned a good value or not before proceeding ...
while (my $line = <CSV>) {
# guaranteed that $line has a defined value
...
}
but you with your sequence of calls, you are avoiding that check. Your current code also increments $lineCount even when <CSV> does not return a good value, which may not be what you want either.

Related

Read perl file handle with $INPUT_RECORD_SEPARATOR as a regex

I'm looking for a way to read from a file handle line by line (and then execute a function on each line) with the following twist: what I want to treat as a "line" shall be terminated by varying characters and not just a single character that I define as $/. I now that $INPUT_RECORD_SEPARATOR or $/ do not support regular expressions or passing a list of characters to be treated as line terminators and this is where my problem lies.
My file handle comes from stdout of a process. Thus, I cannot seek inside the file handle and the full content is not available immediately but is produced bit by bit as the process is executed. I want to be able to attach things like a timestamp to each "line" the process produces using a function that I called handler in my examples. Each line should be handled as soon as it gets produced by the program.
Unfortunately, I can only come up with a way that either executes the handler function immediately but seems horribly inefficient or a way that uses a buffer but will only lead to "grouped" calls of the handler function and thus, for example, produce wrong timestamps.
In fact, in my specific case, my regex would even be very simple and just read /\n|\r/. So for this particular problem I don't even need full regex support but just the possibility to treat more than one character as the line terminator. But $/ doesn't support this.
Is there an efficient way to solve this problem in Perl?
Here is some quick pseudo-perl code to demonstrate my two approaches:
read the input file handle byte-by-byte
This would look like this:
my $acc = "";
while (read($fd, my $b, 1)) {
$acc .= $b;
if ($acc =~ /someregex$/) {
handler($acc);
$acc = "";
}
}
The advantage here is, that handler gets immediately dispatched once enough bytes are read. The disadvantage is, that we do string appending and check the regex for every single byte we read from $fd.
read the input file handle with blocks of X-byte at a time
This would look like this:
my $acc = "";
while (read($fd, my $b, $bufsize)) {
if ($b =~ /someregex/) {
my #parts = split /someregex/, $b;
# for brevity lets assume we always get more than 2 parts...
my $first = shift #parts;
handler(acc . $first);
my $last = pop #parts;
foreach my $part (#parts) {
handler($part);
}
$acc = $last;
}
}
The advantage here is, that we are more efficient as we only check every $bufsize bytes. The disadvantage is, that the execution of handler has to wait until $bufsize bytes have been read.
Setting $INPUT_RECORD_SEPARATOR to a regex wouldn't help, because Perl's readline uses buffered IO, too. The trick is to use your second approach but with unbuffered sysread instead of read. If you sysread from a pipe, the call will return as soon as data is available, even if the whole buffer couldn't be filled (at least on Unix).
The suggestion by nwellnhof allowed me to implement a solution to this problem:
my $acc = "";
while (1) {
my $ret = sysread($fh, my $buf, 1000);
if ($ret == 0) {
last;
}
# we split with a capturing group so that we also retain which line
# terminator was used
# a negative limit is used to also produce trailing empty fields if
# required
my #parts = split /(\r|\n)/, $buf, -1;
my $numparts = scalar #parts;
if ($numparts == 1) {
# line terminator was not found
$acc .= $buf;
} elsif ($numparts >= 3) {
# first match needs special treatment as it needs to be
# concatenated with $acc
my $first = shift #parts;
my $term = shift #parts;
handler($acc . $first . $term);
my $last = pop #parts;
for (my $i = 0; $i < $numparts - 3; $i+=2) {
handler($parts[$i] . $parts[$i+1]);
}
# the last part is put into the accumulator. This might
# just be the empty string if $buf ended in a line
# terminator
$acc = $last;
}
}
# if the output didn't end with a linebreak, handle the rest
if ($acc ne "") {
handler($acc);
}
My tests show that indeed sysread will return even before having read 1000 characters if there is a pause in the input stream. The code above takes care to concatenate multiple messages of length 1000 and split messages with a lesser length or multiple terminators correctly.
Please shout if you see any bug in above code.

Nested Loops in Perl - Failing logic

I'm trying to work out a bigger problem, but have simplified the issue for readability, ultimately the logic below is the reason the extended program is failing.
I am using Perl to search for a short sequence of letters within a larger sequence (protein sequences), and if it is not found, then I'd like to do some calculations. I don't know whether I'm going crazy, but I can't work out why this logic is failing.
sub calculateEpitopeMutations {
my #mutationArray;
my #epitopeArray;
my $count;
my $localEpitope;
open( EPITOPESIN2, $ARGV[5] ) or die "Unable to open file $ARGV[5]\n";
while ( my $line = <EPITOPESIN2> ) {
chomp $line;
push #epitopeArray, $line;
}
while ( my ( $key, $value ) = each our %sequencesForCalculation ) {
foreach ( #epitopeArray ) {
$localEpitope = $_;
if ( $value =~ /($localEpitope)/g ) {
print "$key\n$localEpitope\nexactly the same\n\n";
next;
}
else {
#This is where I'd like to do the further calculations
print "$key\n$localEpitope\nthere is a difference\n\n";
next;
}
}
}
}
$ARGV[5] is the name of a text file containing a list of 9-character sequences, exactly like the following
RVSENIQRF
SFQVDCFLW
The idea is to put these into array #epitopeArray and iterate through these, and compare them with all (currently just one) $value sequences in the hash %sequencesForCalculation.
%sequencesForCalculation is a hash, where $value is a long sequence of characters, like this
MDSNTMSSFQVDCFLWHIRKRFADNGLGDAPFLDRLRRDQKSLKGRGNTLGLDIETATLVGKQIVEWILKEESSETLRMTIASVPTSRYLSDMTLEEMSRDWFMLMPRQKKIGPLCVRLDQAVMEKNIVLKANFSVIFNRLETLILLRAFTEEEAIVGEISPLPSLPGHTYEDVKNAVGVLIGGLEWNGNTVRVSENIQRFAWRNCDENGRPSLPPEQK
Currently, the small 9-character long sequence $localEpitope is contained in the longer sequence $value so when I iterate through the program, I should get this printed every time.
($key contains a header of information about the protein sequences, but is irrelevant so I have shortened it to just the variable name.)
$key
RVSENIQRF
Exactly the same
$key
SFQVDCFLW
Exactly the same
$key
But instead I'm getting this
$key
RVSENIQRF
exactly the same
$key
SFQVDCFLW
there is a difference
$key
Any ideas? Please let me know if anything further is required.
Update
TL;DR: You should change $value =~ /($localEpitope)/g to $value =~ /$localEpitope/
Okay now that we know the real circumstances, the problem (as melpomene points out in his comment) is that you have the /g modifier on your pattern match. There's no reason for that; you don't want check how many times the substring appears, you just want to know whether it's there at all
The problem is that variables subjected to a /g pattern search keep a state that says where the last search ended. So you're searching for $epitopeArray[0] in the longer string and finding it, and then searching for $epitopeArray[1] from where the previous search terminated. The first substring appears after the second one, so only the first is found
For more information on this behaviour, take a look at the pos function which returns the current value of this state. For instance pos($value) will return the character offset where the next m//g will start its search
This short program demonstrates the problem. With the /g modifier only BBB is found. Remove it and both are found
use strict;
use warnings;
use 5.010;
my $long_s = 'xxxAAAxxxBBBxxx';
for my $substr ( qw/ BBB AAA / ) {
if ( $long_s =~ /$substr/g ) {
say "$substr okay";
}
else {
say "$substr nope";
}
}
output
BBB okay
AAA nope
Original
You say
Currently, the small 9-character long sequence ($localEpitope) IS contained in the longer sequence ($value), and so when I iterate through the program, I should get the following printed everytime
So $localEpitope is a substring of $value and you're saying that
$value =~ /($localEpitope)/g
evaluates to true
That is correct behaviour. $value =~ /$localEpitope/ will check whether $localEpitope can be found anywhere in $value
Unfortunately it's not clear enough from what you've written to suggest a solution

Perl while loops not working

I'm quite new to perl and apologies if this has already been answered in a previous discussion. I have a script that needs to use the declared variables outside the loops, but only one loop is working, even though I have declared the variables outside of the loop, the code is:
my $sample;
open(IN, 'ls /*_R1_*.gz |');
while (my $sample = <IN>) {
chomp $sample;
print "sample = $sample\n";
my $fastq1="${sample}"; #need to use fastq1 later on hence it's declared here
my $sample2;
open(IN, 'ls /*_R2_*.gz |');
while (my $sample2 = <IN>) {
chomp $sample2;
print "sample2 = $sample2\n";
my $fastq2="${sample2}"; #need to use fastq2 later on hence it's declared here
}
}
Sample2 works but sample1 does not, only the first sample is output and then the loop goes onto sample2, the output is:
sample =/sample1_R1_001.fastq.gz
sample2 =/sample1_R2_001.fastq.gz
sample2 =/sample2_R2_001.fastq.gz
sample2 =/sample3_R2_001.fastq.gz
etc..
Can anyone figure this out?
Thanks
From your comments, I assume that your problem is probably that you declare $fastq1 and $fastq2 inside the loop. That means they will be out of scope outside the loops, and not accessible. You need something like:
my ($fastq1, $fastq2);
while ( ... ) {
....
$fastq1 = $sample;
}
Note that this will only save the last value in the loop of that variable. The others will of course be overwritten each loop iteration. If you have more values to save, use an array or hash.
Some other notes on your code.
You should always use
use strict;
use warnings;
Not doing so is a very bad idea, as it will only hide the errors and warnings, not solve them.
my $sample;
You declare this variable twice.
open(IN, 'ls /*_R1_*.gz |');
This is just bad on all possible levels:
System calls are always the least desirable option, unless no alternatives exist
Perl has many ways of reading file names
Parsing the output of ls is fragile and not portable
Piping the result of the system command through open is compounding the other flaws with this approach.
Recommended solution: Use either opendir + readdir or glob:
for my $files (</*_R1_*.gz>) { ... }
# or
opendir my $dh, "/" or die $!;
while (my file = readdir $dh) {
next unless $file =~ /_R1_.*\.gz$/;
...
}
my $fastq1 = "${sample}";
You do not need to quote a variable. Nor use support curly braces.
When declaring the variable with my inside a loop, it only retains its value that single loop iteration. Since you never use this variable, I assume you meant to use it outside the loop. But it will be out of scope there.
This can be written
my $fastq1 = $sample;
But you probably want to declare those variables outside your while loops, or they will be out of scope there. You should know that this will only save the last value for these variables, of course.
Also, as Rohit says, your loops are nested, which I assume is not what you wanted. This is most likely because you do not use a proper text editor to write your code, so your indentation is all messed up, and it is hard to see where one loop ends. Follow Rohit's advice there.
You are closing the first while loop after the end of 2nd while loop. Because of that, your 2nd while loop become a part of your 1st while loop, wherein, you are re-assigning the file handler - IN to a different file. And since you are exhausting it in the inner while loop, your outer while loop never run again.
You should close the brace before starting the next while:
while(my $sample = <IN>){
chomp $sample;
print "sample = $sample\n";
my $fastq1="${sample}";
} # You need this
my $sample2;
open(IN, 'ls /data_n2/vmistry/Fluidigm_Exome/300bp_fastq/*_R2_*.gz |');
while(my $sample2 = <IN>){
chomp $sample2;
print "sample2 = $sample2\n";
my $fastq2="${sample2}";
}
# } # Remove this

Perl comparison operation between a variable and an element of an array

I am having quite a bit of trouble with a Perl script I am writing. I want to compare an element of an array to a variable I have to see if they are true. For some reason I cannot seem to get the comparison operation to work correctly. It will either evaluate at true all the time (even when outputting both strings clearly shows they are not the same), or it will always be false and never evaluate (even if they are the same). I have found an example of just this kind of comparison operation on another website, but when I use it it doesn't work. Am I missing something? Is the variable type I take from the file not a string? (Can't be an integer as far as I can tell as it is an IP address).
$ipaddress = '192.43.2.130'
if ($address[0] == ' ')
{
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
#address = <FH>;
close(FH);
print $address[0];
print $address[1];
}
for ($i = 0; $i < #address; $i++)
{
print "hello";
if ($address[$i] eq $ipaddress)
{print $address[$i];
$file = "server_$i";
print "I got here first";
goto SENDING;}
}
SENDING:
print " I am here";
I am pretty weak in Perl, so forgive me for any rookie mistakes/assumptions I may have made in my very meager bit of code. Thank you for you time.
if ($address[0] == ' ')
{
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
#address = <FH>;
close(FH);
You have several issues with this code here. First you should use strict because it would tell you that #address is being used before it's defined and you're also using numeric comparison on a string.
Secondly you aren't creating an array of the address in the file. You need to loop through the lines of the file to add each address:
my #address = ();
while( my $addr = <FH> ) {
chomp($addr); # removes the newline character
push(#address, $addr);
}
However you really don't need to push into an array at all. Just loop through the file and find the IP. Also don't use goto. That's what last is for.
while( my $addr = <FH> ) {
chomp($addr);
if( $addr eq $ipaddress ) {
$file = "server_$i";
print $addr,"\n";
print "I got here first"; # not sure what this means
last; # breaks out of the loop
}
}
When you're reading in from a file like that, you should use chomp() when doing a comparison with that line. When you do:
print $address[0];
print $address[1];
The output is on two separate lines, even though you haven't explicitly printed a newline. That's because $address[$i] contains a newline at the end. chomp removes this.
if ($address[$i] eq $ipaddress)
could read
my $currentIP = $address[$i];
chomp($currentIP);
if ($currentIP eq $ipaddress)
Once you're familiar with chomp, you could even use:
chomp(my $currentIP = $address[$i]);
if ($currentIP eq $ipaddress)
Also, please replace the goto with a last statement. That's perl's equivalent of C's break.
Also, from your comment on Jack's answer:
Here's some code you can use for finding how long it's been since a file was modified:
my $secondsSinceUpdate = time() - stat('filename.txt')->mtime;
You probably are having an issue with newlines. Try using chomp($address[$i]).
First of all, please don't use goto. Every time you use goto, the baby Jesus cries while killing a kitten.
Secondly, your code is a bit confusing in that you seem to be populating #address after starting the if($address[0] == '') statement (not to mention that that if should be if($address[0] eq '')).
If you're trying to compare each element of #address with $ipaddress for equality, you can do something like the following
Note: This code assumes that you've populated #address.
my $num_matches=0;
foreach(#address)
{
$num_matches++ if $_ eq $ipaddress;
}
if($num_matches)
{
#You've got a match! Do something.
}
else
{
#You don't have any matches. This may or may not be bad. Do something else.
}
Alternatively, you can use the grep operator to get any and all matches from #address:
my #matches=grep{$_ eq $ipaddress}#address;
if(#matches)
{
#You've got matches.
}
else
{
#Sorry, no matches.
}
Finally, if you're using a version of Perl that is 5.10 or higher, you can use the smart match operator (ie ~~):
if($ipaddress~~#address)
{
#You've got a match!
}
else
{
#Nope, no matches.
}
When you read from a file like that you include the end-of-line character (generally \n) in each element. Use chomp #address; to get rid of it.
Also, use last; to exit the loop; goto is practically never needed.
Here's a rather idiomatic rewrite of your code. I'm excluding some of your logic that you might need, but isn't clear why:
$ipaddress = '192.43.2.130'
open (FH, "serverips.txt") or die "Crossroads could not find a list of backend servers";
while (<FH>) { # loop over the file, using the default input space
chomp; # remove end-of-line
last if ($_ eq $ipaddress); # a RE could easily be used here also, but keep the exact match
}
close(FH);
$file = "server_$."; # $. is the line number - it's not necessary to keep track yourself
print "The file is $file\n";
Some people dislike using perl's implicit variables (like $_ and $.) but they're not that hard to keep track of. perldoc perlvar lists all these variables and explains their usage.
Regarding the exact match vs. "RE" (regular expression, or regexp - see perldoc perlre for lots of gory details) -- the syntax for testing a RE against the default input space ($_) is very simple. Instead of
last if ($_ eq $ipaddress);
you could use
last if (/$ipaddress/);
Although treating an ip address as a regular expression (where . has a special meaning) is probably not a good idea.

Can't make sense out of this Perl code

This snippet basically reads a file line by line, which looks something like:
Album=In Between Dreams
Interpret=Jack Johnson
Titel=Better Together
Titel=Never Know
Titel=Banana Pancakes
Album=Pictures
Interpret=Katie Melua
Titel=Mary Pickford
Titel=It's All in My Head
Titel=If the Lights Go Out
Album=All the Lost Souls
Interpret=James Blunt
Titel=1973
Titel=One of the Brightest Stars
So it somehow connects the "Interpreter" with an album and this album with a list of titles. But what I don't quite get is how:
while ($line = <IN>) {
chomp $line;
if ($line =~ /=/) {
($name, $wert) = split(/=/, $line);
}
else {
next;
}
if ($name eq "Album") {
$album = $wert;
}
if ($name eq "Interpret") {
$interpret = $wert;
$cd{$interpret}{album} = $album; // assigns an album to an interpreter?
$titelnummer = 0;
}
if ($name eq "Titel") {
$cd{$interpret}{titel}[$titelnummer++] = $wert; // assigns titles to an interpreter - WTF? how can this work?
}
}
The while loop keeps running and putting the current line into $line as long as there are new lines in the file handle <IN>. chomp removes the newline at the end of every row.
split splits the line into two parts on the equal sign (/=/ is a regular expression) and puts the first part in $name and the second part in $wert.
%cd is a hash that contains references to other hashes. The first "level" is the name of interpreter.
(Please ask more specific questions if you still do not understand.)
cd is a hash of hashes.
$cd{$interpret}{album} contains album for interpreter.
$cd{$interpret}{titel} contains an array of Titel, which is filled incrementally in the last if.
Perl is a very concise language.
The best way to figure out what's going on is to inspect the data structure. After the while loop, temporarily insert this code:
use Data::Dumper;
print '%cd ', Dumper \%cd;
exit;
This may have a large output if the input is large.