Perl "Use of uninitialized value in regexp compilation" warning

I have pasted a small snippet of code below:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $start_data;
my $name = "Data_abc";
while(<DATA>){
my $line = $_;
if ($line =~ /^Start:\s+/){
my ($st, $data) = split(/\s+/,$line);
$start_data = $data;
}
for( $name ){
/^$start_data/ and do { next; }
}
print "END of execution\n";
}
print $start_data;
__DATA__
===============================
2020-05-20 Name
===============================
Start: Data_abc
Load: Load_data
The script works as expected, but it throws a warning:
Use of uninitialized value $start_data in regexp compilation at storage_problem.pl line 18,
Since I have already declared $start_data at the beginning, why does this warning appear?

$start_data was declared, but you did not assign it a value before you tried to read it in the regex. Therefore, it is undefined.
When I run your code, I get 3 warning messages, corresponding to your 1st 3 lines of DATA. Those 3 lines do not match your regex (they don't start with Start:).
Since you did not initialize $start_data with a value, you get the uninitialized warning.
Once the 4th line is read, you stop getting the warnings because $start_data is assigned a value (Data_abc).

The code provided by OP declares $start_data but does not initialize it.
When the first line is read, $start_data is interpolated into the regular expression /^$start_data/ while it is still undef. Perl treats the undefined value as an empty string there, which causes the following message:
Use of uninitialized value $start_data in regexp compilation at storage_problem.pl line 18,
Perhaps the code should be written as follows:
use strict;
use warnings;
use feature 'say';
use autodie;
my $start_data;
my $name = "Data_abc";
while(<DATA>){
next unless /$name/;
$start_data = $1 if /^Start:\s+(\w+)/;
}
say 'END of execution';
say "Start: $start_data" if defined $start_data;
__DATA__
===============================
2020-05-20 Name
===============================
Start: Data_abc
Load: Load_data

Because there is no guarantee that the if block is going to execute.
You can either check whether the variable is defined before you read it, or just initialize it to whatever value makes sense for your use case.
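A minimal sketch of the "check before you read" option, assuming the same input lines as the question. The helper name starts_with is hypothetical and only illustrates the guard; \Q...\E is added so the stored value is matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when $prefix is defined and $name starts with it.
sub starts_with {
    my ($name, $prefix) = @_;
    return 0 unless defined $prefix;          # guard: nothing has been assigned yet
    return $name =~ /^\Q$prefix\E/ ? 1 : 0;   # \Q..\E matches the prefix literally
}

my $start_data;                               # declared, but still undef
my $name = 'Data_abc';

for my $line ("2020-05-20 Name\n", "Start: Data_abc\n", "Load: Load_data\n") {
    $start_data = $1 if $line =~ /^Start:\s+(\S+)/;
    my $status = starts_with($name, $start_data) ? 'match' : 'no match';
    print "$status\n";
}
```

Because the guard returns before the regex is compiled, no warning is raised for the lines read before the Start: line.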

Related

Facing issue using map function in perl

I have the below lines:
7290741.out:Info: /test doesn't exist, Running on Network location
7300568.out:sh: /tmp/test/1234_123_test/test1/test2/abc.txt: bad interpreter
I have these lines in an array @test1.
I want to fetch only the numbers before .out and put them in an array @test2.
I used the below code for this:
foreach my $test1(@test1) {
my @test2;
map { /(d+)\.out/ and push @test2, $1 } <$error>;
print "@test2\n";
}
But, when I execute the code, it is printing complete lines, and I want the output like below:
7290741
7300568
Can someone please help?
When I ran your code, I just got 2 blank lines of output.
Regardless, here is a simpler version of your code which just prints out the numbers:
use warnings;
use strict;
my @test1;
while (<DATA>) {
chomp;
push @test1, $_;
}
my @test2;
for (@test1) {
push @test2, $1 if /(\d+)\.out/;
}
for (@test2) {
print "$_\n";
}
__DATA__
7290741.out:Info: /test doesn't exist, Running on Network location
7300568.out:sh: /tmp/test/1234_123_test/test1/test2/abc.txt: bad interpreter
There are numerous problems with your code.
You should use warnings and strict.
The $error variable was not defined in the code you posted.
d in the regular expression should have been \d.
You should have declared the @test2 variable outside the foreach loop; otherwise, it would only have a single value due to variable scoping.
my @nums = map { /^(\d+)\.out/ } <$errors>;
This is the simplest way to put it, using map. If you first read the errors into an array (@test1), loop over those values, and then read the values again inside the loop, you are doing the same thing twice. map is itself a loop.
This is assuming that your file with errors is what the filehandle $errors is reading from. Remember also to always use
use strict;
use warnings;
Perhaps the OP is looking for a solution of the following form.
Explanation:
<DATA> in list context returns a list of all the lines
map loops over those lines
the regex extracts the portion of information the OP is interested in
the result is stored in the @test array
NOTE: Data::Dumper is used only to visualize the result
#!/usr/bin/env perl
#
# vim: ai ts=4 sw=4
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my @test = map { /^(.*?)\.out/ } <DATA>;
say Dumper(\@test);
__DATA__
7290741.out:Info: /test doesn't exist, Running on Network location
7300568.out:sh: /tmp/test/1234_123_test/test1/test2/abc.txt: bad interpreter
Output
$VAR1 = [
'7290741',
'7300568'
];
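The map-based answers rely on a regex match returning its captures in list context. A small sketch of that behaviour, using a hypothetical helper extract_ids and one extra non-matching line that is not part of the original data:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the leading digits of every line shaped like "<digits>.out...".
sub extract_ids {
    my @lines = @_;
    # In list context a successful match returns its captures and a failed match
    # returns an empty list, so non-matching lines drop out of the result.
    return map { /^(\d+)\.out/ } @lines;
}

my @ids = extract_ids(
    "7290741.out:Info: /test doesn't exist, Running on Network location",
    "a line without a job id",
    "7300568.out:sh: /tmp/test/1234_123_test/test1/test2/abc.txt: bad interpreter",
);
print "$_\n" for @ids;
```

No push or temporary array is needed, because map collects the captured values itself.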

How to print rows information from two dimensional array in Perl?

I have the following two dimensional array (file.txt):
Code Element Repetitions
AL Train 23
BM Car 30
CN Bike 44
From an input (Code) given by the user, I want to extract the
corresponding Element information.
Example input: BM
Example output:Car
I tried with this code but I do not know how to compare the input name with array content. Thank you a lot
#!/usr/bin/perl
use strict;
use warnings;
print("Type code: ");
my $code = <STDIN>;
chomp($code);
my @content;
if(!open(TABLET, "file.txt")){
die "Unable to open the file\n";
}
while(<TABLET>){
chomp;
push @content, [split / /];
}
foreach my $row ($content) {
if ($content{$code}) {
print "$content{$code}\n";
}
}
close(TABLET);
There are a few problems here. And they can mostly be found by adding use strict to your code. The vast majority of experienced Perl programmers will always start their programs with:
use strict;
use warnings;
as these additions will find a huge number of common mistakes that programmers are prone to make.
The first problem can't be found like that. It seems to be a typo. You split your input using split /;+/ but your input file seems to be delimited by whitespace. So change split /;+/ to just split.
Now let's add use strict to your code and see what happens.
$ perl 2d
Global symbol "$content" requires explicit package name (did you forget to declare "my $content"?) at 2d line 20.
Global symbol "%content" requires explicit package name (did you forget to declare "my %content"?) at 2d line 21.
Global symbol "%content" requires explicit package name (did you forget to declare "my %content"?) at 2d line 22.
Execution of 2d aborted due to compilation errors.
Although there are three errors listed here, the second and third ones are both the same. But let's start with the first. Line 20 in my program is:
foreach my $row ($content) {
But what's that $content variable? You don't use that anywhere else. I suspect it's a typo for @content. Let's change that and try again.
$ perl 2d
Global symbol "%content" requires explicit package name (did you forget to declare "my %content"?) at 2d line 21.
Global symbol "%content" requires explicit package name (did you forget to declare "my %content"?) at 2d line 22.
Execution of 2d aborted due to compilation errors.
Ok. That fixed the first problem, but I guess we now have to look at the repeated error. This is generated by lines 21 and 22, which look like this:
if ($content{$code}) {
print "$content{$code}\n";
Obviously, there's no mention of %content on either of those lines - so what's the problem?
Well, the problem is that %content is mentioned on both of those lines, but it's disguised as $content{$code} in both cases. You have an array called @content and you'd look up values in that array using syntax like $content[0]. The fact that you're using {...} instead of [...] means that you're looking in %content, not @content (in Perl you're allowed to have an array and a hash - and also a scalar - all with the same name, which is always a terrible idea!).
But we can't just change $content{$code} to $content[$code] because $code is a string ("BM") and array indexes are integers. I think we need to rethink this from scratch and actually store the data in %content, not @content. And, actually, I think that makes the code simpler.
#!/usr/bin/perl -w
use strict;
use warnings;
print("Type code: ");
my $code = <STDIN>;
chomp($code);
my %content;
if (!open(TABLET, "file.txt")){
die "Unable to open the file\n";
}
while(<TABLET>){
chomp;
my @record = split;
$content{$record[0]} = \@record;
}
if (exists $content{$code}) {
print "$content{$code}[1]\n";
} else {
print "$code is not a valid code\n";
}
close(TABLET);
We can clean that up a bit (for example, by using lexical filehandles and the three-arg version of open()) to get this:
#!/usr/bin/perl
use strict;
use warnings;
print("Type code: ");
chomp( my $code = <STDIN> );
my %content;
open my $tablet_fh, '<', 'file.txt'
or die "Unable to open the file\n";
while(<$tablet_fh>){
chomp;
my @record = split;
$content{$record[0]} = \@record;
}
if (exists $content{$code}) {
print "$content{$code}[1]\n";
} else {
print "$code is not a valid code\n";
}
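For trying out the hash-based lookup without file.txt, here is a sketch that builds the same kind of structure from in-memory rows. The helper name build_lookup is hypothetical, and the rows are copied from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a code => [fields] lookup from whitespace-separated rows.
sub build_lookup {
    my %content;
    for my $row (@_) {
        my @record = split ' ', $row;
        next unless @record;                  # skip blank rows
        $content{ $record[0] } = \@record;    # key on the first column
    }
    return \%content;
}

my $lookup = build_lookup('Code Element Repetitions', 'AL Train 23', 'BM Car 30', 'CN Bike 44');

for my $code (qw(BM XX)) {
    if (exists $lookup->{$code}) {
        print "$code: $lookup->{$code}[1]\n";   # second column is the Element
    } else {
        print "$code is not a valid code\n";
    }
}
```

Note that the header row is stored under the key "Code" as well; skipping the first row would be a reasonable refinement.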

Syntax errors at line 24 and 26. I don't know why?

syntax error at bioinfo2.pl line 24, near ");"
syntax error at bioinfo2.pl line 26, near "}"
Execution of bioinfo2.pl aborted due to compilation errors.
print "Enter file name......\n\n";
chomp($samplefile = <STDIN>);
open(INFILE,"$samplefile") or die "Could not open $samplefile";
@residue_name= ();
@residue_count= ();
while($newline = <INFILE>)
{
if ($newline =~ /^ATOM/)
{
chomp $newline;
@columns = split //, $newline;
$res = join '', $columns[17], $columns[18], $columns[19];
splice @columns,0;
$flag=0
for ($i = 0; $i<scalar(@residue_name); $i++;)
{
if (@residue_name[i] == $res)
{
@residue_count[i] = @residue_count[i] + 1;
$flag=1;
}
}
if($flag==0)
{
push(@residue_name, $res);
}
for ($i = 0; $i<scalar(@residue_name); $i++)
{
print (@residue_name[i], "-------", @residue_count[i], "\n");
}
}
}
It might be advisable to use strict; use warnings. That forces you to declare your variables (you can do so with my), and rules out many possible errors.
Here are a few things that I noticed:
In Perl5 v10 and later, you can use the say function (use 5.010 or use feature 'say'). This works like print but adds a newline at the end.
Never use the two-arg form of open. This opens some security issues. Provide an explicit open mode. Also, you can use scalars as filehandles; this provides nice features like auto-closing of files.
open my $INFILE, '<', $samplefile or die "Can't open $samplefile: $!";
The $! variable contains the reason why the open failed.
If you want to retrieve a list of elements from an array, you can use a slice (multiple subscripts):
my $res = join '', @columns[17 .. 19]; # also, range operator ".."
Note that the sigil is now an @, because we take multiple elems.
The splice @columns, 0 is a fancy way of saying "delete all elements from the array, and return them". This is not necessary (you don't read from that variable later). If you use lexical variables (declared with my), then each iteration of the while loop will receive a new variable. If you really want to remove the contents, you can undef @columns. This should be more efficient.
Actual error: You require a semicolon after $flag = 0 to terminate the statement before you can begin a loop.
Actual error: A C-style for-loop contains three expressions contained in parens. Your last semicolon divides them into 4 expressions, this is an error. Simply remove it, or look at my next tip:
C-style loops (for (foo; bar; baz) {}) are painful and error-prone. If you only iterate over a range (e.g. of indices), then you can use the range operator:
for my $i (0 .. $#residue_name) { ... }
The $# sigil gives the last index of an array.
When subscripting arrays (accessing array elements), then you have to include the sigil of the index:
$residue_name[$i]
Note that the sigil of the array is $, because we access only one element.
The pattern $var = $var + 1 can be shortened to $var++. This uses the increment operator.
The $flag == 0 could be abbreviated to !$flag, as all numbers except zero are considered true.
Here is a reimplementation of the script. It takes the filename as a command line argument; this is more flexible than prompting the user.
#!/usr/bin/perl
use strict; use warnings; use 5.010;
my $filename = $ARGV[0]; # @ARGV holds the command line args
open my $fh, "<", $filename or die "Can't open $filename: $!";
my @residue_name;
my @residue_count;
while(<$fh>) { # read into "$_" special variable
next unless /^ATOM/; # start a new iteration if regex doesn't match
my $number = join "", (split //)[17 .. 19]; # who needs temp variables?
my $push_number = 1; # self-documenting variable names
for my $i (0 .. $#residue_name) {
if ($residue_name[$i] eq $number) { # eq: the joined columns are a string
$residue_count[$i]++;
$push_number = 0;
}
}
push @residue_name, $number if $push_number;
# are you sure you want to print this after every input line?
# I'd rather put this outside the loop.
for my $i (0 .. $#residue_name) {
say $residue_name[$i], ("-" x 7), $residue_count[$i]; # "x" repetition operator
}
}
And here is an implementation that may be faster for large input files: We use hashes (lookup tables), instead of looping through arrays:
#!/usr/bin/perl
use strict; use warnings; use 5.010;
my $filename = $ARGV[0]; # @ARGV holds the command line args
open my $fh, "<", $filename or die "Can't open $filename: $!";
my %count_residue; # this hash maps the numbers to counts
# automatically guarantees that every number has one count only
while(<$fh>) { # read into "$_" special variable
next unless /^ATOM/; # start a new iteration if regex doesn't match
my $number = join "", (split //)[17 .. 19]; # who needs temp variables?
if (exists $count_residue{$number}) {
# if we already have an entry for that number, we increment:
$count_residue{$number}++;
} else {
# We add the entry, and initialize to zero
$count_residue{$number} = 0;
}
# The above if/else initializes new numbers (seen once) to zero.
# If you want to count starting with one, replace the whole if/else by
# $count_residue{$number}++;
# print out all registered residues in numerically ascending order.
# If you want to sort them by their count, descending, then use
# sort { $count_residue{$b} <=> $count_residue{$a} } ...
for my $num (sort {$a <=> $b} keys %count_residue) {
say $num, ("-" x 7), $count_residue{$num};
}
}
It took me a while to chase down all the various errors. As others have said, use use warnings; and use strict;
Rule #1: Whenever you see syntax error pointing to a perfectly good line, you should always see if the line before is missing a semicolon. You forgot the semicolon after $flag=0.
In order to track down all the issues, I've rewritten your code into a more modern syntax:
#! /usr/bin/env perl
use strict;
use warnings;
use autodie;
print "Enter file name......\n\n";
chomp (my $samplefile = <STDIN>);
open my $input_file, '<:crlf', $samplefile;
my @residue_name;
my @residue_count;
while ( my $newline = <$input_file> ) {
chomp $newline;
next if $newline !~ /^ATOM/; #Eliminates the internal `if`
my @columns = split //, $newline;
my $res = join '', $columns[17], $columns[18], $columns[19];
my $flag = 0;
for my $i (0..$#residue_name) {
if ( $residue_name[$i] eq $res ) { # eq: $res is a string built from three columns
$residue_count[$i]++;
$flag = 1;
}
}
if ( $flag == 0 ) {
push @residue_name, $res;
}
for my $i (0..$#residue_name) {
print "$residue_name[$i] ------- $residue_count[$i]\n";
}
}
close $input_file;
Here's a list of changes:
Lines 2 & 3: Always use use strict; and use warnings;. These will help you track down about 90% of your program errors.
Line 4: Use use autodie;. This will eliminate the need for checking whether a file opened or not.
Line 7 (and others): Using use strict; requires you to predeclare variables. Thus, you'll see my whenever a variable is first used.
Line 8: Use the three parameter open and use local variables for file handles instead of globs (i.e. $file_handle vs. FILE_HANDLE). The main reasons is that local variables are easier to pass into subroutines than globs.
Lines 9 & 10: No need to initialize the arrays, just declare them is enough.
Line 13: Always chomp as soon as you read in.
Line 14: Doing this eliminates an entire inner if statement that's embraces your entire while loop. Code blocks (such as if, while, and for) get hard to figure out when they get too long and too many embedded inside each other. Using next in this way allows me to eliminate the if block.
Line 17: Here's where you missed the semicolon which gave you your first syntax error. The main thing is I eliminated the very confusing splice command. If you want to zero out your array, you could have simply said @columns = (); which is much clearer. However, since @columns is now in scope only in the while loop, I no longer have to blank it out since it will be redefined for each line of your file.
Line 18: This is a much cleaner way of looping through all the elements of your array. Note that $#residue_name gives you the last index of @residue_name while scalar @residue_name gives you the number of elements. This is a very important distinction! If I have an @array = (0, 1, 2, 3, 4), $#array will be 4, but scalar @array will be 5. Using the C style for loop can be a bit confusing when doing this. Should you use < or <=? Using (0..$#residue_name) is obvious and eliminates the chance of errors, which included the extra semicolon inside your C style for statement. Because of the chance of errors and the complexity of the syntax, the developers who created Python decided not to allow C style for loops.
Line 19 (and others): Using warnings pointed out that you did @residue_name[i] and it had several issues. First of all, you should use $residue_name[...] when indexing an array, and second of all, i is not an integer. You meant $i. Thus @residue_name[i] becomes $residue_name[$i].
Line 20: If you're incrementing a variable, use $foo++; or $foo += 1; and not $foo = $foo + 1;. The first two make it easier to see that you're incrementing a variable and not recalculating its value.
Line 29: One of the great features of Perl is that variables can be interpolated inside quotes. You can put everything inside a single set of quotes. By the way, you should use . and not , if you do break up a print statement into multiple pieces. The , makes print's arguments a list, which means that what you print out depends on the value of $,. The $, variable is the output field separator: it says what print emits between each item of its argument list. (The separator used when an array is interpolated into a string is a different variable, $".)
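Two of the points in this list are easy to check in isolation: the difference between $#array and scalar @array, and the difference between the separator print uses ($,) and the one string interpolation uses ($"). The sketch below is only an illustration; the helper names joined_by_print and joined_by_interpolation are hypothetical.

```perl
use strict;
use warnings;

my @array = (0, 1, 2, 3, 4);
my $last_index = $#array;          # index of the last element
my $count      = scalar @array;    # number of elements

# Hypothetical helper: what "print LIST" emits for a given value of $,
sub joined_by_print {
    my ($sep, @items) = @_;
    my $out = '';
    open my $fh, '>', \$out or die "Can't open in-memory handle: $!";
    {
        local $, = $sep;           # output field separator, used between print's arguments
        print {$fh} @items;
    }
    close $fh;
    return $out;
}

# Hypothetical helper: what "@items" interpolates to for a given value of $"
sub joined_by_interpolation {
    my ($sep, @items) = @_;
    local $" = $sep;               # list separator, used when an array is interpolated
    return "@items";
}

print "last index: $last_index, count: $count\n";
print joined_by_print('-', 'a', 'b', 'c'), "\n";
print joined_by_interpolation('+', 'a', 'b', 'c'), "\n";
```

Both variables are set with local here so the change stays confined to the helper and does not leak into the rest of the program.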
Please don't take this as criticism of your coding abilities. Many Perl books and many courses seem to teach Perl as it was back in the Perl 3.0 days. When I first learned Perl, it was at Perl 3.0, and much of my syntax would have looked like yours. However, Perl 5.x has been out for quite a while and contains many features that make programs easier to write and cleaner to read.
It took me a while to get out of Perl 3.0 habits and into Perl 4.0 and later Perl 5.0 habits. You learn by looking at what others do, and asking questions on forums like Stack Overflow.
I still can't say your code will work. I don't have your input, so I can't test it against that. However, by using this code as the basis of your program, debugging these errors should be pretty easy.

Prevent Perl from printing identical warning messages

Consider the following nonsense script as an example:
use strict;
use warnings;
my $uninitialisedValue;
while(<>){
print ${$uninitialisedValue}{$_},"\n";
}
Which is run from the command line:
$ perl warningPrinter.pl < longfile.txt
Regardless of what standard input contains, standard output will be full of:
Use of uninitialized value in print at warningPrinter.pl line 16, <> line 1.
Use of uninitialized value in print at warningPrinter.pl line 16, <> line 2.
Use of uninitialized value in print at warningPrinter.pl line 16, <> line 3.
Use of uninitialized value in print at warningPrinter.pl line 16, <> line 4.
...
I work with very long files, so receiving this as output when testing my script is at the very least mildly irritating. It can take a while for the process to respond to a Ctrl + C termination signal and my terminal is suddenly filled with the same error message.
Is there a way of either getting Perl to print just the first instance of an identical and reoccurring warning message, or to just make warning messages fatal to the execution of the script? Seeing as I have never produced a script that works despite having warnings in them, I would accept either. But it's probably more convenient if I can get Perl to print identical warnings just once.
I thought I would show you how unique warning logic might be created. I don't recommend it though:
my %printed;
local $SIG{__WARN__} = sub {
my $message = shift;
my ( $msg, $loc ) = $message =~ m/(.*?) at (.*?line \d+)/;
print $message unless $printed{$loc}{$msg}++;
};
I should say that I do not recommend this as a general practice, because it's better to have a warning policy. Either the operation can legitimately take an undefined value, or you don't want to handle an undef value at all. I try to remove all warnings from my completed code.
In the first case, putting no warnings 'uninitialized'; in the while loop is a much easier, and more usual, thing to do. In the second case, you'd probably want to fail.
However, if it is something you would actually like to handle but warn once about, say that you wanted robust handling of the data, but wanted to warn upstream processes that you got some bad data, you could go about creating a sub warn_once:
{ use Carp ();
my %warned;
sub warn_once {
my $message = shift;
my ( $msg, $loc ) = $message =~ m/(.*?) at (.*?line \d+)/;
Carp::carp( $message ) unless $warned{$loc}{$msg}++;
};
}
And call it like this:
while ( <> ) {
warn_once( '$uninitialisedValue is uninitialized' )
unless defined( $uninitialisedValue)
;
no warnings 'uninitialized';
print ${$uninitialisedValue}{$_},"\n";
}
Then you have decided something.
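For experimenting with the de-duplicating handler on its own, here is a self-contained sketch. It is a variation on the handler above, not the answerer's exact code: it strips the trailing ", <> line N." part so that repeats from different input lines compare equal, and it records what it lets through in @emitted (a name introduced here only so the result can be inspected).

```perl
use strict;
use warnings;

my %seen;      # normalised message => number of times it was raised
my @emitted;   # messages that were actually printed

local $SIG{__WARN__} = sub {
    my $message = shift;
    # Remove the input-line suffix, e.g. ", <> line 3.", before comparing.
    (my $key = $message) =~ s/,? <[^>]*> (?:line|chunk) \d+\.?\s*\z//;
    return if $seen{$key}++;       # already reported once: stay silent
    push @emitted, $message;
    print STDERR $message;
};

my $uninitialised;
for my $n (1 .. 3) {
    # Triggers the same "uninitialized" warning on every pass.
    my $text = "value: " . $uninitialised;
}
```

If failing fast is preferred to de-duplication, use warnings FATAL => 'uninitialized'; turns that category of warning into an exception instead.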

help with perl script converting use of argv to using getopts

I am trying to convert my Perl script from using @ARGV to using Getopt::Std.
I am getting some substr errors and need some help figuring this out.
Errors:
Use of uninitialized value in substr at ./h.pl line 33.
Use of uninitialized value in substr at ./h.pl line 33.
substr outside of string at ./h.pl line 33.
Use of uninitialized value in substr at ./h.pl line 33.
substr outside of string at ./h.pl line 33.
The 'month' parameter (undef) to DateTime::new was an 'undef', which is not one of the allowed types: scalar
at /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi/DateTime.pm line 176
DateTime::new('undef', 'HASH(0xb6932d0)') called at ./h.pl line 33
Here is my code. (The commented-out code was working code using @ARGV.)
use strict;
use warnings;
use Getopt::Std;
use DateTime;
# Getopt usage
my %opt;
getopts ('fd:ld:h', \%opt);
$opt{h} and &Usage;
my $first_date = $opt{fd};
my $last_date = $opt{ld};
#unless(@ARGV==2)
#{
# print "Usage: myperlscript first_date last_date\n";
# exit(1);
#}
#
#my ($first_date,$last_date)=@ARGV;
# Convert using Getopts
my $date=DateTime->new(
{
year=>substr($first_date,0,4),
month=>substr($first_date,4,2),
day=>substr($first_date,6,2)
});
while($date->ymd('') le $last_date)
{
print $date->ymd('') . "\n";
$date->add(days=>1);
}
Even if you think Getopt::Std will do what you want, use Getopt::Long. For pretty much the same reasons you'd not just hand-roll an @ARGV handler.
To quote (in part) tchrist in http://www.nntp.perl.org/group/perl.perl5.porters/2008/05/msg136952.html:
I really like Getopt::Long...I cannot say enough good things about it to do it the justice it deserves... The only problem is that I just don't use it enough. I bet I'm not alone. What seems to happen is that at first we just want to add--oh say for example JUST ONE, SINGLE LITTLE -v flag. Well, that's so easy enough to hand-hack, that of course we do so... But just like any other piece of software, these things all seem to have a way of overgrowing their original expectations... Getopt::Long is just wonderful, up--I believe--to any job you can come up with for it. Too often its absence means that I've in the long run made more work for myself--or others--by not having used it originally.
"getopt, getopts - Process single-character switches with switch clustering"
As only single-character switches are allowed, the spec 'fd:ld:h' is read as the switches f, d (taking an argument), l, d again, and h, so $opt{fd} and $opt{ld} are undef.
Getopt::Long does what you want.
use strict;
use warnings;
use Getopt::Long;
my $fd;
my $ld;
my $result = GetOptions(
'fd=s' => \$fd,
'ld=s' => \$ld,
);
die unless $result;
print "fd: $fd\n";
print "ld: $ld\n";
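A sketch of how the Getopt::Long version might validate its input before the dates reach substr. It uses GetOptionsFromArray so the parsing can be exercised without touching @ARGV; the helper name parse_dates and the YYYYMMDD check are assumptions based on how the question slices the strings.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parses --fd and --ld (both YYYYMMDD) from a list of arguments.
sub parse_dates {
    my @args = @_;
    my ($fd, $ld);
    GetOptionsFromArray(\@args, 'fd=s' => \$fd, 'ld=s' => \$ld)
        or die "Usage: $0 --fd YYYYMMDD --ld YYYYMMDD\n";
    for my $date ($fd, $ld) {
        # Reject missing or malformed values before anything calls substr on them.
        die "Both --fd and --ld are required, as YYYYMMDD\n"
            unless defined $date && $date =~ /\A\d{8}\z/;
    }
    return ($fd, $ld);
}

my ($first_date, $last_date) = parse_dates('--fd', '20200520', '--ld', '20200523');
print "first: $first_date, last: $last_date\n";
```

In a real script the call would be parse_dates(@ARGV); the literal arguments here only keep the sketch runnable on its own.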