Odd "Use of uninitialized value", regular expression error - perl

I'm using Strawberry Perl 5 on Windows 10. It does seem like my regular expressions are broken or regex101 won't tell me the truth. I want to catch 'num km'. Even tho my array seems to be the right length it'd often say"Use of uninitialized value".
my $string = "^ˇ~ --_ 12 km aéeklwa 32 km | \|ġ^ 0 km 23-24 km";
if (#szelmatches = $string =~ /\d+(\-\d+)?\s+km/gm) {
my $number_of_elements = scalar(#szelmatches);
print "Elements in the array : $number_of_elements \n";
}
foreach (#szelmatches) {
print "$_\n";
}
OUTPUT:
Elements in the array : 4
Use of uninitialized value $_ in concatenation (.) or string at C:\misc\perlek\wttr\szel.pl line 16.
I've ran defined() checks but it seems like my array elements are all defined. Changing \- to .{1} occasionaly worked but it is quite annoying to write like this. regex101.com and regexr.com tells me everything is allright.
I know you could write it simpler/shorter/better/faster/nicer etc, but i honestly think this should work. Do you guys have any idea what am i doing wrong?

Firstly, I had to fix a syntax error in your code before I could run it (the closing ) was missing from your if statement). Please cut and paste code, rather than retyping it.
If Perl tells you that it's finding undefs then it's almost certainly right. Using Data::Dumper can show us what is going on.
use warnings;
use Data::Dumper;
my $string = "^ˇ~ --_ 12 km aéeklwa 32 km | \|ġ^ 0 km 23-24 km";
if (#szelmatches = $string =~ /\d+(\-\d+)?\s+km/gm) {
my $number_of_elements = scalar(#szelmatches);
print "Elements in the array : $number_of_elements \n";
}
print Dumper \#szelmatches;
foreach (#szelmatches) {
print "$_\n";
}
That gives us the following:
$VAR1 = [
undef,
undef,
undef,
'-24'
];
So, yes, there are three undefs in your results. Can we work out why?
Well, here's your match operator.
/\d+(\-\d+)?\s+km/gm
It's looking for digits followed by an optional dash and more digits. But it's only that optional part that you're capturing (as it has parentheses around it). And in the first three cases, that optional section doesn't appear. So you get undef for those first three matches.
Let's actually match what you want (the whole digits section, I think) by putting more parentheses around the whole thing.
/(\d+(\-\d+)?)\s+km/gm
Now we get this result:
$VAR1 = [
'12',
undef,
'32',
undef,
'0',
undef,
'23-24',
'-24'
];
That's better. We get all of the matches we want, alongside the original ones. So, that's twice as many matches as we want. That's because we now have two sets of parentheses for each match. We need the first set to match and capture the digit section and the second set to join together the "-" and the "\d+". But we don't need the second set to capture its contents.
If you read the section on "Extended Patterns" in the perlre manual page, you'll see that we can create non-capturing parentheses with (?:...). So let's use that.
/(\d+(?:\-\d+)?)\s+km/gm
And that gives us:
$VAR1 = [
'12',
'32',
'0',
'23-24'
];
Which is, I think, what you wanted.
Update: Re-reading your question, I realise that you wanted the 'km' as well. So I've moved the closing parentheses past that.
/(\d+(?:\-\d+)?\s+km)/gm
And that gives us:
$VAR1 = [
'12 km',
'32 km',
'0 km',
'23-24 km'
];

The warning you see is because $_ is undefined. In Perl, you can have variables that have no value at all. That's undef.
The first thing you want to do in this case is inspect your array. The core Data::Dumper module is good for that. Or you can install Data::Printer from CPAN, which I prefer.
print Dumper \#szelmatches;
foreach (#szelmatches) {
print "$_\n";
}
This will output
$VAR1 = [
undef,
undef,
undef,
'-24'
];
Clearly there are some undefs in the array. This is because you have a capture group (\-\d) that is optional ?. Each time the string gets matched successfully via the /g modifier, it will put all of the capture group results into your array. But the only group you have is optional, so the pattern matches even if there is no -\d going on.
You can visualise this on Debugex. If you want to have a more detailed play-around with it, try the Regexp::Debugger module, which will allow you to step-by-step debug your regex right in your terminal.
You will have to tell us which numbers you actually want to capture.
If all you are after is the second one after the dash (which you do not have to escape, it has no special meaning), then you should not make that capture group optional.

Two problems.
When a capture is conditional (e.g. (...)?) and it doesn't match anything, it captures undef.
When there's one or more captures, the match returns the capture text rather than the entire text matched.
The solution is to remove the useless and problem-causing capture. Replace
if ( my #szelmatches = $string =~ /\d+(\-\d+)?\s+km/g )
with
if ( my #szelmatches = $string =~ /\d+(?:\-\d+)?\s+km/g )

Related

What is "Use of unitialized value $. in range (or flip)" trying to tell me in Perl

I have the following code snippet in Perl:
my $argsize = #args;
if ($argsize >1){
foreach my $a ($args[1..$argsize-1]) {
$a =~ s/(.*[-+*].*)/\($1\)/; # if there's a math operator, put in parens
}
}
On execution I'm getting "Use of unitialized value $. in range (or flip) , followed by Argument "" isn't numeric in array element at... both pointing to the foreach line.
Can someone help me decipher the error message (and fix the problem(s))? I have an array #args of strings. The code should loop through the second to n't elements (if any exist), and surround individual args with () if they contain a +,-, or *.
I don't think the error stems from the values in args, I think I'm screwing up the range somehow... but I'm failing when args has > 1 element. an example might be:
<"bla bla bla"> <x-1> <foo>
The long and short of it is - your foreach line is broken:
foreach my $a (#args[1..$argsize-1]) {
Works fine. It's because you're using a $ which says 'scalar value' rather than an # which says array (or list).
If you use diagnostics you get;
Use of uninitialized value $. in range (or flip) at
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.
To help you figure out what was undefined, perl will try to tell you
the name of the variable (if any) that was undefined. In some cases
it cannot do this, so it also tells you what operation you used the
undefined value in. Note, however, that perl optimizes your program
and the operation displayed in the warning may not necessarily appear
literally in your program. For example, "that $foo" is usually
optimized into "that " . $foo, and the warning will refer to the
concatenation (.) operator, even though there is no . in
your program.
You can reproduce this error by:
my $x = 1..3;
Which is actually pretty much what you're doing here - you're trying to assign an array value into a scalar.
There's a load more detail in this question:
What is the Perl context with range operator?
But basically: It's treating it as a range operator, as if you were working your way through a file. You would be able to 'act on' particular lines in the file via this operator.
e.g.:
use Data::Dumper;
while (<DATA>) {
my $x = 2 .. 3;
print Dumper $x;
print if $x;
}
__DATA__
line one
another line
third line
fourth line
That range operator is testing line numbers - and because you have no line numbers (because you're not iterating a file) it errors. (But otherwise - it might work, but you'd get some really strange results ;))
But I'd suggest you're doing this quite a convoluted way, and making (potentially?) an error, in that you're starting your array at 1, not zero.
You could instead:
s/(.*[-+*].*)/\($1\)/ for #args;
Which'll have the same result.
(If you need to skip the first argument:
my ( $first_arg, #rest ) = #args;
s/(.*[-+*].*)/\($1\)/ for #rest;
But that error at runtime is the result of some of the data you're feeding in. What you've got here though:
use strict;
use warnings;
my #args = ( '<"bla bla bla">', '<x-1>', '<foo>' );
print "Before #args\n";
s/(.*[-+*].*)/\($1\)/ for #args;
print "After: #args\n";

Skipping particular positions in a string using substitution operator in perl

Yesterday, I got stuck in a perl script. Let me simplify it, suppose there is a string (say ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD), first I've to break it at every position where "E" comes, and secondly, break it specifically where the user wants to be at. But, the condition is, program should not cut at those sites where E is followed by P. For example there are 6 Es in this sequence, so one should get 7 fragments, but as 2 Es are followed by P one will get 5 only fragments in the output.
I need help regarding the second case. Suppose user doesn't wants to cut this sequence at, say 5th and 10th positions of E in the sequence, then what should be the corresponding script to let program skip these two sites only? My script for first case is:
my $otext = 'ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD';
$otext=~ s/([E])/$1=/g; #Main cut rule.
$otext=~ s/=P/P/g;
#output = split( /\=/, $otext);
print "#output";
Please do help!
To split on "E" except where it's followed by "P", you should use Negative look-ahead assertions.
From perldoc perlre "Look-Around Assertions" section:
(?!pattern)
A zero-width negative look-ahead assertion.
For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar".
my $otext = 'ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD';
# E E EP E EP E
my #output=split(/E(?!P)/, $otext);
use Data::Dumper; print Data::Dumper->Dump([\#output]);"
$VAR1 = [
'ABCD',
'ABCD',
'ABCDEPABCD',
'ABCDEPABCD',
'ABCD'
];
Now, in order to NOT cut at occurences #2 and #4, you can do 2 things:
Concoct a really fancy regex that automatically fails to match on given occurence. I will leave that to someone else to attempt in an answer for completeness sake.
Simply stitch together the correct fragments.
I'm too brain-dead to come up with a good idiomatic way of doing it, but the simple and dirty way is either:
my %no_cuts = map { ($_=>1) } (2,4); # Do not cut in positions 2,4
my #output_final;
for(my $i=0; $i < #output; $i++) {
if ($no_cuts{$i}) {
$output_final[-1] .= $output[$i];
} else {
push #output_final, $output[$i];
}
}
print Data::Dumper->Dump([\#output_final];
$VAR1 = [
'ABCD',
'ABCDABCDEPABCD',
'ABCDEPABCDABCD'
];
Or, simpler:
my %no_cuts = map { ($_=>1) } (2,4); # Do not cut in positions 2,4
for(my $i=0; $i < #output; $i++) {
$output[$i-1] .= $output[$i];
$output[$i]=undef; # Make the slot empty
}
my #output_final = grep {$_} #output; # Skip empty slots
print Data::Dumper->Dump([\#output_final];
$VAR1 = [
'ABCD',
'ABCDABCDEPABCD',
'ABCDEPABCDABCD'
];
Here's a dirty trick that exploits two facts:
normal text strings never contain null bytes (if you don't know what a null byte is, you should as a programmer: http://en.wikipedia.org/wiki/Null_character, and nb. it is not the same thing as the number 0 or the character 0).
perl strings can contain null bytes if you put them there, but be careful, as this may screw up some perl internal functions.
The "be careful" is just a point to be aware of. Anyway, the idea is to substitute a null byte at the point where you don't want breaks:
my $s = "ABCDEABCDEABCDEPABCDEABCDEPABCDEABCD";
my #nobreak = (4,9);
foreach (#nobreak) {
substr($s, $_, 1) = "\0";
}
"\0" is an escape sequence representing a null byte like "\t" is a tab. Again: it is not the character 0. I used 4 and 9 because there were E's in those positions. If you print the string now it looks like:
ABCDABCDABCDEPABCDEABCDEPABCDEABCD
Because null bytes don't display, but they are there, and we are going to swap them back out later. First the split:
my #a = split(/E(?!P)/, $s);
Then swap the zero bytes back:
$_ =~ s/\0/E/g foreach (#a);
If you print #a now, you get:
ABCDEABCDEABCDEPABCD
ABCDEPABCD
ABCD
Which is exactly what you want. Note that split removes the delimiter (in this case, the E); if you intended to keep those you can tack them back on again afterward. If the delimiter is from a more dynamic regex it is slightly more complicated, see here:
http://perlmeme.org/howtos/perlfunc/split_function.html
"Example 9. Keeping the delimiter"
If there is some possibility that the #nobreak positions are not E's, then you must also keep track of those when you swap them out to make sure you replace with the correct character again.

What does the Perl split function return when there is no value between tokens?

I'm trying to split a string using the split function but there isn't always a value between tokens.
Ex: ABC,123,,,,,,XYZ
I don't want to skip the multiple tokens though. These values are in specific positions in the string. However, when I do a split, and then try to step through my resulting array, I get "Use of uninitialized value" warnings.
I've tried comparing the value using $splitvalues[x] eq "" and I've tried using defined($splitvalues[x]) , but I can't for the life of me figure out how to identify what the split function is putting in to my array when there is no value between tokens.
Here's the snippet of my code (now with more crunchy goodness):
my #matrixDetail = ();
#some other processing happens here that is based on matching data from the
##oldDetail array with the first field of the #matrixLine array. If it does
#match, then I do the split
if($IHaveAMatch)
{
#matrixDetail = split(',', $matrixLine[1]);
}
else
{
#matrixDetail = ('','','','','','','');
}
my $newDetailString =
(($matrixDetail[0] eq '') ? $oldDetail[0] : $matrixDetail[0])
. (($matrixDetail[1] eq '') ? $oldDetail[1] : $matrixDetail[1])
.
.
.
. (($matrixDetail[6] eq '') ? $oldDetail[6] : $matrixDetail[6]);
because this is just snippets, I've left some of the other logic out, but the if statement is inside a sub that technically returns the #matrixDetail array back. If I don't find a match in my matrix and set the array equal to the array of empty strings manually, then I get no warnings. It's only when the split populates the #matrixDetail.
Also, I should mention, I've been writing code for nearly 15 years, but only very recently have I needed to work with Perl. The logic in my script is sound (or at least, it works), I'm just being anal about cleaning up my warnings and trying to figure out this little nuance.
#!perl
use warnings;
use strict;
use Data::Dumper;
my $str = "ABC,123,,,,,,XYZ";
my #elems = split ',', $str;
print Dumper \#elems;
This gives:
$VAR1 = [
'ABC',
'123',
'',
'',
'',
'',
'',
'XYZ'
];
It puts in an empty string.
Edit: Note that the documentation for split() states that "by default, empty leading fields are preserved, and empty trailing ones are deleted." Thus, if your string is ABC,123,,,,,,XYZ,,,, then your returned list will be the same as the above example, but if your string is ,,,,ABC,123, then you will have a list with three empty strings in elements 0, 1, and 2 (in addition to 'ABC' and '123').
Edit 2: Try dumping out the #matrixDetail and #oldDetail arrays. It's likely that one of those isn't the length that you think it is. You might also consider checking the number of elements in those two lists before trying to use them to make sure you have as many elements as you're expecting.
I suggest to use Text::CSV from CPAN. It is a ready made solution which already covers all the weird edge cases of parsing CSV formatted files.
delims with nothing between them give empty strings when split. Empty strings evaluate as false in boolean context.
If you know that your "details" input will never contain "0" (or other scalar that evaluates to false), this should work:
my #matrixDetail = split(',', $matrixLine[1]);
die if #matrixDetail > #oldDetail;
my $newDetailString = "";
for my $i (0..$#oldDetail) {
$newDetailString .= $matrixDetail[$i] || $oldDetail[$i]; # thanks canSpice
}
say $newDetailString;
(there are probably other scalars besides empty string and zero that evaluate to false but I couldn't name them off the top of my head.)
TMTOWTDI:
$matrixDetail[$_] ||= $oldDetail[$_] for 0..$#oldDetail;
my $newDetailString = join("", #matrixDetail);
edit: for loops now go from 0 to $#oldDetail instead of $#matrixDetail since trailing ",,," are not returned by split.
edit2: if you can't be sure that real input won't evaluate as false, you could always just test the length of your split elements. This is safer, definitely, though perhaps less elegant ^_^
Empty fields in the middle will be ''. Empty fields on the end will be omitted, unless you specify a third parameter to split large enough (or -1 for all).

Can I replace the binding operator with the smartmatch operator in Perl?

How can I write this with the smartmatch operator (~~)?
use 5.010;
my $string = '12 23 34 45 5464 46';
while ( $string =~ /(\d\d)\s/g ) {
say $1;
}
Interesting. perlsyn states:
Any ~~ Regex pattern match $a =~ /$b/
so, at first glance, it seems reasonable to expect
use strict; use warnings;
use 5.010;
my $string = '12 23 34 45 5464 46';
while ( $string ~~ /(\d\d)\s/g ) {
say $1;
}
to print 12, 23, etc but it gets stuck in a loop, matching 12 repeatedly. Using:
$ perl -MO=Deparse y.pl
yields
while ($string ~~ qr/(\d\d)\s/g) {
say $1;
}
looking at perlop, we notice
qr/STRING/msixpo
Note that 'g' is not listed as a modifier (logically, to me).
Interestingly, if you write:
my $re = qr/(\d\d)\s/g;
perl barfs:
Bareword found where operator expected at C:\Temp\y.pl line 5,
near "qr/(\d\d)\s/g"
syntax error at C:\Temp\y.pl line 5, near "qr/(\d\d)\s/g"
and presumably it should also say something if an invalid expression is used in the code above
If we go and look at what these two variants get transformed into, we can see the reason for this.
First lets look at the original version.
perl -MO=Deparse -e'while("abc" =~ /(.)/g){print "hi\n"}'
while ('abc' =~ /(.)/g) {
print "hi\n";
}
As you can see there wasn't any changing of the opcodes.
Now if you go and change it to use the smart-match operator, you can see it does actually change.
perl -MO=Deparse -e'while("abc" ~~ /(.)/g){print "hi\n"}'
while ('abc' ~~ qr/(.)/g) {
print "hi\n";
}
It changes it to qr, which doesn't recognize the /g option.
This should probably give you an error, but it doesn't get transformed until after it gets parsed.
The warning you should have gotten, and would get if you used qr instead is:
syntax error at -e line 1, near "qr/(.)/g"
The smart-match feature was never intended to replace the =~ operator. It came out of the process of making given/when work like it does.
Most of the time, when(EXPR) is treated as an implicit smart match of $_.
...
Is the expected behaviour to output to first match endlessly? Because that's what this code must do in its current form. The problem isn't the smart-match operator. The while loop is endless, because no modification ever occurs to $string. The /g global switch doesn't change the loop itself.
What are you trying to achieve? I'm assuming you want to output the two-digit values, one per line. In which case you might want to consider:
say join("\n", grep { /^\d{2}$/ } split(" ",$string));
To be honest, I'm not sure you can use the smart match operator for this. In my limited testing, it looks like the smart match is returning a boolean instead of a list of matches. The code you posted (using =~) can work without it, however.
What you posted doesn't work because of the while loop. The conditional statement on a while loop is executed before the start of each iteration. In this case, your regex is returning the first value in $string because it is reset at each iteration. A foreach would work however:
my $string = '12 23 34 45 5464 46';
foreach my $number ($string =~ /(\d\d)\s/g) {
print $number."\n";
}

How do I build a 2d matrix using STDIN in Perl?

How do I build a 2d matrix using STDIN?
If I input a matrix like so:
1 2 3
4 5 6
7 5 6
7 8 9
4 5 6
3 3 3
how do I input this and create two matrices out of this?
Here's my code so far
while (defined ($a=<STDIN>)) {
chomp ($a);
push #a,($a);
}
This is just for the input.
My understanding is I can just add each row to a stack. When the matrices are all put in I can take each line, break by space to create an array. I then need to create an array reference and push this reference into an array to create my matrix. How the heck do I do this? Is there an easier way? I've been bashing my head on this for 3 days now. I feel pretty damn stupid right now...
Let's make that code you have a little more Perl-y, and we'll do everything you need done in one pass:
my #a = ();
while(<>) {
push #a, [ split ];
}
This is taking a lot out of your answer, so I'll opt to explain it, rather than aiming for John Wayne-like answering reflexes. We'll start with your line here:
while(defined(my $a = <STDIN>))
Perl users know that many loops will implicitly use the $_ variable. If you need lots of nested loops, you should avoid using that variable, and use well-named variables for each level of looping, but in this case we only have one level, so let's go ahead and use it:
while(defined($_ = <STDIN>))
Now, Perl is kind enough to understand that we want to test for defined()ness a lot, so it will allow us to shorten that to this:
while(<STDIN>)
This is implicitly translated by Perl as assigning the line read to $_ and returning true as long as the result is defined (and therefore until end-of-file occurs). However, Perl gives us one more trick:
while(<>)
This will loop over STDIN or, if arguments are given on the command line, it will open those as files and loop over them. So this still reads from STDIN:
./myscript.pl
But we can also read from one or more files:
./myscript.pl myfile [myfile2 [myfile3 ...]]
It's easier and more intuitive than using the shell to do the same (though this will still work):
cat myfile [myfile2 [myfile3 ...]] | ./myscript.pl
If you don't want this behavior, you can change it back to <STDIN>, but consider keeping it.
The loop is:
push #a, [ split ];
First, split() with no arguments is identical to split /\s+/, $_ (i.e. it splits the $_ string on occurrences of whitespace characters), and due to the subtleties of split empty trailing fields are removed, so a chomp() is unnecessary. Then, [] creates an anonymous array reference (which, in this case, contains the contents of our split $_ string). Then, we push that array reference onto #a. Simple as pie, you now have a two-dimensional matrix from your standard input.
Try this:
use strict;
use warnings;
use Data::Dumper;
my #matrix;
while (my $line = <>) {
chomp $line;
my #row = split /\s+/, $line, 3;
push #matrix, \#row;
}
print Dumper(\#matrix);
Instead of using <STDIN> explicitly, you can read from either stdin or a piped file with <>.
Inputting one matrix gives the result:
$VAR1 = [
[
'1',
'2',
'3'
],
[
'4',
'5',
'6'
],
[
'7',
'8',
'9'
]
];
From here you should be able to see what you need to do to read in two matrices.
The other answers seem to be missing the requirement to read multiple matrices from the same input, breaking on a blank line. There are a few different ways to go about this, including frobbing $/, but here's one that appeals to me.
# Read a matrix from a handle, with columns delimited by whitespace
# and rows delimited by newlines. A matrix ends at a blank line
# (which is consumed) or EOF.
sub read_matrix_from {
my ($handle) = #_;
my #out;
while (<$handle>) {
last unless /\S/;
push #out, [ split ];
}
return \#out;
}
my #matrices;
push #matrices, read_matrix_from(\*ARGV) until eof();
Season the last part to taste, of course -- you might be using an explicitly opened handle instead of ARGV magic, and you might know in advance how many things you're reading instead of going to EOF, etc.