using variable as regex substitution parameter - perl

I've been doing some searching and haven't found an answer. Why isn't this working?
$self->{W_CONTENT} =~ /$regex/;
print $1; #is there a value? YES
$store{URL} =~ s/$param/$1/;
Yes, $1 has a value. $param is replaced, but it is replaced with nothing. I'm positive $1 has a value. If I replace with literal text instead of "$1", it works fine. Please help!

For $1 to have a value you need to ensure that $param has parentheses () in it. For example, the following has a problem similar to the one you describe.
my $fred = "Fred";
$fred =~ s/red/$1/;
# $fred will now be "F"
But this works
my $fred = "Fred";
$fred =~ s/r(ed)/$1/;
# $fred will now be "Fed"
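Now if you want to use the $1 from your first regex in the second one, you need to copy it. Every successful match resets $1 ... $&, and inside s/// the $1 in the replacement refers to the groups captured by that substitution's own pattern. So you want something like: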
$self->{W_CONTENT} =~ /$regex/;
print $1; #is there a value? YES
my $old1 = $1;
$store{URL} =~ s/$param/$old1/;
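To make the difference concrete, here is a minimal, self-contained sketch. The content string, the URL, and the values of $regex and $param are made up for illustration; only the copy-then-substitute pattern comes from the answer above.

```perl
use strict;
use warnings;

# Hypothetical data standing in for $self->{W_CONTENT} and $store{URL}.
my $content = 'id=42;';
my $regex   = qr/id=(\d+)/;
my $param   = 'ID';

my $fixed  = 'http://example.com/item/ID';
my $broken = 'http://example.com/item/ID';

if ($content =~ $regex) {
    my $old1 = $1;                      # copy before the next match resets $1

    # Works: the replacement uses the saved copy.
    $fixed =~ s/\Q$param\E/$old1/;

    {
        # Fails silently: this pattern has no capture group, so $1 is
        # undef by the time the replacement is evaluated.
        no warnings 'uninitialized';
        $broken =~ s/\Q$param\E/$1/;
    }
}

print "fixed:  $fixed\n";
print "broken: $broken\n";
```

The \Q...\E around $param is an extra precaution so that the parameter is matched literally; drop it if $param is meant to be a pattern.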

Backreferences such as $1 shouldn't be used inside the pattern itself; there you'd use a different notation, such as \1. For an overview, check out Perl Regular Expressions Quickstart.
Consider getting the value of $1 and storing it in another variable, then using that in the regex.

Related

Using binding operator in perl

I am working on a program in perl and I am trying to combine more than one regex in a binding operator. I have tried using the syntax below but it's not working. I would like to know if there is any other way to go about this.
$in =~ (s/pattern/replacement/)||(s/pattern/replacement/)||...
You can often get a clue about what Perl makes of some code using B::Deparse.
$ perl -MO=Deparse -E'$in =~ (s/pattern1/replacement1/)||(s/pattern2/replacement2/)'
[ ... snip ... ]
s/pattern2/replacement2/u unless $in =~ s/pattern1/replacement1/u;
-e syntax OK
So it's attempting your first substitution on $in. And if that fails, it is then trying your second substitution. But it's not using $in for the second substitution, it's using $_ instead.
You're running up against precedence issues here. Perl interprets your code as:
($in =~ s/pattern1/replacement1/) or (s/pattern2/replacement2/)
Notice that the opening parenthesis has moved before $in.
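A small sketch of that parse, written in the explicit form shown above. The sample strings are made up; the point is only that the second substitution touches $_ and leaves $in alone.

```perl
use strict;
use warnings;

my $in = 'two';
$_ = 'two';

# Explicit form of how Perl reads the original line: the first
# substitution is bound to $in, the second one falls back to $_.
($in =~ s/one/ONE/) or (s/two/TWO/);

my $topic_after = $_;
print "in = $in\n";             # $in is unchanged
print "\$_ = $topic_after\n";   # $_ was modified instead
```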
As others have pointed out, it's best to use a loop approach here. But I thought it might be useful to explain why your version didn't work.
Update: To be clear, if you wanted to use syntax like this, then you would need:
($in =~ s/pattern1/replacement1/) or
($in =~ s/pattern2/replacement2/);
Note that I've included $in =~ in each expression. At this point, it becomes obvious (I hope) why the looping solution is better.
However, because or is a short-circuiting operator, this statement will stop after the first successful substitution. I assumed that's what you wanted from your use of it in your original code. If that's not what you want, then you need to either switch to using and (which is also short-circuiting and stops at the first substitution that fails) or, better in my opinion, break them out into separate statements.
$in =~ s/pattern1/replacement1/;
$in =~ s/pattern2/replacement2/;
The closest you could get with a syntax looking similar to that would be
s/one/ONE/ or
s/two/TWO/ or
...
s/ten/TEN/ for $str;
This will attempt each substitution in turn, once only, stopping after the first successful one.
Use for to "topicalize" (alias $_ to your variable).
for ($in) {
s/pattern/replacement/;
s/pattern/replacement/;
}
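The two for-based forms behave differently, which a short side-by-side sketch may make clearer. The sample string and patterns are illustrative only.

```perl
use strict;
use warnings;

# Chained with "or": stops after the first substitution that succeeds.
my $first_only = 'one two ten';
s/one/ONE/ or s/two/TWO/ or s/ten/TEN/ for $first_only;

# Separate statements inside a topicalizing "for": every one is attempted.
my $every = 'one two ten';
for ($every) {
    s/one/ONE/;
    s/two/TWO/;
    s/ten/TEN/;
}

print "$first_only\n";   # ONE two ten
print "$every\n";        # ONE TWO TEN
```

In both cases for aliases $_ to the variable, so the substitutions modify the original string in place.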
A simpler way might be to create an array of all such patterns and replacements, then simply iterate through your array applying the substitution one pattern at a time.
my $in = "some string you want to modify";
my @patterns = (
['pattern to match', 'replacement string'],
# ...
);
$in = replace_many($in, \@patterns);
sub replace_many {
my ($in, $replacements) = @_;
foreach my $replacement ( @$replacements ) {
my ($pattern, $replace_string) = @$replacement;
$in =~ s/$pattern/$replace_string/;
}
return $in;
}
It's not at all clear what you need, and it's not at all clear that you can accomplish what you appear to want by the means you suggest. The OR operator is a short circuit operator, and you may not want this behavior. Please give an example of the input you expect, and the output you desire, hopefully several examples of each. Meanwhile, here is a test script.
use warnings;
use strict;
my $in1 = 'George Walker Bush';
my $in2 = 'George Walker Bush';
my $in3 = 'George Walker Bush';
my $in4 = 'George Walker Bush';
(my $out1 = $in1) =~ s/e/*/g;
print "out1 = $out1 \n";
(my $out2 = $in2) =~ s/Bush/Obama/;
print "out2 = $out2 \n";
(my $out3 = $in3) =~ s/(George)|(Bush)/Obama/g;
print "out3 = $out3\n";
$in4 =~ /(George)|(Walker)|(Bush)/g;
print "$1 - $2 - $3\n";
exit(0);
You will notice in the last case that only the first alternative in the regular expression matches, so only $1 is set. If you wanted to replace 'George Walker Bush' with 'Barack Hussein Obama', you could do that easily enough, but you would also replace 'George Washington' with 'Barack Washington'. Is this what you want? Here is the output of the script:
out1 = G*org* Walk*r Bush
out2 = George Walker Obama
out3 = Obama Walker Obama
Use of uninitialized value $2 in concatenation (.) or string at pq_151111a.plx line 19.
Use of uninitialized value $3 in concatenation (.) or string at pq_151111a.plx line 19.
George - -

conditional substitution using hashes

I'm trying to write a substitution in which a condition allows or disallows the replacement.
I have a string
$string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";
Here are two hashes which are used to check condition.
my %tag = ('tag1' => '<you>', 'tag2'=>'<do>', 'tag3'=>'<no>');
my %data = ('you'=>'<you>');
Here is the actual substitution. The replacement should happen only when the hash values do not match.
$string =~ s{(?<=<(.*?)>)(you)}{
if($tag{"$1"} eq $data{"$2"}){next;}
"I"
}sixe;
In this code I want to substitute 'you' with something else, on the condition that it is not equal to the hash value given for the tag.
Can I use next in a substitution?
The problem is that I can't use the /g modifier, and after using next I can't move on to the next substitution.
I also can't modify the expression while matching, and with next it doesn't go on to the second match; it stops there.
You can't use a variable length look behind assertion. The only one that is allowed is the special \K marker.
With that in mind, one way to perform this test is the following:
use strict;
use warnings;
while (my $string = <DATA>) {
$string =~ s{<([^>]*)>\K(?!\1)\w+}{I}s;
print $string;
}
__DATA__
There is <you>you can do for it. that dosen't mean <notyou>you are fool.
There is <you>you can do for it. that dosen't mean <do>you are fool.There <no>you got it.
Output:
There is <you>you can do for it. that dosen't mean <notyou>I are fool.
There is <you>you can do for it. that dosen't mean <do>I are fool.There <no>you got it.
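If the decision really has to come from the two hashes in the question, one alternative to next is to let the replacement code return the matched text unchanged when the condition holds. The following is only a sketch under that assumption: it reuses %tag, %data and the sample string from the question, combines \K with the /e and /g modifiers, and treats a missing hash entry as an empty string.

```perl
use strict;
use warnings;

my %tag  = ('tag1' => '<you>', 'tag2' => '<do>', 'tag3' => '<no>');
my %data = ('you' => '<you>');

my $string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";

# \K keeps the tag out of the replaced text; $1 and $2 are still set.
# Returning $2 puts the original word back, which replaces the "next".
(my $result = $string) =~ s{<([^>]*)>\K(you)}{
    ($tag{$1} // '') eq ($data{lc $2} // '') ? $2 : 'I'
}gie;

print "$result\n";
```

Here only the occurrence after <tag1> is kept, because $tag{tag1} equals $data{you}; the other two are replaced.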
It was simple, but it took me two days of thinking. I have just written a second substitution that ignores the tag that was skipped by next in the first one:
$string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";
my %tag = ('tag1' => '<you>', 'tag2'=>'<do>', 'tag3'=>'<no>');
my %data = ('you'=>'<you>');
my $notag;
$string =~ s{(?<=<(.*?)>)(you)}{
$notag = $2;
if($tag{"$1"} eq $data{"$2"}){next;}
"I"
}sie;
$string =~ s{(?<=<(.*?)>)(?<!$notag)(you)}{
"I"
}sie;

grep keyword with if-condition

I have an input file with
words;
yadda yadda;
keyword 123;
yadda;
and I want to simply get the value 123 saved as a variable. I tried a solution from here:
my $var;
open(FILE,$data.dat) or die "error on opening $data: $!\n";
while (my $line = <FILE>) {
if (/^keyword/) {
$var = $1;
print $line;
last;
}
}
close(FILE);
This isn't working and gives me following error: Use of uninitialized value $_ in pattern match (m//) at ./script.pl line 91, <FILE> line 384. (this occurs for all lines of <FILE>)
I found another solution without the if-condition which just states @string = sort grep /^keyword/,<FILE>; and works. Can you please explain to me what is happening here?
Edit:
Thanks for the answers and explanations! Which do you think is the better or more elegant way, the grep or the if-condition?
You need the following change:
if ($line =~ m/^keyword\s+(\d+)/)
Explanation: You read each line into $line, so $_, which is the default target of a match, is never set and stays undefined.
In addition, you'd get another error with $1, because your pattern did not specify a capturing group.
$1 refers to the first capture group, but your regex doesn't contain any capture groups, so it's undefined. Try
if ($line =~ /^keyword\s+(-?(?:\d+|\d*\.\d*)(?:[Ee]-?(?:\d+|\d*\.\d*))?)/) {
Notice also that the regex is being applied to the variable containing the line you just read.
Edit: Updated to cope with numbers in scientific notation. This is a significant additional requirement which you should have specified explicitly in the first place.
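Putting both fixes together (match against $line, and add a capture group), a minimal runnable sketch could look like this. It reads the sample input from an in-memory filehandle instead of a real file, which is an assumption made purely so the example is self-contained, and it uses the simpler \d+ pattern.

```perl
use strict;
use warnings;

# Sample input from the question, held in a string instead of a file.
my $input = "words;\nyadda yadda;\nkeyword 123;\nyadda;\n";

open(my $fh, '<', \$input) or die "error on opening in-memory input: $!\n";

my $var;
while (my $line = <$fh>) {
    # Bind the match to $line explicitly and capture the number.
    if ($line =~ /^keyword\s+(\d+)/) {
        $var = $1;
        last;
    }
}
close($fh);

print "var = $var\n";
```

The loop stops at the first matching line, whereas the grep version reads the whole file and returns every matching line.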

Backreferences undefined after finding pattern in perl v5.14.2

I found it strange that backreferences ($1,$2,$3) were not working in my original code, so I ran this example from the web:
#!/usr/bin/perl
# matchtest2.plx
use warnings;
use strict;
$_ = '1: A silly sentence (495,a) *BUT* one which will be useful. silly (3)';
my $pattern = "silly";
if (/$pattern/) {
print "The text matches the pattern '$pattern'.\n";
print "\$1 is '$1'\n" if defined $1;
print "\$2 is '$2'\n" if defined $2;
print "\$3 is '$3'\n" if defined $3;
print "\$4 is '$4'\n" if defined $4;
print "\$5 is '$5'\n" if defined $5;
}
else {
print "'$pattern' was not found.\n";
}
Which only gave me:
The text matches the pattern 'silly'.
Why are the backreferences still undefined after the pattern was found? I am using Wubi (Ubuntu 12.04 64-bit) and my perl version is 5.14.2. Thank you in advance for your help.
You are not capturing any strings: No parentheses in your pattern. If you had done:
my $pattern = "(silly)";
You would have gotten something in $1.
In case you do not know, $1 is the text captured in the first parentheses, $2 the second parentheses, and so on.
This is expected behaviour! It is obvious that your pattern will match, so it is no surprise that the corresponding if-block is executed.
The term “backreferences” for $1, $2, ... may be slightly suboptimal, let's call them “capture groups”.
In a regex, you can enclose parts of the pattern with parens to be remembered later:
/(silly)/
This pattern has one group. The contents of this group will be stored in $1 if it matches.
All capture group variables for groups that don't exist in the pattern or were not populated are set to undef on an otherwise successful match, so for the above pattern $2, $3, ... would all be undef.
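As a quick illustration, here is a sketch with two groups in the pattern, using the sentence from the question. The pattern itself is made up for the example.

```perl
use strict;
use warnings;

my $text = '1: A silly sentence (495,a) *BUT* one which will be useful. silly (3)';

my ($word, $number, $missing);
if ($text =~ /(silly) sentence \((\d+)/) {
    # Two groups in the pattern: $1 and $2 are set, $3 stays undef.
    ($word, $number, $missing) = ($1, $2, $3);
}

print "\$1 is '$word'\n";
print "\$2 is '$number'\n";
print "\$3 is ", (defined $missing ? "'$missing'" : 'undef'), "\n";
```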

Perl: Why would eq work, when =~ doesn't?

Working code:
if ( $check1 eq $search_key ...
Previous 'buggy' code:
if ( $check1 =~ /$search_key/ ...
The words (in $check1 and $search_key) should be the same, but why doesn't the 2nd one return true all the time? What is different about these?
$check1 is acquired through a split. $search_key is either inputted before ("word") or at runtime: (<>), both are then passed to a subroutine.
A further question would be: can I convert the following without any hidden problems?
if ($category_id eq "subj") {
I want to be able to say: =~ /subj/ so that "subject" would still remain true.
Thanks in advance.
$check1 =~ /$search_key/ doesn't work because any special characters in $search_key will be interpreted as a part of the regular expression.
Moreover, this really tests whether $check1 contains the substring $search_key. You really wanted $check1 =~ /^$search_key$/, although it's still incorrect because of the reason mentioned above.
Better stick with eq for exact string comparisons.
As mentioned before, special characters in $search_key will be interpreted. To prevent this, use \Q: if ( $check1 =~ /\Q$search_key/), which will take the content of $search_key as a literal. You can use \E to end the quoting, for example: if ( $check1 =~ /\b\Q$search_key\E\b/).
This information is in perlre.
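A small sketch of how the three comparisons can disagree. The values of $check1 and $search_key are invented for the example; the only assumption is that the search key contains a regex metacharacter.

```perl
use strict;
use warnings;

my $check1     = 'abc';
my $search_key = 'a.c';   # the dot is a metacharacter in a regex

# Interpolated as a pattern: '.' matches any character, so this is true.
my $as_pattern = ($check1 =~ /^$search_key$/)       ? 1 : 0;

# Quoted with \Q...\E: the dot must match literally, so this is false.
my $as_literal = ($check1 =~ /^\Q$search_key\E$/)   ? 1 : 0;

# Plain string equality: false, the strings differ.
my $as_eq      = ($check1 eq $search_key)           ? 1 : 0;

print "pattern=$as_pattern literal=$as_literal eq=$as_eq\n";
```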
Regarding your second question, if you just want plain substring matching, you can use the index function. Then replace
if ($category_id eq "subj") {
with
if (0 <= index $category_id, "subj") {
This is a case-sensitive match.
Addition for clarification: it will match asubj, subj, and even subjugate.
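To see which values pass the index test, here is a short sketch over a made-up list of category ids:

```perl
use strict;
use warnings;

# Hypothetical category ids to test against the substring "subj".
my @ids = qw(asubj subj subjugate subject Subject other);

# index returns -1 when the substring is absent, so >= 0 means "contains".
my @hits = grep { index($_, 'subj') >= 0 } @ids;

print join(', ', @hits), "\n";
```

"Subject" with a capital S is not in the result because index is case-sensitive; lowercasing the value first with lc would be one way around that.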