Perl: Why would eq work, when =~ doesn't?

Working code:
if ( $check1 eq $search_key ...
Previous 'buggy' code:
if ( $check1 =~ /$search_key/ ...
The words (in $check1 and $search_key) should be the same, so why doesn't the second one always return true? What is different about the two tests?
$check1 is obtained through a split. $search_key is either set beforehand ("word") or read at runtime (<>); both are then passed to a subroutine.
A further question: can I convert the following without any hidden problems?
if ($category_id eq "subj") {
I want to be able to say: =~ /subj/ so that "subject" would still remain true.
Thanks in advance.

$check1 =~ /$search_key/ can fail because any regex metacharacters in $search_key (such as ., +, ?, ( or [) are interpreted as part of the regular expression rather than as literal characters.
Moreover, an unanchored pattern only tests whether $check1 contains something matching $search_key. For a whole-string test you would need $check1 =~ /^$search_key$/, and even that is still incorrect for the reason mentioned above.
It is better to stick with eq for exact string comparisons.

As mentioned above, special characters in $search_key will be interpreted as regex syntax. To prevent this, use \Q: if ( $check1 =~ /\Q$search_key/ ) treats the content of $search_key as a literal string. Use \E to end the quoting, for example: if ( $check1 =~ /\b\Q$search_key\E\b/ ).
This information is in perlre
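To make the difference concrete, here is a minimal sketch comparing the three tests. The value 'a+b' is a made-up example of a key containing a metacharacter; it is not taken from the question.

```perl
use strict;
use warnings;

my $check1     = 'a+b';   # hypothetical field produced by split
my $search_key = 'a+b';   # hypothetical user input, already chomped

# eq compares the two strings character by character.
my $eq_result = $check1 eq $search_key ? 1 : 0;

# Unquoted, "+" is a quantifier: the pattern means "one or more a, then b".
my $raw_result = $check1 =~ /$search_key/ ? 1 : 0;

# \Q...\E quotes metacharacters; ^ and \z anchor the match to the whole string.
my $quoted_result = $check1 =~ /^\Q$search_key\E\z/ ? 1 : 0;

print "eq: $eq_result, raw: $raw_result, quoted: $quoted_result\n";
# prints "eq: 1, raw: 0, quoted: 1"
```

The anchored, quoted pattern behaves like eq here, which is why eq is the simpler choice when an exact comparison is all that is needed.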

Regarding your second question, if you just want plain substring matching, you can use the index function. Replace
if ($category_id eq "subj") {
with
if (0 <= index $category_id, "subj") {
This is a case-sensitive match.
Addition for clarification: it will match asubj, subj, and even subjugate, because index only checks that "subj" occurs somewhere in the string.
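As a rough illustration of what each test accepts, the sketch below runs a substring test and an anchored regex over a few made-up category ids (the list is an assumption chosen for the demonstration).

```perl
use strict;
use warnings;

# Hypothetical category ids, chosen to show what a plain substring test accepts.
my @ids = qw(subj subject asubj subjugate Subject topic);

# index returns the position of the substring, or -1 when it is absent.
my @contains = grep { index($_, 'subj') >= 0 } @ids;

# An anchored regex only accepts ids that start with "subj".
my @starts = grep { /^subj/ } @ids;

print "contains: @contains\n";   # contains: subj subject asubj subjugate
print "starts:   @starts\n";     # starts:   subj subject subjugate
```

If "subject" should pass but "asubj" should not, the anchored form /^subj/ is probably closer to the intent than a bare /subj/ or index.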

Related

conditional substitution using hashes

I'm trying to write a substitution in which a condition allows or disallows each replacement.
I have a string
$string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";
Here are two hashes which are used to check condition.
my %tag = ('tag1' => '<you>', 'tag2'=>'<do>', 'tag3'=>'<no>');
my %data = ('you'=>'<you>');
Here is the actual substitution; a replacement should be made only when the hash values for the tag and the word do not match.
$string =~ s{(?<=<(.*?)>)(you)}{
if($tag{"$1"} eq $data{"$2"}){next;}
"I"
}sixe;
In this code I want to replace 'you' with something else, on the condition that it is not equal to the hash value given for the tag.
Can I use next inside a substitution?
The problem is that I can't use the /g modifier, and after calling next I can't continue to the next substitution.
I also can't modify the expression while matching, and with next it doesn't go on to the second match; it stops there.
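You can't use an unbounded variable-length lookbehind assertion such as (?<=<(.*?)>). The usual alternative is the special \K marker, which keeps everything matched before it out of the text that gets replaced.
With that in mind, one way to perform this test is the following: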
use strict;
use warnings;
while (my $string = <DATA>) {
$string =~ s{<([^>]*)>\K(?!\1)\w+}{I}s;
print $string;
}
__DATA__
There is <you>you can do for it. that dosen't mean <notyou>you are fool.
There is <you>you can do for it. that dosen't mean <do>you are fool.There <no>you got it.
Output:
There is <you>you can do for it. that dosen't mean <notyou>I are fool.
There is <you>you can do for it. that dosen't mean <do>I are fool.There <no>you got it.
It was simple, but it took me two days to think of it. I just wrote a second substitution that ignores the tag skipped by next in the first one:
$string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";
my %tag = ('tag1' => '<you>', 'tag2'=>'<do>', 'tag3'=>'<no>');
my %data = ('you'=>'<you>');
my $notag;
$string =~ s{(?<=<(.*?)>)(you)}{
$notag = $2;
if($tag{"$1"} eq $data{"$2"}){next;}
"I"
}sie;
$string =~ s{(?<=<(.*?)>)(?<!$notag)(you)}{
"I"
}sie;
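Another way to get the conditional behaviour without next is to let the /e block return the original word whenever the replacement should be skipped; that also makes /g usable. The following is only a sketch under that assumption: it reuses the question's %tag and %data, uses \K instead of the lookbehind, and needs Perl 5.10 or later for \K and //.

```perl
use strict;
use warnings;

my $string = "There is <tag1>you can do for it. that dosen't mean <tag2>you are fool.There <tag3>you got it.";
my %tag  = (tag1 => '<you>', tag2 => '<do>', tag3 => '<no>');
my %data = (you => '<you>');

# Capture the tag name and the word; the /e block decides what to put back.
$string =~ s{<([^>]*)>\K(you)}{
    my ($name, $word) = ($1, $2);
    # Returning $word unchanged plays the role of skipping this match.
    ($tag{$name} // '') eq ($data{lc $word} // '') ? $word : 'I'
}gie;

print "$string\n";
# There is <tag1>you can do for it. that dosen't mean <tag2>I are fool.There <tag3>I got it.
```

Because the block always returns a value, every match is processed in a single pass and no second substitution is required.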

using variable as regex substitution parameter

I've been doing some searching and haven't found an answer. Why isn't this working?
$self->{W_CONTENT} =~ /$regex/;
print $1; #is there a value? YES
$store{URL} =~ s/$param/$1/;
Yes, $1 has a value. $param is replaced, but it is replaced with nothing. I'm positive $1 has a value. If I replace with literal text instead of "$1" it works fine. Please help!
For $1 to have a value in the replacement, $param must contain capturing parentheses (). For example, the following has a problem similar to the one you describe.
my $fred = "Fred";
$fred =~ s/red/$1/;
# $fred will now be "F"
But this works
my $fred = "Fred";
$fred =~ s/r(ed)/$1/;
# $fred will now be "Fed"
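Now if you want to use the $1 from your first regex in the second one, you need to copy it. Every successful match resets $1 ... $&, and inside s/$param/$1/ the $1 refers to the captures of $param, not to those of the earlier match. So you want something like: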
$self->{W_CONTENT} =~ /$regex/;
print $1; #is there a value? YES
my $old1 = $1;
$store{URL} =~ s/$param/$old1/;
Backreferences such as $1 shouldn't be used inside the pattern itself; there you'd use a different notation (\1). For an overview, check out Perl Regular Expressions Quickstart.
Consider getting the value of $1 and storing it in another variable, then using that variable in the substitution.
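The copy-first advice can be sketched as follows. All values here ($content, $regex, $param, $url) are hypothetical stand-ins for $self->{W_CONTENT}, $regex, $param and $store{URL} from the question.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the question's variables.
my $content = 'id=42;name=demo';
my $regex   = qr/id=(\d+)/;
my $param   = 'ID';                        # contains no capture group
my $url     = 'http://example.com/item/ID';

my $fixed = $url;
if ($content =~ $regex) {
    my $captured = $1;   # copy before the next successful match resets $1
    $fixed =~ s/\Q$param\E/$captured/;
}

# For comparison: here $1 belongs to the s/// match itself, which captured nothing.
my $broken = $url;
{
    no warnings 'uninitialized';   # silence the warning this would normally raise
    $broken =~ s/\Q$param\E/$1/;
}

print "$fixed\n";    # http://example.com/item/42
print "$broken\n";   # http://example.com/item/
```

Checking that the first match succeeded before using $1 also avoids picking up a stale capture from some earlier match.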

Perl: Debug for uninitialized s///?

I'm having some trouble finding the problem with my program. I'm getting the error:
Use of uninitialized value in substitution (s///)
I realize this has been asked before, but those answers didn't help me. I realize $1 might be uninitialized, but I was wondering if you could help me figure out why?
Here's the problem part of the code:
$one_match_ref->{'sentence'} = $1 if ($line =~ /^Parsing \[sent. \d+ len. \d+\]: \[(.+)\]/);
$one_match_ref->{'sentence'} =~ s/, / /g;
EDIT: I have declared the $one_match_ref->{'sentence'} like so:
my $sentence;
$one_match_ref = {
chapternumber => $chapternumber_value,
sentencenumber => $sentencenumber_value,
sentence => $sentence, ##Get from parsed text: remove commas
grammar_relation => $grammar_relation_value, ##Get from parsed text: split?
arg1 => $argument1, ##Get from parsed text: first_dependencyword
arg2 => $argument2 ##Get from parsed text: second_dependencyword
};
But none of these variables have anything assigned to them.
My attempts:
A. If I put: if( defined ($one_match_ref->{'sentence'})) after the s///, it works. But this is cumbersome, and seems to be avoiding the problem instead of fixing it.
The last time I used that fix, it was because my loop had an "off-by-one" error, I don't think this is the case this time.
B. If I declare: my $sentence = ''; It prints, but with a lot of blank lines in between. How can I eliminate these?
EDIT: For interest and efficiency purposes: Is it better to use split to get what I want?
Thanks in advance for any help or advice. Let me know if you need an example of the file format.
Your code boils down to
my $sentence;
$one_match_ref = { sentence => $sentence };
() if ($line =~ /^Parsing \[sent. \d+ len. \d+\]: \[(.+)\]/);
$one_match_ref->{'sentence'} =~ s/, / /g;
You assign undef to $one_match_ref->{'sentence'}, then you try to remove the commas from it. That doesn't make any sense, thus the warning.
Maybe you want
my $sentence;
$one_match_ref = { sentence => $sentence };
if ($line =~ /^Parsing \[sent. \d+ len. \d+\]: \[(.+)\]/) {
$one_match_ref->{'sentence'} = $1;
$one_match_ref->{'sentence'} =~ s/, / /g;
}
I'm not sure it's $1 that's uninitialised here but rather $one_match_ref->{'sentence'}.
That value is set if and only if the line matches the regex. Otherwise it's not touched at all.
My reasoning is that it's complaining during the substitute rather than the assignment. You could possibly fix it by simply setting $one_match_ref->{'sentence'} to a known value before those two lines (such as the empty string).
But this depends on what you're actually using those values for.
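One possible shape for the loop, assuming the lines that do not match can simply be skipped, is sketched below. The sample lines and the @sentences array are invented for the illustration; the regex is the one from the question with the literal dots escaped.

```perl
use strict;
use warnings;

# Hypothetical log lines; only the first has the "Parsing [...]" shape.
my @lines = (
    'Parsing [sent. 1 len. 5]: [Hello, world, again]',
    'some unrelated line',
);

my @sentences;
for my $line (@lines) {
    # Skip lines that do not match, so nothing undefined reaches s///.
    next unless $line =~ /^Parsing \[sent\. \d+ len\. \d+\]: \[(.+)\]/;
    my $one_match_ref = { sentence => $1 };
    $one_match_ref->{sentence} =~ s/, / /g;
    push @sentences, $one_match_ref->{sentence};
}

print "$_\n" for @sentences;   # Hello world again
```

Skipping unmatched lines this way should also avoid the blank output lines that appear when the sentence is defaulted to an empty string.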

How to extract a number from a string in Perl?

I have
print $str;
abcd*%1234$sdfsd..#d
The string will always contain exactly one continuous run of digits, like 1234 in this case. Everything else will be letters or other special characters.
How can I extract the number (1234 in this case) and store it back in str?
This page suggests that I should use \d, but how?
If you don't want to modify the original string, you can extract the numbers by capturing them in the regex, using subpatterns. In list context, a regular expression returns the matches defined in the subpatterns.
my $str = 'abc 123 x456xy 789foo';
my ($first_num) = $str =~ /(\d+)/; # 123
my @all_nums = $str =~ /(\d+)/g; # (123, 456, 789)
$str =~ s/\D//g;
This removes all nondigit characters from the string. That's all that you need to do.
EDIT: if Unicode digits in other scripts may be present, a better solution is:
$str =~ s/[^0-9]//g;
If you wanted to do it the destructive way, this is the fastest way to do it.
$str =~ tr/0-9//cd;
The /c flag selects the complement of 0-9 and /d deletes those characters, since they have no replacement.
The one caveat to this approach, and Phillip Potter's, is that were there another group of digits further down the string, they would be concatenated with the first group of digits. So it's not clear that you would want to do this.
The surefire way to get one and only one group of digits is
( $str ) = $str =~ /(\d+)/;
The match, in a list context returns a list of captures. The parens around $str are simply to put the expression in a list context and assign the first capture to $str.
Personally, I would do it like this:
$s =~ /([0-9]+)/;
print $1;
$1 will contain the first group matched by the given regular expression (the part in round brackets).
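The caveat about several digit groups is easy to see side by side. This sketch uses the string from the question plus the multi-group string from the first answer; the variable names are chosen for the example.

```perl
use strict;
use warnings;

my $one  = 'abcd*%1234$sdfsd..#d';    # the string from the question
my $many = 'abc 123 x456xy 789foo';   # several digit runs

# List-context match: returns the first captured run only.
my ($first) = $many =~ /([0-9]+)/;

# tr with /c and /d deletes every non-digit, so separate runs are glued together.
(my $glued = $many) =~ tr/0-9//cd;

# With a single run, the capturing match returns just that run.
my ($only) = $one =~ /([0-9]+)/;

print "$first $glued $only\n";   # 123 123456789 1234
```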

What is wrong with this Perl code?

$value = $list[1] ~ s/\D//g;
syntax error at try1.pl line 53, near "] ~"
Execution of try1.pl aborted due to compilation errors.
I am trying to extract the digits from the second element of @list, and store them in $value.
You mean =~, not ~. ~ is a unary bitwise negation operator.
A couple of ways to do this:
($value) = $list[1] =~ /(\d+)/;
Both sets of parens are important; only if there are capturing parentheses does the match operation return actual content instead of just an indication of success, and then only in list context (provided by the list-assign operator ()=).
Or the common idiom of copy and then modify:
($value = $list[1]) =~ s/\D//g;
Maybe you wanted the =~ operator?
P.S. Note that even with =~, $value will not be assigned the resulting string (the string itself is changed in place). $value will be assigned the number of substitutions that were made.
You said in a comment that you are trying to get rid of non-digits. It looks like you are trying to preserve the old value and get the modified value in a new variable. The Perl idiom for that is:
( my $new = $old ) =~ s/\D//g;
You also wanted digits (\d), not non-digits (\D), and the substitution operator s/// is superfluous where a plain match makes more sense.
if ($list[1] =~ /(\d+)/) {
$value = $1;
}
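To tie the answers together, here is a small sketch showing what each form actually leaves in the variable. The contents of @list are invented for the example, and the /r flag in the last form requires Perl 5.14 or later.

```perl
use strict;
use warnings;

my @list = ('first', 'order #A-1234-x');   # hypothetical data

# Scalar assignment of s/// yields the number of substitutions, not the text.
my $copy  = $list[1];
my $count = ($copy =~ s/\D//g);

# Copy-then-modify: the parentheses make s/// act on the new variable.
(my $digits = $list[1]) =~ s/\D//g;

# Perl 5.14+ offers /r, which returns the modified copy directly.
my $digits_r = $list[1] =~ s/\D//gr;

print "$count $digits $digits_r\n";   # 11 1234 1234
```

Both the copy-then-modify idiom and /r leave $list[1] itself untouched.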