How to use a variable in a substitution? - perl

I've a text file and I want to match and erase the following text (please note the newline):
[ From:
http://www.website.com ]
The following code works
$text =~ s/\[.*\]//ms;
This other version doesn't:
my $patt = \[.*\];
$text =~ s/$patt//ms;
Would someone be so kind as to explain why?
Thanks in advance

The second variant works perfectly if you quote the pattern string and get rid of the syntax error:
#!/usr/bin/perl
use strict;
use warnings;
my $text = qq{a[ From:
http://www.website.com ]b};
my $patt = qr/\[.*?\]/s;
$text =~ s/$patt//;
print $text;
Prints:
ab
I added the ? quantifier to the regexp to make the match non-greedy. I also removed the m modifier. You are not using ^ or $ in your regexp, so /m has no effect.
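Here is a small sketch that shows why the ? matters. It puts two bracketed blocks on one line, and the sample text is made up for illustration. The greedy .* runs to the last ], but .*? stops at the first one:

```perl
use strict;
use warnings;

# Hypothetical sample text with two bracketed blocks on one line.
my $text = "keep [one] middle [two] end";

# Greedy: .* runs to the LAST ], swallowing everything between the blocks.
(my $greedy = $text) =~ s/\[.*\]//;

# Non-greedy: .*? stops at the FIRST ], so /g removes each block separately.
(my $lazy = $text) =~ s/\[.*?\]//g;

print "$greedy\n";   # keep  end
print "$lazy\n";     # keep  middle  end
```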

The only reason your variation isn't working is that you haven't put quotes around your $patt string. As written, it throws a syntax error. This works fine:
my $patt = '\[.*\]';
$text =~ s/$patt//ms;
My only comment is that the /m modifier is superfluous. It changes how the ^ and $ anchors behave, and you aren't using them here. Only /s is needed, because it makes . match newline characters.
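This sketch compares the two approaches on the sample text from the question. When the pattern is a plain string, the /s flag has to go on the s/// operator. A qr// object carries its own flags. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $text = "a[ From:\nhttp://www.website.com ]b";

# Pattern stored as a plain string: the /s flag must be given on s/// itself.
my $str_patt = '\[.*\]';
(my $t1 = $text) =~ s/$str_patt//s;

# Pattern compiled with qr//s: the flag travels with the compiled pattern.
my $qr_patt = qr/\[.*\]/s;
(my $t2 = $text) =~ s/$qr_patt//;

print "$t1\n";   # ab
print "$t2\n";   # ab
```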


How do you match \'

I need a regex to match \' (literally, a backslash followed by an apostrophe).
my $line = '\'this';
$line =~ s/(\o{134})(\o{047})/\\\\'/g;
$line =~ s/\\'/\\\\'/g;
$line =~ s/[\\][']/\\\\'/g;
printf('%s',$line);
print "\n";
All I get out of this is
'this
When what I want is
\\'this
This occurs whether the string is declared using ' or ". This was a test script for tracking down a file parsing bug. I wanted to confirm that the regex was working as expected.
I don't know how the regex handles the backslash-apostrophe. It might not treat it as two characters and might treat it as an escaped apostrophe instead.
Either way, what is the best way to match \' and print out \\'? I don't want to escape any other backslashes or apostrophes. I can't change the text I am parsing, only the way it is handled and output.
s/\\'/\\\\'/g
All three of your patterns match a backslash followed by an apostrophe, and the one above is the simplest.
Your testing was in vain because your string doesn't contain any backslashes. Both string literals "\'this" (from an earlier edit) and '\'this' (from a later edit) produce the string 'this.
say "\'this"; # 'this
say '\'this'; # 'this
To produce the string \'this, you could use either of the following string literals (among others):
"\\'this"
'\\\'this'
say "\\'this"; # \'this
say '\\\'this'; # \'this
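Here is a minimal sketch that puts the pieces together, assuming the goal is to double the backslash in front of an apostrophe. It first builds a string that really contains \' and then applies the substitution:

```perl
use strict;
use warnings;

# In single quotes, \\ yields one backslash and \' yields an apostrophe,
# so this literal holds the 6-character string \'this.
my $line = '\\\'this';
print length($line), "\n";   # 6

# Match a literal backslash followed by an apostrophe and double the backslash.
$line =~ s/\\'/\\\\'/g;
print "$line\n";             # \\'this
```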
The answer is, of course
s/[\\][']/\\\\'/g
This will match
\'this
And substitute with this
\\'this
This was the only way I could get it to work.
Too much "regexing" in your snippet. Try:
my $line = '\'this';
$line =~ s/'/\\\\\'/g;
printf('%s',$line);
print "\n";
# \\'this
Or, if you want a single backslash instead:
my $line = '\'this';
$line =~ s/'/\\'/g;
printf('%s',$line);
print "\n";
# \'this
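The answer above replaces every apostrophe, which the question wants to avoid. This small sketch uses a made-up input that contains both a plain apostrophe and a backslash-apostrophe, so you can see the difference:

```perl
use strict;
use warnings;

# Hypothetical input; the string itself is: it's \'quoted
my $text = "it's \\'quoted";

# Replacing every apostrophe also changes the plain one in "it's".
(my $all = $text) =~ s/'/\\'/g;

# Matching backslash + apostrophe leaves the plain apostrophe alone.
(my $only = $text) =~ s/\\'/\\\\'/g;

print "$all\n";    # it\'s \\'quoted
print "$only\n";   # it's \\'quoted
```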

Using binding operator in perl

I am working on a program in Perl and I am trying to combine more than one regex substitution with a single binding operator. I have tried the syntax below, but it's not working. I would like to know if there is another way to go about this.
$in =~ (s/pattern/replacement/)||(s/pattern/replacement/)||...
You can often get a clue about what Perl makes of some code by using B::Deparse.
$ perl -MO=Deparse -E'$in =~ (s/pattern1/replacement1/)||(s/pattern2/replacement2/)'
[ ... snip ... ]
s/pattern2/replacement2/u unless $in =~ s/pattern1/replacement1/u;
-e syntax OK
So it attempts your first substitution on $in, and if that fails, it then tries your second substitution. However, it doesn't use $in for the second substitution. It uses $_ instead.
You're running up against precedence issues here. Perl interprets your code as:
($in =~ s/pattern1/replacement1/) or (s/pattern2/replacement2/)
Notice that the opening parenthesis has moved before $in.
As others have pointed out, it's best to use a loop approach here. But I thought it might be useful to explain why your version didn't work.
Update: To be clear, if you wanted to use syntax like this, then you would need:
($in =~ s/pattern1/replacement1/) or
($in =~ s/pattern2/replacement2/);
Note that I've included $in =~ in each expression. At this point, it becomes obvious (I hope) why the looping solution is better.
However, because or is a short-circuiting operator, this statement stops after the first successful substitution. I assumed that's what you wanted from your use of || in your original code. If that's not what you want, you have two options. You can switch to and, which runs the second substitution only if the first one succeeds. Or, better in my opinion, you can break them out into separate statements.
$in =~ s/pattern1/replacement1/;
$in =~ s/pattern2/replacement2/;
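This small sketch uses made-up strings to show the practical difference between chaining with or and using separate statements:

```perl
use strict;
use warnings;

my $s1 = 'cat and dog';
my $s2 = 'cat and dog';

# Joined with "or": the second substitution runs only if the first one fails.
($s1 =~ s/cat/CAT/) or ($s1 =~ s/dog/DOG/);

# Separate statements: both substitutions are attempted.
$s2 =~ s/cat/CAT/;
$s2 =~ s/dog/DOG/;

print "$s1\n";   # CAT and dog
print "$s2\n";   # CAT and DOG
```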
The closest you could get with a syntax looking similar to that would be
s/one/ONE/ or
s/two/TWO/ or
...
s/ten/TEN/ for $str;
This will attempt each substitution in turn, once only, stopping after the first successful one.
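For example, take a made-up string where both substitutions could apply. Only the first successful one runs:

```perl
use strict;
use warnings;

my $str = 'two one';

# "for" aliases $_ to $str; the "or" chain stops after the first success.
s/one/ONE/ or s/two/TWO/ for $str;

print "$str\n";   # two ONE
```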
Use for to "topicalize" (alias $_ to your variable).
for ($in) {
s/pattern/replacement/;
s/pattern/replacement/;
}
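Here is a runnable version of the loop above, with hypothetical patterns filled in. Because $_ aliases $in, the edits land in $in itself:

```perl
use strict;
use warnings;

my $in = 'one two three';

# Inside the loop body, $_ is an alias for $in, so s/// edits $in directly.
for ($in) {
    s/one/1/;
    s/two/2/;
}

print "$in\n";   # 1 2 three
```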
A simpler way might be to create an array of all such patterns and replacements, then simply iterate through your array applying the substitution one pattern at a time.
my $in = "some string you want to modify";
my @patterns = (
    ['pattern to match', 'replacement string'],
    # ...
);
$in = replace_many($in, \@patterns);
sub replace_many {
    my ($in, $replacements) = @_;
    # Apply each [pattern, replacement] pair in order.
    foreach my $replacement ( @$replacements ) {
        my ($pattern, $replace_string) = @$replacement;
        $in =~ s/$pattern/$replace_string/;
    }
    return $in;
}
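This is a usage sketch for replace_many. The pattern list is hypothetical, and the helper is repeated so the snippet runs on its own:

```perl
use strict;
use warnings;

# Same helper as above: applies each [pattern, replacement] pair in order.
sub replace_many {
    my ($in, $replacements) = @_;
    foreach my $replacement (@$replacements) {
        my ($pattern, $replace_string) = @$replacement;
        $in =~ s/$pattern/$replace_string/;
    }
    return $in;
}

# Hypothetical pattern list for illustration.
my @patterns = (
    ['George', 'Barack'],
    ['Bush',   'Obama'],
);

my $out = replace_many('George Walker Bush', \@patterns);
print "$out\n";   # Barack Walker Obama
```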
It's not at all clear what you need, and it's not at all clear that you can get what you appear to want by the means you suggest. The OR operator is a short-circuit operator, and you may not want that behavior. Please give an example of the input you expect and the output you want, ideally several examples of each. Meanwhile, here is a test script.
use warnings;
use strict;
my $in1 = 'George Walker Bush';
my $in2 = 'George Walker Bush';
my $in3 = 'George Walker Bush';
my $in4 = 'George Walker Bush';
(my $out1 = $in1) =~ s/e/*/g;
print "out1 = $out1 \n";
(my $out2 = $in2) =~ s/Bush/Obama/;
print "out2 = $out2 \n";
(my $out3 = $in3) =~ s/(George)|(Bush)/Obama/g;
print "out3 = $out3\n";
$in4 =~ /(George)|(Walker)|(Bush)/g;
print "$1 - $2 - $3\n";
exit(0);
In the last case, notice that only the first alternative in the regular expression matches. Once (George) succeeds, the match is done, so $2 and $3 stay undefined. You could easily replace 'George Walker Bush' with 'Barack Hussein Obama'. However, you would also replace 'George Washington' with 'Barack Washington'. Is this what you want? Here is the output of the script:
out1 = G*org* Walk*r Bush
out2 = George Walker Obama
out3 = Obama Walker Obama
Use of uninitialized value $2 in concatenation (.) or string at pq_151111a.plx line 19.
Use of uninitialized value $3 in concatenation (.) or string at pq_151111a.plx line 19.
George - -
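You might want to see every name the alternation can match, though that is an assumption about the goal. One option is a global match in list context, which returns each match in turn instead of stopping at the first:

```perl
use strict;
use warnings;

my $in4 = 'George Walker Bush';

# In list context, /g returns the capture from every successful match.
my @names = $in4 =~ /(George|Walker|Bush)/g;

print join(' - ', @names), "\n";   # George - Walker - Bush
```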

How to truncate the extension for special case in Perl?

I'm working on a script that truncates the extensions of file names using the regex below, but it doesn't work well. It removes data I want to keep, because it basically removes everything from the first dot onward.
The regex I currently use:
/\..*?$/
It truncates file names like this:
b10_120.00c.current.all --> b10_120
abc_10.77.log.bac.temp.ls --> abc_10
but I'm looking for the output b10_120.00c and abc_10.77.
Aside from that, is there a way to keep only certain extensions? For the two examples above, it would display b10_120.00c.current and abc_10.77.log. Thank you very much.
The following strips the last file name extension:
s/\.[^.]+$//;
Explanation
\. matches a literal .
[^.]+ matches one or more characters that are not a .
$ anchors the match at the end of the string
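This quick sketch applies the pattern to the two example names from the question. It shows that only the final extension is removed, which is not yet the requested output:

```perl
use strict;
use warnings;

my @names = ('b10_120.00c.current.all', 'abc_10.77.log.bac.temp.ls');
my @stripped;
for my $name (@names) {
    (my $copy = $name) =~ s/\.[^.]+$//;   # drop only the last ".ext"
    push @stripped, $copy;
}
print "$names[$_] -> $stripped[$_]\n" for 0 .. $#names;
```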
Update
my ($new_file_name) = ( $file_name =~ m/^( [^.]+ \. [^.]+ )/x );
Explanation
^ anchor at the start of the string
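[^.]+ matches one or more characters that are not a .
\. matches a literal .
[^.]+ matches one or more characters that are not a .
The parentheses capture this name-plus-first-extension part, and /x allows the spaces inside the pattern for readability.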
Test
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More 'tests' => 2;
my %file_name_map = (
'b10_120.00c.current.all' => 'b10_120.00c',
'abc_10.77.log.bac.temp.ls' => 'abc_10.77',
);
sub new_file_name {
my $file_name = shift;
my ($new_file_name) = ( $file_name =~ m/^( [^.]+ \. [^.]+ )/x );
return $new_file_name;
}
for my $file_name ( keys %file_name_map ) {
    is new_file_name($file_name), $file_name_map{$file_name},
        "Got $file_name_map{$file_name}";
}
$file =~ s/(\.[^.]+).*/$1/;   # keep the first extension, drop everything after it
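Here is a quick check of this substitution against the two example names from the question:

```perl
use strict;
use warnings;

my @names = ('b10_120.00c.current.all', 'abc_10.77.log.bac.temp.ls');
my @kept;
for my $file (@names) {
    # Capture the first ".ext", then let .* consume the rest of the name.
    (my $copy = $file) =~ s/(\.[^.]+).*/$1/;
    push @kept, $copy;
}
print "$_\n" for @kept;   # b10_120.00c, then abc_10.77
```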
You should use \. for the dot in the regular expression.
Also, please explain in more detail how you want to process the file names.
Instead of a regex, I would suggest using this package:
http://perldoc.perl.org/File/Basename.html
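Here is a minimal sketch of the File::Basename approach, assuming fileparse with a suffix pattern is acceptable. The pattern qr/\.[^.]*/ peels off only the last extension. For the first example, that happens to give the b10_120.00c.current form asked for above. It does not produce abc_10.77.log for the second example.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# fileparse returns (name, directory, matched suffix); the suffix pattern
# is anchored at the end of the name, so only the last ".ext" is removed.
my ($name, $dir, $suffix) = fileparse('b10_120.00c.current.all', qr/\.[^.]*/);

print "$name\n";     # b10_120.00c.current
print "$suffix\n";   # .all
```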

How to get rid of control characters in perl.. specifically [gs]?

My code is as follows:
my $string = $cells[71];
print $string;
This prints the string, but where there should be spaces, there is a box with 01 10 in it. I opened the output in Notepad++ and the box turned into a black GS, which I am assuming is the group separator.
I looked online and it said to use:
s/[^[:print:]]+//g
but when I set the string to:
my $string =~s/[^[:print:]]+//g
and run the program, I get:
4294967295
How do i resolve this?
I did what HOBBS said and it worked... thanks :)
Is there any way I could print a newline wherever one of these characters appears (the box with 1001)?
When doing a regex match, you need to be careful to write $var =~ /pattern/, not $var = ~ /pattern/. When you use the second one, you're doing /pattern/, which is a regex match against $_, returning a number in scalar context. Then you do ~, which takes the bitwise inverse of that number, then ($var =) you assign that result to $var. Not what you wanted at all.
You have to assign the variable first, then do the substitution:
my $string = $cells[71];
$string =~ s/[^[:print:]]+//g;
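This sketch addresses the follow-up about printing a newline instead. It assumes the box really is GS (the group separator, code 0x1D), and the sample string is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical cell contents: words separated by GS (0x1D).
my $string = "alpha\x1Dbeta\x1Dgamma";

# Option 1: strip every run of non-printable characters.
(my $stripped = $string) =~ s/[^[:print:]]+//g;

# Option 2: turn each GS into a newline instead.
(my $lines = $string) =~ s/\x1D/\n/g;

print "$stripped\n";   # alphabetagamma
print "$lines\n";      # alpha, beta and gamma on separate lines
```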

How to replace a $ in a string in perl script

I have a string:
$str = abc@$and@def
I tried to replace '$' with its hex value using:
$str=~s/$/%26/g
But the output is abc@.
This might be because '$' is treated as the end-of-line (or end-of-string) anchor.
Please let me know how to fix this.
Your problem has nothing to do with your substitution. It happens when you assign to the string in the first place:
$str = "abc@$and@def";
$and and @def are treated as variables to interpolate.
You need to escape the sigils or use single quotes (which don't interpolate variables):
$str = 'abc@$and@def';
# or
$str = "abc@\$and\@def";
And you really need to enable warnings (use warnings;). They would have told you that your assignment was the problem.
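Here is a minimal sketch that combines both fixes. Single quotes keep anything from being interpolated, and \$ makes the pattern match a literal dollar sign instead of the end-of-string anchor. The sketch uses %24, which is the percent-encoding of $ (0x24). The %26 in the question encodes & (0x26).

```perl
use strict;
use warnings;

# Single quotes: nothing inside is interpolated, so $ stays a literal character.
my $str = 'abc@$and@def';

# \$ matches a literal dollar sign; a bare $ would be the end-of-string anchor.
$str =~ s/\$/%24/g;

print "$str\n";   # abc@%24and@def
```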
You need to escape the $ with a \:
$str =~ s/\$/%26/g;
Try escaping the $ with a \:
$str =~ s/\$/%26/g;