Removing text inside parens, but not the parens in Perl

OK, I've got a weird one that I've been jamming on for a while (Friday afternoon mind does not work, I guess).
Does anyone know of a way to parse a string and remove all of the text inside parens without removing the outer parens themselves, while also deleting any parens found inside?
i.e.
myString = "this is my string (though (I) need (help) fixing it)"
after running it through what I want it would look like:
myString = "this is my string ()"
very important to keep those two parens there.

The module Regexp::Common deals with more than 1 top level of parentheses.
use strict;
use warnings;
use Regexp::Common qw/balanced/;
my @strings = (
'111(22(33)44)55',
'a(b(c(d)(e))f)g(h)((i)j)',
'this is my string (though (I) need (help) fixing it)',
);
s/$RE{balanced}{-parens=>'()'}/()/g for @strings;
print "$_\n" for @strings;
Output:
111()55
a()g()()
this is my string ()

You need to escape the parentheses to prevent them from starting a capture group. The pattern \(.+\) matches the longest substring that starts with a ( and ends with a ). That will gobble up everything up to the last ), including any intervening parentheses. Finally, we replace that string with one containing just ():
#!/usr/bin/perl
use strict; use warnings;
my $s = "this is my string (though (I) need (help) fixing it)";
$s =~ s{\(.+\)}{()};
print "$s\n";
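Note that the greedy pattern above would also swallow the text between two separate top-level groups, so 'a (b) c (d)' would become 'a ()'. If that matters and you would rather not install Regexp::Common, one possible sketch uses a recursive pattern, assuming Perl 5.10 or later. The helper name collapse_parens is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every balanced top-level (...) group with "()".
sub collapse_parens {
    my ($text) = @_;
    # Match "(", then any mix of non-paren runs or nested balanced groups,
    # then ")". (?R) recurses into the whole pattern (Perl 5.10+).
    $text =~ s/\((?:[^()]++|(?R))*\)/()/g;
    return $text;
}

print collapse_parens($_), "\n" for
    'this is my string (though (I) need (help) fixing it)',
    'a(b(c(d)(e))f)g(h)((i)j)',
    'a (b) c (d)';
```

Unbalanced parentheses are left alone wherever no balanced group can be matched.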

If you want to use regular expressions without Regexp::Common, look at the "look-around" feature, which was introduced with Perl 5.
You can read more about "Look Ahead" and "Look Behind" at regular-expressions.info.
There is also a section on "Look Around" in the "Mastering Regular Expressions" book. Look on page 59.
#!/usr/bin/env perl
use Modern::Perl;
my $string = 'this is my (string (that)) I (need help fixing)';
$string =~ s/(?<=\()[^)]+[^(]+(?=\))//g;
say $string;

Related

Perl delete a character from a string unless it is a duplicate

What is the cleanest/simplest way to remove a particular character from a string unless it is repeated once.
For example given the following string:
'it''s 'simple'
I expect:
it's simple
For my usage there shall never be more than two ' characters in a row.
Use a negative look-ahead assertion.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
$_ = "'it''s simple'";
say;
s/'(?!')//g;
say;
'(?!') means "a single quote that isn't followed immediately by another single quote".
Output:
'it''s simple'
it's simple
use warnings;
use strict;
my $text = q!'it''s 'simple'!;
$text =~ s/'('?)/$1/g;
print "$text\n";
So in the regex '('?), it will match - and remove - the first ' and if followed by another, will capture it and place it in the result.
This version will handle each one or two apostrophe group separately (because OP used the term "duplicate" instead of "multiple"). If you want to replace any 2+ apostrophe sequence with a single one, use the regex '('?)'* instead.
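To make the difference between the two regexes concrete, here is a small comparison sketch. The sub names pairwise and squeeze are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two substitutions discussed above.
sub pairwise {    # handles each group of one or two apostrophes separately
    my ($t) = @_;
    $t =~ s/'('?)/$1/g;
    return $t;
}

sub squeeze {     # collapses any run of 2+ apostrophes into a single one
    my ($t) = @_;
    $t =~ s/'('?)'*/$1/g;
    return $t;
}

for my $in ("it''s", "it''''s", "'quoted'") {
    printf "%-10s pairwise=%-8s squeeze=%s\n", $in, pairwise($in), squeeze($in);
}
```

The two only differ on runs of more than two apostrophes: a run of four becomes two with the first regex and one with the second.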

perl pattern matching one by one and process it

I have a string
[something]text1[/something] blah blah [something]text2[/something]
I need to write a Perl script to read what is in the [something] tag, process it to "text-x", and put it back with an [otherthing] tag. So the above string should be
[otherthing]text-1[/otherthing] blah blah [otherthing]text-2[/otherthing]
Processing "textx" to "text-x" is not one step process.
So this is solution that I have till now:
m/[something](?<text>.*)[/something]/
This will get me the string in between and I can process that to "text-x" but how do I put it back in the same place with [otherthing]text-x[/otherthing]?
How do I use s/// in this case?
How to do it for the whole string one by one ?
You can use the /e switch on s/// to evaluate the right hand side before using the result as the substitution, and the /g flag to do this for every match.
Here is a simple example:
use 5.12.0;
my $str = ">1< >2< >34<";
$str =~ s/>(\d+)</">>".process("$1")."<<"/eg;
say $str;
sub process {
return "x" x $_[0];
}
This should come close. It uses the /e modifier to allow you to do processing in the replacement side of the regex and so it calls the fix_textx function where you can do multiple steps.
The normal way of iterating over matches is with the /g modifier.
#!/usr/bin/perl
use strict;
use warnings;
my $string = '[something]text1[/something] blah blah [something]text2[/something]';
$string =~ s{\[something\](text[^[]*)\[\/something\]}
{'[otherthing]' . fix_textx($1) . '[/otherthing]'}ge;
print $string;
sub fix_textx {
my ($testx) = @_;
$testx =~ s/text\K(.*)/-$1/;
return $testx;
}
EDIT: fixed the square bracket. Thanks @tadmc
In this particular case, you can accomplish what you're trying to do by splitting the string on "[something]" and then processing the beginning of each piece (except the first one), then joining the pieces back together when you're done.
I don't know if there is a general way to iterate over the regex matches in a string in Perl. I'm hoping someone else will answer this question and educate me on that.
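For what it's worth, a match with the /g modifier in scalar context does iterate over matches one at a time, as mentioned in another answer. A minimal sketch, reusing the string from the question (the @found array is just for illustration):

```perl
use strict;
use warnings;

my $string = '[something]text1[/something] blah blah [something]text2[/something]';
my @found;

# In scalar context, each /g match resumes where the previous one ended;
# pos() reports the offset just past the current match.
while ($string =~ m{\[something\](.*?)\[/something\]}g) {
    push @found, $1;
    printf "found '%s', match ends at offset %d\n", $1, pos($string);
}

print scalar(@found), " matches\n";
```

This only reads the matches; to rewrite them in place, s///ge is still the more direct tool.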

How does split work here?

$string = 'a=1;b=2';
use Data::Dumper;
@array = split("; ?", $string);
print Dumper(\@array);
output:
$VAR1 = [
'a=1',
'b=2'
];
Does anyone know how "; ?" works here? It's not a regex, but it works quite like one, so I don't understand.
I think it means "semicolon followed by optional space (just one or zero)".
It's not a regex, but it works quite like one, so I don't understand.
The pattern parameter to split is always treated as a regular expression (it would be better not to use a string, though). The only exception is the "single space", which is taken to mean "split on whitespace".
The first parameter of split is a regex. So I'd rather write split /; ?/, $string;.
When you use a string for the first parameter, it just means the regex can vary and has to be compiled anew each time the split is run. See perldoc -f split for details.
The regex could be read; the character ";" optionally followed by a space. See perlretut and perlreref for details.
A semicolon (the ;) followed by an optional (the ?) space (the " ").
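A short sketch of the points above: the pattern written as a regex literal, plus the single-space special case compared with a regex that matches one literal space. The sample strings are made up.

```perl
use strict;
use warnings;

# Regex literal: a semicolon optionally followed by one space.
my @pairs = split /; ?/, 'a=1;b=2; c=3';

# Special case: the string ' ' splits on runs of whitespace
# and skips leading whitespace.
my @words = split ' ', '  foo  bar baz ';

# A regex of one literal space splits on every single space,
# so leading and doubled spaces produce empty fields.
my @raw = split / /, '  foo  bar';

print join('|', @pairs), "\n";   # a=1|b=2|c=3
print join('|', @words), "\n";   # foo|bar|baz
print join('|', @raw),   "\n";   # ||foo||bar
```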

I want to create a perl code to extract what is in the parentheses and port it to a variable

I want to create a perl code to extract what is in the parentheses and port it to a variable.
"(05-NW)HPLaserjet" should become "05-NW"
Something like this:
Catch "("
take out any spaces that exist in between ()
everything in between () = variable 1
How would I go about doing this?
This is a job for regular expressions. It looks confusing because parens are metacharacters in regular expressions and are also literal characters in your pattern, where they are escaped with backslashes.
C:\temp $ echo (05-NW)HPLaserjet | perl -nlwe "print for m/\(([^)]+)\)/g"
Match opening paren, start capture group, match one or more characters that aren't the closing paren, close capture group, match closing paren.
You can use regular expressions (see perlretut) to match and capture the value. By assigning to a list, you can put your captures into named variables. The global variables $1, $2 etc. are also used for capture groups, so you can use that instead of list assignment if you like.
use strict;
use warnings;
while (<DATA>) # read every line of the __DATA__ section below
{
my ($printer_code) = m/
\( # Match literal opening parenthesis
([^\)]*) # Capture group (printer_code): Match characters which aren't right parenthesis, zero or more times
\)/x; # Match literal closing parenthesis
# The 'x' modifier allows you to add whitespace and comments to regex for clarity.
# If you use it, make sure you use '\ ' (or '\s', etc.) for actual literal whitespace matching!
print "$printer_code\n" if defined $printer_code;
}
__DATA__
(05-NW)HPLaserjet
perldoc perlre
use warnings;
use strict;
my $s = '(05-NW)HPLaserjet';
my ($v) = $s =~ /\((.*)\)/; # Grab everything between parens (including other parens)
$v =~ s/\s//g; # Remove all whitespace
print "$v\n";
__END__
05-NW
See also: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g
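Putting the pieces from these answers together, a list-context match with /g returns every capture at once, and the whitespace can then be stripped from each one. This is only a sketch; the sub name paren_contents and the spaced-out sample input are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of every (...) group in $text,
# with all whitespace removed. Nested parens are not handled.
sub paren_contents {
    my ($text) = @_;
    my @found = $text =~ /\(([^()]*)\)/g;   # list context: all captures
    s/\s+//g for @found;                    # strip whitespace in place
    return @found;
}

my ($code) = paren_contents('( 05 - NW )HPLaserjet');
print "$code\n";   # 05-NW
```

If the string contains no parentheses, the list is empty and $code is undef, so check it before use.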

Why does this base64 string comparison in Perl fail?

I am trying to compare an encode_base64('test') to the string variable containing the base64 string of 'test'. The problem is it never validates!
use MIME::Base64 qw(encode_base64);
if (encode_base64("test") eq "dGVzdA==")
{
print "true";
}
Am I forgetting anything?
Here's a link to a Perlmonks page which says "Beware of the newline at the end of the encode_base64() encoded strings".
So the simple 'eq' may fail.
To suppress the newline, say encode_base64("test", "") instead.
When you do a string comparison and it fails unexpectedly, print the strings to see what is actually in them. I put brackets around the value to see any extra whitespace:
use MIME::Base64;
$b64 = encode_base64("test");
print "b64 is [$b64]\n";
if ($b64 eq "dGVzdA==") {
print "true";
}
This is a basic debugging technique using the best debugger ever invented. Get used to using it a lot. :)
Also, sometimes you need to read the documentation for things a couple of times to catch the important parts. In this case, MIME::Base64 tells you that encode_base64 takes two arguments. The second argument is the line ending and defaults to a newline. If you don't want a newline on the end of the string you need to give it another line ending, such as the empty string:
encode_base64("test", "")
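A quick sketch showing both behaviours side by side, using only the core MIME::Base64 module; the variable names are arbitrary.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

my $default = encode_base64("test");       # default line ending: "\n"
my $bare    = encode_base64("test", "");   # empty line ending

# Alternative: strip the trailing newline after encoding.
(my $stripped = $default) =~ s/\n\z//;

print "default matches\n"  if $default  eq "dGVzdA==";   # not printed
print "bare matches\n"     if $bare     eq "dGVzdA==";
print "stripped matches\n" if $stripped eq "dGVzdA==";
```

For longer inputs the default output also contains newlines in the middle (one per encoded line), so passing the empty string is the safer of the two approaches.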
Here's an interesting tip: use Perl's wonderful and well-loved testing modules for debugging. Not only will that give you a head start on testing, but sometimes they'll make your debugging output a lot faster. For example:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More 0.88;
BEGIN { use_ok 'MIME::Base64' => qw(encode_base64) }
is( encode_base64("test"), "dGVzdA==", q{"test" encodes okay} );
done_testing;
Run that script, with perl or with prove, and it won't just tell you that it didn't match, it will say:
# Failed test '"test" encodes okay'
# at testbase64.pl line 6.
# got: 'dGVzdA==
# '
# expected: 'dGVzdA=='
and sharp-eyed readers will notice that the difference between the two is indeed the newline. :)