Perl delete a character from a string unless it is a duplicate

What is the cleanest/simplest way to remove a particular character from a string unless it is repeated once?
For example given the following string:
'it''s 'simple'
I expect:
it's simple
For my usage there will never be more than two ' characters in a row.

Use a negative look-ahead assertion.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
$_ = "'it''s simple'";
say;
s/'(?!')//g;
say;
'(?!') means "a single quote that isn't followed immediately by another single quote".
Output:
'it''s simple'
it's simple

use warnings;
use strict;
my $text = q!'it''s 'simple'!;
$text =~ s/'('?)/$1/g;
print "$text\n";
So in the regex '('?), it will match - and remove - the first ' and if followed by another, will capture it and place it in the result.
This version will handle each one or two apostrophe group separately (because OP used the term "duplicate" instead of "multiple"). If you want to replace any 2+ apostrophe sequence with a single one, use the regex '('?)'* instead.
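To make the difference between the two patterns concrete, here is a small sketch. The helper names collapse_pairs and collapse_runs are made up for this illustration, as is the second test string.

```perl
use strict;
use warnings;

# Hypothetical helper: drop lone quotes, keep one quote from each pair.
sub collapse_pairs {
    my ($text) = @_;
    $text =~ s/'('?)/$1/g;
    return $text;
}

# Hypothetical helper: reduce any run of 2+ quotes to a single quote.
sub collapse_runs {
    my ($text) = @_;
    $text =~ s/'('?)'*/$1/g;
    return $text;
}

print collapse_pairs(q!'it''s 'simple'!), "\n";   # it's simple
print collapse_pairs(q!a''''b!), "\n";            # a''b
print collapse_runs(q!a''''b!), "\n";             # a'b
```

With four quotes in a row, the first pattern treats them as two pairs, while the second treats them as one run.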

Related

In Perl, can you use a variable for the whole of a match string?

I'm new to Perl, though not to programming, and am working through Learning Perl. The book has exercises to match successive lines of a small text file.
I had the idea of supplying match strings from STDIN, and going through the file for each one:
while (<STDIN>) {
    chomp;
    $regex = $_;
    seek JUNK, 0, 0;
    while (<JUNK>) {
        chomp();
        if (/$regex/) {
            say;
        }
    }
    say '';
}
This works fine, but I can't find a way to interpolate an entire match string, e.g.
/fred/i
into the predicate. I tried
if($$matcher) # with $matcher = '/fred/'
but Perl complained.
I imagine this is my ignorance, and should welcome enlightenment.
Regex modifiers (flags), such as /i, are part of the code telling Perl how to perform the match, not part of the pattern to be matched. This is why that doesn't work for you.
You have three ways to work around this (well, probably more, since this is Perl we're talking about, but three ways that I can think of straight off):
1) Use extended regex syntax and, when you want a case-insensitive match, enter (?i:fred), as suggested in comments on the question.
2) Use string eval to allow the use of the regular modifiers: if (eval "\$_ =~ $regex") { say } (The backslash keeps $_ from being interpolated into the string, so the match runs against the current line.) Note that this method requires you to type the surrounding slashes as well: you would have to enter /fred/i; just typing fred would not work. Note also that doing this without validating your input first is a huge security hole, since the text the user enters is executed as Perl code, just as if it were part of the original program. (Imagine the user entering //, system("rm -rf /"): it would test against an empty regex, then delete all the files on your computer.) So this is probably not a recommended approach unless you really know what you're doing and/or you're the only one who will ever run the program.
3) The most complex, but also most correct, solution is to write a parser which inspects the user's entered string to see whether any special flags are present and then responds accordingly. A very simple example which allows the user to append /i for a case-insensitive search:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
while (<STDIN>) {
    chomp;
    my @parts = split '/', $_;
    # If the user input starts with a /, the first part will be empty, so throw
    # it away.
    shift @parts unless $parts[0];
    my $re = shift @parts;
    my %flags;
    for (@parts) {
        for (split '') {
            $flags{i} = 1 if $_ eq 'i';
        }
    }
    my $f = join '', keys %flags;
    say "Matched" if eval qq('foo' =~ /$re/$f);
}
This also uses string eval, so it is potentially vulnerable to the same kind of security issues as #2, but $re cannot contain any / characters (the split '/' would have ended $re immediately prior to the first /), which prevents code from being inserted there and $f can contain only the letter i (or any other flags you might choose to recognize if you expand on this). So it should be safe. (But, if anyone can demonstrate an exploit I missed, please tell me about it in comments!)
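A variation on option 3 that avoids string eval altogether is to translate the recognized flags into an inline modifier group and compile the result with qr//. The sketch below is only one way to do it: compile_user_regex is a hypothetical helper name, the whitelist of flags (i, m, s) is an assumption, and the input is assumed to be already chomped.

```perl
use strict;
use warnings;

# Hypothetical helper: turn user input such as "/fred/i" or "fred" into a
# compiled regex. Returns undef if a flag is not allowed or the pattern
# does not compile.
sub compile_user_regex {
    my ($input) = @_;
    my ($body, $flags) = ($input, '');

    # Accept the /pattern/flags form; anything else is a bare pattern.
    if ($input =~ m{\A/(.*)/([a-z]*)\z}s) {
        ($body, $flags) = ($1, $2);
    }

    # Assumed whitelist: only flags that are valid as inline modifiers.
    return if $flags =~ /[^ims]/;

    # Build "(?i:fred)" style source when flags were given.
    my $source = $flags eq '' ? $body : "(?$flags:$body)";

    # qr// compiles the pattern without running the input as Perl code.
    # An invalid pattern makes qr// die, so trap that and return undef.
    my $re = eval { qr/$source/ };
    return $re;
}

my $re = compile_user_regex('/fred/i');
print "Matched\n" if defined $re && 'Fred said Hello.' =~ $re;   # prints "Matched"
```

Because the pattern is interpolated at run time, Perl refuses embedded code blocks such as (?{ ... }) unless use re 'eval' is in effect, which removes the main injection route of the string-eval versions.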
Problem
What you are trying to do can be summarized by:
my $regex = '/fred/i';
my @lines = (
    'A line containing some words and Fred said Hello.',
    'Another line. Here is a regex embedded in the line: /fred/i',
);
for ( @lines ) {
    say if /$regex/;
}
Output:
Another line. Here is a regex embedded in the line: /fred/i
We see that the second line matches $regex, whereas we wanted the first line containing Fred to match the string fred with the (case insensitive) i flag added to the regex. The problem is that the characters / and i in $regex are taken as characters to be matched literally, i.e., they are not interpreted as special characters surrounding a Regex (as part of a Perl expression).
Note:
The character / is special as part of a Perl expression for a regular expression, but it is not special inside the Regex pattern. There are however characters that are special inside the pattern, the so-called meta characters:
\ | ( ) [ { ^ $ * + ? .
See perldoc -f quotemeta for more information.
A solution using extended patterns
Simply change the first line to:
my $regex = '(?i)fred'; # or alternatively: (?i:fred)
Regex flags can be added to a regex pattern using "Extended patterns" described in the manual perldoc perlre :
Extended Patterns
The syntax for most of these is a pair of parentheses with a question
mark as the first thing within the parentheses. The character after
the question mark indicates the extension.
[...]
(?adlupimnsx-imnsx)
(?^alupimnsx)
One or more embedded pattern-match modifiers, to be turned on (or
turned off if preceded by "-" ) for the remainder of the pattern or
the remainder of the enclosing pattern group (if any). This is
particularly useful for dynamically-generated patterns, such as those
read in from a configuration file, taken from an argument, or
specified in a table somewhere.
[...]
These modifiers are restored at the end of the enclosing group.
Alternatively the non-capturing form can be used:
(?:pattern)
(?adluimnsx-imnsx:pattern)
(?^aluimnsx:pattern)
This is for clustering, not capturing; it groups subexpressions like
"()" , but doesn't make backreferences as "()" does.
The question has been answered in the following comment:
Try (?i:fred), see Extended patterns in perldoc perlre for more information. – Håkon Hægland
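To illustrate the scoping that the perlre excerpt describes, here is a short sketch comparing the two forms. The second test line and the variable names are invented for the example.

```perl
use strict;
use warnings;

my @lines = (
    'A line containing some words and Fred said Hello.',
    'fred SAID hello',   # hypothetical second line
);

# (?i:...) limits case-insensitivity to the group; " said" stays case-sensitive.
my $scoped = qr/(?i:fred) said/;

# (?i) at the start applies to the remainder of the pattern.
my $whole = qr/(?i)fred said/;

for my $line (@lines) {
    print "scoped: $line\n" if $line =~ $scoped;
    print "whole:  $line\n" if $line =~ $whole;
}
```

The first line matches both patterns; the second matches only the pattern with the leading (?i), because SAID is outside the scoped group.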

perl pattern matching one by one and process it

I have a string
[something]text1[/something] blah blah [something]text2[/something]
I need to write a Perl script to read what is in the [something] tag, process it to "text-x", and put it back with an [otherthing] tag. So the above string should be
[otherthing]text-1[/otherthing] blah blah [otherthing]text-2[/otherthing]
Processing "textx" to "text-x" is not a one-step process.
This is the solution I have so far:
m/[something](?<text>.*)[/something]/
This will get me the string in between and I can process that to "text-x" but how do I put it back in the same place with [otherthing]text-x[/otherthing]?
How do I use s/// in this case?
How do I do it for each match in the string, one by one?
You can use the /e switch on s/// to evaluate the right hand side before using the result as the substitution, and the /g flag to do this for every match.
Here is a simple example:
use 5.12.0;
my $str = ">1< >2< >34<";
$str =~ s/>(\d+)</">>".process("$1")."<<"/eg;
say $str;
sub process {
return "x" x $_[0];
}
This should come close. It uses the /e modifier to allow you to do processing in the replacement side of the regex and so it calls the fix_textx function where you can do multiple steps.
The normal way of iterating over matches is with the /g modifier.
#!/usr/bin/perl
use strict;
use warnings;
my $string = '[something]text1[/something] blah blah [something]text2[/something]';
$string =~ s{\[something\](text[^[]*)\[\/something\]}
{'[otherthing]' . fix_textx($1) . '[/otherthing]'}ge;
print $string;
sub fix_textx {
    my ($testx) = @_;
    $testx =~ s/text\K(.*)/-$1/;
    return $testx;
}
EDIT: fixed the square bracket. Thanks @tadmc
In this particular case, you can accomplish what you're trying to do by splitting the string on "[something]" and then processing the beginning of each piece (except the first one), then joining the pieces back together when you're done.
I don't know if there is a general way to iterate over the regex matches in a string in Perl. I'm hoping someone else will answer this question and educate me on that.
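On the general question of iterating over matches: in scalar context, m//g resumes from where the previous match ended, so a while loop visits each match in turn. A minimal sketch using the string from the question (the variable names are illustrative):

```perl
use strict;
use warnings;

my $string = '[something]text1[/something] blah blah [something]text2[/something]';

my @found;
# Each pass through the loop handles one match; $1 holds the text
# between the tags for that match.
while ($string =~ m{\[something\](.*?)\[/something\]}g) {
    my $inner = $1;
    # pos() is the offset just past the current match.
    push @found, "$inner ends at offset " . pos($string);
}
print "$_\n" for @found;
```

This form only reads the string. For replacing each match in place, s///ge as shown in the other answers is the more direct tool.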

I want to create a perl code to extract what is in the parentheses and port it to a variable

I want to create a perl code to extract what is in the parentheses and port it to a variable.
"(05-NW)HPLaserjet" should become "05-NW"
Something like this:
Catch "("
take out any spaces that exist in between ()
everything in between () = variable 1
How would I go about doing this?
This is a job for regular expressions. The pattern looks confusing because parentheses are metacharacters in regular expressions and also appear literally in your input, where they have to be escaped with backslashes.
C:\temp $ echo (05-NW)HPLaserjet | perl -nlwe "print for m/\(([^)]+)\)/g"
Match opening paren, start capture group, match one or more characters that aren't the closing paren, close capture group, match closing paren.
You can use regular expressions (see perlretut) to match and capture the value. By assigning to a list, you can put your captures into named variables. The global variables $1, $2 etc. are also used for capture groups, so you can use that instead of list assignment if you like.
use strict;
use warnings;
while (<DATA>) # read every line of the __DATA__ section
{
    my ($printer_code) = m/
        \(        # Match literal opening parenthesis
        ([^\)]*)  # Capture group (printer_code): Match characters which aren't right parenthesis, zero or more times
        \)        # Match literal closing parenthesis
    /x;
    # The 'x' modifier allows you to add whitespace and comments to regex for clarity.
    # If you use it, make sure you use '\ ' (or '\s', etc.) for actual literal whitespace matching!
    print "$printer_code\n" if defined $printer_code;
}
__DATA__
(05-NW)HPLaserjet
perldoc perlre
use warnings;
use strict;
my $s = '(05-NW)HPLaserjet';
my ($v) = $s =~ /\((.*)\)/; # Grab everything between parens (including other parens)
$v =~ s/\s//g; # Remove all whitespace
print "$v\n";
__END__
05-NW
See also: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g
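The list-context idiom referenced above can be sketched as follows. The input string with a second, space-containing code is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical input with two parenthesized codes.
my $str = '(05-NW)HPLaserjet (07 - SE)Deskjet';

# In list context with /g, the match returns every captured group.
my @codes = $str =~ m/\(([^)]*)\)/g;

# Strip whitespace inside each captured code.
s/\s+//g for @codes;

print join(',', @codes), "\n";   # 05-NW,07-SE
```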

how to replace the special character with escape character

my $c= 'ODD_`!"£$%^&*(){}][##;:/?.>,<|\'
I want to escape every special character in it with a backslash.
What is the fastest way to achieve this? The result should be:
my $c= 'ODD_\`\!\"\£\$\%\^\&\*\(\)\{\}\]\[\#\,\;\:\/\?\.\>\,\<\|\\'
Use quotemeta:
#!/usr/bin/env perl
use warnings; use strict;
my $c = 'ODD_`!"£$%^&*(){}][##;:/?.>,<|\\';
print quotemeta($c), "\n";
Note that your definition of $c would not compile as you have to escape \ even in single quoted strings.
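If the escaped string is meant to be used inside a pattern, \Q...\E applies the same escaping as quotemeta during interpolation. A small sketch with a shorter, made-up string:

```perl
use strict;
use warnings;

my $literal = 'a.b*c';            # hypothetical text containing metacharacters
my $escaped = quotemeta($literal);

print "$escaped\n";               # a\.b\*c

# \Q...\E escapes the interpolated text, so it matches literally.
print "literal match\n" if 'xa.b*cx' =~ /\Q$literal\E/;

# Without escaping, "." and "*" act as metacharacters and the match fails here.
print "unescaped differs\n" unless 'xa.b*cx' =~ /$literal/;
```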
While I think that Sinan's answer is correct for what I am assuming you need (based on your list of characters to escape), for completeness I will add the module URI::Escape, which percent-encodes unsafe characters in URIs. It does seem to have some facility for specifying the unsafe characters though, so perhaps it could help you too.

Removing text inside parens, but not the parens in Perl

OK, I got a weird one that I've been jamming on for a while (Friday afternoon mind does not work, I guess).
Does anyone know of a way to parse a string and remove all of the text inside parens without removing the parens themselves, while deleting any parens found inside?
i.e.
myString = "this is my string (though (I) need (help) fixing it)"
after running it through what I want it would look like:
myString = "this is my string ()"
very important to keep those two parens there.
The module Regexp::Common deals with more than 1 top level of parentheses.
use strict;
use warnings;
use Regexp::Common qw/balanced/;
my @strings = (
    '111(22(33)44)55',
    'a(b(c(d)(e))f)g(h)((i)j)',
    'this is my string (though (I) need (help) fixing it)',
);
s/$RE{balanced}{-parens=>'()'}/()/g for @strings;
print "$_\n" for @strings;
Output:
111()55
a()g()()
this is my string ()
You need to escape the parentheses to prevent them from starting a capture group. The pattern \(.+\) matches the longest substring that starts with a ( and ends with a ). That will gobble up everything up to the last ), including any intervening parentheses. Finally, we replace that string with one containing just ():
#!/usr/bin/perl
use strict; use warnings;
my $s = "this is my string (though (I) need (help) fixing it)";
$s =~ s{\(.+\)}{()};
print "$s\n";
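One consequence of the greedy match is worth a quick sketch: if the string has more than one top-level group, everything between the first ( and the last ) is replaced, including text between the groups. The second input below is a made-up example.

```perl
use strict;
use warnings;

my $nested   = 'this is my string (though (I) need (help) fixing it)';
my $separate = '111(22)33(44)55';   # hypothetical input with two top-level groups

# Copy each string, then apply the greedy substitution to the copy.
(my $nested_out   = $nested)   =~ s{\(.+\)}{()};
(my $separate_out = $separate) =~ s{\(.+\)}{()};

print "$nested_out\n";     # this is my string ()
print "$separate_out\n";   # 111()55
```

So this approach fits the string in the question, but not inputs where the text between separate groups must be kept.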
If you want to use regular expressions without Regexp::Common, look at the "Look Around" feature, which was introduced with Perl 5.
You can read more about "Look Ahead" and "Look Behind" at regular-expressions.info.
There is also a section on "Look Around" in the book "Mastering Regular Expressions"; look on page 59.
#!/usr/bin/env perl
use Modern::Perl;
my $string = 'this is my (string (that)) I (need help fixing)';
$string =~ s/(?<=\()[^)]+[^(]+(?=\))//g;
say $string;
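Another way to stay within core Perl, different from the look-around approach above, is a recursive pattern, available since Perl 5.10. This is a sketch rather than a drop-in replacement for Regexp::Common; it reuses the test strings from the Regexp::Common answer and assumes the parentheses in the input are balanced.

```perl
use strict;
use warnings;
use 5.010;   # recursive patterns and possessive quantifiers need 5.10+

my @strings = (
    '111(22(33)44)55',
    'a(b(c(d)(e))f)g(h)((i)j)',
    'this is my string (though (I) need (help) fixing it)',
);

# Match "(", then any mix of non-paren runs and nested balanced groups,
# then ")". (?R) recurses into the whole pattern for the nested groups.
s/\((?:[^()]++|(?R))*\)/()/g for @strings;

print "$_\n" for @strings;
# 111()55
# a()g()()
# this is my string ()
```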