perl pattern matching one by one and process it - perl

I have a string
[something]text1[/something] blah blah [something]text2[/something]
I need to write a Perl script to read what is in the [something] tag, process it to "text-x", and put it back with an [otherthing] tag. So the above string should be
[otherthing]text-1[/otherthing] blah blah [otherthing]text-2[/otherthing]
Processing "textx" to "text-x" is not a one-step process.
This is the solution I have so far:
m/[something](?<text>.*)[/something]/
This will get me the string in between and I can process that to "text-x" but how do I put it back in the same place with [otherthing]text-x[/otherthing]?
How do I use s/// in this case?
How to do it for the whole string one by one ?

You can use the /e switch on s/// to evaluate the right hand side before using the result as the substitution, and the /g flag to do this for every match.
Here is a simple example:
use 5.12.0;
my $str = ">1< >2< >34<";
$str =~ s/>(\d+)</">>".process("$1")."<<"/eg;
say $str;
sub process {
return "x" x $_[0];
}

This should come close. It uses the /e modifier to allow you to do processing in the replacement side of the regex and so it calls the fix_textx function where you can do multiple steps.
The normal way of iterating over matches is with the /g modifier.
#!/usr/bin/perl
use strict;
use warnings;
my $string = '[something]text1[/something] blah blah [something]text2[/something]';
$string =~ s{\[something\](text[^[]*)\[\/something\]}
{'[otherthing]' . fix_textx($1) . '[/otherthing]'}ge;
print $string;
sub fix_textx {
my ($testx) = @_;
$testx =~ s/text\K(.*)/-$1/;
return $testx;
}
EDIT: fixed the square bracket. Thanks @tadmc

In this particular case, you can accomplish what you're trying to do by splitting the string on "[something]" and then processing the beginning of each piece (except the first one), then joining the pieces back together when you're done.
I don't know if there is a general way to iterate over the regex matches in a string in Perl. I'm hoping someone else will answer this question and educate me on that.
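For what it's worth, a match with /g in scalar context (for example as a while condition) resumes where the previous match ended, so it can serve as a general way to visit matches one at a time. A minimal sketch, assuming the tag names from the question; the @found array and its keys are made up for illustration:

```perl
use strict;
use warnings;

my $string = '[something]text1[/something] blah blah [something]text2[/something]';

my @found;
# In scalar context, /g continues from the end of the previous match,
# so the loop body runs once per match.
while ($string =~ m{\[something\](.*?)\[/something\]}g) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    push @found, { text => $1, start => $-[0], end => $+[0] };
}

printf "%-6s at %d..%d\n", @{$_}{qw(text start end)} for @found;
```

This only reads the matches; to rewrite them in place, the s///ge form shown in the other answers is probably simpler.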

Related

find and replace text in a perl script

so I know that there are many ways of doing what I am asking for, but all that I have found is not helping me for what I am trying to do.
It is supposed to be a simple find and replace script using stdin and stdout.
I have a script called replace.pl and this is what i have in it:
#!/usr/bin/perl
use strict;
use warnings;
while(<STDIN>){
$_ = s/$ARGV[0]/$ARGV[1]/g;
print STDOUT $_;
}
When I run echo "replace a with b please" | replace.pl 'a' 'b'
all I get back is a "1". My desired output is "replace b with b please", but whatever I try, the text does not change. Could anyone tell me what I am doing wrong here?
Try
s/$ARGV[0]/$ARGV[1]/g;
instead of
$_ = s/$ARGV[0]/$ARGV[1]/g;
as s/// returns the number of substitutions made (a true value on success), not the modified string.
You can also quote search pattern if it should be literal string (not a regex),
s/\Q$ARGV[0]\E/$ARGV[1]/g;
From perlop:
s/PATTERN/REPLACEMENT/msixpodualgcer
Searches a string for a pattern, and if found, replaces that pattern with the replacement text and returns the number of substitutions made. Otherwise it returns false (specifically, the empty string).
That's why the code sets $_ = 1. You just want to do s/$ARGV[0]/$ARGV[1]/g; for its substitution side-effect, without assigning its return value to $_.
while (<STDIN>) {
s/$ARGV[0]/$ARGV[1]/g;
print;
}
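To see the return value and the side effect separately, here is a small sketch. The helper name replace_all is hypothetical, and it relies on the /r modifier, which needs Perl 5.14 or later:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a modified copy and leaves its input untouched.
sub replace_all {
    my ($text, $from, $to) = @_;
    # /r returns the new string instead of the substitution count;
    # \Q...\E makes $from match literally rather than as a regex.
    return $text =~ s/\Q$from\E/$to/gr;
}

my $line  = "replace a with b please\n";
# Without /r, s/// modifies $copy in place and returns the count.
my $count = (my $copy = $line) =~ s/a/b/g;

print "substitutions: $count\n";
print $copy;
print replace_all("1+1=2", "+", " plus "), "\n";
```

Note that s/a/b/g replaces every "a", including the ones inside "replace" and "please", so a word-level replacement would need something like \b anchors around the pattern.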

Preserving backslashes in Perl strings

Is there a way in Perl to preserve and print all backslashes in a string variable?
For example:
$str = 'a\\b';
The output is
a\b
but I need
a\\b
The problem is that I can't process the string in any way to escape the backslashes because
I have to read complex regular expressions from a database and don't know in which combination and number they appear and have to print them exactly as they are on a web page.
I tried with template toolkit and html and html_entity filters. The only way it works so far is to use a single quoted here document:
print <<'XYZ';
a\\b
XYZ
But then I can't interpolate variables which makes this solution useless.
I tried to write a string to a web page, into file and on the shell, but no luck, always one backslash disappears. Maybe I am totally on the wrong track, but what is the correct way to print complex regular expressions including backslashes in all combinations and numbers without any changes?
In other words:
I have a database containing hundreds of regular expressions as string data. I want to read them with Perl and print them on a web page exactly as they are in the database.
There are all the time changes to these regular expressions by many administrators so I don't know in advance how and what to escape.
A typical example would look like this:
'C:\\test\\file \S+'
but it could change the next day to
'\S+ C:\\test\\file'
Maybe a correct conclusion would be to escape every backslash exactly once, no matter in which combination and in what number it appears? This would mean that doubling them up works. Then the problem isn't as big as I feared. I tested it in bash and it works with two and even three backslashes in a row (4 backslashes print as 2 and 6 backslashes print as 3).
The backslash only has significance to Perl when it occurs in Perl source code, e.g.: your assignment of a literal string to a variable:
my $str = 'a\\b';
However, if you read data from a file (or a database or socket etc) any backslashes in the data you read will be preserved without you needing to take any special steps.
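A quick way to convince yourself of this is to compare a literal with data that arrives through a filehandle. This is only a sketch: the variable names are made up, and an in-memory filehandle stands in for the real file or database:

```perl
use strict;
use warnings;

# In source code, '\\' inside single quotes denotes ONE backslash.
my $literal = 'a\\b';
print length($literal), "\n";    # three characters: a, \, b

# Simulate external data holding two real backslashes. chr(92) is the
# backslash character, so no source-code escaping is involved here.
my $raw = 'a' . chr(92) x 2 . 'b' . "\n";

# Read it back through a filehandle, as you would from a file.
open my $fh, '<', \$raw or die "cannot open in-memory file: $!";
my $from_data = <$fh>;
close $fh;
chomp $from_data;

print "$from_data\n";            # both backslashes are still there
```

The same holds for values fetched from a database: the escaping rules apply to Perl source text, not to strings that already exist at run time.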
my $str = 'a\\b';
print $str;
This prints a\b.
Use
my $str = 'a\\\\b';
instead
It's a PITA, but you will just have to double up the backslashes, e.g.
a\\\\b
Otherwise, you could store the backslash in another variable, and interpolate that.
The minimum to get two backslashes is (unfortunately) three backslashes:
use 5.016;
my $a = 'a\\\b';
say $a;
The problem I tried to solve does not exist. I confused initializing a string directly in the code with using HTML forms. Preserving all backslashes in a string written inside the code is only possible with a here-document or by reading a text file containing the string. But if I just use the HTML form on a web page to insert a string and use escapeHTML() from the CGI module, it takes care of everything and you can insert the weirdest combinations of special characters. They all get displayed and preserved exactly as inserted. So I should have started directly with HTML and database operations instead of trying to examine things first by using strings directly in the code. Anyway, thanks for your help.
You can use the following regular expression to form your string correctly:
my $str = 'a\\b';
$str =~ s/\\/\\\\/g;
print "$str\n";
This prints a\\b.
EDIT:
You can use non-interpolating here-document instead:
my $str = <<'EOF';
a\\b
EOF
print "$str\n";
This still prints a\\b.
Grant's answer provided the hint I needed. Some of the other answers did not match Perl's operation on my system so ...
#!/usr/bin/perl
use warnings;
use strict;
my $var = 'content';
print "\'\"\N{U+0050}\\\\\\$var\n";
print <<END;
\'\"\N{U+0050}\\\\\\$var\n
END
print '\'\"\N{U+0050}\\\\\\$var\n'.$/;
my $str = '\'\"\N{U+0050}\\\\\\$var\n';
print $str.$/;
print @ARGV;
print $/;
Called from bash ... using the bash means of escaping in quotes which changes \' to '\''.
jamie@debian:~$ ./ft.pl '\'\''\"\N{U+0050}\\\\\\$var\n'
'"P\\\content
'"P\\\content
'\"\N{U+0050}\\\$var\n
'\"\N{U+0050}\\\$var\n
\'\"\N{U+0050}\\\\\\$var\n
The final line, with six backslashes in the middle, was what I had expected. Reality differed.
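So:
"in double quotes, \ escapes are processed and variables are interpolated"
in an unquoted heredoc (<<END), \ escapes and variables are processed the same way
'in single quotes, \ is special only in front of \ and ' (nothing else)'
my $str = 'a single-quoted assignment follows the same limited rule';
perl.pl 'escaped using bash rules': the argument reaches @ARGV without any further processing by Perl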

How to combine two lines together using Perl?

How to combine two lines together using Perl? I'm trying to combine these two lines using a Perl regular expression:
__Data__
test1 - results
dkdkdkdkdkd
I would like the output to be like this:
__Data__
test1 - results dkdkdkdkdkd
I thought this would accomplish this but not working:
$_ =~ s/__Data__\n(test1.*)\n(.*)\n/__Data__\n$1 $2/smg;
If you have a multiline string:
s/__Data__\ntest1.*\K\n/ /g;
The /s modifier only makes the wildcard . match \n, so it will cause .* to slurp your newline and cause the match of \n to be displaced to the last place it occurs. Which, depending on your data, might be far off.
The /m modifier makes ^ and $ match inside the string at newlines, so not so useful. The \K escape preserves whatever comes before it, so you do not need to put it back afterwards.
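The effect of /s on .* is easy to check on a slightly longer sample. This is just a sketch; the extra "more" line and the variable names are additions for the demonstration:

```perl
use strict;
use warnings;

my $text = "__Data__\ntest1 - results\ndkdkdkdkdkd\nmore\n";

# Without /s, .* stops at the end of the "test1" line, so the newline
# right after it is the one that gets replaced by a space.
(my $without_s = $text) =~ s/__Data__\ntest1.*\K\n/ /;

# With /s, .* runs to the end of the string and backtracks only as far
# as the LAST newline, which is then the one that gets replaced.
(my $with_s = $text) =~ s/__Data__\ntest1.*\K\n/ /s;

print "<<$without_s>>\n";
print "<<$with_s>>\n";
```

Only the first variant joins the two lines the question asks about.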
If you have a single line string, for instance in a while loop:
while (<>) {
if (/^__Data__/) {
$_ .= <>; # add next line
chomp; # remove newline
$_ .= ' ' . <>; # add a space, then the third line
}
print;
}
There seems to be a problem with the setup of $_. When I run this script, I get the output I expect (and the output I think you'd expect). The main difference is that I've added a newline at the end of the replacement pattern in the substitute. The rest is cosmetic or test infrastructure.
Script
#!/usr/bin/env perl
use strict;
use warnings;
my $text = "__Data__\ntest1 - results\ndkdkdkdkdkd\n";
my $copy = $text;
$text =~ s/__Data__\n(test1.*)\n(.*)\n/__Data__\n$1 $2\n/smg;
print "<<$copy>>\n";
print "<<$text>>\n";
Output
<<__Data__
test1 - results
dkdkdkdkdkd
>>
<<__Data__
test1 - results dkdkdkdkdkd
>>
Note the use of << and >> to mark the ends of strings; it often helps when debugging. Use any symbols you like; just enclose your displayed text in such markers to help yourself debug what's going on.
(Tested with Perl 5.12.1 on RHEL 5 for x86/64, but I don't think the code is version or platform dependent.)

Can I use unpack to split a string into characters in Perl?

A common 'Perlism' is generating a list as something to loop over in this form:
for($str=~/./g) { print "the next character from \"$str\"=$_\n"; }
In this case the global match regex returns a list containing the characters of $str, one per element, and each one is assigned to $_ in turn.
Instead of a regex, split can be used in the same way or 'a'..'z', map, etc.
I am investigating unpack to generate a field by field interpretation of a string. I have always found unpack to be less straightforward to the way my brain works, and I have never really dug that deeply into it.
As a simple case, I want to generate a list that is one character in each element from a string using unpack (yes -- I know I can do it with split(//,$str) and /./g but I really want to see if unpack can be used this way...)
Obviously, I can use a field list for unpack that is unpack("A1" x length($str), $str) but is there some other way that kinda looks like globbing? i.e., can I call unpack(some_format,$str) either in list context or in a loop such that unpack will return the next group of characters in the format group until $str is exhausted?
I have read The Perl 5.12 Pack pod and the Perl 5.12 pack tutorial and the PerlMonks tutorial.
Here is the sample code:
#!/usr/bin/perl
use warnings;
use strict;
my $str=join('',('a'..'z', 'A'..'Z')); #the alphabet...
$str=~s/(.{1,3})/$1 /g; #...in groups of three
print "str=$str\n\n";
for ($str=~/./g) {
print "regex: = $_\n";
}
for(split(//,$str)) {
print "split: \$_=$_\n";
}
for(unpack("A1" x length($str), $str)) {
print "unpack: \$_=$_\n";
}
pack and unpack templates can use parentheses to group things much like regexps can. The group can be followed by a repeat count. * as a repeat count means "repeat until you run out of things to pack/unpack".
for(unpack("(A1)*", $str)) {
print "unpack: \$_=$_\n";
}
You'd have to run a benchmark to find out which of these is the fastest.
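One thing to watch with the sample string, which contains spaces: on unpack, the A format strips trailing spaces, so a field holding a single space comes back as an empty string. The a format returns the data unchanged. A short sketch with made-up variable names:

```perl
use strict;
use warnings;

my $str = 'ab cd';

# 'A' strips trailing spaces on unpack, so the lone space becomes ''.
my @stripped = unpack '(A1)*', $str;

# 'a' returns each one-character field unchanged.
my @verbatim = unpack '(a1)*', $str;

# The group width is free to choose; the last group may be shorter.
my @pairs = unpack '(a2)*', $str;

print join('|', @stripped), "\n";   # a|b||c|d
print join('|', @verbatim), "\n";   # a|b| |c|d
print join('|', @pairs),    "\n";   # ab| c|d
```

So "(a1)*" is the closer equivalent of split(//, $str) when whitespace matters.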

Removing text inside parens, but not the parens in Perl

OK, I got a weird one that I've been jamming on for awhile (fri afternoon mind does not work I guess).
Does anyone know of a way to parse a string and remove all of the text inside parens without removing the parens themselves...but with deleting parens found inside.
ie.
myString = "this is my string (though (I) need (help) fixing it)"
after running it through what I want it would look like:
myString = "this is my string ()"
very important to keep those two parens there.
The module Regexp::Common deals with more than 1 top level of parentheses.
use strict;
use warnings;
use Regexp::Common qw/balanced/;
my @strings = (
'111(22(33)44)55',
'a(b(c(d)(e))f)g(h)((i)j)',
'this is my string (though (I) need (help) fixing it)',
);
s/$RE{balanced}{-parens=>'()'}/()/g for @strings;
print "$_\n" for @strings;
Output:
111()55
a()g()()
this is my string ()
You need to escape the parentheses to prevent them from starting a capture group. The pattern \(.+\) matches the longest substring that starts with a ( and ends with a ). That will gobble up everything up to the last ) including any intervening parentheses. Finally, we replace that string with one containing just ():
#!/usr/bin/perl
use strict; use warnings;
my $s = "this is my string (though (I) need (help) fixing it)";
$s =~ s{\(.+\)}{()};
print "$s\n";
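Be aware that the greedy pattern treats everything from the first ( to the last ) as one group, so it only fits strings with a single top-level pair. If installing a module is not an option, a recursive pattern (Perl 5.10 or later) is one possible alternative. The following is a sketch with hypothetical helper names, not a replacement for Regexp::Common:

```perl
use strict;
use warnings;

# Greedy version: spans from the first "(" to the last ")" in the string.
sub collapse_greedy {
    my ($s) = @_;
    $s =~ s{\(.+\)}{()};
    return $s;
}

# Recursive version: (?R) re-enters the whole pattern for each nested
# group, so every balanced top-level group is replaced on its own.
sub collapse_balanced {
    my ($s) = @_;
    $s =~ s{\((?:[^()]++|(?R))*\)}{()}g;
    return $s;
}

for my $s ('a(b(c(d)(e))f)g(h)((i)j)',
           'this is my string (though (I) need (help) fixing it)') {
    print collapse_greedy($s), "\n";
    print collapse_balanced($s), "\n";
}
```

For the string in the question both give the same result; they differ as soon as there is more than one top-level group.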
If you want to use regular expressions without Regexp::Common, look at the "Look Around" feature. It was introduced with Perl 5.
You can read more about "Look Ahead" and "Look Behind" at regular-expressions.info.
There is also a section on "Look Around" in the "Mastering Regular Expressions" book. Look on page 59.
#!/usr/bin/env perl
use Modern::Perl;
my $string = 'this is my (string (that)) I (need help fixing)';
$string =~ s/(?<=\()[^)]+[^(]+(?=\))//g;
say $string;