How to substitute arbitrary fixed strings in Perl - perl

I want to replace a fixed string within another string using Perl. Both strings are contained in variables.
If it was impossible for the replaced string to contain any regex meta-characters, I could do something like this:
my $text = 'The quick brown fox jumps over the lazy dog!';
my $search = 'lazy';
my $replace = 'drowsy';
$text =~ s/$search/$replace/;
Alas, I want this to work for arbitrary fixed strings. E.g., this should leave $text unchanged:
my $text = 'The quick brown fox jumps over the lazy dog!';
my $search = 'dog.';
my $replace = 'donkey.';
$text =~ s/$search/$replace/;
Instead, this replaces dog! with donkey., since the dot matches the exclamation mark.
Assuming that the variable contents themselves are not hardcoded, e.g., they can come from a file or from the command line, is there a way to quote or otherwise mark up the contents of a variable so that they are not interpreted as a regular expression in such substitution operations?
Or is there a better way to handle fixed strings? Preferably something that would still allow me to use regex-like features such as anchors or back-references.

Run your $search through quotemeta:
my $text = 'The quick brown fox jumps over the lazy dog!';
my $search = quotemeta('dog.');
my $replace = 'donkey.';
$text =~ s/$search/$replace/;
This will unfortunately not allow you to use other regex features. If you have a select set of features you want to escape out, perhaps you can just run your $search through a first "cleaning" regex or function, something like:
my $search = 'dog.';
$search = clean($search);
sub clean {
my $str = shift;
$str =~ s/\./\\\./g;
return $str;
}

Wrap your search string with \Q...\E, which quotes any meta characters within.
$text =~ s/\Q$search\E/$replace/;
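As a minimal sketch of how \Q...\E can be combined with other regex features: only the interpolated variable is quoted, so an anchor written outside the quoted part keeps its meaning. The sample strings and the variable names $anchored, $other and $unchanged are made up for illustration.

```perl
use strict;
use warnings;

my $text    = 'The quick brown fox jumps over the lazy dog.';
my $search  = 'dog.';      # arbitrary fixed string, e.g. read from a file
my $replace = 'donkey.';

# \Q...\E quotes only the interpolated variable; the \z after it is
# still an end-of-string anchor.
(my $anchored = $text) =~ s/\Q$search\E\z/$replace/;
print "$anchored\n";

# With 'dog!' in the text, the quoted dot no longer matches '!'.
my $other = 'The quick brown fox jumps over the lazy dog!';
(my $unchanged = $other) =~ s/\Q$search\E\z/$replace/;
print "$unchanged\n";
```

The contents of $replace are inserted as a plain string, so a literal $1 inside that variable is not treated as a back-reference.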

#Replace a string without using RegExp.
sub str_replace {
my $replace_this = shift;
my $with_this = shift;
my $string = shift;
my $length = length($string);
my $target = length($replace_this);
for(my $i=0; $i<$length - $target + 1; $i++) {
if(substr($string,$i,$target) eq $replace_this) {
$string = substr($string,0,$i) . $with_this . substr($string,$i+$target);
return $string; #Comment this if you want a global replace
}
}
return $string;
}
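For a global replacement without a regex, one possible variant uses index to find each occurrence and resumes the search after the inserted text, so the replacement is never rescanned. This is only a sketch; the name str_replace_all is hypothetical and not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every occurrence of $from in $string with $to,
# using index/substr rather than a regex.
sub str_replace_all {
    my ($from, $to, $string) = @_;
    return $string if $from eq '';    # avoid looping forever on an empty needle
    my $pos = 0;
    while ((my $found = index($string, $from, $pos)) >= 0) {
        substr($string, $found, length $from) = $to;
        $pos = $found + length $to;   # resume after the inserted text
    }
    return $string;
}

my $result = str_replace_all('a.', 'a.a.', 'a.b a.c abc');
print "$result\n";
```

Because the search position skips over the inserted text, a replacement that itself contains the search string does not cause an endless loop.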

Related

Using a Perl variable as the substitute expression

I want to read the substitute expressions flexibly from external data,
so my problem could be reduced to the following:
my $pattern = "s/a/b/g";
my $string = "abcd";
$string =~ $pattern;
print("$string\n");
This does not work. Where is my problem?
Or is there no solution at all?
$pattern doesn't contain a regex pattern; it contains a bit of Perl source code. To evaluate Perl code, you need eval EXPR (or do or require).
Requiring Perl code from a user is a bad idea, though. Instead, I recommend requiring the pattern and the replacement separately, as in the following:
my $pattern = 'a';
my $replacement = 'b';
my $string = 'abcd';
$string =~ s/$pattern/$replacement/g;
Use a function from String::Substitution if you want to allow $1 and such in the replacement expression.
use String::Substitution qw( gsub_modify );
my $pattern = '(a)(b)';
my $replacement = '$2$1';
my $string = 'abcd';
gsub_modify($string, $pattern, $replacement);

Split to get only characters in Perl

I have a string like this :
Reporting EXE1 BASE,Normal
I need to get a var for every words like :
$info = "Reporting";
$host = "EXE1";
$device = "BASE";
$status = "Normal";
I saw that the function "split" might be useful here, but I don't understand which pattern to use.
I prefer to use a global regex pattern match instead of split. That way you can specify the characters that you're interested in instead of the ones that you want to discard, and there's no chance of a spurious initial empty field if your string happens to start with a separator.
It looks like you want to pick out "word" characters, which are upper and lower case letters, decimal digits, and the underscore character. There's a built-in character class \w for that, so finding all sequences that match \w+ should find the data for you.
Here's an example program:
use strict;
use warnings 'all';
my $s = 'Reporting EXE1 BASE,Normal';
my ( $info, $host, $device, $status ) = $s =~ /\w+/g;
print qq{\$info = "$info"\n};
print qq{\$host = "$host"\n};
print qq{\$device = "$device"\n};
print qq{\$status = "$status"\n};
output
$info = "Reporting"
$host = "EXE1"
$device = "BASE"
$status = "Normal"
If you want to allow more characters than \w matches then you could use
my ( $info, $host, $device, $status ) = $s =~ /[^\s,]+/g;
which matches sequences of characters that are neither whitespace nor comma.
Given your sample data the results are identical, but I cannot tell what your real data looks like.
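To illustrate the point about a spurious initial empty field, here is a small sketch that compares both approaches on an input that starts with a separator. The leading comma in the sample string is an assumption added for the demonstration; it is not in the original data.

```perl
use strict;
use warnings;

my $s = ',Reporting EXE1 BASE,Normal';   # hypothetical input with a leading comma

my @by_split = split /[\s,]/, $s;        # leading empty field is kept
my @by_match = $s =~ /[^\s,]+/g;         # only the non-separator runs

print scalar(@by_split), " fields from split\n";
print scalar(@by_match), " fields from match\n";
```

With split, the first variable in a list assignment would receive the empty string and every later value would shift by one position.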
Use split(/\s|,/,"Reporting EXE1 BASE,Normal") to split the string on comma and blank
You might try this code.
my $str = "Reporting EXE1 BASE,Normal";
my @fields = split /\s|,/, $str;
my $info = $fields[0];
my $host = $fields[1];
my $device = $fields[2];
my $status = $fields[3];
print "$info\n";
print "$host\n";
print "$device\n";
print "$status\n";
Or, as a more compact version:
my $str = "Reporting EXE1 BASE,Normal";
my ( $info, $host, $device, $status ) = split /[\s,]/, $str ;
print "$info\n";
print "$host\n";
print "$device\n";
print "$status\n";
There is no need to store the data in an array. Assign the list returned by split directly to the variables.
my $string = "Reporting EXE1 BASE,Normal";
my ($info ,$host,$device,$status) = split(/\s|,/,$string);
print "$info ,$host,$device,$status";
Or else you could use pattern matching
my ($info ,$host,$device,$status) = $string =~m/(\w+)/g;

Unmatched ) in reg when using lc function

I am trying to run the following code:
$lines = "Enjoyable )) DAY";
$lines =~ lc $lines;
print $lines;
It fails on the second line where I get the error mentioned in the title. I understand the brackets are causing the trouble. I think I could use "quotemeta", but the thing is that my string contains info that I go on to process later, so I would like to keep the string intact as far as possible and not tamper with it too much.
You have two problems here.
1. =~ is used to execute a specific set of operations
The =~ operator is used to either match with //, m//, qr// or a string; or to substitute with s/// or tr///.
If all you want to do is lowercase the contents of $lines then you should use = not =~.
$lines = "Enjoyable )) DAY";
$lines = lc $lines;
print $lines;
2. Regular expressions have special characters which must be escaped
If you want to match $lines against a lower case version of $lines, which should return true if $lines was already entirely lower case and false otherwise, then you need to escape the ")" characters.
#!/usr/bin/env perl
use strict;
use warnings;
my $lines = "enjoyable )) day";
if ($lines =~ lc quotemeta $lines) {
print "lines is lower case\n";
}
print $lines;
Note this is a toy example trying to find a reason for doing $lines =~ lc $lines - It would be much better (faster, safer) to solve this with eq as in $lines eq lc $lines.
See perldoc -f quotemeta or http://perldoc.perl.org/functions/quotemeta.html for more details on quotemeta.
=~ is used for regular expressions. "lc" is not part of regex, it's a function like this: $new = lc($old);
I don't recall the regex operator for lowercase, because I use lc() all the time.
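As a rough sketch of the options mentioned in these answers: lc lowercases a string without involving the regex engine, eq compares without treating ")" specially, and the \L escape lowercases interpolated text inside a substitution. The variable names $lower, $is_lower and $via_escape are made up for this example.

```perl
use strict;
use warnings;

my $lines = "Enjoyable )) DAY";

my $lower    = lc $lines;              # plain function call, no regex involved
my $is_lower = ($lines eq $lower);     # string comparison; ")" is harmless here

# \L lowercases the interpolated text up to \E or the end of the replacement.
(my $via_escape = $lines) =~ s/(.+)/\L$1/;

print "$lower\n";
print "$via_escape\n";
```

Neither form modifies $lines itself, so the original string stays intact for later processing.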

perl find and replace ../ and  

I am using Perl to replace all instances of
../../../../../../abc' and  
in a string with
/ and , respectively.
The method I am using looks like this:
sub encode
{
my $result = $_[0];
$result =~ s/..\/..\/..\/..\/..\/..\//\//g;
$result =~ s/ / /g;
return $result;
}
Is this correct?
Essentially, yes, although the first regex has to be written in a different way: because . matches any character, we have to escape it as \. or put it in its own character class [.]. The first regex can also be written more cleanly as
...;
$result =~ s{ (?: [.][.]/ ){6} }
{/}gx;
...;
We look for the literal pattern ../ repeated 6 times and then replace it. Because I use curly braces as a delimiter I don't have to escape the slash. Because I use the /x modifier I can have these spaces inside the regex improving readability.
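The difference between the escaped and unescaped dots can be seen in a short sketch. Both sample strings are invented for illustration; the second one contains no ../ at all, yet the original unescaped pattern still matches it.

```perl
use strict;
use warnings;

my $path = '../../../../../../foo/bar';          # hypothetical sample input

# Literal "../" repeated six times, written with /x for readability.
(my $fixed = $path) =~ s{ (?: [.][.]/ ){6} }{/}gx;

# The unescaped version also matches any two characters before each slash.
(my $loose = 'ab/cd/ef/gh/ij/kl/foo') =~ s/..\/..\/..\/..\/..\/..\//\//g;

print "$fixed\n";
print "$loose\n";
```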
Try this. It will print /foo bar/baz.
#!/usr/bin/perl -w
use strict;
my $result = "../../../../../../foo bar/baz";
#$result =~ s/(\.\.\/)+/\//g; #for any number of ../
$result =~ s/(\.\.\/){6}/\//g; #for 6 exactly
$result =~ s/ / /g;
print $result . "\n";
You forgot the abc, I think:
sub encode
{
my $result = $_[0];
$result =~ s/(?:\.\.\/){6}abc/\//g;
$result =~ s/ / /g;
return $result;
}

Perl search is only showing last result

I have two arrays, one with search terms and another with multiple lines fetched from a file. I have a nested foreach statement and am searching for all combinations, but only the very last match is showing even though I know for a fact that there are many other matches! I have tried many different versions of the code, but here is my last one:
open (MYFILE, 'searchTerms.txt');
open (MYFILE2, 'fileToSearchIn.xml');
@searchTerms = <MYFILE>;
@xml = <MYFILE2>;
close(MYFILE2);
close(MYFILE);
$results = "";
foreach $searchIn (@xml)
{
foreach $searchFor (@searchTerms)
{
#print "searching for $searchFor in: $searchIn\n";
if ($searchIn =~ m/$searchFor/)
{
$temp = "found in $searchIn \n while searching for: $searchFor ";
$results = $results.$temp."\n";
$temp = "";
}
}
}
print $results;
You should always use strict and use warnings at the start of your program, and declare all variables at the point of their first use using my. This applies especially when you are asking for help with your code as this measure can quickly reveal many simple mistakes.
As Raze2dust has said, it is important to remember that lines read from a file will have a trailing newline "\n" character. If you were checking for exact matches between a pair of lines then this wouldn't matter, but since it's not working for you I assume the strings in searchTerms.txt can appear anywhere in the lines of fileToSearchIn.xml. That means you need to chomp the strings from searchTerms.txt; lines from the other file can stay as they are.
Things like this are made a lot easier by using the File::Slurp module. It does all the file handling for you and will chomp any newlines from the input text if you ask.
I have changed your program to use this module so that you can see how it works.
use strict;
use warnings;
use File::Slurp;
my @searchTerms = read_file('searchTerms.txt', chomp => 1);
my @xml = read_file('fileToSearchIn.xml');
my @results;
foreach my $searchIn (@xml) {
foreach my $searchFor (@searchTerms) {
if ($searchIn =~ m/$searchFor/) {
push @results, qq/Found in "$searchIn"\n while searching for "$searchFor"/;
}
}
}
print "$_\n" for @results;
Chomp your inputs to remove newline characters:
open (MYFILE, 'searchTerms.txt');
open (MYFILE2, 'fileToSearchIn.xml');
@searchTerms = <MYFILE>;
@xml = <MYFILE2>;
close(MYFILE2);
close(MYFILE);
$results = "";
foreach $searchIn (@xml)
{
chomp($searchIn);
foreach $searchFor (@searchTerms)
{
chomp($searchFor);
#print "searching for $searchFor in: $searchIn\n";
if ($searchIn =~ m/$searchFor/)
{
$temp = "found in $searchIn \n while searching for: $searchFor ";
$results = $results.$temp."\n";
$temp = "";
}
}
}
print $results;
Basically, you think you are searching for 'a', but you are actually searching for 'a\n', because each line read from the file keeps its trailing newline unless you chomp it. The term then matches only where 'a' is the last character of a line, because only there is it followed by a newline.
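The effect can be reproduced without any files. The sketch below uses in-memory arrays as stand-ins for the two files and the names @search_terms and @results for illustration. It also wraps each term in \Q...\E, which assumes the search terms are meant as fixed strings rather than regular expressions; drop the quoting if they are patterns.

```perl
use strict;
use warnings;

# Stand-ins for the lines read from the two files, newlines included.
my @search_terms = ("fox\n", "a.c\n");
my @xml          = ("<t>quick fox jumps</t>\n", "<t>abc</t>\n", "<t>a.c</t>\n");

chomp @search_terms;    # chomp accepts a whole array

my @results;
for my $line (@xml) {
    for my $term (@search_terms) {
        # \Q...\E treats the term as a fixed string, so '.' is not a wildcard
        push @results, "$term: $line" if $line =~ /\Q$term\E/;
    }
}
print @results;
```

Without the chomp, neither term would match here, because no term is immediately followed by a newline in these lines.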