find and replace text in a perl script - perl

I know there are many ways to do what I am asking, but nothing I have found has helped with what I am trying to do.
It is supposed to be a simple find-and-replace script using stdin and stdout.
I have a script called replace.pl, and this is what I have in it:
#!/usr/bin/perl
use strict;
use warnings;
while(<STDIN>){
$_ = s/$ARGV[0]/$ARGV[1]/g;
print STDOUT $_;
}
When I run echo "replace a with b please" | replace.pl 'a' 'b'
all I get back is a "1". My desired output is "replace b with b please", but whatever I try, it does not change. Could anyone tell me what I am doing wrong here?

Try
s/$ARGV[0]/$ARGV[1]/g;
instead of
$_ = s/$ARGV[0]/$ARGV[1]/g;
as s/// returns the number of substitutions made, not the modified string.
You can also quote the search pattern if it should be a literal string (not a regex):
s/\Q$ARGV[0]\E/$ARGV[1]/g;

From perlop:
s/PATTERN/REPLACEMENT/msixpodualgcer
Searches a string for a pattern, and if found, replaces that pattern with the replacement text and returns the number of substitutions made. Otherwise it returns false (specifically, the empty string).
That's why the code sets $_ = 1. You just want to do s/$ARGV[0]/$ARGV[1]/g; for its substitution side-effect, without assigning its return value to $_.
while (<STDIN>) {
s/$ARGV[0]/$ARGV[1]/g;
print;
}
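To see the difference between the return value of s/// and its side effect, here is a minimal sketch. It is not taken from the answers above: the helper name replace_all is hypothetical, and the /r flag assumes Perl 5.14 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace every literal occurrence of $find in $line.
sub replace_all {
    my ($line, $find, $repl) = @_;
    # \Q...\E quotes regex metacharacters so $find is matched literally.
    # /r (Perl 5.14+) returns the modified copy instead of the count.
    return $line =~ s/\Q$find\E/$repl/gr;
}

my $line  = "replace a with b please";
my $count = ($line =~ s/a/b/g);   # changes $line in place, returns the number of substitutions
print "count: $count\n";          # count: 3
print "line: $line\n";
print replace_all("1+1=2", "+", " plus "), "\n";   # 1 plus 1=2
```

Assigning the result of a plain s/// back to the variable, as in the question, overwrites the text with that count.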

Related

What is the meaning of the number sign (#) in a Perl regex match?

What is the meaning of below statement in perl?
($script = $0) =~ s#^.*/##g;
I am trying to understand the operator =~ along with the statement on the right side s#^.*/##g.
Thanks
=~ applies the thing on the right (a pattern match or search and replace) to the thing on the left. There's lots of documentation about =~ out there, so I'm just going to point you at a pretty good one.
There are a couple of idioms going on there that are neither obvious nor well documented, and they might be tripping you up. Let's cover them.
First is this...
($copy = $original) =~ s/foo/bar/;
This is a way of copying a variable and performing a search and replace on it in a single step. It is equivalent to:
$copy = $original;
$copy =~ s/foo/bar/;
The =~ operates on whatever is on the left after the left hand code has been run. ($copy = $original) evaluates to $copy so the =~ acts on the copy.
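As a small sketch of the copy-and-substitute idiom (the sample string is made up, and the /r alternative assumes Perl 5.14 or later):

```perl
use strict;
use warnings;

my $original = "foo and foo";

# Classic idiom: assign first, then substitute on the copy.
(my $copy = $original) =~ s/foo/bar/;

# Perl 5.14+ alternative: /r returns the changed string and leaves the source alone.
my $copy_r = $original =~ s/foo/bar/r;

print "$original\n";  # foo and foo
print "$copy\n";      # bar and foo
print "$copy_r\n";    # bar and foo
```

In both forms the original variable is left untouched.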
s#^.*/##g is the same as s/^.*\///g but using alternative delimiters to avoid Leaning Toothpick Syndrome. You can use just about anything as a regex delimiter. # is common, though I think it's ugly and hard to read. I prefer {} because they balance. s{^.*/}{}g is equivalent code.
Unrolling the idioms, you have this:
$script = $0;
$script =~ s{^.*/}{}g;
$0 is the name of the script. So this is code to copy the name of the script and strip everything up to the last slash (.* is greedy and will match as much as possible) off it. It is getting just the filename of the script.
The /g indicates to perform the match on the string as many times as possible. Since this can only ever match once (the ^ anchors it to the beginning of the string) it serves no purpose.
There's a better and safer way to do this.
use File::Basename;
$script = basename($0);
It's very, very simple:
Perl quote-like expressions can take many different characters as part separators. The separator right after the command (in this case, the s) is the separator for the rest of the operation. For example:
# Out with the "Old" and "In" with the new
$string =~ s/old/new/;
$string =~ s#old#new#;
$string =~ s(old)(new);
$string =~ s@old@new@;
All four of those expressions do the same thing: they replace the string old with new in $string. Whatever comes after the s is the separator. Note that parentheses, curly braces, and square brackets are used in pairs. This works out rather nicely for q and qq, which can be used instead of single quotes and double quotes:
print "The value of \$foo is \"$foo\"\n"; # A bit hard to read
print qq/The value of \$foo is "$foo"\n/; # Maybe slashes weren't a great choice...
print qq(The value of \$foo is "$foo"\n); # Very nice and clean!
print qq(The value of \$foo is (believe it or not) "$foo"\n); #Still works!
The last still works because the quote like operators count opening and closing parentheses. Of course, with regular expressions, parentheses and square brackets are part of the regular expression syntax, so you won't see them so much in substitutions.
Most of the time, it is highly recommended that you stick with the s/.../.../ form just for readability. It's what people are used to and it's easy to digest. However, what if you have this?
$bin_dir =~ s/\/home\/([^\/]+)\/bin/\/Users\/$1\/bin/;
Those backslashes can make it hard to read, so the tradition has been to replace the slash separators with another character to avoid the hills and valleys effect.
$bin_dir =~ s#/home/([^/]+)/bin#/Users/$1/bin#;
This is still a bit hard to read, but at least I don't have to escape each forward slash, so it's easier to see what I'm substituting. Good delimiter characters are hard to find for regular expressions: symbols such as ^, *, |, and + are regular expression metacharacters and are likely to appear in the pattern itself. The # is a common choice because it is uncommon in strings and has no special meaning in a regular expression (unless the /x modifier is in effect, where it starts a comment), so it is unlikely to appear in the pattern.
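A quick way to convince yourself that the delimiter is only a matter of readability is to run the same substitution with three different delimiters. This is a sketch, and the sample path is hypothetical:

```perl
use strict;
use warnings;

my $path = "/home/alice/bin";   # hypothetical sample path

# Same pattern and replacement, three different delimiters.
(my $with_slashes = $path) =~ s/\/home\/([^\/]+)\/bin/\/Users\/$1\/bin/;
(my $with_hashes  = $path) =~ s#/home/([^/]+)/bin#/Users/$1/bin#;
(my $with_braces  = $path) =~ s{/home/([^/]+)/bin}{/Users/$1/bin};

print "$with_slashes\n";   # /Users/alice/bin
print "$with_hashes\n";    # /Users/alice/bin
print "$with_braces\n";    # /Users/alice/bin
```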
Getting back to your original question:
($script = $0) =~ s#^.*/##g;
is the equivalent of:
($script = $0) =~ s/^.*\///g;
But because the original programmer didn't want to escape that slash with a backslash, they changed the separator character.
As for the:
($script = $0) =~ s#^.*/##g;
It's the same as saying:
$script = $0;
$script =~ s#^.*/##g;
You're assigning the $script variable and doing the substitution in a single step. It's very common in Perl, but it is a bit hard to understand at first.
By the way, if I understand that expression correctly (removing all characters up to the last forward slash), this would have been way cleaner:
use File::Basename;
...
$script = basename($0);
Much easier to read and understand -- even for an old Perl hand.
In Perl, you can use many kinds of characters as quoting characters (string, regular expression, list). Let's break it down:
Assign the $script variable the contents of $0 (the string that contains the name of the calling script).
The =~ is the binding operator. It applies a regular expression match or a search and replace to the value on its left. In this case, it operates on the new variable, $script.
The s character indicates a search and replace regex.
The # character is being used as the delimiter for the regex. The delimiter is usually the / character, but you can use others, including # in this case.
The regex is ^.*/. It means "at the start of the string, match zero or more characters up to a slash". Because .* is greedy, it matches up to the last slash it can reach; . does not match newline characters by default.
The next # marks the start of the replacement. Usually you would have text here, possibly using parts captured by the pattern.
The # again. This ends the replacement. Since there is nothing between the start and end of the replacement, whatever the pattern matched is replaced with nothing.
g, or global match. The search and replace repeats for as many times as the pattern matches in the value.
Effectively, it removes everything up to and including the last / in the name of the script, leaving just the file name (newlines, which . does not match, are left alone). It's a quick and lazy way of getting the script name, and it only works with Unix-like paths.
If you have a chance, consider replacing with File::Basename, a core module in Perl:
use File::Basename;
# later ...
my $script = fileparse($0);
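For comparison, here is a sketch of what File::Basename returns next to the regex approach. The path is a hypothetical example, and the results shown assume a Unix-like system, since the module parses paths according to the current platform.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname fileparse);

my $path = "/usr/local/bin/script.pl";   # hypothetical path

(my $by_regex = $path) =~ s{^.*/}{};     # the idiom from the question
my $name = basename($path);              # file name only
my $dir  = dirname($path);               # directory without trailing slash

# In list context fileparse returns (name, directories, suffix);
# the suffix is only split off when a suffix pattern is supplied.
my ($base, $dirs, $suffix) = fileparse($path, qr/\.[^.]*/);

print "$by_regex\n";               # script.pl
print "$name\n";                   # script.pl
print "$dir\n";                    # /usr/local/bin
print "$base|$dirs|$suffix\n";     # script|/usr/local/bin/|.pl
```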

How to combine two lines together using Perl?

How to combine two lines together using Perl? I'm trying to combine these two lines using a Perl regular expression:
__Data__
test1 - results
dkdkdkdkdkd
I would like the output to be like this:
__Data__
test1 - results dkdkdkdkdkd
I thought this would accomplish this but not working:
$_ =~ s/__Data__\n(test1.*)\n(.*)\n/__Data__\n$1 $2/smg;
If you have a multiline string:
s/__Data__\ntest1.*\K\n/ /g;
The /s modifier only makes the wildcard . match \n, so it will cause .* to slurp your newline and cause the match of \n to be displaced to the last place it occurs. Which, depending on your data, might be far off.
The /m modifier makes ^ and $ match inside the string at newlines, so not so useful. The \K escape preserves whatever comes before it, so you do not need to put it back afterwards.
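The effect of \K and of the /s modifier can be checked with a short sketch. The sample string mirrors the question's data, \K assumes Perl 5.10 or later, and the replacement here is a single space so the two lines stay separated.

```perl
use strict;
use warnings;

my $text = "__Data__\ntest1 - results\ndkdkdkdkdkd\n";

# Without /s, .* stops at the end of the "test1" line.
# \K keeps everything matched so far, so only the newline is replaced.
(my $joined = $text) =~ s/__Data__\ntest1.*\K\n/ /;

# With /s, .* runs to the end of the string and backtracks to the LAST newline.
(my $displaced = $text) =~ s/__Data__\ntest1.*\K\n/ /s;

print "<<$joined>>\n";
print "<<$displaced>>\n";
```

The first result joins the two lines as requested; the second leaves the middle newline in place and replaces the final one instead.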
If you are reading the input one line at a time, for instance in a while loop:
while (<>) {
if (/^__Data__/) {
$_ .= <>; # add next line
chomp; # remove newline
$_ .= " " . <>; # add a space, then the third line
}
print;
}
There seems to be a problem with the setup of $_. When I run this script, I get the output I expect (and the output I think you'd expect). The main difference is that I've added a newline at the end of the replacement pattern in the substitute. The rest is cosmetic or test infrastructure.
Script
#!/usr/bin/env perl
use strict;
use warnings;
my $text = "__Data__\ntest1 - results\ndkdkdkdkdkd\n";
my $copy = $text;
$text =~ s/__Data__\n(test1.*)\n(.*)\n/__Data__\n$1 $2\n/smg;
print "<<$copy>>\n";
print "<<$text>>\n";
Output
<<__Data__
test1 - results
dkdkdkdkdkd
>>
<<__Data__
test1 - results dkdkdkdkdkd
>>
Note the use of << and >> to mark the ends of strings; it often helps when debugging. Use any symbols you like; just enclose your displayed text in such markers to help yourself debug what's going on.
(Tested with Perl 5.12.1 on RHEL 5 for x86/64, but I don't think the code is version or platform dependent.)

perl pattern matching one by one and process it

I have a string
[something]text1[/something] blah blah [something]text2[/something]
I need to write a Perl script to read what is in the [something] tag, process it to "text-x", and put it back with an [otherthing] tag. So the above string should be
[otherthing]text-1[/otherthing] blah blah [otherthing]text-2[/otherthing]
Processing "textx" to "text-x" is not one step process.
So this is solution that I have till now:
m/[something](?<text>.*)[/something]/
This will get me the string in between and I can process that to "text-x" but how do I put it back in the same place with [otherthing]text-x[/otherthing]?
How do I use s/// in this case?
How to do it for the whole string one by one ?
You can use the /e switch on s/// to evaluate the right hand side before using the result as the substitution, and the /g flag to do this for every match.
Here is a simple example:
use 5.12.0;
my $str = ">1< >2< >34<";
$str =~ s/>(\d+)</">>".process("$1")."<<"/eg;
say $str;
sub process {
return "x" x $_[0];
}
This should come close. It uses the /e modifier to allow you to do processing in the replacement side of the regex and so it calls the fix_textx function where you can do multiple steps.
The normal way of iterating over matches is with the /g modifier.
#!/usr/bin/perl
use strict;
use warnings;
my $string = '[something]text1[/something] blah blah [something]text2[/something]';
$string =~ s{\[something\](text[^[]*)\[\/something\]}
{'[otherthing]' . fix_textx($1) . '[/otherthing]'}ge;
print $string;
sub fix_textx {
my ($testx) = @_;
$testx =~ s/text\K(.*)/-$1/;
return $testx;
}
EDIT: fixed the square bracket. Thanks @tadmc
In this particular case, you can accomplish what you're trying to do by splitting the string on "[something]" and then processing the beginning of each piece (except the first one), then joining the pieces back together when you're done.
I don't know if there is a general way to iterate over the regex matches in a string in Perl. I'm hoping someone else will answer this question and educate me on that.
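On the open question of iterating over matches: in scalar context, a match with the /g flag resumes where the previous match ended, so a while loop visits each match in turn. A minimal sketch using the string from the question:

```perl
use strict;
use warnings;

my $string = '[something]text1[/something] blah blah [something]text2[/something]';

# Each pass through the loop picks up the next match; $1 holds its captured text.
my @found;
while ($string =~ m{\[something\](.*?)\[/something\]}g) {
    push @found, $1;
}
print join(", ", @found), "\n";   # text1, text2

# In list context, the same match returns all captures at once.
my @all = $string =~ m{\[something\](.*?)\[/something\]}g;
print scalar(@all), " matches\n"; # 2 matches
```

This only reads the matches; to rewrite them in place, s///ge as shown in the other answers is the more direct tool.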

I want to create a perl code to extract what is in the parentheses and port it to a variable

I want to create a perl code to extract what is in the parentheses and port it to a variable.
"(05-NW)HPLaserjet" should become "05-NW"
Something like this:
Catch "("
take out any spaces that exist in between ()
everything in between () = variable 1
How would I go about doing this?
This is a job for regular expressions. It looks confusing because parentheses are metacharacters in regular expressions and are also part of the text to match in your example, where they are escaped with backslashes.
C:\temp $ echo (05-NW)HPLaserjet | perl -nlwe "print for m/\(([^)]+)\)/g"
Match opening paren, start capture group, match one or more characters that aren't the closing paren, close capture group, match closing paren.
You can use regular expressions (see perlretut) to match and capture the value. By assigning to a list, you can put your captures into named variables. The global variables $1, $2 etc. are also used for capture groups, so you can use that instead of list assignment if you like.
use strict;
use warnings;
while (<DATA>) # read every line of the __DATA__ section below
{
my ($printer_code) = m/
\( # Match literal opening parenthesis
([^\)]*) # Capture group (printer_code): Match characters which aren't right parenthesis, zero or more times
\)/x; # Match literal closing parenthesis
# The 'x' modifier allows you to add whitespace and comments to regex for clarity.
# If you use it, make sure you use '\ ' (or '\s', etc.) for actual literal whitespace matching!
print "$printer_code\n" if defined $printer_code; # show the captured value
}
__DATA__
(05-NW)HPLaserjet
perldoc perlre
use warnings;
use strict;
my $s = '(05-NW)HPLaserjet';
my ($v) = $s =~ /\((.*)\)/; # Grab everything between parens (including other parens)
$v =~ s/\s//g; # Remove all whitespace
print "$v\n";
__END__
05-NW
See also: Perl Idioms Explained - @ary = $str =~ m/(stuff)/g
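The answers above use two slightly different patterns, and the difference shows up once the input has more than one parenthesized group. A sketch with a hypothetical input string:

```perl
use strict;
use warnings;

my $s = '(05-NW)HPLaserjet (color)';   # hypothetical input with two groups

my ($greedy)  = $s =~ /\((.*)\)/;      # .* runs on to the last closing paren
my ($bounded) = $s =~ /\(([^)]*)\)/;   # [^)]* stops at the first closing paren
my @codes     = $s =~ /\(([^)]*)\)/g;  # with /g, list assignment collects every group

print "$greedy\n";    # 05-NW)HPLaserjet (color
print "$bounded\n";   # 05-NW
print "@codes\n";     # 05-NW color
```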

Need to print the last occurrence of a string in Perl

I have a script in Perl that searches for an error that is in a config file, but it prints out any occurrence of the error. I need to match what is in the config file and print out only the last time the error occurred. Any ideas?
Wow... I was not expecting this much of a response. I should have been clearer in stating that this is for log monitoring on a Windows box that sends an alert to Nagios. This is actually my first Perl program, and all this information has been very helpful. Does anyone know how I can apply any of the tail answers on a Wintel box?
Another way to do it:
perl -n -e '$e = $1 if /(REGEX_HERE)/; END{ print $e }' CONFIG_FILE_HERE
What exactly do you need to print? The line containing the error? More context than that?
File::ReadBackwards can be helpful.
In outline:
my $errinfo;
while (<>)
{
$errinfo = "whatever" if (m/the error pattern/);
}
print "error: $errinfo\n" if ($errinfo);
This catches all errors, but doesn't print until the end, when only the last one survives.
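A runnable version of that outline might look like the sketch below. The helper name last_match and the sample log lines are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last line matching $pattern, or undef if none does.
sub last_match {
    my ($pattern, @lines) = @_;
    my $last;
    for my $line (@lines) {
        $last = $line if $line =~ $pattern;   # later matches overwrite earlier ones
    }
    return $last;
}

# Made-up log lines standing in for the real file.
my @log = (
    "10:00 ERROR disk full",
    "10:05 INFO ok",
    "10:09 ERROR timeout",
    "10:12 INFO ok",
);

my $err = last_match(qr/ERROR/, @log);
print "error: $err\n" if defined $err;   # error: 10:09 ERROR timeout
```

Because this needs nothing beyond core Perl, the same approach should carry over to Windows without tail.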
A brute-force approach involves setting up your own pipeline by pointing STDOUT to tail. This allows you to print all errors, and then it's up to tail to worry about only letting the last one out.
You didn't specify, so I assume a legal config line is of the form
Name = some value
Matching that is straightforward:
^ (starting at the beginning of line)
\w+ (one or more “word characters”)
\s+ (followed by mandatory whitespace)
= (followed by an equals sign)
\s+ (more mandatory whitespace)
.+ (some mandatory value)
$ (finishing at the end of the line)
Gluing it together, we get
#! /usr/bin/perl
use warnings;
use strict;
# for demo only
*ARGV = *DATA;
my $pid = open STDOUT, "|-", "tail", "-1" or die "$0: open: $!";
while (<>) {
print unless /^ \w+ \s+ = \s+ .+ $/x;
}
close STDOUT or warn "$0: close: $!";
__DATA__
This = assignment is ok
But := not this
And == definitely not this
Output:
$ ./lasterr
And == definitely not this
With regular expressions, when you want the last occurrence of a pattern, place ^.* at the front of your pattern. For example, to replace the last X in the input with Y, use
$ echo XABCXXXQQQXX | perl -pe 's/^(.*)X/$1Y/'
XABCXXXQQQXY
Note that the ^ is redundant because regular-expression quantifiers are greedy, but I like having it there for emphasis.
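The same greedy prefix can also report where the last occurrence sits. This is a sketch on the sample string from above; @- is Perl's built-in array of match start offsets.

```perl
use strict;
use warnings;

my $input = "XABCXXXQQQXX";

# Greedy .* consumes as much as it can, so the X that follows it is the last one.
(my $changed = $input) =~ s/^(.*)X/$1Y/;
print "$changed\n";                       # XABCXXXQQQXY

# $-[2] is the start offset of the second capture group.
my $offset;
if ($input =~ /^(.*)(X)/) {
    $offset = $-[2];
    print "last X at offset $offset\n";   # last X at offset 11
}
```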
Applying this technique to your problem, you can search for the last line in your config file that contains an error as in the following program:
#! /usr/bin/perl
use warnings;
use strict;
local $_ = do { local $/; scalar <DATA> };
if (/\A.* ^(?! \w+ \s+ = \s+ [^\r\n]+ $) (.+?)$/smx) {
print $1, "\n";
}
__DATA__
This = assignment is ok
But := not this
And == definitely not this
The syntax of the regular expression is a bit different because $_ contains multiple lines, but the principle is the same. \A is similar to ^, but it matches only at the beginning of string to be searched. With the /m switch (“multi-line”), ^ matches at logical line boundaries.
Up to this point, we know the pattern
/\A.* ^ .../
matches the last line that looks like something. The negative look-ahead assertion (?!...) looks for a line that is not a legal config line. Ordinarily . matches any character except newline, but the /s switch (“single line”) lifts this restriction. Specifying [^\r\n]+, that is, one or more characters that are neither carriage return nor line feed, does not allow the match to spill into the next line.
Look-around assertions do not capture, so we grab the offending line with (.+?)$. The reason it's safe to use . in this context is because we know the current line is bad and the non-greedy quantifier +? stops matching as soon as it can, which in this case is the end of the current logical line.
All these regular expressions use the /x switch (“extended mode”) to allow extra whitespace: the aim is to improve readability.