Run a sed search and replace inside perl

Run a sed search and replace inside perl - sed

I am trying to test the code snippet below for a bigger script that I am writing. However, I can't get the search working with parentheses and variables.
Appreciate any help someone can give me.
Code snippet:
#!/usr/bin/perl
$file="test4.html";
$Search="Help (Test)";
$Replace="Testing";
print "/usr/bin/sed -i cb 's/$Search/$Replace/g' $file\n";
`/usr/bin/sed -i cb 's/$Search/$Replace/g' $file`;
Thanks,
Ash

The syntax to run a command in a child process and wait for its termination in perl is system "cmd", "arg1", "arg2",...:
#!/usr/bin/perl
$file="test4.html";
$Search="Help (Test)";
$Replace="Testing";
print "/usr/bin/sed -icb -e 's/$Search/$Replace/g' -- $file\n";
system "/usr/bin/sed", "-icb", "-e", "s/$Search/$Replace/g", "--", $file;
(error checking left as an exercise, see perldoc -f system for details)
Note that -i is not a standard sed option. The few implementations that support it (yours must be the FreeBSD one as you've separated the cb backup extension from -i) have actually copied it from perl! It does feel a bit silly to be calling sed from perl here.
Looking at your approach:
The `...` operator itself is reminiscent of the equivalent `...` shell operator. In perl, what's inside is evaluated as if inside double quoted, in that $var, #var... perl variables are expanded, and a shell is started with -c and the resulting string as arguments and with its stdout redirected to a pipe.
The shell interprets that argument as code in the shell syntax. Perl reads the output of that inline shell script from the other end of the pipe and that makes up the expansion of `...`. Same as in shell command substitution except that there's is no stripping of zero bytes or of trailing newlines.
sed -i produces no output, so it's pointless to try and capture its output with `...` here.
Now in your case, the code that sh is asked to interpret is:
/usr/bin/sed -i cb 's/Help (Test)/Testing/g' test4.html
That should work fine on FreeBSD or macOS at least. If $file had been test$(reboot).html, that would have been worse though.
Here, because you have the contents of variables that end up interpreted as code in an interpreter (here sh), you have a potential arbitrary command injection vulnerability.
In the system approach, we remove sh, so that particular vulnerability is removed. However sed is also an interpreter of some language. That language is not as omnipotent as that of sh, but for instance sed can write to arbitrary files with its w command. The GNU implementation (which you don't seem to be using) can run arbitrary commands as well.
So you still potentially have a code injection vulnerability in the case of $Search or $Replace coming from an external source.
If that's the case, you'd need to make sure your properly sanitise those values before running sed. See for instance: How to ensure that string interpolated into `sed` substitution escapes all metachars

Related

From perl, why does a backticked call to a c-shell pass quoted string ok if called with tcsh but not source?

I need to write a perl script that calls a c-shell script that calls yet another perl script. I cannot change the c-shell script or the perl script it calls. One of the args that needs to be passed is a quotes string with spaces. If I use backticks to call the c-shell, and I run the c-shell with tcsh, the quoted string is respected as a single entity. However, if I run the c-shell with source, it is not.
I feel that I need to use 'source' because when the c-shell is called by users from the command line, it is called through an alias that sources the c-shell. E.g.
alias top "source top.csh"
Consider these...
topmost.pl
#!/usr/bin/env perl
use strict;
print "Try with tcsh...\n";
my $msg = `tcsh ./top.csh -arg1 "this line has spaces"`;
print "$msg\n";
print "Try with source...\n";
my $msg = `source ./top.csh -arg1 "this line has spaces"`;
print "$msg\n";
exit;
top.csh is simply....
perl ./subperl.pl $*:q
exit
And subperl.pl is...
#!/usr/bin/env perl
use strict;
print "In subperl.pl\n";
foreach $x (#ARGV) {
print "$x\n";
}
print "The End\n";
exit;
When I run the topmost.pl script, I get...
Try with tcsh...
In subperl.pl
-arg1
this line has spaces
The End
Try with source...
In subperl.pl
-arg1
this
line
has
spaces:q
The End
Why does the "sourced" call to the top.csh script fail to respect the quotes ?

#Kaz has the answer as to why your code isn't working. This answer is about how to avoid this class of problems entirely.
First, if you can, add a #!/bin/tcsh to top.csh and make it executable (ie. chmod +x). Now it can be executed as top.csh without needing to know what shell to use.
Then you'll want to avoid using `` for anything but very simple commands. This is because `` is interpreted by the shell and now you need to worry about shell special characters and escapes and spaces... it's a mess. What you need is a way to call external programs without invoking a shell.
You can do this by passing a list to system, but system cannot capture the output.
system "tcsh", "./top.csh", "-arg1", "this line has spaces";
While you can cobble something together with open and pipes, it's better to use a pre-existing library such as IPC::System::Simple.
use IPC::System::Simple qw(capturex);
# Or capturex("./top.csh", ...) if you add a #! to top.csh.
my $msg = capturex("tcsh", "./top.csh", "-arg1", "this line has spaces");
For more involved interactions with executables, look into System::Command or IPC::Run.
Needless to say, Perl scripts which call shell scripts which call Perl scripts is a bit of a nightmare to maintain. Rather than do that, it is better to scoop the guts of subperl.pl out into a Perl library and have both subperl.pl and your code use that library.

The command in backticks is being interpreted by your system interpreter (invoked via /bin/sh), which I'm guessing might be GNU Bash. Or, in any case, it seems to be some shell which understands the source command, and almost certainly a POSIX-like shell. That source command quite probably tells that shell to read a script written in that shell's own language. So, for instance, if that shell happens to be Bash, it will treat that as a Bash script1, not as a Tcsh script.
The only way both could work is if the script is a "polyglot": a program which can be interpreted by either tcsh or the system shell that is used by perl to implement backticks.
(An easy example of a C Shell + POSIX shell polyglot is a script that contains nothing but a sequence of trivial commands consisting of space separated words like cp from to.)
Your script isn't a polyglot; only Tcsh understands the :q syntax, not the other shell.
More precisely, if /bin/sh is Bash, the original source ... command as well as the contents of the sourced top.csh script will be treated as a POSIX-mode Bash script, since when Bash is invoked as /bin/sh, it turns off its POSIX-incompatible behaviors. So even if Bash's pathname expansion supported the Tcsh :q mechanism, it would almost certainly be turned off under POSIX mode because $*:q already has a firm meaning in POSIX.

Why is Perl complaining about an unterminated string here?

I have a Perl script which runs fine under Perl 5.005 on HPUX 11i v3 and that causes a small problem under Perl 5.10 on my Ubuntu 11.04 box.
It boils down to a line like:
open (CMD, "do something | grep -E 'sometext$' |");
(it's actually capturing ps -ef output to see if a process is running but I don't think that's important (a)).
Now this runs fine on the HPUX environment but, when I try it out under Ubuntu, I get:
sh: Syntax error: Unterminated quoted string
By inserting copious debug statements, I tracked it down to that offending line then started removing characters one-by-one until it stopped complaining. Luckily, $ was the first one I tried and it stopped giving me the error, so I then changed the line to:
open (CMD, "do something | grep -E 'sometext\$' |");
and it worked fine (under Linux anyway - I haven't tested that on HPUX since I don't have access to that machine today - if it does work, I'll just use that approach but I'd still like to know why it's a problem).
So it seems obvious that the $ is "swallowing" the closing single quote under my Linux environment but not apparently on the HPUX one.
My question is simply, why? Surely there weren't any massive changes between 5.005 and 5.10. Or is there some sort of configuration item I'm missing?
(a) But, if you know a better way to do this without external CPAN modules (ie, with just the baseline Perl 5.005 installation), I'd be happy to know about it.

$' is a special variable (see perldoc perlvar). 5.005 was many versions ago, so it's possible that something has changed in the regexp engine to make this variable different (although it appears to be in 5.005 also)
As for the better way, you could at least only run the 'ps -ef' in a pipeline and do the 'grep' in perl.

Use the following!!!
use strict;
use warnings;
You would have gotten
Use of uninitialized value $' in concatenation (.) or string

A sigil followed by any punctuation symbol (on a standard keyboard) is a variable in Perl, regardless of if it is defined or not. So in a double quoted string [$#][symbol] will always be read as one token and interpolated unless the sigil is escaped.
I have a feeling that the difference you are seeing has to do with different system shells rather than different versions of perl.
Consider your line:
open (CMD, "do something | grep -E 'sometext$' |");
When perl sees that, it will interpolate the empty $' variable into the double quoted string, so the string becomes:
open (CMD, "do something | grep -E 'sometext |");
At that point, your shell gets to process a line that looks like:
do something | grep -E 'sometext
And if it succeeds or fails will depend on the shell's rules regarding unterminated strings (some shells will complain loudly, others will automatically terminate the string at eof).
If you were using the warnings pragma in your script, you probably would have gotten a warning about interpolating an undefined variable.
A shorter and cleaner way to read in the output of ps would be:
my #lines = grep /sometext\$/, `ps -ef`;
Or using an explicit open:
my #lines = grep /sometext\$/, do {
open my $fh, '|-', 'ps -ef' or die $!;
<$fh>
};

Because $' is a special variable in recent versions of Perl.
From the official documentation (perlvar):
$':
The string following whatever was matched by the last successful
pattern match (not counting any matches hidden within a BLOCK or
eval() enclosed by the current BLOCK).
If there were no successful pattern matches, $' is empty and your statement essentially interpolates to
open (CMD, "do something | grep -E 'sometext |");
Escaping the dollar sign (the solution that works for you on Linux) should work on HPUX too.
I'm not sure when was this variable added, but I can confirm that it exists in Perl 5.8.5. What's New for Perl 5.005 mentions $' (not as a new feature), so I think it was already there before then.

You should probably use single-quotes rather than double quotes around the string since there is nothing in the string that should be interpolated:
open (CMD, q{do something | grep -E 'sometext$' |});
Better would be to use the 3-argument form of open with a lexical file handle:
open my $cmd, '-|', q{do something | grep -E 'sometext$'} or die 'a horrible death';
I don't have a good explanation for why $' is being recognized as the special variable in 5.10 and yet it was not in 5.005. That is unexpected.
Is there a really good reason you can't upgrade to something like 5.14.1? Even if you don't change the system-provided Perl, there's no obvious reason you can't install a recent version in some other location and use that for all your scripting work.

$' is a special variable.
If you want to avoid variable expansion, just use q():
open (CMD, q(do something | grep -E 'sometext$' |));

How to find path of find2perl script on Unix using bash or perl

We (the company I work for) need to run the find2perl script on over a thousand different Unix servers of different flavors (Linux, Solaris, HP-UX, AIX) and different versions.
The one thing that all the servers have in common, is that they all have at least one implementation of perl installed. However, not all systems have it configured the same way.
Finding the location of perl is easy enough using the which command. However, on 70% of the servers, the actual directory containing find2perl (the bin folder of perl) is not present in the $PATH variable and can't be located that way.
On some servers, perl is actually a symbolic link pointing another location, in which case I can use ls -l and sed to extract the target of the link to find where perl is actually installed.
On other servers however, it's more complicated, as it seems perl was compiled to a custom location and the binary of perl present in /bin or /usr/bin (or wherever perl is found) is not a symbolic link, but rather a full blown executable. In this case, I thought about using the #INC variable of perl to try to find find2perl but it seems rather excessive.
What would be the better/best/fullproof method (one-liner if possible) to always get the location of find2perl on a Unix system?

Ways to locate find2perl
Two ways, both of which rely on asking the perl install how it was configured:
Config.pm
Its probably scriptdirexp from Config.pm.
$ perl -MConfig -E 'say $Config{scriptdirexp}'
/usr/bin
And indeed, that's where find2perl is on my system. You can use Config; in your perl scripts, which is its major advantage over the next method.
perl -V:varname
As per Yanick Girouard's comment, you can also use perl -V:scriptdirexp to get this, in a format suitable to passing to eval in a shell script. There are actually several formats available (so, you don't need to use e.g., cut to parse it):
OPTION OUTPUT (\n = actual newline) NOTES
-V:scriptdirexp scriptdirexp='/usr/bin';\n full shell syntax, even if multiple -V options
-V:scriptdirexp: scriptdirexp='/usr/bin' trailing colon omits semicolon and newline
-V::scriptdirexp '/usr/bin'; \n extra leading colon omits var= part
-V::scriptdirexp: '/usr/bin' you can combine them.
Full documentation is in the perlrun manpage.
Ways to embed find2perl
If you decide to copy over find2perl, as per evil otto's comment, you can actually do that by embedding it in your shell script. There are many ways. If neither of the two below work, then you can certainly use shar (which has an extremely long history, and is likely compatible with everything).
Quoted here-document
The easiest way is if your shell supports quoted here-documents. They all should, as its a POSIX requirement:
#!/bin/sh
perl - -name 'foo' -mtime 2 -print <<'FIND2PERL'
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$#"}'
if $running_under_some_shell;
⋮
FIND2PERL
Hex dump in a non-quoted here-document
If some of your shells don't implement quoted here-documents (POSIX‽ what's that!), then you have to protect find2perl from shell expansion. An easy way is to hex dump it, as 0–9 and a–f are all safe from shell expansion. The dump is easily done with xxd -p /usr/bin/find2perl, which only requires xxd on one machine. To read back the dump, you can use plain perl:
#!/bin/sh
perl -n -e 'chomp; print pack("H*", $_)' <<HEX | perl - -name 'foo'
23212f7573722f62696e2f7065726c0a202020206576616c202765786563
202f7573722f62696e2f7065726c202d5320243020247b312b222440227d
⋮
HEX
Using find2perl several times
Naturally, with either approach, you could also write find2perl to a temporary file (if you need to invoke it multiple times, for example). You could also embed it in a shell function.

perl -lwe '$_ = $^X; s/perl$/find2perl/; -f or die qq($_ not -f); print'
Copy the interpreter executable path into dollar default argument. Patch the value, assuming that find2perl is in the same directory as perl itself. (This is specified as UNIX only, so you don't have to cater for perl.exe, which would be easy enough to deal with.) Then test the file exists, and die if it doesn't. (You might invent some better error handling.) Then print the path if we're still alive. That's it.
Okay, here's a version that works for Windows, too:
perl -lwe "$_ = $^X; s/perl(\.exe)?$/find2perl/;
-f or -f qq($_.bat) or die qq($_ not -f); print"
Note the double quotes, de rigueur on Windows for cmd.exe. And it has to go on one line, I just wrapped it for readability.

Could someone tell me what this means in Perl

I'm new to Perl and was hoping someone could tell me what this means exactly
eval 'exec ${PERLHOME}/bin/perl -S $0 ${1+"$#"}' # -*- perl -*-
if 0;

This is explained in perldoc perlrun:
-S
makes Perl use the PATH environment variable to search for the program
unless the name of the program contains path separators.
...
Typically this is used to emulate #! startup on platforms that don't
support #! . It's also convenient when debugging a script that uses
#! , and is thus normally found by the shell's $PATH search
mechanism.
This example works on many platforms that have a shell compatible with
Bourne shell:
#!/usr/bin/perl
eval 'exec /usr/bin/perl -wS $0 ${1+"$#"}'
if $running_under_some_shell;
The system ignores the first line and feeds the program to /bin/sh,
which proceeds to try to execute the Perl program as a shell script.
The shell executes the second line as a normal shell command, and thus
starts up the Perl interpreter. On some systems $0 doesn't always
contain the full pathname, so the -S tells Perl to search for the
program if necessary. After Perl locates the program, it parses the
lines and ignores them because the variable
$running_under_some_shell is never true. If the program will be
interpreted by csh, you will need to replace ${1+"$#"} with $* ,
even though that doesn't understand embedded spaces (and such) in the
argument list. To start up sh rather than csh, some systems may
have to replace the #! line with a line containing just a colon,
which will be politely ignored by Perl.
In short, it mimics shebang behavior for platforms that have shells compatible with Bash.

It's valid both as shell script and as a Perl program. It is used to run the Perl interpreter after all on systems where the shebang doesn't work, for some reason. It's rarely seen these days but used to be common in the early 1990s.
The comment is just a comment, but it has special meaning in Emacs, which will open the file in perl mode.

I just read #Zaid's response, which is better and more correct than mine as long as this code is on the first line of the script being executed, and no shebang exists. I've never seen this kind of substitute. Quite interesting, really.
The second line, if 0; is a part of the first line. You can tell since the first line lacks a ;. It would be more obvious if this was one long single line with the comment being after the semicolon.
So it's equivalent to:
if(0) {
eval 'exec ${PERLHOME}/bin/perl -S $0 ${1+"$#"}
}
In perl, 0 will be evaluated to false, and so the eval-clause will never execute. Presumably this condition(the if) was a quick way to disable the line. Perhaps the evaluation was once something real instead of an always-false.
See perl --help, perldoc -f eval and perldoc -f exec for information on the evaluation block itself.
The remaining trickyness (${1+"$#"}) I have no idea about. This isn't perl anyway; it's interpreted by whichever shell exec is launching (Correct me if I'm wrong on this!). If it's bash, I don't think it does anything at all and can be substituted with $#, which is the environment variable holding all commandline arguments (ie #ARGV in perl).

How do I use Perl on the command line to search the output of other programs?

As I understand (Perl is new to me) Perl can be used to script against a Unix command line. What I want to do is run (hardcoded) command line calls, and search the output of these calls for RegEx matches. Is there a way to do this simply in Perl? How?
EDIT: Sequence here is:
-Call another program.
-Run a regex against its output.

my $command = "ls -l /";
my #output = `$command`;
for (#output) {
print if /^d/;
}
The qx// quasi-quoting operator (for which backticks are a shortcut) is stolen from shell syntax: run the string as a command in a new shell, and return its output (as a string or a list, depending on context). See perlop for details.
You can also open a pipe:
open my $pipe, "$command |";
while (<$pipe>) {
# do stuff
}
close $pipe;
This allows you to (a) avoid gathering the entire command's output into memory at once, and (b) gives you finer control over running the command. For example, you can avoid having the command be parsed by the shell:
open my $pipe, '-|', #command, '< single argument not mangled by shell >';
See perlipc for more details on that.

You might be able to get away without Perl, as others have mentioned. However, if there is some Perl feature you need, such as extended regex features or additional text manipulation, you can pipe your output to perl then do what you need. Perl's -e switch let's you specify the Perl program on the command line:
command | perl -ne 'print if /.../'
There are several other switches you can pass to perl to make it very powerful on the command line. These are documented in perlrun. Also check out some of the articles in Randal Schwartz's Unix Review column, especially his first article for them. You can also google for Perl one liners to find lots of examples.

Do you need Perl at all? How about
command -I use | grep "myregexp" && dosomething
right in the shell?

#!/usr/bin/perl
sub my_action() {
print "Implement some action here\n";
}
open PROG, "/path/to/your/command|" or die $!;
while (<PROG>) {
/your_regexp_here/ and my_action();
print $_;
}
close PROG;
This will scan output from your command, match regexps and do some action (which now is printing the line)

In Perl you can use backticks to execute commands on the shell. Here is a document on using backticks. I'm not sure about how to capture the output, but I'm sure there's more than a way to do it.

You indeed use a one-liner in a case like this. I recently coded up one that I use, among other ways, to produce output which lists the directory structure present in a .zip archive (one dir entry per line). So using that output as an example of command output that we'd like to filter, we could put a pipe in and then use perl with the -n -e flags to filter the incoming data (and/or do other things with it):
[command_producing_text_output] | perl -MFile::Path -n -e \
"BEGIN{#PTM=()} if (m{^perl/(bin|lib(?!/site))}) {chomp;push #PTM,$_}" ^
-e "END{#WDD=mkpath (\#PTM,1);" ^
-e "printf qq/Created %u dirs to reflect part of structure present in the .ZIP file\n/, scalar(#WDD);}"
the shell syntax used, including: quoting of perl code and escaping of newlines, reflects CMD.exe usage in Windows NT-like consoles. If you need to, mentally replace
"^" with "\" and " with ' in the appropriate places.
The one-liner above adds only the directory names that start with "perl/bin" or
"perl/lib (not followed by "/site"); it then creates those directories. You wind
up with a (empty) tree that you can use for whatever evil purposes you desire.
The main point is to illustrate that there are flags available (-n, -p) to
allow perl to loop over each input record (line), and that what you can do is unlimited in terms of complexity.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse