sed "unterminated `s' command" error when running from a script - perl

I have a temp file with contents:
a
b
c
d
e
When I run sed 's#b#batman\nRobin#' temp from command line, I get:
a
batman
Robin
c
d
e
However, when I run the command from a Perl script:
#!/usr/bin/perl
use strict;
use warnings;
`sed 's#b#batman\nRobin#' temp`
It produces error:
sed: -e expression #1, char 10: unterminated `s' command
What am I doing wrong?

Why run another tool like sed once you are inside a Perl program? If anything, now you have far more tools and power so just do it with Perl.
One simple way to do your sed thing
use warnings;
use strict;
die "Usage: $0 file(s)\n" if not @ARGV;
while (<>) {
s/b/batman\nRobin/;
print;
}
Run this program by supplying the file (temp) to it on the command line. The die line is there merely to enforce such usage; it is not essential to the script's operation.
This program is then a simple filter:
The <> operator reads, line by line, all files given on the command line
Each line is assigned to the $_ variable, the default for many things in Perl
The s/// operator binds to $_ by default, which gets changed (if the pattern matches)
print prints the $_ variable by default
Nearly anything can be used as the regex delimiters; see the m// and s/// operators
This can also be done as
while (<>) {
print s/b/batman\nRobin/r
}
With /r modifier s/// returns the changed string (or the original if pattern didn't match)
Finally that's also just
print s/b/batman\nRobin/r while <>;
but I'd expect that with a script you really want to do more, and then this probably isn't it.
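To check what /r does, here is a minimal sketch. It assumes perl 5.14 or later (where /r is available), and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $original = "b\n";
my $changed  = $original =~ s/b/batman\nRobin/r;  # copy with the substitution applied
my $no_match = "x\n" =~ s/b/batman\nRobin/r;      # pattern fails: the original text comes back

print $changed;    # the substituted copy
print $original;   # the original, left untouched by /r
```

Since /r leaves its operand alone, it can even be applied to a string literal, as in the $no_match line.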
On the other side of things you could write it more properly
use warnings;
use strict;
use feature qw(say);
die "Usage: $0 file(s)\n" if not @ARGV;
while (my $line = <>) {
chomp $line;
$line =~ s/b/batman\nRobin/;
say $line;
}
With a line in a lexical variable nicely chomp-ed this is ready for more work.
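As for the error itself, a likely explanation is that backticks interpolate like a double-quoted string, so Perl turns \n into a real newline before the shell and sed ever see it. sed then finds the s command cut off after "s#b#batman", which is exactly 10 characters. The sketch below only builds the command strings to show the difference; it doesn't run sed, and the variable names are made up.

```perl
use strict;
use warnings;

# Backticks interpolate like double quotes, so build the command the same
# way to inspect what the shell would receive.
my $interpolated = "sed 's#b#batman\nRobin#' temp";    # \n is now a real newline
my $escaped      = "sed 's#b#batman\\nRobin#' temp";   # sed would receive backslash + n

# A real newline inside the sed script ends the s command early.
print "interpolated has newline: ", ($interpolated =~ /\n/ ? "yes" : "no"), "\n";
print "escaped has newline:      ", ($escaped      =~ /\n/ ? "yes" : "no"), "\n";
```

So if you do want to keep calling sed, doubling the backslash inside the backticks should get you the command-line behaviour back.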

Related

How to run a perl file in a terminal pipeline

Probably a very simple question:
I have a perl file sed.perl that takes as an input a string, makes some substitutions there and prints it on the standard output.
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
use feature 'say';
#use Cwd;
my ($text) = @ARGV;
$text =~ s/\.\)\n/'\.'\)\n/;
print $text;
I want to feed the script with a string output from a terminal pipeline. Let's say in this way:
cat input.txt | perl sed.perl
but this doesn't work: Use of uninitialized value $text in substitution (s///) at
Using a dash doesn't work either:
cat input.txt | perl sed.perl -
@ARGV doesn't do what you think it does. It's literally the arguments passed to perl.
E.g. :
myscript.pl some arg
@ARGV will hold 'some', 'arg'.
What you want is the STDIN file handle.
e.g.
#!/usr/bin/perl
use strict;
use warnings;
while ( <STDIN> ) {
s/something/somethingelse/g;
print;
}
Now what this is doing is reading STDIN line by line. Your pattern includes \n. Do you actually need it? It looks like you're 'just' using it as a line anchor, and so you could use:
s/\.\)$/'\.'\)/g;
$ is the regex for "end of line" - see perlre for more.
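A quick way to convince yourself that $ still matches when the line has its trailing newline attached (as lines read from STDIN do) is the sketch below. The sample line is made up.

```perl
use strict;
use warnings;

my $line = "foo(bar.)\n";                # a line as read from STDIN, newline still attached
(my $fixed = $line) =~ s/\.\)$/'.')/;    # $ matches just before the final newline

print $fixed;
```

The newline survives the substitution, so there is no need to match it and put it back by hand.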
However, as noted in the comments by reinierpost - there's another thing that's probably useful to know - perl has the "diamond operator" <> which does two things:
If filenames are specified to the script, opens them and reads from them.
If no arguments specified, reads STDIN.
So you could do:
while ( <> ) {
s/something/somethingelse/g;
print;
}
And then your script can either be invoked by:
cat input.txt | ./yourscript.pl
Or:
./yourscript.pl input.txt
And you'll have the same result.
You're trying to read the text as if it were an argument, when in fact with a pipeline the information from input.txt will be something you read from stdin. With a pipeline stdout from the left process is connected to stdin of the right process.
As others have mentioned, @ARGV contains arguments to your script, and you'll want to use STDIN instead.
The shortest solution seems to be:
#!/usr/bin/env perl
use strict;
while(<>) {
print s/\.\)$/'.')/r;
}
Note that you could also achieve the results of your example using a one-liner as follows:
cat input.txt | perl -pe 's/your/substitution/'

How can I slurp STDIN in Perl?

I am piping the output of several scripts. One of these scripts outputs an entire HTML page that gets processed by my perl script. I want to be able to pull the whole 58K of text into the perl script (which will contain newlines, of course).
I thought this might work:
open(my $TTY, '<', '/dev/tty');
my $html_string= do { local( @ARGV, $/ ) = $TTY ; <> } ;
But it just isn't doing what I need. Any suggestions?
my @lines = <STDIN>;
or
my $str = do { local $/; <STDIN> };
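To see the difference between the two forms without piping anything in, here is a small sketch that uses an in-memory filehandle as a stand-in for STDIN. That substitution is my assumption, made only so the example is self-contained.

```perl
use strict;
use warnings;

my $data = "line one\nline two\n";

# Hypothetical stand-in for STDIN: an in-memory filehandle over $data.
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
my @lines = <$in>;                       # list context: one element per line
close $in;

open $in, '<', \$data or die "Cannot open in-memory handle: $!";
my $str = do { local $/; <$in> };        # $/ undefined inside the block: one read gets everything
close $in;

printf "%d lines, %d characters\n", scalar @lines, length $str;
```

Because $/ is localized inside the do block, normal line-by-line reading is restored as soon as the block ends.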
I can't let this opportunity to say how much I love IO::All pass without saying:
♥ ♥ __ "I really like IO::All ... a lot" __ ♥ ♥
Variation on the POD SYNOPSIS:
use IO::All;
my $contents < io('-') ;
print "\n printing your IO: \n $contents \n with IO::All goodness ..." ;
Warning: IO::All may begin replacing everything else you know about IO in perl with its own insidious goodness.
tl;dr: see at the bottom of the post. Explanation first.
practical example
I’ve just wondered about the same, but I wanted something suitable for a shell one-liner. Turns out this is (Korn shell, whole example, dissected below):
print -nr -- "$x" | perl -C7 -0777 -Mutf8 -MEncode -e "print encode('MIME-Q', 'Subject: ' . <>);"; print
Dissecting:
print -nr -- "$x" echoes the whole of $x without any trailing newline (-n) or backslash-escape processing (-r), POSIX equivalent: printf '%s' "$x"
-C7 sets stdin, stdout, and stderr into UTF-8 mode (you may or may not need it)
-0777 sets $/ so that Perl will slurp the entire file; reference: man perlrun(1)
-Mutf8 -MEncode loads two modules
the remainder is the Perl command itself: print encode('MIME-Q', 'Subject: ' . <>);, let’s look at it from inner to outer, right to left:
<> takes the entire stdin content
which is concatenated with the string "Subject: "
and passed to Encode::encode asking it to convert that to MIME Quoted-Printable
the result of which is printed on stdout (without any trailing newline)
this is followed by ; print, again in Korn shell, which is the same as ; echo in POSIX shell – just echoïng a newline.
tl;dr
Call perl with the -0777 option. Then, inside the script, <> will contain the entire stdin.
complete self-contained example
#!/usr/bin/perl -0777
my $x = <>;
print "Look ma, I got this: '$x'\n";
To get it into a single string you want:
#!/usr/bin/perl -w
use strict;
my $html_string;
while(<>){
$html_string .= $_;
}
print $html_string;
I've always used a bare block.
my $x;
{
local $/; # Set slurp mode, for this block only
$x = <>; # Read in everything up to EOF
}
# $x should now contain all of STDIN

perl search & replace script for all files in a directory

I have a directory with nearly 1,200 files. I need to go through each file in turn in a perl script to search and replace any occurrences of 66 strings. So, for each file I need to run all 66 s&r's. My replace string is in Thai, so I cannot use the shell. It must be a .pl file or similar so that I can use the utf8 pragma (use utf8). I am just not familiar with how to open all files in a directory one by one to perform actions on them. Here is a sample of my s&r:
s/psa0*(\d+)/เพลงสดุดี\1/g;
Thanks for any help.
use utf8;
use strict;
use warnings;
use File::Glob qw( bsd_glob );
@ARGV = map bsd_glob($_), @ARGV;
while (<>) {
s/psa0*(?=\d)/เพลงสดุดี/g;
print;
}
perl -i.bak script.pl *
I used File::Glob's bsd_glob since glob won't handle spaces "correctly". They are actually the same function, but the function behaves differently based on how it's called.
By the way, using \1 in the replacement expression (i.e. outside a regular expression) makes no sense. \1 is a regex pattern that means "match what the first capture captured". So
s/psa0*(\d+)/เพลงสดุดี\1/g;
should be
s/psa0*(\d+)/เพลงสดุดี$1/g;
The following is a faster alternative:
s/psa0*(?=\d)/เพลงสดุดี/g;
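If you want to convince yourself that the $1 form and the lookahead form give the same result, here is a small sketch. The sample input is made up, and the script is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                                  # the source contains Thai literals
binmode STDOUT, ':encoding(UTF-8)';

my $input = "psa023 psa5";                 # hypothetical sample text

# Capture the digits and put them back with $1 ...
(my $with_capture = $input) =~ s/psa0*(\d+)/เพลงสดุดี$1/g;

# ... or only assert that a digit follows, so nothing needs re-inserting.
(my $with_lookahead = $input) =~ s/psa0*(?=\d)/เพลงสดุดี/g;

print "$with_capture\n$with_lookahead\n";
```

Both lines of output should be identical. The lookahead version does not consume the digits, so it has no capture to copy back.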
See opendir/readdir/closedir for functions that can iterate through all the filenames in a directory (much like you would use open/readline/close to iterate through all the lines in a file).
Also see the glob function, which returns a list of filenames that match some pattern.
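A rough sketch of the opendir/readdir/closedir approach follows. It creates a scratch directory with two dummy files so it can run anywhere; the directory and file names are hypothetical, and in real use $dir would be your directory of 1,200 files.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical setup: a scratch directory holding two small files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.html b.html)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $dir/$name: $!";
    print {$fh} "psa001\n";
    close $fh;
}

# Iterate over the directory entries, keeping plain files only
# (this also drops the "." and ".." entries).
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @files = sort grep { -f "$dir/$_" } readdir $dh;
closedir $dh;

print "$_\n" for @files;
```

Note that readdir returns bare names, so the directory has to be prepended before opening or testing each file.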
Just in case someone could use it in the future. This is what I actually did.
use warnings;
use strict;
use utf8;
my @files = glob ("*.html");
foreach $a (@files) {
open IN, "$a" or die $!;
open OUT, ">$a-" or die $!;
binmode(IN, ":utf8");
binmode(OUT, ":utf8");
select (OUT);
foreach (<IN>) {
s/gen0*(\d+)/ปฐมกาล $1/;
s/exo0*(\d+)/อพยพ $1/;
s/lev0*(\d+)/เลวีนิติ $1/;
s/num0*(\d+)/กันดารวิถี $1/;
...etc...
print "$_";
}
close IN;
close OUT;
};

Perl substitute with regex

When I run this command over a Perl one liner, it picks up the regular expression -
so that can't be bad.
more tagcommands | perl -nle 'print /(\d{8}_\d{9})/' | sort
12012011_000005769
12012011_000005772
12162011_000005792
12162011_000005792
But when I run this script over the command invocation below, it does not pick up the
regex.
#!/usr/bin/perl
use strict;
my $switch="12012011_000005777";
open (FILE, "more /home/shortcasper/work/tagcommands|");
my @array_old = (<FILE>) ;
my @array_new = @array_old ;
foreach my $line(@array_new) {
$line =~ s/\d{8}_\d{9}/$switch/g;
print $line;
sleep 1;
}
This is the data that I am feeding into the script
/CASPERBOT/START URL=simplefile:///data/tag/squirrels/squirrels /12012011_000005777N.dart.gz CASPER=SeqRashMessage
/CASPERBOT/ADDSERVER simplefile:///data/tag/squirrels/12012011_0000057770.dart.trans.gz
/CASPERRIP/newApp multistitch CASPER_BIN
/CASPER_BIN/START URLS=simplefile:///data/tag/squirrels /12012011_000005777R.rash.gz?exitOnEOF=false;binaryfile:///data/tag/squirrels/12162011_000005792D.binaryBlob.gz?exitOnEOF=false;simplefile:///data/tag/squirrels/12012011_000005777E.bean.trans.gz?exitOnEOF=false EXTRACTORS=rash;island;rash BINARY=T
You should study your one-liner to see how it works. First check perl -h to learn about the switches used:
-l[octal] enable line ending processing, specifies line terminator
-n assume "while (<>) { ... }" loop around program
The first one is not exactly self-explanatory, but what -l actually does is chomp each line, and then change $\ and $/ to newline. So, your one-liner:
perl -nle 'print /(\d{8}_\d{9})/'
Actually does this:
$\ = "\n";
while (<>) {
chomp;
print /(\d{8}_\d{9})/;
}
A very easy way to see this is to use the B::Deparse module:
$ perl -MO=Deparse -nle 'print /(\d{8}_\d{9})/'
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
print /(\d{8}_\d{9})/;
}
-e syntax OK
So, that's how you transform that into a working script.
I have no idea how you went from that to this:
use strict;
my $switch="12012011_000005777";
open (FILE, "more /home/shortcasper/work/tagcommands|");
my @array_old = (<FILE>) ;
my @array_new = @array_old ;
foreach my $line(@array_new) {
$line =~ s/\d{8}_\d{9}/$switch/g;
print $line;
sleep 1;
}
First of all, why are you opening a pipe from the more command to read a text file? That is like calling a tow truck to fetch you a cab. Just open the file. Or better yet, don't. Just use the diamond operator, like you did the first time.
You don't need to first copy the lines of a file to an array, and then use the array. while(<FILE>) is a simple way to do it.
In your one-liner, you print the regex. Well, you print the return value of the regex. In this script, you print $line. I'm not sure how you thought that would do the same thing.
Your regex here will replace every matching set of numbers with the one in your script. Nothing else.
You should also be aware that sleep 1 may not do what you think. Try this one-liner, for example:
perl -we 'for (1 .. 10) { print "line $_\n"; sleep 1; }'
Depending on where the output goes, you may notice that it simply waits 10 seconds and then prints everything at once; this typically happens when standard output is piped or redirected rather than attached to a terminal. That's because perl by default prints to the standard output buffer, and that buffer is not written out until it is full or flushed (at the latest when the perl execution ends). So, it's a perception problem. Everything works like it should, you just don't see it.
If you absolutely want to have a sleep statement in your script, you'll probably want to autoflush, e.g. STDOUT->autoflush(1);
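A minimal sketch of that, with a made-up three-line loop. IO::Handle is loaded explicitly because older perls need it for the autoflush method.

```perl
use strict;
use warnings;
use IO::Handle;          # provides the autoflush method on older perls

STDOUT->autoflush(1);    # flush after every print instead of when the buffer fills
for my $n (1 .. 3) {
    print "line $n\n";
    sleep 1;
}
```

Setting $| = 1 while STDOUT is the selected handle should have the same effect.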
However, why are you doing that? Is it so you will have time to read the numbers? If so, put that more statement at the end of your one-liner instead:
perl ...... | more
That will pipe the output into the more command, so you can read it at your own pace. Now, for your one-liner:
Always also use -w, unless you specifically want to avoid getting warnings (which basically you never should).
Your one-liner will only print the first match. If you want to print all the matches on a new line:
perl -wnle 'print for /(\d{8}_\d{9})/g'
If you want to print all the matches, but keep the ones from the same line on the same line:
perl -wnle 'print "@a" if @a = /(\d{8}_\d{9})/g'
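The difference between matching with and without /g in list context can be sketched in a script like this. The sample line is a shortened, made-up version of the data in the question.

```perl
use strict;
use warnings;

# Hypothetical line containing two of the 8-digit_9-digit identifiers.
my $line = "x/12012011_000005777R.rash.gz;y/12162011_000005792D.binaryBlob.gz";

my ($first) = $line =~ /(\d{8}_\d{9})/;    # without /g: captures of the first match only
my @all     = $line =~ /(\d{8}_\d{9})/g;   # with /g: every match on the line

print "first: $first\n";
print "all:   @all\n";
```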
Well, that should cover it.
Your open call may be failing (you should always check the result of an open to make sure it succeeded if the rest of the program depends on it) but I believe your problem is in complicating things by opening a pipe from a more command instead of simply opening the file itself. Change the open to simply
open FILE, "/home/shortcasper/work/tagcommands" or die $!;
and things should improve.

How can I use 's///' if my string contains a '/'?

I'm using Perl 5.10.6 on Mac 10.6.6. I want to execute a simple search and replace against a file so I tried:
my $searchAndReplaceCmd = "perl -pi -e 's/\\Q${localTestDir}\\E//g' ${testSuiteFile}";
system( $searchAndReplaceCmd );
but the problem above is the variable $localTestDir contains directory separators ("/"), and this screws up the regular expression ...
Bareword found where operator expected at -e line 1, near
"s/\Q/home/selenium"
Backslash found where operator expected at -e
line 1, near "Live\" syntax error at -e line 1, near
"s/\Q/home/selenium"
Search pattern not terminated at -e line 1.
How do I do a search and replace on a file when the variable in question contains regular expression characters? Thanks.
It seems that $localTestDir begins with a /.
Remedy by changing the regex delimiter to something other than /:
my $searchAndReplaceCmd = "perl -pi -e 's!\\Q${localTestDir}\\E!!g' ${testSuiteFile}";
From perldoc perlrequick :
$x = "A 39% hit rate";
$x =~ s!(\d+)%!$1/100!e; # $x contains "A 0.39 hit rate"
The last example shows that s/// can use other delimiters, such as
s!!! and s{}{}, and even s{}//. If single quotes are used s''', then
the regex and replacement are treated as single-quoted strings.
Question is why you do a search and replace from within perl, through the shell, within perl. Seems like a roundabout way of doing things, and you'll run into problems with shell interpolation.
The \Q ... \E should override the special characters in your string, so / "should" not be an issue. From perlre:
\Q quote (disable) pattern metacharacters till \E
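Here is a small sketch of what \Q...\E changes when the substitution runs inside Perl itself. The directory and the sample line are hypothetical values.

```perl
use strict;
use warnings;

# Hypothetical values standing in for $localTestDir and one line of the file.
my $localTestDir = "/home/selenium/test.dir";
my $line         = "/home/selenium/test.dir/suite.html /home/selenium/testXdir/other\n";

(my $quoted   = $line) =~ s/\Q$localTestDir\E//g;   # "." is matched literally
(my $unquoted = $line) =~ s/$localTestDir//g;       # "." matches any character

print $quoted;
print $unquoted;
```

In both cases the slashes inside the variable cause no trouble, because Perl finds the s/// delimiters before it interpolates $localTestDir. The trouble in the question comes from the value being pasted into the program text handed to a second perl.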
Here's an alternative (untested), all perl solution. If you want to be extra certain, exchange the / delimiter to something else, such as s### (you can use any character as a delimiter).
use strict;
use warnings;
use File::Copy;
open my $fh, '<', $testSuiteFile or die $!;
open my $out, '>', $testSuiteFile . ".bak" or die $!;
while (<$fh>) {
s/\Q${localTestDir}\E//g;
print $out $_;
}
move($testSuiteFile . ".bak", $testSuiteFile) or die $!;
Or use Tie::File
use strict;
use warnings;
use Tie::File;
tie my @file, 'Tie::File', $testSuiteFile or die $!;
for (@file) {
s/\Q${localTestDir}\E//g;
}
untie @file;
Changing the delimiters is useful, but more generally you can put a backslash in front of any regular expression character to make it non-special.
So s/\/abc/\/xyz/ will work, although it is not very readable.
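For comparison, a quick sketch showing that escaping the slashes and switching delimiters give the same result. The path is a made-up example.

```perl
use strict;
use warnings;

my $path = "/abc/file.txt";                # hypothetical path

(my $escaped = $path) =~ s/\/abc/\/xyz/;   # default delimiter, slashes escaped
(my $braces  = $path) =~ s{/abc}{/xyz};    # brace delimiters, no escaping needed
(my $bangs   = $path) =~ s!/abc!/xyz!;     # "!" as the delimiter

print "$escaped\n$braces\n$bangs\n";
```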
The problem is that the substitution of $localTestDir is happening too soon.
Here is an approach that lets you use / for your re-delimiters:
system('perl', '-pi',
'-e', 'BEGIN { $dir = shift(@ARGV) };',
'-e', 's/\\Q$dir\\E//g',
$localTestDir,
$testSuiteFile
);
Note that this also protects the contents of $testSuiteFile from being interpreted by the shell.