Probably a very simple question:
I have a Perl script, sed.perl, that takes a string as input, makes some substitutions in it and prints it to standard output.
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
use feature 'say';
#use Cwd;
my ($text) = @ARGV;
$text =~ s/\.\)\n/'\.'\)\n/;
print $text;
I want to feed the script a string output from a terminal pipeline, say like this:
cat input.txt | perl sed.perl
but this doesn't work: Use of uninitialized value $text in substitution (s///) at
Using a dash as the argument doesn't work either:
cat input.txt | perl sed.perl -
@ARGV doesn't do what you think it does. It's literally the list of arguments passed to your script.
E.g.:
myscript.pl some arg
@ARGV will hold 'some' and 'arg'.
What you want is the STDIN file handle.
e.g.
#!/usr/bin/perl
use strict;
use warnings;
while ( <STDIN> ) {
s/something/somethingelse/g;
print;
}
This reads STDIN line by line. Your pattern includes \n; do you actually need it? It looks like you're just using it as a line anchor, so you could use:
s/\.\)$/'\.'\)/g;
$ is the regex anchor for "end of line": it matches at the end of the string or just before a trailing newline (see perlre for more).
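A minimal sketch of the line-by-line approach applied to the asker's substitution. The helper name fix_lines is hypothetical, and an in-memory filehandle stands in for STDIN so the example runs without a pipe.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the asker's substitution to every line read
# from $fh. Because each line is handled separately, $ anchors just before
# that line's newline, so the pattern no longer needs a literal \n.
sub fix_lines {
    my ($fh) = @_;
    my $out = '';
    while (my $line = <$fh>) {
        $line =~ s/\.\)$/'.')/;
        $out .= $line;
    }
    return $out;
}

# An in-memory filehandle stands in for STDIN here.
open my $in, '<', \"foo(bar.)\nbaz\n" or die "Cannot open in-memory handle: $!";
print fix_lines($in);
```

In the real script you would pass \*STDIN (or loop over <> directly) instead of the in-memory handle.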
However, as reinierpost noted in the comments, there's another thing that's probably useful to know: Perl has the "diamond operator" <>, which does two things:
If file names are passed to the script, it opens them and reads from them.
If no arguments are specified, it reads STDIN.
So you could do:
while ( <> ) {
s/something/somethingelse/g;
print;
}
And then your script can either be invoked by:
cat input.txt | ./yourscript.pl
Or:
./yourscript.pl input.txt
And you'll have the same result.
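To see the "file names as arguments" half of that behaviour without a shell, here is a small sketch that sets @ARGV itself. The helper slurp_via_diamond and the temporary files are illustrative assumptions, not part of the original answer.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: collect every line that <> yields for the given
# file names. Localizing @ARGV makes <> open these files in turn, just as
# if they had been passed on the command line.
sub slurp_via_diamond {
    local @ARGV = @_;
    my @lines;
    while (my $line = <>) {
        push @lines, $line;
    }
    return @lines;
}

# Two throwaway files; UNLINK => 1 deletes them when the program exits.
my @names;
for my $text ("one\ntwo\n", "three\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "Cannot close $name: $!";
    push @names, $name;
}

print slurp_via_diamond(@names);   # the lines of both files, in order
```

If @ARGV were empty, the same loop would read STDIN instead, which is what makes the piped invocation work too.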
You're trying to read the text as if it were an argument, when in fact with a pipeline the information from input.txt will be something you read from stdin. With a pipeline stdout from the left process is connected to stdin of the right process.
As others have mentioned, @ARGV contains arguments to your script, and you'll want to use STDIN instead.
The shortest solution seems to be:
#!/usr/bin/env perl
use strict;
while(<>) {
print s/\.\)$/'.')/r;
}
Note that you could also achieve the results of your example using a one-liner as follows:
cat input.txt | perl -pe 's/your/substitution/'
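Per perlrun, -p wraps the -e code in a while (<>) loop whose continue block prints $_. A rough sketch of that wrapping follows; to keep it runnable without a pipe, it reads a filehandle argument instead of <> and collects the output in a string (the name p_loop is hypothetical).

```perl
use strict;
use warnings;

# Roughly what `perl -pe 's/your/substitution/'` runs: the code sits
# inside a while loop, and the continue block prints $_ after every line.
sub p_loop {
    my ($fh) = @_;
    my $out = '';
    local $_;
    while (<$fh>) {
        s/your/substitution/;      # the -e code
    }
    continue {
        $out .= $_;                # stands in for -p's print
    }
    return $out;
}

open my $in, '<', \"your text\nother\n" or die "Cannot open in-memory handle: $!";
print p_loop($in);
```

Because the print lives in the continue block, a next inside the -e code still prints the line.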
Related
I have a temp file with contents:
a
b
c
d
e
When I run sed 's#b#batman\nRobin#' temp from command line, I get:
a
batman
Robin
c
d
e
However, when I run the command from a Perl script:
#!/usr/bin/perl
use strict;
use warnings;
`sed 's#b#batman\nRobin#' temp`
It produces error:
sed: -e expression #1, char 10: unterminated `s' command
What am I doing wrong?
Why run another tool like sed once you are inside a Perl program? If anything, you now have far more tools and power, so just do it in Perl.
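As for why the original command fails: backticks interpolate like a double-quoted string, so Perl turns the \n into a real newline before sed ever sees it, and sed then reports the s command as unterminated at that newline. Doubling the backslash passes a literal \n through to sed. The sketch below only builds and inspects the two command strings; it does not run sed.

```perl
use strict;
use warnings;

# Backticks interpolate like "...", so "\n" here becomes a real newline.
my $broken = "sed 's#b#batman\nRobin#' temp";
# "\\n" interpolates to a backslash followed by n, which GNU sed understands.
my $fixed  = "sed 's#b#batman\\nRobin#' temp";

printf "broken contains a newline: %s\n", ($broken =~ /\n/ ? 'yes' : 'no');
printf "fixed contains a newline:  %s\n", ($fixed  =~ /\n/ ? 'yes' : 'no');
# Running `$fixed` would then pass the same expression as the shell command.
```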
One simple way to do your sed thing:
use warnings;
use strict;
die "Usage: $0 file(s)\n" if not @ARGV;
while (<>) {
s/b/batman\nRobin/;
print;
}
Run this program by supplying the file (temp) to it on the command line. The die line is there merely to support/enforce such usage; it is not essential to the script's operation.
This program, then, is a simple filter:
The <> operator reads, line by line, all files named on the command line
It assigns each line to the $_ variable, the default for many things in Perl
The s/// operator binds to $_ by default, and changes it if the pattern matches
print prints the $_ variable by default
You can use nearly any character you want as regex delimiters; see the m// and s/// operators
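The defaults listed above can be made visible by spelling them out. A sketch, assuming a filehandle argument in place of <> so it runs on in-memory data (the name batman_filter is hypothetical):

```perl
use strict;
use warnings;

# The filter with every default written out: the read assigns to $_,
# s/// is bound to $_ explicitly, and the output is $_.
sub batman_filter {
    my ($fh) = @_;
    my $out = '';
    local $_;
    while (defined($_ = <$fh>)) {
        $_ =~ s/b/batman\nRobin/;
        $out .= $_;        # `print;` in the original prints this
    }
    return $out;
}

open my $in, '<', \"a\nb\nc\n" or die "Cannot open in-memory handle: $!";
print batman_filter($in);
```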
This can also be done as
while (<>) {
print s/b/batman\nRobin/r
}
With the /r modifier, s/// returns the changed string (or the original string if the pattern didn't match) instead of modifying $_.
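A short sketch of that /r behaviour on plain variables (the variable names are illustrative); /r requires Perl 5.14 or later.

```perl
use strict;
use warnings;

my $line = "b\n";
my $new  = $line =~ s/b/batman\nRobin/r;   # returns a modified copy
my $same = "x\n" =~ s/b/batman\nRobin/r;   # no match: returns an unchanged copy

print $line;   # the original is untouched
print $new;    # batman, then Robin on the next line
print $same;
```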
Finally, that's also just
print s/b/batman\nRobin/r while <>;
but I'd expect that in a script you really want to do more, and then this probably isn't it.
On the other hand, you could write it more properly:
use warnings;
use strict;
use feature qw(say);
die "Usage: $0 file(s)\n" if not @ARGV;
while (my $line = <>) {
chomp $line;
$line =~ s/b/batman\nRobin/;
say $line;
}
With the line in a lexical variable, nicely chomp-ed, this is ready for more work.
I have a perl script which reads a file, changes the required thing and then prints the output of the file on the console.
I want the output to be updated in the same file from where it is picking the data.
How can this be done?
You can use the -i switch, or the $^I special variable.
perl -i.backup -pe 's/change me/something else/'
or
#!/usr/bin/perl
use warnings;
use strict;
$^I = '.backup';
while (<>) {
...
print;
}
Note that this only works with the special filehandle ARGV that the diamond operator uses. Behind the scenes it still creates a new file rather than editing the original one.
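A runnable sketch of the $^I approach, assuming you want to call it from inside a larger program: localizing $^I and @ARGV limits the in-place editing to one helper. The helper name edit_in_place and the demo file are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: edit the named files in place, like
# `perl -i.backup -pe 's/change me/something else/'` given those files,
# keeping each original as "<name>.backup".
sub edit_in_place {
    my @files = @_;
    local $^I   = '.backup';   # same effect as the -i.backup switch
    local @ARGV = @files;      # <> opens these files in turn
    local $_;
    while (<>) {
        s/change me/something else/;
        print;                 # goes to the rewritten file, not the console
    }
}

# A throwaway directory and file to edit (CLEANUP => 1 removes them at exit).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.txt";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "please change me\nleave this\n";
close $out or die "Cannot close $file: $!";

edit_in_place($file);
```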
See perlrun and perlvar.
What's the use of <> in Perl? How do I use it?
If we simply write
<>;
and
while(<>)
what is the program doing in each case?
The answers above are all correct, but it might come across more plainly if you understand general UNIX command line usage. It is very common to want a command to work on multiple files. E.g.
ls -l *.c
The command line shell (bash et al) turns this into:
ls -l a.c b.c c.c ...
In other words, ls never sees '*.c' unless the pattern doesn't match anything. Try this at a command prompt (not perl):
echo *
You'll notice that you do not get a *.
So, if the shell is handing you a bunch of file names and you'd like to go through each one's data in turn, Perl's <> operator gives you a nice way of doing that: it puts the next line of the next file (or of STDIN if no files are named) into $_ (the default scalar).
Here is a poor man's grep:
while(<>) {
print if m/pattern/;
}
Running this script:
./t.pl *
would print out all of the lines of all of the files that match the given pattern.
cat /etc/passwd | ./t.pl
would use cat to generate some lines of text that would then be checked for the pattern by the loop in perl.
So you see, while(<>) gets you a very standard UNIX command line behavior...process all of the files I give you, or process the thing I piped to you.
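The poor man's grep can be packaged as a testable sketch. Here the pattern is passed as a qr// object and the input comes from a filehandle argument instead of <>; the helper name and the passwd-style sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines from $fh that match $pattern,
# the same test the while(<>) loop above applies to each line.
sub poor_mans_grep {
    my ($pattern, $fh) = @_;
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $line if $line =~ $pattern;
    }
    return @hits;
}

# Made-up passwd-style lines stand in for piped input.
open my $in, '<', \"root:x:0:0\nuser:x:1000:1000\n" or die "Cannot open in-memory handle: $!";
print poor_mans_grep(qr/^root/, $in);
```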
<>;
is a short way of writing
readline();
or if you add in the default argument,
readline(*ARGV);
readline is an operator that reads a line from the specified filehandle. Reading from the special filehandle ARGV reads from STDIN if @ARGV is empty, or from the concatenation of the files named in @ARGV if it's not.
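The same equivalence holds for any filehandle: <$fh> and readline($fh) are one operation. A sketch using an in-memory handle (variable names are illustrative):

```perl
use strict;
use warnings;

open my $fh, '<', \"first\nsecond\n" or die "Cannot open in-memory handle: $!";
my $line1 = <$fh>;           # angle-bracket form
my $line2 = readline($fh);   # function form, same operation
my $line3 = readline($fh);   # undef: no more lines
print $line1, $line2;
```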
As for
while (<>)
It's a syntax error. If you had
while (<>) { ... }
it gets rewritten to
while (defined($_ = <>)) { ... }
And as previously explained, that means the same as
while (defined($_ = readline(*ARGV))) { ... }
That means it will read lines from ARGV (explained above) until there are no more lines to read.
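Why the rewrite adds defined(): a final line consisting of "0" with no trailing newline is false but is still real input. The sketch below compares a defined() loop with a deliberately wrong truth-test loop; both helper names are hypothetical.

```perl
use strict;
use warnings;

# Reads every line, stopping only at end of input (undef).
sub read_with_defined {
    my ($fh) = @_;
    my @lines;
    local $_;
    while (defined($_ = readline($fh))) {
        push @lines, $_;
    }
    return @lines;
}

# Deliberately wrong, for comparison: stops at any false line such as "0".
sub read_with_truth {
    my ($fh) = @_;
    my @lines;
    while (1) {
        my $line = readline($fh);
        last unless $line;
        push @lines, $line;
    }
    return @lines;
}

open my $fh1, '<', \"first\n0" or die "Cannot open in-memory handle: $!";
open my $fh2, '<', \"first\n0" or die "Cannot open in-memory handle: $!";
my @good = read_with_defined($fh1);   # both lines
my @bad  = read_with_truth($fh2);     # loses the final "0"
print scalar(@good), " vs ", scalar(@bad), "\n";
```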
It is called the diamond operator, and it feeds data either from STDIN if @ARGV is empty or, line by line, from the files named in @ARGV. This webpage http://docstore.mik.ua/orelly/perl/learn/ch06_02.htm explains it very well.
With syntactic sugar like this, B::Deparse, run through the O module (-MO=Deparse), is often helpful for finding out what's happening:
$ perl -MO=Deparse -e 'while(<>){print 42}'
while (defined($_ = <ARGV>)) {
print 42;
}
-e syntax OK
Quoting perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk, and any other Unix filter program that takes
a list of filenames, doing the same to each line of input from all of
them. Input from <> comes either from standard input, or from each
file listed on the command line.
It reads from standard input (STDIN):
> cat temp.pl
#!/usr/bin/perl
use strict;
use warnings;
my $count=<>;
print "$count"."\n";
>
Below is the execution:
> temp.pl
3
3
>
So as soon as you execute the script, it waits until the user gives some input.
After 3 is given as input, it stores that value in $count and prints it in the next statement.
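In scalar context like this, <> returns exactly one line, and that line keeps its trailing newline, so the print above with another "\n" appended actually emits a blank line after the 3. A sketch using an in-memory handle in place of the keyboard:

```perl
use strict;
use warnings;

# In scalar context, <$fh> (like <>) consumes one line, newline included.
open my $fh, '<', \"3\n4\n" or die "Cannot open in-memory handle: $!";
my $count = <$fh>;   # "3\n"; the second line is still unread
chomp $count;        # strip the trailing newline
print "$count\n";
```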
When I run this command through a Perl one-liner, it picks up the regular expression, so that can't be bad.
more tagcommands | perl -nle 'print /(\d{8}_\d{9})/' | sort
12012011_000005769
12012011_000005772
12162011_000005792
12162011_000005792
But when I run this script over the command invocation below, it does not pick up the regex.
#!/usr/bin/perl
use strict;
my $switch="12012011_000005777";
open (FILE, "more /home/shortcasper/work/tagcommands|");
my @array_old = (<FILE>) ;
my @array_new = @array_old ;
foreach my $line(@array_new) {
$line =~ s/\d{8}_\d{9}/$switch/g;
print $line;
sleep 1;
}
This is the data that I am feeding into the script
/CASPERBOT/START URL=simplefile:///data/tag/squirrels/squirrels /12012011_000005777N.dart.gz CASPER=SeqRashMessage
/CASPERBOT/ADDSERVER simplefile:///data/tag/squirrels/12012011_0000057770.dart.trans.gz
/CASPERRIP/newApp multistitch CASPER_BIN
/CASPER_BIN/START URLS=simplefile:///data/tag/squirrels /12012011_000005777R.rash.gz?exitOnEOF=false;binaryfile:///data/tag/squirrels/12162011_000005792D.binaryBlob.gz?exitOnEOF=false;simplefile:///data/tag/squirrels/12012011_000005777E.bean.trans.gz?exitOnEOF=false EXTRACTORS=rash;island;rash BINARY=T
You should study your one-liner to see how it works. First check perl -h to learn about the switches used:
-l[octal] enable line ending processing, specifies line terminator
-n assume "while (<>) { ... }" loop around program
The first one is not exactly self-explanatory, but what -l actually does is chomp each line, and then change $\ and $/ to newline. So, your one-liner:
perl -nle 'print /(\d{8}_\d{9})/'
Actually does this:
$\ = "\n";
while (<>) {
chomp;
print /(\d{8}_\d{9})/;
}
A very easy way to see this is to use the Deparse command:
$ perl -MO=Deparse -nle 'print /(\d{8}_\d{9})/'
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
chomp $_;
print /(\d{8}_\d{9})/;
}
-e syntax OK
So, that's how you transform that into a working script.
I have no idea how you went from that to this:
use strict;
my $switch="12012011_000005777";
open (FILE, "more /home/shortcasper/work/tagcommands|");
my @array_old = (<FILE>) ;
my @array_new = @array_old ;
foreach my $line(@array_new) {
$line =~ s/\d{8}_\d{9}/$switch/g;
print $line;
sleep 1;
}
First of all, why are you opening a pipe from the more command to read a text file? That is like calling a tow truck to fetch you a cab. Just open the file. Or better yet, don't. Just use the diamond operator, like you did the first time.
You don't need to first copy the lines of a file to an array, and then use the array. while(<FILE>) is a simple way to do it.
In your one-liner, you print the regex. Well, you print the return value of the regex. In this script, you print $line. I'm not sure how you thought that would do the same thing.
Your regex here will replace every number sequence of that shape with the value in $switch. Nothing else.
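The difference between the one-liner and the script, sketched on an excerpt of the sample data above: the one-liner prints the captured match, while the script rewrites the line and prints all of it.

```perl
use strict;
use warnings;

my $switch = "12012011_000005777";
# An excerpt of one line from the sample data.
my $line = "simplefile:///data/tag/squirrels/12162011_000005792D.binaryBlob.gz";

# One-liner: print the capture returned by the match.
my ($id) = $line =~ /(\d{8}_\d{9})/;

# Script: substitute $switch into a copy of the line and print the whole line.
(my $rewritten = $line) =~ s/\d{8}_\d{9}/$switch/g;

print "$id\n$rewritten\n";
```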
You should also be aware that sleep 1 may not do what you think. Try this one-liner, for example, with its output piped into another command:
perl -we 'for (1 .. 10) { print "line $_\n"; sleep 1; }' | cat
As you will notice, it simply waits 10 seconds and then prints everything at once. That's because Perl buffers standard output: when STDOUT is not a terminal (a pipe or a file), the buffer is not written out until it is full or flushed, at the latest when the program exits. (On a terminal, STDOUT is line-buffered, so each line appears as it is printed.) So it's a perception problem. Everything works like it should; you just don't see it yet.
If you absolutely want a sleep statement in your script, you'll probably want to turn on autoflush, e.g. STDOUT->autoflush(1); (or $| = 1;).
However, why are you doing that? Is it so you have time to read the numbers? If so, put that more command at the end of your one-liner instead:
perl ...... | more
That will pipe the output into the more command, so you can read it at your own pace. Now, for your one-liner:
Always use -w as well, unless you specifically want to avoid warnings (which you basically never should).
Your one-liner will only print the first match. If you want to print all the matches on a new line:
perl -wnle 'print for /(\d{8}_\d{9})/g'
If you want to print all the matches, but keep the ones from the same line on the same line:
perl -wnle 'print "@a" if @a = /(\d{8}_\d{9})/g'
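Both one-liners rely on m//g returning every capture when used in list context. A sketch on a made-up line containing two IDs:

```perl
use strict;
use warnings;

# A made-up line with two IDs in the same shape as the sample data.
my $line = "a 12012011_000005777R b 12162011_000005792D c";
my @ids  = $line =~ /(\d{8}_\d{9})/g;   # list context: all captures

print "$_\n" for @ids;   # one match per output line
print "@ids\n";          # all matches on one line, separated by spaces
```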
Well, that should cover it.
Your open call may be failing (you should always check the result of an open to make sure it succeeded if the rest of the program depends on it) but I believe your problem is in complicating things by opening a pipe from a more command instead of simply opening the file itself. Change the open to simply
open FILE, "/home/shortcasper/work/tagcommands" or die $!;
and things should improve.
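Putting these suggestions together (open the file directly, check the open, and loop with while instead of copying into arrays) might look like the sketch below. It swaps in a lexical filehandle and three-argument open for the bareword FILE; the helper rewrite_ids and the temporary file standing in for tagcommands are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read the file directly (no `more` pipe), fail
# loudly if it can't be opened, and rewrite IDs one line at a time.
sub rewrite_ids {
    my ($path, $switch) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @out;
    while (my $line = <$fh>) {
        $line =~ s/\d{8}_\d{9}/$switch/g;
        push @out, $line;
    }
    close $fh;
    return @out;
}

# A throwaway file standing in for tagcommands.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "/x 12162011_000005792D\n";
close $tmp or die "Cannot close $path: $!";

print rewrite_ids($path, "12012011_000005777");
```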
What does reading from <> do in Perl? For example, what will the following do?
print for(<>);
The so-called diamond operator (<>) reads line-by-line (in scalar context) from STDIN or the filename(s) specified as command-line arguments.
From perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk. Input from <> comes either from standard
input, or from each file listed on the command line. Here's how it
works: the first time <> is evaluated, the @ARGV array is checked,
and if it is empty, $ARGV[0] is set to "-", which when opened
gives you standard input. The @ARGV array is then processed as a
list of filenames.
In list context, <> returns all lines, with each line stored as an element in the list.
This means that print for <>; will do the same thing as print while <>;, albeit with more memory.
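A sketch of the list-context read that print for <>; performs, using an in-memory handle so the whole input is visible: every remaining line is read into memory before the first one is printed.

```perl
use strict;
use warnings;

open my $fh, '<', \"one\ntwo\nthree\n" or die "Cannot open in-memory handle: $!";
my @lines = <$fh>;     # list context: all remaining lines at once
print scalar(@lines), " lines\n";
print for @lines;      # what `print for <>;` does after reading everything
```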
You've found the single most magical piece of Perl. Well, I'm sure there's more magical things, but this little idiom makes it very easy to write programs intended for shell pipeline use and file-operation use.
When run without any arguments, <> will read lines one-at-a-time from standard input.
When run with arguments, it'll treat the arguments as filenames and read lines one-at-a-time from the named files in turn.
A short demo:
$ cat > print.pl
#!/usr/bin/perl -w
print for(<>);
$ chmod 755 print.pl
$ echo hello world | ./print.pl
hello world
$ ./print.pl print.pl
#!/usr/bin/perl -w
print for(<>);
$ ./print.pl print.pl print.pl
#!/usr/bin/perl -w
print for(<>);
#!/usr/bin/perl -w
print for(<>);
$
I typed in the program by hand there; hit ^D when you've typed it in completely.
It reads from standard input, one line at a time, and stores it to $_. print then prints out $_ by default since it is not given an argument. This program reads from standard input and echoes to standard output until it reaches EOF.