Perl one-liner: how to reference the filename passed in when the -ne or -pe command-line switches are used

In Perl, it's normally easy enough to get a reference to the commandline arguments. I just use $ARGV[0] for example to get the name of a file that was passed in as the first argument.
When using a Perl one-liner, however, it seems to no longer work. For example, here I want to print the name of the file that I'm iterating through if a certain string is found within it:
perl -ne 'print $ARGV[0] if(/needle/)' haystack.txt
This doesn't work, because ARGV doesn't get populated when the -n or -p switch is used. Is there a way around this?

What you are looking for is $ARGV. Quote from perlvar:
$ARGV
Contains the name of the current file when reading from <>.
So, your one-liner would become:
perl -ne 'print $ARGV if(/needle/)' haystack.txt
Though be aware that it will print once for each match. If you want a newline added to the print, you can use the -l option.
perl -lne 'print $ARGV if(/needle/)' haystack.txt
If you want it to print only once for each file that contains a match, you can close the ARGV file handle, which makes <> skip to the next file:
perl -lne 'if (/needle/) { print $ARGV; close ARGV }' haystack.txt haystack2.txt
As Peter Mortensen points out, $ARGV and $ARGV[0] are two different variables. $ARGV[0] is the first element of the array @ARGV, whereas $ARGV is a scalar, a completely separate variable.
You say that @ARGV is not populated when using the -p or -n switch, which is not true. The code that runs silently is something like:
while (@ARGV) {
    $ARGV = shift @ARGV;                # arguments are removed during runtime
    open ARGV, $ARGV or die $!;
    while (defined($_ = <ARGV>)) {      # long version of: while (<>) {
        # your code goes here
    } continue {                        # when using the -p switch
        print $_;                       # it includes a print statement
    }
}
Which in essence means that using $ARGV[0] will never show the real file name, because it is removed before it is accessed, and placed in $ARGV.
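A minimal sketch of the same idea as a standalone script, assuming two throwaway sample files created in a temporary directory. The helper name files_matching and the file contents are made up for illustration; the parts taken from the answer above are $ARGV and close ARGV.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the names of the files (in order) that contain a line matching $re.
# Works by temporarily setting @ARGV and reading through the magic <> handle.
sub files_matching {
    my ($re, @files) = @_;
    my @hits;
    local @ARGV = @files;
    while (my $line = <>) {
        next unless $line =~ $re;
        push @hits, $ARGV;   # $ARGV holds the name of the file being read
        close ARGV;          # skip the rest of this file; <> moves on to the next
    }
    return @hits;
}

# Demo with two hypothetical sample files in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
my %content = (
    'haystack.txt'  => "hay\nneedle\nneedle again\n",
    'haystack2.txt' => "only hay\n",
);
for my $name (sort keys %content) {
    open my $out, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print {$out} $content{$name};
    close $out;
}
print "$_\n" for files_matching(qr/needle/, map { "$dir/$_" } sort keys %content);
```

Only the path of haystack.txt is printed, and only once, even though two of its lines match.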

Related

Perl backticks subprocess is causing EOF on STDIN

I'm having this issue with my perl program that is reading from a file (which I open on STDIN and read each line one at a time using $line = <>). After I execute a `backtick` command, and then I go to read the next line from STDIN, I get an undef, signaling EOF. I isolated it to the backtick command using debugging code as follows:
my $dir = dirname(__FILE__);
say STDERR "before: tell(STDIN)=" . tell(STDIN) . ", eof(STDIN)=" . eof(STDIN);
say STDERR "\@export_info = `echo nostdin | perl $dir/pythonizer_importer.pl $fullfile`;";
@export_info = `echo nostdin | perl $dir/pythonizer_importer.pl $fullfile`;
say STDERR "after: tell(STDIN)=" . tell(STDIN) . ", eof(STDIN)=" . eof(STDIN);
The output is:
before: tell(STDIN)=15146, eof(STDIN)=
@export_info = `echo nostdin | perl ../pythonizer_importer.pl ./Pscan.pm`;
after: tell(STDIN)=15146, eof(STDIN)=1
I recently added the echo nostdin | to the perl command which had no effect. How do I run this command and get the STDOUT without messing up my STDIN? BTW, this is all running on Windows. I fire off the main program from a git bash if that matters.
Try locally undefining STDIN before running the backticks command, like this example script does. Note that any subroutines called from the sub that calls local will see the new value. You can also do open STDIN, "<", "file for child process to read"; after the local *STDIN but before the backticks but remember to close() the file before restoring STDIN to its old value.
The child process is affecting your STDIN because "the STDIN filehandle used by the command is inherited from Perl's STDIN." – perlop manual
This is just an example; in your actual script, replace the sed command with your actual command to run.
use strict;
use warnings;
# Run a command and get its output
sub get_output {
    # Prevent passing our STDIN to the child process
    local *STDIN;
    print "Running sed\n";
    # Replace the sed command with the actual command you want to run
    return `sed 's/a/b/'`;
}
my $output = get_output();
print $output;
# We can still read STDIN even after running a child process
print "Waiting for input\n";
print "Readline is " . scalar readline;
Input:
a
b
c
^D
line
Output:
Running sed
b
b
c
Waiting for input
Readline is line

Perl autosplit with non-file arguments

I'm trying to create a perl script that autosplits the STDIN, and then does something to column X. But I want the column to be passed by argument to the script. However, apparently, when I invoke perl with the autosplit flag, it switches to a mode where it tries to implicitly open the "files" defined in the command line arguments, even though my arguments aren't files.
Example:
my $column = shift;
while(<STDIN>) {
print "F $column: $F[$column]!\n";
}
Then I try to run it with argument 2 to print the 2nd column:
$ echo -e "1 22\n2 33\n3 55" | perl -a myscript.pl 2
When I turn on warnings and such, I get this error message:
Can't open 2: No such file or directory (#1)
(S inplace) The implicit opening of a file through use of the <>
filehandle, either implicitly under the -n or -p command-line
switches, or explicitly, failed for the indicated reason. Usually
this is because you don't have read permission for a file which
you named on the command line.
(F) You tried to call perl with the -e switch, but /dev/null (or
your operating system's equivalent) could not be opened.
If I omit the -a, I don't get any errors and all is good, but then I need to manually split my lines. Is there any other flag I can use to make autosplit not do that? Or alternatively, can I start autosplit inside the script, so that it won't try to implicitly open the arguments as files? I can manually split my input, I just wanted to know what other alternatives I have.
There's not much of a "mode" here. -n adds
while (readline) {
    ...
}
around your code; -a adds a split statement:
while (readline) {
    our @F = split;
    ...
}
readline (without argument) reads from the filenames specified in @ARGV (or STDIN if @ARGV is empty).
-a implicitly enables -n because otherwise there is nothing for it to do; there is no "autosplit mode", it simply adds code around your program.
Your best bet is to split manually:
while (<STDIN>) {
    my @F = split;
    print "F $column: $F[$column]!\n";
}
It's fairly short and it makes it obvious what's going on.
I don't like using the command line shortcut switches (e.g. -a, -p, etc.) in script files. In a file you have enough space to write multiple lines, properly formatted / indented. It's better to be explicit.
That said, you can technically do this:
#!perl -na
our $column;
BEGIN { $column = shift; }
print "F $column: $F[$column]!\n";
This is equivalent to:
while (readline) {
    our @F = split;
    our $column;
    BEGIN { $column = shift; }
    print "F $column: $F[$column]!\n";
}
The BEGIN block only runs once, at compile time. Provided the script is only called with one argument, this will remove it from @ARGV before the readline loop starts, which will make it read from STDIN.
-n, -p, -l, -a and -F don't change the state of the Perl interpreter. They literally alter the source code.
$ perl -MO=Deparse -a -ne'f()'
LINE: while (defined($_ = readline ARGV)) {
    our @F = split(' ', $_, 0);
    f();
}
-e syntax OK
As you can see, there's really no such thing as autosplit. It's just a normal split injected into the source code, and it makes no sense to ask Perl to insert a split instead of just inserting a split. So, simply add the following to your loop:
my @F = split;
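To see what that injected split actually does, here is a small sketch. The helper name field and the sample lines are assumptions made for the illustration. It uses split ' ', the same form shown in the Deparse output, which breaks on runs of whitespace and ignores leading whitespace.

```perl
use strict;
use warnings;

# Return field number $column (0-based) of $line, split the way -a does it:
# split ' ' breaks on runs of whitespace and ignores leading whitespace.
sub field {
    my ($line, $column) = @_;
    my @F = split ' ', $line;
    return $F[$column];
}

my @lines = ("1 22\n", "2 33\n", "  3   55\n");
print field($_, 1), "\n" for @lines;   # prints 22, 33 and 55, one per line
```

Note that @F is zero-based, so the second column is $F[1], not $F[2].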

Perl search for string and get the full line from text file

I want to search for a string and get the full line from a text file through Perl scripting.
So the text file will be like the following.
data-key-1,col-1.1,col-1.2
data-key-2,col-2.1,col-2.2
data-key-3,col-3.1,col-3.2
Here I want to apply data-key-1 as the search string and get the full line into a Perl variable.
Here I want the exact replacement of grep "data-key-1" data.csv in the shell.
Some syntax like the following worked while running in the console.
perl -wln -e 'print if /\bAPPLE\b/' your_file
But how can I place it in a script? With the perl keyword we can't put it into a script. Is there a way to avoid the loops?
If you know what each command-line option in your one-liner does, you know exactly what to write inside your Perl script. When you read a file, you need a loop, and the choice of loop matters for performance: a for loop over <$fh> reads the whole file into a list before iterating, so it uses more memory than a while loop, which reads one line at a time.
Your one-liner:
perl -wln -e 'print if /\bAPPLE\b/' your_file
is basically saying:
-w : Use warnings
-l : Chomp the newline character from each line before processing and place it back during printing.
-n : Create an implicit while(<>) { ... } loop to perform an action on each line
-e : Tell perl interpreter to execute the code that follows it.
print if /\bAPPLE\b/ to print entire line if line contains the word APPLE.
So to use the above inside a perl script, you'd do:
#!/usr/bin/perl
use strict;
use warnings;
open my $fh, '<', 'your_file' or die "Cannot open file: $!\n";
while (my $line = <$fh>) {
    next unless $line =~ /\bAPPLE\b/;
    # do something with $line
}
close $fh;
chomp is not really required here because you are not doing anything with the line other than checking for the existence of a word.
open(my $file, '<', 'filename') or die "Cannot open filename: $!\n";
while (<$file>) {
    print $_ if ($_ =~ /^data-key-3,/);
}
use strict;
use warnings;
# the file name of your .csv file
my $file = 'data.csv';
my @final;
# open the file for reading
open(FILE, "<$file") or
    die("Could not open $file: $!\n");
# process line by line:
while (<FILE>) {
    my ($line) = $_;
    # remove the trailing newline
    # not necessary, but again, good habit
    chomp($line);
    my @result = grep (/data-key-1/, $line);
    push (@final, @result);
}
close(FILE);
print "$_\n" for @final;
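If the goal is simply to get the matching line into a variable without writing the loop by hand, grep over the file handle is one option. This is a sketch only: the helper name matching_lines is made up, the sample data is written to a temporary file so the script is self-contained, and grep still loops internally and reads the whole file into memory, so it suits small files best.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return every line of $file that matches $re, with newlines removed.
sub matching_lines {
    my ($file, $re) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my @lines = grep { $_ =~ $re } <$fh>;
    close $fh;
    chomp @lines;
    return @lines;
}

# Demo: write the sample data to a temporary file and search it.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "data-key-1,col-1.1,col-1.2\n",
             "data-key-2,col-2.1,col-2.2\n",
             "data-key-3,col-3.1,col-3.2\n";
close $tmp;

my ($line) = matching_lines($path, qr/^data-key-1,/);
print "$line\n";   # prints data-key-1,col-1.1,col-1.2
```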

How do I use perl like sed?

I have a file that has some entries like
--ERROR--- Failed to execute the command with employee Name="shayam" Age="34"
--Successfully executed the command with employee Name="ram" Age="55"
--ERROR--- Failed to execute the command with employee Name="sam" Age="23"
--ERROR--- Failed to execute the command with employee Name="yam" Age="3"
I have to extract only the Name and Age of those for whom the command execution failed.
In this case I need to extract shayam 34 sam 23 yam 3. I need to do this in Perl. Thanks a lot.
perl -p -e 's/../../g' file
Or to inline replace:
perl -pi -e 's/../../g' file
As a one-liner:
perl -lne '/^--ERROR---.*Name="(.*?)" Age="(.*?)"/ && print "$1 $2"' file
Your title doesn't make it clear. Anyway...
while (<>) {
    next if !/^--ERROR/;
    next unless /Name="([^"]+)"\s+Age="([^"]+)"/;   # skip lines without both fields
    print $1, " ", $2, "\n";
}
can do it, reading from stdin (or from files named on the command line); of course, you can change the reading loop to anything else, and replace the print with something that populates a hash or whatever suits your needs.
As a one liner, try:
perl -ne 'print "$1 $2\n" if /^--ERROR/ && /Name="(.*?)"\s+Age="(.*?)"/;'
This is a lot like using sed, but with Perl syntax.
The immediate question of "how do I use perl like sed?" is best answered with s2p, the sed to perl converter. Given the command line, "sed $script", simply invoke "s2p $script" to generate a (typically unreadable) perl script that emulates sed for the given set of commands. Note that s2p shipped with Perl only up to 5.20; since Perl 5.22 it is a separate CPAN distribution (App::s2p).
Refer to the comments:
my @a = <>;                   # Read the entire input into an array
chomp @a;                     # Remove the trailing newlines
@a = grep {/ERROR/} @a;       # Keep only the lines that contain ERROR
# Map with a sed-like regexp to keep only names and ages:
@a = map {s/^.*Name=\"([a-z]+)\" Age=\"([0-9]+)\".*$/$1 $2/; $_} @a;
print join " ", @a;           # Print the array content
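The approaches above can be checked against the sample log from the question. The sketch below wraps the extraction in a hypothetical helper, failed_employees, and feeds it the four sample lines directly instead of reading a file.

```perl
use strict;
use warnings;

# Return "Name Age" for every line that starts with --ERROR.
sub failed_employees {
    my @out;
    for my $line (@_) {
        next unless $line =~ /^--ERROR/;
        push @out, "$1 $2" if $line =~ /Name="([^"]*)"\s+Age="([^"]*)"/;
    }
    return @out;
}

my @log = (
    '--ERROR--- Failed to execute the command with employee Name="shayam" Age="34"',
    '--Successfully executed the command with employee Name="ram" Age="55"',
    '--ERROR--- Failed to execute the command with employee Name="sam" Age="23"',
    '--ERROR--- Failed to execute the command with employee Name="yam" Age="3"',
);
print join(' ', failed_employees(@log)), "\n";   # prints: shayam 34 sam 23 yam 3
```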

Why can't I get the output of a command with system() in Perl?

When executing a command on the command-line from Perl, is there a way to store that result as a variable in Perl?
my $command = "cat $input_file | uniq -d | wc -l";
my $result = system($command);
$result always turns out to be 0.
Use "backticks":
my $command = "cat $input_file | uniq -d | wc -l";
my $result = `$command`;
And if interested in the exit code you can capture it with:
my $retcode = $?;
right after making the external call.
From perlfaq8:
Why can't I get the output of a command with system()?
You're confusing the purpose of system() and backticks (``). system() runs a command and returns exit status information (as a 16 bit value: the low 7 bits are the signal the process died from, if any, and the high 8 bits are the actual exit value). Backticks (``) run a command and return what it sent to STDOUT.
$exit_status = system("mail-users");
$output_string = `ls`;
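The 16-bit status described in the FAQ entry can be taken apart with a shift and a mask. A minimal sketch, assuming a hypothetical helper named decode_status and using the running perl itself ($^X) as the child process so that no external tool is required:

```perl
use strict;
use warnings;

# Split a wait status (what system() returns and what $? holds after
# backticks) into the exit value and the terminating signal number.
sub decode_status {
    my ($status) = @_;
    return (
        exit   => $status >> 8,    # high 8 bits: the actual exit value
        signal => $status & 127,   # low 7 bits: signal the process died from
    );
}

# Use the running perl ($^X) as the child so no external tool is required.
my $status = system($^X, '-e', 'exit 3');
my %parts  = decode_status($status);
print "raw=$status exit=$parts{exit} signal=$parts{signal}\n";
```

On a typical Unix-like system the raw value is 768, which decodes to exit value 3 and signal 0. This is why the raw return value of system() is rarely the number you want to compare against.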
You can use the Perl back-ticks to run a shell command, and save the result in an array.
my @results = `$command`;
To get just a single result from the shell command, you can store it in a scalar variable:
my $result = `$command`;
If you are expecting back multiple lines of output, it's easier to use an array, but if you're just expecting back one line, it's better to use scalar.
(something like that, my perl is rusty)
You can use backticks, as others have suggested. That's fine if you trust whatever variables you're using to build your command.
For more flexibility, you can open the command as a pipe and read from that as you would a file. This is particularly useful when you want to pass variables as command line arguments to the program and you don't trust their source to be free of shell escape characters, as open in recent Perl (>= 5.8) has the capacity to invoke a program from an argument list. So you can do the following:
open(FILEHANDLE, '-|', 'uniq', 'some-file.txt') or die "Cannot fork: $!\n";
while (<FILEHANDLE>) {
# process a line from $_
}
close FILEHANDLE or die $! ? "Error closing pipe: $!\n" : "Child exited with wait status $?\n";
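A sketch of why the list form matters when arguments are untrusted. The helper name capture_lines is hypothetical, and the child is the running perl ($^X), which just echoes its arguments. Because no shell is involved, metacharacters such as ; and $ reach the child unchanged.

```perl
use strict;
use warnings;

# Run a command given as a list (no shell involved) and return its output lines.
sub capture_lines {
    my @cmd = @_;
    open(my $pipe, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    my @lines = <$pipe>;
    close($pipe)
        or die $! ? "Error closing pipe: $!\n"
                  : "$cmd[0] exited with status " . ($? >> 8) . "\n";
    return @lines;
}

# The arguments contain shell metacharacters, but they are passed to the child verbatim.
my @out = capture_lines($^X, '-e', 'print "$_\n" for @ARGV', 'a; b', '$HOME');
print @out;   # prints "a; b" and "$HOME" on separate lines
```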
IPC::System::Simple provides the 'capture' command which provides a safe, portable alternative to backticks. It (and other commands from this module) are highly recommended.
I'm assuming this is a 'contrived' example, because it is equivalent to 'uniq -d $input_file | wc -l'.
In almost all my experience, the only reason for putting the results into a Perl variable is to parse them later. In that case, I use the following pattern:
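my $last;
my $in_run = 0;
my $lc = 0;
open(my $fn, '<', $input_file) or die "Cannot open $input_file: $!\n";
while (<$fn>) {
    # any other parsing of the current line
    if (defined $last && $last eq $_) {
        $lc++ unless $in_run;   # count each run of repeated lines once, as uniq -d does
        $in_run = 1;
    } else {
        $in_run = 0;
    }
    $last = $_;
}
close($fn);
print "$lc\n";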
This also has the added advantages:
no fork for shell, cat, uniq, and wc
faster
parse and collect the desired input