< operator in UNIX, passing to Perl script - perl

When evaluating if(-t STDIN), does the < UNIX operator count as STDIN? If not, how do I get that data?
So someone types perl example.pl < testing.txt. This doesn't behave like data piped in via ls | ./example.pl. How can I get that behavior?

Test -p STDIN, which checks if the filehandle STDIN is attached to a pipe.
touch foo
perl -e 'print -p STDIN' < foo # nothing
cat foo | perl -e 'print -p STDIN' # 1
But I'm not sure I understand your question. In all three of these cases
1. perl -e 'print $_=<STDIN>' < <(echo foo)
2. echo foo | perl -e 'print $_=<STDIN>'
3. perl -e 'print $_=<STDIN>' # then type "foo\n" to the console
the inputs are the same and all accessible through the STDIN filehandle. In the first two cases, -t STDIN will evaluate to false, and in the second case, -p STDIN will be true.
The differences in behavior between these three cases are subtle, and usually not important. The third case, obviously, will wait until at least one line of input (terminated with "\n" or EOF) is received. The difference between the first two cases is even more subtle. When the input to your program is piped from the output of another process, you are somewhat at the mercy of that first process with respect to latency or whether that program buffers its output.
Maybe you could expand on what you mean when you say
perl example.pl < testing.txt
doesn't behave like
ls | ./example.pl

-t tests whether or not STDIN is attached to a tty.
When you pipe data to perl, STDIN will not be attached to a tty. This should not depend on the mechanism you use to supply the data (i.e., whether you pipe a command using | or redirect a file using <). However, you will have a tty attached when you run the program directly. Given the following example:
#!/usr/bin/perl
print ((-t STDIN) ? "is a tty\n" : "is not a tty\n");
You would expect the following output:
% perl ./ttytest.pl
is a tty
% perl ./ttytest.pl < somefile
is not a tty
% ls | perl ./ttytest.pl
is not a tty

Related

How can I force perl to process args ONLY from stdin and not from a file on command line?

If I have this inline command:
perl -pi -e 's/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g'
This simply takes four-digit hex values and puts '\x' in front of each byte. Perl reports: -i used with no filenames on the command line, reading from STDIN. So for the params 0000 0776, the result is \x00\x00\x07\x76
I know that with -n or -p (which also prints), perl wraps the program in a <> (diamond) loop. But I want to pass args AFTER the command, and perl treats them as files to read. So how do I force -n or -p to treat the args after the command as regular input for <> in the program, and not as files to read?
Also, I do not understand the role of -i here. If I did not include it, I would be entering input line after line (as <> does), but with -i, it takes all my args at once?
If there are no arguments (i.e., if @ARGV is empty), then your one-line script (which implicitly uses <>) will read input from STDIN. So the solution is to clear @ARGV at compile time.
perl -pi -e 'BEGIN{@ARGV=()}
s/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g'
Another solution: Force ARGV (the implicit file handle that the base <> operator reads from) to point to STDIN. This solution doesn't clobber your @ARGV, if any.
perl -pi -e 'BEGIN{*ARGV=*STDIN}
s/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g'
The -p option is equivalent to the following code:
LINE:
while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
-n is the same without the continue block. There's no way to change what it reads from (which is unfortunate, since <<>> and <STDIN> are both safer options), but it's pretty easy to replicate it with your modification (the error checking is rarely necessary here):
perl -e 'while (<STDIN>) { s/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g } continue { print }'

xargs pass multiple arguments to perl subroutine?

I know how to pipe multiple arguments with xargs:
echo a b | xargs -l bash -c 'echo "1:$0 2:$1"'
and I know how to pass the array of arguments to my perl module's subroutine from xargs:
echo a b | xargs --replace={} perl -I/home/me/module.pm -Mme -e 'me::someSub("{}")'
But I can't seem to get multiple individual arguments passed to perl using those dollar references (to satisfy the me::someSub signature):
echo a b | xargs -l perl -e 'print("$0 $1")'
Just prints:
-e
So how do I get the shell arguments: $0, $1 passed to my perl module's subroutine?
I know I could just delimit a;b so that the xarg {} could be processed by perl splitting it to get individual arguments), but I could also just completely process all STDIN with perl. Instead, my objective is to use perl -e so that I can explicitly call the subroutine I want (rather than having some pre-process in the script that figures out what subroutine to call and what arguments to use based on STDIN, to avoid script maintenance costs).
While bash's arguments are available as $@ and $0, $1, $2, etc., Perl's arguments are available via @ARGV. This means that the Perl equivalent of
echo a b | xargs -l bash -c 'echo "1:$0 2:$1"'
is
echo a b | xargs -l perl -e'CORE::say "1:$ARGV[0] 2:$ARGV[1]"'
That said, it doesn't make sense to use xargs in this way because there's no way to predict how many times it will call perl, and there's no way to predict how many arguments it will pass to perl each time. You have an XY Problem, and you haven't provided any information to help us. Maybe you're looking for
perl -e'CORE::say "1:$ARGV[0] 2:$ARGV[1]"' $( echo a b )
I am not sure about the details of your design, so I take it that you need a Perl one-liner to use shell's variables that are seen in the scope in which it's called.
A perl -e'...' executes a Perl program given under ''. For any variables from the environment where this program runs -- a pipeline, or a shell script -- to be available to the program their values need be passed to it. Ways to do this with a one-liner are spelled out in this post, and here is a summary.
A Perl program receives arguments passed to it on the command line in the @ARGV array. So you can invoke it in a pipeline as
... | perl -e'($v1, $v2) = @ARGV; ...' "$0" "$1"
or as
... | xargs -l perl -e'($v1, $v2) = @ARGV; ...'
if xargs is indeed used to feed the Perl program its input. In the first example the variables are quoted to protect possible interesting characters in them (spaces, *, etc) from being interpreted by the shell that sets up and runs the perl program.
If input contains multiple lines to process and the one-liner uses -n or -p for it then unpack arguments in a BEGIN block
... | perl -ne'BEGIN { ($v1, $v2) = splice(@ARGV,0,2) }; ...' "$0" "$1" ...
which runs at compile time, so before the loop over input lines provided by -n/-p. The arguments other than filenames are now removed from @ARGV, leaving only the filenames there for -n/-p, in case input comes from files.
There is also a rudimentary mechanism for command-line switches in a one-liner, via the -s switch. Please see the link above for details; I'd recommend @ARGV over this.
Finally, your calling code could set up environment variables which are then available to the Perl program in %ENV. However, that doesn't seem to be suitable to what you seem to want.
Also see this post for another example.

What is the default `perl` print target?

I was assuming that print "foo" is just a shortcut for print STDOUT "foo".
However I noticed that (at least) in combination with the -i switch this assumption no longer holds:
perl -ni -e 'print $_' file
this simply does not change the content of the file.
perl -ni -e 'print STDOUT $_' file
This however prints the content to the terminal and leaves the file empty.
Therefore the question: What is the default target of print? I.e. where does the first print print to?
perldoc -f print says:
To set the default output handle to something other than STDOUT, use the select operation.
Obviously perl internally used the select operation to set the default output handle to something else. Is there any way to explicitly specify the current default output handle?
This would allow me to write something like
perl -i -wne 'print {/b/ ? STDOUT : XXX } $_' file
to build a grep which removes all printed lines.
The default filehandle for print is controlled by the select function. It defaults to STDOUT, but can be changed at any time.
To quote the documentation:
select FILEHANDLE
select
Returns the currently selected filehandle. If FILEHANDLE is supplied, sets the new current default filehandle for output. This has
two effects: first, a write or a print without a filehandle default to
this FILEHANDLE. Second, references to variables related to output
will refer to this output channel.
If you look at the documentation for the -i option, you'll see that the expansion includes the line select(ARGVOUT). That's what causes output to go back to the file you're editing.
ARGVOUT is special only during -i processing, but it does work there:
$ cat foo
a1
b1
a2
b2
$ perl -i -wne 'print {/b/ ? STDOUT : ARGVOUT } $_' foo
b1
b2
$ cat foo
a1
a2
You can also use perl -i -wne 'print {/b/ ? STDOUT : select } $_' foo since ARGVOUT will be the currently selected filehandle.
It prints to the currently-selected handle, which is STDOUT by default. You can change the selected handle yourself using the one-argument form of select, and in-place editing mode -i flag / $^I automatically selects the destination file for you (see the description of -i in perlrun for code equivalent to what -i does).

perl line-mode oneliner with ARGV

I often need to run some Perl one-liners for fast data manipulations, like
some_command | perl -lne 'print if /abc/'
Reading from a pipe, I don't need a loop around the command arg filenames. How can I achieve the next?
some_command | perl -lne 'print if /$ARGV[0]/' abc
This gives the error:
Can't open abc: No such file or directory.
I understand that the '-n' does the
while(<>) {.... }
around my program, and the <> takes args as filenames, but doing the next every time is a bit impractical
#!/bin/sh
while read line
do
some_command | perl -lne 'BEGIN{$val=shift @ARGV} print if /$val/' "$line"
done
Is there some better way to get "inside" the Perl ONE-LINER command line arguments without getting them interpreted as filenames?
Some solutions:
perl -e'while (<STDIN>) { print if /$ARGV[0]/ }' pat
perl -e'$p = shift; while (<>) { print if /$p/ }' pat
perl -e'$p = shift; print grep /$p/, <>' pat
perl -ne'BEGIN { $p = shift } print if /$p/' pat
perl -sne'print if /$p/' -- -p=pat
PAT=pat perl -ne'print if /$ENV{PAT}/'
Of course, it might make more sense to create a pattern that's an ORing of all patterns rather than executing the same command for each pattern.
Also reasonably short:
... | expr=abc perl -lne 'print if /$ENV{expr}/'
Works in bash shell but maybe not other shells.
It depends on what you think will be in the lines you read, but you could play with:
#!/bin/sh
while read line
do
some_command | perl -lne "print if /$line/"
done
Clearly, if $line might contain slashes, this is not going to fly. Then, AFAIK, you're stuck with the BEGIN block formulation.

Perl's diamond operator: can it be done in bash?

Is there an idiomatic way to simulate Perl's diamond operator in bash? With the diamond operator,
script.sh | ...
reads stdin for its input and
script.sh file1 file2 | ...
reads file1 and file2 for its input.
One other constraint is that I want to use the stdin in script.sh for something else other than input to my own script. The below code does what I want for the file1 file2 ... case above, but not for data provided on stdin.
command - $@ <<EOF
some_code_for_first_argument_of_command_here
EOF
I'd prefer a Bash solution but any Unix shell is OK.
Edit: for clarification, here is the content of script.sh:
#!/bin/bash
command - $@ <<EOF
some_code_for_first_argument_of_command_here
EOF
I want this to work the way the diamond operator would work in Perl, but it only handles filenames-as-arguments right now.
Edit 2: I can't do anything that goes
cat XXX | command
because the stdin for command is not the user's data. The stdin for command is my data in the here-doc. I would like the user data to come in on the stdin of my script, but it can't be the stdin of the call to command inside my script.
Sure, this is totally doable:
#!/bin/bash
cat "$@" | some_command_goes_here
Users can then call your script with no arguments (or '-') to read from stdin, or with multiple files, all of which will be read.
If you want to process the contents of those files (say, line-by-line), you could do something like this:
cat "$@" | while IFS= read -r line; do
    echo "I read: $line"
done
Edit: Changed $* to "$@" to handle spaces in filenames, thanks to a helpful comment.
Kind of cheezy, but how about
cat file1 file2 | script.sh
I am (like everyone else, it seems) a bit confused about exactly what the goal is here, so I'll give three possible answers that may cover what you actually want. First, the relatively simple goal of getting the script to read from either a list of files (supplied on the command line) or from its regular stdin:
if [ $# -gt 0 ]; then
exec < <(cat "$@")
fi
# From this point on, the script's stdin is redirected from the files
# (if any) supplied on the command line
Note: the double-quoted use of $@ is the best way to avoid problems with funny characters (e.g. spaces) in filenames -- $* and unquoted $@ both mess this up. The <() trick I'm using here is a bash-only feature; it fires off cat in the background to feed data from files supplied on the command line, and then we use exec to replace the script's stdin with the output from cat.
...but that doesn't seem to be what you actually want. What you seem to really want is to pass the supplied filenames or the script's stdin as arguments to a command inside the script. This requires sort of the opposite process: converting the script's stdin into a file (actually a named pipe) whose name can be passed to the command. Like this:
if [[ $# -gt 0 ]]; then
command "$@" <<EOF
here-doc goes here
EOF
else
command <(cat) <<EOF
here-doc goes here
EOF
fi
This uses <() to launder the script's stdin through cat to a named pipe, which is then passed to command as an argument. Meanwhile, command's stdin is taken from the here-doc.
Now, I think that's what you want to do, but it's not quite what you've asked for, which is to both redirect the script's stdin from the supplied files and pass stdin to the command inside the script. This can be done by combining the above techniques:
if [ $# -gt 0 ]; then
exec < <(cat "$@")
fi
command <(cat) <<EOF
here-doc goes here
EOF
...although I can't think why you'd actually want to do this.
The Perl diamond operator essentially loops across all the command line arguments, treating each as a filename. It opens each file and reads them line-by-line. Here's some bash code that will do approximately the same.
for f in "$@"
do
    # Do something with $f, such as...
    cat "$f" | command1 | command2
    # -or-
    command1 < "$f"
    # -or-
    # Read $f line-by-line
    cat "$f" | while read line_from_f
    do
        # Do stuff with $line_from_f
        :
    done
done
You want to take the first argument and do something with it, and then either read from any files specified or stdin if no files?
Personally, I'd suggest using getopt to indicate arguments using the "-a value" syntax to help disambiguate, but that's just me. Here's how I'd do it in bash without getopts:
firstarg=${1?:usage: $0 arg [file1 .. fileN]}
shift
typeset -a files
if [[ ${#@} -gt 0 ]]
then
files=( "$@" )
else
files=( "/dev/stdin" )
fi
for file in "${files[@]}"
do
whatever_you_want < "$file"
done
The ?: operator will die if there are no args specified, since you seem to want at least one arg either way. After grabbing that, shift the args over by one, and then either use the remaining args as your file list, or the bash special filehandle "/dev/stdin" if there were no other args.
I think that the "if no files are specified, use /dev/stdin - otherwise use the files on the command line" piece is probably what you're looking for, but the rest of the code is at least useful for context.
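The "fall back to /dev/stdin when no files are given" idea can also be written without arrays, so it runs in plain sh. A minimal sketch, assuming the system provides /dev/stdin; the function name read_inputs and the "I read:" output are only illustrative.

```shell
# Hypothetical helper: read the named files line by line, or stdin if none are named.
read_inputs() {
    if [ "$#" -eq 0 ]; then
        set -- /dev/stdin          # no files given: treat stdin as the only "file"
    fi
    for f in "$@"; do
        while IFS= read -r line; do
            printf 'I read: %s\n' "$line"
        done < "$f"
    done
}

printf 'a\nb\n' | read_inputs
# prints:
# I read: a
# I read: b
```

Because each file is opened with its own redirection, the loop body is written once and works for both the stdin case and the file-list case.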
Also a little cheezy, but how about this:
if [[ $# -eq 0 ]]
then
    : # read from stdin here
else
    : # read from the files in "$@" (args) here
fi
If you need to read and process line-by-line (which is likely) and don't want to copy/paste the same code twice (which is likely), define a function in your script and just pass the lines one-by-one to this function, and process them in said function.
Why not use `cat $*` in the script? For example:
x=`cat $*`
echo $x