xargs pass multiple arguments to perl subroutine? - perl

I know how to pipe multiple arguments with xargs:
echo a b | xargs -l bash -c '1:$0 2:$1'
and I know how to pass the array of arguments to my perl module's subroutine from xargs:
echo a b | xargs --replace={} perl -I/home/me/module.pm -Mme -e 'me::someSub("{}")'
But I can't seem to get multiple individual arguments passed to perl using those dollar references (to satisfy the me::someSub signature):
echo a b | xargs -l perl -e 'print("$0 $1")'
Just prints:
-e
So how do I get the shell arguments: $0, $1 passed to my perl module's subroutine?
I know I could just delimit a;b so that the xarg {} could be processed by perl splitting it to get individual arguments), but I could also just completely process all STDIN with perl. Instead, my objective is to use perl -e so that I can explicitly call the subroutine I want (rather than having some pre-process in the script that figures out what subroutine to call and what arguments to use based on STDIN, to avoid script maintenance costs).

While bash's argument are available as $# and $0, $1, $2, etc, Perl's arguments are available via #ARGV. This means that the Perl equivalent of
echo a b | xargs -l bash -c 'echo "1:$0 2:$1"'
is
echo a b | xargs -l perl -e'CORE::say "1:$ARGV[0] 2:$ARGV[1]"'
That said, it doesn't make sense to use xargs in this way because there's no way to predict how many times it will call perl, and there's no way to predict how many arguments it will pass to perl each time. You have an XY Problem, and you haven't provided any information to help us. Maybe you're looking for
perl -e'CORE::say "1:$ARGV[0] 2:$ARGV[1]"' $( echo a b )

I am not sure about the details of your design, so I take it that you need a Perl one-liner to use shell's variables that are seen in the scope in which it's called.
A perl -e'...' executes a Perl program given under ''. For any variables from the environment where this program runs -- a pipeline, or a shell script -- to be available to the program their values need be passed to it. Ways to do this with a one-liner are spelled out in this post, and here is a summary.
A Perl program receives arguments passed to it on the command-line in #ARGV array. So you can invoke it in a pipeline as
... | perl -e'($v1, $v2) = #ARGV; ...' "$0" "$1"
or as
... | xargs -l perl -e'($v1, $v2) = #ARGV; ...'
if xargs is indeed used to feed the Perl program its input. In the first example the variables are quoted to protect possible interesting characters in them (spaces, *, etc) from being interpreted by the shell that sets up and runs the perl program.
If input contains multiple lines to process and the one-liner uses -n or -p for it then unpack arguments in a BEGIN block
... | perl -ne'BEGIN { ($v1, $v2) = splice(#ARGV,0,2) }; ...' "$0" "$1" ...
which runs at compile time, so before the loop over input lines provided by -n/-p. The arguments other than filenames are now removed from #ARGV, so to leave only the filenames there for -n/-p, in case input comes from files.
There is also a rudimentary mechanism for command-line switches in a one-liner, via the -s switch. Please see the link above for details; I'd recommend #ARGV over this.
Finally, your calling code could set up environment variables which are then available to the Perl progam in %ENV. However, that doesn't seem to be suitable to what you seem to want.
Also see this post for another example.

Related

Force Perl to stop special processing of command line arguments if -e eval switch is used

perl -e 'print(123, #ARGV);' a b
# 123ab
perl -e 'print(123, #ARGV);' --help
# prints Perl's help instead
This is a toy example demonstrating the problem. In my real use-case I'm using -e to execute a large script from an embedded interpreter using perl_parse(...) function, the script has its own processing of --help switch, so I'd like to block any special processing of command line arguments after -e.
Is it possible?
Use a double hyphen to stop argument processing:
$ perl -e'print "[#ARGV]\n"' -- --help
[--help]
$

Passing a Variable to SED Command

What is the syntax to pass a variable to sed command that updates the second column in a CSV file. The variable name is $tag
This is the command I have used but I don't know where to put the variable exactly.
basename "$dec" | sed 's/.*/&,A/' >> home/kelsabry/Downloads/Tests/results.csv
where $decis variable that returns to me a certain directory.
Output:
Downloads, A
Documents, A
etc.
My command to pass the variable into sed to update the second column was:
basename "$dec" | sed 's/.*/&,'$tag'/' >> home/kelsabry/Downloads/Tests/results.csv
but it gave me this output:
Downloads, '$tag'
Documents, '$tag'
etc.
So, where should I write the variable $tag in sed command?
Unfortunately, sed is neither aware of fields nor capable of accepting variables. For that, you'd use shell or awk or shell or some other language.
sed is a Stream EDitor and in your example is taking input from stdin, not a variable.
If you do want to embed shell variables inside a sed script, understand that you are basically creating your sed script on-the-fly, and it's important to make sure you do it safely.
For example, if there's the possibility that your $tag variable might contain something that will cause misinterpretation of the sed script (i.e. perhaps it came from user input),
you need protection. In POSIX shell, perhaps something like this:
if [ "$tag" != "${tag#*[!A-Z]}" ]; then
printf 'ERROR: invalid tag\n' >&2
exit 1
fi
or even:
case "$tag" in
[A-Z]) : ;;
*) printf 'ERROR: invalid tag\n' >&2; exit 1 ;;
esac
then
# Note the alternative to `basename`
echo "${dec##*/}" | sed 's/$/,'"$tag"'/' >> path/to/file.csv
Note that sed doesn't know anything about fields or CSV. sed is simply being used to append a string on to the end of the line.
Of course, in csh (which perhaps shouldn't be used for scripted automation), you are missing the more useful parameter expansion tools, but you can still protect yourself in other ways:
if ( $%tag == 1 ) then
switch ($tag)
case [A-Z]:
printf '%s,%s\n' `basename "$dec"` "$tag"
breaksw
default:
printf 'ERROR: invalid tag\n'
exit 1
breaksw
endsw
else
printf 'ERROR: invalid tag\n'
exit 1
endif
(Note: this is untested. Mileage varies based on multiple conditions. May contain nuts.)
The issue you listed in your question was a quoting problem. You said: sed 's/.*/&,'$tag'/' >.
An alternative might be to use awk:
echo "${dec##*/}" | awk -v tag="$tag" '{print $0 OFS tag}' OFS=, >> path/to/file.csv
Awk is a more complete programming language, and supports named variables, unlike sed. The -v option allows you to pre-load an awk variable with the contents of a shell variable.
CSH is considered harmful by some. I'd recommend doing this in a POSIX shell instead, if only to take advantage of the much larger pool of experts who can help with your scripting questions. :)

Output of perl debugger to file (Windows)

I tried to list all the subroutines of a script with the perl debugger and put the results in an external file. But It didn't work.
My code:
perl -d -S myscript.pl > results.txt
-S = list all subroutines
-d = debug perl script
Greets,
The -S isn't supposed to be used as a command line switch. Running perl -d will start a debugger process, and one of the commands you can use there is S.
Example:
$ perl -d tmp/splithttpdconf.pl
Loading DB routines from perl5db.pl version 1.28
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(tmp/splithttpdconf.pl:6): my $basedir = shift;
DB<1> S main::
main::BEGIN
main::debug
main::splitconf
DB<2>
In order to get the kind of output you want, you can to use the profiler module Devel::DProf instead. It'll output profiler info into a file which can be read by the dprofpp program. Here's an example to get the list of subroutines:
perl -d:DProf perlscript.pl; dprofpp -T
If you only want the subroutines within your own script, and not those loaded from other modules, add a grep to it, e.g.:
perl -d:DProf perlscript.pl; dprofpp -T | grep main::
Though for the particular question of knowing what subroutines exist in a given program, provided you use a consistent coding style it'd probably be easier to just do a grep "sub.*{" to start with.
In your home directory, create a file called .perldb with the following contents:
parse_options("NonStop=1 LineInfo=results.txt AutoTrace=1 frame=2");
And then run the command
perl -d myscript.pl
If you want to scan and list the entire subroutine's what Perl see's before it runs:
perl -MO=Deparse -f myscript.pl

< operator in UNIX, passing to Perl script

When evaluating if(-t STDIN), does the < UNIX operator count as STDIN? If not, how do I get that data?
So someone types perl example.pl < testing.txt. This doesn't behave like data piped in via ls | ./example.pl. How can I get that behavior?
Test -p STDIN, which checks if the filehandle STDIN is attached to a pipe.
touch foo
perl -e 'print -p STDIN' < foo # nothing
cat foo | perl -e 'print -p STDIN' # 1
But I'm not sure I understand your question. In all three of these cases
1. perl -e 'print $_=<STDIN>' < <(echo foo)
2. echo foo | perl -e 'print $_=<STDIN>'
3. perl -e 'print $_=<STDIN>' # then type "foo\n" to the console
the inputs are the same and all accessible through the STDIN filehandle. In the first two cases, -t STDIN will evaluate to false, and in the second case, -p STDIN will be true.
The differences in behavior between these three cases are subtle, and usually not important. The third case, obviously, will wait until at least one line of input (terminated with "\n" or EOF) is received. The difference between the first two cases is even more subtle. When the input to your program is piped from the output of another process, you are somewhat at the mercy of that first process with respect to latency or whether that program buffers its output.
Maybe you could expand on what you mean when you say
perl example.pl < testing.txt
doesn't behave like
ls | ./example.pl
-t tests whether or not STDIN is attached to a tty.
When you pipe data to perl, it will not be attached to a tty. This should not depend on the mechanism you use to pipe (ie, whether you pipe a command using | or pipe a file using <.) However, you will have a tty attached when you run the program directly. Given the following example:
#!/usr/bin/perl
print ((-t STDIN) ? "is a tty\n" : "is not a tty\n");
You would expect the following output:
% perl ./ttytest.pl
is a tty
% perl ./ttytest.pl < somefile
is not a tty
% ls | perl ./ttytest.pl
is not a tty

Perl's diamond operator: can it be done in bash?

Is there an idiomatic way to simulate Perl's diamond operator in bash? With the diamond operator,
script.sh | ...
reads stdin for its input and
script.sh file1 file2 | ...
reads file1 and file2 for its input.
One other constraint is that I want to use the stdin in script.sh for something else other than input to my own script. The below code does what I want for the file1 file2 ... case above, but not for data provided on stdin.
command - $# <<EOF
some_code_for_first_argument_of_command_here
EOF
I'd prefer a Bash solution but any Unix shell is OK.
Edit: for clarification, here is the content of script.sh:
#!/bin/bash
command - $# <<EOF
some_code_for_first_argument_of_command_here
EOF
I want this to work the way the diamond operator would work in Perl, but it only handles filenames-as-arguments right now.
Edit 2: I can't do anything that goes
cat XXX | command
because the stdin for command is not the user's data. The stdin for command is my data in the here-doc. I would like the user data to come in on the stdin of my script, but it can't be the stdin of the call to command inside my script.
Sure, this is totally doable:
#!/bin/bash
cat $# | some_command_goes_here
Users can then call your script with no arguments (or '-') to read from stdin, or multiple files, all of which will be read.
If you want to process the contents of those files (say, line-by-line), you could do something like this:
for line in $(cat $#); do
echo "I read: $line"
done
Edit: Changed $* to $# to handle spaces in filenames, thanks to a helpful comment.
Kind of cheezy, but how about
cat file1 file2 | script.sh
I am (like everyone else, it seems) a bit confused about exactly what the goal is here, so I'll give three possible answers that may cover what you actually want. First, the relatively simple goal of getting the script to read from either a list of files (supplied on the command line) or from its regular stdin:
if [ $# -gt 0 ]; then
exec < <(cat "$#")
fi
# From this point on, the script's stdin is redirected from the files
# (if any) supplied on the command line
Note: the double-quoted use of $# is the best way to avoid problems with funny characters (e.g. spaces) in filenames -- $* and unquoted $# both mess this up. The <() trick I'm using here is a bash-only feature; it fires off cat in the background to feed data from files supplied on the command line, and then we use exec to replace the script's stdin with the output from cat.
...but that doesn't seem to be what you actually want. What you seem to really want is to pass the supplied filenames or the script's stdin as arguments to a command inside the script. This requires sort of the opposite process: converting the script's stdin into a file (actually a named pipe) whose name can be passed to the command. Like this:
if [[ $# -gt 0 ]]; then
command "$#" <<EOF
here-doc goes here
EOF
else
command <(cat) <<EOF
here-doc goes here
EOF
fi
This uses <() to launder the script's stdin through cat to a named pipe, which is then passed to command as an argument. Meanwhile, command's stdin is taken from the here-doc.
Now, I think that's what you want to do, but it's not quite what you've asked for, which is to both redirect the script's stdin from the supplied files and pass stdin to the command inside the script. This can be done by combining the above techniques:
if [ $# -gt 0 ]; then
exec < <(cat "$#")
fi
command <(cat) <<EOF
here-doc goes here
EOF
...although I can't think why you'd actually want to do this.
The Perl diamond operator essentially loops across all the command line arguments, treating each as a filename. It opens each file and reads them line-by-line. Here's some bash code that will do approximately the same.
for f in "$#"
do
# Do something with $f, such as...
cat $f | command1 | command2
-or-
command1 < $f
-or-
# Read $f line-by-line
cat $f | while read line_from_f
do
# Do stuff with $line_from_f
done
done
You want to take the first argument and do something with it, and then either read from any files specified or stdin if no files?
Personally, I'd suggest using getopt to indicate arguments using the "-a value" syntax to help disambiguate, but that's just me. Here's how I'd do it in bash without getopts:
firstarg=${1?:usage: $0 arg [file1 .. fileN]}
shift
typeset -a files
if [[ ${##} -gt 0 ]]
then
files=( "$#" )
else
files=( "/dev/stdin" )
fi
for file in "${files[#]}"
do
whatever_you_want < "$file"
done
The ?: operator will die if there are no args specified, since you seem to want at least one arg either way. After grabbing that, shift the args over by one, and then either use the remaining args as your file list, or the bash special filehandle "/dev/stdin" if there were no other args.
I think that the "if no files are specified, use /dev/stdin - otherwise use the files on the command line" piece is probably what you're looking for, but the rest of the code is at least useful for context.
Also a little cheezy, but how about this:
if [[ $# -eq 0 ]]
then
# read from stdin
else
# read from $* (args)
fi
If you need to read and process line-by-line (which is likely) and don't want to copy/paste the same code twice (which is likely), define a function in your script and just pass the lines one-by-one to this function, and process them in said function.
Why not use ``cat #* in the script? For example:
x=`cat $*`
echo $x