Perl's diamond operator: can it be done in bash? - perl

Is there an idiomatic way to simulate Perl's diamond operator in bash? With the diamond operator,
script.sh | ...
reads stdin for its input and
script.sh file1 file2 | ...
reads file1 and file2 for its input.
One other constraint is that I want to use the stdin in script.sh for something else other than input to my own script. The below code does what I want for the file1 file2 ... case above, but not for data provided on stdin.
command - $@ <<EOF
some_code_for_first_argument_of_command_here
EOF
I'd prefer a Bash solution but any Unix shell is OK.
Edit: for clarification, here is the content of script.sh:
#!/bin/bash
command - $@ <<EOF
some_code_for_first_argument_of_command_here
EOF
I want this to work the way the diamond operator would work in Perl, but it only handles filenames-as-arguments right now.
Edit 2: I can't do anything that goes
cat XXX | command
because the stdin for command is not the user's data. The stdin for command is my data in the here-doc. I would like the user data to come in on the stdin of my script, but it can't be the stdin of the call to command inside my script.

Sure, this is totally doable:
#!/bin/bash
cat "$@" | some_command_goes_here
Users can then call your script with no arguments (or '-') to read from stdin, or multiple files, all of which will be read.
If you want to process the contents of those files (say, line-by-line), you could do something like this:
cat "$@" | while IFS= read -r line; do
    echo "I read: $line"
done
Edit: Changed $* to "$@" to handle spaces in filenames, thanks to a helpful comment.

Kind of cheezy, but how about
cat file1 file2 | script.sh

I am (like everyone else, it seems) a bit confused about exactly what the goal is here, so I'll give three possible answers that may cover what you actually want. First, the relatively simple goal of getting the script to read from either a list of files (supplied on the command line) or from its regular stdin:
if [ $# -gt 0 ]; then
exec < <(cat "$@")
fi
# From this point on, the script's stdin is redirected from the files
# (if any) supplied on the command line
Note: the double-quoted use of $@ is the best way to avoid problems with funny characters (e.g. spaces) in filenames -- $* and unquoted $@ both mess this up. The <() trick I'm using here is a bash-only feature; it fires off cat in the background to feed data from files supplied on the command line, and then we use exec to replace the script's stdin with the output from cat.
...but that doesn't seem to be what you actually want. What you seem to really want is to pass the supplied filenames or the script's stdin as arguments to a command inside the script. This requires sort of the opposite process: converting the script's stdin into a file (actually a named pipe) whose name can be passed to the command. Like this:
if [[ $# -gt 0 ]]; then
command "$@" <<EOF
here-doc goes here
EOF
else
command <(cat) <<EOF
here-doc goes here
EOF
fi
This uses <() to launder the script's stdin through cat to a named pipe, which is then passed to command as an argument. Meanwhile, command's stdin is taken from the here-doc.
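As a rough, runnable sketch of the same idea, awk can stand in for command: its program arrives on stdin via the here-doc, while the user's data arrives as a file-name argument. The function name number_lines and the use of awk are assumptions made for illustration, and the sketch assumes bash and a system that provides /dev/stdin.

```shell
#!/bin/bash
# number_lines is a hypothetical stand-in for the script body.
# awk plays the role of "command": its program comes from stdin (the
# here-doc), and the data comes from the file names given as arguments.
number_lines() {
    if [ "$#" -gt 0 ]; then
        awk -f /dev/stdin "$@" <<'EOF'
{ print NR ": " $0 }
EOF
    else
        # No files given: <(cat) exposes the caller's stdin as a path.
        awk -f /dev/stdin <(cat) <<'EOF'
{ print NR ": " $0 }
EOF
    fi
}
```

Called as number_lines file1 file2 it numbers the lines of the files; called as some_producer | number_lines it numbers whatever arrives on the pipe.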
Now, I think that's what you want to do, but it's not quite what you've asked for, which is to both redirect the script's stdin from the supplied files and pass stdin to the command inside the script. This can be done by combining the above techniques:
if [ $# -gt 0 ]; then
exec < <(cat "$@")
fi
command <(cat) <<EOF
here-doc goes here
EOF
...although I can't think why you'd actually want to do this.

The Perl diamond operator essentially loops across all the command line arguments, treating each as a filename. It opens each file and reads them line-by-line. Here's some bash code that will do approximately the same.
for f in "$@"
do
    # Do something with "$f", such as...
    cat "$f" | command1 | command2
    # -or-
    command1 < "$f"
    # -or-
    # Read "$f" line-by-line
    while IFS= read -r line_from_f
    do
        : # Do stuff with "$line_from_f"
    done < "$f"
done

You want to take the first argument and do something with it, and then either read from any files specified or stdin if no files?
Personally, I'd suggest using getopt to indicate arguments using the "-a value" syntax to help disambiguate, but that's just me. Here's how I'd do it in bash without getopts:
firstarg=${1:?usage: $0 arg [file1 .. fileN]}
shift
typeset -a files
if [[ $# -gt 0 ]]
then
    files=( "$@" )
else
    files=( "/dev/stdin" )
fi
for file in "${files[@]}"
do
    whatever_you_want < "$file"
done
The :? expansion aborts the script with the usage message if no first argument is given, since you seem to want at least one arg either way. After grabbing that, shift the args over by one, and then either use the remaining args as your file list, or the special file "/dev/stdin" if there were no other args.
I think that the "if no files are specified, use /dev/stdin - otherwise use the files on the command line" piece is probably what you're looking for, but the rest of the code is at least useful for context.

Also a little cheezy, but how about this:
if [[ $# -eq 0 ]]
then
# read from stdin
else
# read from $* (args)
fi
If you need to read and process line-by-line (which is likely) and don't want to copy/paste the same code twice (which is likely), define a function in your script and just pass the lines one-by-one to this function, and process them in said function.
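A minimal sketch of that function-based approach, written for a POSIX shell. The names process_line and process_input are hypothetical, and the echo-style handler is only a placeholder for real per-line work.

```shell
#!/bin/sh
# process_line is a hypothetical per-line handler; replace its body as needed.
process_line() {
    printf 'I read: %s\n' "$1"
}

# Feed every line of the named files (or of stdin when no files are given)
# to process_line, one line at a time.
process_input() {
    if [ "$#" -eq 0 ]; then
        cat
    else
        cat -- "$@"
    fi | while IFS= read -r line; do
        process_line "$line"
    done
}
```

A script would typically end with process_input "$@", so the same code path serves both the stdin case and the file-argument case.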

Why not use `cat $*` in the script? For example:
x=`cat $*`
echo $x

Related

How can I force perl to process args ONLY from stdin and not from a file on command line?

If I have this inline command:
perl -pi -e 's/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g'
This simply takes each group of four hex digits and prefixes each pair of digits with \x. -i is used with no filenames on the command line, so it reads from STDIN. So for the params 0000 0776, the result is \x00\x00\x07\x76
I know that if -n or -p (with printing) is used, perl reads its input through the <> diamond operator. But I want to pass args only AFTER the command, and perl treats them as files to read. So how do I force -n or -p to treat the args after the command as regular args for the program, and not as files to read?
Also, I do not understand the role of -i here. If I did not include it, I would be adding args line after line (as <> does), but with -i, it takes all my args at once?
If there are no arguments (i.e., if @ARGV is empty), then your one-line script (which implicitly uses <>) will read input from STDIN. So the solution is to clear @ARGV at compile time.
perl -pi -e 'BEGIN{@ARGV=()}
s/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g'
Another solution: Force ARGV (the implicit file handle that the base <> operator reads from) to point to STDIN. This solution doesn't clobber your @ARGV, if any.
perl -pi -e 'BEGIN{*ARGV=*STDIN}
s/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g'
The -p option is equivalent to the following code:
LINE:
while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
-n is the same without the continue block. There's no way to change what it reads from (which is unfortunate, since <<>> and <STDIN> are both safer options), but it's pretty easy to replicate it with your modification (the error checking is rarely necessary here):
perl -e 'while (<STDIN>) { s/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g } continue { print }'

xargs pass multiple arguments to perl subroutine?

I know how to pipe multiple arguments with xargs:
echo a b | xargs -l bash -c 'echo "1:$0 2:$1"'
and I know how to pass the array of arguments to my perl module's subroutine from xargs:
echo a b | xargs --replace={} perl -I/home/me/module.pm -Mme -e 'me::someSub("{}")'
But I can't seem to get multiple individual arguments passed to perl using those dollar references (to satisfy the me::someSub signature):
echo a b | xargs -l perl -e 'print("$0 $1")'
Just prints:
-e
So how do I get the shell arguments: $0, $1 passed to my perl module's subroutine?
I know I could just delimit a;b so that the xargs {} could be processed by perl splitting it to get individual arguments, but I could also just completely process all STDIN with perl. Instead, my objective is to use perl -e so that I can explicitly call the subroutine I want (rather than having some pre-process in the script that figures out what subroutine to call and what arguments to use based on STDIN, to avoid script maintenance costs).
While bash's arguments are available as $@ and $0, $1, $2, etc., Perl's arguments are available via @ARGV. This means that the Perl equivalent of
echo a b | xargs -l bash -c 'echo "1:$0 2:$1"'
is
echo a b | xargs -l perl -e'CORE::say "1:$ARGV[0] 2:$ARGV[1]"'
That said, it doesn't make sense to use xargs in this way because there's no way to predict how many times it will call perl, and there's no way to predict how many arguments it will pass to perl each time. You have an XY Problem, and you haven't provided any information to help us. Maybe you're looking for
perl -e'CORE::say "1:$ARGV[0] 2:$ARGV[1]"' $( echo a b )
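If the goal is simply a fixed number of arguments per invocation, one option that is not taken from the answer above is xargs -n, which caps the number of arguments per call. The sketch below uses a hypothetical helper named pair_args and a child sh instead of perl, purely to show where the arguments land.

```shell
#!/bin/sh
# pair_args is a hypothetical helper: it reads whitespace-separated words
# from stdin and hands them to a child shell at most two at a time.
pair_args() {
    # The trailing "sh" fills $0, so the data lands in $1 and $2.
    xargs -n 2 sh -c 'printf "1:%s 2:%s\n" "$1" "$2"' sh
}
```

For example, printf 'a b\nc d\n' | pair_args runs the child shell twice, once per pair. With an odd number of words the last call receives only one argument, so $2 is empty there.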
I am not sure about the details of your design, so I take it that you need a Perl one-liner to use shell's variables that are seen in the scope in which it's called.
A perl -e'...' executes a Perl program given under ''. For any variables from the environment where this program runs -- a pipeline, or a shell script -- to be available to the program their values need be passed to it. Ways to do this with a one-liner are spelled out in this post, and here is a summary.
A Perl program receives arguments passed to it on the command-line in the @ARGV array. So you can invoke it in a pipeline as
... | perl -e'($v1, $v2) = @ARGV; ...' "$0" "$1"
or as
... | xargs -l perl -e'($v1, $v2) = @ARGV; ...'
if xargs is indeed used to feed the Perl program its input. In the first example the variables are quoted to protect possible interesting characters in them (spaces, *, etc) from being interpreted by the shell that sets up and runs the perl program.
If input contains multiple lines to process and the one-liner uses -n or -p for it then unpack arguments in a BEGIN block
... | perl -ne'BEGIN { ($v1, $v2) = splice(@ARGV,0,2) }; ...' "$0" "$1" ...
which runs at compile time, and so before the loop over input lines provided by -n/-p. The arguments other than filenames are now removed from @ARGV, leaving only the filenames there for -n/-p, in case input comes from files.
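A small runnable sketch of the BEGIN/splice pattern, wrapped in a shell function. The function name tag_lines and the output format are assumptions made for illustration; the sketch assumes perl is installed.

```shell
#!/bin/sh
# tag_lines is a hypothetical helper: its two arguments are plain values,
# and the lines to process arrive on stdin.
tag_lines() {
    # splice empties @ARGV in BEGIN, so the -n loop falls back to STDIN
    # instead of trying to open "$1" and "$2" as files.
    perl -ne 'BEGIN { ($v1, $v2) = splice(@ARGV, 0, 2) } print "$v1 $v2 | $_"' "$1" "$2"
}
```

For instance, printf 'x\ny\n' | tag_lines a b prefixes each input line with the two values rather than looking for files named a and b.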
There is also a rudimentary mechanism for command-line switches in a one-liner, via the -s switch. Please see the link above for details; I'd recommend @ARGV over this.
Finally, your calling code could set up environment variables which are then available to the Perl program in %ENV. However, that doesn't seem to be suitable to what you seem to want.
Also see this post for another example.

Passing a Variable to SED Command

What is the syntax to pass a variable to sed command that updates the second column in a CSV file. The variable name is $tag
This is the command I have used but I don't know where to put the variable exactly.
basename "$dec" | sed 's/.*/&,A/' >> home/kelsabry/Downloads/Tests/results.csv
where $dec is a variable that holds a certain directory.
Output:
Downloads, A
Documents, A
etc.
My command to pass the variable into sed to update the second column was:
basename "$dec" | sed 's/.*/&,'$tag'/' >> home/kelsabry/Downloads/Tests/results.csv
but it gave me this output:
Downloads, '$tag'
Documents, '$tag'
etc.
So, where should I write the variable $tag in sed command?
Unfortunately, sed is neither aware of fields nor capable of accepting variables. For that, you'd use shell or awk or some other language.
sed is a Stream EDitor and in your example is taking input from stdin, not a variable.
If you do want to embed shell variables inside a sed script, understand that you are basically creating your sed script on-the-fly, and it's important to make sure you do it safely.
For example, if there's the possibility that your $tag variable might contain something that will cause misinterpretation of the sed script (i.e. perhaps it came from user input),
you need protection. In POSIX shell, perhaps something like this:
if [ "$tag" != "${tag#*[!A-Z]}" ]; then
printf 'ERROR: invalid tag\n' >&2
exit 1
fi
or even:
case "$tag" in
[A-Z]) : ;;
*) printf 'ERROR: invalid tag\n' >&2; exit 1 ;;
esac
then
# Note the alternative to `basename`
echo "${dec##*/}" | sed 's/$/,'"$tag"'/' >> path/to/file.csv
Note that sed doesn't know anything about fields or CSV. sed is simply being used to append a string on to the end of the line.
Of course, in csh (which perhaps shouldn't be used for scripted automation), you are missing the more useful parameter expansion tools, but you can still protect yourself in other ways:
if ( $%tag == 1 ) then
switch ($tag)
case [A-Z]:
printf '%s,%s\n' `basename "$dec"` "$tag"
breaksw
default:
printf 'ERROR: invalid tag\n'
exit 1
breaksw
endsw
else
printf 'ERROR: invalid tag\n'
exit 1
endif
(Note: this is untested. Mileage varies based on multiple conditions. May contain nuts.)
The issue you listed in your question was a quoting problem. You said: sed 's/.*/&,'$tag'/' >.
An alternative might be to use awk:
echo "${dec##*/}" | awk -v tag="$tag" '{print $0 OFS tag}' OFS=, >> path/to/file.csv
Awk is a more complete programming language, and supports named variables, unlike sed. The -v option allows you to pre-load an awk variable with the contents of a shell variable.
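A minimal sketch of the awk variant wrapped in a function, to show that characters which would be special inside a sed script (such as / or &) pass through untouched. The name append_field is hypothetical. One caveat: awk does interpret backslash escapes in values given with -v, so a tag containing backslashes would still need care.

```shell
#!/bin/sh
# append_field is a hypothetical helper: print the last path component of $1
# followed by a comma and the literal text of $2.
append_field() {
    dir=$1
    printf '%s\n' "${dir##*/}" |
        awk -v tag="$2" 'BEGIN { OFS = "," } { print $0, tag }'
}
```

Usage would look like append_field "$dec" "$tag" >> path/to/file.csv.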
CSH is considered harmful by some. I'd recommend doing this in a POSIX shell instead, if only to take advantage of the much larger pool of experts who can help with your scripting questions. :)

< operator in UNIX, passing to Perl script

When evaluating if(-t STDIN), does the < UNIX operator count as STDIN? If not, how do I get that data?
So someone types perl example.pl < testing.txt. This doesn't behave like data piped in via ls | ./example.pl. How can I get that behavior?
Test -p STDIN, which checks if the filehandle STDIN is attached to a pipe.
touch foo
perl -e 'print -p STDIN' < foo # nothing
cat foo | perl -e 'print -p STDIN' # 1
But I'm not sure I understand your question. In all three of these cases
1. perl -e 'print $_=<STDIN>' < <(echo foo)
2. echo foo | perl -e 'print $_=<STDIN>'
3. perl -e 'print $_=<STDIN>' # then type "foo\n" to the console
the inputs are the same and all accessible through the STDIN filehandle. In the first two cases, -t STDIN will evaluate to false, and in the second case, -p STDIN will be true.
The differences in behavior between these three cases are subtle, and usually not important. The third case, obviously, will wait until at least one line of input (terminated with "\n" or EOF) is received. The difference between the first two cases is even more subtle. When the input to your program is piped from the output of another process, you are somewhat at the mercy of that first process with respect to latency or whether that program buffers its output.
Maybe you could expand on what you mean when you say
perl example.pl < testing.txt
doesn't behave like
ls | ./example.pl
-t tests whether or not STDIN is attached to a tty.
When you pipe data to perl, it will not be attached to a tty. This should not depend on the mechanism you use to pipe (ie, whether you pipe a command using | or pipe a file using <.) However, you will have a tty attached when you run the program directly. Given the following example:
#!/usr/bin/perl
print ((-t STDIN) ? "is a tty\n" : "is not a tty\n");
You would expect the following output:
% perl ./ttytest.pl
is a tty
% perl ./ttytest.pl < somefile
is not a tty
% ls | perl ./ttytest.pl
is not a tty
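The same distinctions can be probed from a shell script with the test builtin. This is a sketch rather than a translation of the Perl above: stdin_kind is a hypothetical name, and the pipe and file checks assume a system that provides /dev/stdin (Linux and macOS do).

```shell
#!/bin/sh
# stdin_kind is a hypothetical helper that reports what stdin is attached to.
stdin_kind() {
    if [ -t 0 ]; then
        echo tty          # interactive terminal, like Perl's -t STDIN
    elif [ -p /dev/stdin ]; then
        echo pipe         # cmd | script, like Perl's -p STDIN
    elif [ -f /dev/stdin ]; then
        echo file         # script < somefile
    else
        echo other        # e.g. /dev/null or a socket
    fi
}
```

Both the piped and the redirected-file cases report "not a tty"; they differ only in the second and third checks.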

perl -pe to manipulate filenames

I was trying to do some quick filename cleanup at the shell (zsh, if it matters). Renaming files. (I'm using cp instead of mv just to be safe)
foreach f (\#*.ogg)
cp $f `echo $f | perl -pe 's/\#\d+ (.+)$/"\1"/'`
end
Now, I know there are tools to do stuff like this, but for personal interest I'm wondering how I can do it this way. Right now, I get an error:
cp: target `When.ogg"' is not a directory
Where 'When.ogg' is the last part of the filename. I've tried adding quotes (see above) and escaping the spaces, but nonetheless this is what I get.
Is there a reason I can't use the output of a perl one-liner as the final argument to another command line tool?
It looks like you have a space in the file names being processed, so each of your cp command lines evaluates to something like
cp \#nnnn When.Ogg When.ogg
When the cp command sees more than two arguments, the last one must be a target directory name for all the files to be copied to - hence the error message. Because your source filename ($f) contains a space it is being treated as two arguments - cp sees three args, rather than the two you intend.
If you put double quotes around the first $f that should prevent the two 'halves' of the name from being treated as separate file names:
cp "$f" `echo ...
This is what you need in bash, hope it's good for zsh too.
cp "$f" "`echo $f | perl -pe 's/\#\d+ (.+)$/\1/'`"
If the filename contains spaces, you also have to quote the second argument of cp.
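Putting the quoting advice together, here is a sketch in POSIX sh that copies every "#<digits> name" file to "name". It swaps the perl substitution for an equivalent sed expression and uses $(...) instead of backticks; the function name copy_without_prefix is hypothetical.

```shell
#!/bin/sh
# copy_without_prefix is a hypothetical helper: for every "#<digits> name"
# file in directory $1, copy it to "name" in the same directory.
copy_without_prefix() {
    for f in "$1"/\#*; do
        [ -e "$f" ] || continue            # the glob matched nothing
        base=${f##*/}
        new=$(printf '%s\n' "$base" | sed 's/^#[0-9][0-9]* //')
        [ "$new" != "$base" ] || continue  # name does not fit the pattern
        cp -- "$f" "$1/$new"               # both arguments are quoted
    done
}
```

Because both cp arguments are double-quoted, a name such as "#12 When I Go.ogg" stays a single argument on each side.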
I often use
dir /b ... | perl -nle"$o=$_; s/.../.../; $n=$_; rename $o,$n if !-e $n"
The -l chomps the input.
The -e check is to avoid accidentally renaming all the files to one name. I've done that a couple of times.
In zsh (the shell used in the question; the foreach loop below is not bash syntax), that would be
foreach f (...)
echo "$f" | perl -nle'$o=$_; s/.../.../; $n=$_; rename $o,$n if !-e $n'
end
or
find -name '...' -maxdepth 1 \
| perl -nle'$o=$_; s/.../.../; $n=$_; rename $o,$n if !-e $n'
or
find -name '...' -maxdepth 1 -exec \
perl -e'for (@ARGV) {
$o=$_; s/.../.../; $n=$_;
rename $o,$n if !-e $n;
}' {} +
The last supports file names with newlines in them.