What is the default `perl` print target?

I was assuming that print "foo" is just a shortcut for print STDOUT "foo".
However I noticed that (at least) in combination with the -i switch this assumption no longer holds:
perl -ni -e 'print $_' file
this simply does not change the content of the file.
perl -ni -e 'print STDOUT $_' file
This however prints the content to the terminal and leaves the file empty.
Therefore the question: What is the default target of print? I.e. where does the first print print to?
perldoc -f print says:
To set the default output handle to something other than STDOUT, use the select operation.
Obviously perl internally used the select operation to set the default output handle to something else. Is there any way to explicitly specify the current default output handle?
This would allow me to write something like
perl -i -wne 'print {/b/ ? STDOUT : XXX } $_' file
to build a grep which removes all printed lines.

The default filehandle for print is controlled by the select function. It defaults to STDOUT, but can be changed at any time.
To quote the documentation:
select FILEHANDLE
select
Returns the currently selected filehandle. If FILEHANDLE is supplied, sets the new current default filehandle for output. This has
two effects: first, a write or a print without a filehandle default to
this FILEHANDLE. Second, references to variables related to output
will refer to this output channel.
If you look at the documentation for the -i option, you'll see that the expansion includes the line select(ARGVOUT). That's what causes output to go back to the file you're editing.
ARGVOUT is special only during -i processing, but it does work there:
$ cat foo
a1
b1
a2
b2
$ perl -i -wne 'print {/b/ ? STDOUT : ARGVOUT } $_' foo
b1
b2
$ cat foo
a1
a2
You can also use perl -i -wne 'print {/b/ ? STDOUT : select } $_' foo since ARGVOUT will be the currently selected filehandle.

It prints to the currently selected handle, which is STDOUT by default. You can change the selected handle yourself using the one-argument form of select, and in-place editing mode (the -i flag, or $^I) automatically selects the destination file for you (see the description of -i in perlrun for code equivalent to what -i does).

Related

How can I force perl to process args ONLY from stdin and not from a file on command line?

If I have this inline command:
perl -pi -e 's/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g'
This simply takes four-digit hex groups and prefixes each byte with '\x'. -i used with no filenames on the command line, reading from STDIN. So for the params 0000 0776, the result is \x00\x00\x07\x76
I know that if -n or -p (with printing) is used, perl reads from the <> diamond operator. But I want to pass args only AFTER the command, and perl treats them as files to read. So how do I force -n or -p to treat args after the command as regular args for <> in the program, and not as files to read?
Also, I do not understand the role of -i here. If I did not include it, then I would be adding args line after line (as <> does), but with -i, it takes all my args at once?
If there are no arguments (i.e., if @ARGV is empty), then your one-line script (which implicitly uses <>) will read input from STDIN. So the solution is to clear @ARGV at compile time.
perl -pi -e 'BEGIN{@ARGV=()}
s/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g'
Another solution: Force ARGV (the implicit file handle that the base <> operator reads from) to point to STDIN. This solution doesn't clobber your @ARGV, if any.
perl -pi -e 'BEGIN{*ARGV=*STDIN}
s/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g'
The -p option is equivalent to the following code:
LINE:
while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
-n is the same without the continue block. There's no way to change what it reads from (which is unfortunate, since <<>> and <STDIN> are both safer options), but it's pretty easy to replicate it with your modification (the error checking is rarely necessary here):
perl -e 'while (<STDIN>) { s/([\da-f]{2})([\da-f]{2})\s?/\\x$1\\x$2\t/g } continue { print }'

Perl in command line: perl -p -i -e "some text" /path

I am not familiar with perl. I am reading an installation guide atm and the following Linux command has come up:
perl -p -i -e "s/enforcing/disabled/" /etc/selinux/config
Now, I am trying to understand this. Here is my understanding so far:
-e simply allows for executing whatever follows
-p puts my commands that follow -e in a loop. Now this is strange to me, as to me this command seems to be trying to say: Write "s/enforcing/disabled/" into /etc/selinux/config. Then again, where is the "write" command? And what is this -i (inline) good for?
-p changes
s/enforcing/disabled/
to something equivalent to
while (<>) {
s/enforcing/disabled/;
print;
}
which is short for
while (defined( $_ = <ARGV> )) {
$_ =~ s/enforcing/disabled/;
print($_);
}
What this does:
It reads a line from ARGV into $_. ARGV is a special file handle that reads from each of the files specified as arguments (or STDIN if no files are provided).
If EOF has been reached, the loop and therefore the program exits.
It replaces the first occurrence of enforcing with disabled.
It prints out the modified line to the default output handle. Because of -i, this is a handle to a new file with the same name as the one from which the program is currently reading.*
Repeat.
For example,
$ cat a
foo
bar enforcing the law
baz
enforcing enforcing
$ perl -pe's/enforcing/disabled/' -i a
$ cat a
foo
bar disabled the law
baz
disabled enforcing
* In old versions of Perl, the old file has already been deleted at this point, but it's still accessible as long as there's an open file handle to it. In very new versions of Perl, this writes to a temporary file that will later overwrite the file from which the program is reading.
To find out exactly what Perl is going to do, you can use the O module
perl -MO=Deparse -p -i -e "s/enforcing/disabled/" file
outputs
BEGIN { $^I = ""; }
LINE: while (defined($_ = readline ARGV)) {
s/enforcing/disabled/;
}
continue {
die "-p destination: $!\n" unless print $_;
}
-e syntax OK

perl one-liner to keep only desired lines

I have a text file (input.txt) like this:
NP_414685.4: 15-26, 131-138, 441-465
NP_418580.2: 493-500
NP_418780.2: 36-48, 44-66
NP_418345.2:
NP_418473.3: 1-19, 567-1093
NP_418398.2:
I want a perl one-liner that keeps only those lines in file where ":" is followed by number range (that means, here, the lines containing "NP_418345.2:" and "NP_418398.2:" get deleted). For this I have tried:
perl -ni -e "print unless /: \d/" -pi.bak input.txt del input.txt.bak
But it shows exactly same output as the input file.
What will be the exact pattern that I can match here?
Thanks
First, print unless means print if not -- opposite to what you want.
More to the point, it doesn't make sense to use both -n and -p; when you do, -p overrides -n. While both of them open the input file(s) and set up the loop over lines, -p also prints $_ for every iteration. So with it you are reprinting every line. See perlrun.
Finally, you seem to be deleting the .bak file ... ? Then don't make it. Use just -i
Altogether
perl -i -ne 'print if /:\s*\d+\s*-\s*\d+/' input.txt
If you do want to keep the backup file use -i.bak instead of -i
You can see the code equivalent to a one-liner with particular options with B::Deparse (via O module)
Try: perl -MO=Deparse -ne 1 and perl -MO=Deparse -pe 1
This way:
perl -i.bak -ne 'print if /:\s+\d+-\d/' input.txt
This:
perl -ne 'print if /:\s*(\d+\s*-\s*\d+\s*,?\s*)+\s*$/' input.txt
Prints:
NP_414685.4: 15-26, 131-138, 441-465
NP_418580.2: 493-500
NP_418780.2: 36-48, 44-66
NP_418473.3: 1-19, 567-1093
I'm not sure if you want to match lines that are possibly like this:
NP_418580.2: 493-500, asdf
or this:
NP_418580.2: asdf
This answer will not print these lines, if given to it.

< operator in UNIX, passing to Perl script

When evaluating if(-t STDIN), does the < UNIX operator count as STDIN? If not, how do I get that data?
So someone types perl example.pl < testing.txt. This doesn't behave like data piped in via ls | ./example.pl. How can I get that behavior?
Test -p STDIN, which checks if the filehandle STDIN is attached to a pipe.
touch foo
perl -e 'print -p STDIN' < foo # nothing
cat foo | perl -e 'print -p STDIN' # 1
But I'm not sure I understand your question. In all three of these cases
1. perl -e 'print $_=<STDIN>' < <(echo foo)
2. echo foo | perl -e 'print $_=<STDIN>'
3. perl -e 'print $_=<STDIN>' # then type "foo\n" to the console
the inputs are the same and all accessible through the STDIN filehandle. In the first two cases, -t STDIN will evaluate to false, and in the second case, -p STDIN will be true.
The differences in behavior between these three cases are subtle, and usually not important. The third case, obviously, will wait until at least one line of input (terminated with "\n" or EOF) is received. The difference between the first two cases is even more subtle. When the input to your program is piped from the output of another process, you are somewhat at the mercy of that first process with respect to latency or whether that program buffers its output.
Maybe you could expand on what you mean when you say
perl example.pl < testing.txt
doesn't behave like
ls | ./example.pl
-t tests whether or not STDIN is attached to a tty.
When you pipe data to perl, it will not be attached to a tty. This should not depend on the mechanism you use to pipe (ie, whether you pipe a command using | or pipe a file using <.) However, you will have a tty attached when you run the program directly. Given the following example:
#!/usr/bin/perl
print ((-t STDIN) ? "is a tty\n" : "is not a tty\n");
You would expect the following output:
% perl ./ttytest.pl
is a tty
% perl ./ttytest.pl < somefile
is not a tty
% ls | perl ./ttytest.pl
is not a tty

Perl's diamond operator: can it be done in bash?

Is there an idiomatic way to simulate Perl's diamond operator in bash? With the diamond operator,
script.sh | ...
reads stdin for its input and
script.sh file1 file2 | ...
reads file1 and file2 for its input.
One other constraint is that I want to use the stdin in script.sh for something else other than input to my own script. The below code does what I want for the file1 file2 ... case above, but not for data provided on stdin.
command - $@ <<EOF
some_code_for_first_argument_of_command_here
EOF
I'd prefer a Bash solution but any Unix shell is OK.
Edit: for clarification, here is the content of script.sh:
#!/bin/bash
command - $@ <<EOF
some_code_for_first_argument_of_command_here
EOF
I want this to work the way the diamond operator would work in Perl, but it only handles filenames-as-arguments right now.
Edit 2: I can't do anything that goes
cat XXX | command
because the stdin for command is not the user's data. The stdin for command is my data in the here-doc. I would like the user data to come in on the stdin of my script, but it can't be the stdin of the call to command inside my script.
Sure, this is totally doable:
#!/bin/bash
cat "$@" | some_command_goes_here
Users can then call your script with no arguments (or '-') to read from stdin, or multiple files, all of which will be read.
If you want to process the contents of those files (say, line-by-line), you could do something like this:
cat "$@" | while IFS= read -r line; do
echo "I read: $line"
done
Edit: Changed $* to $@ to handle spaces in filenames, thanks to a helpful comment.
Kind of cheezy, but how about
cat file1 file2 | script.sh
I am (like everyone else, it seems) a bit confused about exactly what the goal is here, so I'll give three possible answers that may cover what you actually want. First, the relatively simple goal of getting the script to read from either a list of files (supplied on the command line) or from its regular stdin:
if [ $# -gt 0 ]; then
exec < <(cat "$@")
fi
# From this point on, the script's stdin is redirected from the files
# (if any) supplied on the command line
Note: the double-quoted use of $@ is the best way to avoid problems with funny characters (e.g. spaces) in filenames -- $* and unquoted $@ both mess this up. The <() trick I'm using here is a bash-only feature; it fires off cat in the background to feed data from files supplied on the command line, and then we use exec to replace the script's stdin with the output from cat.
...but that doesn't seem to be what you actually want. What you seem to really want is to pass the supplied filenames or the script's stdin as arguments to a command inside the script. This requires sort of the opposite process: converting the script's stdin into a file (actually a named pipe) whose name can be passed to the command. Like this:
if [[ $# -gt 0 ]]; then
command "$@" <<EOF
here-doc goes here
EOF
else
command <(cat) <<EOF
here-doc goes here
EOF
fi
This uses <() to launder the script's stdin through cat to a named pipe, which is then passed to command as an argument. Meanwhile, command's stdin is taken from the here-doc.
Now, I think that's what you want to do, but it's not quite what you've asked for, which is to both redirect the script's stdin from the supplied files and pass stdin to the command inside the script. This can be done by combining the above techniques:
if [ $# -gt 0 ]; then
exec < <(cat "$@")
fi
command <(cat) <<EOF
here-doc goes here
EOF
...although I can't think why you'd actually want to do this.
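The <() form above is bash-specific. A rough alternative sketch, assuming a system that provides /dev/fd (Linux and macOS do), is to duplicate the script's stdin onto another descriptor before the here-doc takes over descriptor 0. This is a different technique from the one in the answer, shown only for comparison. Here show_inputs is a made-up stand-in for command: it reads one line from its own stdin and then the files named as its arguments.

```shell
# Hypothetical stand-in for "command": its stdin supplies a label,
# and its arguments name the data files to read.
show_inputs() {
    IFS= read -r label
    cat -- "$@" | while IFS= read -r line; do
        printf '%s: %s\n' "$label" "$line"
    done
}

wrapper() {
    if [ $# -gt 0 ]; then
        show_inputs "$@" <<EOF
from here-doc
EOF
    else
        # 3<&0 copies the caller's stdin to fd 3 before <<EOF replaces fd 0
        show_inputs /dev/fd/3 3<&0 <<EOF
from here-doc
EOF
    fi
}

out=$(printf 'user data\n' | wrapper)
printf '%s\n' "$out"
```

The order of the redirections matters: 3<&0 has to come before the here-doc, otherwise fd 3 would be a copy of the here-doc rather than of the user's data.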
The Perl diamond operator essentially loops across all the command line arguments, treating each as a filename. It opens each file and reads them line-by-line. Here's some bash code that will do approximately the same.
for f in "$@"
do
    # Do something with $f, such as...
    cat "$f" | command1 | command2
    # -or-
    command1 < "$f"
    # -or-
    # Read $f line-by-line
    cat "$f" | while IFS= read -r line_from_f
    do
        # Do stuff with $line_from_f
        :
    done
done
You want to take the first argument and do something with it, and then either read from any files specified or stdin if no files?
Personally, I'd suggest using getopt to indicate arguments using the "-a value" syntax to help disambiguate, but that's just me. Here's how I'd do it in bash without getopts:
firstarg=${1?:usage: $0 arg [file1 .. fileN]}
shift
typeset -a files
if [[ ${#@} -gt 0 ]]
then
files=( "$@" )
else
files=( "/dev/stdin" )
fi
for file in "${files[@]}"
do
whatever_you_want < "$file"
done
The ?: operator will die if there are no args specified, since you seem to want at least one arg either way. After grabbing that, shift the args over by one, and then either use the remaining args as your file list, or the bash special filehandle "/dev/stdin" if there were no other args.
I think that the "if no files are specified, use /dev/stdin - otherwise use the files on the command line" piece is probably what you're looking for, but the rest of the code is at least useful for context.
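The fallback to /dev/stdin can be tried on its own. The following is a minimal sketch of just that piece, using the positional parameters instead of an array; read_inputs and the sample data are made up for the example.

```shell
# Read every named file, or /dev/stdin when no files are given.
read_inputs() {
    if [ $# -eq 0 ]; then
        set -- /dev/stdin
    fi
    for file in "$@"; do
        while IFS= read -r line; do
            printf 'I read: %s\n' "$line"
        done < "$file"
    done
}

tmp=$(mktemp)
printf 'from file\n' > "$tmp"
from_args=$(read_inputs "$tmp")
from_stdin=$(printf 'from stdin\n' | read_inputs)
printf '%s\n%s\n' "$from_args" "$from_stdin"
rm -f "$tmp"
```

Replacing the argument list with set -- keeps the loop identical for both cases, which is the main appeal of this approach.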
Also a little cheezy, but how about this:
if [[ $# -eq 0 ]]
then
# read from stdin
:
else
# read from "$@" (args)
:
fi
If you need to read and process line-by-line (which is likely) and don't want to copy/paste the same code twice (which is likely), define a function in your script and just pass the lines one-by-one to this function, and process them in said function.
Why not use `cat $*` in the script? For example:
x=`cat $*`
echo $x