Inserting headers into multiple files - perl

I found a command line using Perl that inserts headers into my files without going through the tedious process of inserting them one by one. Can someone walk me through the Perl aspects of this command line? I'm new to this and can't seem to find the right explanations for what I found.
cat header.txt | perl -0 -i -pe 'BEGIN{$h = <STDIN>}; print $h' 1*

-e
rather than provide a script in a xxxx.pl file, provide it on the command line
-p
makes it iterate over filename arguments somewhat like sed but also prints the contents of $_ at the end of the script.
the two above are combined in -pe
-i
indicates that you want to edit the file in place, writing the output to the same file. In practice, Perl renames the input file and reads from this renamed version while writing to a new file with the original name
-0
redefines the end of record character (\n by default) so that you can read the entire input file as a single line
1*
is the command line argument to your script, so I guess you are modifying any file with a name that starts with 1 (you could have used *.c, or whatever depending on the type of files you are trying to modify)
print $h
prints the variable $h, which is the "main" of your script. If it was initialized with the content of the header file (the intent of this one-liner), then it prints that header
BEGIN{ some code here }
this is code you execute before the script starts. This is where I'm stumped; it doesn't seem like valid Perl code to me
so basically:
this will supposedly slurp the entire header file (because of -0) in the BEGIN block and store it in the variable $h
iterate over all the files specified by the wildcards at the end of the command line
for each file: print the header (print $h), then print the file itself (because of -pe)
so it's equivalent to spelling the script out:
$h = <STDIN>; # gets the content of the entire header file (in the BEGIN block)
while (<>) { # loop implied by -pe, iterates over all the 1* files
    # the main contents of the "-e" script are inserted below as part of executing -pe
    print $h; # print the header we saved
    print $_; # implied by -pe, and since we are using -0, this prints the entire content in one shot
    # end of the "-e" script: it was a single print $h statement; the second print is implied by -pe
}
It's a bit hard to explain, take a look at the perlrun documentation for details (run man perlrun).
This is not a 100% complete explanation because I don't think the BEGIN block is right. I tried it on my Ubuntu machine and it complained about the syntax too.

Here's something similar, with an explanation. The program in the question doesn't run on my Mac.
I needed to add the #nullable disable directive to the top of all my C# files as part of migrating to nullable reference types.
perl -w -i -p -0777 -e 's/^/#nullable disable\n\n/' $(find . -iname '*.cs')
-w enables warnings
-i edits files in place
-p reads each file block by block, printing each block after applying the Perl expression; the default block size is one line
-0777 changes the default block size to the entire file
-e the Perl expression to execute
The final argument uses shell command substitution to create a list of files. It passes that list of file paths to the perl command. The find command searches for files that end in .cs.
The Perl program is a single substitution command. It matches the very beginning of the block and replaces it (prepends, really) with "#nullable disable" and a couple of newlines.
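For reference, here is roughly what that one-liner does, spelled out as a script. This is a sketch, not what perl literally executes; in particular, the real -i switch handles the temporary-file shuffle for you.
#!/usr/bin/perl
# Sketch equivalent of: perl -w -i -p -0777 -e 's/^/#nullable disable\n\n/' FILES
use strict;
use warnings;

undef $/; # slurp mode, like -0777: read each file as a single record
for my $file (@ARGV) {
    open my $in, '<', $file or die "Can't read $file: $!";
    my $content = <$in>;
    close $in;
    $content =~ s/^/#nullable disable\n\n/; # prepend the directive once (no /m, so ^ is start of string)
    open my $out, '>', $file or die "Can't write $file: $!";
    print $out $content;
    close $out;
}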

Related

Can I pass a string from perl back to the calling c-shell?

RHEL6
I have a c-shell script that runs a perl script. After dumping tons of stuff to stdout, the perl script determines where (what dir) the parent shell should cd to when it finishes. But that's a string, not an int, which is all I can pass back with exit().
Storing the name of the dir in a file which the c-shell script can read is what I have now. It works, but it is not elegant. Is there a better way to do this? Maybe a little chunk of memory that I can share with the perl script?
Short:
Redirect Perl's streams, restoring them at the end to print that one piece of info for the shell script to take
Or, print that info last, and have the shell script pass the output to the console and take the last line
Or, use a named pipe (either shell) or specific file descriptors (not csh) for that print
When the Perl script prints out that name, you can assign it to a variable in the shell script
#!/bin/csh
set DIR = `perl -e'print "dir_name"'`
while in bash
#!/bin/bash
DIR="$(perl -e'print "dir_name"')"
where $(...) is preferred for the command substitution.
But those other prints to the console from the Perl script then need to be handled.
One way is to redirect all output in the Perl script other than that one print, which can be controlled by a command-line option (a filename to which to redirect, which the shell script can print out afterwards).
Or, take all of Perl's output and pass it to the console, with the last line being the needed "return." This puts the burden on the Perl script to print that last (perhaps in an END block; a sketch follows this list). The program's output can be printed from the shell script after it completes, or line by line as it is emitted.
Or, use a named pipe (both shells) or a specific file descriptor (bash only) to which the Perl script can print that information. In this case its streams go straight to the console.
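On the Perl side, printing that name last could look like this (a minimal sketch; the directory-picking logic is a placeholder):
#!/usr/bin/perl
use strict;
use warnings;

my $dir = '/some/chosen/dir'; # placeholder for the real logic
print "...tons of normal output...\n";
END { print "$dir\n" } # an END block runs last, after the rest of the script
The shell script can then take the last line of the output, as in the examples below.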
The question explicitly mentions csh, so it is covered below. But I must repeat the old and worn fact that shell scripting is far better done in bash than in csh. I strongly recommend reconsidering.
bash
If you need the program's output on the console as it goes, take and print it line by line
#!/bin/bash
while read line; do
    echo "$line"
    DIR=$line
done < <(perl script.pl)
echo "$DIR"
Or, if you don't need output on the console before the script is finished
#!/bin/bash
mapfile -t lines < <(perl script.pl)
DIR="${lines[-1]}"
printf '%s\n' "${lines[@]}" # print script.pl's output
Or, use file descriptors for that particular print
F=$(mktemp) # safe filename
exec 3> "$F" # open fd 3 to write to it
exec 4< "$F" # open fd 4 to read from it
rm -f "$F" # remove file(name) for safety; opened fd's can still access
perl -E'$fd = shift; say "...normal prints to STDOUT...";
    open(FH, ">&=$fd") or die $!;
    say FH "dirname";
    close FH
' 3
read dir_name <&4
exec 3>&- # close them
exec 4<&-
echo "$dir_name"
I couldn't get this to work with a single file descriptor for both reading and writing (exec 3<> ...), I think because the read can't rewind after the write, so separate descriptors are used.
With a real Perl script (and not the demo one-liner above), pass the fd number as a command-line option. The script should then do this print only if it's invoked with that option.
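For example, a sketch of such a script; the --fd option name and the Getopt::Long handling are my own choices here, and the directory logic is a placeholder:
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

my $fd;
GetOptions('fd=i' => \$fd) or die "bad options\n";

print "...normal prints to STDOUT...\n";
my $dir = '/some/chosen/dir'; # placeholder for the real logic
if (defined $fd) {
    open my $fh, '>&=', $fd or die "Can't open fd $fd: $!"; # use the already-open descriptor
    print $fh "$dir\n";
    close $fh;
}
The shell part stays as above, with the invocation becoming something like perl script.pl --fd 3.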
Or, use a named pipe very similarly to how it's done for csh below. This is probably best here, if the manipulation of the program's STDOUT isn't to your liking.
csh
Iterate over the program's (completed) output line by line
#!/bin/csh
foreach line ( "`perl script.pl`" )
echo "$line"
set dir_name = "$line"
end
echo "Directory name: $dir_name"
or extract the last line first and then print the whole output
#!/bin/csh
set lines = ( "`perl script.pl`" )
set dir_name = $lines[$#lines]
# Print program's output
while ( $#lines )
    echo "$lines[1]"
    shift lines
end
or use a named pipe
set fifo_name = "/tmp/fifo$$" # or use mktemp
mkfifo "$fifo_name"
( perl script.pl --fifo $fifo_name [other args] & )
set dir_name = `cat "$fifo_name"`
rm -f $fifo_name
echo "dir name from FIFO: $dir_name"
The Perl command is in the background since a FIFO blocks until it is both written and read. If the shell script were to wait for perl ... to complete, the Perl script would block while writing to the FIFO (since it's not being read), so the shell would never get to read it; we would deadlock. The command is also in a subshell, with ( ), to avoid the informational prints about the background job.
The --fifo NAME command-line option is needed so that the Perl script knows what special file to use (and so that it skips this print when the option isn't there).
For an in-line example replace ( perl script ...) with this one-liner, used above as well
( perl -E'$ff = shift; say qq(\t...normal prints to STDOUT...);
    open FF, ">$ff" or die $!;
    say FF "dir_name_$$";
    close FF
' $fifo_name & )
(broken over lines for readability)

perl script to add line of code only modifies one file

I have this:
perl -pi -e 'print "code I want to insert\n" if $. == 2' *.php
which puts the line code I want to insert on the second line of the file, which is what I need done to every single PHP file
If I run it in a directory with both PHP files and non-PHP files, it does the right thing, but only to one PHP file. I thought *.php would apply it to all PHP files, but it doesn't.
How can I write it so it will modify every PHP file in a directory? Bonus if there is an easy way to do this recursively through all directories. I don't mind running the Perl script for each directory as there aren't that many, but don't want to hand edit every single file.
The problem is that the file handle ARGV that Perl uses to read the files passed on the command line is never explicitly closed, so the line number $. just keeps incrementing after the end of the first file and never goes back to one.
Fix this by closing ARGV when it has reached end of file. Perl will reopen it to read the next file in the list, and so reset $.
perl -i -pe 'print "code I want to insert\n" if $. == 2; close ARGV if eof' *.php
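If it helps, this is roughly the script that one-liner turns into (a sketch: -p wraps your code in the loop and does its print in a continue block, while -i adds the in-place file handling on top):
while (<>) {
    print "code I want to insert\n" if $. == 2;
    close ARGV if eof; # ARGV is reopened for the next file, which resets $.
}
continue {
    print; # the implicit print supplied by -p
}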
If you can use sed, this should work:
sed -si '2i\CODE YOU WANT TO INSERT' *.php
To do it recursively, you might try:
find -name '*.php' -execdir sed -si '2i\CODE YOU WANT TO INSERT' '{}' +
Using File::Find.
Note, I've included three sanity checks to verify that things are actually being processed the way that you want.
Initially the script will just print out the found files until you comment out the bare return.
Then the script will save backups unless you uncomment the unlink statement.
Finally, the script will only process a single file until you comment out the exit statement.
These three checks are just so you can verify that everything is working as you desire before editing a whole directory tree.
use strict;
use warnings;
use File::Find;

my $to_insert = "code I want to insert\n";

find(sub {
    return unless -f && /\.php$/;

    print "Edit $File::Find::name\n";
    return; # Comment out once satisfied with found files

    local $^I = '.bak';  # edit in place, keeping a .bak backup
    local @ARGV = $_;    # make the current file the only input
    while (<>) {
        print $to_insert if $. == 2 && $_ ne $to_insert;
        print;
    }

    # unlink "$_$^I"; # Uncomment to delete backups once certain that first file is processed correctly.
    exit; # Comment out once certain that first file is processed correctly
}, '.');

Perl ambiguous command line options, and security implications of eval with -i?

I know this is incorrect. I just want to know how perl parses this.
So, I'm playing around with perl. What I wanted was perl -ne; what I typed was perl -ie. The behavior was kind of interesting, and I'd like to know what happened.
$ echo 1 | perl -ie'next unless /g/i'
So perl Aborted (core dumped) on that. Reading perl --help I see -i takes an extension for backups.
-i[extension] edit <> files in place (makes backup if extension supplied)
For those that don't know, -e is just eval. So I'm thinking one of three things could have happened: it was parsed as
perl -i -e'next unless /g/i' (i gets undef, the rest goes as the argument to e)
perl -ie 'next unless /g/i' (i gets the argument e, the rest is left hanging like a file name)
perl -i"-e'next unless /g/i'" (the whole thing as an argument to i)
When I run
$ echo 1 | perl -i -e'next unless /g/i'
The program doesn't abort. This leads me to believe that 'next unless /g/i' is not being parsed as a literal argument to -e. Unambiguously the above would be parsed that way and it has a different result.
So what is it? Well, playing around with it a little more, I got
$ echo 1 | perl -ie'foo bar'
Unrecognized switch: -bar (-h will show valid options).
$ echo 1 | perl -ie'foo w w w'
... works fine. I guess it reads it as `perl -ie'foo' -w -w -w`
Playing around with the above, I try this...
$ echo 1 | perl -ie'foo e eval q[warn "bar"]'
bar at (eval 1) line 1.
Now I'm really confused. So how is Perl parsing this? Lastly, it seems you can actually get a Perl eval command from within just -i. Does this have security implications?
$ perl -i'foo e eval "warn q[bar]" '
Quick answer
Shell quote-processing is collapsing and concatenating what it thinks is all one argument. Your invocation is equivalent to
$ perl '-ienext unless /g/i'
It aborts immediately because perl parses this argument as containing -u, which triggers a core dump where execution of your code would begin. This is an old feature that was once used for creating pseudo-executables, but it is vestigial in nature these days.
What appears to be a call to eval is the misparse of -e 'ss /g/i'.
First clue
B::Deparse can be your friend, provided you happen to be running on a system without dump support.
$ echo 1 | perl -MO=Deparse,-p -ie'next unless /g/i'
dump is not supported.
BEGIN { $^I = "enext"; }
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined(($_ = <ARGV>))) {
    chomp($_);
    (('ss' / 'g') / 'i');
}
So why does unle disappear? If you’re running Linux, you may not have even gotten as far as I did. The output above is from Perl on Cygwin, and the error about dump being unsupported is a clue.
Next clue
Of note from the perlrun documentation:
-u
This switch causes Perl to dump core after compiling your program. You can then in theory take this core dump and turn it into an executable file by using the undump program (not supplied). This speeds startup at the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello world" executable comes out to about 200K on my machine.) If you want to execute a portion of your program before dumping, use the dump operator instead. Note: availability of undump is platform specific and may not be available for a specific port of Perl.
Working hypothesis and confirmation
Perl’s argument processing sees the entire chunk as a single cluster of options because it begins with a dash. The -i option consumes the next word (enext), as we can see in the implementation for -i processing.
case 'i':
    Safefree(PL_inplace);
    [Cygwin-specific code elided -geb]
    {
        const char * const start = ++s;
        while (*s && !isSPACE(*s))
            ++s;
        PL_inplace = savepvn(start, s - start);
    }
    if (*s) {
        ++s;
        if (*s == '-') /* Additional switches on #! line. */
            s++;
    }
    return s;
For the backup file's extension, the code above from perl.c consumes up to the first whitespace character or end-of-string, whichever comes first. If any characters remain, the first must be whitespace; skip it, and if the character after that is a dash, skip it too. In Perl, you might write this logic as
if ($$s =~ s/^i(\S*)(?:\s-?)?//) {
    my $extension = $1;
    return $extension;
}
Then, all of -u, -n, -l, and -e are valid Perl options, so argument processing eats them and leaves the nonsensical
ss /g/i
as the argument to -e, which perl parses as a series of divisions. But before execution can even begin, the archaic -u causes perl to dump core.
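You can confirm the division parse in isolation, with the same Deparse options as before:
$ perl -MO=Deparse,-p -e'ss /g/i'
(('ss' / 'g') / 'i');
-e syntax OK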
Unintended behavior
An even stranger bit is if you put two spaces between next and unless
$ perl -ie'next  unless /g/i'
the program attempts to run. Back in the main option-processing loop we see
case '*':
case ' ':
    while (*s == ' ')
        ++s;
    if (s[0] == '-') /* Additional switches on #! line. */
        return s+1;
    break;
The extra space terminates option parsing for that argument. Witness:
$ perl -ie'next  nonsense -garbage --foo' -e die
Died at -e line 1.
but without the extra space we see
$ perl -ie'next nonsense -garbage --foo' -e die
Unrecognized switch: -onsense -garbage --foo (-h will show valid options).
With an extra space and dash, however,
$ perl -ie'next  -unless /g/i'
dump is not supported.
Design motivation
As the comments indicate, the logic is there for the sake of harsh shebang (#!) line constraints, which perl does its best to work around.
Interpreter scripts
An interpreter script is a text file that has execute permission enabled and whose first line is of the form:
#! interpreter [optional-arg]
The interpreter must be a valid pathname for an executable which is not itself a script. If the filename argument of execve specifies an interpreter script, then interpreter will be invoked with the following arguments:
interpreter [optional-arg] filename arg...
where arg... is the series of words pointed to by the argv argument of execve.
For portable use, optional-arg should either be absent, or be specified as a single word (i.e., it should not contain white space) …
Three things to know:
'-x y' means -xy to Perl (for some arbitrary options "x" and "y").
-xy, as common for unix tools, is a "bundle" representing -x -y.
-i, like -e, absorbs the rest of the argument. Unlike -e, it considers a space to be the end of the argument (as per #1 above).
That means
-ie'next unless /g/i'
which is just a fancy way of writing
'-ienext unless /g/i'
unbundles to
-ienext -u -n -l '-ess /g/i'
  ^^^^^             ^^^^^^^
  val for -i        val for -e
perlrun documents -u as:
This switch causes Perl to dump core after compiling your program. You can then in theory take this core dump and turn it into an executable file by using the undump program (not supplied). This speeds startup at the expense of some disk space (which you can minimize by stripping the executable). (Still, a "hello world" executable comes out to about 200K on my machine.) If you want to execute a portion of your program before dumping, use the dump() operator instead. Note: availability of undump is platform specific and may not be available for a specific port of Perl.

Perl oneliner match repeating itself

I'm trying to read a specific section of a line out of a file with Perl.
The file in question is of the following syntax.
# Sets $USER1$
$USER1$=/usr/....
# Sets $USER2$
#$USER2$=/usr/...
My oneliner is simple,
perl -ne 'm/^\$USER1\$\s*=\s*(\S*?)\s*$/m; print "$1";' /my/file
For some reason I'm getting the extraction for $1 repeated several times over, apparently once for every line in the file after my match occurs. What am I missing here?
You are executing print for every line of the file, whether the regex matches or not, because print gets called unconditionally. Replace the first ; with an &&.
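That is, the corrected one-liner is
perl -ne 'm/^\$USER1\$\s*=\s*(\S*?)\s*$/m && print "$1";' /my/file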
From perlre:
NOTE: Failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
Try this instead:
perl -ne 'print "$1" if m/^\$USER1\$\s*=\s*(\S*?)\s*$/m;' /my/file
$ cat test.txt
# Sets $USER1$
$USER1$=/usr/....
# Sets $USER2$
#$USER2$=/usr/...
$ perl -nle 'print if /^\$USER1/;' test.txt
$USER1$=/usr/....
Try this
perl -ne '/^.*1?=([\w\W].*)$/;print "$1";' file

How do I use Perl on the command line to search the output of other programs?

As I understand it (Perl is new to me), Perl can be used to script against a Unix command line. What I want to do is run (hardcoded) command line calls and search the output of these calls for regex matches. Is there a way to do this simply in Perl? How?
EDIT: The sequence here is:
- Call another program.
- Run a regex against its output.
my $command = "ls -l /";
my @output = `$command`;
for (@output) {
    print if /^d/;
}
The qx// quasi-quoting operator (for which backticks are a shortcut) is stolen from shell syntax: run the string as a command in a new shell, and return its output (as a string or a list, depending on context). See perlop for details.
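For example, a small illustration of the context difference (using ls as a stand-in command):
my $all = `ls -l /`; # scalar context: the whole output as one string
my @lines = `ls -l /`; # list context: one element per output line
print scalar(@lines), " lines captured\n";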
You can also open a pipe:
open my $pipe, "$command |";
while (<$pipe>) {
# do stuff
}
close $pipe;
This allows you to (a) avoid gathering the entire command's output into memory at once, and (b) gives you finer control over running the command. For example, you can avoid having the command be parsed by the shell:
open my $pipe, '-|', @command, '< single argument not mangled by shell >';
See perlipc for more details on that.
You might be able to get away without Perl, as others have mentioned. However, if there is some Perl feature you need, such as extended regex features or additional text manipulation, you can pipe your output to perl and then do what you need. Perl's -e switch lets you specify the Perl program on the command line:
command | perl -ne 'print if /.../'
There are several other switches you can pass to perl to make it very powerful on the command line. These are documented in perlrun. Also check out some of the articles in Randal Schwartz's Unix Review column, especially his first article for them. You can also Google for Perl one-liners to find lots of examples.
Do you need Perl at all? How about
command -I use | grep "myregexp" && dosomething
right in the shell?
#!/usr/bin/perl
sub my_action {
    print "Implement some action here\n";
}

open PROG, "/path/to/your/command|" or die $!;
while (<PROG>) {
    /your_regexp_here/ and my_action();
    print $_;
}
close PROG;
This will scan the output from your command, match regexps, and do some action (which for now is printing the line).
In Perl you can use backticks to execute commands on the shell. Here is a document on using backticks. I'm not sure about how to capture the output, but I'm sure there's more than one way to do it.
You indeed use a one-liner in a case like this. I recently coded up one that I use, among other ways, to produce output which lists the directory structure present in a .zip archive (one dir entry per line). So using that output as an example of command output that we'd like to filter, we could put a pipe in and then use perl with the -n -e flags to filter the incoming data (and/or do other things with it):
[command_producing_text_output] | perl -MFile::Path -n -e ^
"BEGIN{@PTM=()} if (m{^perl/(bin|lib(?!/site))}) {chomp;push @PTM,$_}" ^
-e "END{@WDD=mkpath (\@PTM,1);" ^
-e "printf qq/Created %u dirs to reflect part of structure present in the .ZIP file\n/, scalar(@WDD);}"
The shell syntax used, including the quoting of the Perl code and the escaping of newlines, reflects CMD.exe usage in Windows NT-like consoles. If you need to, mentally replace "^" with "\" and " with ' in the appropriate places.
The one-liner above adds only the directory names that start with "perl/bin" or "perl/lib" (not followed by "/site"); it then creates those directories. You wind up with an (empty) tree that you can use for whatever evil purposes you desire.
The main point is to illustrate that there are flags available (-n, -p) that allow perl to loop over each input record (line), and that what you can do is unlimited in terms of complexity.