How to use grep command on multiple files at same time? - perl

I want to save some execution time in my code. Rather than running grep on each file individually (in a foreach loop), I want to run grep on all the files in an array at the same time.
I am using:
`grep -Pi "search string" $files`
but it's not giving me the desired result, because the search string is in double quotes ("...") and the complete command is in backticks.
Here $files contains the paths of all the files, separated by single spaces, and the search string is a regex like ^\\s*abc|bcd|efg\\s+...

Do yourself a favor and install ack. You'll rarely use grep after that. ack:
automatically recurses into subdirectories
treats search parameter as a Perl regular expression
rich set of command-line options more powerful and DWIMmy than grep
Install by installing the App::Ack distribution from CPAN

Something like this should work:
$s1='search string';
@x = qx(grep -Pi "$s1" $files);
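One way around the quoting problem is to keep the shell out of the grep call altogether. The sketch below is an illustration under assumptions, not the asker's code: the two sample files are made up, and -E replaces the question's -P so the example does not depend on grep having PCRE support. A small Perl script is fed to perl on standard input and uses the list form of open, so grep is started once and receives the pattern and every file name as separate, unparsed arguments.

```shell
#!/bin/sh
# Hypothetical sample files, created only for this demonstration.
dir=$(mktemp -d)
printf '%s\n' '  abc one' 'nothing' > "$dir/a.txt"
printf '%s\n' 'x BCD two' > "$dir/b.txt"

if command -v perl >/dev/null 2>&1; then
    # "perl -" reads the script from stdin; the file names arrive in @ARGV.
    hits=$(perl - "$dir/a.txt" "$dir/b.txt" <<'PERL'
my @files   = @ARGV;
my $pattern = '^[[:space:]]*abc|bcd';
# List form of open: no shell is involved, so the pattern needs no shell quoting.
open(my $grep, '-|', 'grep', '-Ei', '--', $pattern, @files)
    or die "cannot run grep: $!";
print while <$grep>;
close $grep;
PERL
)
    printf '%s\n' "$hits"
fi
rm -rf "$dir"
```

With more than one file, grep prefixes each matching line with the file name, so the output can still be attributed to individual files afterwards.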

Related

perl batch rename files in command line

I want to rename files with 'sr' in their names, replacing 'sr' with 'SR'. This one succeeded:
ls | perl -e 'while(<>){chomp;if(/(.*)sr(.*)/){rename $_,$1."SR".$2}}'
But this one failed:
ls | perl -e "while(<>){chomp;if(/sr/){rename $_,$\`.'SR'.($')}}"
with this error message:
Not enough arguments for rename at -e line 1, near "rename ,"
Execution of -e aborted due to compilation errors.
It seems that $_ has become an empty string, but I don't quite understand why. Thanks for any explanations.
Now quotes have been an interesting problem and this is my test:
ls | perl -e "while(<>){chomp;if(/sr/){print $_;print\"\n\";print $\`,$&,($');print \"\n\";print $_,$\`,$&,($');print\"\n\";print $_;print\"\n\"}}"
outputs this:
3sr
3sr
3sr
3sr
sr1
sr1
sr1
sr1
sr2
sr2
sr2
sr2
It seems that when used alone, $_ is not empty, but it becomes empty when used along with $`, $& and $'. Judging by the last line printed for each file, I guess $_ is temporarily changed when it is not used alone?
Besides, according to a1111exe's answer, I test this:
ls | perl -e "while(<>){chomp;if(/sr/){print \$_,$\`,$&,($');print \"\n\"}}"
and got this:
3sr3sr
sr1sr1
sr2sr2
First, on Linux you should use single quotes instead of double quotes.
Instead of the ls command, you can use Perl's built-in glob function.
To capture the pre-match and post-match text, you can use $PREMATCH and $POSTMATCH from the English module.
So your one-liner should be:
perl -MEnglish -e 'while(<*>){chomp;if(/sr/){rename $_,$PREMATCH."SR".$POSTMATCH}}'
EDITED
Single quotes versus double quotes is not about Perl; it is about the shell.
Single quote
Enclosing characters in single quotes (') preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash.
Double quote
Enclosing characters in double quotes (‘"’) preserves the literal value of all characters within the quotes, with the exception of ‘$’, ‘`’, ‘\’, and, when history expansion is enabled, ‘!’.
In a shell script, shell variables are accessed with a $ prefix, so a $ inside double quotes makes the shell look for a shell variable, not a Perl variable. For example, you can run the following line in your terminal:
m=4; perl -e "print $m;"
Here
m=4; perl -e "print $m;"
^                   ^
|                   Accessing shell variable
Assigning shell variable
The output is 4: because m is a shell variable, the shell substitutes its value into your Perl script before Perl ever runs.
On Windows, we need to use double quotes instead of single quotes.
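The effect of each quoting style can be seen without running Perl at all. In this minimal sketch, printf stands in for perl -e and simply echoes back the program text it was given; the variable names double, single and escaped are made up for the illustration.

```shell
#!/bin/sh
# Show the exact program text that perl -e would receive under each quoting style.
# printf stands in for perl so that the argument is only displayed, not executed.
m=4

double=$(printf '%s' "print $m;")    # the shell expands $m before the program starts
single=$(printf '%s' 'print $m;')    # single quotes pass the text through untouched
escaped=$(printf '%s' "print \$m;")  # a backslash protects the $ inside double quotes

echo "double quotes: $double"
echo "single quotes: $single"
echo "escaped \$:     $escaped"
```

Only the single-quoted and escaped forms hand Perl a literal $m; the double-quoted form has already been rewritten to print 4; by the time Perl sees it.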
It seems that the double quotes cause a clash between your shell environment and Perl. You can certainly do what @mkHun suggested. Another way:
ls | perl -e 'while(<>){chomp;($new=$_)=~s/sr/SR/g;rename $_,$new}'
Also, if you escape the '$' sigil in '$_', your oneliner will work too:
ls | perl -e "while(<>){chomp;if(/sr/){rename \$_,$\`.'SR'.$'}}"
I still don't get why, though. But it really seems like a bash/Perl interpolation issue.

Why is Perl complaining about an unterminated string here?

I have a Perl script which runs fine under Perl 5.005 on HPUX 11i v3 and that causes a small problem under Perl 5.10 on my Ubuntu 11.04 box.
It boils down to a line like:
open (CMD, "do something | grep -E 'sometext$' |");
(it's actually capturing ps -ef output to see if a process is running but I don't think that's important (a)).
Now this runs fine on the HPUX environment but, when I try it out under Ubuntu, I get:
sh: Syntax error: Unterminated quoted string
By inserting copious debug statements, I tracked it down to that offending line then started removing characters one-by-one until it stopped complaining. Luckily, $ was the first one I tried and it stopped giving me the error, so I then changed the line to:
open (CMD, "do something | grep -E 'sometext\$' |");
and it worked fine (under Linux anyway - I haven't tested that on HPUX since I don't have access to that machine today - if it does work, I'll just use that approach but I'd still like to know why it's a problem).
So it seems obvious that the $ is "swallowing" the closing single quote under my Linux environment but not apparently on the HPUX one.
My question is simply, why? Surely there weren't any massive changes between 5.005 and 5.10. Or is there some sort of configuration item I'm missing?
(a) But, if you know a better way to do this without external CPAN modules (ie, with just the baseline Perl 5.005 installation), I'd be happy to know about it.
$' is a special variable (see perldoc perlvar). 5.005 was many versions ago, so it's possible that something has changed in the regexp engine to make this variable different (although it appears to be in 5.005 also)
As for the better way, you could at least only run the 'ps -ef' in a pipeline and do the 'grep' in perl.
Use the following!!!
use strict;
use warnings;
You would have gotten
Use of uninitialized value $' in concatenation (.) or string
A sigil followed by any punctuation symbol (on a standard keyboard) is a variable in Perl, regardless of whether it is defined or not. So in a double-quoted string, [$@][symbol] will always be read as one token and interpolated unless the sigil is escaped.
I have a feeling that the difference you are seeing has to do with different system shells rather than different versions of perl.
Consider your line:
open (CMD, "do something | grep -E 'sometext$' |");
When perl sees that, it will interpolate the empty $' variable into the double quoted string, so the string becomes:
open (CMD, "do something | grep -E 'sometext |");
At that point, your shell gets to process a line that looks like:
do something | grep -E 'sometext
And if it succeeds or fails will depend on the shell's rules regarding unterminated strings (some shells will complain loudly, others will automatically terminate the string at eof).
If you were using the warnings pragma in your script, you probably would have gotten a warning about interpolating an undefined variable.
A shorter and cleaner way to read in the output of ps would be:
my @lines = grep /sometext$/, `ps -ef`;
Or using an explicit open:
my @lines = grep /sometext$/, do {
open my $fh, '-|', 'ps -ef' or die $!;
<$fh>
};
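The interpolation described above can be observed without running any pipeline. The sketch below only prints the command string that Perl would hand to the shell, once with the bare $ and once with it escaped. It assumes perl is on the PATH; the Perl source is fed through quoted here-documents so that the shell leaves it alone, and the variable names are made up for the illustration.

```shell
#!/bin/sh
# Print the command string Perl builds for each spelling of the open() argument.
# Quoted here-documents keep the shell from touching the Perl source.
if command -v perl >/dev/null 2>&1; then
    unescaped=$(perl <<'PERL'
print "do something | grep -E 'sometext$' |";
PERL
)
    escaped=$(perl <<'PERL'
print "do something | grep -E 'sometext\$' |";
PERL
)
    printf 'unescaped: %s\n' "$unescaped"
    printf 'escaped:   %s\n' "$escaped"
fi
```

In the unescaped case the closing single quote disappears together with the $, which is exactly the string the shell then rejects as unterminated.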
Because $' is a special variable in recent versions of Perl.
From the official documentation (perlvar):
$':
The string following whatever was matched by the last successful
pattern match (not counting any matches hidden within a BLOCK or
eval() enclosed by the current BLOCK).
If there were no successful pattern matches, $' is empty and your statement essentially interpolates to
open (CMD, "do something | grep -E 'sometext |");
Escaping the dollar sign (the solution that works for you on Linux) should work on HPUX too.
I'm not sure when this variable was added, but I can confirm that it exists in Perl 5.8.5. What's New for Perl 5.005 mentions $' (not as a new feature), so I think it was already there before then.
You should probably use single-quotes rather than double quotes around the string since there is nothing in the string that should be interpolated:
open (CMD, q{do something | grep -E 'sometext$' |});
Better would be to use the 3-argument form of open with a lexical file handle:
open my $cmd, '-|', q{do something | grep -E 'sometext$'} or die 'a horrible death';
I don't have a good explanation for why $' is being recognized as the special variable in 5.10 and yet it was not in 5.005. That is unexpected.
Is there a really good reason you can't upgrade to something like 5.14.1? Even if you don't change the system-provided Perl, there's no obvious reason you can't install a recent version in some other location and use that for all your scripting work.
$' is a special variable.
If you want to avoid variable expansion, just use q():
open (CMD, q(do something | grep -E 'sometext$' |));

How to find path of find2perl script on Unix using bash or perl

We (the company I work for) need to run the find2perl script on over a thousand different Unix servers of different flavors (Linux, Solaris, HP-UX, AIX) and different versions.
The one thing that all the servers have in common, is that they all have at least one implementation of perl installed. However, not all systems have it configured the same way.
Finding the location of perl is easy enough using the which command. However, on 70% of the servers, the actual directory containing find2perl (the bin folder of perl) is not present in the $PATH variable and can't be located that way.
On some servers, perl is actually a symbolic link pointing another location, in which case I can use ls -l and sed to extract the target of the link to find where perl is actually installed.
On other servers, however, it's more complicated, as it seems perl was compiled to a custom location and the perl binary present in /bin or /usr/bin (or wherever perl is found) is not a symbolic link, but rather a full-blown executable. In this case, I thought about using perl's @INC variable to try to find find2perl, but that seems rather excessive.
What would be the better/best/foolproof method (a one-liner if possible) to always get the location of find2perl on a Unix system?
Ways to locate find2perl
Two ways, both of which rely on asking the perl install how it was configured:
Config.pm
It's probably scriptdirexp from Config.pm.
$ perl -MConfig -E 'say $Config{scriptdirexp}'
/usr/bin
And indeed, that's where find2perl is on my system. You can use Config; in your perl scripts, which is its major advantage over the next method.
perl -V:varname
As per Yanick Girouard's comment, you can also use perl -V:scriptdirexp to get this, in a format suitable to passing to eval in a shell script. There are actually several formats available (so, you don't need to use e.g., cut to parse it):
OPTION              OUTPUT (\n = actual newline)   NOTES
-V:scriptdirexp     scriptdirexp='/usr/bin';\n     full shell syntax, even if multiple -V options
-V:scriptdirexp:    scriptdirexp='/usr/bin'        trailing colon omits semicolon and newline
-V::scriptdirexp    '/usr/bin';\n                  extra leading colon omits var= part
-V::scriptdirexp:   '/usr/bin'                     you can combine them.
Full documentation is in the perlrun manpage.
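As a rough sketch of how the first output format might be used from a shell script: the assignment printed by perl -V:scriptdirexp is passed to eval, and the resulting directory is checked for find2perl. The variable name find2perl and the messages are made up for the illustration, and the script assumes that the perl found on the PATH is the installation you care about.

```shell
#!/bin/sh
# Ask perl where it installs its scripts, then look for find2perl there.
if command -v perl >/dev/null 2>&1; then
    # perl -V:scriptdirexp prints a shell assignment such as scriptdirexp='/usr/bin';
    eval "$(perl -V:scriptdirexp)"
    find2perl="$scriptdirexp/find2perl"
    if [ -x "$find2perl" ]; then
        echo "found: $find2perl"
    else
        echo "no find2perl in $scriptdirexp" >&2
    fi
fi
```

The else branch matters in practice, since a given perl installation is not guaranteed to ship find2perl in its script directory.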
Ways to embed find2perl
If you decide to copy over find2perl, as per evil otto's comment, you can actually do that by embedding it in your shell script. There are many ways. If neither of the two below work, then you can certainly use shar (which has an extremely long history, and is likely compatible with everything).
Quoted here-document
The easiest way is if your shell supports quoted here-documents. They all should, as it's a POSIX requirement:
#!/bin/sh
perl - -name 'foo' -mtime 2 -print <<'FIND2PERL'
#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if $running_under_some_shell;
⋮
FIND2PERL
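To see why the delimiter has to be quoted, here is a minimal comparison of the two here-document forms, with cat standing in for perl. The variable names are made up for the illustration.

```shell
#!/bin/sh
# Compare a quoted and an unquoted here-document delimiter.
name=world

# Quoted delimiter: the body is passed through literally.
quoted=$(cat <<'EOF'
hello $name
EOF
)

# Unquoted delimiter: the shell expands $name (and would also expand `...` and $(...)).
unquoted=$(cat <<EOF
hello $name
EOF
)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

A Perl script is full of $ signs, so only the quoted form delivers it to perl unchanged.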
Hex dump in a non-quoted here-document
If some of your shells don't implement quoted here-documents (POSIX‽ what's that!), then you have to protect find2perl from shell expansion. An easy way is to hex dump it, as 0–9 and a–f are all safe from shell expansion. The dump is easily done with xxd -p /usr/bin/find2perl, which only requires xxd on one machine. To read back the dump, you can use plain perl:
#!/bin/sh
perl -n -e 'chomp; print pack("H*", $_)' <<HEX | perl - -name 'foo'
23212f7573722f62696e2f7065726c0a202020206576616c202765786563
202f7573722f62696e2f7065726c202d5320243020247b312b222440227d
⋮
HEX
Using find2perl several times
Naturally, with either approach, you could also write find2perl to a temporary file (if you need to invoke it multiple times, for example). You could also embed it in a shell function.
perl -lwe '$_ = $^X; s/perl$/find2perl/; -f or die qq($_ not -f); print'
Copy the interpreter executable path into $_, the default argument. Patch the value, assuming that find2perl is in the same directory as perl itself. (This is specified as UNIX only, so you don't have to cater for perl.exe, which would be easy enough to deal with.) Then test that the file exists, and die if it doesn't. (You might invent some better error handling.) Then print the path if we're still alive. That's it.
Okay, here's a version that works for Windows, too:
perl -lwe "$_ = $^X; s/perl(\.exe)?$/find2perl/;
-f or -f qq($_.bat) or die qq($_ not -f); print"
Note the double quotes, de rigueur on Windows for cmd.exe. And it has to go on one line, I just wrapped it for readability.

command-line grep for a string with '$' in it?

I'm trying to find where two variables are being concatenated in a directory of scripts, but when I try the following:
grep -lire "$DATA_PATH . $AWARDS_YEAR" *
I get "undefined variable" errors...
I thought I could escape the $s by using:
grep -lire "\$DATA_PATH . \$AWARDS_YEAR" *
But I get the same error - so, how do you grep for strings with $s in?
Tcsh is a little different about variables than the usual shells, and it's the default on FreeBSD.
So, just use single quotes, '$VAR', or escape the $ outside of the quotes: \$"VAR"
Put it in single quotes, with the escaping slash:
grep -lire '\$DATA_PATH . \$AWARDS_YEAR' *
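Also note that the dot (.) is a regex metacharacter. If you want it to match only a literal dot, escape it too, or use the -F option so that the whole pattern is treated as a fixed string (in which case the $ signs need no backslashes either).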
Here's a nice reference with more general info.
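A small self-contained check of both approaches, written for a POSIX-style sh rather than the tcsh from the question: the sample files and variable names are made up, and the recursive -r option is assumed to be available, as it is in the question's own command.

```shell
#!/bin/sh
# Hypothetical sample scripts, created only for this demonstration.
dir=$(mktemp -d)
printf '%s\n' 'cat $DATA_PATH . $AWARDS_YEAR' > "$dir/a.sh"
printf '%s\n' 'echo nothing here' > "$dir/b.sh"

# Fixed-string search: single quotes stop the shell, -F stops the regex engine.
fixed=$(grep -lrF '$DATA_PATH . $AWARDS_YEAR' "$dir")

# Regex search: the $ signs and the dot are escaped for grep, inside single quotes.
regex=$(grep -lr '\$DATA_PATH \. \$AWARDS_YEAR' "$dir")

echo "fixed-string match: $fixed"
echo "regex match:        $regex"
rm -rf "$dir"
```

Both commands list only the file that contains the literal text.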

How do I use Perl on the command line to search the output of other programs?

As I understand (Perl is new to me) Perl can be used to script against a Unix command line. What I want to do is run (hardcoded) command line calls, and search the output of these calls for RegEx matches. Is there a way to do this simply in Perl? How?
EDIT: Sequence here is:
-Call another program.
-Run a regex against its output.
my $command = "ls -l /";
my @output = `$command`;
for (@output) {
print if /^d/;
}
The qx// quasi-quoting operator (for which backticks are a shortcut) is stolen from shell syntax: run the string as a command in a new shell, and return its output (as a string or a list, depending on context). See perlop for details.
You can also open a pipe:
open my $pipe, "$command |";
while (<$pipe>) {
# do stuff
}
close $pipe;
This allows you to (a) avoid gathering the entire command's output into memory at once, and (b) gives you finer control over running the command. For example, you can avoid having the command be parsed by the shell:
open my $pipe, '-|', @command, '< single argument not mangled by shell >';
See perlipc for more details on that.
You might be able to get away without Perl, as others have mentioned. However, if there is some Perl feature you need, such as extended regex features or additional text manipulation, you can pipe your output to perl and then do what you need. Perl's -e switch lets you specify the Perl program on the command line:
command | perl -ne 'print if /.../'
There are several other switches you can pass to perl to make it very powerful on the command line. These are documented in perlrun. Also check out some of the articles in Randal Schwartz's Unix Review column, especially his first article for them. You can also google for Perl one liners to find lots of examples.
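A concrete instance of that pattern, as a small sketch: printf stands in for whatever command produces the output (the three sample lines are made up), and perl -ne keeps only the lines that start with d, mirroring the ls -l example earlier in this thread.

```shell
#!/bin/sh
# printf stands in for any command that produces line-oriented output.
if command -v perl >/dev/null 2>&1; then
    dirs=$(printf '%s\n' 'drwxr-xr-x bin' '-rw-r--r-- notes.txt' 'drwxr-xr-x etc' |
        perl -ne 'print if /^d/')
    printf '%s\n' "$dirs"
fi
```

The -n switch wraps the -e code in a loop over the input lines, so the one-liner runs once per line with the line in $_.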
Do you need Perl at all? How about
command -I use | grep "myregexp" && dosomething
right in the shell?
#!/usr/bin/perl
sub my_action() {
print "Implement some action here\n";
}
open PROG, "/path/to/your/command|" or die $!;
while (<PROG>) {
/your_regexp_here/ and my_action();
print $_;
}
close PROG;
This will scan output from your command, match regexps and do some action (which now is printing the line)
In Perl you can use backticks to execute commands on the shell. Here is a document on using backticks. I'm not sure about how to capture the output, but I'm sure there's more than one way to do it.
You can indeed use a one-liner in a case like this. I recently coded up one that I use, among other ways, to produce output which lists the directory structure present in a .zip archive (one dir entry per line). So using that output as an example of command output that we'd like to filter, we could put a pipe in and then use perl with the -n -e flags to filter the incoming data (and/or do other things with it):
[command_producing_text_output] | perl -MFile::Path -n -e ^
"BEGIN{@PTM=()} if (m{^perl/(bin|lib(?!/site))}) {chomp;push @PTM,$_}" ^
-e "END{@WDD=mkpath (\@PTM,1);" ^
-e "printf qq/Created %u dirs to reflect part of structure present in the .ZIP file\n/, scalar(@WDD);}"
The shell syntax used, including the quoting of the Perl code and the escaping of newlines, reflects CMD.exe usage in Windows NT-like consoles. If you need to, mentally replace "^" with "\" and " with ' in the appropriate places.
The one-liner above collects only the directory names that start with "perl/bin" or "perl/lib" (not followed by "/site"); it then creates those directories. You wind up with an (empty) tree that you can use for whatever evil purposes you desire.
The main point is to illustrate that there are flags available (-n, -p) that make perl loop over each input record (line), and that what you can do is unlimited in terms of complexity.