Creating CSV of information extracted from filenames in a given format


I have a little script that lists the paths of all files in a directory and its subdirectories, and parses each path on the list with a regex in Perl.
#!/bin/sh
find * -type f | while read j; do
echo $j | perl -n -e '/\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/ && print "\"0\";\"$1$2$3\";\"$4\";\"$5\";$fl\""' >> bss.csv
echo | readlink -f -n "$j" >>bss.csv
echo \">>bss.csv
done
Output:
"0";"13957";"4121113";"2";"/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg"
I am using the readlink from GNU coreutils: -n suppresses newline at the end, -f performs canonicalization by recursively following symlinks on the path.
The problem is that when the input string doesn't match the regex, I still get a line containing only the file path.
How can I add a condition so that the path is printed only if the regex matched, and nothing is printed otherwise?
I've racked my brain trying various combinations, but haven't found one that works properly.

Description of solution
In Perl, use if (/…/) {…} else {…} instead of /…/ && …. That way you can run print when the match succeeds and some other code otherwise.
If this is not the problem and you only want to get rid of the readlink output and closing quote, you can call readlink from Perl using backticks.
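A minimal sketch of the if/else form, written as a small sub so it can be tried on single paths. The sub name csv_row and the sample paths are hypothetical; the pattern is the one from the question, written with m{} so the slashes need no escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one CSV row when the path matches,
# or undef (after a note on STDERR) when it does not.
sub csv_row {
    my ($path) = @_;
    if ($path =~ m{/(\d{2})/(\d{2})/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?}) {
        # $5 is optional, so use an empty string when it did not participate.
        return join ';', map qq{"$_"}, 0, "$1$2$3", $4, $5 // '', $path;
    }
    else {
        print STDERR "File $path did not match\n";
        return;
    }
}

# Hypothetical sample paths: the first matches, the second does not.
for my $p ("/a/13/95/7___/x-Abc_2.jpg", "/tmp/nothing.txt") {
    my $row = csv_row($p);
    print "$row\n" if defined $row;
}
```

Only the matching path produces a line on STDOUT, so redirecting STDOUT to bss.csv keeps non-matching files out of the CSV.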
Resulting code
I turned everything into a single Perl program, used File::Find instead of the find command, assumed the $fl at the end of the print in Perl is a relic (and ignored it), and used Cwd::realpath() to find the canonical path of the file instead of readlink -f from GNU coreutils. If you still want to use readlink -f, feel free to change Cwd::realpath($_) to `readlink -f -n '$_'` (including the backticks!); the -n keeps readlink's trailing newline out of the CSV field. It will not work for filenames containing a single quote, though.
You should call this script as ./script-name starting-directory > bss.csv. If you put it in the directory you are examining, the output would contain it too, along with the bss.csv.
#!/usr/bin/perl
# Usage: ./$0 [<starting-directory>...]
use strict;
use warnings;
use File::Find;
use Cwd;
no warnings 'File::Find';
sub handleFile {
    return if not -f;
    if ($File::Find::name =~ /\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/) {
        # Localize both; "local $, = ';', $\ = ..." would localize only $,
        local ($,, $\) = (';', "\n");
        # $5 is optional, so avoid an "uninitialized" warning when it is missing
        print map "\"$_\"", 0, $1.$2.$3, $4, $5 // '', Cwd::realpath($_);
    } else {
        print STDERR "File $File::Find::name did not match\n";
    }
}
find(\&handleFile, @ARGV ? @ARGV : '.');
For reference, I also include a polished version of the original program. It calls readlink from Perl as suggested above and makes real use of Perl's -n option (plus -l, which strips each input newline and adds one to every printed line), avoiding the while read loop.
#!/bin/sh
find . -type f | perl -nle 'm{/(\d{2})/(\d{2})/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?} && print qq{"0";"$1$2$3";"$4";"$5";"}, `readlink -f -n '\''$_'\''`, qq{"}' > bss.csv
Other remarks on the original code
The echo | before readlink does nothing and should be removed: readlink does not read from its stdin.
Where does the $fl at the end of the print in Perl come from? I assume it is a relic.
Using generic quotes like qq{} and choosing delimiters thoughtfully (e.g. in regex matching and other quote-like operators) can save you from quoting hell. I already used this tip above: /…/ → m{…} and "…" → qq{…}. Thanks, Slade! See the perlop manpage for more info.
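A short illustration of the delimiter tip, assuming a made-up path: the same three captures are matched with /.../ and with m{}, and the CSV fields are built with qq{} so the double quotes need no backslashes.

```perl
use strict;
use warnings;

my $path = "/data/13/95/7___/x-Abc_2.jpg";   # hypothetical sample path

# Default delimiter: every slash in the pattern has to be escaped.
my $escaped_ok = $path =~ /\/(\d{2})\/(\d{2})\/(\d+)/ ? 1 : 0;

# Same pattern with m{}: the slashes are ordinary characters.
my $line = '';
if ($path =~ m{/(\d{2})/(\d{2})/(\d+)}) {
    # qq{} interpolates like "..." but lets the double quotes stand unescaped.
    $line = qq{"$1$2$3";"$path"};
}
print "$escaped_ok $line\n";
```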

If I understand you, you want to capture the following parts of the filename:
/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg
                           ~~ ~~ ~                ~~~ ~~~~~~~ ~
                           1  2  3                4   5       6
But your perl regex doesn't do that. Let's break it apart for better understanding.
/\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/
Sliced into pieces, this would be...
\/(\d{2}) - a slash then two digits (with the digits captured)
\/(\d{2}) - another slash and two digits
\/(\d+) - one more slash and one or more digits
.*- - any run of characters until the final hyphen in the input string
([a-zA-Z]+) - one or more alpha characters
(?:_(\d{1}))? - an optional underscore followed by a single digit; the outer (?:...) group is non-capturing, but the inner (\d{1}) still captures the digit (as $5)
If you step through your filename, you'll see that there is nothing here to handle the second-to-last string of digits (4121113): after the final hyphen, the regex expects letters.
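A quick way to confirm this is to run both versions of the pattern against the sample path. The second pattern, which expects digits after the last hyphen (much like the ([0-9]+) in the sed command that follows), is only an illustrative adjustment, not something taken from the question.

```perl
use strict;
use warnings;

my $path = "/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg";

# Pattern from the question: after the last hyphen it requires letters.
my $orig_matches = $path =~ m{/(\d{2})/(\d{2})/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?} ? 1 : 0;

# Illustrative variant: require digits after the last hyphen instead.
my @fields;
if ($path =~ m{/(\d{2})/(\d{2})/(\d+).*-(\d+)(?:_(\d{1}))?}) {
    @fields = ("$1$2$3", $4, $5 // '');
}

print "question's pattern matches: $orig_matches\n";   # prints 0
print join(';', @fields), "\n";                          # prints 13957;4121113;2
```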
I'd do this using simpler tools. Sed, for example:
[ghoti@pc ~]$ s="/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg"
[ghoti@pc ~]$ echo "$s" | sed -rne 's/.*/"&"/;h;s:.*/([0-9]{2})/([0-9]{2})/([0-9]+)[^[a-zA-Z]]*[^-]+-([0-9]+)(_([0-9]+))?.*:"0";"\1\2\3";"\4";"\6":;G;s/\n/;/;p'
"0";"13957";"4121113";"2";"/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg"
[ghoti@pc ~]$
I'll break up the sed script for easier reading:
s/.*/"&"/; - Put quotes around the filename.
h; - Store the filename in Sed's "hold" space, for future use...
s: - Start the big substitution...
.*/([0-9]{2})/([0-9]{2})/([0-9]+)[^[a-zA-Z]]*[^-]+-([0-9]+)(_([0-9]+))?.* - This is the pattern we want to match for substitution. Similar to what you did in Perl, obviously, but using ERE instead of PCRE.
:"0";"\1\2\3";"\4";"\6":; - The replacement pattern, where each \N is replaced by the text matched by the Nth parenthesized subexpression of the RE. Note that \5 is skipped in the replacement string, as that subexpression is only being used for the match.
G; - Append the "hold" space to the pattern space
s/\n/;/; - and remove the newline between them.
p - Print the result.
Note that this solution, as is, assumes that all input lines match the pattern you're looking for. If that's not the case, then you may get unpredictable output, and should put some pattern matching into the script.

Related

Extract everything between first and last occurrence of the same pattern in single iteration

This question is very much the same as this except that I am looking to do this as fast as possible, doing only a single pass over the (unfortunately gzip-compressed) file.
Given the pattern CAPTURE and input
1:.........
...........
100:CAPTURE
...........
150:CAPTURE
...........
200:CAPTURE
...........
1000:......
Print:
100:CAPTURE
...........
150:CAPTURE
...........
200:CAPTURE
Can this be accomplished with a regular expression?
I vaguely remember that this kind of grammar cannot be captured by a regular expression, but I'm not quite sure, as regular expressions these days provide lookaheads, etc.

You can buffer the lines until you see a line that contains CAPTURE, treating the first occurrence of the pattern specially.
#!/usr/bin/env perl
use warnings;
use strict;
my $first=1;
my @buf;
while ( my $line = <> ) {
    push @buf, $line unless $first;
    if ( $line=~/CAPTURE/ ) {
        if ($first) {
            @buf = ($line);
            $first = 0;
        }
        print @buf;
        @buf = ();
    }
}
Feed the input into this program via zcat file.gz | perl script.pl.
Which can of course be jammed into a one-liner, if need be...
zcat file.gz | perl -ne '$x&&push@b,$_;if(/CAPTURE/){$x||=@b=$_;print@b;@b=()}'
Can this be accomplished with a regular expression?
You mean in a single pass, in a single regex? If you don't mind reading the entire file into memory, sure... but this is obviously not a good idea for large files.
zcat file.gz | perl -0777ne '/((^.*CAPTURE.*$)(?s:.*)(?2)(?:\z|\n))/m and print $1'
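To try the buffering logic without a gzip file, the same loop can be wrapped in a sub that takes a list of lines and returns the lines it would print. The name first_to_last and passing the pattern as a qr// object are assumptions for this sketch; the logic mirrors the buffering script above.

```perl
use strict;
use warnings;

# Keep lines only after the first matching line, and flush the buffer
# each time another matching line is seen. Lines after the last match
# stay in the buffer and are never emitted.
sub first_to_last {
    my ($pattern, @lines) = @_;
    my (@out, @buf);
    my $seen = 0;
    for my $line (@lines) {
        push @buf, $line if $seen;
        if ($line =~ $pattern) {
            @buf = ($line) unless $seen;   # first match starts the buffer
            $seen = 1;
            push @out, @buf;
            @buf = ();
        }
    }
    return @out;
}

my @input = ("1:...", "...", "100:CAPTURE", "...", "150:CAPTURE", "...", "1000:...");
print "$_\n" for first_to_last(qr/CAPTURE/, @input);
```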

I would write
gunzip -c file.gz | sed -n '/CAPTURE/,$p' | tac | sed -n '/CAPTURE/,$p' | tac

Find the first CAPTURE and look back for the last one.
echo "/CAPTURE/,?CAPTURE? p" | ed -s <(gunzip -c inputfile.gz)
EDIT: Answer to comment and second (better?) solution.
When your input doesn't end with a newline, ed will complain, as shown by these tests.
# With newline
printf "1,$ p\n" | ed -s <(printf "%s\n" test)
# Without newline
printf "1,$ p\n" | ed -s <(printf "%s" test)
# message removed
printf "1,$ p\n" | ed -s <(printf "%s" test) 2> /dev/null
I don't know what memory implications this has for a large file, but you would probably prefer a streaming solution.
You can use sed for the next approach.
Keep reading lines until you find the first match. During this time only remember the last line read (by putting it in a Hold area).
Now change your tactics.
Append each line to the Hold area. You do not know when to flush until the next match.
When you have the next match, recall the Hold area and print this.
I needed a small tweak to prevent the second match from being printed twice. I solved this by reading the next line and replacing the hold area with that line.
The total solution is
gunzip -c inputfile.gz | sed -n '1,/CAPTURE/{h;n};H;/CAPTURE/{x;p;n;h};'
If you don't like sed's hold space, you can implement the same approach with awk:
gunzip -c inputfile.gz |
awk '/CAPTURE/{capt=1} capt==1{a[i++]=$0} /CAPTURE/{for(j=0;j<i;j++) print a[j]; i=0}'

I don't think a regex will be faster than a double scan...
Here is an awk solution (double scan):
$ awk '/pattern/ && NR==FNR {a[++f]=NR; next} a[1]<=FNR && FNR<=a[f]' file{,}
Alternatively, if you have any a priori information about where the patterns appear in the file, you can use heuristic approaches that will be faster in those special cases.

Here is one more example using a regex (the downside is that it reads the whole file into memory, which is costly for large files):
#!/usr/bin/perl
use strict;
use warnings;
my $string;
{
    local $/ = undef;   # slurp mode: read the whole file at once
    open my $fh, '<', $ARGV[0] or die "Couldn't open file: $!";
    binmode $fh;
    $string = <$fh>;
    close $fh;
}
print $1 if $string =~ /([^\n]+(CAPTURE).*\2.*?)\n/s;
Or as a one-liner (-0777 makes Perl read the whole file at once, so no line is consumed before the match):
perl -0777 -ne 'print $1 if /([^\n]+(CAPTURE).*\2.*?)\n/s' file.tmp
result:
100:CAPTURE
...........
150:CAPTURE
...........
200:CAPTURE

This might work for you (GNU sed):
sed '/CAPTURE/!d;:a;n;:b;//ba;$d;N;bb' file
Delete all lines until the first containing the required string. Print the line containing the required string. Replace the pattern space with the next line. If this line contains the required string, repeat the last two previous sentences. If it is the last line of the file, delete the pattern space. Otherwise, append the next line and repeat the last three previous sentences.
Having studied the test files used for haukex's benchmark, it seems that sed is not the right tool for extracting from this file. Using a mixture of csplit, grep and sed gives a reasonably fast solution:
lines=$(grep -nTA1 --no-group-separator CAPTURE oldFile |
sed '1s/\t.*//;1h;$!d;s/\t.*//;H;x;s/\n/ /')
csplit -s oldFile $lines && rm xx0{0,2} && mv xx01 newFile
This splits the original file into three files: one preceding the first occurrence of CAPTURE, one from the first CAPTURE to the last CAPTURE, and one containing the remainder. The first and third files are discarded and the second file is renamed.
csplit can use line numbers to split the original file. grep is extremely fast at filtering patterns and can return the line numbers of all patterns that match CAPTURE and the following context line. sed can manipulate the results of grep into two line numbers which are supplied to the csplit command.
When run against the test files (as above) I get timings around 10 seconds.

When I posted this question, the problem at hand was that I had several huge gzip-compressed log files generated by a Java application.
The log lines were of the following format:
[Timestamp] (AppName) {EventId} [INFO]: Log text...
[Timestamp] (AppName) {EventId} [EXCEPTION]: Log text...
at com.application.class(Class.java:154)
caused by......
[Timestamp] (AppName) {EventId} [LogLevel]: Log text...
Given an EventId, I needed to extract all the lines corresponding to that event from these files. A trivial grep for EventId could not solve the problem, because the exception lines can be of arbitrary length and do not contain the EventId.
Unfortunately I forgot to consider the edge case where the last log line for an EventId is an exception, in which case the answers posted here do not print the stack trace lines. However, it wasn't hard to modify haukex's solution to cover these cases as well:
#!/usr/bin/env perl
use warnings;
use strict;
my $first=1;
my @buf;
while ( my $line = <> ) {
    push @buf, $line unless $first;
    if ( $line=~/EventId/ or ($first==0 and $line!~/\(AppName\)/)) {
        if ($first) {
            @buf = ($line);
            $first = 0;
        }
        print @buf;
        @buf = ();
    }
    else {
        $first = 1;
    }
}
I am still wondering whether the faster solutions (mainly walter's sed solution or haukex's in-memory Perl solution) could be modified to do the same.

What's the use of <> in Perl?

What's the use of <> in Perl? How do I use it?
If we simply write
<>;
and
while(<>)
what is the program doing in each case?

The answers above are all correct, but it might come across more plainly if you understand general UNIX command line usage. It is very common to want a command to work on multiple files. E.g.
ls -l *.c
The command line shell (bash et al) turns this into:
ls -l a.c b.c c.c ...
In other words, ls never sees '*.c' unless the pattern doesn't match anything. Try this at a command prompt (not perl):
echo *
you'll notice that you do not get an *.
So, if the shell is handing you a bunch of file names and you'd like to go through each one's data in turn, Perl's <> operator gives you a nice way of doing that: each read returns the next line of the current file, moving on to the next file when one runs out (or reading stdin if no files are named). Used as the condition of a while loop, it also puts that line into $_ (the default scalar).
Here is a poor man's grep:
while(<>) {
print if m/pattern/;
}
Running this script:
./t.pl *
would print out all of the lines of all of the files that match the given pattern.
cat /etc/passwd | ./t.pl
would use cat to generate some lines of text that would then be checked for the pattern by the loop in perl.
So you see, while(<>) gets you a very standard UNIX command line behavior...process all of the files I give you, or process the thing I piped to you.
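The poor man's grep can be exercised from inside Perl by filling @ARGV yourself, which is what the shell effectively does for ./t.pl *. This sketch creates two temporary files with File::Temp (their contents are made up) and shows <> reading them one after the other.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two throwaway files, removed automatically when the program exits.
my ($fh1, $file1) = tempfile(UNLINK => 1);
my ($fh2, $file2) = tempfile(UNLINK => 1);
print {$fh1} "alpha\nbeta pattern\n";
print {$fh2} "gamma pattern\ndelta\n";
close $fh1;
close $fh2;

# <> reads the files named in @ARGV in order, putting each line in $_.
@ARGV = ($file1, $file2);
my @hits;
while (<>) {
    push @hits, $_ if m/pattern/;
}
print @hits;   # prints "beta pattern" then "gamma pattern"
```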

<>;
is a short way of writing
readline();
or if you add in the default argument,
readline(*ARGV);
readline is an operator that reads a line from the specified file handle. Reading from the special file handle ARGV will read from STDIN if @ARGV is empty or from the concatenation of the files named by @ARGV if it's not.
As for
while (<>)
It's a syntax error. If you had
while (<>) { ... }
it gets rewritten to
while (defined($_ = <>)) { ... }
And as previously explained, that means the same as
while (defined($_ = readline(*ARGV))) { ... }
That means it will read lines from (previously explained) ARGV until there are no more lines to read.
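The defined() in that rewrite is not cosmetic. A sketch using an in-memory filehandle whose last line is "0" with no trailing newline: that line is false in Perl, but the loop still sees it because Perl applies the same defined() wrapping to while (my $line = <$fh>).

```perl
use strict;
use warnings;

# In-memory "file" whose last line is "0" without a trailing newline.
my $data = "first\n0";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $count = 0;
while (my $line = <$fh>) {   # tested as defined(my $line = <$fh>)
    $count++;
}
close $fh;
print "read $count lines\n";   # prints "read 2 lines"
```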

It is called the diamond operator: it reads from STDIN if @ARGV is empty, and otherwise reads each line of the files named in @ARGV in turn. This webpage http://docstore.mik.ua/orelly/perl/learn/ch06_02.htm explains it very well.

For syntactic sugar like this, the Deparse backend of the O module (B::Deparse) is often helpful for finding out what's happening:
$ perl -MO=Deparse -e 'while(<>){print 42}'
while (defined($_ = <ARGV>)) {
print 42;
}
-e syntax OK

Quoting perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk, and any other Unix filter program that takes
a list of filenames, doing the same to each line of input from all of
them. Input from <> comes either from standard input, or from each
file listed on the command line.

It reads from standard input (STDIN) when no file names are given on the command line:
> cat temp.pl
#!/usr/bin/perl
use strict;
use warnings;
my $count=<>;
print "$count"."\n";
>
below is the execution:
> temp.pl
3
3
>
So as soon as you execute the script, it waits until the user gives some input.
After 3 is entered, it stores that value (including its trailing newline, since there is no chomp) in $count and prints it in the next statement.

Extract a specific pattern from lines with sed, awk or perl

Can I use sed if I need to extract a pattern enclosed by a specific pattern, if it exists in a line?
Suppose I have a file with the following lines :
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
Advice is what we ask for when we already know the /* answer */ but wish we didn’t.
In both cases I have to scan the line for the first occurrence of the opening pattern, i.e. '[/' or '/*' respectively, and store the text that follows up to the closing pattern, i.e. '/]' or '*/' respectively.
In short, I need fear and answer. If possible, can it be extended to multiple lines, in the sense that the closing pattern occurs on a different line from the opening one?
Any kind of help in the form of suggestions or algorithms is welcome. Thanks in advance for the replies.

use strict;
use warnings;
while (<DATA>) {
while (m#/(\*?)(.*?)\1/#g) {
print "$2\n";
}
}
__DATA__
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
Advice is what we ask for when we already know the /* answer */ but wish we didn’t.
As a one-liner:
perl -nlwe 'while (m#/(\*?)(.*?)\1/#g) { print $2 }' input.txt
The inner while loop iterates over all matches thanks to the /g modifier. The backreference \1 makes sure we only match identical open/close tags.
If you need to match blocks that extend over multiple lines, you need to slurp the input:
use strict;
use warnings;
$/ = undef;
while (<DATA>) {
while (m#/(\*?)(.*?)\1/#sg) {
print "$2\n";
}
}
__DATA__
There are many who dare not kill themselves for [/fear/] of what the neighbors will say. /* foofer */
Advice is what we ask for when we already know the /* answer */ but wish we didn’t.
foo bar /
baz
baaz / fooz
One-liner:
perl -0777 -nlwe 'while (m#/(\*?)(.*?)\1/#sg) { print $2 }' input.txt
The -0777 switch and $/ = undef will cause file slurping, meaning all of the file is read into a scalar. I also added the /s modifier to allow the wildcard . to match newlines.
Explanation for the regex: m#/(\*?)(.*?)\1/#sg
m# # a simple m//, but with # as delimiter instead of slash
/(\*?) # slash followed by optional *
(.*?) # shortest possible string of wildcard characters
\1/ # backref to optional *, followed by slash
#sg # s modifier to make . match \n, and g modifier
The "magic" here is that the backreference requires a star * only when one is found before it.
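The same regex can be wrapped in a sub and checked against a slurped string. This sketch uses shortened versions of the sample sentences plus a made-up chunk that spans two lines; the sub name delimited_chunks is hypothetical. Note that the spaces inside /* answer */ are kept, so trim them if you need bare words.

```perl
use strict;
use warnings;

# Return every chunk enclosed in /.../ or /*...*/ in $text.
# \1 repeats the optional opening star, so "/*" must be closed by "*/".
# /s lets . match newlines, so a chunk may span lines; /g walks all matches.
sub delimited_chunks {
    my ($text) = @_;
    my @chunks;
    while ($text =~ m#/(\*?)(.*?)\1/#sg) {
        push @chunks, $2;
    }
    return @chunks;
}

my $text = "dare not kill themselves for [/fear/] of what\n"
         . "we already know the /* answer */ but\n"
         . "a made-up /* split\nchunk */ example\n";

print "[$_]\n" for delimited_chunks($text);
```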

Quick and dirty way in awk (gensub is a GNU awk extension, so this needs gawk):
awk 'NF{ for (i=1;i<=NF;i++) if($i ~ /^\[\//) { print gensub (/^..(.*)..$/,"\\1","g",$i); } else if ($i ~ /^\/\*/) print $(i+1);next}1' input_file
Test:
$ cat file
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
Advice is what we ask for when we already know the /* answer */ but wish we didn't.
$ awk 'NF{ for (i=1;i<=NF;i++) if($i ~ /^\[\//) { print gensub (/^..(.*)..$/,"\\1","g",$i); } else if ($i ~ /^\/\*/) print $(i+1);next}1' file
fear
answer

Single-Line Matches
If you really want to do this in sed, you can extract your delimited patterns relatively easily as long as they are on the same line.
# Using GNU sed. Escape a whole lot more if your sed doesn't handle
# the -r flag.
sed -rn 's![^*/]*(/\*?.*/).*!\1!p' /tmp/foo
Multi-Line Matches
If you want to perform multi-line matches with sed, things get a little uglier. However, it can certainly be done.
# Multi-line matching of delimiters with GNU sed.
sed -rn ':loop
/\/[^\/]/ {
N
s![^*/]+(/\*?.*\*?/).*!\1!p
T loop
}' /tmp/foo
The trick is to look for a starting delimiter, then keep appending lines in a loop until you find the ending delimiter.
This works really well as long as you really do have an ending delimiter. Otherwise, the contents of the file will keep being appended to the pattern space until sed finds one, or until it reaches the end of the file. This may cause problems with certain versions of sed or with really, really large files where the size of the pattern space gets out of hand.
See GNU sed's Limitations and Non-limitations for more information.

Simple search and replace without regex

I've got a file with various wildcards in it that I want to be able to substitute from a (Bash) shell script. I've got the following which works great until one of the variables contains characters that are special to regexes:
VERSION="1.0"
perl -i -pe "s/VERSION/${VERSION}/g" txtfile.txt # No problems here
APP_NAME="../../path/to/myapp"
perl -i -pe "s/APP_NAME/${APP_NAME}/g" txtfile.txt # Error!
So instead I want something that just performs a literal text replacement rather than a regex. Are there any simple one-line invocations with Perl or another tool that will do this?

The 'proper' way to do this is to escape the contents of the shell variables so that they aren't seen as special regex characters. You can do this in Perl with \Q, as in
s/APP_NAME/\Q${APP_NAME}/g
but when called from a shell script the backslash must be doubled to avoid it being lost, like so
perl -i -pe "s/APP_NAME/\\Q${APP_NAME}/g" txtfile.txt
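Be aware, though, that \Q only matters on the pattern side of s///: in the replacement it inserts literal backslashes before the dots and slashes, and it cannot stop a / in the value from ending the s/// early, because the shell substitutes the value before Perl parses the code. That is why I suggest that it would be far easier to write the entire script in Perl.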
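A sketch of the usual fix, assuming the placeholder and value are already in Perl variables: \Q...\E goes on the search side, and the value is interpolated on the replacement side, where Perl does not re-parse it, so slashes and dots need no escaping. In a one-liner the value could come from %ENV, e.g. APP_NAME="$APP_NAME" perl -i -pe 's/APP_NAME/$ENV{APP_NAME}/g' txtfile.txt, with the Perl code in single quotes so the shell leaves it alone.

```perl
use strict;
use warnings;

my $placeholder = 'APP_NAME';                            # text to find, taken literally
my $value       = '../../path/to/myapp';                 # replacement from the question
my $text        = "run APP_NAME now; APP_NAME again\n";  # made-up input line

# \Q...\E quotes any regex metacharacters in the search text.
# $value is interpolated into the replacement as plain text, so its
# slashes and dots are not treated as syntax or regex characters.
(my $result = $text) =~ s/\Q$placeholder\E/$value/g;
print $result;
```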

Use the following:
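perl -i -pe "s|APP_NAME|${APP_NAME}|g" txtfile.txt
Since a vertical bar rarely appears in a path, you are good to go (it is legal in Unix filenames, though, so this breaks if the path contains one). Leave out \Q here: in the replacement it would insert backslashes before the dots and slashes.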

I don't particularly like this answer because there should be a better way to do a literal replace in Perl. \Q is cryptic. Using quotemeta adds extra lines of code.
But you can use substr to replace a portion of a string.
#!/usr/bin/perl
my $name = "Jess.*";
my $sentence = "Hi, my name is Jess.*, dude.\n";
my $new_name = "Prince//";
my $name_idx = index $sentence, $name;
if ($name_idx >= 0) {
substr($sentence, $name_idx, length($name), $new_name);
}
print $sentence;
Output:
Hi, my name is Prince//, dude.

You don't have to use a regular expression for this (using substr(), index(), and length()):
perl -pe '
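foreach my $var ("VERSION", "APP_NAME") {
    my $pos = 0;
    while (($pos = index($_, $var, $pos)) != -1) {
        substr($_, $pos, length($var)) = $ENV{$var};
        $pos += length($ENV{$var});   # resume after the inserted text, so a value containing $var cannot loop forever
    }
}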
'
Make sure you export your variables.

You can use a regex but escape any special characters.
Something like this may work.
APP_NAME="../../path/to/myapp"
APP_NAME=$(printf '%s\n' "$APP_NAME" | sed -e 's:/:\\/:g')
perl -i -pe "s/APP_NAME/${APP_NAME}/g" txtfile.txt

Use:
perl -i -pe "\$r = qq/\Q${APP_NAME}\E/; s/APP_NAME/\$r/go"
Rationale: Escape sequences

I managed to get a working solution, partly based on bits and pieces from other people's answers:
app_name='../../path/to/myapp'
perl -pe "\$r = q/${app_name//\//\\/}/; s/APP_NAME/\$r/g" <<<'APP_NAME'
This creates a Perl variable, $r, from the result of the shell parameter expansion:
${app_name//\//\\/}
${ # Open parameter expansion
app_name # Variable name
// # Start global substitution
\/ # Match / (backslash-escaped to avoid being interpreted as delimiter)
/ # Delimiter
\\/ # Replace with \/ (literal backslash needs to be escaped)
} # Close parameter expansion
All that work is needed to prevent forward slashes inside the variable from being treated as Perl syntax, which would otherwise close the q// quotes around the string.
In the replacement part, use the variable $r (the $ is escaped, to prevent it from being treated as a shell variable within double quotes).
Testing it out:
$ app_name='../../path/to/myapp'
$ perl -pe "\$r = q/${app_name//\//\\/}/; s/APP_NAME/\$r/g" <<<'APP_NAME'
../../path/to/myapp

How do I best pass arguments to a Perl one-liner?

I have a file, someFile, like this:
$cat someFile
hdisk1 active
hdisk2 active
I use this shell script to check:
$cat a.sh
#!/usr/bin/ksh
for d in 1 2
do
grep -q "hdisk$d" someFile && echo "$d : ok"
done
I am trying to convert it to Perl:
$cat b.sh
#!/usr/bin/ksh
export d
for d in 1 2
do
cat someFile | perl -lane 'BEGIN{$d=$ENV{'d'};} print "$d: OK" if /hdisk$d\s+/'
done
I export the variable d in the shell script and get the value using %ENV in Perl. Is there a better way of passing this value to the Perl one-liner?

You can enable rudimentary command-line switch parsing with the -s switch. A variable gets defined for each argument starting with a dash, and the -- tells Perl where those arguments start.
for d in 1 2 ; do
cat someFile | perl -slane ' print "$someParameter: OK" if /hdisk$someParameter\s+/' -- -someParameter=$d;
done
See: perlrun
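To see what -s does without a shell in between, this sketch starts a child perl through the list form of open (available on Unix-like systems) and passes -someParameter=2 after the --. $^X is the path of the running perl; the variable name is taken from the answer above, and the value 2 is arbitrary.

```perl
use strict;
use warnings;

# List-form open runs $^X (this perl) directly, without a shell.
# -s turns "-someParameter=2" (placed after --) into $someParameter in the child.
open my $kid, '-|', $^X, '-se', 'print "value=$someParameter"', '--', '-someParameter=2'
    or die "cannot start perl: $!";
my $out = do { local $/; <$kid> };
close $kid;
print "$out\n";   # prints value=2
```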

Sometimes breaking out of the single-quoted Perl code is a good trick for these one-liners:
for d in 1 2 ; do cat someFile | perl -lne ' print "'"${d}"': OK" if /hdisk'"${d}"'\s+/';done

Pass it on the command line, and it will be available in @ARGV:
for d in 1 2
do
perl -lne 'BEGIN {$d=shift} print "$d: OK" if /hdisk$d\s+/' $d someFile
done
Note that the shift operator in this context removes the first element of @ARGV, which is $d in this case.

Combining some of the earlier suggestions and adding my own sugar to it, I'd do it this way:
perl -se '/hdisk([$d])/ && print "$1: ok\n" for <>' -- -d='[value]' [file]
[value] can be a number (e.g. 2), a range (e.g. 2-4), a list of different numbers (e.g. 2|3|4) (or almost anything else that's valid inside a character class) or even a bash variable containing one of those, for example:
d='2-3'
perl -se '/hdisk([$d])/ && print "$1: ok\n" for <>' -- -d=$d someFile
and [file] is your filename (that is, someFile).

If you are having trouble writing a one-liner, maybe it is a bit hard for one line (just my opinion). I would agree with @FM's suggestion and do the whole thing in Perl. Read the whole file in and then test it:
use strict;
local $/; # undef $/ reads in the whole file ('' would be paragraph mode)
my $file = <> ;
for my $d ( 1 .. 2 )
{
print "$d: OK\n" if $file =~ /hdisk$d\s+/
}
You could do it looping, but that would be longer. Of course it somewhat depends on the size of the file.
Note that all the Perl examples so far will print a message for each match - can you be sure there are no duplicates?

My solution is a little different. I came to your question via a Google search for its title, but I'm trying to do something different. Here it is in case it helps someone:
FYI, I was using tcsh on Solaris.
I had the following one-liner:
perl -e 'use POSIX qw(strftime); print strftime("%Y-%m-%d", localtime(time()-3600*24*2));'
which outputs the value:
2013-05-06
I was trying to put this into a shell script so I could create a file with a date X days in the past in the filename. I tried:
set dateVariable=`perl -e 'use POSIX qw(strftime); print strftime("%Y-%m-%d", localtime(time()-3600*24*$numberOfDaysPrior));'`
But this didn't work due to variable substitution. I had to mess around with the quoting, to get it to interpret it properly. I tried enclosing the whole lot in double quotes, but this made the Perl command not syntactically correct, as it messed with the double quotes around date format. I finished up with:
set dateVariable=`perl -e "use POSIX qw(strftime); print strftime('%Y-%m-%d', localtime(time()-3600*24*$numberOfDaysPrior));"`
Which worked great for me, without having to resort to any fancy variable exporting.
I realise this doesn't exactly answer your specific question, but it answered the title and might help someone else!

That looks good, but I'd use the following (d still has to be exported for $ENV{d} to see it):
for d in $(seq 1 2); do perl -nle 'print "hdisk$ENV{d} OK" if $_ =~ /hdisk$ENV{d}/' someFile; done

This is already covered above in one long paragraph, but I'm also writing it for lazy developers who don't read those lines.
Double quotes and single quotes have very different meanings in bash: variables are expanded inside double quotes but not inside single quotes.
So please take care:
Doesn't work: perl '$VAR' $FILEPATH
Works: perl "$VAR" $FILEPATH

We Keep Coding
