grep lines matching the pattern using perl programming - perl

I want to grep for a pattern in a large file, but grep is very slow, and the match also needs to be case-insensitive. I read that Perl is faster at reading files. Could someone please show me how to do this in Perl?
Thanks.

cat File1.txt File2.txt | grep -in "exception" | grep -v "pattern"
In perl:
perl -ne 'print "$.:$_" if /exception/i and !/pattern/' File1.txt File2.txt
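or without cat (note that grep then prefixes each match with its file name and restarts the line numbering for each file):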
grep -in "exception" File1.txt File2.txt | grep -v "pattern"

I am doubtful that Perl would give you a performance advantage in this area. The grep utility is a pre-compiled binary written in C; Perl is an interpreted language and bears an extra performance overhead which e.g. GNU grep does not. If grep is slow the bottleneck is most likely the file being loaded from disk into main memory. How big is the file?
FYI, grep has flags which enable case-insensitive matching and Perl-style regex syntax.
% grep -i 'abc' <file> # Matches abc, ABC, aBc, etc.
% grep -i 'ab\|cd' <file> # Matches ab or cd
% grep -P 'ab|cd' <file> # Matches ab or cd
An equivalent Perl program is:
# grep.pl
$pat = shift;
while(<>) { if(/$pat/i) { print; } }
which can be invoked as
perl grep.pl abc file1.txt file2.txt ...
My advice, stick with grep.

grep -i 'pattern' file -> perl -lne 'print if(/pattern/i)' file
grep -vi 'pattern' file -> perl -lne 'print unless(/pattern/i)' file
If you want line numbers as well, replace print with print "$. $_" in the above commands.

Perl '-d' operator is not detecting a directory

I am piping the output of some commands to perl. The output consists of a set of filenames and directories, and I want perl to filter out the ones that are directories. Something like this:
...some commands... | perl -ne 'print $_ unless -d($_);'
The thing is, it is not filtering the directories! For example, output is something like:
test/unit_test/ipc
test/unit_test/ipc/tc1.cpp
test/unit_test/ipc is a directory, but it is still output.
The values of $_ which are read in by the perl one-liner include a trailing newline. Therefore, -d does not even find the directory, let alone recognize that it is a directory.
Here is a solution:
...some commands... | perl -ne 'chomp $_; print "$_\n" unless -d $_ ;'
Note the use of chomp to remove the trailing newline.
In conjunction with -n or -p, -l not only adds a newline to printed strings, it chomps the input. That means your code can be simplified to
...some commands... | perl -nle 'print $_ unless -d $_;'
or even
...some commands... | perl -nle'print if !-d'

sed does not recognize -r flag on AIX

thanks in advance for the help.
I have the following line that does work on linux.
myfile (extract)
active_instance_count=
aq_tm_processes=1
archive_lag_target=0
audit_file_dest=?/rdbms/audit
audit_sys_operations=FALSE
audit_trail=NONE
background_core_dump=partial
background_dump_dest=/home1/oracle/app/oracle/admin/iopecom/bdump
...
cat myfile |sed -r 's/ {1,}//g'|sed -r 's/\t*//g' |grep -v "^#"|sed -s "/^$/d" |sed =|sed 'N;s/\n/\t/'|sed -r "s/#.*//g" | sed "s/\t/;/g"|sed "s/\t/;/g"|sed -e "s,',\o042,g"
The result will be:
1;O7_DICTIONARY_ACCESSIBILITY=TRUE
2;active_instance_count=
3;aq_tm_processes=1
4;archive_lag_target=0
5;audit_file_dest=?/rdbms/audit
6;audit_sys_operations=FALSE
7;audit_trail=NONE
8;background_core_dump=partial
9;background_dump_dest=/home1/oracle/app/oracle/admin/iopecom/bdump
But I can't figure out how to run the same command on an AIX server.
Help is very welcome.
Regards.
Antonio.
Unless you have a compelling reason to use sed, you could use alternate tools:
awk -v OFS=';' '{print NR,$0}' filename
would produce the desired output.
You could also use perl:
perl -ne 'print "$.;$_"' filename
It appears that your sed expression would skip lines beginning with a #. As such, you could say:
perl -ne '$,=";"; !/^#/ && print ++$i,$_' filename
or something like:
grep -v '^#' filename | awk ...
Reformatting your pipeline:
cat myfile |
sed -r 's/ {1,}//g' | # strip all spaces (1)
sed -r 's/\t*//g' | # strip all tabs (2)
grep -v "^#" | # delete all lines beginning `#` (3)
sed -s "/^$/d" | # delete all empty lines (4)
sed = | # interleave with line numbers (5)
sed 'N;s/\n/\t/' | # join line number and line with `\t` (6)
sed -r "s/#.*//g" | # strip all `#` comments (7)
sed "s/\t/;/g" | # replace all tabs with `;` (8)
sed "s/\t/;/g" | # do it again (9)
sed -e "s,',\o042,g" # replace all ' with " (10)
Boiling that down and using cat -n to provide the line numbers up front gets:
cat -n myfile |
sed "$(print 's/\t/;/')
$(print 's/[ \t]*//g')
s/#.*//g
/^$/d
s/'/\"/g"
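which behaves identically unless I'm misreading the AIX docs, with one caveat: cat -n numbers the lines before anything is filtered, so comment lines and blank lines keep their numbers and survive as bare "N;" entries. The output therefore matches the original pipeline only when the file contains no such lines. The $(...) construction is command substitution: it runs the command and substitutes its output. print is a ksh builtin; on Linux, use printf instead.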
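Another option that avoids sed's -r flag and \t escapes altogether is to do every step in one awk program. The following is a sketch only: it assumes a POSIX awk, it has not been run on AIX, and the sample input is invented. It numbers lines after filtering, as the original pipeline does.

```shell
# Hypothetical input: a comment line, a blank line, spaces, a tab, quotes.
sample=$(printf "# header\n\naq_tm_processes = 1\nname='x'\t# note\n")

result=$(printf '%s\n' "$sample" | awk -v q="'" '
{ gsub(/[ \t]/, "") }          # strip all spaces and tabs
/^#/ || $0 == "" { next }      # drop comment-only and empty lines
{
    sub(/#.*/, "")             # strip trailing comments
    gsub(q, "\"")              # replace single quotes with double quotes
    n++                        # count only the lines that are kept
    print n ";" $0
}')

printf '%s\n' "$result"
```

The single quote is passed in through -v q because it cannot appear inside the single-quoted awk program.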

perl -a: How to change column separator?

I want to read the columns in a file where the separator is :.
I tried it like this (because according to http://www.asciitable.com, the octal representation of the colon is 072):
$ echo "a:b:c" | perl -a -072 -ne 'print "$F[1]\n";'
I want it to print b, but it doesn't work.
Look at -F in perlrun:
% echo "a:b:c" | perl -a -F: -ne 'print "$F[1]\n";'
b
Note that the value is taken as a regular expression, so some delimiters need extra escaping. An unescaped . matches any character, so the first command below prints only an empty line:
% echo "a.b.c" | perl -a -F. -ne 'print "$F[1]\n";'
% echo "a.b.c" | perl -a -F\\. -ne 'print "$F[1]\n";'
b
-0 specifies the record (line) separator. It would cause Perl to receive three lines:
>echo a:b:c | perl -072 -nE"say"
a:
b:
c
Since there's no whitespace on any of those lines, $F[1] would be empty if -a were to be used.
-F specifies the input field separator. This is what you want.
perl -F: -lanE'say $F[1];'
Or if you're stuck with an older Perl:
perl -F: -lane'print $F[1];'
Command line options are documented in perlrun.

Only print matching lines in perl from the command line

I'm trying to extract all ip addresses from a file. So far, I'm just using
cat foo.txt | perl -pe 's/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
but this also prints lines that don't contain a match. I can fix this by piping through grep, but this seems like it ought to be unnecessary, and could lead to errors if the regexes don't match up perfectly.
Is there a simpler way to accomplish this?
Try this:
cat foo.txt | perl -ne 'print if s/.*?((\d{1,3}\.){3}\d{1,3}).*/$1/'
or:
<foo.txt perl -ne 'print if s/.*?((\d{1,3}\.){3}\d{1,3}).*/$1/'
It's the shortest alternative I can think of while still using Perl.
However this way might be more correct:
<foo.txt perl -ne 'if (/((\d{1,3}\.){3}\d{1,3})/) { print $1 . "\n" }'
If you've got grep, then just call grep directly:
grep -Po "(\d{1,3}\.){3}\d{1,3}" foo.txt
You've already got a suitable answer of using grep to extract the IP addresses, but just to explain why you were seeing non-matches being printed:
perldoc perlrun will tell you about all the options you can pass Perl on the command line.
Quoting from it:
-p causes Perl to assume the following loop around your program, which makes it
iterate over filename arguments somewhat like sed:
LINE:
while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
You could have used the -n switch instead, which wraps your program in the same loop but does not print automatically, for example:
cat foo.txt | perl -ne '/((?:\d{1,3}\.){3}\d{1,3})/ and print "$1\n"'
Also, there's no need to use cat; Perl will open and read the filenames you give it, so you could say e.g.:
perl -ne '/((?:\d{1,3}\.){3}\d{1,3})/ and print "$1\n"' foo.txt
With Ruby:
ruby -0777 -ne 'puts $_.scan(/((?:\d{1,3}\.){3}\d{1,3})/)' file

sed or grep or awk to match very very long lines

more file
param1=" 1,deerfntjefnerjfntrjgntrjnvgrvgrtbvggfrjbntr*rfr4fv*frfftrjgtrignmtignmtyightygjn 2,3,4,5,6,7,8,
rfcmckmfdkckemdio8u548384omxc,mor0ckofcmineucfhcbdjcnedjcnywedpeodl40fcrcmkedmrikmckffmcrffmrfrifmtrifmrifvysdfn drfr4fdr4fmedmifmitfmifrtfrfrfrfnurfnurnfrunfrufnrufnrufnrufnruf"****
I need to match the content of param1 with:
sed -n "/$param1/p" file
but because the line is very long, I can't match it.
What’s the best way to match very long lines?
The problem you are facing is that param1 contains special characters which are being interpreted by sed. The asterisk ('*') is used to mean 'zero or more occurrences of the previous character', so when this character is interpreted by sed there is nothing left to match the literal asterisk you are looking for.
The following is a working bash script that should help:
#!/bin/bash
param1=' 1,deerfntjefnerjfntrjgntrjnvgrvgrtbvggfrjbntr\*rfr4fv\*frfftrjgtrignmtignmtyightygjn 2,3,4,5,6,7,8, rfcmckmfdkckemdio8u548384omxc,mor0ckofcmineucfhcbdjcnedjcnywedpeodl40fcrcmkedmrikmckffmcrffmrfrifmtrifmrifvysdfn'
cat <<EOF | sed "s/${param1}/Bubba/g"
1,deerfntjefnerjfntrjgntrjnvgrvgrtbvggfrjbntr*rfr4fv*frfftrjgtrignmtignmtyightygjn 2,3,4,5,6,7,8, rfcmckmfdkckemdio8u548384omxc,mor0ckofcmineucfhcbdjcnedjcnywedpeodl40fcrcmkedmrikmckffmcrffmrfrifmtrifmrifvysdfn
EOF
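Rather than escaping the asterisks by hand, the pattern can be escaped mechanically before it is handed to sed. The sketch below uses a short stand-in for param1 (the values are hypothetical) and a common idiom that backslash-escapes the characters special in a basic regular expression, plus the / delimiter:

```shell
# Short stand-ins for the long parameter and the line it should match.
param='ab*cd'
line='xx ab*cd yy'

# Unescaped, "b*" means "zero or more b", so the literal "*" is never matched.
unescaped=$(printf '%s\n' "$line" | sed -n "/$param/p")

# Prefix each special character with a backslash, then match again.
escaped_param=$(printf '%s\n' "$param" | sed 's|[][\.*^$/]|\\&|g')
escaped=$(printf '%s\n' "$line" | sed -n "/$escaped_param/p")

printf 'unescaped: %s\n' "$unescaped"
printf 'escaped:   %s\n' "$escaped"
```

The escaping command uses | as its own delimiter so that / can sit inside the bracket expression without ambiguity.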
Maybe the problem is that your $param1 contains special characters? This works for me:
A="$(perl -e 'print "a" x 10000')"
echo $A | sed -n "/$A/p"
($A contains 10 000 a characters).
echo $A | grep -F $A
and
echo $A | grep -P $A
also work (the second requires grep built with PCRE support). If you want pattern matching, use either this or pcregrep. If you don't, use fixed-string grep (grep -F).
echo $A | grep $A
is too slow.
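When the parameter contains spaces or asterisks, as in the question, the fixed-string form also needs the variable quoted. A minimal sketch with made-up values:

```shell
# Hypothetical needle containing both an asterisk and a space.
needle='1,de*r 2,3'
haystack=$(printf 'x 1,de*r 2,3 y\n1,der 2,3\n')

# -F treats the needle as a literal string; the quotes keep it as one argument,
# and -e protects against a needle that starts with a dash.
literal=$(printf '%s\n' "$haystack" | grep -F -e "$needle")

printf '%s\n' "$literal"
```

Only the first line is printed, because the second one does not contain the literal asterisk.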