grep regex to perl or awk - perl

I have been using Linux env and recently migrated to solaris. Unfortunately one of my bash scripts requires the use of grep with the P switch [ pcre support ] .As Solaris doesnt support the pcre option for grep , I am obliged to find another solution to the problem.And pcregrep seems to have an obvious loop bug and sed -r option is unsupported !
I hope that using perl or nawk will solve the problem on solaris.
I have not yet used perl in my script and am unware neither of its syntax nor the flags.
Since it is pcre , I beleive that a perl scripter can help me out in a matter of minutes. They should match over multiple lines .
Which one would be a better solution in terms of efficiency the awk or the perl solution ?
Thanks for the replies .

These are some grep to perl conversions you might need:
grep -P PATTERN FILE(s) ---> perl -nle 'print if m/PATTERN/' FILE(s)
grep -Po PATTERN FILE(s) ---> perl -nle 'print "$1\n" while m/(PATTERN)/g' FILE(s)
That's my guess as to what you're looking for, if grep -P is out of the question.

Here's a shorty:
grep -P /regex/ ====> perl -ne 'print if /regex/;'
The -n takes each line of the file as input. Each line is put into a special perl variable called $_ as Perl loops through the whole file.
The -e says the Perl program is on the command line instead of passing it a file.
The Perl print command automatically prints out whatever is in $_ if you don't specify for it to print out anything else.
The if /regex/ matches the regular expression against whatever line of your file is in the $_ variable.

Related

perl one-liner to keep only desired lines

I have a text file (input.txt) like this:
NP_414685.4: 15-26, 131-138, 441-465
NP_418580.2: 493-500
NP_418780.2: 36-48, 44-66
NP_418345.2:
NP_418473.3: 1-19, 567-1093
NP_418398.2:
I want a perl one-liner that keeps only those lines in file where ":" is followed by number range (that means, here, the lines containing "NP_418345.2:" and "NP_418398.2:" get deleted). For this I have tried:
perl -ni -e "print unless /: \d/" -pi.bak input.txt del input.txt.bak
But it shows exactly same output as the input file.
What will be the exact pattern that I can match here?
Thanks
First, print unless means print if not -- opposite to what you want.
More to the point, it doesn't make sense using both -n and -p, and when you do -p overrides the other. While both of them open the input file(s) and set up the loop over lines, -p also prints $_ for every iteration. So with it you are reprinting every line. See perlrun.
Finally, you seem to be deleting the .bak file ... ? Then don't make it. Use just -i
Altogether
perl -i -ne 'print if /:\s*\d+\s*-\s*\d+/' input.txt
If you do want to keep the backup file use -i.bak instead of -i
You can see the code equivalent to a one-liner with particular options with B::Deparse (via O module)
Try: perl -MO=Deparse -ne 1 and perl -MO=Deparse -pe 1
This way:
perl -i.bak -ne 'print if /:\s+\d+-\d/' input.txt
This:
perl -ne 'print if /:\s*(\d+\s*-\s*\d+\s*,?\s*)+\s*$/' input.txt
Prints:
NP_414685.4: 15-26, 131-138, 441-465
NP_418580.2: 493-500
NP_418780.2: 36-48, 44-66
NP_418473.3: 1-19, 567-1093
I'm not sure if you want to match lines that are possibly like this:
NP_418580.2: 493-500, asdf
or this:
NP_418580.2: asdf
This answer will not print these lines, if given to it.

Only print matching lines in perl from the command line

I'm trying to extract all ip addresses from a file. So far, I'm just using
cat foo.txt | perl -pe 's/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
but this also prints lines that don't contain a match. I can fix this by piping through grep, but this seems like it ought to be unnecessary, and could lead to errors if the regexes don't match up perfectly.
Is there a simpler way to accomplish this?
Try this:
cat foo.txt | perl -ne 'print if s/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
or:
<foo.txt perl -ne 'print if s/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
It's the shortest alternative I can think of while still using Perl.
However this way might be more correct:
<foo.txt perl -ne 'if (/((\d{1,3}\.){3}\d{1,3})/) { print $1 . "\n" }'
If you've got grep, then just call grep directly:
grep -Po "(\d{1,3}\.){3}\d{1,3}" foo.txt
You've already got a suitable answer of using grep to extract the IP addresses, but just to explain why you were seeing non-matches being printed:
perldoc perlrun will tell you about all the options you can pass Perl on the command line.
Quoting from it:
-p causes Perl to assume the following loop around your program, which makes it
iterate over filename arguments somewhat like sed:
LINE:
while (<>) {
... # your program goes here
} continue {
print or die "-p destination: $!\n";
}
You could have used the -n switch instead, which does similar, but does not automatically print, for example:
cat foo.txt | perl -ne '/((?:\d{1,3}\.){3}\d{1,3})/ and print $1'
Also, there's no need to use cat; Perl will open and read the filenames you give it, so you could say e.g.:
perl -ne '/((?:\d{1,3}\.){3}\d{1,3})/ and print $1' foo.txt
ruby -0777 -ne 'puts $_.scan(/((?:\d{1,3}\.){3}\d{1,3})/)' file

Perl regex to act on a file from the command line

In a file, say xyz.txt i want to replace the pattern of any number followed by a dot example:1.,2.,10.,11. etc.. with a whitespace.
How to compose a perl command on the command line to act on the file to do the above, what should be the regex to be used ?
Please Help
Thank You.
This HAS to be a Perl oneliner?
perl -i -pe 's/\d+\./ /g' <fileName>
The Perl command line options: -i is used to specify what happens to the input file. If you don't give it a file extension, the original file is lost and is replaced by the Perl munged output. For example, if I had this:
perl -i.bak -pe 's/\d+\./ /g' <fileName>
The original file would be stored with a .bak suffix and <fileName> itself would contain your output.
The -p means to enclose your Perl program in a print loop that looks SOMEWHAT like this:
while ($_ = <>) {
<Your Perl one liner>
print "$_";
}
This is a somewhat simplified explanation what's going on. You can see the actual perl loop by doing a perldoc perlrun from the command line. The main idea is that it allows you to act on each line of a file just like sed or awk.
The -e simply contains your Perl command.
You can also do file redirection too:
perl -pe 's/\d+\./ /g' < xyz.txt > xyz.txt.out
Answer (not tested):
perl -ipe "s/\d+\./ /g" xyz.txt
Both
perl -ipe "s/\d+\./ /g" xyz.txt
and
perl -pie
cannot execute on my system.
I use the following order:
perl -i -pe

perl one liner alternative to this bash "chain"?

I am trying to comprehend Perl following the way describe in the book "Minimal Perl".
I've uploaded all source txt files onto my own server : results folder
I got the output from using several bash commands in a "chain" like this:
cat run*.txt | grep '^Bank[[:space:]]Balance'|cut -d ':' -f2 | grep -E '\$[0-9]+'
I know this is far from the most concise and efficient, but at least it works...
As our uni subject now moves onto the Perl part, I'd like to know if there is a way to get the same results in one line?
I am trying something like the following code but stuck in the middle:
Chenxi Mao#chenxi-a6b123bb /cygdrive/c/eMarket/output
$ perl -wlne 'print; if $n=~/^Bank Balance/'
syntax error at -e line 1, near "if $n"
Execution of -e aborted due to compilation errors.
you shouldn't have a ; after the print. So
perl -wlne 'print $1 if $n=~/^Bank Balance\s*:\s*(\d+)/'
perl -F/\:/ -ane 'print $F[1]."\n" if /Bank Balance/ && $F[1]!~/\$-/' run*.txt
also here's a short version of your bash command, using just awk
awk -F": " '/Bank[ \t]*Balance/&& $2!~/\$-/{print $2}' run*.txt

DOS to UNIX path substitution within a file

I have a file that contains this kind of paths:
C:\bad\foo.c
C:\good\foo.c
C:\good\bar\foo.c
C:\good\bar\[variable subdir count]\foo.c
And I would like to get the following file:
C:\bad\foo.c
C:/good/foo.c
C:/good/bar/foo.c
C:/good/bar/[variable subdir count]/foo.c
Note that the non matching path should not be modified.
I know how to do this with sed for a fixed number of subdir, but a variable number is giving me trouble. Actually, I would have to use many s/x/y/ expressions (as many as the max depth... not very elegant).
May be with awk, but this kind of magic is beyond my skills.
FYI, I need this trick to correct some gcov binary files on a cygwin platform.
I am dealing with binary files; therefore, I might have the following kind of data:
bindata\bindata%bindataC:\good\foo.c
which should be translated as:
bindata\bindata%bindataC:/good/foo.c
The first \ must not be translated, despite that it is on the same line.
However, I have just checked my .gcno files while editing this text and it looks like all the paths are flanked with zeros, so most of the answers below should fit.
sed -e '/^C:\\good/ s/\\/\//g' input_file.txt
I would recommend you look into the cygpath utility, which converts path names from one format to another. For instance on my machine:
$ cygpath `pwd`
/home/jericson
$ cygpath -w `pwd`
D:\root\home\jericson
$ cygpath -m `pwd`
D:/root/home/jericson
Here's a Perl implementation of what you asked for:
$ echo 'C:\bad\foo.c
C:\good\foo.c
C:\good\bar\foo.c
C:\good\bar\[variable subdir count]\foo.c' | perl -pe 's|\\|/|g if /good/'
C:\bad\foo.c
C:/good/foo.c
C:/good/bar/foo.c
C:/good/bar/[variable subdir count]/foo.c
It works directly with the string, so it will work anywhere. You could combine it with cygpath, but it only works on machines that have that path:
perl -pe '$_ = `cygpath -m $_` if /good/'
(Since I don't have C:\good on my machine, I get output like C:goodfoo.c. If you use a real path on your machine, it ought to work correctly.)
You want to substitute '/' for all '\' but only on the lines that match the good directory path. Both sed and awk will let you do this by having a LHS (matching) expression that only picks the lines with the right path.
A trivial sed script to do this would look like:
/[Cc]:\\good/ s/\\/\//g
For a file:
c:\bad\foo
c:\bad\foo\bar
c:\good\foo
c:\good\foo\bar
You will get the output below:
c:\bad\foo
c:\bad\foo\bar
c:/good/foo
c:/good/foo/bar
Here's how I would do it in awk:
# fixpaths.awk
/C:\\good/ {
gsub(/\\/,"/",$1);
print $1 >> outfile;
}
Then run it using the command:
awk -f fixpaths.awk paths.txt; mv outfile paths.txt
Or with some help from good ol' Bash:
#!/bin/bash
cat file | while read LINE
do
if <bad_condition>
then
echo "$LINE" >> newfile
else
echo "$LINE" | sed -e "s/\\/\//g" >> newfile
fi
done
try this
sed -re '/\\good\\/ s/\\/\//g' temp.txt
or this
awk -F"\\" '{if($2=="good"){OFS="\/"; $1=$1;} print $0}' temp.txt