Perl regex to act on a file from the command line - perl

In a file, say xyz.txt i want to replace the pattern of any number followed by a dot example:1.,2.,10.,11. etc.. with a whitespace.
How to compose a perl command on the command line to act on the file to do the above, what should be the regex to be used ?
Please Help
Thank You.

This HAS to be a Perl oneliner?
perl -i -pe 's/\d+\./ /g' <fileName>
The Perl command line options: -i is used to specify what happens to the input file. If you don't give it a file extension, the original file is lost and is replaced by the Perl munged output. For example, if I had this:
perl -i.bak -pe 's/\d+\./ /g' <fileName>
The original file would be stored with a .bak suffix and <fileName> itself would contain your output.
The -p means to enclose your Perl program in a print loop that looks SOMEWHAT like this:
while ($_ = <>) {
<Your Perl one liner>
print "$_";
}
This is a somewhat simplified explanation what's going on. You can see the actual perl loop by doing a perldoc perlrun from the command line. The main idea is that it allows you to act on each line of a file just like sed or awk.
The -e simply contains your Perl command.
You can also do file redirection too:
perl -pe 's/\d+\./ /g' < xyz.txt > xyz.txt.out

Answer (not tested):
perl -ipe "s/\d+\./ /g" xyz.txt

Both
perl -ipe "s/\d+\./ /g" xyz.txt
and
perl -pie
cannot execute on my system.
I use the following order:
perl -i -pe

Related

perl one-liner to keep only desired lines

I have a text file (input.txt) like this:
NP_414685.4: 15-26, 131-138, 441-465
NP_418580.2: 493-500
NP_418780.2: 36-48, 44-66
NP_418345.2:
NP_418473.3: 1-19, 567-1093
NP_418398.2:
I want a perl one-liner that keeps only those lines in file where ":" is followed by number range (that means, here, the lines containing "NP_418345.2:" and "NP_418398.2:" get deleted). For this I have tried:
perl -ni -e "print unless /: \d/" -pi.bak input.txt del input.txt.bak
But it shows exactly same output as the input file.
What will be the exact pattern that I can match here?
Thanks
First, print unless means print if not -- opposite to what you want.
More to the point, it doesn't make sense using both -n and -p, and when you do -p overrides the other. While both of them open the input file(s) and set up the loop over lines, -p also prints $_ for every iteration. So with it you are reprinting every line. See perlrun.
Finally, you seem to be deleting the .bak file ... ? Then don't make it. Use just -i
Altogether
perl -i -ne 'print if /:\s*\d+\s*-\s*\d+/' input.txt
If you do want to keep the backup file use -i.bak instead of -i
You can see the code equivalent to a one-liner with particular options with B::Deparse (via O module)
Try: perl -MO=Deparse -ne 1 and perl -MO=Deparse -pe 1
This way:
perl -i.bak -ne 'print if /:\s+\d+-\d/' input.txt
This:
perl -ne 'print if /:\s*(\d+\s*-\s*\d+\s*,?\s*)+\s*$/' input.txt
Prints:
NP_414685.4: 15-26, 131-138, 441-465
NP_418580.2: 493-500
NP_418780.2: 36-48, 44-66
NP_418473.3: 1-19, 567-1093
I'm not sure if you want to match lines that are possibly like this:
NP_418580.2: 493-500, asdf
or this:
NP_418580.2: asdf
This answer will not print these lines, if given to it.

Perl deleting "blank" lines from a csv file

I'm looking to delete blank lines in a CSV file, using Perl.
I'm not too sure how to do this, as these lines aren't exactly "blank" (they're just a bunch of commas).
I'd also like to save the output as a file of the same name, overwriting the original.
How could I go about doing this?
edit: I can't use modules or any source code due to network restrictions...
You can do this using a simple Perl one-liner:
perl -i -ne 'print unless /^[,\s]*$/' <filename>
The -n flag assumes this loop around your program:
while(<>) {
print unless /^[,\s]*$/;
}
and the -i flag means inplace and modifies your input file.
Note: If you are worried about losing your data with -i, you can specify -i.bak and perl will automatically write the original file to your <filename>.bak
More of a command line hack,
perl -i -ne 'print if /[^,\r\n]/' file.csv
If you want to put it inside a shell script you can do this ...
#!/bin/sh
$(perl -i -n -e 'print $_ unless ($_ =~ /^\,+$/);' $*)

How can I delete a line in file if the line matched the required PATH, in Perl?

My target is to delete line in file only if PATH match the PATH in the file
For example, I need to delete all lines that have /etc/sysconfig PATH from /tmp/file file
more /tmp/file
/etc/sysconfig/network-scripts/ifcfg-lo file1
/etc/sysconfig/network-scripts/ifcfg-lo file2
/etc/sysconfig/network-scripts/ifcfg-lo file3
I write the following Perl code (the perl code integrated in my bash script) in order to delete lines that have "/etc/sysconfig"
export FILE=/etc/sysconfig
perl -i -pe 's/\Q$ENV{FILE}\E// ' /tmp/file
But I get the following after I run the perl code: (in place to get empty lines)
/network-scripts/ifcfg-lo file1
/network-scripts/ifcfg-lo file2
/network-scripts/ifcfg-lo file3
first question:
How to change the perl syntax : perl -i -pe 's/\Q$ENV{FILE }\E// ' in order to delete line that matches the required PATH (/etc/sysconfig)?
second question:
The same as the first question but line will deleted only if PATH match the first field in the file
Example:
/tmp/file before perl edit:
file1 /etc/sysconfig/network-scripts/ifcfg-lo
/etc/sysconfig/network-scripts/ifcfg-lo file2
/etc/sysconfig/network-scripts/ifcfg-lo file3
/tmp/file after perl edit:
file1 /etc/sysconfig/network-scripts/ifcfg-lo
Perl is a fine way to do it. Use the -n switch, not -p.
perl -i -l -n -e'print unless /\Q$ENV{FILE}/' filename
s/pattern/otherpattern/ won't delete entire lines; it will only alter substrings. You need to entirely change your program to delete entire lines. In pseudocode, it would be:
while (read in a line)
{
if (doesn't match)
{
write the line back out unaltered.
}
}
It can still be rewritten as a oneliner though, with knowledge of how continue and redo work in loops: perl -pe '$_ = <> and redo if /Q$ENV{FILE}\E/'
mef#iwlappy:~$ cat /tmp/file
aaaa
/etc/sysconfig/network-scripts/ifcfg-lofile1
/etc/sysconfig/network-scripts/ifcfg-lofile2
/etc/sysconfig/network-scripts/ifcfg-lofile3
aaa
mef#iwlappy:~$ perl -i -pe 's/$ENV{FILE}\E.*//' /tmp/file
mef#iwlappy:~$ cat /tmp/file
aaaa
aaa
You can do a further regexp to remove empty lines with s/^$//
If I were doing this from the command line, I probably wouldn't even use Perl. I'd just use a negated grep:
$ mv old.txt old.bak; grep -v $FILE old.bak > old.txt
Renaming the original file and writing to a new file with the old name is the same thing that perl's -i switch does for you.
If you want to match just the first column, then I might punt to perl so I don't have to use awk or cut. perl's -a switch splits the line on whitespace and puts the results in #F:
$ perl -ai.bak -ne 'print if $F[0] !~ /^\Q$ENV{FILE}/' old.txt
When you think you have it right, you can remove the .bak training wheels that saves a copy of your original file. Or not. I tend to like the safety net.
See perlrun for the details of command-line switches.

Perl: Loop through a file and substitute

I simply wanna read in a logfile, do a search and replace, and then write out the changes to that same logfile.
What's the best practice way of doing this in Perl?
I normally code up a one liner for this:
perl -i -pe 's/some/thing/' log.file
See Here
This is often done with a one-liner:
perl -pi.bak -e "s/find/replace/g" <file>
Note the -i.bak portion -- this creates a backup file with the extension .bak. If you want to play without a net you can do this to overwrite the existing file without a backup:
perl -pi -e "s/find/replace/g" <file>
or you can use sed (I know... you asked about perl):
sed -i 's/find/replace/g' <file>

DOS to UNIX path substitution within a file

I have a file that contains this kind of paths:
C:\bad\foo.c
C:\good\foo.c
C:\good\bar\foo.c
C:\good\bar\[variable subdir count]\foo.c
And I would like to get the following file:
C:\bad\foo.c
C:/good/foo.c
C:/good/bar/foo.c
C:/good/bar/[variable subdir count]/foo.c
Note that the non matching path should not be modified.
I know how to do this with sed for a fixed number of subdir, but a variable number is giving me trouble. Actually, I would have to use many s/x/y/ expressions (as many as the max depth... not very elegant).
May be with awk, but this kind of magic is beyond my skills.
FYI, I need this trick to correct some gcov binary files on a cygwin platform.
I am dealing with binary files; therefore, I might have the following kind of data:
bindata\bindata%bindataC:\good\foo.c
which should be translated as:
bindata\bindata%bindataC:/good/foo.c
The first \ must not be translated, despite that it is on the same line.
However, I have just checked my .gcno files while editing this text and it looks like all the paths are flanked with zeros, so most of the answers below should fit.
sed -e '/^C:\\good/ s/\\/\//g' input_file.txt
I would recommend you look into the cygpath utility, which converts path names from one format to another. For instance on my machine:
$ cygpath `pwd`
/home/jericson
$ cygpath -w `pwd`
D:\root\home\jericson
$ cygpath -m `pwd`
D:/root/home/jericson
Here's a Perl implementation of what you asked for:
$ echo 'C:\bad\foo.c
C:\good\foo.c
C:\good\bar\foo.c
C:\good\bar\[variable subdir count]\foo.c' | perl -pe 's|\\|/|g if /good/'
C:\bad\foo.c
C:/good/foo.c
C:/good/bar/foo.c
C:/good/bar/[variable subdir count]/foo.c
It works directly with the string, so it will work anywhere. You could combine it with cygpath, but it only works on machines that have that path:
perl -pe '$_ = `cygpath -m $_` if /good/'
(Since I don't have C:\good on my machine, I get output like C:goodfoo.c. If you use a real path on your machine, it ought to work correctly.)
You want to substitute '/' for all '\' but only on the lines that match the good directory path. Both sed and awk will let you do this by having a LHS (matching) expression that only picks the lines with the right path.
A trivial sed script to do this would look like:
/[Cc]:\\good/ s/\\/\//g
For a file:
c:\bad\foo
c:\bad\foo\bar
c:\good\foo
c:\good\foo\bar
You will get the output below:
c:\bad\foo
c:\bad\foo\bar
c:/good/foo
c:/good/foo/bar
Here's how I would do it in awk:
# fixpaths.awk
/C:\\good/ {
gsub(/\\/,"/",$1);
print $1 >> outfile;
}
Then run it using the command:
awk -f fixpaths.awk paths.txt; mv outfile paths.txt
Or with some help from good ol' Bash:
#!/bin/bash
cat file | while read LINE
do
if <bad_condition>
then
echo "$LINE" >> newfile
else
echo "$LINE" | sed -e "s/\\/\//g" >> newfile
fi
done
try this
sed -re '/\\good\\/ s/\\/\//g' temp.txt
or this
awk -F"\\" '{if($2=="good"){OFS="\/"; $1=$1;} print $0}' temp.txt