Perl one-liner to keep only desired lines

I have a text file (input.txt) like this:
NP_414685.4: 15-26, 131-138, 441-465
NP_418580.2: 493-500
NP_418780.2: 36-48, 44-66
NP_418345.2:
NP_418473.3: 1-19, 567-1093
NP_418398.2:
I want a Perl one-liner that keeps only those lines in the file where ":" is followed by a number range (here that means the lines containing "NP_418345.2:" and "NP_418398.2:" get deleted). For this I have tried:
perl -ni -e "print unless /: \d/" -pi.bak input.txt del input.txt.bak
But the output is exactly the same as the input file.
What exact pattern should I match here?
Thanks

First, print unless means "print if not", which is the opposite of what you want.
More to the point, it doesn't make sense to use both -n and -p; when you do, -p overrides -n. Both of them open the input file(s) and set up the loop over lines, but -p also prints $_ on every iteration, so with it you are reprinting every line. See perlrun.
Finally, you seem to be deleting the .bak file afterwards? Then don't create it: use just -i.
Altogether:
perl -i -ne 'print if /:\s*\d+\s*-\s*\d+/' input.txt
If you do want to keep the backup file, use -i.bak instead of -i.
To see the code that a one-liner with particular options expands to, use B::Deparse (via the O module).
Try: perl -MO=Deparse -ne 1 and perl -MO=Deparse -pe 1

This way:
perl -i.bak -ne 'print if /:\s+\d+-\d/' input.txt

This:
perl -ne 'print if /:\s*(\d+\s*-\s*\d+\s*,?\s*)+\s*$/' input.txt
Prints:
NP_414685.4: 15-26, 131-138, 441-465
NP_418580.2: 493-500
NP_418780.2: 36-48, 44-66
NP_418473.3: 1-19, 567-1093
I'm not sure whether you also want to match lines like this:
NP_418580.2: 493-500, asdf
or this:
NP_418580.2: asdf
This pattern will not print such lines.

Related

Perl deleting "blank" lines from a csv file

I'm looking to delete blank lines in a CSV file, using Perl.
I'm not too sure how to do this, as these lines aren't exactly "blank" (they're just a bunch of commas).
I'd also like to save the output as a file of the same name, overwriting the original.
How could I go about doing this?
Edit: I can't use modules or any source code due to network restrictions...
You can do this using a simple Perl one-liner:
perl -i -ne 'print unless /^[,\s]*$/' <filename>
The -n flag assumes this loop around your program:
    while (<>) {
        print unless /^[,\s]*$/;
    }
and the -i flag means in-place: it modifies your input file.
Note: if you are worried about losing your data with -i, you can specify -i.bak and perl will keep a copy of the original file as <filename>.bak.
More of a command-line hack:
perl -i -ne 'print if /[^,\r\n]/' file.csv
If you want to put it inside a shell script, you can do this:
#!/bin/sh
perl -i -ne 'print unless /^,+$/' "$@"

Only print matching lines in perl from the command line

I'm trying to extract all IP addresses from a file. So far, I'm just using
cat foo.txt | perl -pe 's/.*?((\d{1,3}\.){3}\d{1,3}).*/\1/'
but this also prints lines that don't contain a match. I can fix this by piping through grep, but this seems like it ought to be unnecessary, and could lead to errors if the regexes don't match up perfectly.
Is there a simpler way to accomplish this?
Try this:
cat foo.txt | perl -ne 'print if s/.*?((\d{1,3}\.){3}\d{1,3}).*/$1/'
or:
<foo.txt perl -ne 'print if s/.*?((\d{1,3}\.){3}\d{1,3}).*/$1/'
It's the shortest alternative I can think of while still using Perl.
However, this way might be more correct:
<foo.txt perl -ne 'if (/((\d{1,3}\.){3}\d{1,3})/) { print $1 . "\n" }'
If you've got grep, then just call grep directly:
grep -Po "(\d{1,3}\.){3}\d{1,3}" foo.txt
You've already got a suitable answer of using grep to extract the IP addresses, but just to explain why you were seeing non-matches being printed:
perldoc perlrun will tell you about all the options you can pass Perl on the command line.
Quoting from it:
-p causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed:
    LINE:
      while (<>) {
          ...             # your program goes here
      } continue {
          print or die "-p destination: $!\n";
      }
You could have used the -n switch instead, which does something similar but does not print automatically; adding -l makes each printed address end with a newline. For example:
cat foo.txt | perl -lne '/((?:\d{1,3}\.){3}\d{1,3})/ and print $1'
Also, there's no need to use cat; Perl will open and read the filenames you give it, so you could say e.g.:
perl -lne '/((?:\d{1,3}\.){3}\d{1,3})/ and print $1' foo.txt
ruby -0777 -ne 'puts $_.scan(/((?:\d{1,3}\.){3}\d{1,3})/)' file

Perl regex to act on a file from the command line

In a file, say xyz.txt, I want to replace any number followed by a dot (for example 1., 2., 10., 11., etc.) with a space.
How do I compose a Perl command on the command line that acts on the file to do this, and what regex should I use?
Please Help
Thank You.
Does this HAVE to be a Perl one-liner?
perl -i -pe 's/\d+\./ /g' <fileName>
The Perl command-line options: -i specifies what happens to the input file. If you don't give it a file extension, the original file is lost and replaced by the Perl-munged output. If you do give it one, for example:
perl -i.bak -pe 's/\d+\./ /g' <fileName>
The original file would be stored with a .bak suffix and <fileName> itself would contain your output.
The -p means to enclose your Perl program in a print loop that looks SOMEWHAT like this:
    while ($_ = <>) {
        <Your Perl one liner>
        print "$_";
    }
This is a somewhat simplified explanation of what's going on. You can see the actual loop Perl uses by running perldoc perlrun from the command line. The main idea is that it allows you to act on each line of a file just like sed or awk.
The -e simply contains your Perl command.
You can also use file redirection:
perl -pe 's/\d+\./ /g' < xyz.txt > xyz.txt.out
Answer (not tested):
perl -ipe "s/\d+\./ /g" xyz.txt
Both
perl -ipe "s/\d+\./ /g" xyz.txt
and
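perl -pie
fail on my system: in both cases everything after -i in the same cluster is taken as the backup extension (pe or e), so the code is no longer passed via -e and is treated as a script filename instead.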
I use the following order:
perl -i -pe

How can I delete a line in file if the line matched the required PATH, in Perl?

My goal is to delete a line in a file only if it matches the required PATH.
For example, I need to delete all lines that contain the /etc/sysconfig PATH from the file /tmp/file
more /tmp/file
/etc/sysconfig/network-scripts/ifcfg-lo file1
/etc/sysconfig/network-scripts/ifcfg-lo file2
/etc/sysconfig/network-scripts/ifcfg-lo file3
I wrote the following Perl code (integrated in my bash script) in order to delete lines that contain "/etc/sysconfig":
export FILE=/etc/sysconfig
perl -i -pe 's/\Q$ENV{FILE}\E// ' /tmp/file
But I get the following after I run the Perl code (instead of the lines being removed):
/network-scripts/ifcfg-lo file1
/network-scripts/ifcfg-lo file2
/network-scripts/ifcfg-lo file3
First question:
How do I change the Perl syntax perl -i -pe 's/\Q$ENV{FILE}\E// ' in order to delete lines that match the required PATH (/etc/sysconfig)?
Second question:
The same as the first question, but a line should be deleted only if the PATH matches the first field in the file.
Example:
/tmp/file before perl edit:
file1 /etc/sysconfig/network-scripts/ifcfg-lo
/etc/sysconfig/network-scripts/ifcfg-lo file2
/etc/sysconfig/network-scripts/ifcfg-lo file3
/tmp/file after perl edit:
file1 /etc/sysconfig/network-scripts/ifcfg-lo
Perl is a fine way to do it. Use the -n switch, not -p.
perl -i -l -n -e'print unless /\Q$ENV{FILE}/' filename
s/pattern/otherpattern/ won't delete entire lines; it will only alter substrings. You need to entirely change your program to delete entire lines. In pseudocode, it would be:
while (read in a line)
{
    if (doesn't match)
    {
        write the line back out unaltered.
    }
}
It can still be rewritten as a one-liner, though, with knowledge of how continue and redo work in loops: perl -pe '$_ = <> and redo if /\Q$ENV{FILE}\E/'
mef@iwlappy:~$ cat /tmp/file
aaaa
/etc/sysconfig/network-scripts/ifcfg-lofile1
/etc/sysconfig/network-scripts/ifcfg-lofile2
/etc/sysconfig/network-scripts/ifcfg-lofile3
aaa
mef@iwlappy:~$ perl -i -pe 's/$ENV{FILE}\E.*//' /tmp/file
mef@iwlappy:~$ cat /tmp/file
aaaa
aaa
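Because .* does not match the newline, the matched lines actually remain as empty lines. A further s/^$// would not remove them, since it only edits within a line; to drop them entirely, print only the non-matching lines under -n instead of substituting under -p.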
If I were doing this from the command line, I probably wouldn't even use Perl. I'd just use a negated grep:
$ mv old.txt old.bak; grep -v $FILE old.bak > old.txt
Renaming the original file and writing to a new file with the old name is the same thing that perl's -i switch does for you.
If you want to match just the first column, then I might punt to perl so I don't have to use awk or cut. perl's -a switch splits the line on whitespace and puts the results in @F:
$ perl -ai.bak -ne 'print if $F[0] !~ /^\Q$ENV{FILE}/' old.txt
When you think you have it right, you can remove the .bak training wheels that saves a copy of your original file. Or not. I tend to like the safety net.
See perlrun for the details of command-line switches.

What am I doing wrong in this Perl one-liner?

I have a file that contains a lot of these
"/watch?v=VhsnHIUMQGM"
and I would like to output the letter code using a perl one-liner. So I try
perl -nle 'm/\"\/watch\?v=(.*?)\"/g' filename.txt
but it doesn't print anything.
What am I doing wrong?
The -n option processes each line but doesn't print anything out. So you need to add an explicit print if you successfully match.
perl -ne 'while ( m/\"\/watch\?v=(.+?)\"/g ) { print "$1\n" }' filename.txt
Another approach, if you're sure every line will match, is to use the -p option which prints out the value of $_ after processing, e.g.:
perl -pe 's/\"\/watch\?v=(.+?)\"/$1/' filename.txt
Your regex is fine. You're getting no output because the -n option won't print anything. It simply wraps a while (<>) { ... } loop around your program (run perl --help for brief explanations of the Perl options).
The following uses your regex, but adds some printing. In list context, regexes with the /g option return all captures. Effectively, we print each capture.
perl -nle 'print for m/\"\/watch\?v=(.*?)\"/g' data.dat
You can split the string on "=" instead of matching (the closing quote stays attached to the ID):
perl -paF= -e '$_ = $F[1]' filename.txt