I'm using \D in a Perl regular expression so that digits are not displayed. Why are the digits still being displayed?
Here's the content of the test2.txt file
1. Hello Brue this is a test.
2. Hello Lisa this is a test.
This is a test 1.
This is a test 2.
Here is the perl program.
#!/usr/bin/perl
use strict;
use warnings;
open READFILE,"<", "test2.txt" or die "Unable to open file";
while(<READFILE>)
{
if(/\D/)
{
print;
}
}
/\D/ just checks that the line has at least one non-digit character (including the newline...). Can you explain what you wanted to check? What output you were expecting?
If you want to only print lines that don't have a digit, you want to do:
if ( ! /\d/ )
(does the line not have a digit), not
if ( /\D/ )
(does the line have a non-digit).
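To see the difference concretely, here is a small sketch that applies both tests to the four sample lines from the question. The helper names has_non_digit and has_no_digit are made up for this illustration and are not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub has_non_digit { return $_[0] =~ /\D/ ? 1 : 0 }   # at least one non-digit character
sub has_no_digit  { return $_[0] !~ /\d/ ? 1 : 0 }   # no digit anywhere in the line

my @lines = (
    "1. Hello Brue this is a test.\n",
    "2. Hello Lisa this is a test.\n",
    "This is a test 1.\n",
    "This is a test 2.\n",
);

# Print both results next to each line.
for my $line (@lines) {
    printf "%d %d %s", has_non_digit($line), has_no_digit($line), $line;
}
```

Every sample line contains both a digit and a non-digit, so the first test is true for all four lines and the second is false for all four.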
Let's take a look at what is going on behind the scenes. Your while loop is equivalent to:
while(defined($_ = <READFILE>))
{
if($_ =~ /\D/)
{
print $_;
}
}
So, you are checking if the line contains a non-digit character (which it does) and then printing that line.
If you want to print Hello Brue this is a test. instead of 1. Hello Brue this is a test., then you would have to use something like:
while(<READFILE>) {
s/^\d+\. //;
print;
}
Also, it would make for more readable code if you used a variable rather than $_.
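As a rough sketch of what using a named variable might look like, the loop below reads into $line instead of $_. It assumes an in-memory filehandle standing in for test2.txt so that it runs on its own; the names $data, $fh and @cleaned are illustrative only.

```perl
use strict;
use warnings;

# Sample data standing in for test2.txt, read through an in-memory filehandle.
my $data = "1. Hello Brue this is a test.\n2. Hello Lisa this is a test.\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @cleaned;
while (my $line = <$fh>) {
    $line =~ s/^\d+\. //;    # strip a leading "1. ", "2. ", and so on
    push @cleaned, $line;
}
close $fh;

print @cleaned;
```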
What you want is to reject lines that have a digit rather than match lines that have a non-digit (as you're doing)
while (<READFILE>) {
print unless /\d/;
}
This will print each line unless it has a digit on it.
I have a uniprot document with a protein sequence as well as some metadata. I need to use perl to match the sequence and print it out but for some reason the last line always comes out two times. The code I wrote is here
#!usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {
if($_=~m /^\s+(\D+)/) { #this is the pattern I used to match the sequence in the document
$seq=$1;
$seq=~s/\s//g;} #removing the spaces from the sequence
print $seq;
}
I instead tried $seq.=$1; but it printed out the sequence 4.5 times. I'm sure I have made a mistake here but I'm not sure what. Here is the input file https://www.uniprot.org/uniprot/P30988.txt
Here is your code reformatted, with extra whitespace added between operators, to make it clearer what scope the statements are running in.
#!usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {
if ($_ =~ m /^\s+(\D+)/) {
$seq = $1;
$seq =~ s/\s//g;
}
print $seq;
}
The placement of the print command means that $seq will be printed for every line from the input file -- even those that don't match the regex.
I suspect you want this
#!/usr/bin/perl
open (IN, '<', 'P30988.txt') or die "Unable to open P30988.txt: $!";
while (<IN>) {
if ($_ =~ m /^\s+(\D+)/) {
$seq = $1;
$seq =~ s/\s//g;
# only print $seq for lines that match with /^\s+(\D+)/
# Also - added a newline to make it easier to debug
print $seq . "\n";
}
}
When I run that I get this
MRFTFTSRCLALFLLLNHPTPILPAFSNQTYPTIEPKPFLYVVGRKKMMDAQYKCYDRMQ
QLPAYQGEGPYCNRTWDGWLCWDDTPAGVLSYQFCPDYFPDFDPSEKVTKYCDEKGVWFK
HPENNRTWSNYTMCNAFTPEKLKNAYVLYYLAIVGHSLSIFTLVISLGIFVFFRSLGCQR
VTLHKNMFLTYILNSMIIIIHLVEVVPNGELVRRDPVSCKILHFFHQYMMACNYFWMLCE
GIYLHTLIVVAVFTEKQRLRWYYLLGWGFPLVPTTIHAITRAVYFNDNCWLSVETHLLYI
IHGPVMAALVVNFFFLLNIVRVLVTKMRETHEAESHMYLKAVKATMILVPLLGIQFVVFP
WRPSNKMLGKIYDYVMHSLIHFQGFFVATIYCFCNNEVQTTVKRQWAQFKIQWNQRWGRR
PSNRSARAAAAAAEAGDIPIYICHQELRNEPANNQGEESAEIIPLNIIEQESSA
You can simplify this a bit:
while (<IN>) {
next unless m/^\s/;
s/\s+//g;
print;
}
You want the lines that begin with whitespace, so immediately skip those that don't. Said another way, quickly reject things you don't want, which is different than accepting things you do want. This means that everything after the next knows it's dealing with a good line. Now the if disappears.
You don't need to get a capture ($1) to get the interesting text because the only other text in the line is the leading whitespace. That leading whitespace disappears when you remove all the whitespace. This gets rid of the if and the extra variable.
Finally, print what's left. Without an argument, print uses the value in the topic variable $_.
Now that's much more manageable. You escape that scoping issue with if causing the extra output because there's no scope to worry about.
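If the goal is one continuous sequence string rather than one printed chunk per line, a possible variation is to accumulate the cleaned lines and print once after the loop. The sketch below assumes the same "sequence lines start with whitespace" rule used above; extract_sequence is a hypothetical helper, and the sample record is a made-up miniature, not the real P30988 entry.

```perl
use strict;
use warnings;

# Hypothetical helper: joins all lines that start with whitespace
# into one string with every whitespace character removed.
sub extract_sequence {
    my @lines = @_;
    my $seq = '';
    for my $line (@lines) {
        next unless $line =~ /^\s/;        # skip metadata lines
        (my $chunk = $line) =~ s/\s+//g;   # copy the line, then strip whitespace
        $seq .= $chunk;
    }
    return $seq;
}

# Made-up miniature record in the same general layout (not real data).
my @record = (
    "ID   EXAMPLE_ENTRY\n",
    "SQ   SEQUENCE   12 AA;\n",
    "     MRFTFT SRCLAL\n",
    "//\n",
);

print extract_sequence(@record), "\n";   # printed once, after all lines are read
```

Printing only after the loop also avoids the repeated output that comes from appending with .= and printing the growing string on every iteration.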
I have a text file in which lines like the one below, starting with equals signs, appear many times. How can I extract such a line? I've tried the code below, but it's not working. Any clues as to why it's not matching?
Text file line:
[==========] 10 tests from 4 test cases ran. (43950 ms total)
Code:
if (/^\Q[==========]\E/ .. /^\Qran\)\E/) {
print "$i.Match Found:".$_."\n";
$i++;
}
Try this, have not tested, but should work. I HAVE tested the regex and it works.
#!/usr/bin/perl
use strict;
use warnings;
open(my $fh, '<', 'data.txt') or die "Unable to open data.txt: $!";
while (<$fh>) {
chomp;
if ( $_ =~ m/^\[==========\]/ ) {
print "Match found: $_\n";
}
}
close($fh);
To clarify: chomp removes the trailing newline from the line; it is not essential in this case.
#!/usr/bin/perl
# your code goes here
use strict;
use warnings;
while (my $line = <DATA>) {
chomp $line;
if ( $line =~ m/^\[=+\]/ ) {
print "Line which starts with [==] is $line\n";
}
}
__DATA__
[==========] 10 tests from 4 test cases ran. (43950 ms total)
A line without the equal signs at the beginning
[==========] 4 tests from 2 test cases ran. (30950 ms total)
[===]A line with equal signs at beginning.
You're using the flip-flop operator which will match lines starting from the first regex and ending at the second one (or the end of the data). From the regex you're using, I don't think this is your intention.
To match a line starting with [==========] and extract everything up to the word ran you need to use a capture group:
if (/^\Q[==========]\E(.*?ran)/) {
print "$. Match Found: $1\n";
}
The parentheses capture everything up to and including ran and place it in the special $1 variable. Note also the use of $., the current line number, to save you keeping count with $i.
If you wanted to extract just the numbers you could use:
if (/^\Q[==========]\E (\d+) tests from (\d+) test cases ran/) {
print "$. Match Found: $1 $2\n";
}
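The same capture idea can be wrapped in a small function so the numbers come back as named fields. This is only a sketch: parse_summary and the hash keys tests and cases are names invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference with the two counts,
# or undef when the line is not a "[==========] ... ran" summary line.
sub parse_summary {
    my ($line) = @_;
    if ($line =~ /^\Q[==========]\E (\d+) tests from (\d+) test cases ran/) {
        return { tests => $1, cases => $2 };
    }
    return;
}

my $line = "[==========] 10 tests from 4 test cases ran. (43950 ms total)";
if (my $summary = parse_summary($line)) {
    print "tests=$summary->{tests} cases=$summary->{cases}\n";
}
```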
I'm trying to take a file INPUT and, if a line in that file contains a string, replace the line with something else (the entire line, including line breaks), or nothing at all (remove the line like it wasn't there), writing all this to a new file.
Here's that section of code...
while(<INPUT>){
if ($_ =~ / <openTag>/){
chomp;
print OUTPUT "Some_Replacement_String";
} elsif ($_ =~ / <\/closeTag>/) {
chomp;
print OUTPUT ""; #remove the line
} else {
chomp;
print OUTPUT "$_\r\n"; #print the original line
}
}
while(<INPUT>) should read one line at a time (if my understanding is correct) and store each line in the special variable $_
However, when I run the above code, only the very first if branch fires, printing Some_Replacement_String, and only once (1 line out of a file with 1.3m lines, where I expect 600,000 replacements). This obviously isn't the behavior I expect. If I do something like while(<INPUT>){print OUTPUT $_;} I get a copy of the entire file, every line, so I know the entire file is being read (expected behavior).
What I'm trying to do is get a line, test it, do something with it, and move on to the next one.
If it helps with troubleshooting at all, if I use print $.; anywhere in that while statement (or after it), I get 1 returned. I expected this to be the "Current line number for the last filehandle accessed.". So by the time my while statement loops through the entire file, it should be equal to the number of lines in the file, not 1.
I've tried a few other variations of this code, but I think this is the closest I've come. I assume there's a good reason I'm not getting the behavior I expect, can anyone tell me what it is?
The problem you are describing indicates that your input file only contains one line. This may be because of a great many different things, such as:
You have changed the input record separator $/
Your input file does not contain the correct line endings
You are running your script with -0777 switch
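One way to check the line-ending theory is to count how many records Perl sees under different values of $/. The sketch below is an illustration under assumptions: count_records is a hypothetical helper, and the sample text uses carriage-return-only line endings as one example of "incorrect" endings.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the records read from $text
# when the input record separator is set to $separator.
sub count_records {
    my ($text, $separator) = @_;
    local $/ = $separator;    # affects reads inside this call only
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $count = 0;
    while (defined(my $record = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

my $cr_only = "line 1\rline 2\rline 3\r";    # no "\n" anywhere in the text

print count_records($cr_only, "\n"), "\n";   # prints 1: the whole text is one record
print count_records($cr_only, "\r"), "\n";   # prints 3: splitting on "\r" finds the lines
```

A count of 1 with the default separator matches the symptom in the question, where $. never gets past 1.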
Some notes on your code:
if ($_ =~ / <openTag>/){
chomp;
print OUTPUT "Some_Replacement_String";
No need to chomp a line you are not using.
} elsif ($_ =~ / <\/closeTag>/) {
chomp;
print OUTPUT "";
This is quite redundant. You don't need to print an empty string (ever, really), and chomp a value you're not using.
} else {
chomp;
print OUTPUT "$_\r\n"; #print the original line
No need to remove newlines and then put them back. Also, normally you would use \n as your line ending, even on Windows, where Perl translates \n to \r\n on output for filehandles in text mode.
And, since you are chomping in every if-else clause, you might as well move that outside the entire if-block.
chomp;
if (....) {
But since you are never relying on line endings not being there, why bother using chomp at all?
When using the $_ variable, you can abbreviate some commands, such as you are doing with chomp. For example, a lone regex will be applied to $_:
} elsif (/ <\/closeTag>/) { # works splendidly
When, like above, you have a regex that contains slashes, you can choose another delimiter for your regex, so that you do not need to escape the slashes:
} elsif (m# </closeTag>#) {
But then you need to use the full notation of the m// operator, with the m in front.
So, in short
while(<INPUT>){
if (/ <openTag>/){
print OUTPUT "Some_Replacement_String";
} elsif (m# </closeTag>#) {
# do nothing
} else {
print OUTPUT $_; # print the original line
}
}
And of course, the last two can be combined into one, with some negation logic:
} elsif (not m# </closeTag>#) {
print OUTPUT $_;
}
Suppose I have a file with these inputs:
line 1
line 2
line3
My program should only store "line1", "line2" and "line3" not the newlines. How do I achieve that?
My program already removed leading and trailing whitespaces but it doesn't help to remove newline.
I am setting $/ as \n because each input is separated by a \n.
while (<>) {
chomp;
next unless /\S/;
print "$_\n";
}
Set
$/ = q(); # that's an empty string, like "" or ''
while (<>) {
chomp;
...
}
The special value of the defined empty string is how you tell the input operator to treat one or more newlines as the terminator (preferring more), and also to get chomp to remove them all. That way each record always starts with real data.
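A minimal sketch of that paragraph-mode behavior, assuming the values are separated by one or more blank lines as in the question. The data is held in an in-memory filehandle so the example runs on its own; $data, $fh and @records are illustrative names.

```perl
use strict;
use warnings;

# Sample input: values separated by varying numbers of blank lines.
my $data = "line 1\n\nline 2\n\n\n\nline3\n";

my @records;
{
    local $/ = '';    # paragraph mode: runs of blank lines end a record
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (my $record = <$fh>) {
        chomp $record;            # in paragraph mode this removes all trailing newlines
        push @records, $record;
    }
    close $fh;
}

print "[$_]\n" for @records;
```

Note that two data lines with no blank line between them would arrive as a single record in this mode.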
Perl -n is the equivalent of wrapping while(<>) { } around your script. Assuming that all you need to do is eliminate blank lines, you can do it like this:
#! /usr/bin/perl -n
print unless ( /^$/ );
... On the other hand, if that's all you need to do, you might as well ditch perl and use
grep -v '^$'
Edit: your post says that you want to store values where lines are not blank... in that case, assuming that you don't have too much work to do in the rest of your script, you might do something like this:
#! /usr/bin/perl -n
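our @values; # 'our' rather than 'my': -n wraps this code in a loop, and a 'my' array would be recreated on every iteration
push @values, $_ unless ( /^$/ );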
END {
# do whatever work you want to do here
}
... but this quickly reaches a point of diminishing returns if you have very much code inside the END{} block.
I have one string of line like
comments:[I#1278327] is related to office communicator.i fixed the bug to declare it null at first time.
Here I am searching for the index of I#, and then I want the whole word, meaning [I#1278327]. I'm doing it like this:
open(READ1,"<letter.txt");
while(<READ1>)
{
if(index($_,"I#")!=-1)
{
$indexof=index($_,"I#");
print $indexof,"\n";
$string=substr($_,$indexof);##i m cutting that string first from index of I# to end then...
$string=substr($string,0,index($string," "));
$lengthof=length($string);
print $lengthof,"\n";
print $string,"\n";
print $_,"\n";
}
}
Is there any API in Perl to find the word length directly after finding the index of I# in that line?
You could do something like:
$indexof=index($_,"I#");
$index2 = index($_,' ',$indexof);
$lengthof = $index2 - $indexof;
However, the bigger issue is you are using Perl as if it were BASIC. A more perlish approach to the task of printing selected lines:
use strict;
use warnings;
open my $read, '<', 'letter.txt' or die "Cannot open letter.txt: $!"; # safer version of open
LINE:
while (<$read>) {
print "$1 - $_" if (/(I#.*?) /);
}
I would use a regex instead, a regex will allow you to match a pattern ("I#") and also capture other data from the string:
$_ =~ m/I#(\d+)/;
If the match succeeds, the line above sets $1 to the number; test the result of the match before using $1.
See perldoc perlre
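To get the whole bracketed word together with its position and length in one step, a regex capture can be combined with the built-in @- array (match start offsets) and length. The sketch below is one possible approach; find_issue_token is a hypothetical helper name, and it assumes the token always has the form [I# followed by digits and a closing bracket.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (token, start index, length) for the first
# "[I#digits]" token in the line, or an empty list if there is none.
sub find_issue_token {
    my ($line) = @_;
    return unless $line =~ /(\[I#\d+\])/;
    my $token = $1;
    my $start = $-[1];    # offset where capture group 1 begins
    return ($token, $start, length $token);
}

my $line = "comments:[I#1278327] is related to office communicator.";
if (my ($token, $start, $len) = find_issue_token($line)) {
    print "$token starts at index $start and is $len characters long\n";
}
```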