I'm iterating through a file, and after some condition I have to step back by one line.
When a line matches the regexp, the second while loop runs and keeps reading the file until its condition fails. After that, my code has to STEP BACK by 1 line!
while(my $line = <FL>){
if($line =~ /some regexp/){
while($line =~ /^\+/){
$line = <FL>; #Step into next line
}
seek(FL, -length($line), 1); #This should get me back the previous line
#Some tasks with previous line
}
}
seek should work, but it doesn't: it returns the same line... What is the problem?
When you read from a filehandle, it has already advanced to the next line. Therefore if you go back the length of the current line, all you're doing is setting up to read the line over again.
Also, equating the length of a line in memory with its length on disk assumes the filehandle uses the :raw layer rather than :crlf or some other layer or encoding. This is a big assumption.
What you need are state variables to keep track of your past values. There is no need to literally roll back a file handle.
The following is a stub of what you might be aiming to do:
use strict;
use warnings;
my @buffer;
while (<DATA>) {
if (my $range = /some regexp/ ... !/^\+/) {
if ($range =~ /E/) { # Last Line of range
print @buffer;
}
}
# Save a buffer of last 4 lines
push @buffer, $_;
shift @buffer if @buffer > 4;
}
__DATA__
stuff
more stuff
some regexp
+ a line
+ another line
+ last line
break out
more stuff
ending stuff
Output:
some regexp
+ a line
+ another line
+ last line
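If you really do need to move the handle itself, one option is to record offsets with tell before each read and seek back to a recorded offset, rather than computing anything from length. The following is only a minimal sketch under stated assumptions: the sample data, the in-memory filehandle, and the variable names ($last_plus_start, $start, @reread) are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data, read through an in-memory filehandle
# so the sketch is self-contained.
my $data = "some regexp\n+ one\n+ two\nbreak out\nrest\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @reread;               # lines fetched again after stepping back
my $last_plus_start;      # offset where the most recent "+" line begins
my $start = tell $fh;     # offset where the next line to be read begins

while (defined(my $line = <$fh>)) {
    if ($line =~ /^\+/) {
        $last_plus_start = $start;
    }
    elsif (defined $last_plus_start) {
        # Step back to the start of the last "+" line and read it again.
        seek($fh, $last_plus_start, 0) or die "seek failed: $!";
        push @reread, scalar <$fh>;
        undef $last_plus_start;
        # The handle now sits at the start of the current line again,
        # so the next iteration reads this non-"+" line a second time.
    }
    $start = tell $fh;    # remember where the next line starts
}

print @reread;
```

Because the offsets come from tell, this does not depend on how line endings or encodings map to bytes on disk. Keeping the previous line in a variable is still simpler when you only need its text.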
What about something like: (as an alternative)
open(my $fh, '<', "$file") or die $!;#use three argument open
my $previous_line = q{}; #initially previous line would be empty
while(my $current_line = <$fh>){
chomp $current_line; #chomp the variable that was read, not $_
print"$current_line\n";
print"$previous_line\n";
#assign current line into previous line before it go to next line
$previous_line = $current_line;
}
close($fh);
I want to split parts of a file. Here is what the start of the file looks like (it continues in same way):
Location Strand Length PID Gene
1..822 + 273 292571599 CDS001
906..1298 + 130 292571600 trxA
I want to split the Location column and subtract (e.g. 822-1), do the same for every row, and add the results together. So for these two rows the value would be: (822-1)+(1298-906) = 1213
How?
My code right now (I don't get any output at all in the terminal; it just keeps processing forever):
use warnings;
use strict;
my $infile = $ARGV[0]; # Reading infile argument
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $line2 = <$IN>;
my $coding = 0; # Initialize coding variable
while(my $line = $line2){ # reading the file line by line
# TODO Use split and do the calculations
my @row = split(/\.\./, $line);
my @row2 = split(/\D/, $row[1]);
$coding += $row2[0]- $row[0];
}
print "total amount of protein coding DNA: $coding\n";
So what I get from my code if I put:
print "$coding \n";
at the end of the while loop just to test is:
821
1642
And so the first number is correct (822-1) but the next number doesn't make any sense to me, it should be (1298-906). What I want in the end outside the loop:
print "total amount of protein coding DNA: $coding\n";
is the sum of all the subtractions of every line i.e. 1213. But I don't get anything, just a terminal that works on forever.
As a one-liner:
perl -nE '$c += $2 - $1 if /^(\d+)\.\.(\d+)/; END { say $c }' input.txt
(Extracting the important part of that and putting it into your actual script should be easy to figure out).
Explicitly opening the file makes your code more complicated than it needs to be. Perl will automatically open any files passed on the command line and allow you to read from them using the empty file input operator, <>. So your code becomes as simple as this:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $total;
while (<>) {
my ($min, $max) = /(\d+)\.\.(\d+)/;
next unless $min and $max;
$total += $max - $min;
}
say $total;
If this code is in a file called adder and your input data is in add.dat, then you run it like this:
$ adder add.dat
1213
Update: And, to explain where you were going wrong...
You only ever read a single line from your file:
my $line2 = <$IN>;
And then you continually assign that same value to another variable:
while(my $line = $line2){ # reading the file line by line
The comment in this line is wrong. I'm not sure where you got that line from.
To fix your code, just remove the my $line2 = <$IN> line and replace your loop with:
while (my $line = <$IN>) {
# your code here
}
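Putting that fix together with a split-based calculation might look like the sketch below. It is only an illustration: the two sample rows from the question are fed through an in-memory filehandle (an assumption made to keep it self-contained), and the names $location, $start and $end are hypothetical.

```perl
use strict;
use warnings;

# Sample rows from the question, supplied through an in-memory filehandle.
my $sample = <<'END';
Location Strand Length PID Gene
1..822 + 273 292571599 CDS001
906..1298 + 130 292571600 trxA
END
open my $IN, '<', \$sample or die "Could not open sample data: $!";

my $coding = 0;
while (my $line = <$IN>) {                  # each pass reads a NEW line
    my ($location) = split ' ', $line;      # first column, e.g. "1..822"
    next unless defined $location;          # skip blank lines
    my ($start, $end) = split /\.\./, $location;
    # Skip rows (such as the header) that are not "number..number".
    next unless defined $end && $start =~ /^\d+$/ && $end =~ /^\d+$/;
    $coding += $end - $start;
}
close $IN;

print "total amount of protein coding DNA: $coding\n";
```

With the two rows shown, the loop adds 821 and 392. The original loop kept adding 821 because it never read a second line.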
The following code copies file content from readfile to writefile. Instead of copying up to the end, I want to copy up to some keyword.
use strict;
use warnings;
use File::Slurp;
my @lines = read_file('readfile.txt');
while ( my $line = shift @lines) {
next unless ($line =~ m/END OF HEADER/);
last; # here suggest some other logic
}
append_file('writefile.txt', @lines);
next will continue to the next iteration of the loop, effectively skipping the rest of the statements in the loop for that iteration (in this case, the last).
last will immediately exit the loop, which sounds like what you want. So you should be able to simply put the conditional statement on the last.
Also, I'm not sure why you want to read the entire file into memory to iterate over its lines? Why not just use a regular while(<>)? And I would recommend avoiding File::Slurp, it has some long-standing issues.
You don't show any example input with expected output, and your description is unclear - you said "i want to copy upto some keyword" but in your code you use shift, which removes items from the beginning of the array.
Do you want to remove the lines before or after and including or not including "END OF HEADER"?
This code will copy over only the header:
use warnings;
use strict;
my $infile = 'readfile.txt';
my $outfile = 'writefile.txt';
open my $ifh, '<', $infile or die "$infile: $!";
open my $ofh, '>', $outfile or die "$outfile: $!";
while (<$ifh>) {
last if /END OF HEADER/;
print $ofh $_;
}
close $ifh;
close $ofh;
Whereas if you want to copy everything after the header, you could replace the while above with:
while (<$ifh>) {
last if /END OF HEADER/;
}
while (<$ifh>) {
print $ofh $_;
}
The first loop does nothing until it sees END OF HEADER, then breaks out. The second loop then prints the lines after the header.
data.txt:
fsffs
sfsfsf
sfSDFF
END OF HEADER
{ dsgs xdgfxdg zFZ }
dgdbg
vfraeer
Code:
use strict;
use warnings;
use 5.020;
use autodie;
use Data::Dumper;
my $infile = 'data.txt';
my $header_file = 'header.txt';
my $after_header_file = 'after_header.txt';
open my $DATA, '<', $infile;
open my $HEADER, '>', $header_file;
open my $AFTER_HEADER, '>', $after_header_file;
{
local $/ = "END OF HEADER";
my $header = <$DATA>;
say {$HEADER} $header;
my $rest = <$DATA>;
say {$AFTER_HEADER} $rest;
}
close $DATA;
close $HEADER;
close $AFTER_HEADER;
say "Created files: $header_file, $after_header_file";
Output:
$ perl 1.pl
Created files: header.txt, after_header.txt
$ cat header.txt
fsffs
sfsfsf
sfSDFF
END OF HEADER
$ cat after_header.txt
{ dsgs xdgfxdg zFZ }
dgdbg
vfraeer
$/ specifies the input record separator, which by default is a newline. Therefore, when you read from a file:
while (my $x = <$INFILE>) {
}
each value of $x is a sequence of characters up to and including the input record separator, i.e. a newline, which is what we normally think of as a line of text in a file. Often, we chomp off the newline/input_record_separator at the end of the text:
while (my $x = <$INFILE>) {
chomp $x;
say "$x is a dog";
}
But, you can set the input record separator to anything you want, like your "END OF HEADER" text. That means a line will be all the text up to and including the input record separator, which in this case is "END OF HEADER". For example, a line will be: "abc\ndef\nghi\nEND OF HEADER". Furthermore, chomp() will now remove "END OF HEADER" from the end of its argument, so you could chomp your line if you don't want the "END OF HEADER" marker in the output file.
If perl cannot find the input record separator, then perl keeps reading the file until perl hits the end of the file, then perl returns all the text that was read.
You can use those operations to your advantage when you want to seek to some specific text in a file.
Declaring a variable as local makes the variable magical: when the closing brace of the surrounding block is encountered, perl sets the variable back to the value it had just before the opening brace of the surrounding block:
#Here, by default $/ = "\n", but some code out here could have
#also set $/ to something else
{
local $/ = "END OF HEADER";
} # $/ gets set back to whatever value it had before this block
When you change one of perl's predefined global variables, it's considered good practice to only change the variable for as long as you need to use the variable, then change the variable back to what it was.
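As a small illustration of both points (chomp following $/, and local restoring it), here is a sketch that reads from an in-memory string instead of a real file. The sample text and the variable names are assumptions for the example only.

```perl
use strict;
use warnings;

my $text = "a\nb\nEND OF HEADER\nc\nd\n";   # hypothetical file contents
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $header;
{
    local $/ = "END OF HEADER\n";   # one "line" now ends at the marker
    $header = <$fh>;                # reads "a\nb\nEND OF HEADER\n"
    chomp $header;                  # chomp strips the current $/, i.e. the marker
}
# Outside the block $/ is back to its previous value (a newline by default),
# so this read returns an ordinary line.
my $first_after = <$fh>;

print "header: $header";
print "next:   $first_after";
```

Here the marker includes the trailing newline, so the text after the header starts cleanly on its own line.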
If you want to target just the text between the braces, you can do:
data.txt:
fsffs
sfsfsf
sfSDFF
END OF HEADER { dsgs xdgfxdg zFZ }
dgdbg
vfraeer
Code snippet:
...
...
{
local $/ = 'END OF HEADER {';
my $pre_brace = <$DATA>;
$/ = '}';
my $target_text = <$DATA>;
chomp $target_text; #Removes closing brace
say "->$target_text<-";
}
--output:--
-> dsgs xdgfxdg zFZ <-
I am working on a Perl script and need some help with it. The requirement is that I have to find a label and, once the label is found, replace a word in the line immediately following the label. For example, if the label is ABC:
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
I want to write a script to match the label (ABC) and once the label is found, replace a word in the next line immediately following the label.
Here is my attempt:
open(my $fh, "<", "file1.txt") or die "cannot open file:$!";
while (my $line = <$fh>))
{
next if ($line =~ /ABC/) {
$line =~ s/original_string/replaced_string/;
}
else {
$msg = "pattern not found \n ";
print "$msg";
}
}
Is this correct..? Any help will be greatly appreciated.
The following one-liner will do what you need:
perl -pe '++$x and next if /ABC:/; $x-- and s/old/new/ if $x' inFile > outFile
The code sets a flag and gets the next line if the label is found. If the flag is set, it's unset and the substitution is executed.
Hope this helps!
You're doing this in your loop:
next if ($line =~ /ABC/);
So, you're reading the file, and if a line contains ABC anywhere in that line, you skip the line. However, for every other line, you do the replacement. In the end, you're replacing the string on all other lines and printing that out, and you're not printing out your labels.
Here's what you said:
I have to read the file until I find a line with the label:
Once the label is found
I have to read the next line and replace the word in a line immediately following the label.
So:
You want to read through a file line-by-line.
If a line matches the label
read the next line
replace the text on the line
Print out the line
Following these directions:
use strict;
use warnings; # Hope you're using strict and warnings
use autodie; # Program automatically dies on failed opens. No need to check
use feature qw(say); # Allows you to use say instead of print
open my $fh, "<", "file1.txt"; # Removed parentheses. It's the latest style
while (my $line = <$fh>) {
chomp $line; # Always do a chomp after a read.
if ( $line eq "ABC:" ) { # Use 'eq' to ensure an exact match for your label
say "$line"; # Print out the current line
$line = <$fh>; # Read the next line
chomp $line; # Remove its newline too, since say adds one
$line =~ s/old/new/; # Replace that word
}
say "$line"; # Print the line
}
close $fh; # Might as well do it right
Note that when I use say, I don't have to put the \n on the end of the line. Also, by doing my chomp after my read, I can easily match the label without worrying about the \n on the end.
This is done exactly as you said it should be done, but there are a couple of issues. The first is that when we do $line = <$fh>, there's no guarantee we are really reading a line. What if the file ends right there?
Also, it's bad practice to read a file in multiple places. It makes it harder to maintain the program. To get around this issue, we'll use a flag variable. This allows us to know if the line before was a tag or not:
use strict;
use warnings; # Hope you're using strict and warnings
use autodie; # Program automatically dies on failed opens. No need to check
use feature qw(say); # Allows you to use say instead of print
open my $fh, "<", "file1.txt"; # Removed parentheses. It's the latest style
my $tag_found = 0; # Flag isn't set
while (my $line = <$fh>) {
chomp $line; # Always do a chomp after a read.
if ( $tag_found ) { # The previous line was the label
$line =~ s/old/new/; # Replace that word
$tag_found = 0; # Reset our flag variable
}
elsif ( $line eq "ABC:" ) { # Use 'eq' to ensure an exact match for your label
$tag_found = 1; # We found the tag!
}
say "$line"; # Print the line
}
close $fh; # Might as well do it right
Of course, I would prefer to eliminate mysterious values. For example, the tag should be a variable or constant. Same with the string you're searching for and the string you're replacing.
You mentioned this was a word, so your regular expression replacement should probably look like this:
$line =~ s/\b$old_word\b/$new_word/;
The \b marks word boundaries. This way, if you're supposed to replace the word cat with dog, you don't get tripped up on a line that says:
The Jeopardy category is "Say what".
You don't want to change category to dogegory.
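A quick way to check the word-boundary behaviour is a sketch like the following. The words and sample lines are hypothetical, and \Q...\E is added on the assumption that the word to replace might contain regex metacharacters.

```perl
use strict;
use warnings;

my ($old_word, $new_word) = ('cat', 'dog');   # hypothetical words

my @lines = (
    'The Jeopardy category is "Say what".',
    'The cat sat on the mat.',
);

my @result;
for my $line (@lines) {
    my $copy = $line;
    # \b anchors the match to whole words; \Q...\E quotes any regex
    # metacharacters that the word itself might contain.
    $copy =~ s/\b\Q$old_word\E\b/$new_word/;
    push @result, $copy;
}

print "$_\n" for @result;
```

The first line comes back unchanged because cat inside category is not followed by a word boundary. Only the standalone word in the second line is replaced.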
Your problem is that reading in a file does not work like that. You're doing it line by line, so when your regex tests true, the line you want to change isn't there yet. You can try adding a boolean variable to check if the last line was a label.
#!/usr/bin/perl
use strict;
use warnings;
my $found;
my $replacement = "Hello";
while(my $line = <>){
if($line =~ /ABC/){
$found = 1;
next;
}
if($found){
$line =~ s/^.*?$/$replacement/;
$found = 0;
print $line, "\n";
}
}
Or you could use File::Slurp and read the whole file into one string:
use File::Slurp;
$x = read_file( "file.txt" );
$x =~ s/^(ABC:\s*$ [\n\r]{1,2}^.*?)to\sbe/$1to was/mgx;
print $x;
using /m to make the ^ and $ match embedded begin/end of lines
x is to allow the space after the $ - there is probably a better way
Yields:
ABC:
string to was replaced
some other lines
ABC:
string to was replaced
some other lines
ABC:
string to was replaced
Also, relying on perl's in-place editing:
use File::Slurp qw(read_file write_file);
use strict;
use warnings;
my $file = 'fakefile1.txt';
# Initialize Fake data
write_file($file, <DATA>);
# Enclosed is the actual code that you're looking for.
# Everything else is just for testing:
{
local @ARGV = $file;
local $^I = '.bac';
while (<>) {
print;
if (/ABC/ && !eof) {
$_ = <>;
s/.*/replaced string/;
print;
}
}
unlink "$file$^I";
}
# Compare new file.
print read_file($file);
1;
__DATA__
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
some other lines
ABC:
string to be replaced
ABC:
outputs
ABC:
replaced string
some other lines
ABC:
replaced string
some other lines
ABC:
replaced string
ABC:
I have a big file with repeated lines as follows:
#UUSM
ABCDEADARFA
+------qqq
!2wqeqs6777
I would like to output all the 'second lines' in the file. I have the following code snippet for doing this, but it's not working as expected. Lines 1, 3 and 4 are in the output instead.
open(IN,"<", "file1.txt") || die "cannot open input file:$!";
while (<IN>) {
$line = $line . $_;
if ($line =~ /^\#/) {
<IN>;
#next;
my $line = $line;
}
}
print "$line";
Please help!
try this
open(IN,"<", "file1.txt") || die "cannot open input file:$!";
my $lines = "";
while (<IN>) {
$lines .= $_ if $. % 4 == 2;
}
print "$lines";
I assume what you are asking is how to print the line that comes after a line that begins with #:
perl -ne 'if (/^\#/) { print scalar <> }' file1.txt
This says, "If the line begins with #, then print the next line. Do this for all the files in the argument list." The scalar function is used here to impose a scalar context on the file handle, so that it does not print the whole file. By default print has a list context for its arguments.
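The difference between scalar and list context on a file handle can be seen in a small sketch like this one. The sample text and the in-memory filehandle are assumptions to keep it self-contained.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\nfour\n";   # hypothetical file contents
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $first  = <$fh>;          # scalar assignment: reads exactly one line
my $second = scalar <$fh>;   # explicit scalar context: also one line
my @rest   = <$fh>;          # list context: reads every remaining line

print "scalar read: $first";
print "scalar read: $second";
print "list read got ", scalar(@rest), " lines\n";
```

print supplies list context to its arguments, which is why the one-liner needs scalar to read just one line.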
If you actually want to print the second line in the file, well, that's even easier. Here's a few examples:
Using the line number $. variable, printing if it equals line number 2.
perl -ne '$. == 2 and print, close ARGV' yourfile.txt
Note that if you have multiple files, you must close the ARGV file handle to reset the counter $.. Note also that the low-precedence operator and makes both print and close bound to the conditional.
Using regular logic.
perl -ne 'print scalar <>; close ARGV;'
perl -pe '$_ = <>; close ARGV;'
Both of these use a short-circuit by closing the ARGV file handle once the second line is printed. If you want to print every other line of a file, both will do that if you remove the close statements.
perl -ne '$at = $. if /^\#/; print if $. - 1 == $at' file1.txt
Written out longhand, the above is equivalent to
open my $fh, "<", "file1.txt";
my $at_line = 0;
while (<$fh>) {
if (/^\#/) {
$at_line = $.;
}
else {
print if $. - 1 == $at_line;
}
}
If you want lines 2, 6, 10 printed, then:
while (<>)
{
print if $. % 4 == 2;
}
Where $. is the current line number — and I didn't spend the time opening and closing the file. That might be:
{
my $file = "file1.txt";
open my $in, "<", $file or die "cannot open input file $file: $!";
while (<$in>)
{
print if $. % 4 == 2;
}
}
This uses the modern preferred form of file handle (a lexical file handle), and the braces around the construct mean the file handle is closed automatically. The name of the file that couldn't be opened is included in the error message; the or operator is used so the precedence is correct (the parentheses and || in the original were fine too and could be used here, but conventionally are not).
If you want the line after a line starting with # printed, you have to organize things differently.
my $print_next = 0;
while (<>)
{
if ($print_next)
{
print $_;
$print_next = 0;
}
elsif (m/^#/)
{
$print_next = 1;
}
}
Dissecting the code in the question
The original version of the code in the question was (line numbers added for convenience):
1 open(IN,"<", "file1.txt") || die "cannot open input file:$!";
2 while (<IN>) {
3 $line = $line . $_;
4 if ($line =~ /^\#/) {
5 <IN>;
6 #next;
7 my $line = $line;
8 }
9 }
10 print "$line";
Discussion of each line:
1. OK, though it doesn't use a lexical file handle or report which file could not be opened.
2. OK.
3. Premature and misguided. This adds the current line to the variable $line before any analysis is done. If it was desirable, it could be written $line .= $_;
4. Suggests that the correct description for the desired output is not 'the second lines' but 'the line after a line starting with #'. Note that since there is no multi-line modifier on the regex, this will always match only the first line segment in the variable $line. Because of the premature concatenation, it will match on each line (because the first line of data starts with #), executing the code in lines 5-8.
5. Reads another line into $_. It doesn't test for EOF, but that's harmless.
6. Comment line; no significance except to suggest some confusion.
7. my $line = $line; is a self-assignment to a new variable hiding the outer $line...mainly, this is weird and to a lesser extent it is a no-op. You are not using use strict; and use warnings; because you would have warnings if you did. Perl experts use use strict; and use warnings; to make sure they haven't made silly mistakes; novices should use them for the same reason.
8. Of itself, OK. However, the code in the condition has not really done very much. It skips the second line in the file; it will later skip the fourth, the sixth, the eighth, etc.
9. OK.
10. OK, but...if you're only interested in printing the lines after the line starting #, or only interested in printing the line numbers 2N+2 for integral N, then there is no need to build up the entire string in memory before printing each line. It will be simpler to print each line that needs printing as it is found.
I'd like to read a file, e.g. test.test, which contains
#test:testdescription\n
#cmd:binary\n
#return:0\n
#stdin:|\n
echo"toto"\n
echo"tata"\n
#stdout:|\n
toto\n
tata\n
#stderr:\n
I succeeded in extracting the values that come after #test:, #cmd:, etc.
But for stdin or stdout, I want to collect all the lines before the next # into the arrays @stdin and @stdout.
I loop with while ($line = <TEST>), so it looks at each line. If I see the pattern /^#stdin:|/, I want to move to the next line and store its value in an array until I see the next #.
How do I move to the next line in the while loop?
This file format can be easily handled with some creativity in selecting the appropriate value for $/:
use strict; use warnings;
my %parsed;
{
local $/ = '#';
while ( my $line = <DATA> ) {
chomp $line;
my $content = (split /:/, $line, 2)[1];
next unless defined $content;
$content =~ s/\n+\z//;
if ( my ($chan) = $line =~ /^(std(?:err|in|out))/ ) {
$content =~ s/^\|\n//;
$parsed{$chan} = [ split /\n/, $content];
}
elsif ( my ($var) = $line =~ /^(cmd|return|test)/ ) {
$parsed{ $var } = $content;
}
}
}
use YAML;
print Dump \%parsed;
__DATA__
#test:testdescription
#cmd:binary
#return:0
#stdin:|
echo"toto"
echo"tata"
#stdout:|
toto
tata
#stderr:
Output:
---
cmd: binary
return: 0
stderr: []
stdin:
- echo"toto"
- echo"tata"
stdout:
- toto
- tata
test: testdescription
UPDATED as per user's comments
If I understand the question correctly, you want to read one more line within a loop?
If so, you can either:
just do another line read inside the loop.
my $another_line = <TEST>;
Keep some state flag and use it next iteration of the loop, and accumulate lines between stdins in a buffer:
my $last_line_was_stdin = 0;
my @line_buffer = ();
while ($line = <TEST>) {
if ($line =~ /^#stdin:\|/) {
#
# Some Code to process all lines accumulated since last "stdin"
#
@line_buffer = ();
$last_line_was_stdin = 1;
next;
}
push @line_buffer, $line;
}
This solution may not do 100% of what you need, but it defines a pattern you need to follow in your state machine implementation: read a line. Check your current state (if it matters). Based on the current state and a pattern in the line, decide what to do about the current line (add to the buffer? change the state? If changing the state, process the buffer based on the last state?)
Also, as per your comment, you have a bug in your regex - the pipe (| character) means "OR" in regex, so you are saying "if line starts with #stdin OR matches an empty regex" - the latter part is always true so your regex will match 100% of time. You need to escape the "|" via /^#stdin:\|/ or /^#stdin:[|]/
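To make the state-machine idea concrete, here is a minimal sketch that collects the stdin and stdout blocks from the sample in the question. It is only one possible shape: the lines are held in an array rather than read from TEST, and the names %block and $state are hypothetical.

```perl
use strict;
use warnings;

# Lines from the question's sample file, held in an array for the sketch.
my @input = (
    "#test:testdescription\n", "#cmd:binary\n", "#return:0\n",
    "#stdin:|\n", "echo\"toto\"\n", "echo\"tata\"\n",
    "#stdout:|\n", "toto\n", "tata\n",
    "#stderr:\n",
);

my %block = (stdin => [], stdout => []);
my $state = '';                 # name of the block being collected, if any

for my $line (@input) {
    chomp(my $text = $line);
    if ($text =~ /^#(stdin|stdout):\|/) {   # note the escaped pipe
        $state = $1;                        # a new block starts here
        next;
    }
    if ($text =~ /^#/) {                    # any other "#" line ends the block
        $state = '';
        next;
    }
    push @{ $block{$state} }, $text if $state;
}

print "stdin:  @{ $block{stdin} }\n";
print "stdout: @{ $block{stdout} }\n";
```

The current state decides where an ordinary line goes, and only lines starting with # change the state. That removes the need to read ahead inside the loop.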