I have a big file with repeated lines as follows:
#UUSM
ABCDEADARFA
+------qqq
!2wqeqs6777
I would like to output every 'second line' in the file. I have the following code snippet for doing this, but it's not working as expected: lines 1, 3 and 4 are in the output instead.
open(IN,"<", "file1.txt") || die "cannot open input file:$!";
while (<IN>) {
$line = $line . $_;
if ($line =~ /^\#/) {
<IN>;
#next;
my $line = $line;
}
}
print "$line";
Please help!
try this
open(IN,"<", "file1.txt") || die "cannot open input file:$!";
my $lines = "";
while (<IN>) {
$lines .= $_ if $. % 4 == 2;
}
print "$lines";
I assume what you are asking is how to print the line that comes after a line that begins with #:
perl -ne 'if (/^\#/) { print scalar <> }' file1.txt
This says, "If the line begins with #, then print the next line. Do this for all the files in the argument list." The scalar function is used here to impose a scalar context on the file handle, so that it does not print the whole file. By default print has a list context for its arguments.
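The difference between the two contexts can be seen in a minimal sketch. The sample data and the in-memory file handle are assumptions made for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical sample data, read through an in-memory file handle.
my $data = "#UUSM\nABCDEADARFA\n+------qqq\n!2wqeqs6777\n";
open my $fh, '<', \$data or die "cannot open in-memory data: $!";

my $first = <$fh>;   # scalar context: reads exactly one line
my @rest  = <$fh>;   # list context: reads every remaining line

print "scalar context read: $first";
print "list context read ", scalar(@rest), " lines\n";
```

Because print supplies list context, a bare print <> behaves like the second read, which is why the one-liner wraps it in scalar.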
If you actually want to print the second line in the file, well, that's even easier. Here are a few examples:
Using the line number $. variable, printing if it equals line number 2.
perl -ne '$. == 2 and print, close ARGV' yourfile.txt
Note that if you have multiple files, you must close the ARGV file handle to reset the counter $. between files. Note also that the lower-precedence operator and is used so that both print and close ARGV are bound to the conditional.
Using regular logic.
perl -ne 'print scalar <>; close ARGV;'
perl -pe '$_ = <>; close ARGV;'
Both of these use a short-circuit by closing the ARGV file handle once the second line has been printed. If you want to print every other line of a file, both will do that if you remove the close statements.
perl -ne '$at = $. if /^\#/; print if $. - 1 == $at' file1.txt
Written out longhand, the above is equivalent to
open my $fh, "<", "file1.txt";
my $at_line = 0;
while (<$fh>) {
if (/^\#/) {
$at_line = $.;
}
else {
print if $. - 1 == $at_line;
}
}
If you want lines 2, 6, 10 printed, then:
while (<>)
{
print if $. % 4 == 2;
}
Where $. is the current line number — and I didn't spend the time opening and closing the file. That might be:
{
my $file = "file1.txt";
open my $in, "<", $file or die "cannot open input file $file: $!";
while (<$in>)
{
print if $. % 4 == 2;
}
}
This uses the modern preferred form of file handle (a lexical file handle), and the braces around the construct mean the file handle is closed automatically. The name of the file that couldn't be opened is included in the error message; the or operator is used so the precedence is correct (the parentheses and || in the original were fine too and could be used here, but conventionally are not).
If you want the line after a line starting with # printed, you have to organize things differently.
my $print_next = 0;
while (<>)
{
if ($print_next)
{
print $_;
$print_next = 0;
}
elsif (m/^#/)
{
$print_next = 1;
}
}
Dissecting the code in the question
The original version of the code in the question was (line numbers added for convenience):
1 open(IN,"<", "file1.txt") || die "cannot open input file:$!";
2 while (<IN>) {
3 $line = $line . $_;
4 if ($line =~ /^\#/) {
5 <IN>;
6 #next;
7 my $line = $line;
8 }
9 }
10 print "$line";
Discussion of each line:
1. OK, though it doesn't use a lexical file handle or report which file could not be opened.
2. OK.
3. Premature and misguided. This appends the current line to the variable $line before any analysis is done. If it were desirable, it could be written $line .= $_;
4. Suggests that the correct description for the desired output is not 'the second lines' but 'the line after a line starting with #'. Note that since there is no multi-line modifier on the regex, this will only ever match at the very start of the variable $line. Because of the premature concatenation, it will match on every iteration (because the first line of data starts with #), executing the code in lines 5-8.
5. Reads another line and discards it (a bare <IN>; outside a while condition does not assign to $_). It doesn't test for EOF, but that's harmless.
6. Comment line; no significance except to suggest some confusion.
7. my $line = $line; is a self-assignment to a new variable hiding the outer $line...mainly, this is weird and to a lesser extent it is a no-op. You are not using use strict; and use warnings; because you would have warnings if you did. Perl experts use use strict; and use warnings; to make sure they haven't made silly mistakes; novices should use them for the same reason.
8. Of itself, OK. However, the code in the condition has not really done very much. It skips the second line in the file; it will later skip the fourth, the sixth, the eighth, etc.
9. OK.
10. OK, but...if you're only interested in printing the lines after the line starting #, or only interested in printing the line numbers 4N+2 for integral N, then there is no need to build up the entire string in memory before printing each line. It will be simpler to print each line that needs printing as it is found.
Related
I have the following code in a file perl_script.pl:
while (my $line = <>) {
chomp $line;
# etc.
}
I call the script with more than 1 file e.g.
perl perl_script.pl file1.txt file2.txt
Is there a way to know when $line starts being read from file2.txt, etc.?
The $ARGV variable
Contains the name of the current file when reading from <>
and you can save the name and test on every line to see if it changed, updating when it does.
If it is really just about getting to a specific file, as the question seems to say, then it's easier, since you can also use @ARGV, which contains the command-line arguments, to test directly for the needed name.
One other option is to use eof (the form without parentheses!) to test for end of file, so you'll know that the next file is coming in the next iteration; you'll need a flag of some sort as well.
A variation on this is to explicitly close the filehandle at the end of each file so that $. gets reset for each new file, which normally doesn't happen for <>; then $. == 1 is the first line of a newly opened file:
use feature 'say';
while (<>) {
if ($. == 1) { say "new file: $ARGV" }
}
continue {
close ARGV if eof;
}
A useful trick which is documented in perldoc -f eof is the } continue { close ARGV if eof } idiom on a while (<>) loop. This causes $. (input line number) to be reset between files of the ARGV iteration, meaning that it will always be 1 on the first line of a given file.
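The eof-plus-flag approach mentioned above can be sketched as follows. The two scratch files and their contents are hypothetical stand-ins for file1.txt and file2.txt, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two scratch files stand in for file1.txt and file2.txt (hypothetical data).
my @files;
for my $content ("a1\na2\n", "b1\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "close failed: $!";
    push @files, $name;
}

my @first_lines;     # first line of each file, in order
my $new_file = 1;    # flag: the next line read starts a new file
{
    local @ARGV = @files;
    while (my $line = <>) {
        if ($new_file) {
            push @first_lines, $line;
            $new_file = 0;
        }
        # eof without parentheses tests the end of the current file only
        $new_file = 1 if eof;
    }
}
print @first_lines;
```

The flag is needed because eof is detected on the last line of the old file, one iteration before the first line of the new one arrives.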
There's the eof trick, but good luck explaining that to people. I usually find that I want to do something with the old filename too.
Depending on what you want to do, you can track the filename you're
working on so you can recognize when you change to a new file. That way
you know both names at the same time:
use v5.10;
my %line_count;
my $current_file = $ARGV[0];
while( <> ) {
if( $ARGV ne $current_file ) {
say "Change of file from $current_file to $ARGV";
$current_file = $ARGV;
}
$line_count{$ARGV}++
}
use Data::Dumper;
say Dumper( \%line_count );
Now you see when the file changes, and you can use $ARGV
Change of file from cache-filler.pl to common.pl
Change of file from common.pl to wc.pl
Change of file from wc.pl to wordpress_posts.pl
$VAR1 = {
'cache-filler.pl' => 102,
'common.pl' => 13,
'wordpress_posts.pl' => 214,
'wc.pl' => 15
};
Depending what I'm doing, I might not let the diamond operator do all
the work. This gives me a lot more control over what's happening and
how I can respond to things:
foreach my $arg ( @ARGV ) {
next unless open my $fh, '<', $arg;
while( <$fh> ) {
...
}
}
I want to split parts of a file. Here is what the start of the file looks like (it continues in same way):
Location Strand Length PID Gene
1..822 + 273 292571599 CDS001
906..1298 + 130 292571600 trxA
I want to split the Location column and subtract the start from the end (822-1), do the same for every row, and add the results together. So for these two rows the value would be: (822-1)+(1298-906) = 1213
How?
My code right now (I don't get any output at all in the terminal; it just continues to process forever):
use warnings;
use strict;
my $infile = $ARGV[0]; # Reading infile argument
open my $IN, '<', $infile or die "Could not open $infile: $!, $?";
my $line2 = <$IN>;
my $coding = 0; # Initialize coding variable
while(my $line = $line2){ # reading the file line by line
# TODO Use split and do the calculations
my @row = split(/\.\./, $line);
my @row2 = split(/\D/, $row[1]);
$coding += $row2[0]- $row[0];
}
print "total amount of protein coding DNA: $coding\n";
So what I get from my code if I put:
print "$coding \n";
at the end of the while loop just to test is:
821
1642
And so the first number is correct (822-1) but the next number doesn't make any sense to me, it should be (1298-906). What I want in the end outside the loop:
print "total amount of protein coding DNA: $coding\n";
is the sum of all the subtractions of every line i.e. 1213. But I don't get anything, just a terminal that works on forever.
As a one-liner:
perl -nE '$c += $2 - $1 if /^(\d+)\.\.(\d+)/; END { say $c }' input.txt
(Extracting the important part of that and putting it into your actual script should be easy to figure out).
Explicitly opening the file makes your code more complicated than it needs to be. Perl will automatically open any files passed on the command line and allow you to read from them using the empty file input operator, <>. So your code becomes as simple as this:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $total;
while (<>) {
my ($min, $max) = /(\d+)\.\.(\d+)/;
next unless $min and $max;
$total += $max - $min;
}
say $total;
If this code is in a file called adder and your input data is in add.dat, then you run it like this:
$ adder add.dat
1213
Update: And, to explain where you were going wrong...
You only ever read a single line from your file:
my $line2 = <$IN>;
And then you continually assign that same value to another variable:
while(my $line = $line2){ # reading the file line by line
The comment in this line is wrong. I'm not sure where you got that line from.
To fix your code, just remove the my $line2 = <$IN> line and replace your loop with:
while (my $line = <$IN>) {
# your code here
}
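Putting the fix together, a complete version might look like the sketch below. It swaps the two split calls for a single regex match (a substitution on my part, not the asker's method), and the in-memory sample and the name coding_length are assumptions made so the example is self-contained.

```perl
use strict;
use warnings;

# Sum (end - start) over every "start..end" entry in the Location column.
sub coding_length {
    my ($fh) = @_;
    my $coding = 0;
    while (my $line = <$fh>) {    # reads a fresh line on every pass
        # Lines that do not start with "start..end" (the header row) are skipped.
        next unless $line =~ /^(\d+)\.\.(\d+)/;
        $coding += $2 - $1;
    }
    return $coding;
}

# Hypothetical sample input: the header and two rows shown in the question.
my $sample = <<'END';
Location Strand Length PID Gene
1..822 + 273 292571599 CDS001
906..1298 + 130 292571600 trxA
END

open my $in, '<', \$sample or die "Could not open sample data: $!";
print "total amount of protein coding DNA: ", coding_length($in), "\n";
```

For the two sample rows this adds 821 and 392, giving the 1213 the question expects.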
I need to write a perl script to search for a keyword in a large file and then print all the lines containing the keyword plus the line below each keyword to a new file.
In the original file, there are multiple lines (the exact number varies) below each keyword-containing line. I already have a script that makes the variable number of lines to equal 1. I need this functionality to remain in the script and build upon it.
I found out that I could use grep to extract the lines, but this requires running the script I already have first and then using the grep command. I really need these functions combined into one.
Any help is much appreciated!
Here is the script I have so far:
use strict;
open (FILE, $ARGV[0]) or die ("Cannot open file");
my $name;
my $sequence;
while (my $line = <FILE>) {
chomp ($line);
if (substr ($line, 0, 1) eq ">") {
if ($sequence ne "") {
printf ("%s\n%s\n", $name, $sequence);
}
$name = $line;
$sequence = "";
} else {
$sequence .= $line;
}
}
if ($sequence ne "") {
printf ("%s\n%s\n", $name, $sequence);
}
And an example of the original file:
sp|Q6GZX4|001R_FRG3G Putative transcription factor 001R OS=Frog virus 3 (isolate Goorha) GN=FV3-001R PE=4 SV=1
MAFSAEDVLKEYDRRRRMEALLLSLYYPNDRKLLDYKEWSPPRVQVECPKAPVEWNNPPSEKGLIVGHFSGIKYKGEKAQASEVDVNKMCCWVSKFKDAMRRYQGIQTCKIPGKVLSDLDAKIKAYNLTVEGVEGFVRYSRVTKQHVAAFLKELRHSKQYENVNLIHYILTDKRVDIQHLEKDLVKDFKALVESAHRMRQGHMINVKYILYQLLKKHGHGPDGPDILTVKTGSKGVLYDDSFRKIYTDLGW
In this example, the keyword would be "FRG3G". The keyword is always in the same place, the characters before it vary, but the structure is the same.
If you have only 1 line to print after the keyword line, you can just remember if you found the keyword and then print the line like this:
my $matched = 0;
while (<FILE>) {
print if ($matched);
if (m/$keyword/) {
print;
$matched = 1;
}
else {
$matched = 0;
}
}
If you can detect the end of the lines you want to print somehow, you can adjust the code above instead of just hard-coding it to print 1 line.
Redirect to a new file as needed.
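One way to combine both steps in a single pass is sketched below: sequence lines are joined as in the question's script, and a record is kept only if its header contains the keyword. It assumes header lines start with ">", as the question's substr test implies; the sample records and the name filter_records are made up for illustration.

```perl
use strict;
use warnings;

# Join each record's sequence lines into one line and keep only the
# records whose header line contains $keyword.
sub filter_records {
    my ($fh, $keyword) = @_;
    my ($name, $sequence, @out) = ('', '');
    my $flush = sub {
        push @out, "$name\n$sequence\n"
            if $sequence ne '' && index($name, $keyword) >= 0;
    };
    while (my $line = <$fh>) {
        chomp $line;
        if (substr($line, 0, 1) eq '>') {   # header line starts a new record
            $flush->();
            ($name, $sequence) = ($line, '');
        }
        else {
            $sequence .= $line;             # collapse multi-line sequences
        }
    }
    $flush->();                             # do not forget the last record
    return @out;
}

# Hypothetical two-record sample; only the first header contains the keyword.
my $sample = ">sp|A|001R_FRG3G first\nMAFS\nAEDV\n>sp|B|002L_OTHER second\nMSII\n";
open my $in, '<', \$sample or die "Cannot open sample data: $!";
print filter_records($in, 'FRG3G');
```

Using index rather than a regex means the keyword is matched literally, so characters such as | in a keyword need no escaping.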
I'm iterating through a file, and after some condition I have to step back by a line.
When a line of the file matches the regexp, the second while loop is entered and keeps reading the file until its condition no longer holds; after that, my code has to STEP BACK by 1 line!
while(my $line = <FL>){
if($line =~ /some regexp/){
while($line =~ /^\+/){
$line = <FL>; #Step into next line
}
seek(FL, -length($line), 1); #This should get me back the previous line
#Some tasks with previous line
}
}
Actually seek should work, but it doesn't; it returns the same line to me... What is the problem?
When you read from a filehandle, it has already advanced to the next line. Therefore if you go back the length of the current line, all you're doing is setting up to read the line over again.
Also, relating the length of a line to its length on disk assumes the encoding is :raw instead of :crlf or some other format. This is a big assumption.
What you need are state variables to keep track of your past values. There is no need to literally roll back a file handle.
The following is a stub of what you might be aiming to do:
use strict;
use warnings;
my @buffer;
while (<DATA>) {
if (my $range = /some regexp/ ... !/^\+/) {
if ($range =~ /E/) { # Last Line of range
print @buffer;
}
}
# Save a buffer of last 4 lines
push @buffer, $_;
shift @buffer if @buffer > 4;
}
__DATA__
stuff
more stuff
some regexp
+ a line
+ another line
+ last line
break out
more stuff
ending stuff
Output:
some regexp
+ a line
+ another line
+ last line
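If the file handle really does have to be moved back, a byte-accurate alternative to subtracting length($line) is to record tell before each read and seek to that saved offset. The sketch below is only an illustration under that assumption; the function name last_plus_line and the in-memory sample are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Return the last "+" line of the block that follows a line matching $re,
# leaving the handle positioned at the first line after that block.
sub last_plus_line {
    my ($fh, $re) = @_;
    while (my $line = <$fh>) {
        next unless $line =~ $re;
        my ($prev, $pos);
        while (1) {
            $pos = tell $fh;          # byte offset before the next read
            my $next = <$fh>;
            last unless defined $next && $next =~ /^\+/;
            $prev = $next;            # remember the previous line in a variable
        }
        seek $fh, $pos, SEEK_SET;     # un-read the line that ended the block
        return $prev;
    }
    return;
}

# Hypothetical sample, modelled on the data used in the answer above.
my $sample = "stuff\nsome regexp\n+ a line\n+ last line\nbreak out\nmore stuff\n";
open my $in, '<', \$sample or die "cannot open sample data: $!";
print last_plus_line($in, qr/some regexp/);
print scalar <$in>;    # the line that ended the block is read again
```

Keeping the previous line in $prev is what makes stepping back unnecessary for the line itself; the seek is only there so the terminating line is not lost to the outer loop.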
What about something like: (as an alternative)
open(my $fh, '<', "$file") or die $!;#use three argument open
my $previous_line = q{}; #initially previous line would be empty
while(my $current_line = <$fh>){
chomp $current_line;
print"$current_line\n";
print"$previous_line\n";
#assign current line into previous line before it go to next line
$previous_line = $current_line;
}
close($fh);
I am trying to print the array, but the output contains only the last line of the array. The partial code is as follows.
open OUT, "> /myFile.txt"
or die "Couldn't open output file: $!";
foreach (@result) {
print OUT;
}
The output is
List Z
which is the last line, but when I do print "@result" the output is
List A
List B
List C and so on...
I am a little bit confused about why the results are different for the same array.
Working on a hunch, I tried adding \r to the end of your input lines, and sure enough, it creates the illusion that only the last line of your input is printed to the file. Here's the code to test it:
use strict;
use warnings;
my @result = map "$_\r", 'A' .. 'Z';
open (OUT, "> myFile.txt") or die("Couldn't open output file: $!");
foreach (@result) {
print OUT ;
}
What you have probably done is performed chomp on lines from a file from a different operating system (DOS, Windows), which does not strip the \r line endings. Hence, when the lines are printed, the lines overwrite each other.
If this is what is wrong, the solution is to use the dos2unix tool to fix your files, or to use:
s/\s+\z//;
to strip your newlines.
You may inspect your input by using the Data::Dumper module, using the option Useqq, e.g.:
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper \@result;
If these whitespace characters are in your output, they will then be visible.
the problem is here
open OUT, "> /myFile.txt"
this should be
open OUT, ">>", "/myfile.txt"
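With ">", the file is truncated every time the open statement runs, so if this open is executed more than once (for example, inside an enclosing loop or on repeated runs of the script), earlier output is lost. Note that the foreach (@result) loop itself does not reopen the file.
If what you intend is to append, use ">>".
">>" appends, ">" overwrites.
Also take note of how I broke ">> /myfile.txt" into ">>", "/myfile.txt".
The three-argument form is both more secure and more robust, because the file name can never be misread as part of the mode.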
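The difference between the two modes can be checked with a small sketch. The scratch file from File::Temp, the sample lines, and the helper names write_lines and slurp are all assumptions made for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my @result = ("List A\n", "List B\n", "List C\n");   # hypothetical data
my (undef, $path) = tempfile(UNLINK => 1);           # scratch file for the demo

# Open $file in the given mode, print all lines, and close it again.
sub write_lines {
    my ($mode, $file, @lines) = @_;
    open my $out, $mode, $file or die "Couldn't open $file: $!";
    print {$out} $_ for @lines;
    close $out or die "Couldn't close $file: $!";
}

# Return the whole file as one string.
sub slurp {
    my ($file) = @_;
    open my $in, '<', $file or die "Couldn't open $file: $!";
    local $/;
    return scalar <$in>;
}

write_lines('>',  $path, @result);     # one open, three prints: all lines kept
write_lines('>>', $path, "List D\n");  # appends after the existing lines
print slurp($path);
write_lines('>',  $path, "List Z\n");  # reopening with '>' discards the rest
print slurp($path);
```

The first slurp shows four lines and the second shows only List Z, since truncation happens at open time rather than on each print.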
Foreign line terminators from any platform can easily be fixed by clearing whitespace from the end of the line and adding it back when printing it
Like this
open my $out, '>', '/myFile.txt' or die "Couldn't open output file: $!";
foreach (@result) {
s/\s+$//;
print $out "$_\n";
}
or
foreach my $line (@result) {
$line =~ s/\s+$//;
print $out "$line\n";
}