Perl: Looping over input lines with an index-based approach

This is a beginner best-practice question about Perl. I'm new to the language. The question is:
If I want to process the output lines from a program, how can I format THE FIRST LINE in a special way?
I can think of two possibilities:
1) A flag variable that is set once the loop has executed the first time. But it would be evaluated on every iteration. BAD solution.
2) An index-based loop (like a "for"). Then I would start the loop at i=1. This solution is far better. The problem is HOW CAN I DO IT?
So far I have only found code for looping with the while ( <> ) construct.
Here is my code:
$command_string = "par-format 70j p0 s0 < " . $ARGV[0] . "|\n";
open DATA, $command_string or die "Couldn't execute program: $!";
print "\t <div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|-- <strong>Description</strong></div>\n";
while ( defined( my $line = <DATA> ) ) {
    # print "$line\n";
    print "\t <div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;-- " . $line . "</div>\n";
}
close DATA;
Please also don't hesitate in correcting any code in here, this is my first perl poem.

You can always use $. or the English name $INPUT_LINE_NUMBER to control the logic in your loop with:
while (my $line = <>) {
    if ($. == 1) {
        # do cool stuff here
    }
    # do normal stuff here
}

To handle the first line differently, you could just put
$line = <DATA>;
above your loop.
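With proper checking for read problems (empty file, etc.) this should be
if (defined(my $first = <DATA>)) { special things... }
while (my $line = <DATA>) { regular things... }
The defined() call matters in the if: a last line consisting of just "0" with no trailing newline has a false truth value. In the while condition you don't need it, because Perl adds the defined() test implicitly when a line read is assigned to a scalar there.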

From a 'best practices' perspective there is much wrong with that code sample:
open DATA, $command_string or die "Couldn't execute program: $!";
Security hole, please exploit me.
DATA is a magical value that points to a __DATA__ section at the end of the current file.
You should use
open my $fh
Which uses a lexical variable for a file handle instead of a global.
You should use 3 arg open, ie:
open my $fh, '<' , $filename
open my $fh, '-|' , $command
open my $fh, '-|' , $command, @args
Sadly, I have yet to work out how 3-arg open works with dual pipes.
There's this IPC::Open2 thing, but I haven't worked out how
to use that effectively yet. Suggestions welcome.
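Putting the suggestions from these answers together, here is a minimal sketch, assuming the goal is to wrap each line in a div with a different prefix on the first line. The helper name format_lines and the simplified prefixes are made up for illustration, and an in-memory handle stands in for the pipe. Any lexical handle should work the same way, including one opened with the list-form '-|' open shown above.

```perl
use strict;
use warnings;

# Format every line read from an already-open handle as an HTML <div>.
# $. holds the line number of the handle that was read most recently,
# so the first line can be told apart without a separate flag variable.
sub format_lines {
    my ($fh) = @_;
    my @out;
    while ( my $line = <$fh> ) {
        chomp $line;
        my $prefix = $. == 1 ? '|-- ' : '|   -- ';
        push @out, "<div>$prefix$line</div>";
    }
    return @out;
}

# Demo input: an in-memory file instead of a real command's output.
open my $fh, '<', \"first\nsecond\nthird\n"
    or die "Cannot open in-memory file: $!";
print "$_\n" for format_lines($fh);
close $fh;
```

The comparison against $. is still evaluated on every iteration, just like a flag would be, but the cost is negligible next to reading the line itself.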


How to check for a string using Perl given

I'm trying to replace a particular line in a file. I can get my program to run, but it doesn't actually do the replacing that I want it to.
Here is my sample file:
test line 1
test line 2
line to be overwritten
test line 3
Here is the code that I have:
my $origFile = $file_path . "junk\.file";
my $newFile = $file_path . "junk\.file\.backup";
# system command to make a backup of the file
system "mv $origFile $newFile";
#opens the files
open( my $INFILE, $newFile ) || die "Unable to read $newFile\n";
open( my $OUTFILE, '>' . $origFile ) || die "Unable to create $origFile\n";
# While loop to read in the file line by line
while ( <$INFILE> ) {
    given ($_) {
        when ("line to be overwritten") {
            print $OUTFILE "line has been overwritten\n";
        }
        default {
            print $OUTFILE $_;
        }
    }
}
I've tried to change the when statements several different ways to no avail:
when ($_ eq "line to be overwritten")
when ($_ == "line to be overwritten")
when ($_ cmp "line to be overwritten")
But those only generate errors. Anyone know what I'm doing wrong here?
As highlighted in a comment on the original question, given/when is an experimental feature of perl. I would personally recommend using if/else in a loop, and then either use string equality or a regex to match the line(s) you want to replace. A quick example:
use strict;
use warnings;
while (my $line = <DATA>) {
    if ( $line =~ /line to be overwritten/ ) {
        print "Overwritten\n";
    } else {
        print $line;
    }
}
__DATA__
test line 1
test line 2
line to be overwritten
test line 3
This will give the output:
test line 1
test line 2
Overwritten
test line 3
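You could also use string equality if you aren't confident in your regex, or if the string is guaranteed to be the same. Remember that each line read from the file still ends in a newline, so either chomp it first or include the newline in the comparison:
if ($line eq "line to be overwritten\n") {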
On your initial open, it is recommended to use the 3 argument version of open to save from unexpected issues:
open(my $INFILE, '<', $newFile) || die "Unable to read $newFile\n";
strict & warnings
Also, it is recommended to use strict and warnings in your code file, as seen in my example above - this will save you from accidental mistakes like trying to use a variable which has not been declared, and syntax errors which may give you head-scratching results!
Experimental Features
Experimental features in Perl come with no guarantee that backwards compatibility will be maintained when a new release of Perl comes out. If you are using the same version of Perl everywhere, your code should keep working, but things may break if you update to another major version.
Answered here as I don't have the reputation to answer in the comments...
You seem to be making it way more complicated than it needs to be - a simple regex to check each line and act accordingly should do the job.
if ( /^line to be overwritten$/ ) {
    print $OUTFILE "line has been overwritten\n";
} else {
    print $OUTFILE $_;    # $_ still ends in its own newline
}
One way to do it is to use Tie::File module. It allows to replace data right in the file. You can make the backup same way you are currently doing, before changing the original file.
use strict;
use warnings;
use Tie::File;
my $file = 'test.txt';
tie my @textFile, 'Tie::File', $file, recsep => "\n" or die $!;
s/line to be overwritten/line has been overwritten/ for @textFile;
untie @textFile;
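A likely reason the original when ("line to be overwritten") never matched is that every line read from the file still carries its trailing newline, so an exact string comparison fails. The following is a minimal sketch of an exact-match version that compares a chomped copy. The helper name rewrite_line is hypothetical and not part of the original code.

```perl
use strict;
use warnings;

# Return the text to write out for one input line.
# Hypothetical helper: name and signature are made up for illustration.
sub rewrite_line {
    my ($line) = @_;
    chomp( my $text = $line );    # compare without the trailing newline
    return $text eq 'line to be overwritten'
        ? "line has been overwritten\n"
        : $line;                  # untouched lines keep their newline
}

print rewrite_line($_)
    for "test line 1\n", "line to be overwritten\n", "test line 3\n";
```

Unlike the unanchored regex, this only replaces a line whose whole content equals the target string.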

How to get rid of the syntax error in this code?

Below I've provided just a chunk of a huge Perl script I am trying to write. I am getting a syntax error in the else statement, but the console only says "syntax error" and doesn't clearly identify the problem. I am trying to create a file named file_no_$i.txt, copy the contents of t_code.txt into it, and then find and replace a string in that file with some selected keys of the hash %defines_2.
open ( my $pointer, "<", "t_code.txt" ) or die $!;
my $out_pointer;
for (my $i=0 ; $i <=$#match ; $i++) {
    for (my $j=0; $j <= $#match ; $j++) {
        if ($match[$i]=~$match[$j]) {
        }
        else {
            my $file_name = "file_no_$i.txt";
            open $out_pointer, ">" , $file_name or die "Can't open the output file!";
            copy("$file_name","t_code.txt") or die "Copy failed: $!";
            my @lin = <$out_pointer>;
            foreach $_(@lin) {
                $_ =~ s/UART90_BASE_ADDRESS/$defines_2{ $_ = grep{/$match[$i]/} (keys %defines_2)};
            }
        }
    }
}
You cannot use / unquoted inside a s/// construct. Instead of backslashes, you can use different delimiters:
s#UART90_BASE_ADDRESS#$defines_2{ $_ = grep{/$match[$i]/} (keys %defines_2)}#;
It fixes the syntax error, but I fear it still won't do what you want. Without data, it's hard to test, though.
What I think you're doing is editing a number of text files whose names look like file_no_1.txt etc. You're doing that by copying the current file to t_code.txt and then reading that file line by line, editing as required, and writing the lines back to the original text file.
The problem with that approach is that the file will be copied and rewritten many times, and it would be better to read the whole file into an array, make all the edits, and then write them back in one operation. That would be fine unless the file is enormous — say, several GB.
Here's some code that implements that approach. You see that $file_name is defined and @lines is filled outside the inner loop. The innermost loop modifies the elements of @lines and, outside that loop again, @lines is written back to the original text file.
I couldn't fathom a couple of things about your code.
I'm not sure if you should be using =~ or if you intended a simple eq. The former does a contains test, and you had a problem in the past where you meant to check that the first string had the second at the end.
The grep call
grep{/$match[$i]/} (keys %defines_2)
worries me, as it can potentially return more than one key of the %defines_2 hash, in which case your own code will insert what is pretty much a random selection from the hash elements.
If your code is working then that's fine, but if not then I hope this helps you fix it. If you need more help on this chunk of code then you should include a small sample of the data so that we can better understand what is going on.
for my $i (0 .. $#match) {
    my $file_name = "file_no_$i.txt";
    my @lines = do {
        open my $in_fh, '<', 't_code.txt' or die $!;
        <$in_fh>;
    };
    for my $j (0 .. $#match) {
        next if $match[$i] =~ $match[$j];
        for ( @lines ) {
            my ($match) = grep { /$match[$i]/ } keys %defines_2;
            s/UART90_BASE_ADDRESS/$defines_2{$match}/;
        }
    }
    open my $out_fh, '>', $file_name or die qq{Can't open "$file_name" for output: $!};
    print $out_fh $_ for @lines;
    close $out_fh or die qq{Failed to close output file "$file_name": $!};
}
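One detail behind the concern about the grep call: in the question's code the result of grep is assigned to a scalar ($_ = grep ...), and in scalar context grep returns the number of matches rather than a key. The sketch below shows the difference, using a small made-up hash in place of %defines_2. The keys, values, and helper names are assumptions for illustration only.

```perl
use strict;
use warnings;

# Made-up data standing in for %defines_2 (keys and values are assumptions).
my %defines = (
    UART0_BASE => '0x1000',
    UART1_BASE => '0x2000',
    SPI_BASE   => '0x3000',
);

# Scalar context: grep returns how many keys matched, not a key.
sub count_matches {
    my ($pattern, $hash) = @_;
    my $count = grep { /$pattern/ } keys %$hash;
    return $count;
}

# List context: grep returns the matching keys themselves.
sub matching_keys {
    my ($pattern, $hash) = @_;
    my @keys = grep { /$pattern/ } keys %$hash;
    return sort @keys;    # sorted so the result does not depend on hash order
}

print "count: ", count_matches( 'UART', \%defines ), "\n";
print "keys:  ", join( ', ', matching_keys( 'UART', \%defines ) ), "\n";
```

Writing my ($match) = grep ..., with parentheses around the variable, keeps the assignment in list context and takes the first matching key. Which key comes first is still arbitrary when several keys match, because keys returns them in no particular order.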

Match file names in if condition of perl

I have files with names such as lin.txt and lin1.txt, along with other .txt files. I need to find only these files and print their contents one by one. I have the code below, but it's somehow not matching the files starting with lin*. What is the issue?
$te_dir= "/projects/xxx/";
opendir (DIR, $te_dir) or die $!;
while (my $file = readdir(DIR)) {
    if ($file=~/\.txt/) {
        #// Doing some tasks.
        if($file ~= 'lin*.txt') {
            open(LINFILE, $linfile) or die "Couldn't open file $file:$!";
            while(my $line = <LINFILE>) {
                print $line;
            }
            close LINFILE;
        }
    }
}
You are mixing globs (shell wildcards) with regular expressions. These are two different formalisms with different syntax and semantics. In regular expressions (which is what Perl matching uses), n* matches zero or more occurrences of the character n. You probably mean
if ($file =~ /lin.*\.txt/)
Notice also the syntax error in the operator. You correctly have =~ in the first conditional, but you misspelled it as ~= where you do this comparison. (Maybe it's just a transcription error; for me, this creates a clear syntax error, so the script would not run in the first place.)
As noted in @brianadams' answer, the proper regular expression for this is
if ($file =~ /^lin.*\.txt$/)
with beginning of line ^ and end of line $ anchors to prevent e.g. feline.txt.html from matching. The default behavior of Perl's regular expressions is to find a match anywhere in the input string.
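To see the effect of the anchors, here is a small sketch that wraps the anchored pattern in a helper and tries it on a few names. The helper name is_lin_file and the sample names li.txt and feline.txt.html are only illustrations.

```perl
use strict;
use warnings;

# True for names such as lin.txt or lin1.txt.
# ^ and $ anchor the pattern so it has to cover the whole name.
sub is_lin_file {
    my ($name) = @_;
    return $name =~ /^lin.*\.txt$/ ? 1 : 0;
}

for my $name (qw(lin.txt lin1.txt feline.txt.html li.txt)) {
    printf "%-16s %s\n", $name, is_lin_file($name) ? 'match' : 'no match';
}
```

Without the anchors, feline.txt.html would match as well, because lin.*\.txt occurs in the middle of that name.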
Here's a quick (and minimal) rewrite of your code that might help:
use strict;
use warnings;
my $te_dir = "/projects/xxx/";
opendir( my $dirh, $te_dir ) or die "Could not open '$te_dir': $!";
while ( my $file = readdir($dirh) ) {
    next unless $file =~ /\.txt$/;
    #// Doing some tasks.
    if ( $file =~ /^ lin \d* \.txt $/x ) {
        my $linfile = "$te_dir/$file";
        open( my $fh, '<', $linfile ) or die "Couldn't open file $linfile: $!";
        while ( my $line = <$fh> ) {
            print $line;
        }
        close $fh or die "Could not close $linfile: $!";
    }
}
First, note that we've put strict and warnings at the top of the code. That will tell you about all sorts of interesting issues, including misspelled variable names.
Next, we've switched to lexical handles (e.g., my $dirh instead of DIR). The "bareword" versions of the handles you're using (DIR and LINFILE) have been discouraged for a long time because they are effectively global constructs. Global data is generally bad because when it gets broken, it's awfully hard to tell what broke it, so we much, much prefer the lexical versions (the handles declared with the my builtin).
Also, the line where you build $linfile probably doesn't do what you're thinking. You're trying to smash together a directory and filename with a forward slash, but since you didn't use string interpolation, you're actually using division. Both your directory and filename will, in this numeric context, probably evaluate to zero, giving you a divide by zero error when you're trying to open a file!
However, if you're willing to use a CPAN module, you can make this even easier:
use strict;
use warnings;
use File::Find::Rule;
my $te_dir = "/projects/xxx/";
my @files = File::Find::Rule->file->name('lin*.txt')->in($te_dir);
foreach my $linfile (@files) {
    #// Doing some tasks.
    open my $fh, '<', $linfile or die "Couldn't open file $linfile: $!";
    while ( my $line = <$fh> ) {
        print $line;
    }
}
No muss, no fuss. Get only the files you want in the first pass and already have the correct file names (note that I didn't close the filehandle because it will close automatically when $fh goes out of scope at the end of the foreach loop.)
To match files starting with lin
if ( $file =~ /^lin.*\.txt$/ )
Try changing your 2nd if condition from this,
if($file ~= 'lin*.txt')
to this,
if($file =~ /lin.*\.txt/)
Note the .* here: in a regex, n* on its own means "zero or more n", so /lin*\.txt/ would match li.txt but not lin1.txt. You could also try: if($file =~ /^lin.*\.txt/) , as already pointed out in other answers, but you'll need to make sure that the file names stored in the $file variable contain only the file name and not the entire path as well.

I want to replace a sequence name in fasta file with another name

I have one FASTA file and one text file. The FASTA file contains sequences in FASTA format, and the text file contains gene names. I want to replace the name of each sequence in the FASTA file (the part after the '>' sign) with the corresponding gene name from the text file.
I am new to Perl. I have written a script, but I don't know why it's not working. Can anyone help me with that, please?
following is my script:
print"Enter annotated file...";
print"Enter sequence file...";
open(FILE1,$f1) || die"Can't open $f1";
open(FILE2,$f2) || die"Can't open $f2";
print @seqfile[$j];
my files looks like following:
pool75_contig_389 ubiquitin ligase e3a
pool75_contig_704 tumor susceptibility
pool75_contig_1977 serine threonine-protein phosphatase 4 catalytic subunit
pool75_contig_3064 bardet-biedl syndrome 2 protein P
pool75_contig_2499 succinyl- ligase
Consider using Bio::SeqIO to parse your Fasta dataset, instead of doing it yourself. Bio::SeqIO lives for this task, and is well developed for it. Additionally, if you're in bioinformatics, it would serve you well to get to know Bio::SeqIO. Given this, consider the following:
use strict;
use warnings;
use Bio::SeqIO;
open my $fh, '<', 'annot.txt' or die $!;
my %annot = map { /(\S+)\s+(.+)/; $1 => $2 } <$fh>;
close $fh;
my $in = Bio::SeqIO->new( -file => 'goat300.fasta', -format => 'Fasta' );
while ( my $seq = $in->next_seq() ) {
    my $seqID = $annot{ $seq->id } // $seq->id;
    print "$seqID\n" . $seq->seq . "\n";
}
Output on your datasets:
tumor susceptibility
ubiquitin ligase e3a
serine threonine-protein phosphatase 4 catalytic subunit
bardet-biedl syndrome 2 protein P
succinyl- ligase
The hash %annot is initialized by reading and capturing the contents of your annot.txt data. A Bio::SeqIO object is created using your goat300.fasta file data. The while loop iterates through your FASTA sequences. The variable $seqID either takes the associated value of the key in the %annot hash or keeps the current sequence ID (the // operator means defined-or, which ensures $seqID will be defined). Finally, the FASTA record is printed.
Hope this helps!
There were a lot of warnings in your code, and your approach was inefficient. Let me first show you a working Perl program. I'll explain afterwards.
use strict;
use warnings;
# Read the annotations file
print"Enter annotated file...\n";
# my $f1 = <STDIN>;
my $f1 = 'annot.txt';
open(my $fh_annotations, '<', $f1) or die "Can't open $f1";
my @annotfile = <$fh_annotations>;
close $fh_annotations;
# Read the sequence file
print"Enter sequence file...\n";
# my $f2 = <STDIN>;
my $f2 = 'goat300.fasta';
open(my $fh_genes, '<', $f2) or die "Can't open $f2";
my @seqfile = <$fh_genes>;
close $fh_genes;
# Process the annotations data
my %names; # this hash is going to hold the names
foreach my $line (@annotfile) {
    chomp $line; # remove newline
    my @fields = split /\t/, $line; # split into array
    $names{$fields[0]} = $fields[1]; # save in the hash as key->value pair
}
# Process the sequence data
foreach my $line (@seqfile) {
    # Look at each line
    if ($line =~ m/>(.+)$/) {
        # If there is a heading there, remember it...
        if (exists $names{$1}) {
            # ... check if we know a name for it and replace it in the line
            $line =~ s/($1)/$names{$1}/;
        }
    }
    # output the line (this would be done to another filehandle)
    print $line;
}
This reads both files and saves them in memory, just like yours did. But instead of trying to build two arrays for the names, I went with a hash, which is a key/value pair. Think of it like an array with names instead of numbers and no particular sorting.
Once these names are set up, I can process the sequence file. I simply look at each line and check if there is a heading there, by looking for the > sign. If it's there (the text after it goes into $1 because of the parentheses), I check whether we have a hash entry (with exists) in our %names hash. If we do, we can replace the heading with the proper name.
After that, we could write it out to a new file. I'm just printing it.
I've used a few other techniques. Unfortunately the literature people get in a BioPerl context is quite outdated. Please take this advice, it will make your life easier.
Always use strict and warnings. They will tell you about problems with your code.
Always declare your variables with my. This is not like other languages, where you need to set up a variable at the top of your program. You can declare it where you need it. The vars only live in a certain scope, which means between the nearest enclosing { and } brackets, or block.
Use three-argument open and lexical file handles for security.
Perl offers foreach as an alternative to the C for loop. In this case, it made things a lot easier.
One more thing about this program: While this example data was rather short, I believe your actual data might be a lot larger. Consider processing the sequence file while you read it so you do not run out of memory. There's no need to save all the lines, unless you want to do something else with them.
open my $fh_out, '>', $filename_out or die $!;
open my $fh_in, '<', $filename_in or die $!;
while (my $line = <$fh_in>) {
    # do stuff with the line, like your regex
    print $fh_out $line;
}
close $fh_in;
close $fh_out;
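As a rough sketch of that streaming approach applied to the header replacement, the helper below reads one line at a time and only keeps the name hash in memory. The name rename_headers, the short sequences, and the ID unknown_contig are made up for illustration. It also assumes the sequence ID is the first whitespace-free word after the > sign and replaces the whole header line with the annotation.

```perl
use strict;
use warnings;

# Copy FASTA lines from $in to $out, replacing a header with its
# annotation when %$names has an entry for the sequence ID.
# Hypothetical helper: name and signature are made up for illustration.
sub rename_headers {
    my ( $in, $out, $names ) = @_;
    while ( my $line = <$in> ) {
        if ( $line =~ /^>(\S+)/ && exists $names->{$1} ) {
            $line = ">$names->{$1}\n";
        }
        print {$out} $line;
    }
    return;
}

# Demo with in-memory data; the sequences here are invented.
my %names = ( pool75_contig_389 => 'ubiquitin ligase e3a' );
my $fasta = ">pool75_contig_389\nACGT\n>unknown_contig\nTTGA\n";
open my $in, '<', \$fasta or die "Cannot open in-memory file: $!";
rename_headers( $in, \*STDOUT, \%names );
close $in;
```

Headers without a known annotation pass through unchanged, which matches the behaviour of the exists check in the program above.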

Reading file line by line iteration issue

I have the following simple piece of code (identified as the problem piece of code and extracted from a much larger program).
Is it me, or can you see an obvious error in this code that is stopping it from matching against $variable and printing $found when it definitely should be doing?
Nothing is printed when I try to print $variable, and there are definitely matching lines in the file I am using.
The code:
if (defined $var) {
    open (MESSAGES, "<$messages") or die $!;
    my $theText = $mech->content( format => 'text' );
    print "$theText\n";
    foreach my $variable (<MESSAGES>) {
        chomp ($variable);
        print "$variable\n";
        if ($theText =~ m/$variable/) {
            print "FOUND\n";
        }
    }
}
I have located this as the point at which the error is occurring but cannot understand why?
There may be something I am totally overlooking as its very late?
Update I have since realised that I misread your question and this probably doesn't solve the problem. However the points are valid so I am leaving them here.
You probably have regular expression metacharacters in $variable. The line
if ($theText =~ m/$variable/) { ... }
should be
if ($theText =~ m/\Q$variable/) { ... }
to escape any that there are.
But are you sure you don't just want eq?
In addition, you should read from the file using
while (my $variable = <MESSAGES>) { ... }
as a for loop will unnecessarily read the entire file into memory. And please use a better name than $variable.
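To illustrate what \Q changes, here is a small sketch with a made-up text and search string. The helper name contains_literal is hypothetical.

```perl
use strict;
use warnings;

# True if $needle occurs in $text as a literal substring.
# \Q...\E quotes regex metacharacters in the interpolated variable.
sub contains_literal {
    my ( $text, $needle ) = @_;
    return $text =~ /\Q$needle\E/ ? 1 : 0;
}

my $text = 'cost: 3x50 total';

# With \Q the dot is literal, so "3.50" is not found in "3x50".
print contains_literal( $text, '3.50' ) ? "found\n" : "not found\n";

# Without quoting, the dot matches any character, including the "x".
print $text =~ /3.50/ ? "found\n" : "not found\n";
```

For a plain substring test, index($text, $needle) >= 0 gives the same answer without involving the regex engine at all.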
This works for me.. Am I missing the question at hand? You're just trying to match "$theText" to anything on each line in the file right?
use warnings;
use strict;
my $fh;
my $filename = $ARGV[0] or die "$0 filename\n";
open $fh, "<", $filename or die "Cannot open $filename: $!";
my $match_text = "whatever";
my $matched = '';
# I would use a while loop, out of habit here
#while(my $line = <$fh>) {
foreach my $line (<$fh>) {
    $matched =
        $line =~ m/$match_text/ ? "Matched" : "Not matched";
    print $matched . ": " . $line;
}
close $fh;
./ testfile
Not matched: this is some textfile
Matched: with a bunch of lines or whatever and
Not matched: whatnot....
Edit: Ah, I see.. Why don't you try printing before and after the "chomp()" and see what you get? That shouldn't be the issue, but it doesn't hurt to test each case..