I'm trying to write a script that searches the BLAST output against a previously-generated file that gives the genomic positions of each GI number. However, I get three syntax errors related to closing IF statements. Being a novice at Perl, I am at a loss as to how to fix this. Can anyone help me? I have copied the code and labeled the offending closing braces. I did do a quick check to make sure all delimiters are balanced.
#!/usr/bin/perl -w
#decided to have input file entered in command line
#call program followed by genome name.
#the program assumes that a file with the extensions ptt and faa exist in the same dirctory.
#####INPUT Name of multiple seq file containing ORF of genome, open file and assign IN filehandle #############
unless(#ARGV==2) {die "usage: perl nucnums.pl BLAST_output_filename query.ref subject.ref\n\nSubject is the database you made with FormatDB or MakeBlastDB.\n\nQuery is the other file";}
$blastname=$ARGV[0];
$queryname=$ARGV[1];
$subjectname=$ARGV[2];
#nameparts=split(/\./, $blastname);
$ofilename="$blastname[0]"."pos";
open(INBLAST, "< $blastname") or die "cannot open $blastname:$!";
open(OUT, "> $ofilename") or die "cannot open $ofilename:$!";
$line=<INBLAST>;
print OUT $line;
while (defined ($line=<INBLAST>)){ # read through rest of table line by line
if ($line=/^g/){
#parts=split/\t/,$line;
#girefq=split/\|/,$parts[0];
$ginumq = ($girefq[1]);
$postartq = $parts[6];
#girefs=split/|/,$parts[1];
$ginums = ($girefs[1]);
$postarts = $parts[8];
open(INQUER, "< $queryname") or die "cannot open $queryname:$!";
open(INSUBJ, "< $subjectname") or die "cannot open $subjectname:$!";
SCOOP: while (defined ($locq=<INQUER>)){
#locsq=split/\t/,$locq
if $locsq[0] = $ginumq{
$posq = $locsq[1] + $postartq - 1;
} # <- Syntax error
}
close(INQUER);
SLOOP: while (defined ($locs=<INSUBJ>)){
#locsq=split/\t/,$locs
if $locss[0] = $ginums {
$poss = $locss[1] + $postarts - 1;
} # <- Syntax error
}
close(INSUBJ);
print "$ginumq at position $posq matches with $ginums at position $poss \n" or die "Failed to find a match, try changing the order of the REF files";
print OUT "$ginumq\t$posq\t$ginums\t$poss\t$parts[2]\t$parts[3]\t$parts[4]\t$parts[5]\t$parts[6]\t$parts[7]\t$parts[8]\t$parts[9]\t$parts[10]\t$parts[11]\t$parts[12]\n";
} # <- Syntax error
}
close(OUT);
close(INBLAST);
You need to change:
#locsq=split/\t/,$locs
if $locss[0] = $ginums {
to something like:
#locsq=split/\t/,$locs;
if ($locss[0] == $ginums) {
Same goes for $locsq[0]. If you are comparing strings instead of numbers, use eq instead of ==.
UPDATE: Thanks to Zaid for pointing out the missing semicolons.
#locsq=split/\t/,$locs
if $locss[0] = $ginums {
$poss = $locss[1] + $postarts - 1;
} # <- Syntax error
You get a misleading error message here because the first part of that is syntactically valid, though not what you intend. The missing semicolon on the first line means that the first part of the above is parsed as a postfix if (since the condition in a postfix if doesn't require parentheses):
#locsq=split/\t/,$locs if $locss[0] = $ginums
More than that, the part starting with $ginums{ is parsed as a reference to an element of a hash. (There's no %ginums hash, but that error would be reported after any syntax errors):
#locsq=split/\t/,$locs if $locss[0] = $ginums{$poss = $locss[1] + $postarts - 1;}
where $poss = $locss[1] + $postarts - 1; is taken as the hash key.
You only got a syntax error message because of the semicolon preceding the }. If you had omitted that semicolon, you probably would have gotten a complaint about %ginums being nonexistent (assumimg you have use strict; use warnings;; without that you might not get a warning at all).
This is one of those cases where a typo can transform a piece of code into something that's valid (or, in this case almost valid) but that doesn't mean anything like what you intended.
Add a semicolon to the end of the first line, and parenthesize the if condition.
It looks like you have written the entire program before doing any testing on it. That isn't the way to go. You should write small sections and test that they work in isolation before adding to them or assembling them into the complete program.
use warnings is preferable to putting -w on the #! line.
There are also a number of errors in this program that would have been highlighted if you had added use strict at the top of your program and declared all your variables using my. For instance, you write
$ofilename = "$blastname[0]" . "pos";
but there is no #blastname array. $ofilename will end up always containing just pos, but use strict wouldn't have let you run the program in this condition.
You also write (what I presume is supposed to be)
my #locsq = split /\t/, $locs;
if ($locss[0] = $ginums) {
$poss = $locss[1] + $postarts - 1;
}
and, again, there is no #locss array, so this if will never be executed unless $ginums is an empty string.
I recommend you at least take a look at this rewrite of your program, which uses commonly-accepted good practice, and I hope you will agree is more readable.
There is a problem with your final print statement, as print to the console always returns true so the die will never get executed, but I don't understand enough about what you are doing to fix it.
use strict;
use warnings;
unless (#ARGV == 2) {
die <<END;
usage: perl nucnums.pl BLAST_output_filename query.ref subject.ref
Subject is the database you made with FormatDB or MakeBlastDB.
Query is the other file
END
}
my ($blastname, $queryname, $subjectname) = #ARGV;
my #nameparts = split /\./, $blastname;
my $ofilename = "${blastname}pos";
open my $inblast, '<', $blastname or die "cannot open $blastname: $!";
open my $out, '>', $ofilename or die "cannot open $ofilename: $!";
print $out scalar <$inblast>;
while (my $line = <$inblast>) {
next unless $line =~ /^g/;
my #parts = split /\t/, $line;
my #girefq = split /\|/, $parts[0];
my $ginumq = $girefq[1];
my $postartq = $parts[6];
my #girefs = split /|/, $parts[1];
my $ginums = $girefs[1];
my $postarts = $parts[8];
my ($posq, $poss);
open my $inquer, '<', $queryname or die "cannot open $queryname: $!";
while (my $locq = <$inquer>) {
my #locsq = split /\t/, $locq;
if ($locsq[0] = $ginumq) {
$posq = $locsq[1] + $postartq - 1;
}
}
close($inquer);
open my $insubj, '<', $subjectname or die "cannot open $subjectname: $!";
while (my $locs = <$insubj>) {
my #locss = split /\t/, $locs;
if ($locss[0] = $ginums) {
$poss = $locss[1] + $postarts - 1;
}
}
close($insubj);
print "$ginumq at position $posq matches with $ginums at position $poss \n"
or die "Failed to find a match, try changing the order of the REF files";
print $out join("\t", $ginumq, $posq, $ginums, $poss, #parts[2..12]), "\n";
}
close $inblast;
close $out or die $!;
Related
I want to create output file that has values from file 1 and file 2.
The line from file 1:
chr1 Cufflinks exon 708356 708487 1000 - .
gene_id "CUFF.3"; transcript_id "CUFF.3.1"; exon_number "5"; FPKM
"3.1300591420"; frac "1.000000"; conf_lo "2.502470"; conf_hi
"3.757648"; cov "7.589085"; chr1Cufflinks exon 708356
708487 . - . gene_id "XLOC_001284"; transcript_id
"TCONS_00007667"; exon_number "7"; gene_name "LOC100288069"; oId
"CUFF.15.2"; nearest_ref "NR_033908"; class_code "j"; tss_id
"TSS2981";
The line from file 2:
CUFF.48557
chr4:160253850-160259462:160259621-160260265:160260507-160262715
The second column from this file is unique id (uniq_id).
I want to get output file in the following format:
transcript_id(CUFF_id) uniq_id gene_id(XLOC_ID) FPKM
My script takes XLOC_ID and FPKM values from first file and print them together with two columns from the second file.
#!/usr/bin/perl -w
use strict;
my $v_merge_gtf = shift #ARGV or die $!;
my $unique_gtf = shift #ARGV or die $!;
my %fpkm_hash;
my %xloc_hash;
open (FILE, "$v_merge_gtf") or die $!;
while (<FILE>) {
my $line = $_;
chomp $line;
if ($line =~ /[a-z]/) {
my #array = split("\t", $line);
if ($array[2] eq 'exon') {
my $id = $array[8];
if ($id =~ /transcript_id \"(CUFF\S+)/) {
$id = $1;
$id =~ s/\"//g;
$id =~ s/;//;
}
my $fpkm = $array[8];
if ($fpkm =~ /FPKM \"(\S+)/) {
$fpkm = $1;
$fpkm =~ s/\"//g;
$fpkm =~ s/;//;
}
my $xloc = $array[17];
if ($xloc =~ /gene_id \"(XLOC\S+)/) {
$xloc = $1;
$xloc =~ s/\"//g;
$xloc =~ s/;//;
}
$fpkm_hash{$id} = $fpkm;
$xloc_hash{$id} = $xloc;
}
}
}
close FILE;
open (FILE, "$unique_gtf") or die $!;
while (<FILE>) {
my $line = $_;
chomp $line;
if ($line =~ /[a-z]/) {
my #array = split("\t", $line);
my $id = $array[0];
my $uniq = $array[1];
print $id . "\t" . $uniq . "\t" . $xloc_hash{$id} . "\t" . $fpkm_hash{$id} . "\n";
}
}
close FILE;
I initialized hashes outside of the files, but I get the following error for each CUFF values:
CUFF.24093
chr17:3533641-3539345:3527526-3533498:3526786-3527341:3524707-3526632
Use of uninitialized value in concatenation (.) or string at ex_1.pl
line 55, line 9343.
Use of uninitialized value in concatenation (.) or string at ex_1.pl
line 55, line 9343.
How can I fix this issue?
Thank you!
I think the warning message is because the $id key, (CUFF.24093), you get on line 9343 of the second file isn't contained in the hashes you created in the first file.
Is it possible that an ID in the second file isn't contained in the first file? That seems to be the case here.
If so, and you just want to skip over this unknown ID, you could add a line to your program like:
my $id = $array[0];
my $uniq = $array[1];
next unless exists $fpkm_hash{$id}; # add this line
print $id . "\t" . $uniq . "\t" . $xloc_hash{$id} . "\t" . $fpkm_hash{$id} . "\n";
This will bypass the following print statement and go back to the top of the while loop and read in the next line and continue processing.
It depends on what action you want to take if you encounter an unknown ID.
Update: I thought I might make some observations/improvements to your code.
my $v_merge_gtf = shift #ARGV or die $!;
my $unique_gtf = shift #ARGV or die $!;
The error variable $! serves no purpose here (this is a fact I only recently discovered even after 14 years using Perl). $! is only set for system calls, (where you are involving the operating system).The most common are open and close for files, and opendir and closedir for directories. If an error occurs in opening/closing a file or a directory, $! will contain the error message. (See in my included code how I handled this - I created a message, $usage to be printed if the shift didn't succeed.
Instead of using 2 hashes to store the information, I used 1 hash,%data. The advantage is that it will use less memory, (because its only storing 1 set of keys instead of 2), Though, you could use the 2 if you like.
I used the recommended 3 argument (filehandle, mode, filename) form for opening the files. The 2 argument approach you used is outdated and less safe (for reasons I won't go into detail here). Also, the lexical filehandles I used, my $mrg and my $unique are the newer ways to create filehandles (instead of usingFILEfor your 2 opens).
You can directly assign to $linein your while loop like while (my $line = <FILE>) instead of the way you did it. In my sample program, I didn't assign to $line, but instead relied on the default variable $_. (It simplifies the 2 following statements, next unless /\S/; my #array = split /\t/;). I didn't chomp for the first file because you're only parsing inside the string and aren't using anything from the end of the string.chomp is necessary for the second while loop because the second variable my $uniq = ... would have a newline at its end if it wasn't removed by chomp.
I didn't know what you meant by this statement, if ($line =~ /[a-z]/). I am assuming you wanted to check for empty lines and only process lines with non-space data. That's why I wrote next unless /\S/;instead. (says to skip the following statements and got to the top of the while loop and read the next record).
Your first while loop worked because you had no errors in your input file. If there had errors, the way you wrote the code could have been a problem.
The statementmy $id = $array[8]; gives $id a value that would have been wrongly used if the following if statement had been false. (The same thing for the 2 other variables you want to capture,$fpkm and $xloc). You can see in my code example how I handled this.
In my code, I died if the match didn't succeed, You might not want todie but say match or next to try the next line of data. It depends on how you would want to handle a failed match.
And in this line$array[8] =~ /gene_id "(CUFF\S+)";/, Note that I put the ";following the captured data, so there is no need to remove it from the captured data (as you did in your substitutions)
Well, I know this is a long comment on your code, but I hope you get some good ideas about why I recommended the changes given.
or die "Could not find ID in $v_merge_gtf (line# $.)";
$. is the line number of the file being read.
#!/usr/bin/perl
use warnings;
use strict;
my $usage = "USAGE: perl $0 merge_gtf_file unique_gtf_file\n";
my $v_merge_gtf = shift #ARGV or die $usage;
my $unique_gtf = shift #ARGV or die $usage;
my %data;
open my $mrg, '<', $v_merge_gtf or die $!;
while (<$mrg>) {
next unless /\S/;
my #array = split /\t/;
if ($array[2] eq 'exon') {
$array[8] =~ /gene_id "(CUFF\S+)";/
or die "Could not find ID in $v_merge_gtf (line# $.)";
my $id = $1;
$array[8] =~ /FPKM "(\S+)";/
or die "Could not find FPKM in $v_merge_gtf (line# $.)";
my $fpkm = $1;
$array[17] =~ /gene_id "(XLOC\S+)";/
or die "Could not find XLOC in $v_merge_gtf (line# $.)";
my $xloc = $1;
$data{$id}{fpkm} = $fpkm;
$data{$id}{xloc} = $xloc;
}
}
close $mrg or die $!;
open my $unique, '<', $unique_gtf or die $!;
while (<$unique>) {
next unless /\S/;
chomp;
my ($id, $uniq) = split /\t/;
print join("\t", $id, $uniq, $data{$id}{fpkm}, $data{$id}{xloc}), "\n";
}
close $unique or die $!;
I need the decrypting to be in ROT-25 which i think I already have set up. Next it needs to decrypt a file read in from the command line and that's where my problem is. I'm guessing it would have to be run like perl filename anyfile.txt but how do i set this up?
#!/Strawberry/perl/bin/perl
use v5.14;
my ($file1) = #ARGV;
open my $fh1, '<', $file1;
while (<$fh1>) {
sub encode_decode {
shift =~ tr/A-Za-z/Z-ZA-Yz-za-y/r;
}
my $enc = encode_decode();
my $dec = encode_decode($enc);
say "Enc: ", $enc;
say "Dec: ", $dec;
}
close $fh1;
There are several issue here. First, a function that uses the same logic to encode_decode() doesn't make sense for ROT25, only for ROT13. To create your initial encoded file, you can use Unix to do it:
echo "The secret of getting ahead is getting started -- Mark Twain" | tr "A-Za-z" "Z-ZA-Yz-za-y" > encoded_twain.txt
then run your program on encoded_twain.txt
Since you need to determine if "the" appears anywere in the text, reading the file in line by line isn't your best bet. You're better off reading it in as a single string and then both decoding and testing that.
Your decoder has to do the opposite of what it does now (encoding.)
Putting it all together, we get something like:
use English;
my $file_name = shift;
sub decode
{
return shift =~ tr/Z-ZA-Yz-za-y/A-Za-z/r;
}
open my $file_handle, '<', $file_name;
my $encoded = '';
{ # allow us to read entire file in as a string:
local $INPUT_RECORD_SEPARATOR = undef;
$encoded = <$file_handle>;
}
close $file_handle;
my $decoded = &decode($encoded);
if ($decoded =~ m/(^| )the /m) # make this more robust!
{
print($decoded);
}
Only a small change, either declare a variable for holding the current line (or use $_):
open my $fh1, '<', $file1;
while ( my $line = <$fh1> ) {
my $dec = decode( $line );
# say "Dec: ", $dec;
}
close $fh1;
You can test the decoded lines for "the". If it is found, open the file again and print all lines.
I am trying to both learn perl and use it in my research. I need to do a simple task which is counting the number of sequences and their lengths in a file such as follow:
>sequence1
ATCGATCGATCG
>sequence2
AAAATTTT
>sequence3
CCCCGGGG
The output should look like this:
sequence1 12
sequence2 8
sequence3 8
Total number of sequences = 3
This is the code I have written which is very crude and simple:
#!/usr/bin/perl
use strict;
use warnings;
my ($input, $output) = #ARGV;
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
while (<INFILE>) {
chomp;
if (/^>/)
{
my $number_of_sequences++;
}else{
my length = length ($input);
}
}
print length, number_of_sequences;
close (INFILE);
I'd be grateful if you could give me some hints, for example, in the else block, when I use the length function, I am not sure what argument I should pass into it.
Thanks in advance
You're printing out just the last length, not each sequence length, and you want to catch the sequence names as you go:
#!/usr/bin/perl
use strict;
use warnings;
my ($input, $output) = #ARGV;
my ($lastSeq, $number_of_sequences) = ('', 0);
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
# You never use OUTFILE
# open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
while (<INFILE>) {
chomp;
if (/^>(.+)/)
{
$lastSeq = $1;
$number_of_sequences++;
}
else
{
my $length = length($_);
print "$lastSeq $length\n";
}
}
print "Total number of sequences = $number_of_sequences\n";
close (INFILE);
Since you have indicated that you want feedback on your program, here goes:
my ($input, $output) = #ARGV;
open(INFILE, '<', $input) or die "Can't open $input, $!\n"; # Open a file for reading.
open(OUTFILE, '>', $output) or die "Can't open $output, $!"; # Open a file for writing.
Personally, I think when dealing with a simple input/output file relation, it is best to just use the diamond operator and standard output. That means that you read from the special file handle <>, commonly referred to as "the diamond operator", and you print to STDOUT, which is the default output. If you want to save the output in a file, just use shell redirection:
perl program.pl input.txt > output.txt
In this part:
my $number_of_sequences++;
you are creating a new variable. This variable will go out of scope as soon as you leave the block { .... }, in this case: the if-block.
In this part:
my length = length ($input);
you forgot the $ sigil. You are also using length on the file name, not the line you read. If you want to read a line from your input, you must use the file handle:
my $length = length(<INFILE>);
Although this will also include the newline in the length.
Here you have forgotten the sigils again:
print length, number_of_sequences;
And of course, this will not create the expected output. It will print something like sequence112.
Recommendations:
Use a while (<>) loop to read your input. This is the idiomatic method to use.
You do not need to keep a count of your input lines, there is a line count variable: $.. Though keep in mind that it will also count "bad" lines, like blank lines or headers. Using your own variable will allow you to account for such things.
Remember to chomp the line before finding out its length. Or use an alternative method that only counts the characters you want: my $length = ( <> =~ tr/ATCG// ) This will read a line, count the letters ATGC, return the count and discard the read line.
Summary:
use strict;
use warnings; # always use these two pragmas
my $count;
while (<>) {
next unless /^>/; # ignore non-header lines
$count++; # increment counter
chomp;
my $length = (<> =~ tr/ATCG//); # get length of next line
s/^>(\S+)/$1 $length\n/; # remove > and insert length
} continue {
print; # print to STDOUT
}
print "Total number is sequences = $count\n";
Note the use of continue here, which will allow us to skip a line that we do not want to process, but that will still get printed.
And as I said above, you can redirect this to a file if you want.
For starters, you need to change your inner loop to this:
...
chomp;
if (/^>/)
{
$number_of_sequences++;
$sequence_name = $_;
}else{
print "$sequence_name ", length($input), "\n";
}
...
Note the following:
The my declaration has been removed from $number_of_sequences
The sequence name is captured in the variable $sequence_name. It is used later when the next line is read.
To make the script run under strict mode, you can add my declarations for $number_of_sequences and $sequence_name outside of the loop:
my $sequence_name;
my $number_of_sequences = 0;
while (<INFILE>) {
...(as above)...
}
print "Total number of sequences: $number_of_sequences\n";
The my keyword declares a new lexically scoped variable - i.e. a variable which only exists within a certain block of code, and every time that block of code is entered, a new version of that variable is created. Since you want to have the value of $sequence_name carry over from one loop iteration to the next you need to place the my outside of the loop.
#!/usr/bin/perl
use strict;
use warnings;
my ($file, $line, $length, $tag, $count);
$file = $ARGV[0];
open (FILE, "$file") or print"can't open file $file\n";
while (<FILE>){
$line=$_;
chomp $line;
if ($line=~/^>/){
$tag = $line;
}
else{
$length = length ($line);
$count=1;
}
if ($count==1){
print "$tag\t$length\n";
$count=0
}
}
close FILE;
I have another question here, i have several dats and want to merge them. But the script first checks for header of all the DATs and if not matching it will throw error and stop the script. Now i want to run the script skipping the problematic dat and output the error in separate text file with list of errored DAts and reason. Could anyone please help on this. Here is what i have so far:
use strict;
my $rootdir = $ARGV[0];
die "usage: perl mergetxtfiles.pl <folder>" if ($#ARGV != 0);
#$rootdir =~ s/\\/\\\\/g;
print "\nFolder = $rootdir\n\n";
opendir(DIR, $rootdir)
or die "failed opening the directory $rootdir";
open(OF,">:utf8",'combined_'.time.'.dat')
or die "failed opening the file";
my $icr = 0;
my $cnt = 0;
my $header = '';
my $header_flag = 0;
while(my $fname = readdir(DIR)) {
# add extensions if needed
if ($fname =~ m/(\.txt)|(\.dat)|(\.csv)$/i) {
$icr++;
my $fnamepath = $rootdir.'\\'.$fname;
print "\($icr\) $fname\n";
open(IF, "<:utf8", $fnamepath)
or die "ERROR: cannot open the file\n$fnamepath ";
my $sep_icr = 0;
while(<IF>) {
my $line = $_;
chomp $line;
next if (/^$/);
$sep_icr++;
$cnt++;
my #ar = split(/\t/,$line);
if ($cnt == 1) {
$header_flag = 1;
$header = $line;
}
if ($sep_icr == 1 and $header_flag == 1) {
#print "$line \n $header\n";
if ($line ne $header) {
die "Headers are not same\n";
}
elsif (($line eq $header) and ($cnt >1)) {
print "INFO\: ignoring the same header for $fname \n";
$cnt--;
next;
}
}
print OF $line."\n";
}
print "\--Line count= $sep_icr\n\n";
close IF;
#print OF "\n";
}
}
print "\-\-\> Total line count= $cnt\n";
Named Loops
In your loop, we have to change your if-clause and the outer loop a bit:
FILE:
while(my $fname = readdir(DIR)) {
...;
if ($line ne $header) {
logger($fname, "Headers not matching");
next FILE;
}
...;
}
In Perl, loops can be labeled, so we can specify which loop we do next, instead of setting and checking flags. I used an example logging function loggeras given below, but you can substitute it with an appropriate print statement.
Logging
This is probably a bit more than asked, but here is a little logging function for flexibility. Arguments are a filename, a reason, and an optional severity. You can remove the severity code if it isn't needed. The severity is optional anyway and defaults to debug.
open my $logfile, ">>", "FILENAME" or die "..."; # open for append
sub logger {
my ($file, $reason, $severity) = (#_, 'debug');
$severity = {
debug => '',
info => 'INFO',
warn => '!WARN!',
fatal => '!!!ERROR!!!',
}->{$severity} // $severity; # transform the severity if it is a name we know
$severity .= ' ' if length $severity; # append space if we have a severity
print {$logfile} $severity . qq{$reason while processing "$file"\n};
}
If called with logger("./foo/bar", "Headers not matching", 'warn') it will output:
!WARN! Headers not matching while processing "./foo/bar"
Change the printed error message to something more machine-readable if needed.
Style tips and tricks:
If find these lines more elegant:
die "usage: ...\n" unless #ARGV;
my ($rootdir) = #ARGV;
note the newline at the end (supresses the "at line 3" etc). In scalar context, an array returns the array length. In the second line we can avoid array subscripting by assigning in list context. Surplus elements are ignored.
Instead
if ($fname =~ m/(\.txt)|(\.dat)|(\.csv)$/i) { ...; }
we can say
next unless $fname =~ m/(?: \.txt | \.dat | \.csv )$/xi;
and avoid unneccessary intendation, therefore improving readability.
I modified the regex so that all suffixes must come at the end, not only the .csv suffix, and added the /x modifier so that I can use non-semantic whitespace inside the regex.
Windows, and pretty much any OS, understand forward slashes in path names. So instead
my $fnamepath = $rootdir.'\\'.$fname;
we can write
my $fnamepath = "$rootdir/$fname";
I find that easier to write and understand.
The
while(<IF>) {
my $line = $_;
construct can be simplified to
while(my $line = <IF>) {...}
Last but not least, consider starting a habit of using filehandles with my. Often, global filehandles are not needed and can cause some bugs.
I'm trying to read from two files, and generate output in a third. I first wanted to edit the first one on the go but I didn't find a suitable method save for arrays.
My problem is that the third file (output) is empty whenever I uncomment the "_ref_param_handling" function. BUT the following is what puzzles me the most: If I do a UNIX very basic `cat` system call on the output file at then end (see code below), it works just fine. If I open the filehandle just before and close it right after editing, it also works fine (around my print FILEHANDLE LIST).
I undoubtedly am missing something here. Apart from a problem between my keyboard and my chair, what is it? A filehandle conflict? A scope problem?
Every variable is declared and has the value I want it to have.
Edit (not applicable anymore).
Using IO::File on the three files didn't change anything.
Edit 2 : New full subroutine code
My code works (except when my ref already exists, but that's because of the "append" mode i think) but there might be some mistakes and unperlish ways of coding (sorry, Monks). I, however, use Strict and warnings !
sub _ref_edit($) {
my $manda_def = "$dir/manda_def.list";
my $newrefhandle;
my $ref = $_[0];
(my $refout = $ref) =~ s/empty//;
my $refhandle;
my $parname = '';
my $parvalue = '';
my #val;
_printMan;
my $flush = readline STDIN; # Wait for <enter>
# If one or both of the ref. and the default values are missing
if ( !( -e $manda_def && -e $ref ) ) {
die "Cannot find $ref and/or $manda_def";
}
# Open needed files (ref & default)
open( $refhandle, "<", $ref ) or die "Cannot open ref $ref : $!";
open( $newrefhandle, ">>", $refout )
or die "Cannot open new ref $refout : $!";
# Read each line
while ( my $refline = <$refhandle> ) {
# If line read not an editable macro
if ( $refline =~ /^define\({{(.+)}},\s+{{.*__VALUE__.*}}\)/ ){
$parname = $1; # $1 = parameter name captured in regexp
# Prompt user
$parvalue = _ref_param_handling( $parname, $manda_def );
# Substitution in ref
$refline =~ s/__VALUE__/$parvalue/;
# Param not specified and no default value
$parvalue eq '' ? $refline=~s/__COM__/#/ : $refline=~s/__COM__//;
}
print $newrefhandle $refline;
}
close $newrefhandle;
close $refhandle;
return $refout;
} # End ref edit
the _ref_param_handle subroutine still is :
open( $mde, '<', $_[1] )
or die "Cannot open mandatory/default list $_[1] : $!";
# Read default/mandatory file list
while (<$mde>) {
( $name, $manda, $default, $match, $descript ) = split( /\s+/, $_, 5 );
next if ( $name !~ $ref_param ); # If param read differs from parname
(SOME IF/ELSE)
} # End while <MDE>
close $mde;
return $input;
}
Extract from manda_def file :
NAME Mandatory? Default Match Comm.
PORT y NULL ^\d+$ Database port
PROJECT y NULL \w{1,5} Project name
SERVER y NULL \w+ Server name
modemRouting n NULL .+
modlib y bin .+
modules y sms .+
Extract from ref_file :
define({{PORT}}, {{__VALUE__}})dnl
define({{PROJECT}}, {{__VALUE__}})dnl
define({{SERVER}}, {{__VALUE__}})dnl
define({{modemRouting}}, {{__COM__{{$0}} '__VALUE__'}})dnl
define({{modlib}}, {{__COM__{{$0}} '__VALUE__'}})dnl
define({{modules}}, {{__COM__{{$0}} '__VALUE__'}})dnl
Any help appreciated.
It is unclear what is initialising $refhandle, $newrefhandle and $mde. Depending on the values they have will affect the behaviour of open - i.e. whether it will close any filehandles before opening a new one.
I would suggest that you start using the IO::File interface to open/write to files, as this makes the job of filehandle management much easier, and will avoid any inadvertent closes. Something like...
use IO::File;
my $refhandle = IO::File->new("< $ref") or die "open() - $!";
$refhandle->print(...);
As far as editing files in place goes, this is a common pattern I use to achieve this, make sure of the -i behaviour of perl.
sub edit_file
{
my ($filename) = #_;
# you can re-create the one-liner above by localizing #ARGV as the list of
# files the <> will process, and localizing $^I as the name of the backup file.
local (#ARGV) = ($filename);
local($^I) = '.bak';
while (<>)
{
s/original string/new string/g;
}
continue
{
print;
}
}
try opening the second file handle for input outside the loop and pass a reference to the subroutine _ref_param_handle.Use seek function to seek file back to start.If your file is not too large you can also think of storing the content in an array and the accessing it instead of looping over same contents.
EDIT:
Here is a small example to support what I was trying to say above:
#!/usr/bin/perl -w
sub test
{
my $fh_to_read = $_[0] ;
my $fh_to_write = $_[1] ;
while(<$fh_to_read>)
{
print $fh_to_write $_ ;
}
seek($fh_to_read,0,0) ;
}
open(FH1,"<dummy1");
open(FH2,"<dummy2");
open(FH3,">dummy3");
while(<FH2>)
{
print FH3 "$_" ;
test(\*FH1,\*FH3);
}
Info about perl references
From what I gather, your script wants to convert a file in the following form:
define({{VAR1}}, {{__VALUE__}})
define({{VAR2}}, {{__VALUE__}})
define({{VAR3}}, {{__VALUE__}})
define({{VAR4}}, {{__VALUE__}})
to something like this:
define({{VAR1}}, {{}})
define({{VAR2}}, {{VALUE2}})
define({{VAR3}}, {{VALUE3}})
define({{VAR4}}, {{}})
The following works. I don't know what manda_def means, and also I didn't bother to create an actual variable replacement function.
#!/usr/bin/perl
use strict;
use warnings;
sub work {
my ($ref, $newref, $manda_def) = #_;
# Open needed files (ref & default)
open(my $refhandle, '<', $ref) or die "Cannot open ref $ref : $!";
open(my $newrefhandle, '>', $newref) or die "Cannot open new ref $newref: $!";
# Read each line
while (my $refline = <$refhandle>) {
# if line read is not an editable macro
if ($refline =~ /^define\({{(.+)}},\s+{{.*__VALUE__.*}}\)/){
my $parvalue = _ref_param_handling($1, $manda_def); # manda_def?
# Substitution in ref
$refline =~ s/__VALUE__/$parvalue/;
# Param not specified and no default value
$refline =~ s/__COM__/#/ if $parvalue eq '';
}
print $newrefhandle $refline;
}
close $newrefhandle;
close $refhandle;
return $newref;
}
sub _ref_param_handling {
my %parms = (VAR2 => 'VALUE2', VAR3 => 'VALUE3');
return $parms{$_[0]} if exists $parms{$_[0]};
}
work('ref.txt', 'newref.txt', 'manda.txt');
Guys, I seriously consider hanging myself with my wireless mouse.
My script never failed. I just didn't ran it through the end (it's actually a very long parameter list). The printing is just done as soon as the filehandle is closed (or so I guessed)...
/me *cries*
I've spent 24 hours on this...