I can't open a new file to write to in Perl

I have looked on several links and other questions to try and find a solution but I still can't open the file that I'm trying to open. This is the block of code that I can't get to function:
$filename = "Related Traits: Chromosome 1";
open ($output1, ">", "gwasfiles4/$filename".".txt");
$length1 = scalar(@chr1);
if ($length1 > 1) {
@chr1 = sort {$a <=> $b} @chr1;
for ($x = 0; $x <= $length1; $x++){
for ($y = $x + 1; $y <= $length1 - 1; $y++){
if (abs($chr1[$x] - $chr1[$y]) < 500000){
print $output1 "$chr1[$x]\t$chr1[$y]\n";
}
}
}
}
When I run this, I get the error:
print() on closed filehandle $output at file.pl line 94
Why is the file not opened?
The file now opens with this but is empty:
my @chr1;
my $filename = "Related_Traits_Chromosome_1_$ARGV[0]";
open (my $output1, '>', "gwasfiles4/$filename") or die $!;
my $length1 = scalar(@chr1);
if ($length1 > 1) {
@chr1 = sort {$a <=> $b} @chr1;
for (my $x = 0; $x <= $length1; $x++){
for (my $y = $x + 1; $y <= $length1 - 1; $y++){
if (abs($chr1[$x] - $chr1[$y]) < 500000){
print $output1 "$chr1[$x]\t$chr1[$y]\n";
}
}
}
}

use strict; and use warnings; should always be at the start of your program. Fix the errors they generate first, and you'll have better code.
You should also check the return value of open:
open my $output1, '>', "gwasfiles4/$filename.txt" or die $!;
This will print the error that open generated if it failed. I'll guess either gwasfiles4 doesn't exist, or your OS doesn't like the filename with an embedded :.
If strict and warnings don't help enough, use diagnostics; will give you yet another layer of information about the problem.
autodie is particularly useful: it effectively adds that or die $! check to each open statement automatically (and to a few other built-ins besides).
I would suggest as a point of style - enclose your lexical filehandle in {} when printing, as this makes it very clear it's a filehandle.
print {$output1} "$chr1[$x]\t$chr1[$y]\n";
Edit:
Following your changes, you have a completely different problem:
my @chr1; # create empty array
my $length1 = scalar(@chr1); # scalar here takes length, array is empty, therefore length is _always_ zero.
if ($length1 > 1) { #therefore never happens
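To make the point concrete, here is a minimal sketch of the pairing loop with the array filled before its length is used. The sub name close_pairs, the sample positions, and the in-memory filehandle are assumptions for illustration only; the 500000 window comes from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return every pair of positions closer than $window.
sub close_pairs {
    my ($window, @positions) = @_;
    my @sorted = sort { $a <=> $b } @positions;
    my @pairs;
    for my $x (0 .. $#sorted - 1) {
        for my $y ($x + 1 .. $#sorted) {
            # The list is sorted, so later entries are only further away.
            last if $sorted[$y] - $sorted[$x] >= $window;
            push @pairs, [ $sorted[$x], $sorted[$y] ];
        }
    }
    return @pairs;
}

# Sample data (made up); in the real script @chr1 must be filled before this point.
my @chr1 = (1_200_000, 100_000, 450_000, 1_600_000);

# Write to an in-memory filehandle so the sketch leaves no file behind.
open(my $out, '>', \my $buffer) or die "cannot open in-memory file: $!";
print {$out} "$_->[0]\t$_->[1]\n" for close_pairs(500_000, @chr1);
close $out or die "close failed: $!";
print $buffer;
```

Iterating over 0 .. $#sorted also avoids the off-by-one in the original outer loop, which ran $x up to and including $length1.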

I suggest that either the directory gwasfiles4 doesn't exist or you're working on a Windows system, which doesn't allow colons (:) in its filenames.

Related

Open (IN...) command failing possibly due to problems with naming

New to Perl and quite new to coding in general, so I apologise if this is formatted terribly and an easy question! I'm simply trying to run somebody else's code as a step in a larger project involving PRAAT. The code is designed to distinguish beats in speech rhythm. I've followed their nomenclature in file naming (on line 2), but the code won't move past line 13. Could anyone tell me why? Is it trying to open a directory called "intensities"? Additionally, is there anywhere else I may have to change the code? It is quite possibly reasonably old! Thank you very much!
#!/usr/local/bin/perl -w
scalar(@ARGV) == 1 or scalar(@ARGV) == 2 or die "Usage: getBeatsOneShot.pl someSoundFile <threshold>";
$stem = shift;
# Parameters to fiddle with
if (scalar(@ARGV) == 0) {
$threshold = 0.2;
} else {
$threshold = shift;
print "Threshold is $threshold\n";
}
open(IN, "intensities/$stem.intensity") or die "badly";
open(OUT, ">beats/$stem.beats") or die "eek";
# File type = "ooTextFile short"
$_ = <IN>; print OUT $_;
# replace "Intensity" with "TextGrid"
$_ = <IN>; print OUT "\"TextGrid\"\n\n";
# skip a line
$_ = <IN>;
chomp($xmin = <IN>);
chomp($xmax = <IN>);
chomp($nx = <IN>); $nx = 0; # (just suppress a warning here)
chomp($dx = <IN>);
chomp($x1 = <IN>);
# Read in intensity contour into @e (envelope)
@e = ();
while($_ = <IN>) { chomp; last unless $_ eq "1";}
push @e, $_;
while($_ = <IN>) {
chomp($_);
push @e, $_;
}
# (1) Find max and min
$max = 0; $min = 1000000;
foreach $ival (@e) {
if($ival > $max) {
$max = $ival;
}
if($ival < $min) {
$min = $ival;
}
}
# (2) look for beats
@beats = ();
print "Thresh: $threshold\n";
open doesn't create the path to the file. Directories intensities/ and beats/ therefore must exist in the current working directory before the script is run.
When open fails, it sets $! to the reason for the failure. Instead of eek or badly, use die $! so Perl can tell you what went wrong.
Moreover, you should turn strict and warnings on. They prevent many common mistakes. As a newbie, you might like to enable diagnostics, too, to get detailed explanations of all the errors and warnings.
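As a rough sketch of both points, the snippet below tries to open a file inside a directory that does not exist yet, reports the reason stored in $!, then creates the directory and tries again. The helper name open_for_write and the scratch paths are assumptions for illustration; File::Temp and File::Path are core modules.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical helper: open $path for writing and return the handle,
# or return undef plus the reason Perl stored in $!.
sub open_for_write {
    my ($path) = @_;
    open(my $fh, '>', $path) or return (undef, "cannot open $path: $!");
    return ($fh, undef);
}

my $base = tempdir(CLEANUP => 1);          # scratch directory for the demo
my $path = "$base/beats/example.beats";    # beats/ does not exist yet

my ($fh, $error) = open_for_write($path);
print "first attempt:  $error\n" if defined $error;

make_path("$base/beats");                  # create the missing directory
($fh, $error) = open_for_write($path);
print "second attempt: ", (defined $fh ? "opened" : $error), "\n";
close $fh if defined $fh;
```

In the script from the question, the equivalent change would be replacing die "badly" with something like die "cannot open intensities/$stem.intensity: $!".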

perl - incorrect syntax when closing if statement

I'm trying to write a script that searches the BLAST output against a previously-generated file that gives the genomic positions of each GI number. However, I get three syntax errors related to closing IF statements. Being a novice at Perl, I am at a loss as to how to fix this. Can anyone help me? I have copied the code and labeled the offending closing braces. I did do a quick check to make sure all delimiters are balanced.
#!/usr/bin/perl -w
#decided to have input file entered in command line
#call program followed by genome name.
#the program assumes that a file with the extensions ptt and faa exist in the same directory.
#####INPUT Name of multiple seq file containing ORF of genome, open file and assign IN filehandle #############
unless(@ARGV==2) {die "usage: perl nucnums.pl BLAST_output_filename query.ref subject.ref\n\nSubject is the database you made with FormatDB or MakeBlastDB.\n\nQuery is the other file";}
$blastname=$ARGV[0];
$queryname=$ARGV[1];
$subjectname=$ARGV[2];
@nameparts=split(/\./, $blastname);
$ofilename="$blastname[0]"."pos";
open(INBLAST, "< $blastname") or die "cannot open $blastname:$!";
open(OUT, "> $ofilename") or die "cannot open $ofilename:$!";
$line=<INBLAST>;
print OUT $line;
while (defined ($line=<INBLAST>)){ # read through rest of table line by line
if ($line=/^g/){
@parts=split/\t/,$line;
@girefq=split/\|/,$parts[0];
$ginumq = ($girefq[1]);
$postartq = $parts[6];
@girefs=split/|/,$parts[1];
$ginums = ($girefs[1]);
$postarts = $parts[8];
open(INQUER, "< $queryname") or die "cannot open $queryname:$!";
open(INSUBJ, "< $subjectname") or die "cannot open $subjectname:$!";
SCOOP: while (defined ($locq=<INQUER>)){
@locsq=split/\t/,$locq
if $locsq[0] = $ginumq{
$posq = $locsq[1] + $postartq - 1;
} # <- Syntax error
}
close(INQUER);
SLOOP: while (defined ($locs=<INSUBJ>)){
@locsq=split/\t/,$locs
if $locss[0] = $ginums {
$poss = $locss[1] + $postarts - 1;
} # <- Syntax error
}
close(INSUBJ);
print "$ginumq at position $posq matches with $ginums at position $poss \n" or die "Failed to find a match, try changing the order of the REF files";
print OUT "$ginumq\t$posq\t$ginums\t$poss\t$parts[2]\t$parts[3]\t$parts[4]\t$parts[5]\t$parts[6]\t$parts[7]\t$parts[8]\t$parts[9]\t$parts[10]\t$parts[11]\t$parts[12]\n";
} # <- Syntax error
}
close(OUT);
close(INBLAST);
You need to change:
@locsq=split/\t/,$locs
if $locss[0] = $ginums {
to something like:
@locsq=split/\t/,$locs;
if ($locss[0] == $ginums) {
Same goes for $locsq[0]. If you are comparing strings instead of numbers, use eq instead of ==.
UPDATE: Thanks to Zaid for pointing out the missing semicolons.
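A minimal sketch of the corrected lookup, with the semicolon in place and a parenthesised comparison instead of an assignment. The sub name find_position and the sample reference lines are made up for illustration; eq is used on the assumption that the identifiers should be compared as strings.

```perl
use strict;
use warnings;

# Hypothetical helper: scan tab-separated "id<TAB>start" lines and return
# the genomic position for $gi, or undef when the id is not present.
sub find_position {
    my ($gi, $offset, @lines) = @_;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my @fields = split /\t/, $copy;      # statement ends with a semicolon
        next unless @fields >= 2;
        if ($fields[0] eq $gi) {             # parenthesised comparison, not assignment
            return $fields[1] + $offset - 1;
        }
    }
    return;
}

my @ref = ("111\t5000\n", "222\t9000\n");    # made-up reference lines
my $pos = find_position('222', 15, @ref);
print defined $pos ? "222 is at $pos\n" : "222 not found\n";
```

Returning undef for a missing id also gives the caller something real to test, which a die attached to print cannot do.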
@locsq=split/\t/,$locs
if $locss[0] = $ginums {
$poss = $locss[1] + $postarts - 1;
} # <- Syntax error
You get a misleading error message here because the first part of that is syntactically valid, though not what you intend. The missing semicolon on the first line means that the first part of the above is parsed as a postfix if (since the condition in a postfix if doesn't require parentheses):
@locsq=split/\t/,$locs if $locss[0] = $ginums
More than that, the part starting with $ginums{ is parsed as a reference to an element of a hash. (There's no %ginums hash, but that error would be reported after any syntax errors):
@locsq=split/\t/,$locs if $locss[0] = $ginums{$poss = $locss[1] + $postarts - 1;}
where $poss = $locss[1] + $postarts - 1; is taken as the hash key.
You only got a syntax error message because of the semicolon preceding the }. If you had omitted that semicolon, you probably would have gotten a complaint about %ginums being nonexistent (assuming you have use strict; use warnings;; without that you might not get a warning at all).
This is one of those cases where a typo can transform a piece of code into something that's valid (or, in this case almost valid) but that doesn't mean anything like what you intended.
Add a semicolon to the end of the first line, and parenthesize the if condition.
It looks like you have written the entire program before doing any testing on it. That isn't the way to go. You should write small sections and test that they work in isolation before adding to them or assembling them into the complete program.
use warnings is preferable to putting -w on the #! line.
There are also a number of errors in this program that would have been highlighted if you had added use strict at the top of your program and declared all your variables using my. For instance, you write
$ofilename = "$blastname[0]" . "pos";
but there is no @blastname array. $ofilename will end up always containing just pos, but use strict wouldn't have let you run the program in this condition.
You also write (what I presume is supposed to be)
my @locsq = split /\t/, $locs;
if ($locss[0] = $ginums) {
$poss = $locss[1] + $postarts - 1;
}
and, again, there is no @locss array. Because = is an assignment rather than a comparison, the if body runs whenever $ginums is true, whatever the line contains.
I recommend you at least take a look at this rewrite of your program, which uses commonly-accepted good practice, and I hope you will agree is more readable.
There is a problem with your final print statement, as print to the console always returns true so the die will never get executed, but I don't understand enough about what you are doing to fix it.
use strict;
use warnings;
unless (@ARGV == 3) {
die <<END;
usage: perl nucnums.pl BLAST_output_filename query.ref subject.ref
Subject is the database you made with FormatDB or MakeBlastDB.
Query is the other file
END
}
my ($blastname, $queryname, $subjectname) = @ARGV;
my @nameparts = split /\./, $blastname;
my $ofilename = "${blastname}pos";
open my $inblast, '<', $blastname or die "cannot open $blastname: $!";
open my $out, '>', $ofilename or die "cannot open $ofilename: $!";
print $out scalar <$inblast>;
while (my $line = <$inblast>) {
next unless $line =~ /^g/;
my @parts = split /\t/, $line;
my @girefq = split /\|/, $parts[0];
my $ginumq = $girefq[1];
my $postartq = $parts[6];
my @girefs = split /\|/, $parts[1];
my $ginums = $girefs[1];
my $postarts = $parts[8];
my ($posq, $poss);
open my $inquer, '<', $queryname or die "cannot open $queryname: $!";
while (my $locq = <$inquer>) {
my @locsq = split /\t/, $locq;
if ($locsq[0] == $ginumq) {
$posq = $locsq[1] + $postartq - 1;
}
}
close($inquer);
open my $insubj, '<', $subjectname or die "cannot open $subjectname: $!";
while (my $locs = <$insubj>) {
my @locss = split /\t/, $locs;
if ($locss[0] == $ginums) {
$poss = $locss[1] + $postarts - 1;
}
}
close($insubj);
print "$ginumq at position $posq matches with $ginums at position $poss \n"
or die "Failed to find a match, try changing the order of the REF files";
print $out join("\t", $ginumq, $posq, $ginums, $poss, @parts[2..12]), "\n";
}
close $inblast;
close $out or die $!;

Perl module to find out whether a word is a verb/noun/adjective/article/preposition

I have a list of words and I want to group them into different groups depending on whether they are verbs/adjectives/nouns/etc. So, basically I am looking for a Perl module which tells whether a word is verb/noun etc.
I googled but couldn't find what I was looking for. Thanks.
Lingua::EN::Tagger, Lingua::EN::Semtags::Engine, Lingua::EN::NamedEntity
See the Lingua::EN:: namespace in CPAN. Specifically, Link Grammar and perhaps Lingua::EN::Tagger can help you. Also WordNet provides that kind of information and you can query it using this perl module.
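As a rough sketch of how Lingua::EN::Tagger could be used for this, the code below groups words by the tag attached to them. It assumes the module's get_readable method, which returns text in word/TAG form; the helper name group_by_tag and the sample sentence are made up, and the actual tag names depend on the tag set the module uses.

```perl
use strict;
use warnings;

# Hypothetical helper: group "word/TAG" tokens by their tag.
sub group_by_tag {
    my ($readable) = @_;
    my %groups;
    for my $token (split ' ', $readable) {
        # Skip anything that is not in word/TAG form.
        my ($word, $tag) = $token =~ m{^(.+)/([^/]+)$} or next;
        push @{ $groups{$tag} }, $word;
    }
    return %groups;
}

# Only call the tagger when the CPAN module is installed.
if (eval { require Lingua::EN::Tagger; 1 }) {
    my $tagger   = Lingua::EN::Tagger->new;
    my $readable = $tagger->get_readable('The quick brown fox jumps over the lazy dog');
    my %groups   = group_by_tag($readable);
    print "$_: @{ $groups{$_} }\n" for sort keys %groups;
}
else {
    print "Lingua::EN::Tagger is not installed\n";
}
```

Tagging isolated words is less reliable than tagging them in a sentence, since a part-of-speech tagger relies on context.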
The following Perl code analyses every text file in a directory: give it the path of the directory and it will process all the files in one run and append each report to report.txt.
#!/usr/local/bin/perl
# for loop execution
# Perl Program to calculate Factorial
sub fact
{
# Retriving the first argument
# passed with function calling
my $x = $_[0];
my @names = @{$_[1]};
my $length = $_[2];
# checking if that value is 0 or 1
if ($x < $length)
{
#print @names[$x],"\n";
use Lingua::EN::Fathom;
my $text = Lingua::EN::Fathom->new();
# Analyse contents of a text file
$dirlocation="./2015/";
$path =$dirlocation.$names[$x];
$text->analyse_file($path); # Analyse contents of a text file
$accumulate = 1;
# Analyse contents of a text string
$text->analyse_block($text_string,$accumulate);
# TO Do, remove repetition
$num_chars = $text->num_chars;
$num_words = $text->num_words;
$percent_complex_words = $text->percent_complex_words;
$num_sentences = $text->num_sentences;
$num_text_lines = $text->num_text_lines;
$num_blank_lines = $text->num_blank_lines;
$num_paragraphs = $text->num_paragraphs;
$syllables_per_word = $text->syllables_per_word;
$words_per_sentence = $text->words_per_sentence;
# comment needed
%words = $text->unique_words;
foreach $word ( sort keys %words )
{
# print("$words{$word} :$word\n");
}
$fog = $text->fog;
$flesch = $text->flesch;
$kincaid = $text->kincaid;
use strict;
use warnings;
use 5.010;
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh $text->report;
close $fh;
say 'done';
print($text->report);
$x = $x+1;
fact($x,\@names,$length);
}
# Recursively calling function with the next value
# which is one less than current one
else
{
done();
}
}
# Driver Code
$a = 0;
@names = ("John Paul", "Lisa", "Kumar","touqeer");
opendir DIR1, "./2015" or die "cannot open dir: $!";
my @default_files= grep { ! /^\.\.?$/ } readdir DIR1;
$length = scalar #default_files;
print $length;
# Function call and printing result after return
fact($a,\@default_files,$length);
sub done
{
print "Done!";
}

Why does my Perl script keep reading from same file, even though I closed it?

I'm writing a Perl script that takes two command-line arguments: a directory and a year. In this directory is a ton of text files or HTML files (depending on the year). Let's say, for instance, it's the year 2010, which contains files that look like <number>rank.html, with the number ranging from 2001 to 2212. I want it to open each file individually, take a part of the title in the HTML file, and print it to a text file. However, when I run my code it just prints the first file's title to the text file. It seems that it only ever opens the first file, 2001rank.html, and no others. I'll post the code below, and thanks to anyone who helps.
my $directory = shift or "Must supply directory\n";
my $year = shift or "Must supply year\n";
unless (-d $directory) {
die "Error: Directory must be a directory\n";
}
unless ($directory =~ m/\/$/) {
$directory = "$directory/";
}
open COLUMNS, "> columns$year.txt" or die "Can't open columns file";
my $column_name;
for (my $i = 2001; $i <= 2212; $i++) {
if ($year >= 2009) {
my $html_file = $directory.$i."rank.html";
open FILE, $html_file;
#check if opened correctly, if not, skip it
unless (defined fileno(FILE)) {
print "skipping $html_file\n";
next;
}
$/ = "\n";
my $line = <FILE>;
if (defined $line) {
$column_name = "";
$_ = <FILE> until m{</title>};
$_ =~ m{<title>CIA - The World Factbook -- Country Comparison :: (.+)</title>}i;
$column_name = $1;
}
else {
close FILE;
next;
}
close FILE;
}
else {
my $text_file = $directory.$i."rank.txt";
open FILE, $text_file;
unless (defined fileno(FILE)) {
print "skipping $text_file\n";
next;
}
$/ = "\r";
my $line = <FILE>;
if (defined $line) {
$column_name = "";
$_ = <FILE> until /Rank/i;
$_ =~ /Rank(\s+)Country(\s+)(.+)(\s+)Date/i;
$column_name = $3;
}
else {
close FILE;
next;
}
close FILE;
}
print "Adding $column_name to text file\n";
print COLUMNS "$column_name\n";
}
close COLUMNS;
In other words $column_name gets set equal to the same thing every pass in the loop, even though I know the html files are different.
You'll probably be able to debug this a lot faster if you switch to lexical filehandles instead of globals, and turn on strict checking:
use strict;
use warnings;
while (...)
{
# ...
open my $filehandle, $html_file;
# ...
my $line = <$filehandle>;
}
This way, the filehandle(s) will go out of scope during each loop iteration, so you can more clearly see what exactly is being referenced and where. (Hint: you may have missed a condition where the filehandle gets closed, so it is improperly reused the next time around.)
For more on best practices with open and filehandles, see:
Why is three-argument open calls with autovivified filehandles a Perl best practice?
What's the best way to open and read a file in Perl?
Some other points:
Don't ever explicitly assign to $_; that's asking for trouble. Declare your own variable to hold your data: my $line = <$filehandle> (as in the example above).
Pull out your matches directly into variables, rather than using $1, $2 etc., and only use parentheses for the portions you actually need: my ($column_name) = ($line =~ m/Rank\s+Country\s+(.+)\s+Date/i);
Put the error conditions first, so the bulk of your code can be outdented one (or more) level(s). This will improve readability: when the bulk of your algorithm is visible on the screen at once, you can better visualize what it is doing and catch errors.
If you apply the points above I'm pretty sure that you'll spot your error. I spotted it while making this last edit, but I think you'll learn more if you discover it yourself. (I'm not trying to be snooty; trust me on this!)
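The list-context capture suggested above can be sketched as follows. The helper name column_name_from and the sample header lines are assumptions for illustration; the pattern follows the question's text-file header, with a non-greedy (.+?) so trailing whitespace is not captured.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the column name out of a "Rank Country ... Date" header line.
sub column_name_from {
    my ($line) = @_;
    # In list context the match returns its captures, or an empty list on failure,
    # so $name stays undef when the line does not match.
    my ($name) = $line =~ /Rank\s+Country\s+(.+?)\s+Date/i;
    return $name;
}

for my $line ("Rank  Country  Oil - production  Date of Information", "no header here") {
    my $name = column_name_from($line);
    print defined $name ? "found: $name\n" : "no match\n";
}
```

Unlike $1, which keeps its value from the last successful match, the variable here is undef after a failed match, so a stale column name cannot leak from one file into the next.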
Your processing is similar for HTML and text files, so make your life easy and factor out the common part:
sub scrape {
my($path,$pattern,$sep) = @_;
unless (open FILE, $path) {
warn "$0: skipping $path: $!\n";
return;
}
local $/ = $sep;
my $column_name;
while (<FILE>) {
next unless /$pattern/;
$column_name = $1;
last;
}
close FILE;
($path,$column_name);
}
Then make it specific for the two types of input:
sub scrape_html {
my($directory,$i) = @_;
scrape $directory.$i."rank.html",
qr{<title>CIA - The World Factbook -- Country Comparison :: (.+)</title>}i,
"\n";
}
sub scrape_txt {
my($directory,$i) = @_;
scrape $directory.$i."rank.txt",
qr/Rank\s+Country\s+(.+)\s+Date/i,
"\r";
}
Then your main program is straightforward:
my $directory = shift or die "$0: must supply directory\n";
my $year = shift or die "$0: must supply year\n";
die "$0: $directory is not a directory\n"
unless -d $directory;
# add trailing slash if necessary
$directory =~ s{([^/])$}{$1/};
my $columns_file = "columns$year.txt";
open COLUMNS, ">", $columns_file
or die "$0: open $columns_file: $!";
for (my $i = 2001; $i <= 2212; $i++) {
my $process = $year >= 2009 ? \&scrape_html : \&scrape_txt;
my($path,$column_name) = $process->($directory,$i);
next unless defined $path;
if (defined $column_name) {
print "$0: Adding $column_name to text file\n";
print COLUMNS "$column_name\n";
}
else {
warn "$0: no column name in $path\n";
}
}
close COLUMNS or warn "$0: close $columns_file: $!\n";
Note how careful you have to be to close global filehandles. Please use lexical filehandles as in
open my $fh, $path or die "$0: open $path: $!";
Passing $fh as a parameter or stuffing it in hashes is much nicer. Also, lexical filehandles close automatically when they go out of scope. There's no chance of stomping on a handle someone else is already using.
Have you considered grep?
grep out just the line from the HTML containing the title, and then process the output of grep.
Simpler, as you would not have to write any file-handling code. You didn't say what you want with that title - if you only need a list, you might not need to write any code at all.
Try something like:
grep -ri title <directoryname>

How can I read from a Perl filehandle that is an array element?

I quickly jotted off a Perl script that would average a few files with just columns of numbers. It involves reading from an array of filehandles. Here is the script:
#!/usr/local/bin/perl
use strict;
use warnings;
use Symbol;
die "Usage: $0 file1 [file2 ...]\n" unless scalar(@ARGV);
my @fhs;
foreach(@ARGV){
my $fh = gensym;
open $fh, $_ or die "Unable to open \"$_\"";
push(@fhs, $fh);
}
while (scalar(@fhs)){
my ($result, $n, $a, $i) = (0,0,0,0);
while ($i <= $#fhs){
if ($a = <$fhs[$i]>){
$result += $a;
$n++;
$i++;
}
else{
$fhs[$i]->close;
splice(@fhs,$i,1);
}
}
if ($n){ print $result/$n . "\n"; }
}
This doesn't work. If I debug the script, after I initialize @fhs it looks like this:
DB<1> x @fhs
0 GLOB(0x10443d80)
-> *Symbol::GEN0
FileHandle({*Symbol::GEN0}) => fileno(6)
1 GLOB(0x10443e60)
-> *Symbol::GEN1
FileHandle({*Symbol::GEN1}) => fileno(7)
So far, so good. But it fails at the part where I try to read from the file:
DB<3> x $fhs[$i]
0 GLOB(0x10443d80)
-> *Symbol::GEN0
FileHandle({*Symbol::GEN0}) => fileno(6)
DB<4> x $a
0 'GLOB(0x10443d80)'
$a is filled with this string rather than something read from the glob. What have I done wrong?
You can only use a simple scalar variable inside <> to read from a filehandle. <$foo> works. <$foo[0]> does not read from a filehandle; it's actually equivalent to glob($foo[0]). You'll have to use the readline builtin, a temporary variable, or use IO::File and OO notation.
$text = readline($foo[0]);
# or
my $fh = $foo[0]; $text = <$fh>;
# or
$text = $foo[0]->getline; # If using IO::File
If you weren't deleting elements from the array inside the loop, you could easily use a temporary variable by changing your while loop to a foreach loop.
Personally, I think using gensym to create filehandles is an ugly hack. You should either use IO::File, or pass an undefined variable to open (which requires at least Perl 5.6.0, but that's almost 10 years old now). (Just say my $fh; instead of my $fh = gensym;, and Perl will automatically create a new filehandle and store it in $fh when you call open.)
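Here is a minimal sketch of the readline approach applied to the original row-by-row averaging. The helper name row_averages and the in-memory sample data are assumptions for illustration; handles are dropped by rebuilding the array each pass rather than splicing it mid-loop.

```perl
use strict;
use warnings;

# Hypothetical helper: given a list of open filehandles, return the row-wise
# averages, dropping each handle once it is exhausted.
sub row_averages {
    my @fhs = @_;
    my @averages;
    while (@fhs) {
        my ($sum, $n) = (0, 0);
        my @still_open;
        for my $i (0 .. $#fhs) {
            # readline() accepts any expression; <$fhs[$i]> would be parsed as glob()
            my $line = readline($fhs[$i]);
            if (defined $line) {
                chomp $line;
                $sum += $line;
                $n++;
                push @still_open, $fhs[$i];
            }
            else {
                close $fhs[$i];
            }
        }
        push @averages, $sum / $n if $n;
        @fhs = @still_open;
    }
    return @averages;
}

# In-memory files stand in for the real inputs (sample data is made up).
my @data = ("1\n2\n3\n", "3\n4\n");
my @handles = map { open(my $fh, '<', \$_) or die "open: $!"; $fh } @data;
print "$_\n" for row_averages(@handles);
```

Testing the line with defined, rather than for truth, keeps a line containing just 0 from being mistaken for end of file.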
If you are willing to use a bit of magic, you can do this very simply:
use strict;
use warnings;
die "Usage: $0 file1 [file2 ...]\n" unless @ARGV;
my $sum = 0;
# The current filehandle is aliased to ARGV
while (<>) {
$sum += $_;
}
continue {
# We have finished a file:
if( eof ARGV ) {
# $. is the current line number.
print $sum/$. , "\n" if $.;
$sum = 0;
# Closing ARGV resets $. because ARGV is
# implicitly reopened for the next file.
close ARGV;
}
}
Unless you are using a very old perl, the messing about with gensym is not necessary. IIRC, perl 5.6 and newer are happy with normal lexical handles: open my $fh, '<', 'foo';
I have trouble understanding your logic. Do you want to read several files, each containing just numbers (one number per line), and print the average for each file?
use strict;
use warnings;
my @fh;
foreach my $f (@ARGV) {
open(my $fh, '<', $f) or die "Cannot open $f: $!";
push @fh, $fh;
}
foreach my $fh (@fh) {
my ($sum, $n) = (0, 0);
while (<$fh>) {
$sum += $_;
$n++;
}
print "$sum / $n: ", $sum / $n, "\n" if $n;
}
Seems like a for loop would work better for you, where you could actually use the standard read (iteration) operator.
for my $fh ( @fhs ) {
while ( defined( my $line = <$fh> )) {
# since we're reading integers we test for *defined*
# so we don't close the file on '0'
#...
}
close $fh;
}
It doesn't look like you want to shortcut the loop at all. Therefore, while seems to be the wrong loop idiom.