Open (IN...) command failing possibly due to problems with naming - perl

New to Perl and quite new to coding in general so I apologise if this is formatted terribly and an easy question! Trying simply to input somebody's elses code as a step in a larger project involving PRAAT. The code is designed to distinguish beats in speech rhythm, I've followed their nomenclature in file naming (on line 2) but the code won't move past line 13. Could anyone tell me why? Is it trying to open a directory called "intensities"? Additionally, is there anywhere else I may have to change the code, it is quite possibly reasonably old! Thank you very much!
#!/usr/local/bin/perl -w
scalar(#ARGV) == 1 or scalar(#ARGV) == 2 or die "Usage: getBeatsOneShot.pl someSoundFile <threshold>";
$stem = shift;
# Parameters to fiddle with
if (scalar(#ARGV) == 0) {
$threshold = 0.2;
} else {
$threshold = shift;
print "Threshold is $threshold\n";
}
open(IN, "intensities/$stem.intensity") or die "badly";
open(OUT, ">beats/$stem.beats") or die "eek";
# File type = "ooTextFile short"
$_ = <IN>; print OUT $_;
# replace "Intensity" with "TextGrid"
$_ = <IN>; print OUT "\"TextGrid\"\n\n";
# skip a line
$_ = <IN>;
chomp($xmin = <IN>);
chomp($xmax = <IN>);
chomp($nx = <IN>); $nx = 0; #(just suprress a arning here)
chomp($dx = <IN>);
chomp($x1 = <IN>);
# Read in intensity contour into #e (envelope)
#e = ();
while($_ = <IN>) { chomp; last unless $_ eq "1";}
push #e, $_;
while($_ = <IN>) {
chomp($_);
push #e, $_;
}
# (1) Find max and min
$max = 0; $min = 1000000;
foreach $ival (#e) {
if($ival > $max) {
$max = $ival;
}
if($ival < $min) {
$min = $ival;
}
}
# (2) look for beats
#beats = ();
print "Thresh: $threshold\n";

open doesn't create the path to the file. Directories intensities/ and beats/ therefore must exist in the current working directory before the script is run.
When open fails, it sets $! to the reason of the failure. Instead of eek or badly, use die $! so Perl can tell you what went wrong.
Moreover, you should turn strict and warnings on. They prevent many common mistakes. As a newbie, you might like to enable diagnostics, too, to get detailed explanations of all the errors and warnings.

Related

Perl: CAM::PDF makes changes, but these don't materialise in final doc

I've been using the CAM::PDF module to try editing pdf docs at work - essentially just trying to change the date on docs automatically to show they have been reviewed recently
unfortunately, despite my code telling me that I am making changes to the PDF objects ($pdf->{changes})
and giving the pdfs the doc is attempting to change maximum accessibility (anyone can access, read, write)
the pdf's outputted never seem to materialise with these changes. I have also been grepping the object node tmp files I output on mass and found that all of these show no sign of the old date after running the code; yet when I view the pdf after running it, the old date is still on the pdf. Has anyone encountered this before or can suggest anything?
just doing this manually isn't an option; I want to script this so I can have a script I just run against multiple files at once (I have LOTS of these files to sort out at work) but other than changing dates written on the doc, the doc has to remain looking the sameish (by which I mean, it would be ok if they changed in size a little but not ok if they completely changed in appearance)
I strictly followed the example changepdfstring.pl (https://metacpan.org/pod/distribution/CAM-PDF/bin/changepdfstring.pl) from the author of the module CAM::PDF on how to do this for my code, then tried different variations of it to try and get things to work - so I'm bemused that nothing has worked in the end
#!/usr/bin/perl
use strict;
use warnings;
use CAM::PDF;
use Data::Dumper;
my $pdf = CAM::PDF->new('Order fulfilment process flowchart.pdf');
if (!$pdf->canModify())
{
die "This PDF forbids modification\n";
}
my $olddate = "15.02.2019";
my $newdate = "22.02.2022";
foreach my $objectnumber (keys %{$pdf->{xref}}){
my $objectnode = $pdf->dereference($objectnumber);
$pdf->changeString($objectnode, {$olddate=>$newdate});
}
my $change = $pdf->{changes};
print Dumper($change);
my $count = 0;
foreach my $objectnumber (keys %{$pdf->{xref}}){
my $objectnode = $pdf->dereference($objectnumber);
$count++;
open (ONO, ">tmp.objectnode.$count");
print ONO Dumper($objectnode);
close (ONO);}
if (!scalar %{$pdf->{changes}})
{
die "no changes were made :(";
}
$pdf->preserveOrder();
$pdf->cleanoutput('pleasework.pdf');
Any help or advice would be greatly appreciated
A quick search in page 145 of the PDF specification[1] shows that there are 2 metadata fields that should allow a simple change to achieve what you are trying to do.
CreationDate
ModDate
Below you can find a quick script using CAM::PDF to set/modify the ModDate with the current date, thus giving the illusion of "modifying" the PDF.
The script can, if needed, be amended to use a specific date instead of the current time to set the modification date.
Please note that I'm not sure that CAM::PDF is the best option to get this task done.
The script is a only a sample of what can be done within the limitations and simplicity of CAM::PDF.
[1] https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/pdf_reference_archives/PDFReference.pdf
#!/usr/bin/env perl
use strict;
use warnings;
use Time::Local;
use CAM::PDF;
use CAM::PDF::Node;
my $infile = shift || die 'syntax...';
my $outfile = shift || die 'syntax...';
my $pdf = CAM::PDF->new($infile) || die;
my $info = $pdf->getValue($pdf->{trailer}->{Info});
if ($info) {
my #time = localtime(time);
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = #time;
$year += 1900;
$mon++;
my $gmt_offset_in_seconds = timegm(#time) - timelocal(#time);
my $gmt_offset_min = ($gmt_offset_in_seconds / 60) % 60;
my $gmt_offset_hour = abs(int($gmt_offset_in_seconds / (60*60)));
my $offset_char = "";
if ($gmt_offset_in_seconds < 0) {
$offset_char = "-";
} else {
$offset_char = "+";
}
my $date = sprintf("D:%04d%02d%02d%02d%02d%02d%s%02d'%02d'", $year, $mon, $mday, $hour, $min, $sec, $offset_char, $gmt_offset_hour, $gmt_offset_min);
my $objnum = undef;
for my $obj ('Creator', 'Producer', 'CreationDate') {
if (exists $info->{$obj} and exists $info->{$obj}->{objnum}) {
$objnum = $info->{$obj}->{objnum};
last;
}
}
die "Cannot find objnum, halting..." if not defined $objnum;
my $mod_date = $info->{ModDate};
if ($mod_date) {
$mod_date->{value} = $date;
} else {
my $mod_date = new CAM::PDF::Node('string',$date);
$mod_date->{gennum} = 0;
$mod_date->{objnum} = $objnum;
$info->{ModDate} = $mod_date;
}
$pdf->preserveOrder();
$pdf->cleanoutput($outfile);
} else {
print "Cannot find PDF info section, doing nothing!\n";
}
I'm the author of CAM::PDF. Without seeing the PDF, I can only guess but I'd wager that the problem is that $olddate simply doesn't match any text in the doc. Kerning can break strings into multiple parts, for example. Also, there are several different ways to encode strings that appear the same in the resulting doc. So the trick for you will be figuring out what is the pattern for the dates in your specific docs.
That said, I also like the clever idea that #Bruce Ramos offered in a separate answer. That approach won't change a date that's visible in the rendered PDF (like, if you print it) but it should show up as metadata in nearly any PDF viewer.
I found that the line I was trying to edit was not actually a contiguous set of characters in the pdf, but rather it was inside a TJ operator in a BT line in the PDF. I cannot see any provision for handling cases where the desired text is in TJ lines in the CAM::PDF library (although perhaps there is #ChrisDolan ?) hence it was unable to be operated on nor "swapped out" by CAM::PDF. After decompressing all the streams (where applicable) I found this 'TJ' line which had the text I wished to operate on:
[(D)-20(a)24(t)62(e)-46(:)86( )-46(1)52(5)-37(.)70(0)-37(2)52(.)-20(2)52(0)-37(1)52(9)] TJ
I don't believe it would have been possible for CAM::PDF to act on TJ lines, perhaps it can only act on Tj lines
For anyone looking for a quick answer to this same problem, this "dirty" script worked for me in this case:
#!/usr/bin/perl
use strict;
use Compress::Raw::Zlib;
use bytes;
open(OUT,'>', "newfromoldscript.pdf");
my $fname = 'Order fulfilment process flowchart.pdf';
open(FILE, '<:raw', $fname) || die("can't open($fname): $!");
$/ = undef;
my $file = <FILE>;
my $file_len = length($file);
my $i = 0;
my $offset;
my $offset;
my $o;
do {
$o = doX(substr($file, $offset, $file_len), $i);
$offset+=$o;
$i++;
} while($o && $i< 100);
sub doX {
my $file = shift;
my $i = shift;
my $stream = index($file, "\nstream");
if ($stream < 0) {
print OUT $file;
return 0;
}
$stream++;
my $deflate = 1;
my $line_before = rindex(substr($file,0,$stream), "<<");
print OUT substr($file,0,$line_before);
my $x = substr($file, $line_before,$stream-$line_before);
if ($i == 22) {
print "";
}
my $stream_len;
if ($x =~ /FlateDecode\/Length (\d+)>>/) {
$stream_len = $1;
}
if ($x =~ /FlateDecode\/Length (\d+)\//) {
print "Warn Object $i has len/len what the even is this?\n";
$stream_len = $1;
}
if ($x =~ /XML\/Length (\d+)>>/) {
$deflate = 0;
$stream_len = $1;
}
if (!$stream_len) {
die("I fail with no stream len : $x");
}
print "-->$line_before,$i,$stream=$stream_len=$x<--\n";
my $bytes = substr($file, $stream+8,$stream_len);
my $orig_bytes = $bytes; # inflate seems to mangle bytes, so take a copy
my $o;
my $d=new Compress::Raw::Zlib::Inflate();
if ($deflate) {
$d->inflate($bytes,$o);
} else {
$o = $bytes;
}
my $orig_x = $x;
my $changes;
my %change = (
'-20(2)52(0)-37(.)52(.)' => '-20(2)52(0)-37(2)52(0)', #trialling different reg ex's here
'-37(1)52(9)'=>'-37(2)52(0)', #reg ex's
'Date: 15.02.2019'=>'Date: 12.02.2020',
'[(A)[\d-]+(p)[\d-]+(p)[\d-]+(r)[\d-]+(o)[\d-]+(ve)[\d-]+(d)[\d-]+( )[\d-]+(B[^\]]+\] TJ'=>'(Approved By: George W) Tj??G-TAG??' #scrap the whole TJ, replace for Tj
);
foreach my $re (keys %change) {
my $to = $change{$re};
$re =~ s/([\(\)])/\\\1/g; # escape round brackets
print $re;
open (GW, ">tmp.gw");
print GW $re;
close (GW);
if ($o=~/$re/m) {
$o =~ s/$re/$to/mg;
print $o;
$changes++;
}
}
if ($changes) {
print "\n MADE CHANGES\n";
#split, get rid of the ? mark tag
my #remains = split('\?\?G-TAG\?\?', $o);
my $firsthalf = $remains[0];
my $secondhalf = $remains[1];
#reverse the string
$firsthalf = scalar reverse ($firsthalf);
if ($firsthalf =~ m/fT 52\.8 2F/){print "FOUND THE REVERSE"}
$firsthalf =~ s/fT 52\.8 2F/fT 52\.8 0F/;
#reg ex to back track to the nearest and thus relevant Font/F and set it to F0
#put it back in correct orientation
$firsthalf = scalar reverse ($firsthalf);
$o = join("", $firsthalf, $secondhalf);
open (WEIRD, ">tmp.weird");
print WEIRD $firsthalf;
close (WEIRD);
$changes++;
my $d = new Compress::Raw::Zlib::Deflate();
my $obytes;
my $obytes2;
my $status = $d->deflate($o, $obytes);
$d->flush($obytes2);
$bytes = $obytes . $obytes2;
if (length($bytes) != $stream_len) {
my $l = length($bytes);
print "-->$x<--\n";
warn("what do we do here $l != $stream_len");
$orig_x =~ s/$stream_len/$l/;
}
print OUT $orig_x . "stream\r\n";
print OUT $bytes . "\r";
} else {
print OUT $orig_x . "stream\r\n";
print OUT $orig_bytes . "\r";
}
open(TMP,">out/tmp.$i.bytes");
print TMP $o;
close(TMP);
return $stream + 8 + $stream_len + 1;
}
Essentially I swap out the TJ for a Tj for changing someone elses name on the document to my name, which makes it simpler to insert my change (but potentially messy). To enable this to display with capitalised letters, I had to reverse the string and swap out the font (F) it was under (F2) to F0
For the TJ line relating to date, I swapped out the TJ characters for the date I wished to change it to, this meant I had to abide by the "unfriendly" syntax TJ operator lines abide by

I can't open a new file to write to in perl

I have looked on several links and other questions to try and find a solution but I still can't open the file that I'm trying to open. This is the block of code that I can't get to function:
$filename = "Related Traits: Chromosome 1";
open ($output1, ">", "gwasfiles4/$filename".".txt");
$length1 = scalar(#chr1);
if ($length1 > 1) {
#chr1 = sort {$a <=> $b} #chr1;
for ($x = 0; $x <= $length1; $x++){
for ($y = $x + 1; $y <= $length1 - 1; $y++){
if (abs($chr1[$x] - $chr1[$y]) < 500000){
print $output1 "$chr1[$x]\t$chr1[$y]\n";
}
}
}
}
When I run this, I got the error:
print() on closed filehandle $output at file.pl line 94
Why is the file not opened?
The file now opens with this but is empty:
my #chr1;
my $filename = "Related_Traits_Chromosome_1_$ARGV[0]";
open (my $output1, '>', "gwasfiles4/$filename") or die $!;
my $length1 = scalar(#chr1);
if ($length1 > 1) {
#chr1 = sort {$a <=> $b} #chr1;
for (my $x = 0; $x <= $length1; $x++){
for (my $y = $x + 1; $y <= $length1 - 1; $y++){
if (abs($chr1[$x] - $chr1[$y]) < 500000){
print $output1 "$chr1[$x]\t$chr1[$y]\n";
}
}
}
}
use strict; and use warnings; should always be at the start of your program. Fix the errors they generate first, and you'll have better code.
So too - you should check the error code from open:
open my $output1, '>', "gwasfiles4/$filename.txt" or die $!;
This will print the error that open generated if it failed. I'll guess either gwasfiles4 doesn't exist, or your OS doesn't like the filename with an embedded :.
If strict and warnings don't help enough, use diagnostics; will give you yet another layer of information about the problem.
autodie is particularly useful - it puts that or die $! line after each open statement automatically. (and a few other things besides).
I would suggest as a point of style - enclose your lexical filehandle in {} when printing, as this makes it very clear it's a filehandle.
print {$output1} "$chr1[$x]\t$chr1[$y]\n";
Edit:
Following your changes, you have a completely different problem:
my #chr1; #create empty array
my $length1 = scalar(#chr1); # scalar here takes length, array is empty, therefore length is _always_ zero.
if ($length1 > 1) { #therefore never happens
I suggest that either the directory gwasfiles4 doesn't exist or you'rte working on a Windows system which doesn't allow colons : in its filenames

How to look at the next line of a file in Perl

I have a piece of code which opens up a file and parses it. This text document has a redundant structure and has multiple entries. I need to peek ahead within my loop to see if there is a new entry, if there is, I will be able to parse all of the data my program extracts. Let me first show my implementation so far
use strict;
my $doc = open(my $fileHandler, "<", "test.txt");
while(my $line = <$fileHandler>) {
## right here I want to look at the next line to see if
## $line =~ m/>/ where > denotes a new entry
}
Try handling the iteration yourself:
my $line = <$fileHandler>;
while(1) { # keep looping until I say so
my $nextLine = <$fileHandler>;
if ($line =~ m/>/ || !defined $nextLine) {
### Do the stuff
}
### Do any other stuff;
last unless defined $nextLine;
$line = $nextLine;
}
I added the extra check in the if statement under the assumption that you will also want to process what you have when you reach the end of the file.
Alternatively, as suggested by friedo, if the file can fit into memory, you can load the whole thing into an array at once:
my #lines = <$fileHandler>;
for (my $i = 0; $i <= $#lines; $i++) {
if ($i == $#lines || $lines[$i+1] =~ />/) {
### Do the stuff
}
}
This is more flexible in that you can access any arbitrary line of the file, in any order, but as mentioned the file does have to be small enough to fit into memory.
A nice way to handle these problems is using Tie::File, which allows you to treat a file like an array, without the performance penalty of actually loading the file into memory. It is also a core module since perl v5.7.3.
use Tie::File;
tie my #file, 'Tie::File', "test.txt" or die $!;
for my $linenr (0 .. $#file) { # loop over line numbers
if ($file[$linenr] =~ /foo/) { # this is the current line
if ($file[$linenr + 1] =~ /^>/ && # this is the next line
$linenr <= $#file) { # don't go past end of file
# do stuff
}
}
}
untie #file; # all done

Skip the problematic DAT and proceed with next DATs and out put the error or missed DATs list in separate text file in Perl

I have another question here, i have several dats and want to merge them. But the script first checks for header of all the DATs and if not matching it will throw error and stop the script. Now i want to run the script skipping the problematic dat and output the error in separate text file with list of errored DAts and reason. Could anyone please help on this. Here is what i have so far:
use strict;
my $rootdir = $ARGV[0];
die "usage: perl mergetxtfiles.pl <folder>" if ($#ARGV != 0);
#$rootdir =~ s/\\/\\\\/g;
print "\nFolder = $rootdir\n\n";
opendir(DIR, $rootdir)
or die "failed opening the directory $rootdir";
open(OF,">:utf8",'combined_'.time.'.dat')
or die "failed opening the file";
my $icr = 0;
my $cnt = 0;
my $header = '';
my $header_flag = 0;
while(my $fname = readdir(DIR)) {
# add extensions if needed
if ($fname =~ m/(\.txt)|(\.dat)|(\.csv)$/i) {
$icr++;
my $fnamepath = $rootdir.'\\'.$fname;
print "\($icr\) $fname\n";
open(IF, "<:utf8", $fnamepath)
or die "ERROR: cannot open the file\n$fnamepath ";
my $sep_icr = 0;
while(<IF>) {
my $line = $_;
chomp $line;
next if (/^$/);
$sep_icr++;
$cnt++;
my #ar = split(/\t/,$line);
if ($cnt == 1) {
$header_flag = 1;
$header = $line;
}
if ($sep_icr == 1 and $header_flag == 1) {
#print "$line \n $header\n";
if ($line ne $header) {
die "Headers are not same\n";
}
elsif (($line eq $header) and ($cnt >1)) {
print "INFO\: ignoring the same header for $fname \n";
$cnt--;
next;
}
}
print OF $line."\n";
}
print "\--Line count= $sep_icr\n\n";
close IF;
#print OF "\n";
}
}
print "\-\-\> Total line count= $cnt\n";
Named Loops
In your loop, we have to change your if-clause and the outer loop a bit:
FILE:
while(my $fname = readdir(DIR)) {
...;
if ($line ne $header) {
logger($fname, "Headers not matching");
next FILE;
}
...;
}
In Perl, loops can be labeled, so we can specify which loop we do next, instead of setting and checking flags. I used an example logging function loggeras given below, but you can substitute it with an appropriate print statement.
Logging
This is probably a bit more than asked, but here is a little logging function for flexibility. Arguments are a filename, a reason, and an optional severity. You can remove the severity code if it isn't needed. The severity is optional anyway and defaults to debug.
open my $logfile, ">>", "FILENAME" or die "..."; # open for append
sub logger {
my ($file, $reason, $severity) = (#_, 'debug');
$severity = {
debug => '',
info => 'INFO',
warn => '!WARN!',
fatal => '!!!ERROR!!!',
}->{$severity} // $severity; # transform the severity if it is a name we know
$severity .= ' ' if length $severity; # append space if we have a severity
print {$logfile} $severity . qq{$reason while processing "$file"\n};
}
If called with logger("./foo/bar", "Headers not matching", 'warn') it will output:
!WARN! Headers not matching while processing "./foo/bar"
Change the printed error message to something more machine-readable if needed.
Style tips and tricks:
If find these lines more elegant:
die "usage: ...\n" unless #ARGV;
my ($rootdir) = #ARGV;
note the newline at the end (supresses the "at line 3" etc). In scalar context, an array returns the array length. In the second line we can avoid array subscripting by assigning in list context. Surplus elements are ignored.
Instead
if ($fname =~ m/(\.txt)|(\.dat)|(\.csv)$/i) { ...; }
we can say
next unless $fname =~ m/(?: \.txt | \.dat | \.csv )$/xi;
and avoid unneccessary intendation, therefore improving readability.
I modified the regex so that all suffixes must come at the end, not only the .csv suffix, and added the /x modifier so that I can use non-semantic whitespace inside the regex.
Windows, and pretty much any OS, understand forward slashes in path names. So instead
my $fnamepath = $rootdir.'\\'.$fname;
we can write
my $fnamepath = "$rootdir/$fname";
I find that easier to write and understand.
The
while(<IF>) {
my $line = $_;
construct can be simplified to
while(my $line = <IF>) {...}
Last but not least, consider starting a habit of using filehandles with my. Often, global filehandles are not needed and can cause some bugs.

Perl module to find out whether a word is a verb/noun/adjective/article/preposition

I have a list of words and I want to group them into different groups depending on whether they are verbs/adjectives/nouns/etc. So, basically I am looking for a Perl module which tells whether a word is verb/noun etc.
I googled but couldn't find what I was looking for. Thanks.
Lingua::EN::Tagger, Lingua::EN::Semtags::Engine, Lingua::EN::NamedEntity
See the Lingua::EN:: namespace in CPAN. Specifically, Link Grammar and perhaps Lingua::EN::Tagger can help you. Also WordNet provides that kind of information and you can query it using this perl module.
follow code perl help you to find all this thing in your text file in your folder only give the path of directory and it will process all file at once and save result in report.txt file strong text
#!/usr/local/bin/perl
# for loop execution
# Perl Program to calculate Factorial
sub fact
{
# Retriving the first argument
# passed with function calling
my $x = $_[0];
my #names = #{$_[1]};
my $length = $_[2];
# checking if that value is 0 or 1
if ($x < $length)
{
#print #names[$x],"\n";
use Lingua::EN::Fathom;
my $text = Lingua::EN::Fathom->new();
# Analyse contents of a text file
$dirlocation="./2015/";
$path =$dirlocation.$names[$x];
$text->analyse_file($path); # Analyse contents of a text file
$accumulate = 1;
# Analyse contents of a text string
$text->analyse_block($text_string,$accumulate);
# TO Do, remove repetition
$num_chars = $text->num_chars;
$num_words = $text->num_words;
$percent_complex_words = $text->percent_complex_words;
$num_sentences = $text->num_sentences;
$num_text_lines = $text->num_text_lines;
$num_blank_lines = $text->num_blank_lines;
$num_paragraphs = $text->num_paragraphs;
$syllables_per_word = $text->syllables_per_word;
$words_per_sentence = $text->words_per_sentence;
# comment needed
%words = $text->unique_words;
foreach $word ( sort keys %words )
{
# print("$words{$word} :$word\n");
}
$fog = $text->fog;
$flesch = $text->flesch;
$kincaid = $text->kincaid;
use strict;
use warnings;
use 5.010;
my $filename = 'report.txt';
open(my $fh, '>>', $filename) or die "Could not open file '$filename' $!";
say $fh $text->report;
close $fh;
say 'done';
print($text->report);
$x = $x+1;
fact($x,\#names,$length);
}
# Recursively calling function with the next value
# which is one less than current one
else
{
done();
}
}
# Driver Code
$a = 0;
#names = ("John Paul", "Lisa", "Kumar","touqeer");
opendir DIR1, "./2015" or die "cannot open dir: $!";
my #default_files= grep { ! /^\.\.?$/ } readdir DIR1;
$length = scalar #default_files;
print $length;
# Function call and printing result after return
fact($a,\#default_files,$length);
sub done
{
print "Done!";
}