Why I am not getting "success" with this program? - perl

I have written the following program with the hope of getting success. But I could never get it.
my $fileName = 'myfile.txt';
print $fileName,"\n";
if (open MYFILE, "<", $fileName) {
my $Data;
{
local $/ = undef;
$Data = <MYFILE>;
}
my #values = split('\n', $Data);
chomp(#values);
if($values[2] eq '9999999999') {
print "Success"."\n";
}
}
The content of myfile.txt is
160002
something
9999999999
700021

Try splitting by \s*[\r\n]+
my $fileName = 'myfile.txt';
print $fileName,"\n";
if (open MYFILE, "<", $fileName) {
my $Data;
{
local $/ = undef;
$Data = <MYFILE>;
}
my #values = split(/\s*[\r\n]+/, $Data);
if($values[2] eq '9999999999') {
print "Success";
}
}

If myfile.txt contain carriage return (CR, \r), it will not work as expected.
Another possible cause is trailing spaces before linefeed (LF, \n).

You don't need to read an entire file into an array to check one line. Open the file, skip the lines you don't care about, then play with the line you do care about. When you've done what you need to do, stop reading the file. This way, only one line is ever in memory:
my $fileName = 'myfile.txt';
open MYFILE, "<", $fileName or die "$filename: $!";
while( <MYFILE> ) {
next if $. < 3; # $. is the line number
last if $. > 3;
chomp;
print "Success\n" if $_ eq '9999999999';
}
close MYFILE;

my $fileName = 'myfile.txt';
open MYFILE, "<", $fileName || die "$fileName: $!";
while( $rec = <MYFILE> ) {
for ($rec) { chomp; s/\r//; s/^\s+//; s/\s+$//; } #Remove line-feed and space characters
$cnt++;
if ( $rec =~ /^9+$/ ) { print "Success\n"; last; } #if record matches "9"s only
#print "Success" and live the loop
}
close MYFILE;
#Or you can write: if ($cnt==3 and $rec =~ /^9{10}$/) { print "Success\n"; last; }
#If record 3 matches ten "9"s print "Success" and live the loop.

Related

Perl - substring keywords

I have a text file where is lot of lines, I need search in this file keywords and if exist write to log file line where is keywords and line one line below and one above the keyword. Now search or write keyword not function if find write all and I dont known how can I write line below and above. Thanks for some advice.
my $vstup = "C:/Users/Omega/Documents/Kontroly/testkontroly/kontroly20220513_154743.txt";
my $log = "C:/Users/Omega/Documents/Kontroly/testkontroly/kontroly.log";
open( my $default_fh, "<", $vstup ) or die $!;
open( my $main_fh, ">", $log ) or die $!;
my $var = 0;
while ( <$default_fh> ) {
if (/\Volat\b/)
$var = 1;
}
if ( $var )
print $main_fh $_;
}
}
close $default_fh;
close $main_fh;
The approach below use one semaphore variable and a buffer variable to enable the desired behavior.
Notice that the pattern used was replaced by 'A` for simplicity testing.
#!/usr/bin/perl
use strict;
use warnings;
my ($in_fh, $out_fh);
my ($in, $out);
$in = 'input.txt';
$out = 'output.txt';
open($in_fh, "< ", $in) || die $!."\n";
open($out_fh, "> ", $out) || die $!;
my $p_next = 0;
my $p_line;
while (my $line = <$in_fh>) {
# print line after occurrence
print $out_fh $line if ($p_next);
if ($line =~ /A/) {
if (defined($p_line)) {
# print previous line
print $out_fh $p_line;
# once printed undefine variable to avoid printing it again in the next loop
undef($p_line);
}
# Print current line if not already printed as the line follows a pattern
print $out_fh $line if (!$p_next);
# toggle semaphore to print the next line
$p_next = 1;
} else {
# pattern not found.
# if pattern was not detected in both current and previous line.
$p_line = $line if (!$p_next);
$p_next = 0;
}
}
close($in_fh);
close($out_fh);

Nested if statements: Swapping headers and sequences in fasta files

I am opening a directory and processing each file. A sample file looks like this when opened:
>AAAAA
TTTTTTTTTTTAAAAATTTTTTTTTT
>BBBBB
TTTTTTTTTTTTTTTTTTBBBBBTTT
>CCCCC
TTTTTTTTTTTTTTTTCCCCCTTTTT
For the above sample file, I am trying to make them look like this:
>TAAAAAT
AAAAA
>TBBBBBT
BBBBB
>TCCCCCT
CCCCC
I need to find the "header" in next line sequence, take flanks on either side of the match, and then flip them. I want to print each file's worth of contents to another separate file.
Here is my code so far. It runs without errors, but doesn't generate any output. My guess is this is probably related to the nested if statements. I have never worked with those before.
#!/usr/bin/perl
use strict;
use warnings;
my ($directory) = #ARGV;
my $dir = "$directory";
my #ArrayofFiles = glob "$dir/*";
my $count = 0;
open(OUT, ">", "/path/to/output_$count.txt") or die $!;
foreach my $file(#ArrayofFiles){
open(my $fastas, $file) or die $!;
while (my $line = <$fastas>){
$count++;
if ($line =~ m/(^>)([a-z]{5})/i){
my $header = $2;
if ($line !~ /^>/){
my $sequence .= $line;
if ($sequence =~ m/(([a-z]{1})($header)([a-z]{1}))/i){
my $matchplusflanks = $1;
print OUT ">", $matchplusflanks, "\n", $header, "\n";
}
}
}
}
}
How can I fix this code? Thanks.
Try this
foreach my $file(#ArrayofFiles)
{
open my $fh," <", $file or die"error opening $!\n";
while(my $head=<$fh>)
{
chomp $head;
$head=~s/>//;
my $next_line = <$fh>;
my($extract) = $next_line =~m/(.$head.)/;
print ">$extract\n$head\n";
}
}
There are several mistakes in your code but the main problem is:
if ($line =~ m/(^>)([a-z]{5})/i) {
my $header = $2;
if ($line !~ /^>/) {
# here you write to the output file
Because the same line can't start and not start with > at the same time, your output files are never written. The second if statement always fails and its block is never executed.
open(OUT, ">", "/path/to/output_$count.txt") or die $!; and $count++ are misplaced. Since you want to produce an output file (with a new name) for each input file, you need to put them in the foreach block, not outside or in the while loop.
Example:
#!/usr/bin/perl
use strict;
use warnings;
my ($dir) = #ARGV;
my #files = glob "$dir/*";
my $count;
my $format = ">%s\n%s\n";
foreach my $file (#files) {
open my $fhi, '<', $file
or die "Can't open file '$file': $!";
$count++;
my $output_path = "/path/to/output_$count.txt";
open my $fho, '>', $output_path
or die "Can't open file '$output_path': $!";
my ($header, $seq);
while(<$fhi>) {
chomp;
if (/^>([a-z]{5})/i) {
if ($seq) { printf $fho $format, $seq =~ /([a-z]$header[a-z])/i, $header; }
($header, $seq) = ($1, '');
} else { $seq .= $_; }
}
if ($seq) { printf $fho $format, $seq =~ /([a-z]$header[a-z])/i, $header; }
}
close $fhi;
close $fho;

Failed to open GLOB error

I've noticed that when you drag & drop a file into OS X Terminal and any part of the pathway contains a space (for example in a folder name) it substitutes this for a \
This then leads to an error in opening files in my script:
use strict;
use warnings;
use File::Basename;
my $in;
my $filename = $ARGV[0];
unless ($filename){
print "\n\nPlease drag and drop a FASTA/FA or plain-text file containing your sequence into the prompt window and hit ENTER. Alternatively, manually specify the file-pathway:\n";
$filename = <STDIN>;
chomp $filename;
}
open($in, $filename) or die "Failed to open $in: $!";
my $DNA = read_fasta($in);
my $len = length($DNA);
print "\nFASTA/Sequence Length is: $len bp \n";
print "\nPlease enter restriction sites (degeneracy characters permitted) seperated by a single space: ";
my $sites=<STDIN>;
chomp $sites;
my #pats = split ' ', $sites;
for (#pats) {
s/K/[GT]/g;
s/M/[AC]/g;
s/Y/[CT]/g;
s/S/[CG]/g;
s/W/[AT]/g;
s/B/[CGT]/g;
s/V/[ACG]/g;
s/H/[ACT]/g;
s/D/[AGT]/g;
s/X/[AGCT]/g;
s/R/[AG]/g;
s/N/[AGCT]/g;
}
for (#pats) {
my $m = () = $DNA =~ /$_/gi;
print "\nTotal DNA matches to $_ are: $m \n";
}
my $DIR = dirname($filename);
my $name = basename($filename);
(my $extrem = $name) =~ s/\.[^.]+$//;
open my $out, ">$DIR/$extrem $sites.txt";
my $pat=join("|",#pats);
my #cutarr = split(/$pat/, $DNA);
for (#cutarr) {
my $len = length($_);
print $out "$len \n";
}
print "\nYour results are located at: $DIR/$extrem $sites.txt\n\n";
close($out);
close($in);
#Subfunction - Reading formatted FASTA/FA files
sub read_fasta {
my ($fh) = #_;
my $sequence;
while (<$fh>) {
unless (/^>/) {
chomp;
$sequence .= $_;
}
}
return $sequence;
}
It will open files if the pathway contains no spaces. Is there a better way I can go about opening the file to avoid this occurring?
Try to remove backslashes from your file name,
$filename =~ tr|\\||d;
open(my $in, $filename) or die $!;

Perl. I cannot write on a file. The file stays empty

I wrote the code below.
I cannot find any error in it.
But
say $valid $1;
does not work. $valid file is empty when the program finishes.
What's wrong?
Thanks in advance! ^^
#!/usr/bin/perl
use 5.012;
use strict;
use warnings;
use LWP::Simple;
open my $input, '<', 'c:\perl\015_JiraGet\addrHDP.txt' or die "Cannot read: $!\n";
open my $valid, '<', 'c:\perl\015_JiraGet\valid.txt' or die "Cannot read: $!\n";
my #totalReport;
my $eachAddr;
my $copyFile;
my $copyFilePath = 'c:\perl\015_JiraGet\HADOOP XML\\';
my $tempFile;
my $tempFilePath = 'c:\perl\015_JiraGet\temp.txt';
my $analyzed;
my $analyzedPath = 'c:\perl\015_JiraGet\analyzed - HADOOP.txt';
my $undefCheck;
my $i = 0;
my $j = 0;
my $title = 'temp';
my $dup = 0;
while(<$input>) { chomp; push #totalReport, $_; }
foreach(#totalReport)
{
$eachAddr = $_;
$undefCheck = get($eachAddr);
if(defined($undefCheck) == 0) { next; }
# Copy one XML file to 'temp.txt' and then close the file.
open $tempFile, '>', $tempFilePath or die "Cannot open 1: $!\n";
print $tempFile get($eachAddr);
close $tempFile;
# If the entry is a duplicate, go on to the next entry
open $tempFile, '<', $tempFilePath or die "Cannot open 2: $!\n";
($title, $dup) = isDuplicate($tempFile, $title);
if($dup == 1) { close $tempFile; next; }
close $tempFile;
say ++$i . "th report!!!";
# Copy one XML file to HDD.
if($eachAddr =~ /.*\/(.*)/)
{
say $valid $1;
open $copyFile, '>', $copyFilePath . $1 or die "Cannot open 3: $!\n";
print $copyFile get($eachAddr);
close $copyFile;
}
# If the entry is NOT fixed or resolved, go on to the next entry
open $tempFile, '<', $tempFilePath or die "Cannot open 4: $!\n";
if(isFixCloseResolve($tempFile) == 0) { close $tempFile; next; }
close $tempFile;
# Analyze one entry
open $tempFile, '<', $tempFilePath or die "Cannot open 5: $!\n";
open $analyzed, '>>', $analyzedPath or die "Cannot open 6: $!\n";
analyzeOneReport($tempFile, $analyzed);
close $tempFile;
close $analyzed;
say ' ' . ++$j . "th fixed & closed report!!!";
}
say "$i total reports.";
say "$j total fixed & closed reports.";
close $input;
close $valid;
say "Finished!";
sub isDuplicate
{
my $iReport = $_[0];
my $title = 'temp';
my $dup = 0;
while(<$iReport>)
{
if ($_ =~ /.*\<title>(.*)\<\/title>/)
{
if($1 ne 'ASF JIRA') { $title = $1; if($title eq $_[1]) { $dup = 1; } last; }
}
}
return ($title, $dup);
}
# returns 1 if an entry is a Bug and Fixed and Closed
sub isFixCloseResolve
{
my $iReport = $_[0];
my $isCloseResolve = 0;
my $isFixed = 0;
while(<$iReport>)
{
if ($_ =~ /.*\<status[^>]*>(.*)\<\/status>/) { if(($1 eq 'Closed')||($1 eq 'Resolved')) { $isCloseResolve = 1;} }
elsif($_ =~ /.*\<resolution[^>]*>(.*)\<\/resolution>/) { if($1 eq 'Fixed') { $isFixed = 1;} }
}
return $isCloseResolve * $isFixed;
}
sub analyzeOneReport
{
my $iReport = $_[0];
my $oReport = $_[1];
while(<$iReport>)
{
chomp;
if ($_ =~ /.*\<title>(.*)\<\/title>/) { if($1 ne 'ASF JIRA') { say $oReport "Title : $1"; } }
elsif($_ =~ /.*\<assignee username="(.*)">.*\<\/assignee>/) { say $oReport "Assignee: $1"; }
elsif($_ =~ /.*\<reporter username="(.*)">.*\<\/reporter>/) { say $oReport "Reporter: $1"; }
elsif($_ =~ /.*\<type[^>]*>(.*)\<\/type>/) { say $oReport "Type : $1"; }
elsif($_ =~ /.*\<priority[^>]*>(.*)\<\/priority>/) { say $oReport "Priority: $1"; }
elsif($_ =~ /.*\<created>(.*)\<\/created>/) { say $oReport "Created : $1"; }
elsif($_ =~ /.*\<resolved>(.*)\<\/resolved>/) { say $oReport "Resolved: $1"; }
}
say $oReport '--------------------------------------------';
}
--- Postscript ---
Oh, I was wrong on '>' part!! Thank you everyone!!
But when I changed that into '>', still nothing was written on the file 'DURING PROGRAM RUNNING TIME'.
So I was confused...and I found that Perl actually writes the contents to the file 'WHEN IT CLOSED THE FILE'.
So during running time, for 4~8 hours, I could not see anything in the file.
Data is written on the file when the file is closed.
That's one of the reason why I thought this code was not working. ^^;
Hope nobody else suffer from this problem again! :)
Here:
open my $valid, '<',....
you have opened $valid for reading. If you wish to write to the file, you must instead write:
open my $valid, '>',....
If you need to keep existing contents and write only to the end then instead use
open my $valid, '>>',....
You're only opening the file under the $valid file handle with read capabilities, as you can see from this line:
open my $valid, '<', 'c:\perl\015_JiraGet\valid.txt' or die "Cannot read: $!\n";
So nothing you write to the file will actually go into it. Change it to read/write (or append, if you need it, just use +>> instead of +> in the code below), and you should be good, as follows:
open my $valid, '+>', 'c:\perl\015_JiraGet\valid.txt' or die "Cannot read: $!\n";
I am going to review this code as if it had been posted to Code Review.
First off you are writing in Perl as if it were C. Which in general is not that bad, but it does mean that you are doing quite a bit more work than is necessary.
Instead of using this overly verbose, and potentially memory intensive:
my #totalReport
...
while(<$input>) { chomp; push #totalReport, $_; }
foreach(#totalReport)
{
$eachAddr = $_;
...
}
while( my $addr = <$input> ){
chomp $addr;
...
}
Notice how I've eliminated a variable, and made it so that it loops on the input once, instead of twice. It also doesn't keep the values in memory over the entire length of the program.
Instead of opening a file for writing, closing it, and opening it again:
my $tempFile;
open $tempFile, '>', $tempFilePath or die "Cannot open 1: $!\n";
print $tempFile get($eachAddr);
close $tempFile;
open $tempFile, '<', $tempFilePath or die "Cannot open 2: $!\n";
open my $tempFile, '+>', $tempFilePath or die "Can't open '$tempFilePath' with mode'+>': '$!'";
print $tempFile get($eachAddr);
seek $tempFile, 0, 0;
Instead of getting the text at the given URL twice, and using a weird defined test:
$undefCheck = get($eachAddr);
if(defined($undefCheck) == 0) { next; }
...
print $tempFile get($eachAddr);
my $text = get( $addr );
next unless defined $text;
...
print $tempFile $text;
Instead of a bunch of:
open ... or die ...;
I would use autodie.
use autodie;
...
# will now die on error and will tell you the file it fails on.
open my $fh, '<', $filename;
Another thing I would like to point out is that die "...\n" prevents die from appending the location of the error. The only time you should do that is if the default behaviour is unhelpful.
If you closed $tempFile before checking $dup this could be simpler:
if($dup == 1) { close $tempFile; next; }
close $tempFile;
close $tempFile;
next if $dup;
Instead of this repetitive block of code:
while(<$iReport>)
{
chomp;
if ($_ =~ /.*\<title>(.*)\<\/title>/) { if($1 ne 'ASF JIRA') { say $oReport "Title : $1"; } }
elsif($_ =~ /.*\<assignee username="(.*)">.*\<\/assignee>/) { say $oReport "Assignee: $1"; }
elsif($_ =~ /.*\<reporter username="(.*)">.*\<\/reporter>/) { say $oReport "Reporter: $1"; }
elsif($_ =~ /.*\<type[^>]*>(.*)\<\/type>/) { say $oReport "Type : $1"; }
elsif($_ =~ /.*\<priority[^>]*>(.*)\<\/priority>/) { say $oReport "Priority: $1"; }
elsif($_ =~ /.*\<created>(.*)\<\/created>/) { say $oReport "Created : $1"; }
elsif($_ =~ /.*\<resolved>(.*)\<\/resolved>/) { say $oReport "Resolved: $1"; }
}
use List::Util qw'max';
my #simple_tags = qw'title type priority created resolved';
my $simple_tags_length = max map length, #simple_tags, qw'assignee reporter';
my $simple_tags = join '|', #simple_tags;
...
while( <$iReport> ){
my($tag,$contents);
if( ($tag,$contents) = /<($simple_tags)[^>]*>(.*?)<\/\g{1}>/ ){
}elsif( ($tag,$contents) = /<(assignee|reporter) username="(.*?)">.*?<\/\g{1}>/ ){
}else{ next }
printf $oReport "%-${simple_tags_length}s: %s\n", ucfirst($tag), $contents;
}
While this code isn't any shorter, or clearer, it would be very easy to add another tag to compare against. So it isn't really better, as less repetitive.
I would like to point out that $_ =~ /.../ is better written as /.../.
You could use or instead of if/elsif/else with empty blocks.
...
while( <$iReport> ){
/<($simple_tags)[^>]*>(.*?)<\/\g{1}>/
or /<(assignee|reporter) username="(.*?)">.*?<\/\g{1}>/
or next;
my($tag,$contents) = ($1,$2);
printf $oReport "%-${simple_tags_length}s: %s\n", ucfirst($tag), $contents;
}
It might be best to combine them into a single regex using /x and (?<NAME>REGEX) syntax with %- or %+.
...
while( <$iReport> ){
/
(?:
# simple tags
< (?<tag> $simple_tags ) [^>]* >
# contents between tags
(?<contents> .*? )
|
# tags with contents in `username` attribute
<
(?<tag> assignee|reporter )
[ ]
# contents in `username` attribute
username = "(?<contents> .*? )"
>
.*? # throw out stuff between tags
)
<\/ \g{tag} > # end tag matches start tag
/x or next; # skip if it doesn't match
printf $oReport "%-${simple_tags_length}s: %s\n", ucfirst($+{tag}), $+{contents};
}
Or even use (DEFINE) (I'll leave that as an exercise for the reader since this is already fairly long).
Perhaps the worst part of the code is that you define almost all of you variables up-front.
my #totalReport;
my $eachAddr;
my $copyFile;
my $copyFilePath = 'c:\perl\015_JiraGet\HADOOP XML\\';
my $tempFile;
my $tempFilePath = 'c:\perl\015_JiraGet\temp.txt';
my $analyzed;
my $analyzedPath = 'c:\perl\015_JiraGet\analyzed - HADOOP.txt';
my $undefCheck;
my $i = 0;
my $j = 0;
my $title = 'temp';
my $dup = 0;
This means that you are practically using global variables. While some of these appear to need to be defined there, some of them don't, and therefore shouldn't be defined there. You should really be defining your variables at the point you need them, or at least at the beginning of the block where you need them.
The reason you aren't getting the output until the file is closed is because Perl buffers the output.
Perl normally buffers output so it doesn't make a system call for every bit of output. By saving up output, it makes fewer expensive system calls. …
- perlfaq5
The old way to turn off buffering is to select the file for output and set $| to a non-zero value, and then re-select the original output.
{
my $previous_default = select($file); # save previous default output handle
$| = 1; # autoflush
select($previous_default); # restore previous default output handle
}
The new way is to use $file->autoflush which comes from IO::Handle.
(The module will get automatically loaded for you on recent versions of Perl 5)
You can also flush the output when you choose by using flush or $file->flush.
IO::Handle also adds a $file->printflush which turns on autoflush temporarily during the print.

Using perl, how do I search a text file for _NN (at the end of a word) and print the word in front?

This gives the whole line:
#!/usr/bin/perl
$file = 'output.txt';
open(txt, $file);
while($line = <txt>) {
print "$line" if $line =~ /_NN/;
}
close(txt);
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
binmode(STDOUT, ":utf8") || die;
my $file = "output.txt";
open(TEXT, "< :utf8", $file) || die "Can't open $file: $!";
while(<TEXT>) {
print "$1\n" while /(\w+)_NN\b/g;
}
close(TEXT) || die "Can't close $file: $!";
Your answer script reads a bit awkwardly, and has a couple of potential errors. I'd rewrite the main logic loop like so:
foreach my $line (grep { /expend_VB/ } #sentences) {
my #nouns = grep { /_NN/ } split /\s+/, $line;
foreach my $word (#nouns) {
$word =~ s/_NN//;
print "$word\n";
}
print "$line\n" if scalar(#nouns);
}
You need to put the my declaration inside the loop - otherwise it will persist longer than you want it to, and could conceivably cause problems later.
foreach is a more common perl idiom for iterating over a list.
print "$1" if $line =~ /(\S+)_NN/;
#!/usr/bin/perl
use strict;
use warnings FATAL => "all";
my $search_key = "expend"; ## CHANGE "..." to <>
open(my $tag_corpus, '<', "ch13tagged.txt") or die $!;
my #sentences = <$tag_corpus>; # This breaks up each line into list
my #words;
for (my $i=0; $i <= #sentences; $i++) {
if ( defined( $sentences[$i] ) and $sentences[$i] =~ /($search_key)_VB.*/i) {
#words = split /\s/,$sentences[$i]; ## \s is a whitespace
for (my $j=0; $j <= #words; $j++) {
#FILTER if word is noun:
if ( defined( $words[$j] ) and $words[$j] =~ /_NN/) {
#PRINT word and sentence:
print "**",split(/_\S+/,$words[$j]),"**", "\n";
print split(/_\S+/,$sentences[$i]), "\n"
}
} ## put print sentences here to print each sentence after all the nouns inside
}
}
close $tag_corpus || die "Can't close $tag_corpus: $!";