Eliminate empty files in a subroutine in Perl

I want to add code to the following script to eliminate the empty output files.
The script converts a single fastq file, or all the fastq files in a folder, to fasta format; every output fasta file keeps the same name as its fastq file. The script has an option to exclude all sequences that contain a given number of consecutive N repeats (e.g. NNNNNNNNNNNNNNNNNNATAGTGAAGAATGCGACGTACAGGATCATCTA). I added this option because some sequences consist only of Ns; for example, with -n 15 it will exclude every sequence that contains 15 or more consecutive Ns. Up to this point the code works well, but it generates empty files (for those fastq files in which every sequence has 15 or more N repeats and is excluded). I want to eliminate all the empty files (those without sequences) and add a count of how many files were eliminated because they were empty.
Code:
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;
my ($infile, $file_name, $file_format, $N_repeat, $help, $help_descp,
$options, $options_descrp, $nofile, $new_file, $count);
my $fastq_extension = "\\.fastq";
GetOptions (
'in=s' => \$infile,
'N|n=i' =>\$N_repeat,
'h|help' =>\$help,
'op' =>\$options
);
# Help
$help_descp =(qq(
Usage:
fastQF -in fastq_folder/ -n 15
or
fastQF -in file.fastq -n 15
));
$options_descrp =(qq(
-in infile.fastq or fastq_folder/ required
-n exclude sequences with more than N repeat optional
-h Help description optional
-op option section optional
));
$nofile =(qq(
ERROR: "No File or Folder Were Chosen !"
Usage:
fastQF -in folder/
Or See -help or -op section
));
# Check Files
if ($help){
print "$help_descp\n";
exit;
}
elsif ($options){
print "$options_descrp\n";
exit;
}
elsif (!$infile){
print "$nofile\n";
exit;
}
#Subroutine to convert from fastq to fasta
sub fastq_fasta {
my $file = shift;
($file_name = $file) =~ s/(.*)$fastq_extension.*/$1/;
# eliminate old files
my $oldfiles= $file_name.".fasta";
if (-e $oldfiles){
unlink $oldfiles;
}
open LINE, '<', $file or die "can't read or open $file\n";
open OUTFILE, '>>', "$file_name.fasta" or die "can't write $file_name\n";
while (
defined(my $head = <LINE>) &&
defined(my $seq = <LINE>) &&
defined(my $qhead = <LINE>) &&
defined(my $quality = <LINE>)
) {
substr($head, 0, 1, '>');
if (!$N_repeat){
print OUTFILE $head, $seq;
}
elsif ($N_repeat){
my $number_n=$N_repeat-1;
if ($seq=~ m/(n)\1{$number_n}/ig){
next;
}
else{
print OUTFILE $head, $seq;
}
}
}
close OUTFILE;
close LINE;
}
# execute the subroutine to extract the sequences
if (-f $infile) { # -f tests for a plain file
fastq_fasta($infile);
}
else {
foreach my $file (glob("$infile/*.fastq")) {
fastq_fasta($file);
}
}
exit;
I have tried using the following code outside of the subroutine (before the exit), but it only works for the last file:
$new_file =$file_name.".fasta";
foreach ($new_file){
if (-z $new_file){
$count++;
if ($count==1){
print "\n\"The choosen File present not sequences\"\n";
print " \"or was excluded due to -n $N_repeat\"\n\n";
}
elsif ($count >=1){
print "\n\"$count Files present not sequences\"\n";
print " \" or were excluded due to -n $N_repeat\"\n\n";
}
unlink $new_file;
}
}
I have also tried something similar inside the subroutine, but that code doesn't work either!
Any advice?
Thanks so much!

You should check whether anything was written to your new file at the end of your fastq_fasta subroutine. Just put the check after the close OUTFILE statement:
close OUTFILE;
close LINE;
my $outfile = $file_name.".fasta";
if (-z $outfile)
{
unlink $outfile or die "Error while deleting '$outfile': $!"; # 'or', not '||': '||' binds tighter than the list operator
}
Additionally, it would be better to add the die/warn check to the other unlink line as well, so that empty files are deleted and any failure is reported.
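Since the question also asks for a count of how many files were removed, here is a minimal sketch of that (the counter variable and the message wording are my additions, not part of the original code):
my $empty_count = 0; # counts deleted empty .fasta files

# at the end of fastq_fasta, after the two close statements:
my $outfile = $file_name.".fasta";
if (-z $outfile) # true for an existing zero-size file
{
unlink $outfile or die "Error while deleting '$outfile': $!";
$empty_count++;
}

# and just before the final exit:
print "$empty_count empty file(s) were deleted\n" if $empty_count;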
Maybe another solution, if you are not tied to Perl but may use sed and a bash loop:
for i in *.fastq
do
out=$(dirname "$i")/$(basename "$i" .fastq).fasta
sed -n '1~4{s/^@/>/;N;p}' "$i" > "$out"
if [ ! -s "$out" ]
then
echo "Empty output file $out"
rm "$out"
fi
done
Hope that helps!
Best Frank

The easiest thing to do is probably to add a counter to your subroutine to keep track of the number of sequences in the outfile:
sub fastq_fasta {
my $counter1 = 0;
my $file = shift;
($file_name = $file) =~ s/(.*)$fastq_extension.*/$1/;
# eliminate old files
my $oldfiles= $file_name.".fasta";
if (-e $oldfiles){
unlink $oldfiles;
}
open LINE, '<', $file or die "can't read or open $file\n";
open OUTFILE, '>>', "$file_name.fasta" or die "can't write $file_name\n";
while (
defined(my $head = <LINE>) &&
defined(my $seq = <LINE>) &&
defined(my $qhead = <LINE>) &&
defined(my $quality = <LINE>)
) {
$counter1 ++;
substr($head, 0, 1, '>');
if (!$N_repeat){
print OUTFILE $head, $seq;
}
elsif ($N_repeat){
my $number_n=$N_repeat-1;
if ($seq=~ m/(n)\1{$number_n}/ig){
$counter1 --;
next;
}
else{
print OUTFILE $head, $seq;
}
}
}
close OUTFILE;
close LINE;
return $counter1;
}
You can then delete files when the returned count is zero:
if (-f $infile) { # -f tests for a plain file
fastq_fasta($infile);
}
else {
foreach my $file (glob("$infile/*.fastq")) {
if (fastq_fasta($file) == 0) {
$file =~ s/(.*)$fastq_extension.*/$1.fasta/;
unlink $file;
}
}
}
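To also get the count of removed files that the question asks for, the loop can be extended slightly (a sketch; the counter and the message wording are assumptions on my part):
my $removed = 0;
foreach my $file (glob("$infile/*.fastq")) {
if (fastq_fasta($file) == 0) {
$file =~ s/(.*)$fastq_extension.*/$1.fasta/;
unlink $file or warn "Could not delete '$file': $!";
$removed++; # one more empty output file removed
}
}
print "$removed file(s) were deleted because they contained no sequences\n" if $removed;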

Related

How to print the output of seaching a string from a no of files to a output File In Perl

I have the following code and I want to send the output to a .txt file, so can someone please help me print the output into a file?
Ideally it should give the user the option to write to a file or print to the command prompt itself.
# Opening Keyword File here
open( my $kw, '<', 'IMSRegistration_Success_MessageFlow.txt') or die $!;
my @keywords = <$kw>;
chomp(@keywords); # remove newlines at the end of keywords
# post-processing your keywords file for adding comments
my $kwhashref = {
map {
/^(.*?)(#.*?#)*$/;
defined($2) ? ($1 => $2) : ( $1 => undef )
} @keywords
};
# get list of files in current directory
my @files = grep { -f } (<*main_log>,<*Project>,<*properties>);
# loop over each file to search keywords in
foreach my $file (@files)
{
open(my $fh, '<', $file) or die $!;
my @content = <$fh>;
close($fh);
my $l = 0;
foreach my $kw (keys %$kwhashref)
{
my $search = quotemeta($kw); # otherwise keyword is used as regex, not literally
foreach (@content)
{ # go through every line for this keyword
$l++;
if (/$search/)
{
print $kwhashref->{$kw}."\n" if defined($kwhashref->{$kw}) ;
printf 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_
}
}
}
}
script.pl >output.txt
I can get the output into a file using the code below:
print $out_file $kwhashref->{$kw}."\n" if defined($kwhashref->{$kw}) ;
printf $out_file 'Found keyword %s in file %s, line %d:%s'.$/, $kw, $file, $l, $_;
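For completeness, here is a sketch of how $out_file could be opened, with a hypothetical --out option (the option name is my assumption) letting the user choose between a file and the command prompt:
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Long;
my $out_name;
GetOptions('out=s' => \$out_name);
my $out_file;
if (defined $out_name) {
open $out_file, '>', $out_name or die "Cannot write '$out_name': $!";
}
else {
$out_file = \*STDOUT; # no --out given: print to the command prompt
}
print $out_file "Found keyword ...\n"; # used as in the print lines above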

how to find the first occurrence of a string in all the files in a folder in perl

I'm trying to find the line of the first occurrence of the string "victory" in each txt file in a folder. For each first "victory" in a file I would like to save the number from that line to @num and the file name to @filename.
Example: for the file a.txt that starts with the line "lalala victory 123456" -> $num[$i]=123456 and $filename[$i]="a.txt"
@ARGV holds all the file names. My problem is that I'm trying to go line by line and I don't know what I'm doing wrong.
one more thing - how can I get the last occurrence of "victory" in the last file??
use strict;
use warnings;
use File::Find;
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push @ARGV, $File::Find::name } }, $dir); $^I = ".bak";
my $argvv;
my $counter=0;
my $prev_arg=0;
my $line = 0;
my @filename=0;
my @num=0;
my $i = 0;
foreach $argvv (@ARGV)
{
#open $line, $argvv or die "Could not open file: $!";
my $line = IN
while (<$line>)
{
if (/victory/)
{
$line = s/[^0-9]//g;
$first_bit[$i] = $line;
$filename[$i]=$argvv;
$i++;
last;
}
}
close $line;
}
for ($i=0; $i<3; $i++)
{
print $filename[$i]." ".$num[$i]."\n";
}
Thank you very much! :)
Your example script has a number of minor problems. The following example should do what you want in a fairly clean manner:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
# Find the files we're interested in parsing
my @files = ();
my $dir = "D:/New folder";
find(sub { if (-f && /\.txt$/) { push @files, $File::Find::name } }, $dir);
# We'll store our results in a hash, rather than in 2 arrays as you did
my %foundItems = ();
foreach my $file (@files)
{
# Using a lexical file handle is the recommended way to open files
open my $in, '<', $file or die "Could not open $file: $!";
while (<$in>)
{
# Uncomment the next two lines to see what's being parsed
# chomp; # Not required, but helpful for the debug print below
# print "$_\n"; # Print out the line being parsed; for debugging
# Capture the number if we find the word 'victory'
# This assumes the number is immediately after the word; if that
# is not the case, it's up to you to modify the logic here
if (m/victory\s+(\d+)/)
{
$foundItems{$file} = $1; # Store the item
last;
}
}
close $in;
}
foreach my $file (sort keys %foundItems)
{
print "$file=> $foundItems{$file}\n";
}
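The second part of the question, finding the last occurrence of "victory" in a file, is not handled above. One approach (a sketch, not part of the original answer) is to drop the last and let each later match overwrite the earlier one:
while (<$in>)
{
# No 'last' here: each match overwrites the previous one, so after
# the loop %foundItems holds the final occurrence in each file.
$foundItems{$file} = $1 if m/victory\s+(\d+)/;
}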
The one-liner below searches for the string abc in all the files (file*.txt) and prints only the first matching line in each file.
perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' file*.txt
tested:
> cat temp
abc 11
22
13
,,
abc 22
bb
cc
,,
ww
kk
ll
,,
> cat temp2
abc t goes into 1000
fileA1, act that abc specific place
> perl -lne 'BEGIN{$flag=1}if(/abc/ && $flag){print $_;$flag=0}if(eof){$flag=1}' temp temp2
abc 11
abc t goes into 1000
>

Perl opening files from recursive directory

So my program is supposed to recursively go through a directory, and then, for each file in the directory, open the file and search for the words "error", "fail" and "failed". It should then write the instances where these words occur, along with the rest of the characters on the line after them, out to a file designated on the command line. I have been having some trouble making sure the program performs the search on the files found in the directory. Right now it does recurse through the directory and even creates a file to write out to; however, it does not seem to be searching through the files found during the recursion. Here is my code:
#!/usr/local/bin/perl
use warnings;
use strict;
use File::Find;
my $argument2 = $ARGV[0];
my $dir = "c:/program/Scripts/Directory1"; #directory to search through
open FILE, ">>$argument2" or die $!; #file to write out
my $unsuccessful = 0;
my @errors = ();
my @failures= ();
my @failures2 = ();
my @name = ();
my @file;
my $file;
my $filename;
opendir(DIR, $dir) or die $!;
while($file = readdir(DIR)){
next if($file =~ m/^\./);
foreach(0..$#file){
print $_;
open(FILELIST, '<', $_);
while(<FILELIST>){
if (/Unsuccessful/i){
$unsuccessful = 1;
}
if(/ERROR/ ){
push(#errors, "ERROR in line $.\n");
print "\t\tERROR in line $.:$1\n" if (/Error\s+(.+)/);
}
if(/fail/i ){
push(#failures, "ERROR in line $.\n");
print FILE "ERROR in line $.:$1\n" if (/fail\s+(.+)/);
}
if(/failed/i ){
push(#failures2, "ERROR in line $.\n");
print FILE "ERROR in line $.:$1\n" if (/failed\s+(.+)/);
}
if ($unsuccessful){
}
}
close FILELIST;
}
}
closedir(DIR);
close FILE;
So, to clarify, my problem is that the search contained in the "while()" loop does not seem to be executing on the files found in the directory recursively. Any comments/suggestions/help that you can give on why this may be happening would be very helpful. I am new to Perl so some sample code would also just help me understand what you are trying to say. Thank you very much.
Typically, when I want to do something recursively over files, I start with find2perl . -print, which generates the boilerplate for me, including the wanted function that I can modify to do whatever I want.
For example
# Traverse desired filesystems
File::Find::find({wanted => \&wanted, no_chdir => 1}, '.'); # no_chdir keeps $File::Find::name usable as a path
exit;
sub wanted {
return unless -f $File::Find::name;
return unless -R $File::Find::name;
open (F,"<",$File::Find::name) or warn("Error opening $File::Find::name : $!\n");
while(<F>) {
if(m/error/) { print; }
if(m/fail/) { print; }
}
}
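Adapted to the original goal of writing every error/fail hit to the file named on the command line, a sketch could look like this (the directory path and the messages are assumptions):
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Find;
my $out_name = shift @ARGV or die "usage: $0 outfile\n";
open my $out, '>>', $out_name or die "Cannot open $out_name: $!";
# no_chdir keeps $File::Find::name usable as a path from the start directory
find({ wanted => \&wanted, no_chdir => 1 }, 'c:/program/Scripts/Directory1');
close $out;
sub wanted {
return unless -f $File::Find::name;
open my $in, '<', $File::Find::name or do {
warn "Cannot open $File::Find::name: $!\n";
return;
};
while (<$in>) {
# /fail/i also matches "failed", so two patterns cover all three words
print $out "$File::Find::name line $.: $_" if /error/i or /fail/i;
}
close $in;
}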
This is an example of a recursive Perl directory listing. In reality I would probably use File::Find, or really just grep -R, but I am assuming this is homework of some kind:
use strict;
my $dir = $ARGV[0];
my $level = 0;
depthFirstDirectoryList($dir, $level);
sub depthFirstDirectoryList{
my ($dir, $level) = @_;
opendir (my $ind, $dir) or die "Can't open $dir for reading: $!\n";
while(my $file = readdir($ind)){
if(-d "$dir/$file" && $file ne "." && $file ne ".."){
depthFirstDirectoryList("$dir/$file", $level + 1);
}
else{
no warnings 'uninitialized';
print "\t" x $level . "file: $dir/$file\n";
}
}
}

Perl. I cannot write on a file. The file stays empty

I wrote the code below.
I cannot find any error in it.
But
say $valid $1;
does not work. $valid file is empty when the program finishes.
What's wrong?
Thanks in advance! ^^
#!/usr/bin/perl
use 5.012;
use strict;
use warnings;
use LWP::Simple;
open my $input, '<', 'c:\perl\015_JiraGet\addrHDP.txt' or die "Cannot read: $!\n";
open my $valid, '<', 'c:\perl\015_JiraGet\valid.txt' or die "Cannot read: $!\n";
my #totalReport;
my $eachAddr;
my $copyFile;
my $copyFilePath = 'c:\perl\015_JiraGet\HADOOP XML\\';
my $tempFile;
my $tempFilePath = 'c:\perl\015_JiraGet\temp.txt';
my $analyzed;
my $analyzedPath = 'c:\perl\015_JiraGet\analyzed - HADOOP.txt';
my $undefCheck;
my $i = 0;
my $j = 0;
my $title = 'temp';
my $dup = 0;
while(<$input>) { chomp; push @totalReport, $_; }
foreach(@totalReport)
{
$eachAddr = $_;
$undefCheck = get($eachAddr);
if(defined($undefCheck) == 0) { next; }
# Copy one XML file to 'temp.txt' and then close the file.
open $tempFile, '>', $tempFilePath or die "Cannot open 1: $!\n";
print $tempFile get($eachAddr);
close $tempFile;
# If the entry is a duplicate, go on to the next entry
open $tempFile, '<', $tempFilePath or die "Cannot open 2: $!\n";
($title, $dup) = isDuplicate($tempFile, $title);
if($dup == 1) { close $tempFile; next; }
close $tempFile;
say ++$i . "th report!!!";
# Copy one XML file to HDD.
if($eachAddr =~ /.*\/(.*)/)
{
say $valid $1;
open $copyFile, '>', $copyFilePath . $1 or die "Cannot open 3: $!\n";
print $copyFile get($eachAddr);
close $copyFile;
}
# If the entry is NOT fixed or resolved, go on to the next entry
open $tempFile, '<', $tempFilePath or die "Cannot open 4: $!\n";
if(isFixCloseResolve($tempFile) == 0) { close $tempFile; next; }
close $tempFile;
# Analyze one entry
open $tempFile, '<', $tempFilePath or die "Cannot open 5: $!\n";
open $analyzed, '>>', $analyzedPath or die "Cannot open 6: $!\n";
analyzeOneReport($tempFile, $analyzed);
close $tempFile;
close $analyzed;
say ' ' . ++$j . "th fixed & closed report!!!";
}
say "$i total reports.";
say "$j total fixed & closed reports.";
close $input;
close $valid;
say "Finished!";
sub isDuplicate
{
my $iReport = $_[0];
my $title = 'temp';
my $dup = 0;
while(<$iReport>)
{
if ($_ =~ /.*\<title>(.*)\<\/title>/)
{
if($1 ne 'ASF JIRA') { $title = $1; if($title eq $_[1]) { $dup = 1; } last; }
}
}
return ($title, $dup);
}
# returns 1 if an entry is a Bug and Fixed and Closed
sub isFixCloseResolve
{
my $iReport = $_[0];
my $isCloseResolve = 0;
my $isFixed = 0;
while(<$iReport>)
{
if ($_ =~ /.*\<status[^>]*>(.*)\<\/status>/) { if(($1 eq 'Closed')||($1 eq 'Resolved')) { $isCloseResolve = 1;} }
elsif($_ =~ /.*\<resolution[^>]*>(.*)\<\/resolution>/) { if($1 eq 'Fixed') { $isFixed = 1;} }
}
return $isCloseResolve * $isFixed;
}
sub analyzeOneReport
{
my $iReport = $_[0];
my $oReport = $_[1];
while(<$iReport>)
{
chomp;
if ($_ =~ /.*\<title>(.*)\<\/title>/) { if($1 ne 'ASF JIRA') { say $oReport "Title : $1"; } }
elsif($_ =~ /.*\<assignee username="(.*)">.*\<\/assignee>/) { say $oReport "Assignee: $1"; }
elsif($_ =~ /.*\<reporter username="(.*)">.*\<\/reporter>/) { say $oReport "Reporter: $1"; }
elsif($_ =~ /.*\<type[^>]*>(.*)\<\/type>/) { say $oReport "Type : $1"; }
elsif($_ =~ /.*\<priority[^>]*>(.*)\<\/priority>/) { say $oReport "Priority: $1"; }
elsif($_ =~ /.*\<created>(.*)\<\/created>/) { say $oReport "Created : $1"; }
elsif($_ =~ /.*\<resolved>(.*)\<\/resolved>/) { say $oReport "Resolved: $1"; }
}
say $oReport '--------------------------------------------';
}
--- Postscript ---
Oh, I was wrong about the '<' part! Thank you, everyone!
But when I changed it to '>', still nothing was written to the file DURING THE PROGRAM'S RUN TIME.
So I was confused... and I found that Perl actually writes the contents to the file WHEN IT CLOSES THE FILE.
So during the run time, for 4~8 hours, I could not see anything in the file.
Data is written to the file when the file is closed.
That's one of the reasons why I thought this code was not working. ^^;
Hope nobody else suffers from this problem again! :)
Here:
open my $valid, '<',....
you have opened $valid for reading. If you wish to write to the file, you must instead write:
open my $valid, '>',....
If you need to keep existing contents and write only to the end then instead use
open my $valid, '>>',....
You're only opening the file under the $valid file handle with read capabilities, as you can see from this line:
open my $valid, '<', 'c:\perl\015_JiraGet\valid.txt' or die "Cannot read: $!\n";
So nothing you write to the file will actually go into it. Change it to read/write (or append, if you need it, just use +>> instead of +> in the code below), and you should be good, as follows:
open my $valid, '+>', 'c:\perl\015_JiraGet\valid.txt' or die "Cannot read: $!\n";
I am going to review this code as if it had been posted to Code Review.
First off, you are writing Perl as if it were C. In general that is not so bad, but it does mean that you are doing quite a bit more work than is necessary.
Instead of using this overly verbose, and potentially memory-intensive, code:
my @totalReport
...
while(<$input>) { chomp; push @totalReport, $_; }
foreach(@totalReport)
{
$eachAddr = $_;
...
}
I would write:
while( my $addr = <$input> ){
chomp $addr;
...
}
Notice how I've eliminated a variable, and made it so that it loops on the input once, instead of twice. It also doesn't keep the values in memory over the entire length of the program.
Instead of opening a file for writing, closing it, and opening it again:
my $tempFile;
open $tempFile, '>', $tempFilePath or die "Cannot open 1: $!\n";
print $tempFile get($eachAddr);
close $tempFile;
open $tempFile, '<', $tempFilePath or die "Cannot open 2: $!\n";
you can open it once in read/write mode and seek back to the beginning:
open my $tempFile, '+>', $tempFilePath or die "Can't open '$tempFilePath' with mode'+>': '$!'";
print $tempFile get($eachAddr);
seek $tempFile, 0, 0;
Instead of getting the text at the given URL twice, and using a weird defined test:
$undefCheck = get($eachAddr);
if(defined($undefCheck) == 0) { next; }
...
print $tempFile get($eachAddr);
fetch it once and test the result:
my $text = get( $addr );
next unless defined $text;
...
print $tempFile $text;
Instead of a bunch of:
open ... or die ...;
I would use autodie.
use autodie;
...
# will now die on error and will tell you the file it fails on.
open my $fh, '<', $filename;
Another thing I would like to point out is that die "...\n" prevents die from appending the location of the error. The only time you should do that is if the default behaviour is unhelpful.
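A quick illustration of the difference (the file name and line number are hypothetical):
open my $fh, '<', 'no_such_file' or die "boom\n"; # prints just: boom
# versus, without the trailing newline:
open my $fh2, '<', 'no_such_file' or die "boom"; # prints: boom at yourscript.pl line 4.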
If you closed $tempFile before checking $dup this could be simpler:
if($dup == 1) { close $tempFile; next; }
close $tempFile;
which becomes:
close $tempFile;
next if $dup;
Instead of this repetitive block of code:
while(<$iReport>)
{
chomp;
if ($_ =~ /.*\<title>(.*)\<\/title>/) { if($1 ne 'ASF JIRA') { say $oReport "Title : $1"; } }
elsif($_ =~ /.*\<assignee username="(.*)">.*\<\/assignee>/) { say $oReport "Assignee: $1"; }
elsif($_ =~ /.*\<reporter username="(.*)">.*\<\/reporter>/) { say $oReport "Reporter: $1"; }
elsif($_ =~ /.*\<type[^>]*>(.*)\<\/type>/) { say $oReport "Type : $1"; }
elsif($_ =~ /.*\<priority[^>]*>(.*)\<\/priority>/) { say $oReport "Priority: $1"; }
elsif($_ =~ /.*\<created>(.*)\<\/created>/) { say $oReport "Created : $1"; }
elsif($_ =~ /.*\<resolved>(.*)\<\/resolved>/) { say $oReport "Resolved: $1"; }
}
I would set up the tag names once:
use List::Util qw'max';
my @simple_tags = qw'title type priority created resolved';
my $simple_tags_length = max map length, @simple_tags, qw'assignee reporter';
my $simple_tags = join '|', @simple_tags;
...
while( <$iReport> ){
my($tag,$contents);
if( ($tag,$contents) = /<($simple_tags)[^>]*>(.*?)<\/\g{1}>/ ){
}elsif( ($tag,$contents) = /<(assignee|reporter) username="(.*?)">.*?<\/\g{1}>/ ){
}else{ next }
printf $oReport "%-${simple_tags_length}s: %s\n", ucfirst($tag), $contents;
}
While this code isn't any shorter or clearer, it would be very easy to add another tag to compare against. So it isn't really better, just less repetitive.
I would like to point out that $_ =~ /.../ is better written as /.../.
You could use or instead of if/elsif/else with empty blocks.
...
while( <$iReport> ){
/<($simple_tags)[^>]*>(.*?)<\/\g{1}>/
or /<(assignee|reporter) username="(.*?)">.*?<\/\g{1}>/
or next;
my($tag,$contents) = ($1,$2);
printf $oReport "%-${simple_tags_length}s: %s\n", ucfirst($tag), $contents;
}
It might be best to combine them into a single regex using /x and (?<NAME>REGEX) syntax with %- or %+.
...
while( <$iReport> ){
/
(?:
# simple tags
< (?<tag> $simple_tags ) [^>]* >
# contents between tags
(?<contents> .*? )
|
# tags with contents in `username` attribute
<
(?<tag> assignee|reporter )
[ ]
# contents in `username` attribute
username = "(?<contents> .*? )"
>
.*? # throw out stuff between tags
)
<\/ \g{tag} > # end tag matches start tag
/x or next; # skip if it doesn't match
printf $oReport "%-${simple_tags_length}s: %s\n", ucfirst($+{tag}), $+{contents};
}
Or even use (DEFINE) (I'll leave that as an exercise for the reader since this is already fairly long).
Perhaps the worst part of the code is that you define almost all of your variables up-front.
my @totalReport;
my $eachAddr;
my $copyFile;
my $copyFilePath = 'c:\perl\015_JiraGet\HADOOP XML\\';
my $tempFile;
my $tempFilePath = 'c:\perl\015_JiraGet\temp.txt';
my $analyzed;
my $analyzedPath = 'c:\perl\015_JiraGet\analyzed - HADOOP.txt';
my $undefCheck;
my $i = 0;
my $j = 0;
my $title = 'temp';
my $dup = 0;
This means that you are practically using global variables. While some of these appear to need to be defined there, some of them don't, and therefore shouldn't be defined there. You should really be defining your variables at the point you need them, or at least at the beginning of the block where you need them.
The reason you aren't getting the output until the file is closed is because Perl buffers the output.
Perl normally buffers output so it doesn't make a system call for every bit of output. By saving up output, it makes fewer expensive system calls. …
- perlfaq5
The old way to turn off buffering is to select the file for output and set $| to a non-zero value, and then re-select the original output.
{
my $previous_default = select($file); # save previous default output handle
$| = 1; # autoflush
select($previous_default); # restore previous default output handle
}
The new way is to use $file->autoflush which comes from IO::Handle.
(The module is loaded automatically for you on recent versions of Perl 5.)
You can also flush the output when you choose by using flush or $file->flush.
IO::Handle also adds a $file->printflush which turns on autoflush temporarily during the print.
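For example, a minimal sketch of the IO::Handle route (the file name is hypothetical):
use IO::Handle; # loaded automatically on recent Perls, harmless otherwise
open my $file, '>', 'out.txt' or die "Cannot write out.txt: $!";
$file->autoflush(1); # flush after every print on this handle
print $file "visible immediately\n";
$file->flush; # or flush once, at a moment of your choosing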

How to check for files that has two different extensions in Perl

I have a file reflog with the content as below. There will be items with the same name but different extensions. I want to check that each of the items (file1, file2 & file3 here as examples) exists with both extensions (.abc and .def). If both extensions exist, the script will perform some regex and print the name out. Otherwise it will just report the file name together with its extension (i.e. if only one of file1.abc or file1.def exists, it will be printed out).
reflog:
file1.abc
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file8abc.def
file7.abc
file1.def
file9abc.def
file10def.abc
My script is below (edited from yb007's script), but I have some issues with the output that I don't know how to resolve. I notice the output goes wrong when the reflog file contains any file with a name like *abc.def (e.g. file8abc.def & file9abc.def). The script trims off the last 4 characters and returns the wrong extension (.abc here, but I suppose it should be .def).
#! /usr/bin/perl
use strict;
use warnings;
my @files_abc ;
my @files_def ;
my $line;
open(FILE1, 'reflog') || die ("Could not open reflog") ;
open (FILE2, '>log') || die ("Could not open log") ;
while ($line = <FILE1>) {
if($line=~ /(.*).abc/) {
push(@files_abc,$1);
} elsif ($line=~ /(.*).def/) {
push(@files_def,$1); }
}
close(FILE1);
my %first = map { $_ => 1 } @files_def ;
my @same = grep { $first{$_} } @files_abc ;
my @abc_only = grep { !$first{$_} } @files_abc ;
foreach my $abc (sort @abc_only) {
$abc .= ".abc";
}
my %second = map {$_=>1} @files_abc;
my @same2 = grep { $second{$_} } @files_def; # @same and @same2 are equal.
my @def_only = grep { !$second{$_} } @files_def;
foreach my $def (sort @def_only) {
$def .= ".def";
}
my @combine_all = sort (@same, @abc_only, @def_only);
print "\nCombine all:-\n @combine_all\n" ;
print "\nList of files with same extension\n @same";
print "\nList of files with abc only\n @abc_only";
print "\nList of files with def only\n @def_only";
foreach my $item (sort @combine_all) {
print FILE2 "$item\n" ;
}
close (FILE2) ;
My output is like this, which is wrong:
1st: the on-screen output is below:
Combine all:-
file.abc file.abc file1 file10def.abc file2 file3 file4.abc file5 file6.def file7.abc
List of files with same extension
file1 file2 file3 file5
List of files with abc only
file4.abc file.abc file7.abc file.abc file10def.abc
List of files with def only
file6.def
Log output as below:
**file.abc
file.abc**
file1
file10def.abc
file2
file3
file4.abc
file5
file6.def
file7.abc
Can you please help me take a look at where it goes wrong? Thanks heaps.
ALWAYS add
use strict;
use warnings;
to the head of your program. They will catch most simple errors before you need to ask for help.
You should always check whether a file open succeeded with open FILE, "reflog" or die $!;
You are using a variable $ine that doesn't exist. You mean $line
The lines you read into the array contain a trailing newline. Write chomp @lines; to remove them.
Your regular expressions are wrong and you need || instead of &&. Instead write if ($line =~ /\.(iif|isp)$/)
If you still have problems when these are fixed then please ask again.
Aside from the errors already pointed out, you appear to be loading @lines from FUNC instead of FILE. Is that also a typo?
Also, if reflog truly contains a series of lines with one filename on each line, why would you ever expect the conditional "if ($line =~ /.abc/ && $line =~ /.def/)" to evaluate true?
It would really help if you could post an example from the actual file you are reading from, along with the actual code you are debugging. Or at least edit the question to fix the typos already mentioned
use strict;
use warnings;
my @files_abc;
my @files_def;
my $line;
open(FILE,'reflog') || die ("could not open reflog");
while ($line = <FILE>) {
if($line=~ /(.*)\.abc/) {
push(@files_abc,$1);
}
elsif($line=~ /(.*)\.def/) {
push(@files_def,$1);
}
}
close(FILE);
my %second = map {$_=>1} @files_def;
my @same = grep { $second{$_} } @files_abc;
print "\nList of files with same extension\n @same";
foreach my $abc (@files_abc) {
$abc .= ".abc";
}
foreach my $def (@files_def) {
$def .= ".def";
}
print "\nList of files with abc extension\n @files_abc";
print "\nList of files with def extension\n @files_def";
Output is
List of files with same extension
file1 file2 file3 file5
List of files with abc extension
file1.abc file2.abc file3.abc file4.abc file5.abc file7.abc file10def.abc
List of files with def extension
file2.def file3.def file5.def file6.def file8abc.def file1.def file9abc.def
Hope this helps...
You don't need to slurp the whole file; you can read one line at a time. I think this code works on this extended version of your reflog file:
xx.pl
#!/usr/bin/env perl
use strict;
use warnings;
open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log' or die "Failed to create file log for writing ($!)";
my ($oldline, $oldname, $oldextn) = ("", "", "");
while (my $newline = <$file>)
{
chomp $newline;
$newline =~ s/^\s*//;
my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
if ($oldname eq $newname)
{
# Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
print $func "$newname\n";
print "$newname\n";
$oldline = "";
$oldname = "";
$oldextn = "";
}
else
{
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
$oldline = $newline;
$oldname = $newname;
$oldextn = $newextn;
}
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
#unlink "reflog" ;
chmod 0644, "log";
close $func;
close $file;
Since the code does not actually check the extensions, it would be feasible to omit $oldextn and $newextn; on the other hand, you might well want to check the extensions if you're sufficiently worried about the input format to need to deal with leading white space.
I very seldom find it good for a processing script like this to remove its own input, hence I've left unlink "reflog"; commented out; your mileage may vary. I would also often just read from standard input and write to standard output; that would simplify the code quite a bit. This code writes to both the log file and to standard output; obviously, you can omit either output stream. I was too lazy to write a function to handle the writing, so the print statements come in pairs.
This is a variant on control-break reporting.
reflog
file1.abc
file1.def
file2.abc
file2.def
file3.abc
file3.def
file4.abc
file5.abc
file5.def
file6.def
file7.abc
Output
$ perl xx.pl
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$ cat log
file1
file2
file3
file4.abc
file5
file6.def
file7.abc
$
To handle unsorted file names with blank lines
#!/usr/bin/env perl
use strict;
use warnings;
open my $file, '<', "reflog" or die "Failed to open file reflog for reading ($!)";
open my $func, '>', 'log' or die "Failed to create file log for writing ($!)";
my @lines;
while (<$file>)
{
chomp;
next if m/^\s*$/;
push @lines, $_;
}
@lines = sort @lines;
my ($oldline, $oldname, $oldextn) = ("", "", "");
foreach my $newline (@lines)
{
chomp $newline;
$newline =~ s/^\s*//;
my ($newname, $newextn) = ($newline =~ m/(.*)([.][^.]*)$/);
if ($oldname eq $newname)
{
# Found the same file - presumably $oldextn eq ".abc" and $newextn eq ".def"
print $func "$newname\n";
print "$newname\n";
$oldline = "";
$oldname = "";
$oldextn = "";
}
else
{
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
$oldline = $newline;
$oldname = $newname;
$oldextn = $newextn;
}
}
print $func "$oldline\n" if ($oldline);
print "$oldline\n" if ($oldline);
#unlink "reflog" ;
chmod 0644, "log";
close $func;
close $file;
This is very similar to the original code I posted. The new lines are these:
my @lines;
while (<$file>)
{
chomp;
next if m/^\s*$/;
push @lines, $_;
}
@lines = sort @lines;
my ($oldline, $oldname, $oldextn) = ("", "", ""); # Old
foreach my $newline (@lines)
This reads the 'reflog' file, skipping blank lines, saving the rest in the @lines array. When the lines are all read, they're sorted. Then, instead of a loop reading from the file, the new code reads entries from the sorted array of lines. The rest of the processing is as before. For your described input file, the output is:
file1
file2
file3
Urgh: the chomp $newline; is not needed, though it is not otherwise harmful. The old-fashioned chop (a precursor to chomp) would have been dangerous. Score one for modern Perl.
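For the curious, the difference in one line each (a toy example, not from the original answer):
my $kept = "file1"; chomp $kept; # unchanged: no trailing newline to remove
my $lost = "file1"; chop $lost; # now "file": chop always removes the last character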
open( FILE, "reflog" );
open( FUNC, '>log' );
my %seen;
while ( chomp( my $line = <FILE> ) ) {
$line =~ s/^\s*//;
if ( $ine =~ /(\.+)\.(abc|def)$/ ) {
$seen{$1}++;
}
}
foreach my $file ( keys %seen ) {
if ( $seen{$file} > 1 ) {
## do whatever you want to
}
}
unlink "reflog";
chmod( 0750, "log" );
close(FUNC);
close(FILE);