perl file size calculation not working

I am trying to write a simple perl script that will iterate through the regular files in a directory and calculate the total size of all the files put together. However, I am not able to get the actual size of the file, and I can't figure out why. Here is the relevant portion of the code. I put in print statements for debugging:
$totalsize = 0;
while ($_ = readdir (DH)) {
print "current file is: $_\t";
$cursize = -s _;
print "size is: $cursize\n";
$totalsize += $cursize;
}
This is the output I get:
current file is: test.pl size is:
current file is: prob12.pl size is:
current file is: prob13.pl size is:
current file is: prob14.pl size is:
current file is: prob15.pl size is:
So the file size remains blank. I tried using $cursize = -s $_ instead, but the only effect of that was to retrieve the file sizes for the current and parent directories as 4096 bytes each; it still didn't get any of the actual file sizes for the regular files.
I have looked online and through a couple of books I have on perl, and it seems that perl isn't able to get the file sizes because the script can't read the files. I tested this by putting in an if statement:
print "Cannot read file $_\n" if (! -r _);
Sure enough for each file I got the error saying that the file could not be read. I do not understand why this is happening. The directory that has the files in question is a subdirectory of my home directory, and I am running the script as myself from another subdirectory in my home directory. I have read permissions to all the relevant files. I tried changing the mode on the files to 755 (from the previous 711), but I still got the Cannot read file output for each file.
I do not understand what's going on. Either I am mixed up about how permissions work when running a perl script, or I am mixed up about the proper way to use -s _. I appreciate your guidance. Thanks!

If it isn't just your typo (-s _ instead of the correct -s $_), then please remember that readdir returns file names relative to the directory you've opened with opendir. The proper way would be something like
my $base_dir = '/path/to/somewhere';
opendir DH, $base_dir or die;
while ($_ = readdir DH) {
print "size of $_: " . (-s "$base_dir/$_") . "\n";
}
closedir DH;
You could also take a look at the core module IO::Dir, which offers a tied-hash interface for accessing both the file names and their attributes in a simpler manner.
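A minimal sketch of that tie interface, assuming the same $base_dir as above (the tied hash values are File::stat objects, so ->size should give the size):
use IO::Dir;

tie my %dir, 'IO::Dir', $base_dir;
for my $name (keys %dir) {
    next if $name eq '.' or $name eq '..';
    printf "size of %s: %d\n", $name, $dir{$name}->size;
}
untie %dir;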

You have a typo:
$cursize = -s _;
Should be:
$cursize = -s $_;
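Putting the two answers together, a minimal sketch that computes the total the question was after; the directory path is a placeholder:
my $dir = '/path/to/somewhere';                 # placeholder
opendir my $dh, $dir or die "Cannot open $dir: $!";

my $totalsize = 0;
while (my $name = readdir $dh) {
    my $path = "$dir/$name";
    next unless -f $path;                       # regular files only
    $totalsize += -s $path;                     # note the full path and the $ sigil
}
closedir $dh;
print "total size: $totalsize\n";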

How to get the list of files that are not in another directory in perl

I have to fix a Perl script, which does the following:
# Get the list of files in the staging directory; skip all beginning with '.'
opendir ERR_STAGING_DIR, "$ERR_STAGING" or die "$PID: Cannot open directory $ERR_STAGING";
@allfiles = grep !/^$ERR_STAGING\/\./, map "$ERR_STAGING/$_", readdir(ERR_STAGING_DIR);
closedir(ERR_STAGING_DIR);
I have two directories: one is STAGING and the other is ERROR. STAGING contains files like ABC_201608100000.fin, and ERR_STAGING_DIR contains ABC_201608100000.fin.bc_lerr.xml. The Perl script is run as a daemon process which constantly looks for files in the ERR_STAGING_DIR directory and processes the error files.
However, my requirement is to not process the file if ABC_201608100000.fin exists in STAGING.
Question:
Is there a way I can filter the @allfiles array and select files which don't exist in the STAGING directory?
WHAT I HAVE TRIED:
I have tried a programmatic way to ignore the files that exist in the STAGING dir, though it is not working.
# Move file from the staging directory to the processing directory.
@splitf = split(/.bc_lerr.xml/, basename($file));
my $finFile = $STAGING . "/" . $splitf[0];
print LOG "$PID: Staging File $finFile \n";
foreach $file (@sorted_allfiles) {
if ( -e $finFile )
{
print LOG "$PID: Staging File still exist.. moving to next $finFile \n";
next;
}
# DO THE PROCESSING.
Questions of timing aside, I assume that a snapshot of files may be processed without worrying about new files showing up. I take it that @allfiles has all file names from the ERROR directory.
Remove a file name from the front of the array at each iteration. Check for the corresponding file in STAGING and if it's not there process away, otherwise push it on the back of the array and skip.
while (@allfiles)
{
my $errfile = shift @allfiles;
my ($file) = $errfile =~ /(.*)\.bc_lerr\.xml$/;
if (-e "$STAGING/$file")
{
push @allfiles, $errfile;
sleep 1; # more time for existing files to clear
next;
}
# process the error file
}
If the processing is faster than it takes for existing files in STAGING to go away, we would exhaust all processable files and then continuously run file tests. There is no reason for such abuse of resources, thus the sleep, which gives the STAGING files some more time to go away. Note that if even one file in STAGING fails to go away, this loop will keep checking it, so you will want to add some guard against that.
Another way would be to process the error files with a foreach, and add those that should be skipped to a separate array. That can then be attempted separately, perhaps with a suitable wait.
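A rough sketch of that alternative, reusing $STAGING and @allfiles from above; files that are skipped collect in a second array that can be retried after a wait:
my @deferred;
foreach my $errfile (@allfiles) {
    my ($file) = $errfile =~ /(.*)\.bc_lerr\.xml$/;
    if (-e "$STAGING/$file") {
        push @deferred, $errfile;   # skip for now, retry later
        next;
    }
    # process the error file
}
# later: sleep for a while, then run the same loop over @deferred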
How suitable this is depends on details of the whole process. For how long do STAGING files hang around, and is this typical or exceptional? How often do new files show up? How many files are there typically?
If you only wish to filter out the error files that have their counterparts in STAGING
my @errfiles_nostaging = grep {
my ($file) = $_ =~ /(.*)\.bc_lerr\.xml$/;
not -e "$STAGING/$file";
} @allfiles;
The output array contains the files from @allfiles which have no corresponding file in $STAGING and can be readily processed. This would be suitable if the error files are processed very fast in comparison to how long the $STAGING files stay around.
The filter can be written in one statement as well. For example
grep { not -e "$STAGING/" . s/\.bc_lerr\.xml$//r }   # or
grep { not -e "$STAGING/" . (split /\.bc_lerr\.xml$/, $_)[0] }
The first example uses the non-destructive /r modifier, available since 5.14. It changes the substitution to return the changed string and not change the original one. See it in perlrequick and in perlop.
This is an extremely brute-force example, but if you have the contents of the staging directory in an array, you can check against that array when you read the contents of the error directory.
I've made some GIGANTIC assumptions about the relationship of the filenames -- basically that the stage directory contains the file truncated, specifically the way you listed in your example. If that's universally the case, then a substring would work even faster, but this example is a little more scalable, in the event your example was simplified to illustrate the issue.
use strict;
my @error = qw(
ABC_201608100000.fin.bc_lerr.xml
ABD_201608100000.fin.bc_lerr.xml
ABE_201608100000.fin.bc_lerr.xml
ABF_201608100000.fin.bc_lerr.xml
);
my @staging = qw(
ABC_201608100000.fin
ABD_201608100000.fin
);
foreach my $error (@error) {
my $stage = $error;
$stage =~ s/\.bc_lerr\.xml//;
unless (grep { /$stage/ } @staging) {
## process the file here
}
}
The grep in this example is O(n) per lookup, so if either array is really large you would want to load the staging names into a hash first, making each lookup O(1).
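A sketch of the hash variant, reusing the @error and @staging arrays from the example above:
# Build the lookup hash once; each membership test is then O(1)
my %in_staging = map { $_ => 1 } @staging;

foreach my $error (@error) {
    (my $stage = $error) =~ s/\.bc_lerr\.xml$//;
    unless ($in_staging{$stage}) {
        ## process the file here
    }
}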

Why is my filesize the same after adding further data?

Having some trouble with Perl (I am brand new at it). I have one .txt file in the same directory. I'm planning to copy the file, print it to stdout, add more text to the copy, and compare file sizes. This is what I have so far:
#!/usr/local/bin/perl
use File::Copy;
copy("data.txt", "copyOfData.txt") or die "copy failed :(";
open (MYFILE, "data.txt") or die "open failed :(";
while (<MYFILE>) {
chomp;
print "$_\n";
}
$filesize = -s MYFILE;
print "MYFILE filesize is $filesize\n";
close (MYFILE);
open(MYCOPYFILE, ">>copyOfData.txt");
print MYCOPYFILE "\nextra data here blah blah blah\n";
$filesize = -s MYCOPYFILE;
print "MYCOPYFILE filesize is $filesize\n";
close (MYCOPYFILE);
However, the output I'm getting is as follows:
MYFILE filesize is 28
MYCOPYFILE filesize is 28
Surely the MYCOPYFILE size should be bigger than the MYFILE size as I've added extra text? I have checked both text files and the copy does have the extra text at the end.
Thanks for your help!
You can check the size of the file using the filename (you don't have to open it).
my $size = -s 'data.txt' ;
And you should always start your script with
use strict;
use warnings;
And opening files is better done with the three-argument version of open
open my $filehandle, '<', 'filename' or die "Failed to open: $!";
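Putting those suggestions together, a hedged sketch of how the script from the question might look (file names taken from the question):
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Copy;

copy("data.txt", "copyOfData.txt") or die "copy failed: $!";

open my $in, '<', 'data.txt' or die "Failed to open data.txt: $!";
print while <$in>;                      # echo the original file to stdout
close $in;

open my $out, '>>', 'copyOfData.txt' or die "Failed to open copyOfData.txt: $!";
print {$out} "\nextra data here blah blah blah\n";
close $out;                             # closing flushes the buffered output

print "data.txt filesize is ", -s 'data.txt', "\n";
print "copyOfData.txt filesize is ", -s 'copyOfData.txt', "\n";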
$filesize = -s MYFILE;
As pointed out above, -s doesn't work on filehandles. If you want to get the size from a file handle, use stat
$filesize = ((stat(MYFILE))[7]);
See perldoc -f -X for details of -s and friends, and perldoc -f stat for stat.
Check the filesize after closing the file. If this does not change the filesize, try adding more text; maybe the size will change then.
Greetings
dgw's answer is correct.
Another way is to use File::stat as:
#!/usr/bin/perl
use strict;
use warnings;
use File::stat;
my $filesize = stat("test.txt")->size;
print "Size: $filesize\n";
exit 0;
Perl automatically does some buffering on IO objects. The contents may not be written to disk right after you call print FH "blabla". However, the -s function reads the file size from disk, so when you think you have "updated" the file, you might get a smaller size than expected.
To get the correct size, flush the content in the buffer to disk and then fetch the size with -s. Note that close(FH) will first flush the buffer automatically and then close the file handle, so you can put the -s operation after the close call to get an accurate size.
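For example, a small sketch based on the code from the question, closing first and then checking the size by file name:
open(MYCOPYFILE, ">>copyOfData.txt") or die "open failed: $!";
print MYCOPYFILE "\nextra data here blah blah blah\n";
close(MYCOPYFILE);                      # close flushes the buffer to disk
my $filesize = -s "copyOfData.txt";     # the size now includes the appended text
print "copyOfData.txt filesize is $filesize\n";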
Or, flush the buffer explicitly by calling flush() of IO::Handle before getting size:
use IO::Handle;   # makes flush() available on the handle (newer Perls load this on demand)
open(MYCOPYFILE, ">>copyOfData.txt") or die "open failed: $!";
print MYCOPYFILE "\nextra data here blah blah blah\n";
MYCOPYFILE->flush();
$filesize = -s MYCOPYFILE;
print "MYCOPYFILE filesize is $filesize\n";
-s operates on a filehandle, a directory handle, or an expression that gives a file name, so the way your code uses -s is fine.

Perl - Print output to files in a different directory

Good morning!
I am sorry if this has been posted before - I could not find it. I just need a little pointer in the right direction; this is my homework and I feel it is almost done. What I want to do is take data from files in a different folder from where the script is run, process the data inside Perl, then print the output to another directory. Now I have the two parts done, but what I fail with is that Perl does not find the path where I want to save the files. It just says "No file or directory with that name exists", but it does! Here is the part of the script for that:
my #files = <$ENV{HOME}/Docs/unprocessed/*.txt>;
my $path = "$ENV{HOME}/Docs/results";
<looping through #files, processing each file in the unprocessed folder...>
open (OUTFILE, $path . '>$file') or die $!;
print OUTFILE ""; # "" Is really the finished calculations from the loop, not important here.
close FILE;
close OUTFILE;
I bet it's something stupid...
Because you are mixing the "write" token > in with the filename. This:
open (OUTFILE, $path . '>$file')
Should be:
open (OUTFILE, ">$path/$file")
You will also likely have to strip the .../Docs/unprocessed/ prefix from your filename:
use File::Basename;
open (OUTFILE, ">$path/" . basename($file))
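A fuller sketch under the same assumptions (the glob and $path from the question, plus a lexical handle and the three-argument form of open):
use File::Basename;

my @files = <$ENV{HOME}/Docs/unprocessed/*.txt>;
my $path  = "$ENV{HOME}/Docs/results";

for my $file (@files) {
    # ... read $file and do the calculations here ...
    my $outname = "$path/" . basename($file);
    open my $out, '>', $outname or die "Cannot open $outname: $!";
    print {$out} "";    # the finished calculations from the loop go here
    close $out;
}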

Perl - Import contents of file into another file

The code below is trying to do the following:
I am running a BTEQ script that gets data from a DB and exports it to a flat file. That flat file is picked up by a Perl script (the code below), and with this post I am trying to get Perl to import that file into a fastload file. Does that make more sense?
while (true) {
#Objective: open dir, get flat-file which was exported from bteq
opendir (DIR, "C:/q2refresh/") or die "Cannot open /my/dir: $!\n"; #open directory with the flat-file
my #Dircontent = readdir DIR;
$filetobecopied = "C:/q2refresh/q2_refresh_prod_export.txt"; #flatfile exported from bteq
$newfile = "C:/q2refresh/Q2_FastLoadFromFlatFile.txt"; #new file flat-file contents will be copied to as "fastload"
copy($filetobecopied, $newfile) or die "File cannot be copied.";
close DIR;
my $items_in_dir = #Dircontent;
if ($items_in_dir > 2) { # > 2 because of "." and ".."
-->>>>>> # take the copied FlatFile above and import into a fastload script located at C:/q2refresh/q2Fastload.txt
}
else {sleep 100;}
}
I need help with implementing the above bolded section. How do I import the contents of C:/q2refresh/Q2_FastLoadFromFlatFile.txt into a fastload script located at C:/q2refresh/q2Fastload.txt.
// I apologize if this is somewhat newbish, but I am new to Perl.
Thanks.
if ($items_in_dir > 2) { # > 2 because of "." and ".."
Well, when including . and .., plus the two copies of q2_refresh_prod_export.txt, you will always have more than 2 files in the directory. If such a case should occur that q2_refresh_prod_export.txt is not copied, the script will die. So the else clause will never be called.
Also, it is pointless to copy the file to a new place if you are simply going to copy it to another place a second later. It's not like "cut & paste"; you actually, physically copy the file to a new file, not to a clipboard.
If by "import into" you mean that you want to append the contents of q2_refresh_prod_export.txt to an existing q2Fastload.txt, there are ways to do that, such as what Troy suggested in another answer, with an open and >> (append to).
You will have to sort out what you mean by this whole $items_in_dir condition. You are keeping files and copying files in that directory, so what is it exactly that you are checking for? Whether the files have all been removed (by some other process)?
I can't tell what you're trying to do. Could it be that you just want to do this?
open SOURCE, $newfile;
open SINK, '>>C:/q2refresh/q2Fastload.txt';
while (<SOURCE>) {
print SINK $_;
}
close SOURCE;
close SINK;
That will append the contents of $newfile to your fastload file.

Why did File::Find finish short of completely traversing a large directory?

A directory exists with a total of 2,153,425 items (according to Windows folder Properties). It contains .jpg and .gif image files located within a few subdirectories. The task was to move the images into a different location while querying each file's name to retrieve some relevant info and store it elsewhere.
The script that used File::Find finished at 20462 files. Out of curiosity I wrote a tiny recursive function to count the items which returned a count of 1,734,802. I suppose the difference can be accounted for by the fact that it didn't count folders, only files that passed the -f test.
The problem itself can be solved differently by querying for file names first instead of traversing the directory. I'm just wondering what could've caused File::Find to finish at a small fraction of all files.
The data is stored on an NTFS file system.
Here is the meat of the script; I don't think including DBI stuff would be relevant since I reran the script with nothing but a counter in process_img() which returned the same number.
find(\&process_img, $path_from);
sub process_img {
eval {
return if ($_ eq "." or $_ eq "..");
## Omitted querying and composing new paths for brevity.
make_path("$path_to\\img\\$dir_area\\$dir_address\\$type");
copy($File::Find::name, "$path_to\\img\\$dir_area\\$dir_address\\$type\\$new_name");
};
if ($@) { print STDERR "eval barks: $@\n"; return }
}
EDIT:
eval barked a few times regarding DBI errors:
DBD::CSV::db do failed: Mismatched single quote before:
'INSERT INTO img_info (path, type, floorplan, legacy_id)
VALUES (
?0?building?1?0?2?19867'
)' at C:/perl/site/lib/SQL/Statement.pm line 77
[for Statement "
INSERT INTO img_info (path, type, floorplan, legacy_id)
VALUES (
'wal/15 Broad Street/building/nyc-Wall-St--South-St--Seaport-condo-elevator- building-52201501.jpg',
'building',
'0',
'19867'
)
"]
I assume that's due to the double dash between 'St' and 'South'. No errors of other nature were reported.
And here is another method I used to count files:
count_images($path_from);
sub count_images {
my $path = shift;
opendir my $images, $path or die "died opening $path";
while (my $item = readdir $images) {
next if $item eq '.' or $item eq '..';
$img_counter++ && next if -f "$path/$item";
count_images("$path/$item") if -d "$path/$item";
}
closedir $images or die "died closing $path";
}
print $img_counter;
It could have run out of resources? (memory, file descriptors, etc...?).
or it could have been some funky file name (easy to test by re-running again but removing 10 files - if it stops on exact same file, that filename is the culprit)
If you can trace the memory footprint, that'd tell you if you have a memory leak (see recent SO question on memory leaks to help with that).
And as Ether said, we could hopefully offer more than general investigative ideas had you pasted in the code.
UPDATE
Based on your code:
Please indicate whether eval barks anything to STDERR
More importantly, any IO operations need to be error-checked. E.g.
copy($something,$other)
|| die "Copy $something to $other died with error: $!\n"; # or print
# Same for making the directory
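For instance, a hedged sketch of process_img with the I/O calls checked; the destination variables ($path_to, $dir_area, $dir_address, $type, $new_name) are assumed to be set up exactly as in the original script:
use File::Path qw(make_path);
use File::Copy qw(copy);

sub process_img {
    return if $_ eq "." or $_ eq "..";
    return unless -f $_;                 # only regular files

    ## Omitted querying and composing new paths for brevity.
    my $dest = "$path_to\\img\\$dir_area\\$dir_address\\$type";

    make_path($dest, { error => \my $mkerr });
    if ($mkerr && @$mkerr) {
        print STDERR "make_path failed for $dest\n";
        return;
    }
    copy($File::Find::name, "$dest\\$new_name")
        or print STDERR "copy failed for $File::Find::name: $!\n";
}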