Zipping files with perl, last line is cut off - perl

I tried to zip some CSV files of about 5 MB each with Perl. Below is my zip code.
The files are zipped, but when I open them with the Windows unzip utility I find that the last lines of the CSV files are missing. What could be the problem here? I tried changing the chunk size and the desiredCompressionLevel, but that didn't help.
sub zip_util{
    my $directory = shift;
    $zip = Archive::Zip->new();
    $zip->setChunkSize(65536);
    # Add a file from disk
    my $file1=File::Spec->catfile($directory, 'file.csv');
    my $file2=File::Spec->catfile($directory, 'file2.csv');
    my $file3=File::Spec->catfile($directory, 'fil3.csv');
    $zip->addFile($file1,'file1.csv')->desiredCompressionLevel( 6 );
    $zip->addFile($file2,'file2.csv')->desiredCompressionLevel( 6 );
    $zip->addFile($fil3,'file3.csv')->desiredCompressionLevel( 6 );
    # Save the Zip file
    my $zipped_file=File::Spec->catfile($directory,'files.zip');
    unless ( $zip->writeToFileNamed($zipped_file) == AZ_OK ) {
        print LOG ": Zip Creation error\n";
    }

I've checked it with warnings and strictures and I've found a few problems.
$zip doesn't use my (I don't know if that's intentional, but use strict really helps with globals like this).
You're running $zip->addFile($fil3,'file3.csv'). The $fil3 variable definitely doesn't exist; if anything, it should be $file3.
I guess it's an issue from copy and paste, but the subroutine doesn't have a matching closing brace.
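For instance, with strict enabled the misspelled variable is caught at compile time instead of silently being undef at run time (a minimal sketch; only the relevant lines are shown):
use strict;
use warnings;

my $file3 = 'file3.csv';
# Uncommenting the next line fails at compile time with:
#   Global symbol "$fil3" requires explicit package name
# print $fil3;
print $file3, "\n";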
Here's a quick script which I used during testing.
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );
use File::Spec;

sub zip_util {
    my ($directory) = @_;

    my $zip = Archive::Zip->new();
    $zip->setChunkSize(65536);

    # Add a file from disk
    my $file1 = File::Spec->catfile( $directory, 'file.csv' );
    my $file2 = File::Spec->catfile( $directory, 'file2.csv' );
    my $file3 = File::Spec->catfile( $directory, 'file3.csv' );
    $zip->addFile( $file1, 'file1.csv' )->desiredCompressionLevel(6);
    $zip->addFile( $file2, 'file2.csv' )->desiredCompressionLevel(6);
    $zip->addFile( $file3, 'file3.csv' )->desiredCompressionLevel(6);

    # Save the Zip file
    my $zipped_file = File::Spec->catfile( $directory, 'files.zip' );
    if ( $zip->writeToFileNamed($zipped_file) != AZ_OK ) {
        print STDERR "Zip Creation error\n";
    }
}

zip_util '.';
The problem is that it worked for me. So I wrote this script to generate a roughly 5 MB test file:
use strict;
use warnings;
use 5.010;

for ( 1 .. 524_288 ) {
    my $number = $_ % 10;
    say "$number-------";
}
Both the file inside the ZIP and the original "CSV" had the same size and content. So it's probably the second issue I mentioned, the use of the $fil3 variable, or something specific to your files (which you sadly didn't upload, so I cannot look into them).
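If you want to rule out truncation on the writing side, one thing you can do is read the archive back with Archive::Zip and list each member's uncompressed size, then compare it against the original file on disk (a minimal sketch, assuming files.zip sits in the current directory):
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );

my $zip = Archive::Zip->new();
die "Cannot read files.zip\n" unless $zip->read('files.zip') == AZ_OK;

# Print every member with its uncompressed size so it can be
# compared against the original CSV file on disk.
for my $member ( $zip->members ) {
    printf "%s: %d bytes uncompressed\n",
        $member->fileName, $member->uncompressedSize;
}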

Related

Perl finding a file based off its extension through all subdirectories

I have a working segment of code that finds all of the .txt files in a given directory, but I can't get it to look in the subdirectories.
I need my script to do two things:
scan through a folder and all of its subdirectories for a text file
print out just the last segments of its path
For example, I have a directory structured
C:\abc\def\ghi\jkl\mnop.txt
My script points to the path C:\abc\def\. It then goes through each of the subfolders and finds mnop.txt and any other text files in those folders.
It then prints out ghi\jkl\mnop.txt
I am using the code below, but it only prints out the file name, and only if the file is directly in that directory.
opendir(Dir, $location) or die "Failure Will Robertson!";
@reports = grep(/\.txt$/,readdir(Dir));
foreach $reports (@reports)
{
    my $files = "$location/$reports";
    open (res,$files) or die "could not open $files";
    print "$files\n";
}
I believe this solution is simpler and easier to read. I hope it is helpful!
#!/usr/bin/perl
use File::Find::Rule;

my @files = File::Find::Rule->file()
                            ->name( '*.txt' )
                            ->in( '/path/to/my/folder/' );

for my $file (@files) {
    print "file: $file\n";
}
What about using File::Find?
#!/usr/bin/env perl
use warnings;
use strict;
use File::Find;

# for example let location be tmp
my $location = "tmp";

sub find_txt {
    my $F = $File::Find::name;
    if ($F =~ /txt$/ ) {
        print "$F\n";
    }
}

find({ wanted => \&find_txt, no_chdir => 1 }, $location);
Much easier if you just use the core File::Find module:
#!/usr/bin/perl
use strict;
use warnings FATAL => qw(all);
use File::Find;

my $Target = shift;

find(\&survey, @ARGV);

sub survey {
    print "Found $File::Find::name\n" if ($_ eq $Target);
}
First argument: the pathless name of the file to search for. All subsequent arguments are directories to check. File::Find searches recursively, so you only need to name the top of a tree; all subdirectories will automatically be searched as well.
$File::Find::name is the full pathname of the file, so you could subtract your $location from that if you want a relative path.
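For example, a small hedged sketch using File::Spec->abs2rel to strip the starting directory, so ghi/jkl/mnop.txt is printed instead of the full path (the directory name is just an example):
#!/usr/bin/env perl
use warnings;
use strict;
use File::Find;
use File::Spec;

my $location = "C:/abc/def";

find({ no_chdir => 1, wanted => sub {
    return unless -f && /\.txt$/i;
    # $File::Find::name is the full path; make it relative to $location
    print File::Spec->abs2rel( $File::Find::name, $location ), "\n";
} }, $location);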

File::Find to search a directory for a list of files

I'm writing a Perl script and I'm new to Perl. I have a file that contains a list of files. For each item in the list I want to search a given directory and its sub-directories, find the file, and return the full path. I've been unsuccessful so far trying to use File::Find. Here's what I've got:
use strict;
use warnings;
use File::Find;

my $directory  = '/home/directory/';
my $input_file = '/home/directory/file_list';
my @file_list;

find(\&wanted, $directory);

sub wanted {
    open (FILE, $input_file);
    foreach my $file (<FILE>) {
        chomp($file);
        push ( @file_list, $file );
    }
    close (FILE);
    return @file_list;
}
I find File::Find::Rule a tad easier and more elegant to use.
use File::Find::Rule;

my $path = '/some/path';

# Find all directories under $path
my @paths = File::Find::Rule->directory->in( $path );

# Find all files in $path
my @files = File::Find::Rule->file->in( $path );
The arrays contain full paths to the objects File::Find::Rule finds.
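Applied to this question, a hedged sketch that reads the list file first and then keeps only the files whose base names appear on it (the paths are the ones from the question):
use strict;
use warnings;
use File::Basename;
use File::Find::Rule;

my $directory  = '/home/directory/';
my $input_file = '/home/directory/file_list';

# Read the wanted file names into a hash for quick lookups.
open my $fh, '<', $input_file or die "Cannot open $input_file: $!";
chomp( my @wanted = <$fh> );
close $fh;
my %wanted = map { $_ => 1 } @wanted;

# Print the full path of every file whose base name is on the list.
for my $path ( File::Find::Rule->file->in($directory) ) {
    print "$path\n" if $wanted{ basename($path) };
}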
File::Find is used to traverse a directory structure in the filesystem. Instead of doing what you're trying to do, namely, have the wanted subroutine read in the file, you should read in the file as follows:
use strict;
use warnings;
use vars qw/@file_list/;

my $directory  = '/home/directory/';
my $input_file = '/home/directory/file_list';

open FILE, "$input_file" or die "$!\n";
foreach my $file (<FILE>) {
    chomp($file);
    push ( @file_list, $file );
}
# do what you need to here with the @file_list array
Okay, well, I re-read the docs and realized I had misunderstood the wanted subroutine. wanted is a subroutine that is called on every file and directory that is found. So here's my code to take that into account:
use strict;
use warnings;
use File::Find;

my $directory  = '/home/directory/';
my $input_file = '/home/directory/file_list';
my @file_list;

open (FILE, $input_file);
foreach my $file (<FILE>) {
    chomp($file);
    push ( @file_list, $file );
}
close (FILE);

find(\&wanted, $directory);

sub wanted {
    if ( $_ ~~ @file_list ) {
        print "$File::Find::name\n";
    }
    return;
}
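One caveat: the ~~ smartmatch operator is marked experimental in newer Perls, so a plain hash lookup may be the safer membership test. A sketch of just the changed part:
my %wanted = map { $_ => 1 } @file_list;

find(\&wanted, $directory);

sub wanted {
    # $_ holds the base name of the current file
    print "$File::Find::name\n" if $wanted{$_};
    return;
}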

How can I check the extension of a file using Perl?

A file is passed as an argument to my Perl script. The file can be a .txt file or a .zip file containing the .txt file.
I want to write code that looks something like this
if ($file is a zip) {
    unzip $file
    $file =~ s/zip$/txt/;
}
One way to check the extension is to split the name on . and then match the last element of the resulting array. Is there a better way?
You can use File::Basename for this.
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use File::Basename;

my @exts = qw(.txt .zip);

while (my $file = <DATA>) {
    chomp $file;
    my ($name, $dir, $ext) = fileparse($file, @exts);
    given ($ext) {
        when ('.txt') {
            say "$file is a text file";
        }
        when ('.zip') {
            say "$file is a zip file";
        }
        default {
            say "$file is an unknown file type";
        }
    }
}

__DATA__
file.txt
file.zip
file.pl
Running this gives:
$ ./files
file.txt is a text file
file.zip is a zip file
file.pl is an unknown file type
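Note that given/when is flagged experimental in newer Perls, so here is a hedged equivalent of the same dispatch using an ordinary hash instead:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

my @exts  = qw(.txt .zip);
my %types = ( '.txt' => 'a text file', '.zip' => 'a zip file' );

while ( my $file = <DATA> ) {
    chomp $file;
    my ( $name, $dir, $ext ) = fileparse( $file, @exts );
    print "$file is ", $types{$ext} // 'an unknown file type', "\n";
}

__DATA__
file.txt
file.zip
file.pl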
Another solution is to make use of File::Type, which determines the type of a file from its contents.
use strict;
use warnings;
use File::Type;

my $file = '/path/to/file.ext';

my $ft = File::Type->new();
my $file_type = $ft->mime_type($file);

if ( $file_type eq 'application/octet-stream' ) {
    # possibly a text file
}
elsif ( $file_type eq 'application/zip' ) {
    # file is a zip archive
}
This way, you do not have to deal with missing/wrong extensions.
How about checking the end of the filename?
if ($file =~ /\.zip$/i) {
and then:
use strict;
use Archive::Extract;

if ($file =~ /\.zip$/i) {
    my $ae = Archive::Extract->new(archive => $file);
    my $ok = $ae->extract();
    my $files = $ae->files();
}
More information is in the Archive::Extract documentation.
You can check the file extension using a regex match as:
if($file =~ /\.zip$/i) {
    # $file is a zip file
}
I know this question is several years old, but for anyone that comes here in the future, an easy way to break apart a file path into its constituent path, filename, basename and extension is as follows.
use File::Basename;
my $filepath = '/foo/bar.txt';
my ($basename, $parentdir, $extension) = fileparse($filepath, qr/\.[^.]*$/);
my $filename = $basename . $extension;
You can test its results with the following.
my @test_paths = (
    '/foo/bar/fish.wibble',
    '/foo/bar/fish.',
    '/foo/bar/fish.asdf.d',
    '/foo/bar/fish.wibble.',
    '/fish.wibble',
    'fish.wibble',
);

foreach my $this_path (@test_paths) {
    print "Current path: $this_path\n";
    my ($this_basename, $parentdir, $extension) = fileparse($this_path, qr/\.[^.]*$/);
    my $this_filename = $this_basename . $extension;
    foreach my $var (qw/$parentdir $this_filename $this_basename $extension/) {
        print "$var = '" . eval($var) . "'\n";
    }
    print "\n\n";
}
Hope this helps.
Why rely on the file extension? Just try to unzip the file and use appropriate exception handling:
eval {
    # try to unzip the file
};
if ($@) {
    # not a zip file
}
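As a concrete, hedged sketch of that idea with Archive::Zip (which reports failure through its return status rather than by dying, so no eval is strictly needed):
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );

my $file = shift @ARGV;                    # hypothetical input file

my $zip = Archive::Zip->new();
Archive::Zip::setErrorHandler( sub { } );  # silence the warning on failed reads

if ( $zip->read($file) == AZ_OK ) {
    # It really is a zip archive; extract the .txt member(s) here.
    print "$file is a zip archive\n";
}
else {
    # Not a zip archive; treat it as plain text.
    print "$file is not a zip archive\n";
}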
Maybe a little bit late, but it can be used as an alternative reference:
sub unzip_all {
    my $director = shift;
    opendir my $DIRH, "$director" or die;
    my @files = readdir $DIRH;
    foreach my $file (@files){
        my $type = `file $director/$file`;
        if ($type =~ m/gzip compressed data/){
            system "gunzip $director/$file";
        }
    }
    closedir $DIRH;
    return;
}
Here it is possible to use the Linux file utility, executing it from Perl with backticks (``). You can pass the path of your folder and check whether it contains a file that file classifies as gzip compressed data.
If you do not mind using a Perl module, you can use Module::Generic::File, such as:
use Module::Generic::File qw( file );

my $f = file( '/some/where/file.zip' );
if( $f->extension eq 'zip' )
{
    # do something
}
Module::Generic::File has a lot of features to handle and manipulate a file.

How do I read multiple directories and read the contents of subdirectories in Perl?

I have a folder, and inside it I have many subfolders. Those subfolders contain many .html files that I want to read. I have written the following code to do that. It opens the parent folder and the first subfolder, but it prints only one .html file and then shows the error:
NO SUCH FILE OR DIRECTORY
I don't want to change the entire code. Any modifications to the existing code would be fine for me.
use FileHandle;

opendir PAR_DIR,"D:\\PERL\\perl_programes\\parent_directory";
while (our $sub_folders = readdir(PAR_DIR))
{
    next if(-d $sub_folders);

    opendir SUB_DIR,"D:\\PERL\\perl_programes\\parent_directory\\$sub_folders";
    while(our $file = readdir(SUB_DIR))
    {
        next if($file !~ m/\.html/i);
        print_file_names($file);
    }
    close(FUNC_MODEL1);
}
close(FUNC_MODEL);

sub print_file_names()
{
    my $fh1 = FileHandle->new("D:\\PERL\\perl_programes\\parent_directory\\$file")
        or die "ERROR: $!"; #ERROR HERE
    print("$file\n");
}
Your posted code looks way overcomplicated. Check out File::Find::Rule and you could do most of that heavy lifting in very little code.
use File::Find::Rule;

my $finder = File::Find::Rule->new()->name(qr/\.html?$/i)->start("D:/PERL/perl_programes/parent_directory");
while( my $file = $finder->match() ){
    print "$file\n";
}
I mean isn't that sexy?!
A user commented that you may wish to match only depth-2 entries.
use File::Find::Rule;

my $finder = File::Find::Rule->new()->name(qr/\.html?$/i)->mindepth(2)->maxdepth(2)->start("D:/PERL/perl_programes/parent_directory");
while( my $file = $finder->match() ){
    print "$file\n";
}
This will apply that restriction.
You're not extracting the supplied $file parameter in the print_file_names() function.
It should be:
sub print_file_names()
{
    my $file = shift;
    ...
}
Your -d test in the outer loop looks wrong too, BTW. You're saying next if -d ..., which means that it'll skip the inner loop for directories, which appears to be the complete opposite of what you require. The only reason it's working at all is that you're testing $sub_folders, which is only the name relative to the path, and not the full path name.
Note also:
Perl on Windows copes fine with / as a path separator
Set your parent directory once, and then derive other paths from that
Use opendir($scalar, $path) instead of opendir(DIR, $path)
nb: untested code follows:
use strict;
use warnings;
use FileHandle;

my $parent = "D:/PERL/perl_programes/parent_directory";
my ($par_dir, $sub_dir);

opendir($par_dir, $parent);
while (my $sub_folders = readdir($par_dir)) {
    next if ($sub_folders =~ /^\.\.?$/); # skip . and ..
    my $path = $parent . '/' . $sub_folders;
    next unless (-d $path); # skip anything that isn't a directory

    opendir($sub_dir, $path);
    while (my $file = readdir($sub_dir)) {
        next unless $file =~ /\.html?$/i;
        my $full_path = $path . '/' . $file;
        print_file_names($full_path);
    }
    closedir($sub_dir);
}
closedir($par_dir);

sub print_file_names()
{
    my $file = shift;
    my $fh1 = FileHandle->new($file)
        or die "ERROR: $!"; #ERROR HERE
    print("$file\n");
}
Please start putting:
use strict;
use warnings;
at the top of all your scripts; it will help you avoid problems like this and make your code much more readable.
You can read more about it on PerlMonks.
You are going to need to change the entire code to make it robust:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

my $top = $ENV{TEMP};

find( { wanted => \&wanted, no_chdir => 1 }, $top );

sub wanted {
    return unless -f and /\.html$/i;
    print $_, "\n";
}

__END__
Have you considered using File::Find?
Here's one method which does not require File::Find:
First, open the root directory and store all the sub-folders' names in an array using readdir.
Then, use a foreach loop. For each sub-folder, open the new directory by joining the root directory and the folder's name, and again use readdir to store the file names in an array.
The last step is to write the code for processing the files inside this foreach loop; see the sketch below.
Special thanks to my teacher who gave me this idea :) It really worked well!
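A hedged sketch of that readdir-based approach (the directory names are made up):
use strict;
use warnings;

my $root = 'parent_directory';

# Step 1: collect the sub-folder names from the root directory.
opendir my $root_dh, $root or die "Cannot open $root: $!";
my @subfolders = grep { -d "$root/$_" && !/^\.\.?$/ } readdir $root_dh;
closedir $root_dh;

foreach my $folder (@subfolders) {
    # Step 2: open each sub-folder and read its file names into an array.
    my $path = "$root/$folder";
    opendir my $dh, $path or die "Cannot open $path: $!";
    my @files = grep { /\.html?$/i && -f "$path/$_" } readdir $dh;
    closedir $dh;

    # Step 3: process the files found in this sub-folder.
    print "$path/$_\n" for @files;
}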

How can I add a prefix to all filenames under a directory?

I am trying to prefix a string (reference_) to the names of all the *.bmp files in all the directories as well as their sub-directories. The first time we run the Silk script, it creates the directories and subdirectories, and under each subdirectory it stores each mobile application's screenshot with a .bmp extension.
When I run the automated Silk script a second time, it will again create *.bmp files in all the subdirectories. Before running the script the second time, I want to prefix all the *.bmp files with the string reference_.
For example, first_screen.bmp becomes reference_first_screen.bmp.
I have the directory structure as below:
C:\Image_Repository\BG_Images\second
...
C:\Image_Repository\BG_Images\sixth
each containing first_screen.bmp files, etc...
Could anyone help me out?
How can I prefix all the image file names with the reference_ string?
When I run the script the second time, the Perl script in Silk will take both images from the sub-directory and compare them pixel by pixel. I am trying with the code below.
Could you please guide me on how to proceed to complete this task?
#!/usr/bin/perl -w
&one;
&two;

sub one {
    use Cwd;
    my $dir ="C:\\Image_Repository";
    #print "$dir\n";
    opendir(DIR,"+<$dir") or "die $!\n";
    my @dir = readdir DIR;
    #$lines=@dir;
    delete $dir[-1];
    print "$lines\n";
    foreach my $item (@dir)
    {
        print "$item\n";
    }
    closedir DIR;
}

sub two {
    use Cwd;
    my $dir1 ="C:\\Image_Repository\\BG_Images";
    #print "$dir1\n";
    opendir(D,"+<$dir1") or "die $!\n";
    my @dire = readdir D;
    #$lines=@dire;
    delete $dire[-1];
    #print "$lines\n";
    foreach my $item (@dire)
    {
        #print "$item\n";
        $dir2="C:\\Image_Repository\\BG_Images\\$item";
        print $dir2;
        opendir(D1,"+<$dir2") or die " $!\n";
        my @files=readdir D1;
        #print "@files\n";
        foreach $one (@files)
        {
            $one="reference_".$one;
            print "$one\n";
            #rename $one,Reference_.$one;
        }
    }
    closedir DIR;
}
I tried the open call with '+<' mode, but I am getting a compilation error for the read-write mode.
When I run this code, it shows the files in the BG_Images folder with the prefixed string, but it doesn't actually rename the files in the sub-directories.
You don't open a directory for writing. Just use opendir without the mode parts of the string:
opendir my($dir), $dirname or die "Could not open $dirname: $!";
However, you don't need that. You can use File::Find to make the list of files you need.
#!/usr/bin/perl
use strict;
use warnings;

use File::Basename;
use File::Find;
use File::Find::Closures qw(find_regular_files);
use File::Spec::Functions qw(catfile);

my( $wanted, $reporter ) = find_regular_files;
find( $wanted, $ARGV[0] );

my $prefix = 'reference_';

foreach my $file ( $reporter->() )
{
    my $basename = basename( $file );

    if( index( $basename, $prefix ) == 0 )
    {
        print STDERR "$file already has '$prefix'! Skipping.\n";
        next;
    }

    my $new_path = catfile(
        dirname( $file ),
        "$prefix$basename"
    );

    unless( rename $file, $new_path )
    {
        print STDERR "Could not rename $file: $!\n";
        next;
    }

    print $file, "\n";
}
You should probably check out the File::Find module for this - it will make recursing up and down the directory tree simpler.
You should probably be scanning the file names and modifying those that don't start with reference_ so that they do. That may require splitting the file name up into a directory name and a file name and then prefixing the file name part with reference_. That's done with the File::Basename module.
At some point, you need to decide what happens when you run the script the third time. Do the files that already start with reference_ get overwritten, or do the unprefixed files get overwritten, or what?
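A hedged sketch of that combination: File::Find walks the tree, and because it chdirs into each directory by default, a rename on the bare file name is enough; files that already start with the prefix are left alone (the top directory is the one from the question):
use strict;
use warnings;
use File::Find;

my $top    = 'C:/Image_Repository/BG_Images';
my $prefix = 'reference_';

find( sub {
    return unless -f && /\.bmp$/i;           # only plain .bmp files
    return if index( $_, $prefix ) == 0;     # already prefixed, skip it
    rename $_, "$prefix$_"
        or warn "Could not rename $File::Find::name: $!\n";
}, $top );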
The reason the files are not being renamed is that the rename operation is commented out. Remember to add use strict; at the top of your script (as well as the -w option which you did use).
If you get a list of files in an array @files (and the names are base names, so you don't have to fiddle with File::Basename), then the loop might look like:
foreach my $one (@files)
{
    my $new = "reference_$one";
    print "$one --> $new\n";
    rename $one, $new or die "failed to rename $one to $new ($!)";
}
With the aid of the find utility from coreutils for Windows:
$ find -iname "*.bmp" | perl -wlne"chomp; ($prefix, $basename) = split(m~\/([^/]+)$~, $_); rename($_, join(q(/), ($prefix, q(reference_).$basename))) or warn qq(failed to rename '$_': $!)"