Handling unicode directory and filenames in Perl on Windows - perl

I have an encoding problem with Perl and Windows. On a Windows 7 running Perl (strawberry 5.16) and a simple TK GUI I need to open files and/or access directories with non-english characters in their name/path. For opening files I've come out with this solution which seems to work fine:
#!/usr/bin/perl -w
use strict;
use warnings;
use Win32::Unicode::File;
use Encode;
use Tk;
my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
-text => 'Open file',
-command => [ \&select_unicode_file ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();
sub select_unicode_file{
my $types = [ ['Txt', '.txt'],
['All Files', '*'],];
my $input_file= $mw->getOpenFile(-filetypes => $types);
my $fh = Win32::Unicode::File->new;
if ($fh->open('<', $input_file)){
while (my $line = $fh->readline()){
print "\n$line\n";
}
close $fh;
}
else{
print "Couldn't open file: $!\n";
}
}
This correctly opens files such as Поиск/Поиск.txt
What I CANNOT do is to simply get a directory path and than process it. I think I should use use Win32::Unicode::Dir but I really can't understand the documentation.
It should be something like this:
#!/usr/bin/perl -w
use strict;
use warnings;
use Win32::Unicode::Dir;
use Encode;
use Tk;
my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
-text => 'Open file',
-command => [ \&select_unicode_directory ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();
sub select_unicode_directory{
my $dir = $mw->chooseDirectory( );
my $wdir = Win32::Unicode::Dir->new;
my $dir = $wdir->open($dir) || die $wdir->error;
my $dir_complete = "$dir/a.txt";
open (MYFILE, $dir_complete );
while (<MYFILE>) {
chomp;
print "$_\n";
}
close (MYFILE);
}

There is a logical error in:
my $dir = $wdir->open($dir) || die $wdir->error;
my $dir_complete = "$dir/a.txt";
$wdir->open('path') returns an object, not a string. You can't use it like a path. But that is not the worst of it. Sadly, it seems like the Tk implementation does not yet have support for Unicode file names (including chooseDirectory). I guess you will have to write a custom dir selector, but I'm not sure it's even possible.
This is capable of listing files in an ascii-chars folder (and ->fetch can list utf-8 files), and crashes when opening a folder with utf-8 chars. Well, to be fair it crashes when opening ??????.
use strict;
use warnings;
use Win32::Unicode::Dir;
use Win32::Unicode::Console;
use Encode;
use Tk;
my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
-text => 'Select dir',
-command => [ \&select_unicode_directory ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();
sub select_unicode_directory {
my $wdir = Win32::Unicode::Dir->new;
my $selected = $mw->chooseDirectory(-parent =>$mw);
# http://search.cpan.org/dist/Tk/pod/chooseDirectory.pod#CAVEATS
$selected = encode("utf-8", $selected);
print "selected: $selected\n";
$wdir->open($selected) || die $wdir->error;
print "\$mw->chooseDirectory: $selected\n";
print "\$wdir->open(\$selected): $wdir\n";
# CRASH HERE, presumably because winders can't handle '?' in a file (dir) name
for ($wdir->fetch) {
# http://search.cpan.org/~xaicron/Win32-Unicode-0.38/lib/Win32/Unicode/Dir.pm
next if /^\.{1,2}$/;
my $path = "$selected/$_";
if (file_type('f', $path)) { print "file: $path\n"; }
elsif (file_type('d', $path)) { print " dir: $path\n"; }
}
print "closing \n";
$wdir->close || die $wdir->error;
}
Sample out (opening Поиск/):
Both samples below were run using: Strawberry Perl 5.12.3 built for MSWin32-x64-multi-thread
selected: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/?????
$mw->chooseDirectory: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/?????
$wdir->open($selected): Win32::Unicode::Dir=HASH(0x2e38158)
>>> perl crash <<<
Sample out (opening parent of Поиск):
selected: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk
$mw->chooseDirectory: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk
$wdir->open($selected): Win32::Unicode::Dir=HASH(0x2b92c10)
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/.select_uni_dir.pl.swp
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/o
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/o.dir
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/select_uni_dir.pl
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/select_uni_file.pl
dir: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/Поиск
Conclusion
The Tk dir selector returns ????? instead of Поиск. While looking for a way to enable Unicode in Tk, I found this:
http://search.cpan.org/dist/Tk/pod/UserGuide.pod#Perl/Tk_and_Unicode :
(...) Unfortunately, there are still places in Perl ignorant of
Unicode. One of these places are filenames. Consequently, the file selectors
in Perl/Tk do not handle encoding of filenames properly. Currently they
suppose that filenames are in iso-8859-1 encoding, at least on Unix systems.
As soon as Perl has a concept of filename encodings, then Perl/Tk will also
implement such schemes.
So at first glance it seems what you're trying to do is impossible (unless you
write or find a custom dir-selector). Actually, it may not be a bad idea to
submit this bug, because the GUI did show "Поиск" so the error is in the return value.

Related

File::Find is failing in subdirectories

I have a sub that find .vcf files in a sub-directories of the main directory, using File::Find::name that was working great in one environment but is not working on another machine(both run red hat linux) . It stillfinds .vcf files if it's in the main directory but fails to find in a sub-directory.
Could someone please help to troubleshoot?
Here is an example of a file it fails to find (broken over lines for readability):
/home/yeliiley/mdl3/results/SN1-376-OFA_TL127445_CHIP1_052318_BSN/
MD-18-6297_BG_v1_ac9023be-8db4-440b-9095/Variants/
MD-18-6297_BG_v1_MD-18-6297_BG_RNA_v1/
MD-18-6297_BG_v1_MD-18-6297_BG_RNA_v1_Non-Filtered_2018-05-24_040909.vcf
however, if the file is in $main_dir it finds it.
#!/usr/bin/perl
use warnings;
use strict;
use File::Find;
my $main_dir = "/home/yeliiley/mdl3/results/SN1-376-OFA_TL127445_CHIP1_052318_BSN";
my $location=$main_dir;
sub find_vcf {
my $F = $File::Find::name;
if ($F =~ /vcf$/ ) {
print "here is the vcf.$F\n";
$F =~ m|([^/]+).vcf$| or die "Can't extract Sample ID";
my $sample_id = $1; print "the short vcf name is: $sample_id\n";
}else {
print "Did not find any vcf files $F\n";
}
}
find({ wanted => \&find_vcf, no_chdir=>1}, $location);
Try adding "follow => 1" to your find() call, i.e.
find({ wanted => \&find_vcf , no_chdir => 1, follow => 1}, $location);

Perl How to merge two or more excel files in one (multiple worksheets)?

I need to merge a few excel file into one, multiple sheets.
I do not care too much about the sheet name on the new file.
I do not have Excel on the computer I plan to run this. so I cannot use Win32 OLE.
I attempted to run this code https://sites.google.com/site/mergingxlsfiles/ but it is not working, I get a new empty excel file.
I attempt to run http://www.perlmonks.org/?node_id=743574 but I only obtained one of the file in the new excel file.
My input excel files have some french characters (é for e.g.) I believe these are cp1252.
Code used :
#!/usr/bin/perl -w
use strict;
use Spreadsheet::ParseExcel;
use Spreadsheet::WriteExcel;
use File::Glob qw(bsd_glob);
use Getopt::Long;
use POSIX qw(strftime);
GetOptions(
'output|o=s' => \my $outfile,
'strftime|t' => \my $do_strftime,
) or die;
if ($do_strftime) {
$outfile = strftime $outfile, localtime;
};
my $output = Spreadsheet::WriteExcel->new($outfile)
or die "Couldn't create '$outfile': $!";
for (#ARGV) {
my ($filename,$sheetname,$targetname);
my #files;
if (m!^(.*\.xls):(.*?)(?::([\w ]+))$!) {
($filename,$sheetname,$targetname) = ($1,qr($2),$3);
warn $filename;
if ($do_strftime) {
$filename = strftime $filename, localtime;
};
#files = glob $filename;
} else {
($filename,$sheetname,$targetname) = ($_,qr(.*),undef);
if ($do_strftime) {
$filename = strftime $filename, localtime;
};
push #files, glob $filename;
};
for my $f (#files) {
my $excel = Spreadsheet::ParseExcel::Workbook->Parse($f);
foreach my $sheet (#{$excel->{Worksheet}}) {
if ($sheet->{Name} !~ /$sheetname/) {
warn "Skipping '" . $sheet->{Name} . "' (/$sheetname/)";
next;
};
$targetname ||= $sheet->{Name};
#warn sprintf "Copying %s to %s\n", $sheet->{Name}, $targetname;
my $s = $output->add_worksheet($targetname);
$sheet->{MaxRow} ||= $sheet->{MinRow};
foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) {
my #rowdata = map {
$sheet->{Cells}->[$row]->[$_]->{Val};
} $sheet->{MinCol} .. $sheet->{MaxCol};
$s->write($row,0,\#rowdata);
}
}
};
};
$output->close;
I have 2 excel files named: 2.xls (only 1 sheet named 2 in it), 3.xls (only 1 sheet named 3)
I launched the script as this:
xlsmerge.pl -s -o results-%Y%m%d.xls 2.xls:2 3.xls:3
Results: results-20121024.xls empty nothing in it.
Then I tried
xlsmerge.pl -s -o results-%Y%m%d.xls 2.xls 3.xls
And it worked.
I am not sure why is it failing while adding the Sheetname
It appears that there is a bug in this line of the script:
if (m!^(.*\.xls):(.*?)(?::([\w ]+))$!) {
($filename,$sheetname,$targetname) = ($1,qr($2),$3);
...
It looks to me like the goal of that line is to allow arguments either in the form
spreadsheet.xls:source_worksheet
or in another form allowing the name of the target sheet to be specified:
spreadsheet.xls:source_worksheet:target_worksheet
The last grouping appears intended to capture that last, optional argument: (?::([\w ]+)). The only problem is, this grouping was not made optional. Thus, when you only specify the source sheet and not the target, the regex fails to match and it falls to the backup behavior, which is to treat the whole argument as the filename. But this fails, too, because you don't have a file called 2.xls:2.
The solution would be to introduce the ? modifier after the last group in the regex to make it optional:
if (m!^(.*\.xls):(.*?)(?::([\w ]+))?$!) {
($filename,$sheetname,$targetname) = ($1,qr($2),$3);
...
Of course, that may not be the only problem. If the script was posted with an error, there could be other errors, too. I don't have Perl available to test it at the moment.

How to use the Windows "hook function" with Perl Win32::GUI?

I am working on Windows XP SP 3, Strawberry Perl.
I would like to let the user of my Perl program select a file; but when using Win32::GUI::GetOpenFileName(), I would like the Windows file selection dialog to open in the "Details" file list option, and not in the default "List" file list option.
Googling on the Net, it seems that I have to use the Windows "hook" function, and send certain messages to the file selection control. The documentation about that is MSDN, and I don't seem to master how to apply it in Perl.
Can anyone recommend what should be the right call syntax in Perl?
Here is my code sample, where the file selection dialog opens with the (default) "List" option:
use strict;
use warnings;
use 5.014;
use Win32::Console;
use Win32::GUI();
use autodie;
use warnings qw< FATAL utf8 >;
use Carp::Always;
use Win32API::File::Time qw{:win};
use Image::ExifTool qw(:Public);
use Date::Parse;
# ...
my ( $FileName, $ImageDir, $DIR, $TopDir);
# ...
$TopDir = 'D:\My Documents';
$ImageDir = Win32::GUI::BrowseForFolder( -root => $TopDir, -includefiles => 1,);
unless ($ImageDir) {
say '$DirName not defined after calling Win32::GUI::BrowseForFolder, ',
'Photo date set line'.__LINE__;
exit;
}
else {
say "Identified directory: $ImageDir";
}
# now select a file
$FileName = Win32::GUI::GetOpenFileName( -title => 'Select an image file', -directory => $ImageDir,
-file => "\0" . " " x 256,
-filter => ["Image files (*.jpg)" => "*.jpg;*.jpeg", "All files", "*.*", ],);
unless ($FileName) {
say '$FileName not defined after calling Win32::GUI::GetOpenFileName, ',
'Photo date set line'.__LINE__;
}
else {
say "Identified image file: $FileName";
}
# ...
Note: (somewhat) similar post at: http://www.perlmonks.org/?node_id=989418
Unfortunately the Win32::GUI API exposes neither the OFN_ENABLEHOOK flag bit or the lpfnHook field of the GetOpenFileName options.
You could perhaps get it working using the Win32::API module to work at an even lower level, but you would have to build the entire OPENFILENAME structure yourself using pack and write some XS code for the hook handler.

How to Extract text from MS Word?

I'm trying to open a word document and just extract all the text that is in the document and display it to the user using Win32::OLE
#usr/bin/perl
#OLEWord.pl
#Use string and print warnings
use strict;use warnings;
#Using OLE + OLE constants for Variants and OLE enumeration for Enumerations
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';
$Win32::OLE::Warn = 3;
#set the file to be opened
my $file = '/work/Test.docx';
#Create a new instance of Win32::OLE for the Word application, die if could not open the application
my $MSWord = Win32::OLE->new('Word.Application','Quit') and "Opened Word" or die "Unable to open document ", Win32::OLE->LastError();
#Set the screen to Visible, so that you can see what is going on
$MSWord->{'Visible'} = 1;
#open the request file or die and print warning message
my $Doc = $MSWord->Documents->Open('C:\work\Test.docx') or die "Could not open ", $file, " Error:", Win32::OLE->LastError();
#$MSWord->ActiveDocument->SaveAs({Filename => 'AlteredTest.docx',
#FileFormat => wdFormatDocument});
sub ShowObjs {
my $obj = shift;
foreach (sort keys %$obj) {
print "Keys: $_ - $obj->{$_}\n"; }
}
my $paragraphs = $Doc->Paragraphs;
ShowObjs($paragraphs);
# Get and print the Text inside the opened file
my $paragraphs = $Doc->Paragraphs;
my $txt = $Doc->Range->Text;
print $txt;
$MSWord->ActiveDocument->Close;
$MSWord->Quit;
I'm getting this error code:
OLE exception from "Microsoft Word":
Command Failed
Win32::OLE(0.1709) error ox800a1066
in METHOD/PROPERTYGET "Open" at OLEWord.pl line 20
Update: I can open up the Word application fine, it's just when I try to open up the file that is the problem
I have few scripts to convert DOC to various output format using Win32::OLE. They usually start like this:
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';
my $wr = Win32::OLE->new('Word.Application')
or die "Failure - word. \n";
$wr->{DisplayAlerts} = wdAlertsNone;
$wr->{Visible} = 0;
my $Doc = $wr->Documents->Open({
FileName => $input_file_path,
ConfirmConversions => 0,
AddToRecentFiles => 0,
Revert => 0,
ReadOnly => 1,
OpenAndRepair => 0,
}) or exit;
...
Please note that $input_file_path has to contain absolute path to your file. You can also enable Visible and DisplayAlerts to see any error Word might give you.
Edit: You can traverse paragraphs using in enumerator:
use Win32::OLE qw(in);
...
my $paragraphs = $Doc->Paragraphs;
for my $par (in $paragraphs) {
print $par->Range->Text();
}
Or you can use Word's own exporting method and save document as one of supported formats:
$Doc->SaveAs({
FileName => 'c:\\work\\Test.txt',
FileFormat => wdFormatEncodedText,
});
The advantage of latter method is that formatting is retained if possible, which produces better results for bullets, numbering and such.
Win32::OLE can be a little funny about interaction. if anything triggers a prompt, you may get a message like this. Typically it may be that it wants to open the file read-only, for example, and put up a dialog, but these dialogs can break under the default initialization of Win32::OLE.
If this is the case, calling
Win32::OLE->Initialize(Win32::OLE::COINIT_OLEINITIALIZE);
before you do anything like instantiate any objects (i.e., before Win32::OLE->new) might do the trick.

How do I create and append files with variable paths in Perl?

I'm working my way up towards creating a script that will create image galleries for me.
When I run what I have it tells me
No such file or directory at photographycreate line 16.
------------
(program exited with code: 2)
Here is the code that I've gotten so far.
#!/etc/perl -w
#CHANGE THIS
$filecategory = "cooking";
$filenumber = 0;
#$filename = "photography";
$imagedirectory = "\"/media/New Volume/Programming/kai product/media/photography/".$filecategory."/images/\"";
$galleryfile = "\"/media/New Volume/Programming/kai product/pages/".$filenumber."_".$filecategory."_gallery.html\"";
#imagelocation = <$imagedirectory/*>; #*/
$filecount = #imagelocation;
while($filenumber < 3) {
open GALLERY, "+>", $galleryfile or die $!;
print GALLERY ($filecount."\n");
print GALLERY ($imagedirectory."\n");
print GALLERY ($galleryfile."\n");
close GALLERY;
++$filenumber;
}
What I want it to do is create the file, open it, write stuff to it, and then close/save it. How can I, using what I have, do this?
Here is the fix:
#!/etc/perl -w
use Fcntl; #The Module
use strict;
#CHANGE THIS
my $filecategory = "cooking";
my $filenumber = 0;
my $imagedirectory = "\"/media/New Volume/Programming/kaiproduct/media/photography/".$filecategory."/images/\"";
my $galleryfile = "/media/New Volume/Programming/kaiproduct/pages/".$filenumber."_".$filecategory."_gallery.html";
my #imagelocation = <$imagedirectory/*>; #*/
my $filecount = #imagelocation;
while($filenumber < 3)
{
open GALLERY, "+>", $galleryfile or die $!;
print (GALLERY $filecount."\n");
print (GALLERY $imagedirectory."\n");
print (GALLERY $galleryfile."\n");
close GALLERY;
++$filenumber;
}
I think the problem is here:
$imagedirectory = "\"/media/New Volume/Programming/kai product/media/photography/".$filecategory."/images/\"";
$galleryfile = "\"/media/New Volume/Programming/kai product/pages/".$filenumber."_".$filecategory."_gallery.html\"";
Specifically, each of these strings starts with "\" and ends with \"", which means your files and folders will be surrounded with double quotes. So Perl isn't trying to open /media/New Volume/etc..., but "/media/New Volume/etc...", which doesn't exist because there is no directory called ". You're over-quoting.
One thing you can (and should always) do to make your code better is to use strict;. I see you already have use warnings; at the top there, which is good, but using both strict and warnings will make your code a lot safer and nicer to look at.
IMNHO, the real answer is to use File::Spec and write things a bit clearer:
use File::Spec::Functions qw( catfile );
# ...
my $root = "/media/New Volume/Programming/kai product";
my $imagedirectory = catfile($root,
'photography',
$filecategory,
'images',
);
my $galleryfile = catfile($root,
'pages',
"${filenumber}_${filecategory}_gallery.html",
);
In addition, it is a good idea to observe good habits, especially since you are just learning Perl:
Always put:
use strict;
use warnings;
as the first thing in your program.
Use lexical filehandles rather than bareword filehandles (which are package global):
open my $gallery, '+>', $galleryfile
or die "Cannot open '$galleryfile': $!;
and include the name of the file in the error message.
Finally, I like File::Slurp's append_file:
append_file $gallery_file, [
map { "$_\n" } ( $filecount, $imagedirectory, $galleryfile )
];
Here is a revised version of your program:
#!/etc/perl
use strict;
use warnings;
use File::Spec::Functions qw( catfile );
my $root = "/media/New Volume/Programming/kai product";
my $filecategory = "cooking";
my $imagedirectory = catfile($root,
'photography',
$filecategory,
'images',
);
my #imagelocation = read_dir $imagedirectory;
for my $filenumber ( 0 .. 2 ) {
my $galleryfile = catfile($root, 'pages',
"${filenumber}_${filecategory}_gallery.html",
);
append_file $gallery_file, [
map { "$_\n" } (
scalar #imagelocation, $imagedirectory, $galleryfile,
)];
}