How can I modify an existing Excel workbook with Perl?

With Spreadsheet::WriteExcel, I can create a new workbook, but what if I want to open an existing book and modify certain columns? How would I accomplish that?
I could parse all of the data out of the sheet using Spreadsheet::ParseExcel and then write it back with new values in certain rows/columns using Spreadsheet::WriteExcel. However, is there a module that already combines the two?
Mainly I just want to open a .xls, overwrite certain rows/columns, and save it.

Spreadsheet::ParseExcel will read existing Excel files:
my $parser = Spreadsheet::ParseExcel->new();
# $workbook is a Spreadsheet::ParseExcel::Workbook object
my $workbook = $parser->Parse('Book1.xls');
But what you really want is Spreadsheet::ParseExcel::SaveParser, which is a combination of Spreadsheet::ParseExcel and Spreadsheet::WriteExcel. There is an example near the bottom of the documentation.
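A minimal sketch of that approach, assuming the Spreadsheet::ParseExcel distribution is installed. The helper names (apply_edits, rewrite_xls) and the cell values are made up for illustration; Parse, worksheet, AddCell and SaveAs are the calls shown in the SaveParser documentation. Rows and columns are zero-based.

```perl
use strict;
use warnings;

# Apply a list of [row, col, value] edits (zero-based) to a SaveParser worksheet.
sub apply_edits {
    my ($sheet, $edits) = @_;
    $sheet->AddCell(@$_) for @$edits;
    return scalar @$edits;
}

# Read $in completely, overwrite the given cells on the first sheet, save as $out.
sub rewrite_xls {
    my ($in, $out, $edits) = @_;
    require Spreadsheet::ParseExcel::SaveParser;    # loaded only when actually used
    my $parser   = Spreadsheet::ParseExcel::SaveParser->new();
    my $workbook = $parser->Parse($in) or die "Cannot parse $in\n";
    apply_edits($workbook->worksheet(0), $edits);
    $workbook->SaveAs($out);
}

# Hypothetical usage: perl rewrite.pl Book1.xls Book1_new.xls
rewrite_xls(@ARGV[0, 1], [ [0, 0, 'New heading'], [4, 2, 'Updated'] ])
    if @ARGV == 2;
```

Saving under a new name first and renaming afterwards keeps the original intact if something goes wrong part-way.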

If you have Excel installed, then it's almost trivial to do this with Win32::OLE. Here is the example from Win32::OLE's own documentation:
use Win32::OLE;
# use existing instance if Excel is already running
eval {$ex = Win32::OLE->GetActiveObject('Excel.Application')};
die "Excel not installed" if $@;
unless (defined $ex) {
    $ex = Win32::OLE->new('Excel.Application', sub {$_[0]->Quit;})
        or die "Oops, cannot start Excel";
}
# get a new workbook
$book = $ex->Workbooks->Add;
# write to a particular cell
$sheet = $book->Worksheets(1);
$sheet->Cells(1,1)->{Value} = "foo";
# write a 2 rows by 3 columns range
$sheet->Range("A8:C9")->{Value} = [[ undef, 'Xyzzy', 'Plugh' ],
                                   [ 42,    'Perl',  3.1415  ]];
# read the range back and print it, one row per line
$array = $sheet->Range("A8:C9")->{Value};
for (@$array) {
    for (@$_) {
        print defined($_) ? "$_|" : "<undef>|";
    }
    print "\n";
}
# save and exit
$book->SaveAs( 'test.xls' );
undef $book;
undef $ex;
Basically, Win32::OLE gives you everything that is available to a VBA or Visual Basic application, which includes a huge variety of things -- everything from Excel and Word automation to enumerating and mounting network drives via Windows Script Host. It has come standard with the last few editions of ActivePerl.
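The documentation example above creates a new workbook. For the original question (changing an existing file), a rough sketch could look like the following. It assumes Windows with Excel installed; set_cells and update_workbook are hypothetical helper names, and the cell addresses and values are placeholders. Workbooks->Open, Range(...)->{Value}, Save and Close are ordinary Excel automation calls.

```perl
use strict;
use warnings;

# Write each value in %$values (keyed by A1-style address) into $sheet,
# where $sheet is an Excel Worksheet object obtained through Win32::OLE.
sub set_cells {
    my ($sheet, $values) = @_;
    for my $address (sort keys %$values) {
        $sheet->Range($address)->{Value} = $values->{$address};
    }
}

# Open an existing workbook, change cells on its first sheet, save it in place.
sub update_workbook {
    my ($path, $values) = @_;
    require Win32::OLE;    # Windows only
    my $ex = Win32::OLE->new('Excel.Application', sub { $_[0]->Quit })
        or die "Cannot start Excel\n";
    my $book = $ex->Workbooks->Open($path)    # give Excel a full path
        or die "Cannot open $path\n";
    set_cells($book->Worksheets(1), $values);
    $book->Save;
    $book->Close;
}

# Hypothetical usage: perl update.pl C:\full\path\to\Book1.xls
update_workbook($ARGV[0], { A1 => 'foo', B2 => 42 }) if @ARGV;
```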

There's a section of the Spreadsheet::WriteExcel docs that covers Modifying and Rewriting Spreadsheets.
An Excel file is a binary file within a binary file. It contains several interlinked checksums and changing even one byte can cause it to become corrupted.
As such you cannot simply append or update an Excel file. The only way to achieve this is to read the entire file into memory, make the required changes or additions and then write the file out again.
You can read and rewrite an Excel file using the Spreadsheet::ParseExcel::SaveParser module which is a wrapper around Spreadsheet::ParseExcel and Spreadsheet::WriteExcel. It is part of the Spreadsheet::ParseExcel package.
There's an example as well.

The Spreadsheet::ParseExcel::SaveParser module is a wrapper around Spreadsheet::ParseExcel and Spreadsheet::WriteExcel.
I recently updated the documentation with what I hope is a clearer example of how to do this.

Related

Reading Xlsx from another Xlsx file

I have a few xlsx files, say X.xlsx, Y.xlsx and Z.xlsx, and I have embedded those three files in another xlsx file, say A.xlsx. Now I want to read the content of the three embedded files (X, Y, Z) in one go through A.xlsx.
Can anyone help me with this?
Thanks in advance.
This is easy on Windows if your target machine also has Microsoft Excel installed.
Use the Win32::OLE module to create an instance of Excel, open your master file A.xlsx and then iterate over the OLEObjects collection of each worksheet:
#!perl
use strict;
use warnings;
use Win32::OLE 'in';
my $ex = Win32::OLE->new('Excel.Application') or die "oops\n";
my $Axlsx = $ex->Workbooks->Open('C:\\Path\\To\\A.xlsx');
my $i = 0;
# OLEObjects belongs to a worksheet, so walk every sheet of the master file
for my $sheet (in $Axlsx->Worksheets) {
    for my $embedded (in $sheet->OLEObjects) {
        $embedded->Object->Activate();
        # "$i++" does not increment inside a string, so concatenate instead
        $embedded->Object->SaveAs('test' . $i++ . '.xlsx');
        $embedded->Object->Close;
    }
}
After saving them, you can treat them as normal Excel files. Alternatively, you can work directly with $embedded->Object, but as you haven't told us what exactly you need to do, it's hard to give specific advice.
See also Save as an Excel file embedded in another Excel file

Rewrite existing xls files using Perl

I am trying to write data to an Excel file that already exists, but the code I tried creates a new sheet and erases the old sheet along with its data. This is the code I use:
#!/usr/bin/perl -w
use strict;
use Spreadsheet::WriteExcel;
# Create a new Excel file
my $FileName = 'Report.xls';
my $workbook = Spreadsheet::WriteExcel->new($FileName);
# Add a worksheet
my $worksheet1 = $workbook->add_worksheet(); #<- My Doubt
# Change width for only first column
$worksheet1->set_column(0,0,20);
# Write a formatted and unformatted string, row and column
# notation.
$worksheet1->write(0,0, "Hello");
$worksheet1->write(1,0,"HI");
$worksheet1->write(2,0,"1");
$worksheet1->write(3,0,"2");
How can I assign the existing sheet to $worksheet1? I also need to read specific cells from the file that already exists.
Please give me some guidance. Thank you.
You cannot open an existing spreadsheet with Spreadsheet::WriteExcel and update it like that. You first open it for reading with Spreadsheet::ParseExcel, along with an output file which you open with WriteExcel. Then you read the input file, write out the existing cells, sheets, etc., and make whatever edits/updates/insertions you need. Finally, you close both files, remove the previous one, and rename the new one (optionally backing up the previous version).
You can only really edit/change a given Excel file without going through this process by opening it using Win32::OLE, but for that you are most certainly going to need to be on a Windows system (I am not sure about the state of Wine), and this is not something you want to do on a server.
You can think of creating a file with Spreadsheet::WriteExcel as similar to opening a file with open my $fh, '>', 'output.file' ... output.file will be clobbered.
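The same read, edit, write elsewhere, then rename pattern can be shown with a plain text file and core modules only. This is just an illustration of the idea, not spreadsheet code; rewrite_file is a made-up helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Rewrite $path without clobbering it first: copy every line through $edit
# into a temporary file in the same directory, then rename it over the original.
sub rewrite_file {
    my ($path, $edit) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my ($out, $tmp) = tempfile(DIR => dirname($path));
    while (my $line = <$in>) {
        print {$out} $edit->($line);
    }
    close $in;
    close $out or die "Cannot write $tmp: $!";
    rename $tmp, $path or die "Cannot replace $path: $!";
}

# By contrast, this alone already empties the file, before anything is written:
#   open my $fh, '>', 'output.file' or die $!;
```

A call such as rewrite_file('notes.txt', sub { uc $_[0] }) would upper-case every line while the original stays untouched until the final rename.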
Note the line:
my $fh = FileHandle->new('>'. $self->{_filename});
in Spreadsheet::WriteExcel::Workbook->new.

How to handle excel using OLE module?

Friends, I wrote a Perl script to convert a set of CSV files into spreadsheet
format using Spreadsheet::WriteExcel. After some research I came to the
conclusion that there is no option to set the column width to auto-fit.
So in the same script I open that XLS file using the Win32::OLE
module, and while doing this I get an error message:
Can't use an undefined value as a HASH reference
Corresponding code is:
# spread sheet creation
my $workbook = Spreadsheet::WriteExcel->new($file_name);
# ...
my $worksheet = $workbook->add_worksheet($work_sheet_name);
# ...
$worksheet->write($rowNum, $j,$_,$default_format);
after these steps I have some more lines in the same script:
my $Excel = Win32::OLE->GetActiveObject('Excel.Application')
|| Win32::OLE->new('Excel.Application');
$Excel->{'Visible'} = 0; #0 is hidden, 1 is visible
$Excel->{DisplayAlerts}=1; #0 is hide alerts
# Open File and Worksheet
my $return_file_name="C:\\Users\\admin\\Desktop\\Report_Gen\\$file_name";
print ">>$return_file_name<<";
my $Book = $Excel->Workbooks->Open($return_file_name); # open Excel file
foreach my $Sheet (in $Book->Sheets) {
    my $LastCol = $Sheet->UsedRange->Find({What=>"*",
        SearchDirection=>xlPrevious,
        SearchOrder=>xlByColumns})->{Column}; # mentioned error is from this line
    my $mylastcol = 'A';
    for (my $m=1;$m<$LastCol;$m++) {$mylastcol++;}
    my @columnheaders = ('A:'.$mylastcol);
    foreach my $range (@columnheaders){
        $Sheet->Columns($range)->AutoFit();
    }
}
I wrote a Perl script to convert a set of CSV files into spreadsheet format using Spreadsheet::WriteExcel. After some research I came to the conclusion that there is no option to set the column width to auto-fit.
Autofit is a runtime option in Excel and so it isn't possible to create it via the file format using Spreadsheet::WriteExcel.
However, the Spreadsheet::WriteExcel docs contain an example of how to simulate autofit with an explanation of some of the issues involved.
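A rough sketch of that kind of simulation: track the longest string written to each column and use it as the column width. column_widths is a hypothetical helper, not the code from the WriteExcel docs, and the character count is only an approximation because the real width depends on the font.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Return the longest string length found in each column of @$rows
# (each row is an array reference of cell values).
sub column_widths {
    my ($rows) = @_;
    my @width;
    for my $row (@$rows) {
        for my $col (0 .. $#$row) {
            my $len = length($row->[$col] // '');
            $width[$col] = max($width[$col] // 0, $len);
        }
    }
    return @width;
}

# Hypothetical usage with a Spreadsheet::WriteExcel worksheet, after writing @rows:
#   my @w = column_widths(\@rows);
#   $worksheet->set_column($_, $_, $w[$_] + 2) for 0 .. $#w;    # +2 is arbitrary padding
```

This keeps everything inside Spreadsheet::WriteExcel, so the script no longer needs Excel or Win32::OLE just to size the columns.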

How to read and update a pm file line by line?

In a Perl script, we are retrieving configuration details from a pm file. After the user changes the configuration details using the script's interface, the same values should be written back in the pm file.
For example, I have the following config.pm file:
$SourcePrimUserHost = '10.226.33.233';
$SourcePrimUserPort = '33002';
$SourceGroupsHost = '10.226.33.233';
$SourceGroupsPort = '33002';
I'm reading these values from a Perl script. I want to store updated values back to config.pm file.
How can we do this? Looking forward to your help.
This is not a good design choice.
The Perl module may (should!) be installed in such a way that the user can read, but not write it.
If the module is used by multiple users or multiple Perl programs, then the conf would be system-global and not application-specific.
Issues arise if multiple instances of the program are run at the same time.
I would recommend using a data serialization format like YAML, although JSON, Freeze/Thaw and Dumper may be other contestants. This configuration would best be stored in a separate file.
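As a sketch of the separate-file idea, here is a JSON variant using JSON::PP, which ships with Perl. load_config and save_config are made-up names, and the key names in the usage note are borrowed from the question's config.pm.

```perl
use strict;
use warnings;
use JSON::PP;

# Read the whole JSON file and return it as a hash reference.
sub load_config {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    local $/;    # slurp mode
    my $json = <$fh>;
    return JSON::PP->new->decode($json);
}

# Write the configuration to a temporary file, then rename it into place,
# so a crash half-way through does not leave a truncated config behind.
sub save_config {
    my ($path, $config) = @_;
    my $tmp = "$path.tmp";
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$fh} JSON::PP->new->pretty->canonical->encode($config);
    close $fh or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot replace $path: $!";
}
```

Typical use would be: my $c = load_config('config.json'); $c->{SourcePrimUserPort} = '33003'; save_config('config.json', $c);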
If you have to store the data in the same file, you could use the __DATA__ token. Everything behind that is accessible inside the code as the DATA filehandle, and will not be executed by perl. It is also trivial to find this token when updating the configuration. If the module is called Foo::Bar:
my $serialized_stuff = ...;
my $self_loc = $INC{"Foo/Bar.pm"}; # %INC holds the location for every loaded module.
my $tempfile = ...;
open $SELF, "<", $self_loc or die ...;
open $TEMP, ">", $tempfile or die ...;
# don't touch anything before __DATA__
while(<$SELF>) {
print $TEMP $_;
last if /^__DATA__$/;
}
print $TEMP $serialized_stuff;
close $TEMP; close $SELF;
rename $tempfile => $self_loc or die ...;
Use one of the many configuration tools like Config::General from CPAN. They are easy to use, support different notations and you can write back your values into a text file.

Convert Word doc or docx files into text files?

I need a way to convert .doc or .docx extensions to .txt without installing anything. I also don't want to have to manually open Word to do this obviously. As long as it's running on auto.
I was thinking that either Perl or VBA could do the trick, but I can't find anything online for either.
Any suggestions?
A simple Perl only solution for docx:
Use Archive::Zip to get the word/document.xml file from your docx file. (A docx is just a zipped archive.)
Use XML::LibXML to parse it.
Then use XML::LibXSLT to transform it into text or html format. Search the web to find a nice docx2txt.xsl file :)
Cheers !
J.
Note that an excellent source of information for Microsoft Office applications is the Object Browser. You can access it via Tools → Macro → Visual Basic Editor. Once you are in the editor, hit F2 to browse the interfaces, methods, and properties provided by Microsoft Office applications.
Here is an example using Win32::OLE:
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec::Functions qw( catfile );
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';
$Win32::OLE::Warn = 3;
my $word = get_word();
$word->{Visible} = 0;
my $doc = $word->{Documents}->Open(catfile $ENV{TEMP}, 'test.docx');
$doc->SaveAs(
catfile($ENV{TEMP}, 'test.txt'),
wdFormatTextLineBreaks
);
$doc->Close(0);
sub get_word {
    my $word;
    eval {
        $word = Win32::OLE->GetActiveObject('Word.Application');
    };
    die "$@\n" if $@;
    unless(defined $word) {
        $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
            or die "Oops, cannot start Word: ",
                Win32::OLE->LastError, "\n";
    }
    return $word;
}
__END__
For .doc, I've had some success with the Linux command line tool antiword. It extracts the text from .doc very quickly, giving a good rendering of indentation. Then you can redirect that to a text file in bash.
For .docx, I've used the OOXML SDK as some other users mentioned. It is just a .NET library to make it easier to work with the OOXML that is zipped up in an OOXML file. There is a lot of metadata that you will want to discard if you are only interested in the text. I see that other people have already written the code: DocXToText.
I have also found that Aspose.Words has a very simple API with great support.
There is also this bash command from commandlinefu.com which works by unzipping the .docx:
unzip -p some.docx word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
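The same unzip-and-strip idea can be sketched in Perl with core modules only (IO::Uncompress::Unzip ships with Perl). The function names are made up, and the tag handling is deliberately crude: it treats the end of each w:p element as a line break and throws away all other markup, so tables, footnotes and the like are not handled specially.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Turn the contents of word/document.xml into plain text.
sub document_xml_to_text {
    my ($xml) = @_;
    $xml =~ s{</w:p>}{\n}g;      # end of paragraph becomes a newline
    $xml =~ s{<w:tab/>}{\t}g;    # explicit tab elements
    $xml =~ s{<[^>]+>}{}g;       # drop every remaining tag
    my %entity = (lt => '<', gt => '>', quot => '"', apos => "'", amp => '&');
    $xml =~ s/&(lt|gt|quot|apos|amp);/$entity{$1}/g;
    return $xml;
}

# Pull word/document.xml out of the .docx (a zip archive) and convert it.
sub docx_to_text {
    my ($docx) = @_;
    my $xml;
    unzip $docx => \$xml, Name => 'word/document.xml'
        or die "Cannot extract word/document.xml: $UnzipError\n";
    return document_xml_to_text($xml);
}

# Hypothetical usage: perl docx2txt.pl some.docx > some.txt
print docx_to_text($ARGV[0]) if @ARGV;
```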
I strongly recommend AsposeWords if you can do Java or .NET. It can convert, without Word installed, between all major text file types.
If you have some flavour of unix installed, you can use the 'strings' utility to find and extract all readable strings from the document. There will be some mess before and after the text you are looking for, but the results will be readable.
Note that you can also use OpenOffice to perform miscellaneous document, drawing, spreadsheet etc. conversions on both Windows and *nix platforms.
You can access OpenOffice programmatically (in a way analogous to COM on Windows) via UNO from a variety of languages for which a UNO binding exists, including from Perl via the OpenOffice::UNO module.
On the OpenOffice::UNO page you will also find a sample Perl scriptlet which opens a document, all you then need to do is export it to txt by using the document.storeToURL() method -- see a Python example which can be easily adapted to your Perl needs.
.doc files saved in the WordprocessingML format and .docx files are both XML-based, so their XML can be parsed to retrieve the actual text of the document. You'll have to read their specifications to figure out which tags contain readable text.
Sinan Ünür's method works well.
However, it crashed on some of the files I was converting.
Another method is to use Win32::OLE and Win32::Clipboard as follows:
Open the Word document
Select all the text
Copy in the Clipboard
Print the content of Clipboard in a txt file
Empty the Clipboard and close the Word document
Based on the script given by Sigvald Refsu in http://computer-programming-forum.com/53-perl/c44063de8613483b.htm, I came up with the following script.
Note: I chose to save the txt file with the same basename as the .docx file and in the same folder but this can easily be changed
###########################################
use strict;
use File::Spec::Functions qw( catfile );
use FindBin '$Bin';
use Win32::OLE qw(in with);
use Win32::OLE::Const 'Microsoft Word';
use Win32::Clipboard;
my $monitor_word=0; #set 1 to watch MS Word being opened and closed
sub docx2txt {
    ##Note: the path shall be in the form "C:\dir\ with\ space\file.docx";
    my $docx_file=shift;
    #MS Word object
    my $Word = Win32::OLE->new('Word.Application', 'Quit') or die "Couldn't run Word";
    #Monitor what happens in MS Word
    $Word->{Visible} = 1 if $monitor_word;
    #Open file
    my $Doc = $Word->Documents->Open($docx_file);
    with ($Doc, ShowRevisions => 0); #Turn off revision marks
    #Select the complete document
    $Doc->Select();
    my $Range = $Word->Selection();
    with ($Range, ExtendMode => 1);
    $Range->SelectAll();
    #Copy selection to clipboard
    $Range->Copy();
    #Create txt file
    my $txt_file=$docx_file;
    $txt_file =~ s/\.docx$/.txt/;
    open(my $text_fh, '>', $txt_file) or die "Error while trying to write in $txt_file ($!)";
    printf {$text_fh} "%s\n", Win32::Clipboard::Get();
    close $text_fh;
    #Empty the Clipboard (to prevent warning about "huge amount of data in clipboard")
    Win32::Clipboard::Set("");
    #Close Word file without saving
    $Doc->Close({SaveChanges => wdDoNotSaveChanges});
    # Disconnect OLE
    undef $Word;
}
Hope this helps.
You can't do it in VBA if you don't want to start Word (or another Office application). Even if you meant VB, you'd still have to start a (hidden) instance of Word to do the processing.
I need a way to convert .doc or .docx extensions to .txt without installing anything
for I in *.doc?; do mv $I `echo $I | sed 's/\.doc.$/.txt/'`; done
Just joking.
You could use antiword for the older versions of Word documents, and try to parse the xml of the new ones.
With docxtemplater, you can easily get the full text of a Word document (works with docx only).
Here's the code (Node.JS)
DocxTemplater=require('docxtemplater');
doc=new DocxTemplater().loadFromFile("input.docx");
result=doc.getFullText();
This is just three lines of code and doesn't depend on any word instance (all plain JS)