Open Excel file in perl and print row count - perl

I am using the Win32::OLE module to open an Excel file and get its row count. When I hard-code the Excel file path it works fine, but when I build the path dynamically it throws an error saying "cant call method workbooks on unblessed reference". Please find the sample code below.
use OLE;
use Win32::OLE::Const 'Microsoft Excel';
my $xapp= Win32::OLE->GetActiveObject('Excel.Application')
or do { Win32::OLE->new('Excel.Application', 'Quit')};
$xapp->{'Visible'} = 0;
my $file='excel.xlsx';
my $fileName="c:/users/mujeeb/desktop/".$file;
print $fileName;
my $wkb = $xapp->Workbooks->Open($fileName); # here I am getting the error because I am passing a dynamic fileName
my $wks = $wkb->Worksheets('Sheet1');
my $Tot_Rows=$wks->UsedRange->Rows->{'Count'};
print $Tot_Rows."\n";
$xapp->close;

Use backslashes in the filename.
The filename is given to excel and excel won't understand forward slashes. Perl does not convert them because Perl doesn't know the string is a file.

Are you sure a method named Open exists? I don't see it in the Win32::OLE documentation. You must also add use Win32::OLE; to your code.

You could use this line of code to change the path into readable path for OLE:
my $file='excel.xlsx';
my $fileName="c:/users/mujeeb/desktop/".$file;
$fileName=~s/[\/]/\\/g;
print $fileName;
outputs:
c:\users\mujeeb\desktop\excel.xlsx
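The same conversion can be wrapped in a small helper. This is only a sketch: the name to_excel_path is made up for illustration, and it assumes that swapping forward slashes for backslashes is all that is needed before the string is handed to Excel.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Win32::OLE): return a copy of the path
# with every forward slash replaced by a backslash.
sub to_excel_path {
    my ($path) = @_;
    (my $converted = $path) =~ tr{/}{\\};
    return $converted;
}

my $file     = 'excel.xlsx';
my $fileName = to_excel_path("c:/users/mujeeb/desktop/" . $file);
print "$fileName\n";   # prints c:\users\mujeeb\desktop\excel.xlsx
```

Working on a copy leaves the original string untouched, which is handy if the forward-slash form is still needed elsewhere in the script.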

Related

how to read data from spreadsheet with .xlsm format in perl?

I want to read data from a spreadsheet in .xlsm format, but I am unable to access it: the data comes back blank. How can I read the .xlsm format in Perl?
Here is what I have tried:
#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::Read;
use Spreadsheet::read qw(ReadData);
my $book = ReadData ('C:\Perl64\bin\sample.xlsm');
foreach my $line(1..1000) {
my @rows =Spreadsheet::Read::cellrow($book->[0],"$line");
print "@rows";
}
It seems the module you used doesn't work on .xlsm files.
Consider using Spreadsheet::Reader::ExcelXML instead ;)
http://search.cpan.org/~jandrew/Spreadsheet-Reader-ExcelXML/lib/Spreadsheet/Reader/ExcelXML.pm
Also beware: use push instead of @rows = ..., so that you don't overwrite @rows on every iteration of the loop.
When you open a file, add or die "error message"; at the end of the call so that you can see whether the file opened correctly.
my $book = ReadData ('C:\Perl64\bin\sample.xlsm') or die "error while opening the file"; triggers the error with your example code

Reading XLS file using Perl

This is my Perl script
#!/usr/bin/perl -w
use strict;
use warnings;
use Spreadsheet::WriteExcel;
use Spreadsheet::ParseExcel;
my $workbook = Spreadsheet::ParseExcel->new("/home/Admin/Desktop/RP_processed_Address_withsubscriptionID (1).xls");
my $workbook1 = Spreadsheet::WriteExcel->new("/home/Admin/Desktop/new.xls");
open(my$old, '<', "$workbook") or die "oops!";
open(my$new, '>', "$workbook1") or die "ooops!";
while (my$line = <$workbook>) {
print $workbook1 $line
}
When I run this Script I'm getting following error
Odd number of elements in hash assignment at /usr/local/share/perl5/Spreadsheet/ParseExcel.pm line 167.
oops! at sample.pl line 9.
I have no idea where the script is going wrong. Please help me resolve this error.
Your suggestions would be appreciated.
You are not reading any docs again. You copy and paste code and don't understand the basics of what you do. Why are you opening files using open when you already open them using the two modules? Why do you write a line manually? This is not how excel data works, this is not how the modules work. Stop guessing. Learn what you're doing. This will never work.
Take a look at CPAN for Spreadsheet::ParseExcel
You need to access the worksheets within the workbook object you've created and determine which one you would like to parse data from. From there you can access cells using the row/column coordinate system. You don't need to use 'open', as ParseExcel and WriteExcel already do this.
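# Create a parser first, then parse the existing file into a workbook object
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("/home/Admin/Desktop/RP_processed_Address_withsubscriptionID (1).xls") or die $parser->error(), "\n";
my $sheet = $workbook->worksheet('Sheet1');
my $cell = $sheet->get_cell( 0, 0 ); # row 0, column 0
my $cell_value = $cell->value();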
Is it a bit more clear on what you need to do?

How to read a line into an array using perl

I am using Perl for the first time. I am trying to read a line from an input file and store it in an array. Note that the input file contains a single line with a bunch of words.
I tried using the following code:
open input, "query";
my @context = <input>;
But this gives a syntax error. How could I fix this?
It doesn't give a syntax error. It even works fine if there's only one line. The following will only get the first line even if there is more than one:
my @context = scalar( <input> );
But why wouldn't you just do
my $context = <input>;
What is the syntax error? As far as I can tell there is none. But I would suggest some improvements:
Always put use strict; use warnings; at the top of the script. It helps to detect a lot of possible problems.
The code has no error handling.
Use a lexical variable for the filehandle. Bareword filehandles are discouraged.
Open the file for reading explicitly if you only need to read from it.
You probably want to remove the trailing newlines from the array elements.
If the file does not need to stay open, it is worth closing it. Here it is not strictly needed, because the file is closed implicitly when the program exits, but closing files explicitly is good practice.
So it could be:
#!/usr/bin/perl
use strict;
use warnings;
open my $input, '<', 'infile' or die "$!";
my @context = map { chomp; $_ } <$input>;
close $input;
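Since the question mentions a single line holding a bunch of words, here is a minimal sketch of reading that one line and splitting it into words. The helper name first_line_words and the in-memory sample data are assumptions made so the example runs on its own; with the real file you would open 'query' instead.

```perl
use strict;
use warnings;

# Read the first line from an open filehandle and return its words.
sub first_line_words {
    my ($fh) = @_;
    my $line = <$fh>;
    return () unless defined $line;   # empty file: no words
    chomp $line;
    return split ' ', $line;          # split on runs of whitespace
}

# Stand-in for the real file so the sketch is self-contained. With a real
# file this would be: open my $input, '<', 'query' or die "Cannot open query: $!";
my $sample = "the quick brown fox\n";
open my $input, '<', \$sample or die "Cannot open in-memory file: $!";
my @context = first_line_words($input);
close $input;

print scalar(@context), " words: @context\n";   # prints "4 words: the quick brown fox"
```

If you want whole lines rather than words, keep the list assignment from the answer above and drop the split.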

How can I modify an existing Excel workbook with Perl?

With Spreadsheet::WriteExcel, I can create a new workbook, but what if I want to open an existing book and modify certain columns? How would I accomplish that?
I could parse all of the data out of the sheet using Spreadsheet::ParseExcel and then write it back with new values in certain rows/columns using Spreadsheet::WriteExcel. However, is there a module that already combines the two?
Mainly I just want to open a .xls, overwrite certain rows/columns, and save it.
Spreadsheet::ParseExcel will read in existing excel files:
my $parser = Spreadsheet::ParseExcel->new();
# $workbook is a Spreadsheet::ParseExcel::Workbook object
my $workbook = $parser->Parse('Book1.xls');
But what you really want is Spreadsheet::ParseExcel::SaveParser, which is a combination of Spreadsheet::ParseExcel and Spreadsheet::WriteExcel. There is an example near the bottom of the documentation.
If you have Excel installed, then it's almost trivial to do this with Win32::OLE. Here is the example from Win32::OLE's own documentation:
use Win32::OLE;
# use existing instance if Excel is already running
eval {$ex = Win32::OLE->GetActiveObject('Excel.Application')};
die "Excel not installed" if $@;
unless (defined $ex) {
$ex = Win32::OLE->new('Excel.Application', sub {$_[0]->Quit;})
or die "Oops, cannot start Excel";
}
# get a new workbook
$book = $ex->Workbooks->Add;
# write to a particular cell
$sheet = $book->Worksheets(1);
$sheet->Cells(1,1)->{Value} = "foo";
# write a 2 rows by 3 columns range
$sheet->Range("A8:C9")->{Value} = [[ undef, 'Xyzzy', 'Plugh' ],
[ 42, 'Perl', 3.1415 ]];
# print "XyzzyPerl"
$array = $sheet->Range("A8:C9")->{Value};
for (@$array) {
for (@$_) {
print defined($_) ? "$_|" : "<undef>|";
}
print "\n";
}
# save and exit
$book->SaveAs( 'test.xls' );
undef $book;
undef $ex;
Basically, Win32::OLE gives you everything that is available to a VBA or Visual Basic application, which includes a huge variety of things -- everything from Excel and Word automation to enumerating and mounting network drives via Windows Script Host. It has come standard with the last few editions of ActivePerl.
There's a section of the Spreadsheet::WriteExcel docs that covers Modifying and Rewriting Spreadsheets.
An Excel file is a binary file within a binary file. It contains several interlinked checksums and changing even one byte can cause it to become corrupted.
As such you cannot simply append or update an Excel file. The only way to achieve this is to read the entire file into memory, make the required changes or additions and then write the file out again.
You can read and rewrite an Excel file using the Spreadsheet::ParseExcel::SaveParser module which is a wrapper around Spreadsheet::ParseExcel and Spreadsheet::WriteExcel. It is part of the Spreadsheet::ParseExcel package.
There's an example as well.
The Spreadsheet::ParseExcel::SaveParser module is a wrapper around Spreadsheet::ParseExcel and Spreadsheet::WriteExcel.
I recently updated the documentation with, what I hope, is a clearer example of how to do this.

Convert Word doc or docx files into text files?

I need a way to convert .doc or .docx extensions to .txt without installing anything. I also don't want to have to manually open Word to do this obviously. As long as it's running on auto.
I was thinking that either Perl or VBA could do the trick, but I can't find anything online for either.
Any suggestions?
A simple Perl only solution for docx:
Use Archive::Zip to get the word/document.xml file from your docx file. (A docx is just a zipped archive.)
Use XML::LibXML to parse it.
Then use XML::LibXSLT to transform it into text or html format. Search the web to find a nice docx2txt.xsl file :)
Cheers !
J.
Note that an excellent source of information for Microsoft Office applications is the Object Browser. You can access it via Tools → Macro → Visual Basic Editor. Once you are in the editor, hit F2 to browse the interfaces, methods, and properties provided by Microsoft Office applications.
Here is an example using Win32::OLE:
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec::Functions qw( catfile );
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';
$Win32::OLE::Warn = 3;
my $word = get_word();
$word->{Visible} = 0;
my $doc = $word->{Documents}->Open(catfile $ENV{TEMP}, 'test.docx');
$doc->SaveAs(
catfile($ENV{TEMP}, 'test.txt'),
wdFormatTextLineBreaks
);
$doc->Close(0);
sub get_word {
my $word;
eval {
$word = Win32::OLE->GetActiveObject('Word.Application');
};
die "$@\n" if $@;
unless(defined $word) {
$word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
or die "Oops, cannot start Word: ",
Win32::OLE->LastError, "\n";
}
return $word;
}
__END__
For .doc, I've had some success with the linux command line tool antiword. It extracts the text from .doc very quickly, giving a good rendering of indentation. Then you can pipe that to a text file in bash.
For .docx, I've used the OOXML SDK as some other users mentioned. It is just a .NET library to make it easier to work with the OOXML that is zipped up in an OOXML file. There is a lot of metadata that you will want to discard if you are only interested in the text. Some other people have already written the code I see: DocXToText.
Aspose.Words has a very simple API with great support too I have found.
There is also this bash command from commandlinefu.com which works by unzipping the .docx:
unzip -p some.docx word/document.xml | sed -e 's/<[^>]\{1,\}>//g; s/[^[:print:]]\{1,\}//g'
I strongly recommend AsposeWords if you can do Java or .NET. It can convert, without Word installed, between all major text file types.
If you have some flavour of unix installed, you can use the 'strings' utility to find and extract all readable strings from the document. There will be some mess before and after the text you are looking for, but the results will be readable.
Note that you can also use OpenOffice to perform miscellaneous document, drawing, spreadsheet etc. conversions on both Windows and *nix platforms.
You can access OpenOffice programmatically (in a way analogous to COM on Windows) via UNO from a variety of languages for which a UNO binding exists, including from Perl via the OpenOffice::UNO module.
On the OpenOffice::UNO page you will also find a sample Perl scriptlet which opens a document, all you then need to do is export it to txt by using the document.storeToURL() method -- see a Python example which can be easily adapted to your Perl needs.
Both .doc files saved as WordprocessingML and .docx files store their content as XML, which can be parsed to retrieve the actual text of the document. You'll have to read the specifications to figure out which tags contain readable text.
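For .docx, the readable text sits in w:t elements, grouped into w:p paragraph elements inside word/document.xml. Below is a rough Perl sketch of pulling it out using only core modules. Note the swaps: it uses IO::Uncompress::Unzip rather than Archive::Zip, and regular expressions rather than a real XML parser, so it is an approximation and not a robust converter. The function names are made up for illustration, and self-closing or nested paragraphs (for example inside text boxes) are not handled.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Pull word/document.xml out of a .docx (a zip archive) into a string.
sub read_document_xml {
    my ($docx_path) = @_;
    my $xml;
    unzip $docx_path => \$xml, Name => 'word/document.xml'
        or die "Cannot read $docx_path: $UnzipError\n";
    return $xml;
}

# Turn the XML into plain text, one line per paragraph.
sub document_xml_to_text {
    my ($xml) = @_;
    my %entity = (lt => '<', gt => '>', quot => '"', apos => "'", amp => '&');
    my @paragraphs;
    # "<w:p>" or "<w:p attr=...>" opens a paragraph; "<w:pPr>" does not match.
    while ($xml =~ m{<w:p[ >](.*?)</w:p>}sg) {
        my $para = $1;
        # Text runs live in <w:t> elements, which may carry attributes.
        my $text = join '', $para =~ m{<w:t(?:\s[^>]*)?>(.*?)</w:t>}sg;
        # Decode the five predefined XML entities in a single pass.
        $text =~ s/&(lt|gt|quot|apos|amp);/$entity{$1}/g;
        push @paragraphs, $text;
    }
    return join "\n", @paragraphs;
}
```

A call would look like print document_xml_to_text(read_document_xml('input.docx')), where input.docx is a placeholder file name.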
The method of Sinan Ünür works well.
However, it crashed on some of the files I was converting.
Another method is to use Win32::OLE and Win32::Clipboard as such:
Open the Word document
Select all the text
Copy in the Clipboard
Print the content of Clipboard in a txt file
Empty the Clipboard and close the Word document
Based on the script given by Sigvald Refsu in http://computer-programming-forum.com/53-perl/c44063de8613483b.htm, I came up with the following script.
Note: I chose to save the txt file with the same basename as the .docx file and in the same folder but this can easily be changed
###########################################
use strict;
use File::Spec::Functions qw( catfile );
use FindBin '$Bin';
use Win32::OLE qw(in with);
use Win32::OLE::Const 'Microsoft Word';
use Win32::Clipboard;
my $monitor_word=0; #set 1 to watch MS Word being opened and closed
sub docx2txt {
##Note: the path shall be in the form "C:\dir\ with\ space\file.docx";
my $docx_file=shift;
#MS Word object
my $Word = Win32::OLE->new('Word.Application', 'Quit') or die "Couldn't run Word";
#Monitor what happens in MS Word
$Word->{Visible} = 1 if $monitor_word;
#Open file
my $Doc = $Word->Documents->Open($docx_file);
with ($Doc, ShowRevisions => 0); #Turn off revision marks
#Select the complete document
$Doc->Select();
my $Range = $Word->Selection();
with ($Range, ExtendMode => 1);
$Range->SelectAll();
#Copy selection to clipboard
$Range->Copy();
#Create txt file
my $txt_file=$docx_file;
$txt_file =~ s/\.docx$/.txt/;
open(my $txt_fh, '>', $txt_file) or die "Error while trying to write in $txt_file ($!)";
printf {$txt_fh} "%s\n", Win32::Clipboard::Get();
close $txt_fh;
#Empty the Clipboard (to prevent warning about "huge amount of data in clipboard")
Win32::Clipboard::Set("");
#Close Word file without saving
$Doc->Close({SaveChanges => wdDoNotSaveChanges});
# Disconnect OLE
undef $Word;
}
Hope it helps.
You can't do it in VBA if you don't want to start Word (or another Office application). Even if you meant VB, you'd still have to start a (hidden) instance of Word to do the processing.
I need a way to convert .doc or .docx extensions to .txt without installing anything
for I in *.doc?; do mv $I `echo $ | sed 's/\.docx?/\.txt'`; done
Just joking.
You could use antiword for the older versions of Word documents, and try to parse the xml of the new ones.
With docxtemplater, you can easily get the full text of a Word document (it works with docx only).
Here's the code (Node.JS)
DocxTemplater=require('docxtemplater');
doc=new DocxTemplater().loadFromFile("input.docx");
result=doc.getFullText();
This is just three lines of code and doesn't depend on any word instance (all plain JS)