Spreadsheet::ParseExcel building an array or hash - perl

I am new to Spreadsheet::ParseExcel. I have a space-delimited file which I opened in Microsoft Excel and saved it as a XLS file.
I installed Spreadsheet::ParseExcel and used the example code in documentation to print the contents of the file. My objective is to build an array of some of the data to write to a database. I just need a little help building the array -- writing to a database I'll figure out later.
I'm having a hard time understanding this module -- I did read the documentation, but because of my inexperience I'm unable to understand it.
Below is the code I'm using for the output.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Spreadsheet::ParseExcel;
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse( 'file.xls' );
if ( !defined $workbook ) {
die $parser->error(), ".\n";
}
for my $worksheet ( $workbook->worksheets() ) {
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
for my $row ( $row_min .. $row_max ) {
for my $col ( $col_min .. $col_max ) {
my $cell = $worksheet->get_cell( $row, $col );
next unless $cell;
print "Row, Col = ($row, $col)\n";
print "Value = ", $cell->value(), "\n";
print "Unformatted = ", $cell->unformatted(), "\n";
print "\n";
}
}
}
And here is some of the output
Row, Col = (0, 0)
Value = NewRecordFlag
Unformatted = NewRecordFlag
Row, Col = (0, 1)
Value = AgencyName
Unformatted = AgencyName
Row, Col = (0, 2)
Value = CredentialIdnt
Unformatted = CredentialIdnt
Row, Col = (0, 3)
Value = ContactIdnt
Unformatted = ContactIdnt
Row, Col = (0, 4)
Value = AgencyRegistryCardNumber
Unformatted = AgencyRegistryCardNumber
Row, Col = (0, 5)
Value = Description
Unformatted = Description
Row, Col = (0, 6)
Value = CredentialStatusDescription
Unformatted = CredentialStatusDescription
Row, Col = (0, 7)
Value = CredentialStatusDate
Unformatted = CredentialStatusDate
Row, Col = (0, 8)
Value = CredentialIssuedDate
Unformatted = CredentialIssuedDate
My objective is to build an array of CredentialIssuedDate, AgencyRegistryCardNumber, and AgencyName. Once I grasp the concept of doing that, I can go to town with this great module.

Here's a quick example of something that should work for you. It builds an array #rows of arrays of the three field values you want for each worksheet, and displays each result using Data::Dumper. I haven't been able to test it, but it looks right and does compile
It starts by building a hash %headers that relates the column header strings to the column number, based on the first row in each worksheet.
Then the second row onwards is processed, extracting the cells in the columns named in the #wanted array, and putting their values in the array #row, which is pushed onto #rows as each one is accumulated
#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use Data::Dumper;
my #wanted = qw/
CredentialIssuedDate
AgencyRegistryCardNumber
AgencyName
/;
my $parser = Spreadsheet::ParseExcel->new;
my $workbook = $parser->parse('file.xls');
if ( not defined $workbook ) {
die $parser->error, ".\n";
}
for my $worksheet ( $workbook->worksheets ) {
my ( $row_min, $row_max ) = $worksheet->row_range;
my ( $col_min, $col_max ) = $worksheet->col_range;
my %headers;
for my $col ( $col_min, $col_max ) {
my $header = $worksheet->get_cell($row_min, $col)->value;
$headers{$header} = $col;
}
my #rows;
for my $row ( $row_min + 1 .. $row_max ) {
my #row;
for my $name ( #wanted ) {
my $col = $headers{$name};
my $cell = $worksheet->get_cell($row, $col);
push #row, $cell ? $cell->value : "";
}
push #rows, \#row;
}
print Dumper \#rows;
}

I was able to resolve this by using the Spreadsheet::BasicReadNamedCol module
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use Spreadsheet::BasicReadNamedCol;
my $xlsFileName = 'shit.xls';
my #columnHeadings = (
'AgencyName',
'eMail',
'PhysicalAddress1',
'PhysicalAddress2'
);
my $ss = new Spreadsheet::BasicReadNamedCol($xlsFileName) ||
die "Could not open '$xlsFileName': $!";
$ss->setColumns(#columnHeadings);
# Print each row of the spreadsheet in the order defined in
# the columnHeadings array
my $row = 0;
while (my $data = $ss->getNextRow())
{
$row++;
print join('|', $row, #$data), "\n";
}

Related

How to remove entire array that have specific character

I have a perl code that extract from .xls file. My .xls file is as below
NUMBER NAME ALPHABET
one Jane a
two Adam b
three Josh c
;four
five Agnes e
six Mary f
;seven
eight Lara h
I want to extract the info and only take column 1 and 2. My perl code is as below.
#!/usr/bin/perl
use warnings;
use strict;
use Spreadsheet::ParseExcel;
main ();
sub main {
my $filename = 'Book1.xls';
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse( $filename );
if ( !defined $workbook ) {
die "-E-: cannot parse <$filename>.\n ", $parser->error(), ".\n";
}
my $worksheet = $workbook -> Worksheet ( 'a' ) || die "-E-: cannot parse family pin list.\n";
my ( $row_min, $row_max ) = $worksheet-> row_range();
open ( my $file,"> output.txt");
for my $row ( 1 .. $row_max ) {
my #data;
for my $col ( 0 ) {
my $number = $worksheet-> get_cell( $row, $col );
if ( $number ) {
push #data, $number-> value();
}
else {
push #data, '';
}
}
for my $col ( 2 ) {
my $alphabet = $worksheet->get_cell( $row, $col );
if ( $alphabet ) {
push #data, $alphabet->value();
print $file "#data\n";
}
else {
push #data, '';
}
}
}
close $file;
print "done\n";
}
The result is
one a
two b
three c
;four
five e
six f
;seven
eight h
I want to remove the entire array that start with string ";". I extend my code like below
open ( my $file,"> output.txt");
for my $row ( 1 .. $row_max ) {
my #data;
for my $col ( 0 ) {
my $number = $worksheet-> get_cell( $row, $col );
if ( $number ) {
push #data, $number-> value();
}
else {
push #data, '';
}
}
for my $col ( 11 ) {
my $alphabet = $worksheet->get_cell( $row, $col );
if ( $alphabet ) {
push #data, $alphabet->value();
}
else {
push #data, '';
}
}
my #new_data = grep(!/;/, #data);
my #latest_data = grep ( $_ ne '', #new_data);
print $file "#latest_data\n";
}
close $file;
print "done\n";
}
The output result produce like below.
one a
two b
three c
five e
six f
eight h
I don't want to be empty space. How i want to eliminate the empty space that produce result as below?
one a
two b
three c
five e
six f
eight h
I also try doing like this, but the result is same.
for my $index (reverse 0..$#data) {
if ( $data[$index] =~ /^;/ ) {
splice(#data, $index, 1);
}
}
print $file "#data\n";
I hope you are looking for next
use warnings;
use strict;
#data=("one a", "two b","three c", ";four d","five e","six e","seven g","eight h")
for(#data){
next if (/^;/);
print $_,"\n";
}
let's get your data gist
#data=("one a", "two b","three c", ";four d","five e","six e","seven g","eight h")
for(#data){
s/^;.*//; # // <-- Put Replacer here between `//` , leave it to remove line
push(#new_data,$_)}
print "$_\n" for #new_data

Hash assignment as array

I'm trying to understand the piece of code below; I just cannot understand what is being done in line 15.
It seems like it is trying to initialise/assign to %heading but I am just not sure how that syntax works.
$strings = [qw(city state country language code )];
my $file = "fname";
my $fn = $strings;
my $c = 0;
open( FILEH, "< ${file}.txt" ) or die( $! );
while ( <FILEH> ) {
my %heading;
chomp;
$c++;
#heading{ ( #$fn, "One" ) } = split( /[|]/ ); # Line 15
if ( defined( $heading{"One"} ) ) {
my $One = $heading{"One"};
}
That's called a "slice". It assigns to several keys at once:
#hash{ $key1, $key2 } = ($value1, $value2);
is a shorter and faster way of doing
$hash{$key1} = $value1;
$hash{$key2} = $value2;
#$fn is the same as #{ $fn }, i.e. array dereference.

How to delete entire column in Excel sheet and write updated data in new excel file using Perl?

I am new to Perl. I have excel file say "sample.xls" which looks like follows.
Sample.xls
There are about data of 1000 rows like this. I want to parse this file and write it in another file say "output.xls" with following output format.
output.xls
I have written a script in perl, however, it doesn't give me the exact output the way I want. Also, looks like the script is not very efficient. Can anyone guide me how I can improve my script as well as have my output as shown in "output.xls" ??
Here's the Script:
#!/usr/bin/perl –w
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use Spreadsheet::WriteExcel;
use Spreadsheet::WriteExcel::Chart;
# Read the input and output filenames.
my $inputfile = "path/sample.xls";
my $outputfile = "path/output.xls";
if ( !$inputfile || !$outputfile ) {
die( "Couldn't find file\n" );
}
my $parser = Spreadsheet::ParseExcel->new();
my $inwb = $parser->parse( $inputfile );
if ( !defined $inwb ) {
die "Parsing error: ", $parser->error(), ".\n";
}
my $outwb = Spreadsheet::WriteExcel->new( $outputfile );
my $inws = $inwb->worksheet( "Sheet1" );
my $outws = $outwb->add_worksheet("Sheet1");
my $out_row = 0;
my ( $row_min, $row_max ) = $inws->row_range();
my ( $col_min, $col_max ) = $inws->col_range();
my $format = $outwb->add_format(
center_across => 1,
bold => 1,
size => 10,
border => 4,
color => 'black',
border_color => 'black',
align => 'vcenter',
);
$outws->write(0,0, "Item Name", $format);
$outws->write(0,1, "Spec", $format);
$outws->write(0,2, "First name", $format);
$outws->write(0,3, "Middle Name", $format);
$outws->write(0,4, "Last Name", $format);
$outws->write(0,5, "Customer Number", $format);
$outws->write(0,6, "Age", $format);
$outws->write(0,7, "Units", $format);
my $col_count = 1;
#$row_min = 1;
for my $inws ( $inwb->worksheets() ) {
my ( $row_min, $row_max ) = $inws->row_range();
my ( $col_min, $col_max ) = $inws->col_range();
for my $in_row ( 2 .. $row_max ) {
for my $col ( 0 .. 0 ) {
my $cell = $inws->get_cell( $in_row, $col);
my #fields = split /_/, $cell->value();
next unless $cell;
$outws->write($in_row,$col, $cell->value());
$outws->write($in_row,$col+1, $fields[1]);
}
}
for my $in_row ( 2 .. $row_max ) {
for my $col ( 1 .. 1 ) {
my $cell = $inws->get_cell( $in_row, $col);
my #fields = split /_/, $cell->value();
next unless $cell;
#$outws->write($in_row,$col+1, $cell->value());
$outws->write($in_row,$col+1, $fields[0]);
$outws->write($in_row,$col+2, $fields[1]);
$outws->write($in_row,$col+3, $fields[2]);
$outws->write($in_row,$col+4, $fields[3]);
}
}
for my $in_row ( 2 .. $row_max ) {
for my $col ( 2 .. 2 ) {
my $cell = $inws->get_cell( $in_row, $col);
my #fields = split /_/, $cell->value();
next unless $cell;
$outws->write($in_row,6, $cell->value());
}
}
for my $in_row ( 2 .. $row_max ) {
for my $col ( 3 .. 9 ) {
my $cell = $inws->get_cell( $in_row, $col);
next unless $cell;
}
}
for my $in_row ( 2 .. $row_max ) {
for my $col ( 10 .. 10 ) {
my $cell = $inws->get_cell( $in_row, $col );
next unless $cell;
$outws->write($in_row,7, $cell->value());
}
}
}
To get your output sorted, you need to collect all the information first before you are writing it out. Right now, you are doing a bit of jumping back and forth between rows and columns.
Here are some changes I would make to get it sorted, and make it more efficient (to read).
Create a data structure $data outside of your loop to store all the information.
If there is only one worksheet, you don't need to loop over sheets. Just work with one sheet.
Loop over the lines.
Inside that loop, use the code you have to parse the individual fields to just parse them. No 2..2 loops. Just a bunch of statements.
my #item_fields = split /_/, $inws->get_cell( $in_row, 0 ) || q{};
my #name_fields = split /_/, $inws->get_cell( $in_row, $col ) || q{};
Store them in $data per item.
push #{ $data } = [ $item_fields[0], ... ];
Done with the loop. Open the output file.
Loop over $data with a sort and write to the output file.
foreach my $row (sort { $a->[0] cmp $b->[0] } #{ $data } ) { ... }
Done.
I suggest you read up on sort and also check out perlref and perlreftut to learn more about references (data structures).

Create Chart using another worksheet in (Spreadsheet::WriteExcel::Chart) Perl module

I have started using Spreadsheet::WriteExcel::Chart for writing chart.I have 10 worksheet which contain data. I have added one worksheet for chart. My question is, How can I create chart using another worksheet data.
2) I also tried another approach In which,I have created new worksheet and load data from old worksheet than try to create chart.
3)In $data,we are hardcoding value, is it possible that can we fetch value from cell rather
than hardcode value
In both case,I am not able to figure out the solution.
use Spreadsheet::ParseExcel;
use Spreadsheet::WriteExcel;
use Spreadsheet::ParseExcel::SaveParser;
# Open the template with SaveParseir
my $parser = new Spreadsheet::ParseExcel::SaveParser;
my $workbook = $parser->Parse('DD1.xls');
if ( !defined $workbook ) {
die $parser->error(), ".\n";
}
#Create the New worksheet
my $Flop_workbook = Spreadsheet::WriteExcel->new('Flop.xls');
for my $worksheet ( $workbook->worksheets() ) {
my ( $row_min, $row_max ) = $worksheet->row_range();
my ( $col_min, $col_max ) = $worksheet->col_range();
my $worksheetname = $worksheet->get_name();
print "worksheetname : ", $worksheetname ,"\n";
my $sheet = $Flop_workbook->add_worksheet($worksheetname);
for my $row ( $row_min .. $row_max ) {
for my $col ( $col_min .. $col_max ) {
my $cell = $worksheet->get_cell( $row, $col );
next unless $cell;
print "Row, Col = ($row, $col)\n";
print "Value = ", $cell->value, "\n";
print "Unformatted = ", $cell->unformatted(), "\n";
$sheet->write_string($row,$col,$cell->value);
print "\n";
}
}
}
my $chartsheet = $Flop_workbook->add_worksheet('Chart_data');
my $chart = $Flop_workbook->add_chart( type => 'line' );
#Configure the chart.
$chart->add_series(
categories => '=SUM_F_SCDLIB_DFF!$I$2',
values => '=SUM_F_SCDLIB_DFF!$I$2',
); (Failed here As I need to load another worksheet data)
# Add the worksheet data the chart refers to.
my $data = [
[ 'Category', 2, 3, 4, 5, 6, 7 ],
[ 'Value', 1, 4, 5, 2, 1, 5 ],
];
$chartsheet->write( 'AB5', $data );
#$template->SaveAs('newfile.xlsx');

making cells blank in and excel sheet using perl

I have an excel sheet to which I need to empty some cells
So far this is what it looks like:
I open the sheet, and check for cells not empty in column M.
I add those cells to my array mistake
and then I would like to make black all those cells and save the file (this step not working), as that file needs to be the input to anotherprogram/
thanks!
$infile = $ARGV[0];
$columns = ReadData($infile) or die "cannot open excel table\n\n";
print "xls sheet contains $columns->[1]{maxrow} rows\n";
my $xlsstartrow;
if ( getExcel( A . 1 ) ne "text" ) {
$xlsstartrow = 2;
}
else
{
$xlsstartrow = 4;
}
check_templates();
print "done";
sub check_templates {
for ( $row = $xlsstartrow ; $row < ( $columns->[1]{maxrow} + 1 ) ; $row++ ) {
if (getExcel(M . $row) ne "" ){
$cell = "M" . $row ;
push(#mistakes,$cell);
}
}
rewritesheet(#mistakes);
}
sub rewritesheet {
my $FileName = $infile;
my $parser = Spreadsheet::ParseExcel::SaveParser->new();
my $template = $parser->Parse($FileName);
my $worksheet = $template->worksheet(0);
my $row = 0;
my $col = 0;
# Get the format from the cell
my $format = $template->{Worksheet}[$sheet]
->{Cells}[$row][$col]
->{FormatNo};
foreach (#mistakes){
$worksheet->AddCell( $_, "" );
}
$template->SaveAs($infile2);`
Empty column values in an Excel sheet and save the result?
If the whole purpose of your program is to delete all column M values from a .xls file, then the following program (adopted from your program) will do exactly that:
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::SaveParser;
my $infile = $ARGV[0];
(my $infile2 = $infile) =~ s/(\.xls)$/_2$1/;
my $parser = Spreadsheet::ParseExcel::SaveParser->new();
my $workbook = $parser->Parse($infile);
my $sheet = $workbook->worksheet(0);
print "xls sheet contains rows \[0 .. $sheet->{MaxRow}\]\n";
my $startrow = $sheet->get_cell(0, 0) eq 'text' ? 4-1 : 2-1;
my $col_M = ord('M') - ord('A');
for my $row ($startrow .. $sheet->{MaxRow}) {
my $c = $sheet->get_cell($row, $col_M);
if(defined $c && length($c->value) > 0) { # why check?
$sheet->AddCell($row, $col_M, undef) # delete value
}
}
$workbook->SaveAs($infile2);
print "done";
But, if you really want to clear out column M only, why would you test for values? You could just overwrite them without test. Maybe thats not all your program is required to perform? I don't know.
Regards
rbo