Perl script for downloading a file not working - perl

This is a test script I wrote to download a file from a URL.
The URL must be a direct link to the file to download, and, given the time provided, the program will count down and start the download at the specified time.
The problem is that it works for smaller files (a few KB), but when I try big files it freezes.
use POSIX qw( mktime ctime difftime );
use LWP::Simple qw( getstore is_success );
my $url = 'http://releases.ubuntu.com/14.04.2/ubuntu-14.04.2-desktop-amd64.iso';
my $file = '//strawberry//myscripts//ubuntu-14.04.2-desktop-amd64.iso';
my $starttime = '08.07.15 11:43:11';
my $nowtime = time; # time in sec since 1970
my $sTime = 0;
my $sleepSec = 0;
# parsing the input (start) time
if ( $starttime =~ /^\s*(\d{1,2})\.(\d{1,2})\.(\d{2})\s+(\d{1,2})\:(\d{1,2})\:(\d{1,2})/ ) {
# mktime(sec,min,hr,day,month,year)
# month (0..11); year = 0 => 1900 , year = 100 => 2000
$sTime = mktime( $6, $5, $4, $1, $2 - 1, 100 + $3 );
}
print "\nNow, the time is ---", ctime( $nowtime );
print "\nDownload will start at ---", ctime( $sTime );
$sleepSec = difftime( $sTime, $nowtime );
if ( $sleepSec > 0 ) {
print "I will sleep for $sleepSec seconds, and then download it. zzZ\n";
my $num = $sleepSec;
while ( $num-- ) {
sleep( 1 );
$| = 1;
print "$num\r";
}
my $status = getstore( $url, $file );
die "Error $status on $url" unless is_success( $status );
print "Your file has been downloaded successfully.\n";
}
else {
print "Shit I missed my starttime...\n";
}
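For comparison, here is a sketch (untested against this exact setup) that keeps the same time parsing but streams the download to disk with LWP::UserAgent's `:content_file` option, so the body is written straight to the file and a timeout makes a stalled transfer fail instead of hanging. The sub names `parse_start_time` and `download_to_file` are made up for illustration:

```perl
use strict;
use warnings;
use POSIX qw( mktime );

# Parse the same "dd.mm.yy HH:MM:SS" format used above; returns an epoch time.
sub parse_start_time {
    my ($str) = @_;
    return undef
        unless $str =~ /^\s*(\d{1,2})\.(\d{1,2})\.(\d{2})\s+(\d{1,2}):(\d{1,2}):(\d{1,2})/;
    # mktime( sec, min, hour, mday, month 0..11, years since 1900 )
    return mktime( $6, $5, $4, $1, $2 - 1, 100 + $3 );
}

# Stream a URL to disk. ':content_file' writes the body directly to the file,
# and the timeout makes a stalled connection fail instead of freezing forever.
sub download_to_file {
    my ( $url, $file ) = @_;
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new( timeout => 60 );
    my $res = $ua->get( $url, ':content_file' => $file );
    die 'Error ', $res->status_line, " on $url\n" unless $res->is_success;
    return 1;
}
```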

Count total records in File and compare with trailer records in Perl

I have the following code
if ( -e $filein ) {
if ( open( FILEIN, $filein ) ) {
print log_date() . "STARTING TO CHECK FILE \n";
while ( <FILEIN> ) {
chomp( $_ );
if ( length( $_ ) == $l_recordLength ) {
$rectype = substr( $_, 0, 3 );
my $config = "fdposmarkertaxentity.ini";
if ( -e $config ) {
my $cfg = new Config::IniFiles( -file => $config );
print "$rectype \n";
if ( $rectype eq "999" ) {
print log_date() . "CHECK TRAILER RECORD IN CHECKFILE SUBROUTINE WHILE CHECKING FILE \n";
$filereccnt = substr( $_, 3, 17 );
if ( ( $filereccnt != $. ) || ( $. <= 2 ) ) {
if ( $BLog ) {
$errhdrordtlrec += $l_valueOne;
print HEADERTRAILER_ERROR_LOGFILE log_date() . "$. \n";
print HEADERTRAILER_ERROR_LOGFILE log_date() . "$_ \n";
print HEADERTRAILER_ERROR_LOGFILE log_date() . "====================================================================================================== \n";
$ERRHDRTRLRRCRDMSG = "Trailer record count does not match record or no records present in file.";
print HEADERTRAILER_ERROR_LOGFILE log_date() . "$ERRHDRTRLRRCRDMSG $filereccnt \n";
print HEADERTRAILER_ERROR_LOGFILE log_date() . "====================================================================================================== \n";
$BTrailerRecAlreadyRead = $l_valueOne;
}
}
}
}
}
}
}
}
And my record file contains below inputs
000FDPOSTAXENTITY2018021317243200001
DTL11~|~|110~|~|220~|~|T0333~|~|~|~|
DTL11~|~|333~|~|444~|~|T0555~|~|~|~|
DTL11~|~|555~|~|222~|~|T0777~|~|~|~|
99900000000000000005~|~|~|~|~|~|~|~|
I want to compare the number of lines in the input file with the trailer count. 00000000000000005 is the line count in the file; my record file has five lines. If the detail record count is 5 and the trailer record is not 00000000000000005 but something like 00000000000000007, then it should throw an error message like
Trailer record count does not match
else if it is 00000000000000005 it will throw
Trailer record count matches
Can anyone help me out with the solution?
Not sure if I understood your question correctly. This is how I would write it:
use strict;
use warnings;
my $records = 0;
my $line;
while (<DATA>) {
#print "LINE: $_";
# inside a record
if (/^000/.../^999/) {
#print "RECORD: $_";
# start of record
$line = 0 if /^000/;
$line++;
# end of record
if (/^999/) {
$records++;
# record length check
my($record_length) = /^999(\d+)/;
$record_length += 0; # integer
die "Record $records is $line line(s) long, but should be $record_length!\n"
unless $line == $record_length;
print "Record $records is OK\n";
}
}
}
__DATA__
asdjaslkdjasd
asdjasdjasld
asdjasdaskd
000FDPOSTAXENTITY2018021317243200001
DTL11~|~|110~|~|220~|~|T0333~|~|~|~|
DTL11~|~|333~|~|444~|~|T0555~|~|~|~|
DTL11~|~|555~|~|222~|~|T0777~|~|~|~|
99900000000000000005~|~|~|~|~|~|~|~|
asdasdasd
asdasdasdasd
000FDPOSTAXENTITY2018021317243200001
DTL11~|~|110~|~|220~|~|T0333~|~|~|~|
DTL11~|~|555~|~|222~|~|T0777~|~|~|~|
99900000000000000005~|~|~|~|~|~|~|~|
asdasdasd
Example run on the embedded data:
Record 1 is OK
Record 2 is 4 line(s) long, but should be 5!

Perl script is producing the symbol Â while converting an Excel file to CSV

We have a batch process in our system which converts an Excel .xlsx file to CSV format using Perl. The converted CSV file contains stray symbols like Â, so I am not getting the expected result. Can someone please tell me how to keep the same value as in the Excel file while converting to CSV?
Value in Excel file:
Unverifiable License Documentation NB Only
Value converted in CSV through Perl:
Unverifiable License Documentation Â– NB Only
I want to retain the same value that is in Excel while converting to CSV
Note: I used Encoding(UTF-8) while opening the file but even then it didn't work.
My Perl code
use Spreadsheet::XLSX;
use File::Basename;
use set_env_cfg;
use Date::Simple (':all');
use Math::Round;
$sts = open( INP, "< ${if}" );
#$sts = open (INP, '<:encoding(UTF-8)', ${if} );
#$sts = open (INP, '<:encoding(ISO-8859-1)', ${if} );
if ( $sts == 0 ) {
print LOG tmstmp() . ": Error opening input file\n";
close LOG;
print LOG "$ldlm\n";
`cp $lf $od`;
die;
}
print LOG "$ldlm\n";
print LOG tmstmp() . ": Conversion started for $if\n";
$oBook = Spreadsheet::XLSX->new($if);
foreach $WkS ( @{ $oBook->{Worksheet} } ) {
print LOG tmstmp() . ": Converting worksheet ----- " . $WkS->{Name}, "\n";
$cfgrec = ''; # initialize the configure record
$sts = open( OUT, ">$od/$WkS->{Name}.txt" );
if ( $sts == 0 ) {
print LOG tmstmp() . ": Error opening output file\n";
close LOG;
close INP;
print LOG "$ldlm\n";
`cp $lf $od`;
die;
}
$WkS->{MaxRow} ||= $WkS->{MinRow};
foreach $iR ( $WkS->{MinRow} .. $WkS->{MaxRow} ) {
$WkS->{MaxCol} ||= $WkS->{MinCol};
print OUT $cfgkey if ( ( $cfgko == 0 ) && ( $iR >= $hdrcnt ) );
foreach $iC ( $WkS->{MinCol} .. $WkS->{MaxCol} ) {
$cell = $WkS->{Cells}[$iR][$iC];
if ($cell) {
if ( ( $cell->{Type} ) eq "Date" ) {
if ( int( $cell->{Val} ) == ( $cell->{Val} ) ) {
$tmpval = date("1900-01-01") + ( $cell->{Val} ) - 2;
}
else {
$css = round( ( ( $cell->{Val} ) - int( $cell->{Val} ) ) * 86400 );
$cmi = int( $css / 60 );
$chr = int( $css / 3600 );
$css = $css - $cmi * 60;
$cmi = $cmi - $chr * 60;
$tmpval = date("1900-01-01") + int( $cell->{Val} ) - 2;
$tmpval .= " $chr:$cmi:$css";
}
}
else {
$tmpval = Spreadsheet::XLSX::Utility2007::unescape_HTML( $cell->{Val} );
}
print OUT $tmpval; ###Added double quotes in txt file to handle the comma delimiter value
}
if ( ( $iR == ${hdr_seq} - 1 ) ) {
if ( ( $cell->{Type} ) eq "Date" ) {
if ( int( $cell->{Val} ) == ( $cell->{Val} ) ) {
$tmpval = date("1900-01-01") + ( $cell->{Val} ) - 2;
}
else {
$css = round( ( ( $cell->{Val} ) - int( $cell->{Val} ) ) * 86400 );
$cmi = int( $css / 60 );
$chr = int( $css / 3600 );
$css = $css - $cmi * 60;
$cmi = $cmi - $chr * 60;
$tmpval = date("1900-01-01") + int( $cell->{Val} ) - 2;
$tmpval .= " $chr:$cmi:$css";
}
}
else {
$tmpval = Spreadsheet::XLSX::Utility2007::unescape_HTML( $cell->{Val} );
}
$cfgrec .= $tmpval;
}
if ( ( $iC == 0 ) && ( $iR == ${hdr_seq} ) ) {
$cfgrec = uc($cfgrec);
$cfgko = cnt_ocr( $cfgrec, $keyhdr );
$cfgkey = "*|" x ( $klm - $cfgko );
}
print OUT "|" if ( $iC < $WkS->{MaxCol} );
print OUT $cfgkey if ( ( $cfgko == $iC + 1 ) && ( $iR >= $hdrcnt ) );
}
print OUT "\n";
}
print LOG tmstmp() . ": Worksheet conversion completed successfully ----- " . $WkS->{Name}, "\n";
close OUT;
push @csv_file_lst, "$WkS->{Name}.txt";
}
print LOG tmstmp() . ": Conversion completed successfully for $if\n";
My guess is that your Excel file contains data encoded using the Windows-1252 code page that has been reencoded into UTF-8 without first being decoded
This string from your Excel file
Unverifiable License Documentation – NB Only
contains an EN DASH, which is represented as "\x96" in Windows-1252. If this is again encoded into UTF-8 the result is the two bytes "\xC2\x96". Interpreting this using Windows-1252 results in the two characters LATIN CAPITAL LETTER A WITH CIRCUMFLEX followed by EN DASH, which is what you're seeing
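That round trip can be reproduced in a few lines with Perl's own Encode module (a demonstration, not part of the original code):

```perl
use strict;
use warnings;
use Encode qw( encode decode );

binmode STDOUT, ':encoding(UTF-8)';

my $en_dash = "\x{2013}";                           # EN DASH
my $cp1252  = encode( 'Windows-1252', $en_dash );   # the single byte "\x96"
my $utf8    = encode( 'UTF-8', $cp1252 );           # "\xC2\x96" - re-encoded without decoding first
my $mangled = decode( 'Windows-1252', $utf8 );      # LATIN CAPITAL LETTER A WITH CIRCUMFLEX, then EN DASH

print "$mangled\n";    # prints the two characters seen in the broken CSV
```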
As far as I can tell, the only change necessary is to open your file with Windows-1252 decoding, like this
open my $fh, '<:encoding(Windows-1252)', $excel_file or die $!
Update
Your revised question shows your Perl code, but has removed the essential information from the Excel data that you show. This string
Unverifiable License Documentation NB Only
now has just two spaces between Documentation and NB and omits the "0x96" n-dash
Note: I've since restored the original data and tidied your code.
Your various attempts at opening the input file are here
$sts=open (INP, "< ${if}" );
#$sts=open (INP, '<:encoding(UTF-8)', ${if} );
#$sts=open (INP, '<:encoding(ISO-8859-1)', ${if} );
and you came very close with ISO-8859-1, but Microsoft, in their wisdom, reused the gaps in the ISO-8859-1 encoding between 0x7F and 0x9F for printable characters in Windows-1252. The n-dash character at 0x96 is inside this range, so decoding your input as ISO-8859-1 won't render it correctly
As far as I can see, you just need to write
$sts = open (INP, '<:encoding(Windows-1252)', ${if} );
and your input data will be read correctly
You should also specify the encoding of your output file to avoid Wide character in print warnings and malformed data. I can't tell whether you want to duplicate the encoding of your Excel file, use UTF-8, or something else entirely, but you should change this
$sts = open( OUT, ">$od/$WkS->{Name}.txt" );
to either
$sts = open OUT, '>:encoding(Windows-1252)', "$od/$WkS->{Name}.txt";
or
$sts = open OUT, '>:encoding(UTF-8)', "$od/$WkS->{Name}.txt";
as appropriate
Note also that it is best practice to use the three-parameter form of open all the time, and it is best to use lexical file names instead of the global ones that you have. But this isn't a code review, so I've disregarded those points
I hope this underlines to you that it is vital to establish the encoding of your input data and decode it correctly. Guessing really isn't an option
Update
My apologies. I overlooked that the initial open is ignored by the Spreadsheet::XLSX module, which is passed a filename rather than a file handle
This module is awkward in that it completely hides all character decoding, and relies on Text::Iconv to do the little conversion that it supports: something that is much better supported by Perl's own Encode module
The change I suggested to your open call is wrong, because it seems that a .xlsx file is a zipped file. However, you never read from INP, so it will make no difference. You should also close INP immediately after you have opened it, as it is a wasted resource
Short of using a different module, the best thing I can suggest is that you hack the data returned by Spreadsheet::XLSX->new
This block will correct the erroneous re-encoding. I have added it right before your foreach $iR ( ... ) loop
You will need to add
use Encode qw/ decode :fallbacks /;
to the top of your code
Please let me know how you get on. Now I really must go!
{
my $columns = $WkS->{Cells};
for my $row ( @$columns ) {
next unless $row;
for my $cell ( @$row ) {
next unless $cell and $cell->type eq 'Text';
for ( $cell->{_Value} ) {
$_ = decode('UTF-8', $_, FB_CROAK);
$_ = decode('Windows-1252', $_, FB_CROAK);
}
}
}
}

Copy data from huge files while they are open

I am trying to merge data from huge files into a combined file using Perl.
The files are open while this happens, and a large amount of data is continuously being added to them: around 50,000 lines per minute.
The files are stored in a network shared folder accessed by between 10 and 30 machines.
These are JTL files generated by JMeter.
The merge runs every minute for about 6 or 7 hours, and the time taken should not exceed 30 to 40 seconds.
The process is triggered every minute by a web application deployed on a Linux machine.
I have written a script which stores, in separate files, the last line added from each individual file to the combined file.
This works fine for about 15 minutes, but after that the merge time increases constantly.
My script
#!/usr/bin/perl
use File::Basename;
use File::Path;
$consolidatedFile = $ARGV[0];
$testEndTimestamp = $ARGV[1];
@csvFiles = @ARGV[ 2 .. $#ARGV ];
$testInProcess = 0;
$newMerge = 0;
$lastLines = "_LASTLINES";
$lastLine = "_LASTLINE";
# time() gives current time timestamp
if ( time() <= $testEndTimestamp ) {
$testInProcess = 1;
}
# File exists, has a size of zero
if ( -z $consolidatedFile ) {
mkdir $consolidatedFile . $lastLines;
$newMerge = 1;
}
open( CONSOLIDATED, ">>" . $consolidatedFile );
foreach my $file ( @csvFiles ) {
open( INPUT, "<" . $file );
@linesArray = <INPUT>;
close INPUT;
if ( $newMerge ) {
print CONSOLIDATED @linesArray[ 0 .. $#linesArray - 1 ];
open my $fh, ">", $consolidatedFile . $lastLines . "/" . basename $file . $lastLine;
print $fh $linesArray[ $#linesArray - 1 ];
close $fh;
}
else {
open( AVAILABLEFILE, "<" . $consolidatedFile . $lastLines . "/" . basename $file . $lastLine );
@lineArray = <AVAILABLEFILE>;
close AVAILABLEFILE;
$availableLastLine = $lineArray[0];
open( FILE, "<" . $file );
while ( <FILE> ) {
if ( /$availableLastLine/ ) {
last;
}
}
@grabbed = <FILE>;
close( FILE );
if ( $testInProcess ) {
if ( $#grabbed > 0 ) {
pop @grabbed;
print CONSOLIDATED @grabbed;
open( AVAILABLEFILE, ">" . $consolidatedFile . $lastLines . "/" . basename $file . $lastLine );
print AVAILABLEFILE $grabbed[ $#grabbed - 1 ];
}
close AVAILABLEFILE;
}
else {
if ( $#grabbed >= 0 ) {
print CONSOLIDATED @grabbed;
}
}
}
}
close CONSOLIDATED;
if ( !$testInProcess ) {
rmtree $consolidatedFile . $lastLines;
}
I need to optimize the script in order to reduce the time.
Is it possible to store last line in a cache?
Can anyone suggest another way for this type of merging?
Here is another script, which stores the last line in a cache instead of a file.
Even this does not complete the merge within 1 minute.
#!/usr/bin/perl
use CHI;
use File::Basename;
use File::Path;
my $cache = CHI->new(
driver => 'File',
root_dir => '/path/to/root'
);
$consolidatedFile = $ARGV[0];
$testEndTimestamp = $ARGV[1];
@csvFiles = @ARGV[ 2 .. $#ARGV ];
$testInProcess = 0;
$newMerge = 0;
$lastLines = "_LASTLINES";
$lastLine = "_LASTLINE";
# time() gives current time timestamp
if ( time() <= $testEndTimestamp ) {
$testInProcess = 1;
}
# File exists, has a size of zero
if ( -z $consolidatedFile ) {
$newMerge = 1;
}
open( CONSOLIDATED, ">>" . $consolidatedFile );
foreach my $file (@csvFiles) {
$fileLastLineKey =
$consolidatedFile . $lastLines . "_" . basename $file . $lastLine;
open( INPUT, "<" . $file );
@linesArray = <INPUT>;
close INPUT;
if ($newMerge) {
print CONSOLIDATED @linesArray[ 0 .. $#linesArray - 1 ];
$fileLastLine = $linesArray[ $#linesArray - 1 ];
$cache->set( $fileLastLineKey, $fileLastLine );
}
else {
$availableLastLine = $cache->get($fileLastLineKey);
open( FILE, "<" . $file );
while (<FILE>) {
if (/$availableLastLine/) {
last;
}
}
@grabbed = <FILE>;
close(FILE);
if ($testInProcess) {
if ( $#grabbed > 0 ) {
pop @grabbed;
print CONSOLIDATED @grabbed;
$fileLastLine = $grabbed[ $#grabbed - 1 ];
$cache->set( $fileLastLineKey, $fileLastLine );
}
}
else {
if ( $#grabbed >= 0 ) {
print CONSOLIDATED @grabbed;
$cache->remove($fileLastLineKey);
}
}
}
}
close CONSOLIDATED;
I am thinking of reading each file from the last merged line to the required line and copying those lines to the consolidated file.
Can anyone advise on this?
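One possible optimisation for both scripts above (a sketch; the sub and file names are made up for illustration): instead of saving the last merged line and pattern-matching back to it on every run, save the byte offset reached with tell and seek straight to it next time, turning each merge pass into a pure append:

```perl
use strict;
use warnings;

# Append everything added to $src since the last run to $out_fh, using a saved
# byte offset instead of re-scanning for the last merged line. $offset_file is
# a small per-source bookkeeping file holding a single number.
sub append_from_offset {
    my ( $src, $offset_file, $out_fh ) = @_;
    my $offset = 0;
    if ( open my $off_in, '<', $offset_file ) {
        $offset = <$off_in> // 0;
        chomp $offset;
        close $off_in;
    }
    open my $in, '<', $src or die "Cannot open $src: $!";
    seek $in, $offset, 0;                # jump past everything already merged
    while ( my $line = <$in> ) {
        last unless $line =~ /\n\z/;     # skip a partial, still-growing last line
        print {$out_fh} $line;
        $offset = tell $in;
    }
    close $in;
    open my $off_out, '>', $offset_file or die "Cannot write $offset_file: $!";
    print {$off_out} "$offset\n";
    close $off_out;
    return $offset;
}
```

Because the writers only ever append, a saved offset stays valid between runs, and each pass reads only the new bytes rather than the whole file.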
You may want to try opening the file in binmode and reading it blockwise in a loop. This usually offers significant performance improvements. The following function is an example: here I put at most $maxblocks blocks of a file, from block $offset on, into an array passed as a reference. Note that the last block may not contain the full $block bytes when the file is not large enough.
sub file2binarray {
my $file=shift;
my $array=shift;
my $maxblocks=shift;
my $offset=shift;
my $block=2048;
$offset=0 if ((!defined($offset)) || ($offset !~/^\s*\d+\s*$/o));
$maxblocks="ALL"
if (!defined($maxblocks) || ($maxblocks!~/^\s*\d+\s*$/o));
my $size=(stat($file))[7];
my $mb=$size/$block;
$mb++ if ($mb*$block<$size);
$maxblocks=$mb-$offset if(($maxblocks eq "ALL")||
($maxblocks>$mb-$offset));
$offset*=$block;
open(IN,"$file") || die("Cannot open file <$file>\n");
binmode(IN);
seek(IN,$offset,0);
my ($blk,$bytes_read,$buffer)=(0,$block,"");
while (($bytes_read==$block)&& ($blk<$maxblocks)){
$bytes_read=sysread(IN,$buffer,$block);
push(@$array,$buffer);
$blk++;
}
close(IN);
}
To read the entire file at once, you call it like this
my @array;
my $filename="somefile";
file2binarray ($filename,\@array,"ALL",0);
but probably you'd rather call it in a loop with some bookkeeping over the offset, and parse the array in between subsequent calls.
Hope this helps.

perl while loop - will only exit the first time the condition is met

I have a script that has 3 if statements; depending on which range the "cases" fall under, a different calculation is made. I am just starting to learn Perl. I think the problem is that my last input isn't part of the while block, but I can't figure out how to keep it inside without getting a curly-bracket error
[flata#localhost bin]$ casespacked.pl
Please enter your name: done
[flata#localhost bin]$ casespacked.pl
Please enter your name: bill
Please enter pay rate: 10
Please enter hours worked: 20
Please enter cases packed: 4
Name: bill, Hours: 20, Regular Pay: 200, Bonus Pay: 20, Total Pay: 220
Please enter your name: done
Please enter pay rate:^C
The loop works as it should and keeps producing the results I need, but it won't exit correctly: it only exits if you enter done the first time it asks for your name
#!/usr/bin/perl
print "Please enter your name: ";
chop( my $Name = <stdin> );
while ( $Name ne "done" ) {
print "Please enter pay rate: ";
chop( my $Rate = <stdin> );
print "Please enter hours worked: ";
chop( my $Hours = <stdin> );
print "Please enter cases packed: ";
chop( my $Cases = <stdin> );
if ( $Cases >= 1 && $Cases <= 9 ) {
$Bonus = ( $Cases * 5 );
$Pay = ( $Rate * $Hours );
$Total = ( $Bonus + $Pay );
}
if ( $Cases >= 10 && $Cases <= 20 ) {
$Bonus = ( $Cases * 8 );
$Pay = ( $Rate * $Hours );
$Total = ( $Bonus + $Pay );
}
if ( $Cases >= 20 ) {
$Bonus = ( $Cases * 10 );
$Pay = ( $Rate * $Hours );
$Total = ( $Bonus + $Pay );
}
{
print "Name: $Name, Hours: $Hours, Regular Pay: $Pay, Bonus Pay: $Bonus, Total Pay: $Total\n";
}
print "Please enter your name: ";
chop( my $Name = <stdin> );
}
Even if I move the print "Please enter your name: "; up so it's after the last if statement, it won't exit, and that should still be inside the while loop? Or am I not understanding?
Your bug is caused because you're reading the subsequent $Name into a new variable that is lexically scoped to the while block, so it reverts to its previous value before being tested in the while (COND):
chop( my $Name = <stdin> );
} # $Name reverts to previous value
Remove the my from that final assignment so that it updates the outer $Name. That will fix your current bug.
However, I would like to suggest a few other things:
Always include use strict; and use warnings; at the top of EVERY Perl script.
Instead of trying to tie your condition directly to the while loop, just use an infinite loop that you break out of using last.
Use chomp instead of chop
Use all lowercase for variable names. Read perlstyle for specifics.
When your conditions are linked, use if, elsif, else instead of a chain of independent ifs. If you order the conditions from the largest range downward, each branch only needs to check a single lower bound, which simplifies your logic.
The following demonstrates these and other fixes:
#!/usr/bin/perl
use strict;
use warnings;
while (1) {
print "Please enter your name: ";
chomp( my $name = <STDIN> );
last if $name eq "done";
print "Please enter pay rate: ";
chomp( my $rate = <STDIN> );
print "Please enter hours worked: ";
chomp( my $hours = <STDIN> );
print "Please enter cases packed: ";
chomp( my $cases = <STDIN> );
my $bonus;
if ( $cases >= 20 ) {
$bonus = $cases * 10;
} elsif ( $cases >= 10 ) {
$bonus = $cases * 8;
} elsif ( $cases >= 1 ) {
$bonus = $cases * 5;
} else {
warn "Invalid number of Cases: $cases\n";
next;
}
my $pay = $rate * $hours;
my $total = $bonus + $pay;
print "Name: $name, Hours: $hours, Regular Pay: $pay, Bonus Pay: $bonus, Total Pay: $total\n";
}
You're using my $Name which creates a new variable, masking the previous one. Remove the my from the last assignment and it will start working.
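The masking effect can be seen in isolation with a minimal demonstration:

```perl
use strict;
use warnings;

my $name = "bill";
{
    my $name = "done";        # a NEW variable, visible only inside this block
    print "inner: $name\n";   # inner: done
}
print "outer: $name\n";       # outer: bill - the inner assignment never touched it
```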

Problems reading header line from my Excel 2007 files created with Perl

I have a problem with merging two dynamically created Excel 2007 files.
My files are created with the Perl Module Excel::Writer::XLSX on Solaris.
Say I have two files, fileA.xlsx and fileB.xlsx. Now I want to merge them together (fileA + fileB => fileC).
It is not really possible at this time to append fileB to fileA. This is a limitation of Excel::Writer::XLSX, which can only create new files.
Both .xlsx files can be opened without complaints in Excel 2007, in LibreOffice 3 (on linux), and (with the help of Microsoft's xlsx to xls converters) even in Excel 2003.
However, when I open them with Perl (using the module Spreadsheet::XLSX), the contents of the header row (row 0) are always skipped:
# ...
foreach my $infile (@infiles) {
my $excel = Spreadsheet::XLSX->new($infile);
my $i = 0;
foreach my $sheet ( @{ $excel->{Worksheet} } ) {
printf( "Infile '$infile', Sheet $i: %s\n", $sheet->{Name} );
$sheet->{MaxRow} ||= $sheet->{MinRow};
print "$infile: " . $sheet->{MaxRow} . " rows\n";
print "data starts at row: " . $sheet->{MinRow} . ". \n";
next unless $i == 0; # only copy data from the first sheet (for speed)
my $start_row = $sheet->{MinRow};
foreach my $row ( $start_row .. $sheet->{MaxRow} ) {
$sheet->{MaxCol} ||= $sheet->{MinCol};
foreach my $col ( $sheet->{MinCol} .. $sheet->{MaxCol} ) {
my $cell = $sheet->{Cells}[$row][$col];
if ($cell) {
# do something with the data
# ...
# write to outfile
$excel_writer->sheets(0)->write( $dest_row, $col, $cell->{Val} );
}
}
}
}
}
Now, the output of this code fragment is always
data starts at row: 1.
But this is not true: it starts at row 0. If I manually try to read data from row 0, $cell is undefined (although it shouldn't be).
Interestingly, when I open the file in Microsoft Excel, and change it trivially, (say, by adding a blank space to one of the cell values in the header row), and save the file, then the header row IS found by the code above.
data starts at row: 0.
By the way, when I open, change, save the file in LibreOffice, there are numerous warnings concerning date values when I re-read them with the code above. (Thus, datetime values seem to be saved slightly incorrectly by LibreOffice).
The code that produces the files looks like this (note: some vars are defined outside of this sub):
sub exportAsXLS {
#require Spreadsheet::WriteExcel;
require Excel::Writer::XLSX;
my ( $data, $dir, $sep, @not2export ) = @_;
my $val;
my $EXCEL_MAXROW = 1048576;
return undef unless $data;
return "." unless scalar @$data > 0;
my $time = time2str( "%Y%m%d_%H%M%S", time() );
my $file = "$outdir/$dir/${host}_${port}-${time}.xlsx";
#my $workbook = Spreadsheet::WriteExcel->new($file);
my $workbook = Excel::Writer::XLSX->new($file);
$workbook->set_optimization();
my $worksheet = $workbook->add_worksheet();
# Set the default format for dates.
#my $date_formatHMS = $workbook->add_format( num_format => 'mmm d yyyy hh:mm AM/PM' );
#my $date_formatHMS = $workbook->add_format( num_format => 'yyyy-mm-ddThh:mm:ss.sss' );
my %formats;
$formats{date_HM} = $workbook->add_format( num_format => 'yyyy-mm-ddThh:mm' );
$formats{date_HMS} = $workbook->add_format( num_format => 'yyyy-mm-ddThh:mm:ss' );
$formats{num} = $workbook->add_format();
$formats{num}->set_num_format();
$formats{headline} = $workbook->add_format();
$formats{headline}->set_bold();
$formats{headline}->set_num_format('#');
# Format as a string. use the Excel text format #:
# Doesn't change to a number when edited
$formats{string} = $workbook->add_format( num_format => '#' );
$worksheet->set_row( 0, 15, $formats{headline} );
my $row = 0;
my $col = 0;
for ( my $r = -1 ; $r < @$data && $r < $EXCEL_MAXROW ; $r++ ) {
for ( my $i = 0 ; $i < @$column ; $i++ ) {
next if grep( $_ eq $column->[$i], @not2export );
my $val = $data->[$r]{ $column->[$i] };
my $t = int $type->[$i];
if ( $r < 0 ) {
#warn " type: $type->[$i] , ";
# Erste Zeile = Spaltennamen ausgeben
$worksheet->write_string( $row, $col++, $column->[$i], $formats{string});
#$worksheet->write_comment( 0, 0, "\x{263a}" ); # Smiley
#$worksheet->write( $row, $col++, $column->[$i], $formats{headline} );
} elsif ( ( $t == 11 ) or ( $t == 9 ) ) {
# 11 - Der Wert ist ein Datum, im SHORT Format, 9- long
$val = time2str( "%Y-%m-%dT%H:%M:%S", str2time( $data->[$r]{ $column->[$i] } ) );
$worksheet->write_date_time( $row, $col++, $val, $formats{date_HMS} );
} else {
$worksheet->write( $row, $col++, $val );
}
}
$col = 0;
$row++;
}
return $file;
}
The difference between the files is as follows.
On the left is the file that Excel::Writer::XLSX produces. On the right is the file that MS Excel 2003 produces after a trivial change to the header row: the header row data is refactored and externalized into a separate file, sharedStrings.xml, which looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="5" uniqueCount="5">
<si>
<t>SITE</t>
</si>
<si>
<t>LOG_DATE</t>
</si>
<si>
<t>KTZI201_WF_TEMPERATUR</t>
</si>
<si>
<t>KTZI300_TEMP_RESERVOIR</t>
</si>
<si>
<t>XPEDITION</t>
</si>
</sst>
Spreadsheet::XLSX can read the header if the .xlsx file is formatted as shown on the right half of the picture, but skips the header row when formatted as shown on the left half.
When I run your program against the output of this Excel::Writer::XLSX example program it correctly reports data in the first row (row == 0):
Infile 'a_simple.xlsx', Sheet 0: Sheet1
a_simple.xlsx: 10 rows
data starts at row: 0.
Perhaps you should double check the program that is producing the input files.
Also, make sure you are on the latest version of Excel::Writer::XLSX.