Perl - empty rows while writing CSV from Excel

I want to convert Excel files to CSV files with Perl. For convenience I'd like to use the module File::Slurp for the read/write operations, and I need this to work inside a subroutine.
While printing to the screen the program generates the desired output, but the generated CSV files unfortunately contain just a single row of semicolons with empty fields.
Here is the code:
#!/usr/bin/perl
use File::Copy;
use v5.14;
use Cwd;
use File::Slurp;
use Spreadsheet::ParseExcel;

sub xls2csv {
    my $currentPath = getcwd();
    my @files = <$currentPath/stage0/*.xls>;

    for my $sourcename (@files) {
        print "Now working on $sourcename\n";

        my $outFile = $sourcename;
        $outFile =~ s/xls/csv/g;
        print "Output CSV-File: " . $outFile . "\n";

        my $source_excel = new Spreadsheet::ParseExcel;
        my $source_book  = $source_excel->Parse($sourcename)
            or die "Could not open source Excel file $sourcename: $!";

        foreach my $source_sheet_number ( 0 .. $source_book->{SheetCount} - 1 ) {
            my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];

            next unless defined $source_sheet->{MaxRow};
            next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
            next unless defined $source_sheet->{MaxCol};
            next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};

            foreach my $row_index ( $source_sheet->{MinRow} .. $source_sheet->{MaxRow} ) {
                foreach my $col_index ( $source_sheet->{MinCol} .. $source_sheet->{MaxCol} ) {
                    my $source_cell = $source_sheet->{Cells}[$row_index][$col_index];
                    if ($source_cell) {
                        print $source_cell->Value, ";"; # correct output!
                        write_file( $outFile, { binmode => ':utf8' }, $source_cell->Value, ";" ); # only one row of semicolons with empty fields!
                    }
                }
                print "\n";
            }
        }
    }
}

xls2csv();
I know it has something to do with how the parameters are passed to the write_file function, but I couldn't manage to fix it.
Does anybody have an idea?
Thank you very much in advance.

write_file will overwrite the file unless the append => 1 option is given. So this:
write_file( $outFile, { binmode => ':utf8' }, $source_cell->Value, ";" );
will write a new file for each new cell value, so only the last write survives. It does, however, not quite match your description of "only one row of semicolons with empty fields", as the final file should then contain just one value and one semicolon.
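(If per-cell writes were really wanted, File::Slurp's append option would be needed on every call after the first, something like the line below, though opening the file once per cell would still be slow:)
write_file( $outFile, { append => 1, binmode => ':utf8' }, $source_cell->Value . ";" );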
I am doubtful about this sentiment of yours: "For convenience I like to use the module File::Slurp". While the print statement works as it should, using File::Slurp does not. So how is that convenient?
What you should do, if you still want to use write_file, is to gather all the lines to print and then write them all at once at the end of the loop. E.g.:
$line .= $source_cell->Value . ";"; # use concatenation to build the line
...
push @out, "$line\n"; # store in array
...
write_file( ...., \@out ); # print the array
Another simple option would be to use join, or to use the Text::CSV module.
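For illustration, the inner loops could be restructured along those lines roughly like this (an untested sketch using join; empty cells become empty fields):
my @out;
foreach my $row_index ( $source_sheet->{MinRow} .. $source_sheet->{MaxRow} ) {
    my @fields;
    foreach my $col_index ( $source_sheet->{MinCol} .. $source_sheet->{MaxCol} ) {
        my $source_cell = $source_sheet->{Cells}[$row_index][$col_index];
        push @fields, defined $source_cell ? $source_cell->Value : '';
    }
    push @out, join( ';', @fields ) . "\n"; # one finished line per row
}
write_file( $outFile, { binmode => ':utf8' }, \@out ); # one write per sheet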

Well, in this particular case, File::Slurp was indeed complicating this for me. I just wanted to avoid repeating myself, which I ended up doing anyway in the following clumsy working solution:
#!/usr/bin/perl
use warnings;
use strict;
use File::Copy;
use v5.14;
use Cwd;
use File::Basename;
use File::Slurp;
use Tie::File;
use Spreadsheet::ParseExcel;
use open qw/:std :utf8/;

# ... other functions

sub xls2csv {
    my $currentPath = getcwd();
    my @files = <$currentPath/stage0/*.xls>;
    my $fh;

    for my $sourcename (@files) {
        say "Now working on $sourcename";

        my $outFile = $sourcename;
        $outFile =~ s/xls/csv/gi;

        if ( -e $outFile ) {
            unlink($outFile) or die "Error: $!";
            print "Old $outFile deleted.";
        }

        my $source_excel = new Spreadsheet::ParseExcel;
        my $source_book  = $source_excel->Parse($sourcename)
            or die "Could not open source Excel file $sourcename: $!";

        foreach my $source_sheet_number ( 0 .. $source_book->{SheetCount} - 1 ) {
            my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];

            next unless defined $source_sheet->{MaxRow};
            next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
            next unless defined $source_sheet->{MaxCol};
            next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};

            foreach my $row_index ( $source_sheet->{MinRow} .. $source_sheet->{MaxRow} ) {
                foreach my $col_index ( $source_sheet->{MinCol} .. $source_sheet->{MaxCol} ) {
                    my $source_cell = $source_sheet->{Cells}[$row_index][$col_index];
                    if ($source_cell) {
                        print $source_cell->Value, ";";
                        open( $fh, '>>', $outFile ) or die "Error: $!";
                        print $fh $source_cell->Value, ";";
                        close $fh;
                    }
                }
                print "\n";
                open( $fh, '>>', $outFile ) or die "Error: $!";
                print $fh "\n";
                close $fh;
            }
        }
    }
}

xls2csv();
I'm actually NOT happy with it, since I'm opening and closing the file so often (I have many files with many lines). That's not very clever in terms of performance.
Currently I still don't know how to use join or Text::CSV in this case, in order to put everything into an array and open, write, and close each file only once.
Thank you for your answer, TLP.
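For reference, a Text::CSV take on the sheet loop might look like this (an untested sketch; the semicolon separator and UTF-8 layer are assumptions mirroring the code above):
use Text::CSV;

my $csv = Text::CSV->new( { binary => 1, sep_char => ';', eol => "\n" } )
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $out, '>:encoding(UTF-8)', $outFile or die "Error: $!";
foreach my $row_index ( $source_sheet->{MinRow} .. $source_sheet->{MaxRow} ) {
    my @fields;
    foreach my $col_index ( $source_sheet->{MinCol} .. $source_sheet->{MaxCol} ) {
        my $cell = $source_sheet->{Cells}[$row_index][$col_index];
        push @fields, defined $cell ? $cell->Value : '';
    }
    $csv->print( $out, \@fields ); # quoting and separators handled by Text::CSV
}
close $out; # the file is opened and closed once per sheet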

Related

Print variable after closing the file in Perl

The code below works fine, but I want $ip to be printed after closing the file.
use strict;
use warnings;
use POSIX;

my $file = "/tmp/example";
open(FILE, "<$file") or die $!;
while ( <FILE> ) {
    my $lines = $_;
    if ( $lines =~ m/address/ ) {
        my ($string, $ip) = (split ' ', $lines);
        print "IP address is: $ip\n";
    }
}
close(FILE);
Sample data in the /tmp/example file:
$ cat /tmp/example
country us
ip_address 192.168.1.1
server dell
This solution looks for the first line that contains ip_address followed by some space and a sequence of digits and dots.
Wrapping the search in a block makes Perl delete the lexical variable $fh. Because it is a file handle, that handle will also be closed automatically.
Note that I've used autodie to avoid the need to explicitly check the status of the open call.
This algorithm will find the first occurrence of ip_address and stop reading the file immediately.
use strict;
use warnings 'all';
use autodie;

my $file = '/tmp/example';

my $ip;
{
    open my $fh, '<', $file;

    while ( <$fh> ) {
        if ( /ip_address\h+([\d.]+)/ ) {
            $ip = $1;
            last;
        }
    }
}

print $ip // 'undef', "\n";
Output:
192.168.1.1
Store all IPs in an array and you'll then have them for later processing.
The code shown can also be simplified a lot. This assumes a four-number IP and data like that shown in the sample:
use warnings;
use strict;
use feature 'say';

my $file = '/tmp/example';
open my $fh, '<', $file or die "Can't open $file: $!";

my @ips;
while (<$fh>) {
    if (my ($ip) = /ip_address\s*(\d+\.\d+\.\d+\.\d+)/) {
        push @ips, $ip;
    }
}
close $fh;

say for @ips;
Or, once you open the file, process all lines with a map:
my @ips = map { /ip_address\s*(\d+\.\d+\.\d+\.\d+)/ } <$fh>;
The filehandle is read in list context here, imposed by map, so all lines from the file are returned. The block in map is applied to each line in turn, and map returns a flattened list of the results.
Some notes:
Use the three-argument open; it is better.
Don't assign $_ to a variable. To work with a lexical, use while (my $line = <$fh>).
You can use split, but here a regex is more direct, and it allows you to assign its match so that it is scoped: if there is no match, the if fails and nothing goes onto the array.
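For comparison, a split-based version of the loop body might look like this (a sketch, assuming the two-column key/value format shown in the sample):
while (my $line = <$fh>) {
    my ($key, $value) = split ' ', $line;
    push @ips, $value if defined $key and $key eq 'ip_address';
}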
use warnings;
use strict;

my $file = "test";
my ( $string, $ip );

open my $FH, "<", $file or die $!;
while (my $lines = <$FH>) {
    if ($lines =~ m/address/) {
        ($string, $ip) = (split ' ', $lines);
    }
}

print "IP address is: $ip\n";
This will give you the output you need. But it fails when the input file has multiple matching lines: each match overwrites $ip, so only the last one is kept.

Merge two files based on the starting of the line

I want to merge two files into one using Perl. Below are the sample files.
***FILE 1***
XDC123
XDC456
XDC678
BB987
BB654
*** FILE 2 ***
XDC876
XDC234
XDC789
BB456
BB678
And I want the merged file to look like:
***MERGED FILE***
XDC123
XDC456
XDC678
XDC876
XDC234
XDC789
BB987
BB654
BB456
BB678
For the above functionality I have written the Perl script below:
#!/usr/bin/env perl;
use strict;
use warnings;

my $file1 = 'C:/File1';
my $file2 = 'C:/File2';
my $file3 = 'C:/File3';

open( FILEONE, '<$file1' );
open( FILETWO, '<$file2' );
open( FILETHREE, '>$file3' );

while (<FILEONE>) {
    if (/^XDC/) {
        print FILETHREE;
    }
    if (/^BB/) {
        last;
    }
}
while (<FILETWO>) {
    if (/^XDC/) {
        print FILETHREE;
    }
    if (/^BB/) {
        last;
    }
}
while (<FILEONE>) {
    if (/^BB/) {
        print FILETHREE;
    }
}
while (<FILETWO>) {
    if (/^BB/) {
        print FILETHREE;
    }
}

close($file1);
close($file2);
close($file3);
But the merged file that is generated from the above code looks like:
***FILE 3***
XDC123
XDC456
XDC678
XDC876
XDC234
XDC789
BB654
BB678
The first line that starts with BB is missing from both files. Any help on this will be appreciated. Thank you.
The problem is that you iterate each file to the end, but never 'rewind' if you want to start over.
So your while ( <FILEONE> ) { loop consumes (and discards) the first line that matches m/^BB/: the last exits the while loop, but only after the line has already been read.
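(For completeness: a handle can be repositioned to the start with seek, though the rewritten code below avoids the need for it entirely. A minimal sketch:)
use Fcntl qw(SEEK_SET);
seek( FILEONE, 0, SEEK_SET ); # rewind the handle to the beginning of the file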
However, that's assuming you get your open statements right, because this:
open( FILEONE, '>$file1' );
actually empties the file rather than reading from it. So I am assuming you've transposed your code, and introduced new errors whilst doing so.
As a style point - you should really use three-argument open, with lexical filehandles.
So instead:
#!/usr/bin/env perl
use strict;
use warnings;

my $file1 = 'C:/File1';
my $file2 = 'C:/File2';
my $file3 = 'C:/File3';

my @lines;
foreach my $file ( $file1, $file2 ) {
    open( my $input, '<', $file ) or die $!;
    push( @lines, <$input> );
    close($input);
}

open( my $output, '>', $file3 ) or die $!;
print {$output} sort @lines;
close($output);
(Although as noted in the comments - if that's all you want to do, the unix sort utility is probably sufficient).
However, if you need to preserve the original ordering within each group whilst sorting on the alphabetical prefix, you need a slightly different data structure:
#!/usr/bin/env perl
use strict;
use warnings;

my $file1 = 'C:/File1';
my $file2 = 'C:/File2';
my $file3 = 'C:/File3';

my %lines;
foreach my $file ( $file1, $file2 ) {
    open( my $input, '<', $file ) or die $!;
    while ( my $line = <$input> ) {
        my ($key) = $line =~ m/^(\D+)/;
        push @{ $lines{$key} }, $line;
    }
    close($input);
}

open( my $output, '>', $file3 ) or die $!;
foreach my $key ( sort keys %lines ) {
    print {$output} @{ $lines{$key} };
}
close($output);

What produces the white space in my Perl program?

As the title says, I have a program, or rather two functions, to read a file into an array and to write an array back to a file. But now to the main reason why I'm writing this: when running my test several times, the program that tests my functions produces more and more white space. Could somebody explain my mistake and correct me?
My code:
Helper.pm:
#!/usr/bin/env perl
package KconfCtl::Helper;

sub file_to_array($) {
    my $file = shift();
    my ( $filestream, $string );
    my @rray;

    open( $filestream, $file ) or die("cant open $file: $!");
    @rray = <$filestream>;
    close($filestream);

    return @rray;
}

sub array_to_file($$;$) {
    my @rray = @{ shift() };
    my $file = shift();
    my $mode = shift();
    $mode = '>' if not $mode;

    my $filestream;
    if ( not defined $file ) {
        $filestream = STDOUT;
    }
    else {
        open( $filestream, $mode, $file ) or die("cant open $file: $!");
    }

    my $l = @rray;
    print $l, "\n";

    foreach my $line (@rray) {
        print $filestream "$line\n";
    }
    close($filestream);
}

1;
test_helper.pl:
use KconfCtl::Helper;
use strict;

my @t;
@t = KconfCtl::Helper::file_to_array("kconf.test");
#print @t;
my $t_index = @t;
@t[$t_index] = "n";
KconfCtl::Helper::array_to_file(\@t, "kconf.test", ">");
The result after the first run:
n
and after the second run:
n
n
When you read from a file, the data includes the newline characters at the end of each line. You're not stripping those off, but you are adding an additional newline when you output the data again. That means your file gains additional blank lines each time you read and write it.
Also, you must always use strict and use warnings 'all' at the top of every Perl script; you should avoid using subroutine prototypes; and you should declare all of your variables as late as possible.
Here's a more idiomatic version of your module code which removes the newlines on input using chomp. Note that you don't need the #! line on the module file as it won't be run from the command line, but you may want it on the program file. It's also more normal to export symbols from a module using the Exporter module, so that you don't have to qualify the subroutine names by prefixing them with the full package name.
use strict;
use warnings 'all';

package KconfCtl::Helper;

sub file_to_array {
    my ($file) = @_;

    open my $fh, '<', $file or die qq{Can't open "$file" for input: $!};
    chomp( my @array = <$fh> );

    return @array;
}

sub array_to_file {
    my ($array, $file, $mode) = @_;

    $mode //= '>';

    my $fh;
    if ( $file ) {
        open $fh, $mode, $file or die qq{Can't open "$file" for output: $!};
    }
    else {
        $fh = \*STDOUT;
    }

    print $fh $_, "\n" for @$array;
}

1;
and your test program would be like this:
#!/usr/bin/env perl
use strict;
use warnings 'all';

use KconfCtl::Helper;

use constant FILE => 'kconf.test';

my @t = KconfCtl::Helper::file_to_array(FILE);
push @t, 'n';
KconfCtl::Helper::array_to_file(\@t, FILE);
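As an aside, the Exporter setup mentioned above might look something like this (a minimal sketch; the export list is illustrative):
package KconfCtl::Helper;

use strict;
use warnings 'all';

use Exporter 'import';
our @EXPORT_OK = qw(file_to_array array_to_file); # callers must request these by name

# ... subroutine definitions as above ...

1;
The test program could then say use KconfCtl::Helper qw(file_to_array array_to_file); and call the subroutines without the package prefix.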
When you read in from your file, you need to chomp() the lines, or else the \n at the end of the line is included.
Try this and you'll see what's happening:
use Data::Dumper; ## add this line

sub file_to_array($) {
    my $file = shift();
    my ( $filestream, $string );
    my @rray;

    open( $filestream, '<', $file ) or die("cant open $file: $!");
    @rray = <$filestream>;
    close($filestream);

    print Dumper( \@rray ); ### add this line

    return @rray;
}
you can add
foreach (@rray) {
    chomp();
}
into your module to stop this happening.
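Note that chomp can also take the whole array in a single call, which is the more common idiom:
chomp(@rray); # removes the trailing newline from every element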

Loop through file in perl and remove strings with less than 4 characters

I am trying to bring in a file, loop through it, remove any strings that have fewer than four characters, and then print the list. I come from a JavaScript world and Perl is brand new to me.
use strict;
use warnings;

sub lessThan4 {
    open( FILE, "<names.txt" );
    my @LINES = <FILE>;
    close( FILE );

    open( FILE, ">names.txt" );
    foreach my $LINE ( @LINES ) {
        print FILE $LINE unless ( $LINE.length() < 4 );
    }
    close( FILE );
}
use strict;
use warnings;
# automatically throw exception if open() fails
use autodie;

sub lessThan4 {
    my @LINES = do {
        # modern perl uses lexical, and three arg open
        open(my $FILE, "<", "names.txt");
        <$FILE>;
    };

    # remove newlines
    chomp(@LINES);

    open(my $FILE, ">", "names.txt");
    foreach my $LINE ( @LINES ) {
        print $FILE "$LINE\n" unless length($LINE) < 4;
        # possible alternative to 'unless'
        # print $FILE "$LINE\n" if length($LINE) >= 4;
    }
    close($FILE);
}
You're basically there. I hope you'll find some comments on your code useful.
# Well done for including these. So many new Perl users don't.
use strict;
use warnings;

# Perl programs traditionally use all lower-case subroutine names.
sub lessThan4 {
    # 1/ You should use lexical variables for filehandles
    # 2/ You should use the three-argument version of open()
    # 3/ You should always check the return value from open()
    open( FILE, "<names.txt" );

    # Upper-case variable names in Perl are assumed to be global variables.
    # This is a lexical variable, so name it using lower case.
    my @LINES = <FILE>;
    close( FILE );

    # Same problems with open() here.
    open( FILE, ">names.txt" );

    foreach my $LINE ( @LINES ) {
        # This is your biggest problem. Perl doesn't yet embrace the idea of
        # calling methods to get properties of a variable. You need to call
        # length() as a function.
        print FILE $LINE unless ( $LINE.length() < 4 );
    }
    close( FILE );
}
Rewriting to take all that into account, we get the following:
use strict;
use warnings;

sub less_than_4 {
    open( my $in_file_h, '<', 'names.txt' ) or die "Can't open file: $!";
    my @lines = <$in_file_h>;
    close( $in_file_h );

    open( my $out_file_h, '>', 'names.txt' ) or die "Can't open file: $!";

    foreach my $line ( @lines ) {
        # Note: $line will include the newline character, so you might need
        # to increase 4 to 5 here
        print $out_file_h $line unless length $line < 4;
    }
    close( $out_file_h );
}
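As a side note, the filter-and-print step can be compressed with grep (an equivalent sketch; the same caveat about the newline counting toward the length applies):
print {$out_file_h} grep { length($_) >= 4 } @lines;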
I am trying to bring in a file, loop through it, remove any strings that have fewer than four characters, and then print the list.
I suppose you need to remove strings from the file which are less than 4 chars in length.
#!/usr/bin/perl
use strict;
use warnings;

open( my $FH, "<", "names.txt" ) or die $!;

my @final_list;
while (my $line = <$FH>) {
    map {
        length($_) > 4 and push( @final_list, $_ );
    } split( /\s/, $line );
}

print "\nWords with more than 4 chars: @final_list\n";
Please try this one:
use strict;
use warnings;

my @new;
while (<DATA>) {
    # Push all the values less than 4 characters
    push( @new, $_ ) unless ( length($_) > 4 );
}
print @new;

__DATA__
Williams
John
Joe
Lee
Albert
Francis
Sun

Extract data from file

I have data like this:
"scott
E -45 COLLEGE LANE
BENGALI MARKET
xyz -785698."
"Tomm
D.No: 4318/3,Ansari Road, Dariya Gunj,
xbc - 289235."
I wrote a Perl program to extract the names, i.e.:
open( my $Fh, '<', 'printable address.txt' ) or die "!S";
open( my $F, '>', 'names.csv' ) or die "!S";

while ( my @line = <$Fh> ) {
    for ( my $i = 0; $i <= 13655; $i++ ) {
        if ( $line[$i] =~ /^"/ ) {
            print $F $line[$i];
        }
    }
}
It works fine and extracts the names exactly. Now my aim is to extract the addresses, which look like
BENGALI MARKET
xyz -785698."
D.No: 4318/3,Ansari Road, Dariya Gunj,
xbc - 289235."
in a CSV file. Please tell me how to do this.
There are a lot of flaws in your original code. We should address those before suggesting any enhancements:
Always have use strict; and use warnings; at the top of every script.
Your or die "!S" statements are broken. The error code is actually in $!. However, you can skip the need to check it manually by just having use autodie;.
Give your filehandles more meaningful names. $Fh and $F say nothing about what they are for. At minimum, label them $infh and $outfh.
The while (my @line = <$Fh>) { is flawed, as it can just be reduced to my @line = <$Fh>;. Because you're calling readline in list context, it will slurp the entire file, and on the next loop iteration it will exit. Instead, assign it to a scalar, and you don't even need the for loop that follows.
If you wanted to slurp your entire file into @line, your use of for (my $i = 0; $i <= 13655; $i++) { is also flawed. You should iterate to the last index of @line, which is $#line.
if ($line[$i] =~ /^"/) { is also flawed, as it leaves the quote character " at the beginning of the names you're trying to match. Instead, add a capture group to pull out the name.
With the suggested changes, the code reduces to:
use strict;
use warnings;
use autodie;

open my $infh, '<', 'printable address.txt';
open my $outfh, '>', 'names.csv';

while (my $line = <$infh>) {
    if ($line =~ /^"(.*)/) {
        print $outfh "$1\n";
    }
}
Now if you also want to isolate the address, you can use a similar method as you did with the name. I'm going to assume that you might want to build the whole address in a variable so you can do something more complicated with it than throwing them blindly at a file. However, mirroring the file setup for now:
use strict;
use warnings;
use autodie;

open my $infh, '<', 'printable address.txt';
open my $namefh, '>', 'names.csv';
open my $addressfh, '>', 'address.dat';

my $address = '';
while (my $line = <$infh>) {
    if ($line =~ /^"(.*)/) {
        print $namefh "$1\n";
    } elsif ($line =~ /(.*)"$/) {
        $address .= $1;
        print $addressfh "$address\n";
        $address = '';
    } else {
        $address .= $line;
    }
}
Ultimately, no matter what you want to use your data for, your best solution is probably to output it to a real CSV file using Text::CSV. That way it can be imported into a spreadsheet or some other system very easily, and you won't have to parse it again.
use strict;
use warnings;
use autodie;

use Text::CSV;

my $csv = Text::CSV->new( { binary => 1, eol => "\n" } )
    or die "Cannot use CSV: " . Text::CSV->error_diag();

open my $infh, '<', 'printable address.txt';
open my $outfh, '>', 'address.csv';

my @data;
while (my $line = <$infh>) {
    # Name Field
    if ($line =~ /^"(.*)/) {
        @data = ($1, '');

    # End of Address
    } elsif ($line =~ /(.*)"$/) {
        $data[1] .= $1;
        $csv->print($outfh, \@data);

    # Address lines
    } else {
        $data[1] .= $line;
    }
}