Correct way to use short form of "if" in perl - perl

I had written series of perl constructs using short form of if in perl as below,
( $psap[0] = sprintf( "%.4f", $psap[0] )
&& ( $psap[0] = "1:" . $psap[0] )
&& push( #all, $psap[0] ) )
if ( defined( $psap[0] ) );
( $psap[1] = sprintf( "%.4f", $psap[1] )
&& ( $psap[1] = "2:" . $psap[1] )
&& push( #all, $psap[1] ) )
if ( defined( $psap[1] ) );
I had faced some issues with this,
sprintf does not seems to be working ( Values are not rounded);
When I tried to print $psap[0] and $psap[1] value I was just getting 1 and 2 respectively and nothing else (#all contains the value of psap[0] as expected );
I agree that the code is not readable, I wanted some quick way to solve the issue that time. Later I wrote in full form of if which was working as expected.

Forget about the if... it's a red herring. Focus on this:
$psap[1] = sprintf( "%.4f", $psap[1] )
&& ( $psap[1] = "2:" . $psap[1] )
&& push( #all, $psap[1] )
That's essentially:
$psap[1] = X() && Y() && Z();
So you're setting $psap[1] to the result of a boolean && operation on three operands.
Either wrap your assignment in parentheses, like this:
( $psap[1] = sprintf("%.4f", $psap[1]) )
&& ( $psap[1] = "2:" . $psap[1] )
&& push( #all, $psap[1] )
Or use the low-precedence and operator:
$psap[1] = sprintf("%.4f", $psap[1])
and $psap[1] = "2:" . $psap[1]
and push(#all, $psap[1])

Your code is a dreadful misuse of the if statement modifier, which was never meant to control anything more than the simplest of statements.
I suggest you code it like this, which is far clearer
for my $i (0, 1) {
next unless defined $psap[$i];
$psap[$i] = sprintf '%d:%.4f', $i + 1, $psap[$i];
push #all, $psap[$i];
}

Logical operators should be avoided if you don't really want to depend on their result, and just want to sequentially execute your code, thus
$psap[1] = sprintf("%.4f", $psap[1]), $psap[1] ="2:$psap[1]", push(#all, $psap[1])
if defined $psap[1];
or shorter,
defined and $_ = sprintf("%.4f", $_), $_ ="2:$_", push(#all, $_)
for $psap[1];
or shorter,
defined and $_ = sprintf("2:%.4f", $_), push(#all, $_)
for $psap[1];

The assignment to $psap[0] is of lower precedence than the &&, so it is being assigned the value of
sprintf( "%.4f", $psap[0] ) && ( $psap[0] = "1:" . $psap[0] ) && push( #all, $psap[0] )
Just wrap it in some extra parentheses:
#!/usr/bin/env perl
use strict;
use warnings;
my #psap = ( 0.123456789 );
my #all;
( ( $psap[0] = sprintf( "1:%.4f", $psap[0] ) )
&& push( #all, $psap[0] )
)
if ( defined( $psap[0] ) );
print $psap[0], "\n";
I also took the liberty of putting the "1:" into the sprintf rather than building the string in two steps.
As shown in the operator precedence and associativity table, and is of a much lower precedence and could be used instead:
$psap[0] = sprintf( "1:%.4f", $psap[0] ) and push( #all, $psap[0] )
if ( defined( $psap[0] ) );
output:
1:0.1235

Related

Hash assignment as array

I'm trying to understand the piece of code below; I just cannot understand what is being done in line 15.
It seems like it is trying to initialise/assign to %heading but I am just not sure how that syntax works.
$strings = [qw(city state country language code )];
my $file = "fname";
my $fn = $strings;
my $c = 0;
open( FILEH, "< ${file}.txt" ) or die( $! );
while ( <FILEH> ) {
my %heading;
chomp;
$c++;
#heading{ ( #$fn, "One" ) } = split( /[|]/ ); # Line 15
if ( defined( $heading{"One"} ) ) {
my $One = $heading{"One"};
}
That's called a "slice". It assigns to several keys at once:
#hash{ $key1, $key2 } = ($value1, $value2);
is a shorter and faster way of doing
$hash{$key1} = $value1;
$hash{$key2} = $value2;
#$fn is the same as #{ $fn }, i.e. array dereference.

Perl script is producing the symbol  while converting an Excel file to CSV

We have a batch process in our system which will convert an Excel .xlsx file to CSV format using Perl. When it converts the CSV file it produces some symbols like Â, so I am not getting the expected result. Can some please help me how to use the same value as in the Excel file while converting to CSV?
Value in Excel file:
Unverifiable License Documentation NB Only
Value converted in CSV through Perl:
Unverifiable License Documentation – NB Only
I want to retain the same value that is in Excel while converting to CSV
Note: I used Encoding(UTF-8) while opening the file but even then it didn't work.
My Perl code
use Spreadsheet::XLSX;
use File::Basename;
use set_env_cfg;
use Date::Simple (':all');
use Math::Round;
$sts = open( INP, "< ${if}" );
#$sts = open (INP, '<:encoding(UTF-8)', ${if} );
#$sts = open (INP, '<:encoding(ISO-8859-1)', ${if} );
if ( $sts == 0 ) {
print LOG tmstmp() . ": Error opening input file\n";
close LOG;
print LOG "$ldlm\n";
`cp $lf $od`;
die;
}
print LOG "$ldlm\n";
print LOG tmstmp() . ": Conversion started for $if\n";
$oBook = Spreadsheet::XLSX->new($if);
foreach $WkS ( #{ $oBook->{Worksheet} } ) {
print LOG tmstmp() . ": Converting worksheet ----- " . $WkS->{Name}, "\n";
$cfgrec = ''; # initialize the configure record
$sts = open( OUT, ">$od/$WkS->{Name}.txt" );
if ( $sts == 0 ) {
print LOG tmstmp() . ": Error opening output file\n";
close LOG;
close INP;
print LOG "$ldlm\n";
`cp $lf $od`;
die;
}
$WkS->{MaxRow} ||= $WkS->{MinRow};
foreach $iR ( $WkS->{MinRow} .. $WkS->{MaxRow} ) {
$WkS->{MaxCol} ||= $WkS->{MinCol};
print OUT $cfgkey if ( ( $cfgko == 0 ) && ( $iR >= $hdrcnt ) );
foreach $iC ( $WkS->{MinCol} .. $WkS->{MaxCol} ) {
$cell = $WkS->{Cells}[$iR][$iC];
if ($cell) {
if ( ( $cell->{Type} ) eq "Date" ) {
if ( int( $cell->{Val} ) == ( $cell->{Val} ) ) {
$tmpval = date("1900-01-01") + ( $cell->{Val} ) - 2;
}
else {
$css = round( ( ( $cell->{Val} ) - int( $cell->{Val} ) ) * 86400 );
$cmi = int( $css / 60 );
$chr = int( $css / 3600 );
$css = $css - $cmi * 60;
$cmi = $cmi - $chr * 60;
$tmpval = date("1900-01-01") + int( $cell->{Val} ) - 2;
$tmpval .= " $chr:$cmi:$css";
}
}
else {
$tmpval = Spreadsheet::XLSX::Utility2007::unescape_HTML( $cell->{Val} );
}
print OUT $tmpval; ###Added double quotes in txt file to handle the comma delimiter value
}
if ( ( $iR == ${hdr_seq} - 1 ) ) {
if ( ( $cell->{Type} ) eq "Date" ) {
if ( int( $cell->{Val} ) == ( $cell->{Val} ) ) {
$tmpval = date("1900-01-01") + ( $cell->{Val} ) - 2;
}
else {
$css = round( ( ( $cell->{Val} ) - int( $cell->{Val} ) ) * 86400 );
$cmi = int( $css / 60 );
$chr = int( $css / 3600 );
$css = $css - $cmi * 60;
$cmi = $cmi - $chr * 60;
$tmpval = date("1900-01-01") + int( $cell->{Val} ) - 2;
$tmpval .= " $chr:$cmi:$css";
}
}
else {
$tmpval = Spreadsheet::XLSX::Utility2007::unescape_HTML( $cell->{Val} );
}
$cfgrec .= $tmpval;
}
if ( ( $iC == 0 ) && ( $iR == ${hdr_seq} ) ) {
$cfgrec = uc($cfgrec);
$cfgko = cnt_ocr( $cfgrec, $keyhdr );
$cfgkey = "*|" x ( $klm - $cfgko );
}
print OUT "|" if ( $iC < $WkS->{MaxCol} );
print OUT $cfgkey if ( ( $cfgko == $iC + 1 ) && ( $iR >= $hdrcnt ) );
}
print OUT "\n";
}
print LOG tmstmp() . ": Worsheet conversion completed successfully ----- " . $WkS->{Name}, "\n";
close OUT;
push #csv_file_lst, "$WkS->{Name}.txt";
}
print LOG tmstmp() . ": Conversion completed successfully for $if\n";
My guess is that your Excel file contains data encoded using the Windows-1252 code page that has been reencoded into UTF-8 without first being decoded
This string from your Excel file
Unverifiable License Documentation – NB Only
contains an EN DASH, which is represented as "\x96" in Windows-1252. If this is again encoded into UTF-8 the result is the two bytes "\xC2\x96". Interpreting this using Windows-1252 results in the two characters LATIN CAPITAL LETTER A WITH CIRCUMFLEX followed by EN DASH, which is what you're seeing
As far as I can tell, the only change necessary is to open your file with Windows-1252 decoding, like this
open my $fh, '<:encoding(Windows-1252)', $excel_file or die $!
Update
Your revised question shows your Perl code, but has removed the essential information from the Excel data that you show. This string
Unverifiable License Documentation NB Only
now has just two spaces between Documentation and NB and omits the "0x96" n-dash
Note — I've since restored the original data and tidied your code.
Your various attempts at opening the input file are here
$sts=open (INP, "< ${if}" );
#$sts=open (INP, '<:encoding(UTF-8)', ${if} );
#$sts=open (INP, '<:encoding(ISO-8859-1)', ${if} );
and you came very close with ISO-8859-1, but Microsft, in their wisdom, have reused the gaps in ISO-8859-1 encoding between 0x7F and 0x9F to represent printable characters in Windows-1252. The n-dash character at 0x96 is inside this range, so decoding your input as ISO-8859-1 won't render it correctly
As far as I can see, you just need to write
$sts = open (INP, '<:encoding(Windows-1252)', ${if} );
and your input data will be read correctly
You should also specify the encoding of your output file to avoid Wide character in print warnings and malformed data. I can't tell whether you want to duplicate the encoding of your Excel file, use UTF-8, or something else entirely, but you should change this
$sts = open( OUT, ">$od/$WkS->{Name}.txt" );
to either
$sts = open OUT, '>:encoding(Windows-1252)', "$od/$WkS->{Name}.txt";
or
$sts = open OUT, '>:encoding(UTF-8)', "$od/$WkS->{Name}.txt";
as appropriate
Note also that it is best practice to use the three-parameter form of open all the time, and it is best to use lexical file names instead of the global ones that you have. But this isn't a code review, so I've disregarded those points
I hope this underlines to you that it is vital to establish the encoding of your input data and decode it correctly. Guessing really isn't an option
Update
My apologies. I overlooked that the initial open is ignore by the Spreadsheet::XLSX module, which is passed a filename, rather than a file handle
This module is awkward in that it completely hides all character decoding, and relies on [Text::Iconv][Text::Iconv] to do the little conversion that it supports: something that is much better supported by Perl's own [Encode][Encode] module
The change I suggested to your open call is wrong, because it seems that a .xlsx file is a zipped file. However you never read from INP so it will make no difference. You should also close INP immediately after you have opened it as it is a wasted resource
Short of using a different module, the best thing I can suggest is that you hack the data returned by Spreadsheet::XLSX->new
This block will correct the erroneous re-encoding. I have added it right before your foreach $iR ( ... )` loop
You will need to add
use Encode qw/ decode :fallbacks /;
to the top of your code
Please let me know how you get on. Now I really must go!
{
my $columns = $WkS->{Cells};
for my $row ( #$columns ) {
next unless $row;
for my $cell ( #$row) {
next unless $cell and $cell->type eq 'Text';
for ( $cell->{_Value} ) {
$_ = decode('UTF-8', $_, FB_CROAK);
$_ = decode('Windows-1252', $_, FB_CROAK);
}
}
}
}

loop is not working after satisfying the condition

I have two files myresult and annotation. details of these files are as follows.
myresult:
288..639 1.13075739182609-6.20035408429888i
300..651 1.90372125344918-6.09008858828515i
312..663 1.6908117147722-5.67058877579329i
324..675 0.644484787809351-5.54571698740166i
336..687 1.21850904281332-5.47700589647424i
annotation:
272..1042
1649..2629
For loop is running only once after satisfying the If condition. It is not entering again in loop even after the condition is satisfied. for example, in file myresult 2nd line satisfies the condition i.e numbers 300..651 lies in the range of 1st line of file annotation, therefore it prints all values from 300 to 651. But when it need to go for next round(3rd line) i.e. 312..663 it is not entering this loop since this range also lies in 1st line of file annotation.
so the output should be numbers from 300 to 663, but it is printing from 300 to 651 only.
Code:
#!/usr/bin/perl
use Math::Complex;
open( $inp0, "<myresult" ) or die "not found";
open( $inp2, "<annotation" ) or die "not found";
my #arr2 = <$inp0>;
my #arr4 = <$inp2>;
my #result;
foreach my $line1 (#arr2) {
my ( $col1, $col2 ) = split( /\s/, $line1 );
if ( $col2 > 1.60 ) {
my ( $from1, $to1 ) = split( /\.\./, $col1 );
foreach my $line2 (#arr4) {
my ( $from2, $to2 ) = split( /\.\./, $line2 );
for ( my $i = $from1; $i <= $to1; $i++ ) {
for ( my $j = $from2; $j <= $to2; $j++ ) {
$res = grep( /$i/, #result );
if ( $i == $j && $res == 0 ) {
print "$i \n";
push( #result, $i );
}
}
}
}
}
}
Second columns of the file "myresult" contains complex numbers like "1.13075739182609-6.20035408429888i" for example.
Two complex numbers can not be compared. (Complex number is a vector of complex plane with real axis and imaginary axis. Two vectors could not be compared like integer.)
'($col2>1.60)' will be false for all of your data in the file "myresult". This is why for loop is not executed.
As Fumu said two complex numbers can not be compared.
If you need help with complex numbers in Perl then check out Math::Complex module.

Print in single line with consecutive elements

So I have an array like this:
W,X,Y,Z
-7,6,101,15
-7,6,101,16
-7,6,101,17
-7,6,101,78
-7,6,101,79
-7,6,101,84
-7,6,101,92
-7,9,615,49
-7,9,615,50
-7,10,759,38
-7,10,759,39
Now, I want to print a line when W=X=Y and Z are consecutive numbers.
Expected Output:
W,X,Y,Z
-7,6,101,15-16-17
-7,6,101,78-79
-7,6,101,84
-7,6,101,92
-7,9,615,49-50
-7,10,759,38-39
How do I implement this on Perl?
Thanks,
Art
Here is my script:
while ( $output_line = <FILE_C> ) {
chomp $output_line;
my ( $W, $X, $Y, $C, $D, $E, $F, $Z ) = ( split /\s/, $output_line );
if ( $Y == $Block_previous ) {
print("Yes\t$Block_previous\t$Y\t$Z\n");
push( #Z_array, $Z );
push( #Y_array, $Y );
next;
}
else {
push( #Z_array_nonblkmatch, $Z );
}
foreach $Z_printer (#Z_array) {
print("$Y_array[0]\t$Z_printer\n");
if ( ( $Z_delta == 1 ) || ( $Z_delta == -1 ) ) {
push( #Z_adj, $Z_printer, $Z_printer_prev );
#~ print ("pair: $Z_printer_prev-$Z_printer\n");
}
else {
#~ print ("$Z_printer\n");
}
$Z_printer_prev = $Z_printer;
}
#Z_adj = ();
#Z_array = ();
#Y_array = ();
#Z_array_nonblkmatch = ();
$Block_previous = $Y;
#~ <STDIN>;
}
close(FILE_C);
Thanks, raina77ow! However, this is what the output look like:
-7,6,101,15-16-17-79
-7,6,101,16-17-79
-7,6,101,17-79
-7,6,101,78-79
-7,6,101,79-50
-7,6,101,84-50
-7,6,101,92
-7,6,615,49-50-39
-7,6,615,50
One possible approach (ideone demo):
use warnings;
use strict;
my $prev;
while (<DATA>) {
chomp;
next unless /\S/;
my #numbers = split /,/;
if (defined $prev && $numbers[3] == $prev + 1) {
print '-' . ++$prev;
next;
}
print "\n" if defined $prev;
print join ',', #numbers;
$prev = $numbers[3];
}
__DATA__
-7,6,101,15
-7,6,101,16
-7,6,101,17
-7,6,101,78
-7,6,101,79
-7,6,101,84
-7,6,101,92
-7,9,615,49
-7,9,615,50
-7,10,759,38
-7,10,759,39
I choose not to collect this data into intermediate array, as you did, as the question was simple: print it grouped. The key is storing the value of the last (Z) column, then checking each new line against it: if it matches, you print just the incremented value (that's what print '-' . ++$prev line for), if not, you end this line (for all but the first case) and start a new one with the numbers of this line.

Help converting to subroutine

I have tried to convert my code into a series of subroutines to make it more modular. The conditional statements in the code below is what I can't incorporate into the subroutine.
next unless ( $sentblock =~ /\[sent. \d+ len. \d+\]: \[.+\]/ ); #1#
( $sentence, $sentencenumber ) = &sentence_sentnum_chptnum($sentblock); #SUBROUTINE
if ( $sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/ ) { #2#
$chapternumber = $1;
$chapternumber =~ tr/./_/;
}
next
unless ( $sentence =~ /\b\Q$search_key\E/i #3#
&& $sentence =~ /\b\Q$addkey0\E/i
&& $sentence =~ /\b\Q$addkey1\E/i );
next
if ( defined($exc0) #4#
&& length($exc0)
&& $sentence =~ /\b\Q$exc0\E\b/i );
next
if ( defined($exc1) #5#
&& length($exc1)
&& $sentence =~ /\b\Q$exc1\E\b/i );
The subroutine so far:
sub sentence_sentnum_chptnum {
my $subsentblock = shift;
my ( $subsentence, $subsentencenumber );
return unless ( $subsentblock =~ /\[sent. (\d+) len. \d+\]: \[(.+)\]/ ); #DIDN'T replace the need to put one in the main script
$subsentencenumber = $1;
$subsentence = $2;
$subsentence =~ s/, / /g;
return ( $subsentence, $subsentencenumber );
}
It works as is, but if I try putting the other conditional statements in: I get errors saying $sentence is uninitialized later in the code. Example: If I try to include the check of $addkey using the same condition, but just swapping next for return I get an error that $sentence is uninitialized in the line: if ( $sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/ ) { And likewise if I put any of those conditions into the subroutine.
Main Question: How can I:
(1) get rid of next unless ( $sentblock =~ /\[sent. \d+ len. \d+\]: \[.+\]/ ); (it's in the subroutine too)
(2) Include: if ( $sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/ ) & all 3 next statements
(3) Since it's included, also return $chapternumber
Without affecting my code?
General Best Practice Question: If I have variables defined at the top of my code (from an HTML form) is it better practice to localize them each time in every subroutine, or just not pass anything into the subroutine, and use the value assigned at the beginning of the code? (Ex. $search_key, $addkey and $exc)?
Test Case I have made a test case, however it is pretty long, so I didn't include it. If you need one, it is very similar to: http://perlmonks.org/?node_id=912276 just find where the subroutine takes over and delete that part... It's right after foreach my $sentblock (#parsed).
Note: The test case does not include addkey or exc, and nothing will match the chapternumber (put '~ 5.5' in front of one sentence to include it)
I've tried checking the returned $sentence in the main program. This eliminates the error, but there are no matches for the rest of the program (ie. The end result of the search engine is 0 results).
Thanks, let me know if anything is unclear.
How much do you want to break things down? It's hard to see what the "best" or "right" way to split things up is without more code.
In general, if you go through your code and add comments describing what each block of code does, you could just as readily replace each commented block with a sub that has a name that recaps the sentence:
# Is this a sentence block?
next unless ( $sent_block =~ /\[sent. \d+ len. \d+\]: \[.+\]/ );
#1#
my ( $sentence, $sentence_number ) = parse_sentence_block($sent_block);
# Get chapter info if present
if ( $sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/ ) { #2#
$chapter_number = $1;
$chapter_number =~ tr/./_/;
}
# Skip if key found
next
unless ( $sentence =~ /\b\Q$search_key\E/i #3#
&& $sentence =~ /\b\Q$addkey0\E/i
&& $sentence =~ /\b\Q$addkey1\E/i );
# skip if excrescence 0 (or whatever exc is short for)
next
if ( defined($exc0) #4#
&& length($exc0)
&& $sentence =~ /\b\Q$exc0\E\b/i );
# skip if excrescence 1.
next
if ( defined($exc1) #5#
&& length($exc1)
&& $sentence =~ /\b\Q$exc1\E\b/i );
Now take these comments and make them into subs:
next unless is_sentence_block( $sent_block );
my( $sentence, $sentence_number ) = parse_sentence_block($sent_block);
# Maybe update the chapter number
my $new_chapter_number = get_chapter_number( $sentence );
$chapter_number = $new_chapter_number if defined $new_chapter_number;
next unless have_all_keys( $sentence => $search_key, $add_key0, $add_key1 );
next if have_excrescence( $exc0 );
next if have_excrescence( $exc1 );
sub is_sentence_block {
my $block = shift;
return $sent_block =~ /\[sent. \d+ len. \d+\]: \[.+\]/ );
}
sub get_chapter_number {
my $sentence = shift;
return unless $sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/;
return $1;
}
sub have_all_keys {
my $sentence = shift;
my #keys = #_;
for my $key ( #keys ) {
return unless $sentence =~ /\b\Q$key1\E/i;
}
return 1
}
sub have_excrescence {
my $sentence = shift;
my $exc = shift;
return 0 unless defined($exc);
return 0 unless length($exc)
return 0 unless $sentence =~ /\b\Q$exc\E\b/i );
return 1;
}
Try this approach (some of this code may look familiar to you ;-) ):
sub extractSentenceAndPositions {
my $sentenceBlock = shift;
my ($sentence, $sentenceNumber, $chapterNumber) = ("", "", "");
if ($sentenceBlock =~ /\[sent. (\d+) len. \d+\]: \[(.+)\]/) {
$sentenceNumber = $1;
$sentence = $2;
$sentence =~ s/, / /g;
if ($sentence =~ /\~\s(\d*F*[\.I_]\w+)\s/) { #2#
$chapterNumber = $1;
$chapterNumber =~ tr/./_/;
}
# Turning the original 'next-unless' chain into a conditional
# which zeroes out the return values instead
if ( !( $sentence =~ /\b\Q$search_key\E/i #3#
&& $sentence =~ /\b\Q$addkey0\E/i
&& $sentence =~ /\b\Q$addkey1\E/i )
||
!( defined($exc0) #4#
&& length($exc0)
&& $sentence =~ /\b\Q$exc0\E\b/i )
||
!( defined($exc1) #5#
&& length($exc1)
&& $sentence =~ /\b\Q$exc1\E\b/i )
) {
($sentence, $sentenceNumber, $chapterNumber) = ("", "", "");
}
}
return ($sentence, $sentenceNumber, $chapterNumber);
}
Then, replace your first listing with...
($sentence, $sentenceNumber, $chapterNumber) = extractSentenceAndPositions($sentblock);
next if (!$sentence || !$sentenceNumber || !$chapterNumber);
Regarding your best practices question, I would say for this use case (cgi vars and the like), where those values are almost certainly not going to change, I'd refer to them directly. The basic concept I generally follow is to scrub them once at the beginning of the run (by which I mean sanitize away any SQL injections, XSS, XSRF, shell injections, or other such nastiness in the values) and from then on treat them as read-only globals. I've heard other opinions on the subject, but that's what I usually do.
As far as checking the returned $sentence in the main program somehow destroying all the other matches, I'm not sure how that would happen unless there's something else going on. I've used this approach (next or last based on returned values) in numerous scripts, and there's nothing inherently destructive about it.