How to count the non-blank cells regardless of their format in Perl? - perl

I'm trying to count the non-empty cells in Perl.
The problem is that $cell_index counts up to 9, but I expected 8.
I think it counts formatted cells too.
How do I count only the cells that contain strings or numbers, not formatted empty cells, in Perl?
use Spreadsheet::ParseXLSX;
use XML::LibXML;
...
while ( $worksheet->get_cell($cell_index , 1) ne '' ) {
$cell_index++;
print $cell_index ,"\n";
}

It depends what you mean by "empty". Your code checks the contents of the cell against the empty string. But that's not the only thing that might look empty. Just off the top of my head, a cell might appear empty if it contains:
Absolutely nothing (which might be represented in Perl code by undef)
The empty string (the case that you cover)
One or more white-space characters (spaces, tabs, stuff like that)
The safest test might be to treat a cell as empty when it contains no non-whitespace characters, that is, to count only the cells that contain at least one non-whitespace character, which you can do easily with a regex.
$worksheet->get_cell($cell_index , 1) =~ /\S/
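A minimal sketch of that test, assuming the cell values have already been collected into a plain list. The has_content helper and the sample data are hypothetical. With Spreadsheet::ParseXLSX, get_cell returns a cell object (or undef) rather than a string, so in real code the check would probably be applied to the cell's value, as the comment indicates:

```perl
use strict;
use warnings;

# Hypothetical helper: true when a value is defined and contains
# at least one non-whitespace character.
sub has_content {
    my ($value) = @_;
    return ( defined $value && $value =~ /\S/ ) ? 1 : 0;
}

# Stand-in for one column of cell values. With a real worksheet each
# entry would come from something like:
#   my $cell  = $worksheet->get_cell( $row, 1 );
#   my $value = $cell ? $cell->value : undef;
my @column = ( 'ID', 'abc', undef, '', "  \t", '0', 42 );

# grep in scalar context returns the number of matching elements.
my $non_blank = grep { has_content($_) } @column;
print "$non_blank\n";    # prints 4
```

Note that '0' counts as content here, which a plain truthiness test would miss.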

Related

How can I preserve the uppercase/lower case of a string in search using perl?

I want to search for "Frequencies" (its first letter in uppercase) in my text files. And my code will print to the output file some columns including "Frequencies". But there are also occurrences of "frequencies" (its first letter in lowercase) in the text files. I am using this part $search_word = qr/Frequencies/; in the code. How can I make the first letter of the word "Frequencies" upper case in the $search_word = qr/Frequencies/; part to eliminate the occurrences of "frequencies" in the search?
In Perl, you have ucfirst to capitalize the first letter. For example:
$a = "freQuEncY";
$a = ucfirst(lc($a)); # $a <-- "Frequency";
Why not use a regex match to check, like this:
if($string_to_be_searched =~ /Frequencies/){
do something; # like print
}
Try this one:
if ( $$test_string[$i] =~ /\b(?i)f(?-i)requencies/ ) {
my $captured = ucfirst($&);
# process $captured
}
Explanation:
The regex match is case-insensitive for the first letter of the word frequencies only. (?i) turns on case-insensitive matching from the position where it occurs, for the remainder of the pattern or until it is revoked by (?-i). This works for other flags too; see perldoc perlre.
$& contains the full match
\b denotes a word boundary (perhaps you don't need that but your problem description suggests you do).
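To see how far the inline modifiers reach, here is a small sketch with hypothetical sample strings. It also shows that a plain /Frequencies/ is already case-sensitive, so it matches only the capitalized form:

```perl
use strict;
use warnings;

# (?i) switches on case-insensitive matching for the "f" only;
# (?-i) switches it off again for the rest of the pattern.
my $first_letter_any_case = qr/\b(?i)f(?-i)requencies/;

my @samples = ( 'Frequencies', 'frequencies', 'FREQUENCIES', 'frequencieS' );

my @loose = grep { $_ =~ $first_letter_any_case } @samples;
my @exact = grep { /Frequencies/ } @samples;

print join( ', ', @loose ), "\n";    # prints Frequencies, frequencies
print join( ', ', @exact ), "\n";    # prints Frequencies
```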

Odd behavior with split in perl when result includes empty strings

This is my perl 5.16 code
while(<>) {
    chomp;
    @data = split /a/, $_;
    print(join("b",@data),"\n");
}
If I input a file with this in it:
paaaa
paaaaq
I get
p
pbbbbq
But I was expecting
pbbbb
pbbbbq
Why am I wrong to expect the latter behavior?
It is documented that trailing empties are removed unless you specify a third, non-zero argument.
If LIMIT is omitted (or, equivalently, zero), then it is usually treated as if it were instead negative but with the exception that trailing empty fields are stripped (empty leading fields are always preserved)
You want
split /a/, $_, -1;
Take a look at the LIMIT parameter in the split perldoc:
http://perldoc.perl.org/functions/split.html
The relevant section is:
If LIMIT is negative, it is treated as if it were instead arbitrarily large; as many fields as possible are produced.
If LIMIT is omitted (or, equivalently, zero), then it is usually treated as if it were instead negative but with the exception that trailing empty fields are stripped (empty leading fields are always preserved); if all fields are empty, then all fields are considered to be trailing (and are thus stripped in this case).
So to get the behavior you're expecting, try:
while(<>) {
    chomp;
    @data = split /a/, $_, -1;
    print(join("b",@data),"\n");
}
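The effect of LIMIT can be checked without reading a file. This is a sketch only; the rejoin helper is hypothetical and just wraps the split and join from the question:

```perl
use strict;
use warnings;

# Hypothetical helper: split on "a" with the given LIMIT and rejoin with "b".
sub rejoin {
    my ( $line, $limit ) = @_;
    return join 'b', split( /a/, $line, $limit );
}

for my $line ( 'paaaa', 'paaaaq' ) {
    # LIMIT 0 behaves like an omitted LIMIT: trailing empty fields are dropped.
    # LIMIT -1 keeps them.
    printf "%-6s  default: %-6s  limit -1: %s\n",
        $line, rejoin( $line, 0 ), rejoin( $line, -1 );
}
```

For paaaa the default form yields p and the -1 form yields pbbbb; for paaaaq both forms yield pbbbbq, because the empty fields there are not trailing.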
Because after splitting paaaa, you get an array @data that has only one element, p, in it.
Maybe substitution is better:
while(<>) {
chomp;
$_=~s/a/b/g;
print($_,"\n");
}

index argument contains . perl

If a string contains . representing any character, index doesn't match on it. What to do so that it takes . as any character?
For example:
index($str, $substr)
If $substr contains . anywhere, index will always return -1.
thanks
carol
That is not possible. The documentation says:
The index function searches for one string within another, but without
the wildcard-like behavior of a full regular-expression pattern match.
...
Keywords you can use for further searching are:
perl regular expression wildcard
Update:
If you just want to know whether your string matches, using a regular expression could look like this:
my $string = "Hello World!";
if( $string =~ /ll. Worl/ )
{
print "Ahoi! Position: ".($-[0])."\n";
}
Here the . matches a single arbitrary character.
$-[0] is the offset into the string of the beginning of the entire
match.
-- http://perldoc.perl.org/perlvar.html
If you want a pattern that matches an arbitrary number of arbitrary characters, you could choose a pattern like...
...
if( $string =~ /ll.*orl/ )
{
...
See perlvar for further information about special Perl variables. You will find the variable @LAST_MATCH_START and some explanation of $-[0] there. There are several more variables that can help you find sub-matches and gather other interesting information about your matches.
From perldoc -f index, you can see index() doesn't have any regex syntax:
index STR,SUBSTR
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-
expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If
POSITION is omitted, starts searching from the beginning of the string. POSITION before the beginning of the string or after
its end is treated as if it were the beginning or the end, respectively. POSITION and the return value are based at 0 (or
whatever you've set the $[ variable to--but don't do that). If the substring is not found, "index" returns one less than the
base, ordinarily "-1"
A simple test:
$ perl -e 'print index("1234567asdfghj.","j.")'
13
Use regex:
if ( $str =~ /$substr/ ) {
    $index = $-[0]; # start offset of the match; pos() with no argument would look at $_, not $str
}
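The difference between the two approaches can be put side by side. This is a sketch with hypothetical sample values: index treats the dot literally, a regex treats it as a wildcard, and \Q...\E makes the regex literal again:

```perl
use strict;
use warnings;

my $str    = 'Hello World!';
my $substr = 'l. W';

# index() treats the dot as a literal character, so nothing is found.
my $literal = index( $str, $substr );

# As a regex, the dot matches any character; $-[0] is the match start.
my $wild = $str =~ /$substr/ ? $-[0] : -1;

# \Q...\E quotes regex metacharacters, so the dot is literal again.
my $quoted = $str =~ /\Q$substr\E/ ? $-[0] : -1;

print "$literal $wild $quoted\n";    # prints -1 3 -1
```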

What does the Perl split function return when there is no value between tokens?

I'm trying to split a string using the split function but there isn't always a value between tokens.
Ex: ABC,123,,,,,,XYZ
I don't want to skip the multiple tokens though. These values are in specific positions in the string. However, when I do a split, and then try to step through my resulting array, I get "Use of uninitialized value" warnings.
I've tried comparing the value using $splitvalues[x] eq "" and I've tried using defined($splitvalues[x]) , but I can't for the life of me figure out how to identify what the split function is putting in to my array when there is no value between tokens.
Here's the snippet of my code (now with more crunchy goodness):
my @matrixDetail = ();
# some other processing happens here that is based on matching data from the
# @oldDetail array with the first field of the @matrixLine array. If it does
# match, then I do the split
if($IHaveAMatch)
{
    @matrixDetail = split(',', $matrixLine[1]);
}
else
{
    @matrixDetail = ('','','','','','','');
}
my $newDetailString =
(($matrixDetail[0] eq '') ? $oldDetail[0] : $matrixDetail[0])
. (($matrixDetail[1] eq '') ? $oldDetail[1] : $matrixDetail[1])
.
.
.
. (($matrixDetail[6] eq '') ? $oldDetail[6] : $matrixDetail[6]);
Because these are just snippets, I've left some of the other logic out, but the if statement is inside a sub that technically returns the @matrixDetail array. If I don't find a match in my matrix and set the array to the list of empty strings manually, then I get no warnings. The warnings appear only when the split populates @matrixDetail.
Also, I should mention, I've been writing code for nearly 15 years, but only very recently have I needed to work with Perl. The logic in my script is sound (or at least, it works), I'm just being anal about cleaning up my warnings and trying to figure out this little nuance.
#!perl
use warnings;
use strict;
use Data::Dumper;
my $str = "ABC,123,,,,,,XYZ";
my @elems = split ',', $str;
print Dumper \@elems;
This gives:
$VAR1 = [
'ABC',
'123',
'',
'',
'',
'',
'',
'XYZ'
];
It puts in an empty string.
Edit: Note that the documentation for split() states that "by default, empty leading fields are preserved, and empty trailing ones are deleted." Thus, if your string is ABC,123,,,,,,XYZ,,,, then your returned list will be the same as the above example, but if your string is ,,,,ABC,123, then you will have a list with four empty strings in elements 0 through 3 (in addition to 'ABC' and '123').
Edit 2: Try dumping out the @matrixDetail and @oldDetail arrays. It's likely that one of those isn't the length that you think it is. You might also consider checking the number of elements in those two lists before trying to use them to make sure you have as many elements as you're expecting.
I suggest to use Text::CSV from CPAN. It is a ready made solution which already covers all the weird edge cases of parsing CSV formatted files.
Delimiters with nothing between them give empty strings when split. Empty strings evaluate as false in boolean context.
If you know that your "details" input will never contain "0" (or other scalar that evaluates to false), this should work:
use feature 'say';
my @matrixDetail = split(',', $matrixLine[1]);
die if @matrixDetail > @oldDetail;
my $newDetailString = "";
for my $i (0..$#oldDetail) {
    $newDetailString .= $matrixDetail[$i] || $oldDetail[$i]; # thanks canSpice
}
say $newDetailString;
(there are probably other scalars besides empty string and zero that evaluate to false but I couldn't name them off the top of my head.)
TMTOWTDI:
$matrixDetail[$_] ||= $oldDetail[$_] for 0..$#oldDetail;
my $newDetailString = join("", @matrixDetail);
edit: for loops now go from 0 to $#oldDetail instead of $#matrixDetail since trailing ",,," are not returned by split.
edit2: if you can't be sure that real input won't evaluate as false, you could always just test the length of your split elements. This is safer, definitely, though perhaps less elegant ^_^
Empty fields in the middle will be ''. Empty fields on the end will be omitted, unless you specify a third parameter to split large enough (or -1 for all).
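The length-based variant mentioned above could look like the following sketch. The sample data is hypothetical, and the -1 LIMIT is used so that trailing empty fields are kept:

```perl
use strict;
use warnings;

my @oldDetail = qw(A B C D E F G);

# LIMIT of -1 keeps trailing empty fields, so every position exists.
my @matrixDetail = split /,/, 'X,,0,,,,', -1;

# Fall back to the old value only when the new field is missing or empty;
# a literal "0" is kept because its length is 1.
my $newDetailString = join '', map {
    defined $matrixDetail[$_] && length $matrixDetail[$_]
        ? $matrixDetail[$_]
        : $oldDetail[$_]
} 0 .. $#oldDetail;

print "$newDetailString\n";    # prints XB0DEFG
```

Unlike the || version, this keeps a field whose value is 0.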

What's happening in this Perl foreach loop?

I have this Perl code:
foreach (@tmp_cycledef)
{
    chomp;
    my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
    $cycle_code =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
    $close_day =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
    $first_date =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
    #print "$cycle_code, $close_day, $first_date\n";
    $cycledef{$cycle_code} = [ $close_day, split(/-/,$first_date) ];
}
The value of @tmp_cycledef comes from the output of an SQL query:
select cycle_code,cycle_close_day,to_char(cycle_first_date,'YYYY-MM-DD')
from cycle_definition d
order by cycle_code;
What exactly is happening inside the for loop?
Huh, I'm surprised no one fixed it for you :)
It looks like the person who wrote this was trying to trim leading and trailing whitespace from each field. It's a really odd way to do that, and for some reason he was overly concerned with interior whitespace in each field despite his anchors.
I think that should be the same as trimming the whitespace around the delimiter in the split:
foreach (@tmp_cycledef)
{
    s/^\s+//; s/\s+$//; # strip leading and trailing whitespace on the whole string
    my ($cycle_code, $close_day, $first_date) = split(/\s*\|\s*/, $_, 3);
    $cycledef{$cycle_code} = [ $close_day, split(/-/,$first_date) ];
}
The key to thinking about split is considering which parts of the string you want to throw away, not just what separates the fields that you want.
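A self-contained sketch of that idea, using hypothetical rows in the "cycle_code | close_day | first_date" shape. It trims a copy of each row, lets the split pattern swallow the whitespace around each pipe, and builds the same hash-of-arrays structure:

```perl
use strict;
use warnings;

my %cycledef;

# Hypothetical sample rows standing in for the SQL output.
my @rows = ( "  A1 | 15 | 2010-01-31  \n", "B2|28|2010-02-28\n" );

for my $row (@rows) {
    # Trim a copy of the row; \s+$ also removes the trailing newline.
    ( my $line = $row ) =~ s/^\s+|\s+$//g;

    # The delimiter is the pipe plus any whitespace around it.
    my ( $cycle_code, $close_day, $first_date ) = split /\s*\|\s*/, $line, 3;

    $cycledef{$cycle_code} = [ $close_day, split( /-/, $first_date ) ];
}

print join( ',', @{ $cycledef{A1} } ), "\n";    # prints 15,2010,01,31
```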
As for the regex part, s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/ strips leading and trailing whitespace.
Each row in @tmp_cycledef is a string in the format "cycle_code | close_day | first_date".
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
This splits the string into three parts. The regular expressions that follow it in the loop strip leading and trailing whitespace.
The last instruction of the loop creates an entry in the hash %cycledef, indexed by $cycle_code. The entry is formatted using the following scheme:
[ $close_day, YYYY, MM, DD ]
where $first_date = "YYYY-MM-DD".
@tmp_cycledef: The output of the SQL query is stored in this array.
foreach (@tmp_cycledef) : For every element in this array.
chomp : remove the \n char from the end of every element.
my ($cycle_code, $close_day, $first_date) = split(/\|/, $_,3);
Split each element into 3 parts and assign each part to its variable. The arguments of split are "split(/PATTERN/,EXPR,LIMIT)".
$cycle_code =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$close_day =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$first_date =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
This regex part strips leading and trailing whitespace from each variable.
my god, it's been such a long time since I've read perl... but I'll give it a shot.
you grab a record from @tmp_cycledef, chomp off the newline at the end, and split it up into the three variables. Then, like S.Mark said, each substitution regex strips off the leading and trailing whitespace for each of the three variables. Finally, the values get stored in a hash as an array reference, with some debugging code commented out right above it.
hth
Your query gives a set of rows that are stored in the array @tmp_cycledef.
We iterate over each row in the result using foreach (@tmp_cycledef).
The result rows might have a trailing newline character; we get rid of it using chomp.
Next we split the row (which is now in $_) on the pipe and assign the first 3 pieces to $cycle_code, $close_day and $first_date respectively.
The split pieces might have leading and trailing whitespace; the next 3 lines remove the leading and trailing whitespace from the 3 variables.
Finally we make an entry in the hash %cycledef. The key used is $cycle_code and the value is an array reference whose first element is $close_day and whose remaining elements are the pieces obtained by splitting $first_date on the hyphen.