Why does split return an array with every second element empty? - perl

I'm trying to split a string every 5 characters. The array I'm getting back from split isn't how I'm expecting it: all the even indexes are empty, the parts I'm looking for are on odd indexes.
This version doesn't output anything:
use warnings;
use strict;
my #ar = <DATA>;
foreach (#ar){
my #mkh = split (/(.{5})/,$_);
print $mkh[2];
}
__DATA__
aaaaabbbbbcccccdddddfffff
If I replace the print line with this (odd indexes 1 and 3):
print $mkh[1],"\n", $mkh[3];
The output is the first two parts:
aaaaa
bbbbb
I don't understand this, I expected to be able to print the first two parts with this:
print $mkh[0],"\n", $mkh[1];
Can someone explain what is wrong in my code, and help me fix it?

The first argument in split is the pattern to split on, i.e. it describes what separates your fields. If you put capturing groups in there (as you do), those will be added to the output of the split as specified in the split docs (last paragraph).
This isn't what you want - your separator isn't a group of five characters. You're looking to split a string every X characters. For that, better use:
my #mkh = (/...../g);
# or
my #mkh = (/.{5}/g);
or one of the other options you'll find in: How can I split a string into chunks of two characters each in Perl?

Debug using Data::Dump
To observe exactly what your split operation is doing, use a module like Data::Dump:
use warnings;
use strict;
while (<DATA>) {
my #mkh = split /(.{5})/;
use Data::Dump;
dd #mkh;
}
__DATA__
aaaaabbbbbcccccdddddfffff
Outputs:
("", "aaaaa", "", "bbbbb", "", "ccccc", "", "ddddd", "", "fffff", "\n")
As you can see, your code is splitting on groups of 5 characters, and leaving empty strings between them. This is obviously not what you want.
Use Pattern Matching instead
Instead, you simply want to capture groups of 5 characters. Therefore, you just need a pattern match with the /g Modifier:
use warnings;
use strict;
while (<DATA>) {
my #mkh = /(.{5})/g;
use Data::Dump;
dd #mkh;
}
__DATA__
aaaaabbbbbcccccdddddfffff
Outputs:
("aaaaa", "bbbbb", "ccccc", "ddddd", "fffff")

You can also use zero-width delimiter, which can be described as split string at places which are in front of 5 chars (by using \K positive look behind)
my #mkh = split (/.{5}\K/, $_);

Related

How to combine two regex pattern in perl?

I want to combine two regex pattern to split string and get a table of integers.
this the example :
$string= "1..1188,1189..14,14..15";
$first_pattern = /\../;
$second_pattern = /\,/;
i want to get tab like that:
[1,1188,1189,14,14,15]
Use | to connect alternatives. Also, use qr// to create regex objects, using plain /.../ matches against $_ and assigns the result to $first_pattern and $second_pattern.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = '1..1188,1189..14,14..15';
my $first_pattern = qr/\.\./;
my $second_pattern = qr/,/;
my #integers = split /$first_pattern|$second_pattern/, $string;
say for #integers;
You probably need \.\. to match two dots, as \.. matches a dot followed by anything but a newline. Also, there's no need to backslash a comma.

Perl: Replace consecutive spaces in this given scenario?

an excerpt of a big binary file ($data) looks like this:
\n1ax943021C xxx\t2447\t5
\n1ax951605B yyy\t10400\t6
\n1ax919275 G2L zzz\t6845\t6
The first 25 characters contain an article number, filled with spaces. How can I convert all spaces between the article numbers and the next column into a \x09 ? Note the one or more spaces between different parts of the article number.
I tried a workaround, but that overwrites the article number with ".{25}xxx»"
$data =~ s/\n.{25}/\n.{25}xxx/g
Anyone able to help?
Thanks so much!
Gary
You can use unpack for fixed width data:
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq=1;
print Dumper $_ for map join("\t", unpack("A25A*")), <DATA>;
__DATA__
1ax943021C xxx 2447 5
1ax951605B yyy 10400 6
1ax919275 G2L zzz 6845 6
Output:
$VAR1 = "1ax943021C\txxx\t2447\t5";
$VAR1 = "1ax951605B\tyyy\t10400\t6";
$VAR1 = "1ax919275 G2L\tzzz\t6845\t6";
Note that Data::Dumper's Useqq option prints whitecharacters in their escaped form.
Basically what I do here is take each line, unpack it, using 2 strings of space padded text (which removes all excess space), join those strings back together with tab and print them. Note also that this preserves the space inside the last string.
I interpret the question as there being a 25 character wide field that should have its trailing spaces stripped and then delimited by a tab character before the next field. Spaces within the article number should otherwise be preserved (like "1ax919275 G2L").
The following construct should do the trick:
$data =~ s/^(.{25})/{$t=$1;$t=~s! *$!\t!;$t}/emg;
That matches 25 characters from the beginning of each line in the data, then evaluates an expression for each article number by stripping its trailing spaces and appending a tab character.
Have a try with:
$data =~ s/ +/\t/g;
Not sure exactly what you what - this will match the two columns and print them out - with all the original spaces. Let me know the desired output and I will fix it for you...
#!/usr/bin/perl -w
use strict;
my #file = ('\n1ax943021C xxx\t2447\t5', '\n1ax951605B yyy\t10400\t6',
'\n1ax919275 G2L zzz\t6845\t6');
foreach (#file) {
my ($match1, $match2) = ($_ =~ /(\\n.{25})(.*)/);
print "$match1'[insertsomethinghere]'$match2\n";
}
Output:
\n1ax943021C '[insertsomethinghere]'xxx\t2447\t5
\n1ax951605B '[insertsomethinghere]'yyy\t10400\t6
\n1ax919275 G2L '[insertsomethinghere]'zzz\t6845\t6

string array sorting issue in perl

I am running below code to sort strings and not getting the expected results.
Code:
use warnings;
use strict;
my #strArray= ("64.0.71","68.0.71","62.0.1","62.0.2","62.0.11");
my #sortedStrArray = sort { $a cmp $b } #strArray;
foreach my $element (#sortedStrArray ) {
print "\n$element";
}
Result:
62.0.1
62.0.11 <--- these two
62.0.2 <---
64.0.71
68.0.71
Expected Result:
62.0.1
62.0.2 <---
62.0.11 <---
64.0.71
68.0.71
"1" character 0x31. "2" is character 0x32. 0x31 is less than 0x32, so "1" sorts before "2". Your expectations are incorrect.
To obtain the results you desire to obtain, you could use the following:
my #sortedStrArray =
map substr($_, 3),
sort
map pack('CCCa*', split(/\./), $_),
#strArray;
Or for a much wider range of inputs:
use Sort::Key::Natural qw( natsort );
my #sortedStrArray = natsort(#strArray);
cmp is comparing lexicographically (like a dictionary), not numerically. This means it will go through your strings character by character until there is a mismatch. In the case of "62.0.11" vs. "62.0.2", the strings are equal up until "62.0." and then it finds a mismatch at the next character. Since 2 > 1, it sorts "62.0.2" > "62.0.11". I don't know what you are using your strings for or if you have any control over how they're formatted, but if you were to change the formatting to "62.00.02" (every segment has 2 digits) instead of "62.0.2" then they would be sorted as you expect.
Schwartzian_transform
This is usage of randal schwartz transofm:
First, understand, what you want:
sorting by first number, then second, then third:
let's do it with this:
use warnings;
use strict;
use Data::Dumper;
my #strArray= ("64.0.71","68.0.71","62.0.1","62.0.2","62.0.11");
my #transformedArray = map{[$_,(split(/\./,$_))]}#strArray;
=pod
here #transformedArray have such structure:
$each_element_of_array: [$element_from_original_array, $firstNumber, $secondNumber, $thirdNumber];
for example:
$transformedArray[0] ==== ["64.0.71", 64, 0, 71];
after that we will sort it
first by first number
then: by second number
then: by third number
=cut
my #sortedArray = map{$_->[0]} # save only your original string.
sort{$a->[3]<=>$b->[3]}
sort{$a->[2]<=>$b->[2]}
sort{$a->[1]<=>$b->[1]}
#transformedArray;
print Dumper(\#sortedArray);
Try the Perl module Sort::Versions, it is designed to give you what you expect.http://metacpan.org/pod/Sort::Versions
It supports alpha-numeric version ids as well.

Split functions

I want to get the split characters. I tried the below coding, but I can able to get the splitted text only. However if the split characters are same then it should be returned as that single characters
For example if the string is "asa,agas,asa" then only , should be returned.
So in the below case I should get as "| : ;" (joined with space)
use strict;
use warnings;
my $str = "Welcome|a:g;v";
my #value = split /[,;:.%|]/, $str;
foreach my $final (#value) {
print $final, "\n";
}
split splits a string into elements when given what separates those elements, so split is not what you want. Instead, use:
my #punctuations = $str =~ /([,;:.%|])/g;
So you want to get the opposite of split
try:
my #value=split /[^,;:.%|]+/,$str;
It will split on anything but the delimiters you set.
Correction after commnets:
my #value=split /[^,;:.%|]+/,$str;
shift #value;
this works fine, and gives unique answers
#value = ();
foreach(split('',",;:.%|")) { push #value,$_ if $str=~/$_/; }
To extract all the separators only once, you need something more elaborate
my #punctuations = keys %{{ map { $_ => 1 } $str =~ /[,;:.%|]/g }};
Sounds like you call "split characters" what the rest of us call "delimiters" -- if so, the POSIX character class [:punct:] might prove valuable.
OTOH, if you have a defined list of delimiters, and all you want to do is list the ones present in the string, it's much more efficient to use m// rather than split.

Perl string in Quote Word?

Seem like my daily road block. Is this possible? String in qw?
#!/usr/bin/perl
use strict;
use warnings;
print "Enter Your Number\n";
my $usercc = <>;
##split number
$usercc =~ s/(\w)(?=\w)/$1 /g;
print $usercc;
## string in qw, hmm..
my #ccnumber = qw($usercc);
I get Argument "$usercc" isn't numeric in multiplication (*) at
Thanks
No.
From: http://perlmeme.org/howtos/perlfunc/qw_function.html
How it works
qw() extracts words out of your string
using embedded whitsepace as the
delimiter and returns the words as a
list. Note that this happens at
compile time, which means that the
call to qw() is replaced with the list
before your code starts executing.
Additionlly, no interpolation is possible in the string you pass to qw().
Instead of that, use
my #ccnumber = split /\s+/, $usercc;
Which does what you probably want, to split $usercc on whitespace.