How do I split "a=1,b=2" into a hash in Perl?

I want to do this:
my %options = makeHash("user=bob,pass=123");
Bonus points if anyone can make this work...
my %options = makeHash('user="bob,a",pass=123');
I can easily write the first method with multiple split() calls, but I want to know if there is an elegant, Perl-specific way to do this.

You can use Text::ParseWords (a core module in Perl 5) to parse the fields; it also copes with quoted commas inside the fields. Note that the return value here is a hash reference, not a hash.
use strict;
use warnings;
use Text::ParseWords;
my $options = makeHash('user="bob,a",pass=123');
sub makeHash {
my $str = shift;
my @foo = quotewords(',', 0, $str); # split into pairs
my %hash = quotewords('=', 0, @foo); # split into key + value
return \%hash;
}
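As a rough usage sketch (the sub is repeated only so that the snippet runs on its own), the returned reference can be read with the arrow operator or copied into a plain hash:

```perl
use strict;
use warnings;
use Text::ParseWords;

sub makeHash {
    my $str = shift;
    my @pairs = quotewords(',', 0, $str);    # split into key=value pairs
    my %hash  = quotewords('=', 0, @pairs);  # split each pair into key + value
    return \%hash;
}

my $options = makeHash('user="bob,a",pass=123');

# Read a single value through the reference
print $options->{user}, "\n";

# Or dereference into an ordinary hash
my %copy = %$options;
print join(',', sort keys %copy), "\n";
```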

If your keys and values are all alphanumeric then you can just write
my %options = "user=bob,pass=123" =~ /\w+/g;
or, for your second case
my %options = 'user="bob,a",pass=123' =~ /(\w+)="?([\w,]+)/g;
You need to be clear exactly what characters can appear in your data, whether or not there may be spaces around the = etc.
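To illustrate why the allowed characters matter, here is a minimal sketch of a stricter pattern. It assumes that keys match \w+, that a value is either double-quoted (with no embedded quotes) or a run of non-comma characters, and that there are no spaces around = or the commas. The name parse_pairs is hypothetical, not from any module:

```perl
use strict;
use warnings;

# Hypothetical helper: walk the string pair by pair, anchored with \G so
# that each match must start where the previous one ended.
sub parse_pairs {
    my ($str) = @_;
    my %hash;
    while ($str =~ /\G(\w+)=(?:"([^"]*)"|([^,]*))(?:,|\z)/g) {
        # $2 is set for a quoted value, $3 for an unquoted one
        $hash{$1} = defined $2 ? $2 : $3;
    }
    return %hash;
}

my %options = parse_pairs('user="bob,a",pass=123');
print "$_=$options{$_}\n" for sort keys %options;
```

If the data can contain escaped quotes or surrounding whitespace, this pattern would need to change accordingly.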

Related

perl: Use map and foreach at once?

I was wondering if it is possible to make a hash, assigning its keys and values at once. Or, in general, to use map and for on one line:
#!/usr/bin/perl
%h = map {$_, $i} qw[a b c] for $i (1..3)
But unfortunately not: I get "Number found where operator expected", referring to the number in the parentheses. So my question is: why am I not able to make a double loop this way? And how else would someone assign hash keys to values? (I don't mean something like $h = {a=>1,b=>2,c=>3}, but rather assigning %h = (@keys = @values).) In other words, how do I assign a hash from:
2 arrays only (@keys, @values), no scalars
At once (on one line, without a block)
Is it even possible in perl?
Populating a hash is simply a matter of assigning a list with alternating keys and values, so you just have to construct the list using the two arrays in an alternating fashion.
use strict;
use warnings;
my @keys = qw(a b c);
my @values = 1..3;
my %h = map { ($keys[$_], $values[$_]) } 0..$#keys;
List::UtilsBy provides a useful abstraction for this in zip_by.
use List::UtilsBy 'zip_by';
my %h = zip_by { @_ } \@keys, \@values;
But actually it's even easier to use slice assignment. Though you technically can't do this in the same statement as the declaration, it's by far the neatest option:
my %h;
@h{@keys} = @values;
Use List::MoreUtils 'zip' or add your own since that module is not a core module:
sub zip(\@@){map{($_[0][$_-1],$_[$_])}1..@{$_[0]}}
my %h = zip @keys, @values;
Well, the question is not very clear on the 'why?', but the same can be achieved with the following code:
use strict;
use warnings;
use Data::Dumper;
my $debug = 1;
my %h;
@h{qw(a b c)} = (1..3);
print Dumper(\%h) if $debug;

Perl reference not printing the expected value

This is my program, but why is it not printing my array values?
use strict;
use warnings;
use Data::Dumper;
my (@arr1,@arr2) = ([1,1,1,2,3,4],[5,5,5,6,9,87]);
my @arr3 = [\@arr1,\@arr2];
foreach (@arr3){
foreach (@$_){
print $_;
}
}
Output:
ARRAY(0x556414c6b908)ARRAY(0x556414c6b7e8)
Why is it printing this instead of my array values?
Because the values are array references. To print the inner values, use dereference:
print @{ $array_ref };
For complex structures (arrays of arrays), you can use Data::Dumper:
use Data::Dumper;
print Dumper($array_ref);
But it still wouldn't work. You can't assign to several arrays at once. The first array gets all the values, the remaining arrays stay empty.
Documented in perlsub:
Do not, however, be tempted to do this:
(@a, @b) = upcase(@list1, @list2);
Like the flattened incoming parameter list, the return list is also
flattened on return. So all you have managed to do here is stored
everything in @a and made @b empty.
First of all, you weren't assigning anything to @arr2. You used something like the following to try to assign to @arr2:
(@arr1, @arr2) = ...;
However, Perl has no way to know how many scalars to assign to @arr1 and how many to assign to @arr2, so it assigns them all to @arr1. Use two different assignments instead.
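A small sketch of the flattening described above (the variable names are made up for illustration): the first array swallows the whole list, while scalars holding references keep the two lists apart.

```perl
use strict;
use warnings;

# The right-hand side is one flat list, so @first takes all of it.
my (@first, @second) = (1, 2, 3, 4);
print scalar(@first), "\n";    # 4
print scalar(@second), "\n";   # 0

# Two scalars on the left each receive one array reference.
my ($x, $y) = ([1, 2], [3, 4]);
print "@$x | @$y\n";           # 1 2 | 3 4
```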
Secondly, [ ] creates an array and returns a reference to it, so
my @arr1 = [1,1,1,2,3,4];
assigns a single scalar (a reference) to @arr1. This is what you are printing. You want
my @arr1 = (1,1,1,2,3,4);
Same goes for @arr2 and @arr3.
Therefore, your code should be
use strict;
use warnings;
use feature qw( say );
my @arr1 = (1,1,1,2,3,4);
my @arr2 = (5,5,5,6,9,87);
my @arr3 = (\@arr1,\@arr2);
for (@arr3) {
say join ", ", @$_;
}
or
use strict;
use warnings;
use feature qw( say );
my @arr3 = ([1,1,1,2,3,4],[5,5,5,6,9,87]);
for (@arr3) {
say join ", ", @$_;
}

Not an ARRAY reference error in "pop($str)"

I am learning Perl for work and I'm trying to practise with some basic programs.
I want my program to take a string from STDIN and modify it by taking the last character and putting it at the start of the string.
I get an error when I use variable $str in $str = <STDIN>.
Here is my code:
my $str = "\0";
$str = <STDIN>;
sub last_to_first {
chomp($str);
pop($str);
print $str;
}
last_to_first;
Exec :
Matrix :hi
Not an ARRAY reference at matrix.pl line 13, <STDIN> line 1.
Why your approach doesn't work
The pop keyword does not work on strings. Strings in Perl are not automatically cast to character arrays, and those array keywords only work on arrays.
The error message is Not an ARRAY reference because pop sees a scalar variable. References are scalars in Perl (the scalar here is something like a reference to the address of the actual array in memory). The pop built-in takes array references in Perl versions between 5.14 and 5.22. It was experimental, but got removed in the (currently latest) 5.24.
Starting with Perl 5.14, an experimental feature allowed pop to take a scalar expression. This experiment has been deemed unsuccessful, and was removed as of Perl 5.24.
How to make it work
You have to split and join your string first.
my $str = 'foo';
# turn it into an array
my @chars = split //, $str;
# remove the last char and put it at the front
unshift @chars, pop @chars;
# turn it back into a string
$str = join '', @chars;
print $str;
That will give you ofo.
Now to use that as a sub, you should pass a parameter. Otherwise you do not need a subroutine.
sub last_to_first {
my $str = shift;
my @chars = split //, $str;
unshift @chars, pop @chars;
$str = join '', @chars;
return $str;
}
You can call that sub with any string argument. You should do the chomp to remove the trailing newline from STDIN outside of the sub, because it is not needed for switching the chars. Always build your subs in the smallest possible unit to make it easy to debug them. One piece of code should do exactly one functionality.
You also do not need to initialize a string with \0. In fact, that doesn't make sense.
Here's a full program.
use strict;
use warnings 'all';
my $str = <STDIN>;
chomp $str;
print last_to_first($str);
sub last_to_first {
my $str = shift;
my @chars = split //, $str;
unshift @chars, pop @chars;
$str = join '', @chars;
return $str;
}
Testing your program
Because you now have one unit in your last_to_first function, you can easily implement a unit test. Perl brings Test::Simple and Test::More (and other tools) for that purpose. Because this is simple, we'll go with Test::Simple.
You load it, tell it how many tests you are going to do, and then use the ok function. Ideally you would put the stuff you want to test into its own module, but for simplicity I'll have it all in the same program.
use strict;
use warnings 'all';
use Test::Simple tests => 3;
ok last_to_first('foo') eq 'ofo';
ok last_to_first('123') eq '312';
ok last_to_first('qqqqqq') eq 'qqqqqq';
sub last_to_first {
my $str = shift;
my @chars = split //, $str;
unshift @chars, pop @chars;
$str = join '', @chars;
return $str;
}
This will output the following:
1..3
ok 1
ok 2
ok 3
Run it with prove instead of perl to get a bit more comprehensive output.
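As a possible next step, the is function from Test::More compares the returned value with the expected one and reports both when they differ, which makes failures easier to read than a bare ok. A minimal sketch, with test names made up for illustration:

```perl
use strict;
use warnings;
use Test::More tests => 3;

sub last_to_first {
    my $str = shift;
    my @chars = split //, $str;
    unshift @chars, pop @chars;   # move the last char to the front
    return join '', @chars;
}

# is(got, expected, test name)
is last_to_first('foo'), 'ofo', 'three-letter word';
is last_to_first('123'), '312', 'digits';
is last_to_first('a'),   'a',   'single character is unchanged';
```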
Refactoring it
Now let's change the implementation of last_to_first to use a regular expression substitution with s/// instead of the array approach.
sub last_to_first {
my $str = shift;
$str =~ s/^(.+)(.)$/$2$1/;
return $str;
}
This code uses a pattern match with two groups (). The first one matches one or more characters after the beginning of the string ^, and the second one matches exactly one character, after which the string ends $. Those groups end up in $1 and $2, and all we need to do is switch them around.
If you replace your function in the program with the test, and then run it, the output will be the same. You have just refactored one of the units in your program.
You can also try the substr approach from zdim's answer with this test, and you will see that the tests still pass.
The core function pop takes an array, and removes and returns its last element.
To manipulate characters in a string you can use substr, for example
use warnings;
use strict;
my $str = <STDIN>;
chomp($str);
my $last_char = substr $str, -1, 1, '';
my $new_str = $last_char . $str;
The arguments to substr mean: search the variable $str, at offset -1 (one from the back), for a substring of length 1, and replace that with an empty string '' (thus removing it). The substring that is found, here the last character, is returned. See the documentation page linked above.
In the last line the returned character is concatenated with the remaining string, using the . operator.
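For comparison with the other answers, the same substr call can be wrapped in a sub. This is only a sketch: the name last_to_first is borrowed from the question, and the length guard is an added assumption so that substr is never asked to remove a character from an empty string.

```perl
use strict;
use warnings;

sub last_to_first {
    my ($str) = @_;
    # Nothing to move for empty or one-character strings
    return $str if length($str) < 2;
    # 4-argument substr: cut the last character out of $str and return it
    my $last_char = substr $str, -1, 1, '';
    return $last_char . $str;
}

print last_to_first('hello'), "\n";   # ohell
```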
You can browse the list of functions broken down by categories at Perl functions by category.
Perl documentation has a lot of goodies, please look around.
Strings are very often manipulated using regular expressions. See the tutorial perlretut, the quick start perlrequick, the quick reference perlreref, and the full reference perlre.
You can also split a string into a character array and work with that. This is shown in detail in the answer by simbabque, which packs a whole lot more of good advice.
For array variables, you can use pop and unshift:
my @arrays = qw(jan feb mar);
last_to_first(@arrays);
sub last_to_first
{
my @lists = @_;
my $last = pop(@lists);
#print $last;
unshift @lists, $last;
print @lists;
}
For scalar variables, you can use the substr function:
my $str = "";
$str = <STDIN>;
chomp ($str);
last_to_first($str);
sub last_to_first
{
my $chr = shift;
my $lastchar = substr($chr, -1);
# print the last character followed by everything before it
print $lastchar . substr($chr, 0, -1);
}

Issues using List::MoreUtils::firstidx

I am trying to use the List::MoreUtils functions, but I need some clarity on their usage in some scenarios.
Please let me know whether firstidx can be used inside a map. For example:
#!/usr/bin/perl
use strict;
use warnings;
use List::Util;
use List::MoreUtils;
use Data::Dumper;
my @udData1 = qw(WILL SMITH TOMMY LEE JONES);
my @arr = qw(WILL TOMMY);
my %output = map{$_=>List::MoreUtils::firstidx{/$_/} @udData1} @arr;
print Dumper %output;
print List::MoreUtils::firstidx{/TOMMY/} @udData1;
print "\n";
Output:
$VAR1 = 'TOMMY';
$VAR2 = 0;
$VAR3 = 'WILL';
$VAR4 = 0;
2
As observed, I am not getting the correct values when using map, but the later command works fine.
I intend $_ to be an element of @arr. This may be incorrect, so please suggest an alternative. Do I have to use foreach?
The problem is this bit right here:
List::MoreUtils::firstidx{/$_/} @udData1
In this bit of code, you're expecting $_ to be both the pattern taken from @arr and the string taken from @udData1 at the same time. (Remember that firstidx{/TOMMY/} means firstidx{$_ =~ /TOMMY/}, and likewise firstidx{/$_/} means firstidx{$_ =~ /$_/}.)
What actually happens is that $_ is the value from @udData1 (since that's the innermost loop) and you wind up matching that against itself. Because it's a simple alphabetic string, it always matches itself, and firstidx correctly returns 0.
Here's one solution using a temporary lexical variable:
my %output = map{ my $p = $_;
$p => List::MoreUtils::firstidx{/$p/} @udData1 } @arr;
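The same idea can be sketched with only the core List::Util module, in case List::MoreUtils is not installed. This swaps firstidx for first applied to the index range, so it returns undef rather than -1 when nothing matches; the \Q...\E quoting is an added assumption that the search strings are literal text, not patterns.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @udData1 = qw(WILL SMITH TOMMY LEE JONES);
my @arr     = qw(WILL TOMMY);

my %output = map {
    my $p = $_;   # copy the outer $_ before the inner block rebinds it
    ($p => first { $udData1[$_] =~ /\Q$p\E/ } 0 .. $#udData1);
} @arr;

print "$_ => $output{$_}\n" for sort keys %output;
```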

Perl sub optimisation push string into csv using split

I would like to optimise this Perl sub:
push_csv($string,$addthis,$position);
for placing strings inside a CSV string.
e.g. if $string="one,two,,four"; $addthis="three"; $position=2;
then push_csv($string,$addthis,$position) will change the value of $string = "one,two,three,four";
sub push_csv {
my @fields = split /,/, $_[0]; # split original string by commas;
$_[1] =~ s/,//g; # remove commas in $addthis
$fields[$_[2]] = $_[1]; # put the $addthis string into
# the array position $position.
$_[0] = join ",", @fields; # join the array with commas back
# into the string.
}
This is a bottleneck in my code, as it needs to be called a few million times.
If you are proficient in Perl, could you take a look at it, and suggest optimisation/alternatives? Thanks in advance! :)
EDIT:
Converting to @fields and back to a string is taking time. I just thought of a way to speed it up where I have more than one sub call in a row: split once, then push more than one thing into the array, then join once at the end.
For several reasons, you should be using Text::CSV to handle these low-level CSV details. Provided that you are able to install the XS version, my understanding is that it will run faster than anything you can do in pure Perl. In addition, the module will correctly handle all sorts of edge cases that you are likely to miss.
use Text::CSV;
my $csv = Text::CSV->new;
my $line = 'foo,,fubb';
$csv->parse($line);
my @fields = $csv->fields;
$fields[1] = 'bar';
$csv->combine(@fields);
print $csv->string; # foo,bar,fubb
Keep your array as an array in the first place, not as a ,-separated string?
You might want to have a look at Data::Locations.
Or try (untested, unbenchmarked, doesn't append new fields like your original can...)
sub push_csv {
$_[1] =~ y/,//d;
$_[0] =~ s/^(?:[^,]*,){$_[2]}\K[^,]*/$_[1]/;
return;
}
A few suggestions:
Use tr/,//d instead of s/,//g as it is faster. This is essentially the same as ysth's suggestion to use y/,//d
Perform split only as much as is needed. If $position = 1, and you have 10 fields, then you're wasting computation performing unnecessary splits and joins. The optional third argument to split can be leveraged to your advantage here. However, this does depend on how many consecutive empty fields you are expecting. It may not be worth it if you don't know ahead of time how many of these you have
You're quite right in wanting to perform multiple appends with one sub-call. There is no need to perform multiple splits and joins when one will do just as well
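The effect of the optional third argument to split mentioned in the suggestions above can be seen in isolation. A minimal sketch, with a made-up input line:

```perl
use strict;
use warnings;

my $line = 'one,two,,four,,six';

# With a limit of 3, split stops after producing three fields;
# the third field holds the unsplit remainder of the string.
my @head = split /,/, $line, 3;

print scalar(@head), "\n";   # 3
print "$head[2]\n";          # ,four,,six
```

Only the fields up to the limit are separated, so the work done is proportional to the position being changed rather than to the length of the line.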
You really ought to be using Text::CSV, but here's how I would revise the implementation of your sub in pure Perl (assuming a maximum of one consecutive empty field):
sub push_csv {
my ( $items, $positions ) = @_[1..2];
# Test inputs
warn "No. of items to add & positions not equal"
and
return unless @{$items} == @{$positions};
my $maxPos; # Find the maximum position number
for my $position ( @{$positions} ) {
$maxPos ||= $position;
$maxPos = $position if $maxPos < $position;
}
my @fields = split /,/ , $_[0], $maxPos+2; # Split only as much as needed
splice ( @fields, $positions->[$_], 1, $items->[$_] ) for 0 .. $#{$items};
$_[0] = join ',' , @fields;
print $_[0],"\n";
}
Usage
use strict;
use warnings;
my $csvString = 'one,two,,four,,six';
my @missing = ( 'three', 'five' );
my @positions = ( 2, 4 );
push_csv ( $csvString, \@missing, \@positions );
print $csvString; # Prints 'one,two,three,four,five,six'
If you're hitting a bottleneck by splitting and joining a few million times... then don't split and join constantly. split each line once when it initially enters the system, pass that array (or, more likely, a reference to the array) around while doing your processing, and then do a single join to turn it into a string when you're ready for it to leave the system.
e.g.:
#!/usr/bin/env perl
use strict;
use warnings;
# Start with some CSV-ish data
my $initial_data = 'foo,bar,baz';
# Split it into an arrayref
my $data = [ split /,/, $initial_data ];
for (1 .. 1_000_000) {
# Pointless call to push_csv, just to make it run
push_csv($data, $_, $_ % 3);
}
# Turn it back into a string and display it
my $final_data = join ',', @$data;
print "Result: $final_data\n";
sub push_csv {
my ($data_ref, $value, $position) = @_;
$$data_ref[$position] = $value;
# Alternately:
# $data_ref->[$position] = $value;
}
Note that this simplifies things enough that push_csv becomes a single, rather simple, line of processing, so you may want to just do the change inline instead of calling a sub for it, especially if runtime efficiency is a key criterion - in this trivial example, getting rid of push_csv and doing it inline reduced run time by about 70% (from 0.515s to 0.167s).
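Timings like the ones quoted above depend on the machine and the Perl version. A minimal sketch of how one might measure the difference with the core Benchmark module follows; set_field is a hypothetical stand-in for the one-line push_csv, and the labels and loop size are arbitrary:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @data = split /,/, 'foo,bar,baz';

# Hypothetical stand-in for the one-line push_csv
sub set_field {
    my ($data_ref, $value, $position) = @_;
    $data_ref->[$position] = $value;
    return;
}

# A negative count means "run each candidate for at least that many CPU seconds"
cmpthese(-1, {
    sub_call => sub { set_field(\@data, $_, $_ % 3) for 1 .. 1000 },
    inline   => sub { $data[$_ % 3] = $_ for 1 .. 1000 },
});
```

cmpthese prints a table of rates and relative percentages, which is usually a more reliable basis for a decision than a single timed run.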
Don't you think it might be easier to use arrays and splice, and only use join to create the comma separation at the end?
I really don't think using s/// repeatedly is a good idea if this is a major bottleneck in your code.