Convert a string into a hash in Perl using split()

$hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";
%hash = split /,|=>/, $hashdef;
print "$_=>$hash{$_}" foreach(keys %hash);
Mouse=>JerryDog=>SpikeCat=>Tom
I am new to Perl. Can anyone explain the regular expression inside the split function? I understand that | is used to choose between alternatives, but I am still confused.
%hash = split /|=>/, $hashdef;
I get the output
S=>pe=>J=>eT=>or=>rm=>,y=>,u=>sM=>og=>D=>oC=>ai=>kt
%hash = split /,/, $hashdef;
Mouse=>Jerry=>Cat=>TomDog=>Spike=>
Please explain the above condition.

split's first argument defines what separates the elements you want.
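/,|=>/ matches a comma (,) or an equals sign followed by a greater-than sign (=>). They're just literals here; there's nothing special about them. Note that the space after each comma is not part of the separator, so the keys for Cat and Dog actually start with a space (" Cat" and " Dog").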
/|=>/ matches the zero-length string or an equals sign followed by a greater-than sign, and splitting on a zero-length string just splits a string up into individual characters; therefore, in your hash, M will map to o, u will map to s, etc. They appear jumbled up in your output because hashes don't have a definite ordering.
/,/ just splits on a comma. You're creating a hash that maps the key 'Mouse=>Jerry' to the value ' Cat=>Tom', and the key ' Dog=>Spike' to nothing (undef), because no element is left to pair it with.
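If you want keys without those stray leading spaces, one option (a sketch, not something from the answer above) is to let the separator pattern also swallow any whitespace around each comma or =>. The keys are sorted only so the output order is predictable:

```perl
use strict;
use warnings;

my $hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";

# The separator is a comma or "=>", plus any whitespace around it,
# so the keys come out as "Cat" and "Dog" rather than " Cat" and " Dog".
my %hash = split /\s*(?:,|=>)\s*/, $hashdef;

# Sort the keys so the output order is predictable.
print "$_=>$hash{$_}\n" for sort keys %hash;
```

Leading or trailing whitespace around the whole string is not handled here; trim it first if your input can contain it.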

$hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";
my %hash = eval( "( $hashdef )" );
print $hash{'Mouse'}."\n";
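eval executes a string as Perl code. This doesn't use split, but since your string happens to be well-formed Perl, I think it is a good way to get a hash from it, so I've added it here. Two caveats: it only works when strict subs is not in effect, because Jerry, Tom and Spike are barewords, and you should only do this with trusted input, since eval will run whatever code the string contains.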

sub hash2string {
    my $href = $_[0];    # reference to the hash to serialise
    my $hstring = "";
    foreach (keys %{$href}) {
        $hstring .= "$_=>$href->{$_}, ";
    }
    return substr($hstring, 0, -2);    # drop the trailing ", "
}
sub string2hash {
    my %lhash;
    my @lelements = split(/, /, $_[0]);    # "key=>value" pieces
    foreach (@lelements) {
        my ($skey, $svalue) = split(/=>/, $_);
        $lhash{$skey} = $svalue;
    }
    return %lhash;    # returns a key/value list; assign it to a hash
}
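The same round trip can be written more compactly with map and join. This is a sketch: the sub names are hypothetical, and the keys are sorted so the generated string is reproducible (the helpers above emit keys in hash order):

```perl
use strict;
use warnings;

# Hypothetical compact variants of string2hash / hash2string.
sub string2hash_compact {
    my ($str) = @_;
    # Split into "key=>value" pieces, then split each piece once on "=>".
    return map { split /=>/, $_, 2 } split /,\s*/, $str;
}

sub hash2string_compact {
    my ($href) = @_;
    # Sorting the keys makes the output deterministic.
    return join ', ', map { "$_=>$href->{$_}" } sort keys %$href;
}

my %h = string2hash_compact("Mouse=>Jerry, Cat=>Tom, Dog=>Spike");
my $s = hash2string_compact(\%h);
print "$s\n";    # Cat=>Tom, Dog=>Spike, Mouse=>Jerry
```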

Related

Regular Expression Matching Perl for first case of pattern

I have multiple variables that have strings in the following format:
some_text_here__what__i__want_here__andthen_some 
I want to be able to assign to a variable the what__i__want_here portion of the first variable. In other words, everything after the FIRST double underscore. There may be double underscores in the rest of the string but I only want to take the text after the FIRST pair of underscores.
Ex.
If I have $var = "some_text_here__what__i__want_here__andthen_some", I would like to assign to a new variable only the second part like $var2 = "what__i__want_here__andthen_some"
I'm not very good at matching so I'm not quite sure how to do it so it just takes everything after the first double underscore.
my $text = 'some_text_here__what__i__want_here';
# .*? # Match a minimal number of characters - see "man perlre"
# /s # Make . match also newline - see "man perlre"
my ($var) = $text =~ /^.*?__(.*)$/s;
# $var is not defined when there is no __ in the string
print "var=${var}\n" if defined($var);
You might consider this an example of where split's third parameter is useful. The third parameter to split constrains how many elements to return. Here is an example:
my @examples = (
    'some_text_here__what__i_want_here',
    '__keep_this__part',
    'nothing_found_here',
    'nothing_after__',
);
foreach my $string (@examples) {
    my $want = (split /__/, $string, 2)[1];
    print "$string => ", (defined $want ? $want : ''), "\n";
}
The output will look like this:
some_text_here__what__i_want_here => what__i_want_here
__keep_this__part => keep_this__part
nothing_found_here =>
nothing_after__ =>
This line is a little dense:
my $want = (split /__/, $string, 2)[1];
Let's break that down:
my ($prefix, $want) = split /__/, $string, 2;
The 2 parameter tells split that no matter how many times the pattern /__/ could match, we only want to split one time, the first time it's found. So as another example:
my (@parts) = split /#/, "foo#bar#baz#buzz", 3;
The @parts array will receive these elements: 'foo', 'bar', 'baz#buzz', because we told it to stop splitting after the second split, so that we get a total maximum of three elements in our result.
Back to your case, we set 2 as the maximum number of elements. We then go one step further by eliminating the need for my ($throwaway, $want) = .... We can tell Perl we only care about the second element in the list of things returned by split, by providing an index.
my $want = ('a', 'b', 'c', 'd')[2]; # c, the element at offset 2 in the list.
my $want = (split /__/, $string, 2)[1]; # The element at offset 1 in the list
# of two elements returned by split.
You can use parentheses to capture parts of the string and then reorder them: the first set of parentheses () becomes $1 in the replacement part of the substitution, the second becomes $2, and so on.
my $string = "some_text_here__what__i__want_here";
(my $newstring = $string) =~ s/(some_text_here)(__)(what__i__want_here)/$3$2$1/;
print $newstring;
OUTPUT
what__i__want_here__some_text_here
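The substitution above hard-codes the text it matches. A sketch of a more general version (my own variation, not from the answers) uses a non-greedy capture so it works for any string containing at least one "__":

```perl
use strict;
use warnings;

my $string = "some_text_here__what__i__want_here";

# (.*?) is non-greedy, so it stops at the FIRST "__";
# the greedy (.*) then captures everything after it.
my ($before, $after) = $string =~ /^(.*?)__(.*)$/s;
print "$after\n";

# The same two captures can reorder the string without hard-coding its text.
# ${2} is written with braces so the following "__" is not read as part of the name.
(my $swapped = $string) =~ s/^(.*?)__(.*)$/${2}__$1/s;
print "$swapped\n";
```

If the string contains no "__", the match fails: $before and $after stay undefined and $swapped is left unchanged.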

Perl module / subroutine to remove shared substring of strings

Given a list/array of strings (in particular, UNIX paths), remove the shared part, eg:
./dir/fileA_header.txt
./dir/fileA_footer.txt
I will probably strip the directory before using the function, but strictly speaking this won't change much.
I'd like to know a method to either remove the shared parts (./dir/fileA_) or remove the not-shared part.
Thank you for your help!
This is a bit of a hack, but if you don't need to support Unicode strings (that is, if all characters have a value below 256), you can use xor to get the length of the longest common prefix of two strings:
my $n = do {
($str1 ^ $str2) =~ /^\0*/;
$+[0]
};
You can apply this operation in a loop to get the common prefix of a list of strings:
use v5.12.0;
use warnings;
sub common_prefix {
    my $prefix = shift;
    for my $str (@_) {
        ($prefix ^ $str) =~ /^\0*/;
        substr($prefix, $+[0]) = '';
    }
    return $prefix;
}
my @paths = qw(
    ./dir/fileA_header.txt
    ./dir/fileA_footer.txt
);
say common_prefix(@paths);
Output: ./dir/fileA_
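The question also asks how to remove the shared part. Here is a sketch that does both steps without the xor trick, so it is not limited to characters below 256. The helper name is hypothetical and it compares one character at a time with substr, which is simpler but slower than the xor version:

```perl
use strict;
use warnings;

# Hypothetical helper: compares one character at a time with substr, so
# unlike the xor trick it also works for code points of 256 and above.
sub common_prefix_chars {
    my $prefix = shift;
    for my $str (@_) {
        my $max = length($prefix) < length($str) ? length($prefix) : length($str);
        my $n = 0;
        $n++ while $n < $max && substr($prefix, $n, 1) eq substr($str, $n, 1);
        substr($prefix, $n) = '';    # keep only the shared part
    }
    return $prefix;
}

my @paths = qw(
    ./dir/fileA_header.txt
    ./dir/fileA_footer.txt
);

my $prefix = common_prefix_chars(@paths);

# Strip the shared prefix from every path, leaving the parts that differ.
my @rest = map { substr($_, length $prefix) } @paths;

print "$prefix\n";
print "$_\n" for @rest;
```

For the two sample paths this prints ./dir/fileA_ followed by header.txt and footer.txt.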

Split functions

I want to get the characters the string is split on. I tried the code below, but I can only get the split text. If the same split character occurs more than once, it should be returned only once.
For example, if the string is "asa,agas,asa", then only , should be returned.
So in the case below I should get "| : ;" (joined with spaces).
use strict;
use warnings;
my $str = "Welcome|a:g;v";
my @value = split /[,;:.%|]/, $str;
foreach my $final (@value) {
    print $final, "\n";
}
split splits a string into elements when given what separates those elements, so split is not what you want. Instead, use:
my @punctuations = $str =~ /([,;:.%|])/g;
So you want to get the opposite of split
try:
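my @value = split /[^,;:.%|]+/, $str;
It will split on anything but the delimiters you set.
Correction after comments:
my @value = split /[^,;:.%|]+/, $str;
shift @value if @value && $value[0] eq '';    # drop the leading empty field
This version gives each delimiter only once (in the order of the delimiter list, not the order in the string). It uses index rather than a regex match, because | and . are regex metacharacters:
@value = ();
foreach (split('', ",;:.%|")) { push @value, $_ if index($str, $_) >= 0; }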
To extract all the separators only once (in no particular order), you need something more elaborate:
my @punctuations = keys %{{ map { $_ => 1 } $str =~ /[,;:.%|]/g }};
Sounds like you call "split characters" what the rest of us call "delimiters" -- if so, the POSIX character class [:punct:] (written [[:punct:]] inside a regex) might prove valuable.
OTOH, if you have a defined list of delimiters, and all you want to do is list the ones present in the string, it's much more efficient to use m// rather than split.
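To get exactly the output the question asks for ("| : ;"), the m//g approach can be combined with a %seen hash, which keeps the delimiters in order of first appearance. A minimal sketch; the sub name is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns each delimiter once, in order of first
# appearance, joined with single spaces.
sub unique_delims {
    my ($str) = @_;
    my %seen;
    # m//g in list context returns every captured delimiter;
    # grep keeps only the first occurrence of each one.
    return join ' ', grep { !$seen{$_}++ } $str =~ /([,;:.%|])/g;
}

print unique_delims("Welcome|a:g;v"), "\n";
print unique_delims("asa,agas,asa"), "\n";
```

Unlike the keys %{{ ... }} version, this keeps a stable order, at the cost of one extra hash per call.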

Perl: Greedy nature refuses to work

I am trying to replace a string with another string, but the greedy nature doesn't seem to be working for me. Below is my code, where "PERFORM GET-APLCY" is identified and replaced properly, but the string "PERFORM GET-APLCY-SOI-CVG-WVR" and many other such strings are being replaced by the replacement string for "PERFORM GET-APLCY".
s/PERFORM $func[$i]\.*/# PERFORM $func[$i]\.\n $hash{$func[$i]}/g;
where the full stop is optional during string match and replacement. I have also tried giving the pattern to be matched as $func[$i]\b
Please help me understand what the issue could be.
Thanks in advance,
Faez
Why should GET-APLCY- not match GET-APLCY., if the dot is optional? With zero dots, the pattern matches the leading GET-APLCY of GET-APLCY-SOI-CVG-WVR.
Easy solution: sort your array by length in descending order.
@func = sort { length $b <=> length $a } @func;
Testing script:
#!/usr/bin/perl
use warnings;
use strict;
use feature 'say';
my %hash = ('GET-APLCY' => 'REP1',
            'GET-APLCY-SOI-CVG-WVR' => 'REP2',
            'GET-APLCY-SOI-MNG-CVRW' => 'REP3',
           );
my @func = sort { length $b <=> length $a } keys %hash;
while (<DATA>) {
    chomp;
    print;
    print "\t -> \t";
    for my $i (0 .. $#func) {
        s/$func[$i]/$hash{$func[$i]}/;
    }
    say;
}
__DATA__
GET-APLCY param
GET-APLCY- param
GET-APLCY. param
GET-APLCY-SOI. param
GET-APLCY-SOI-CVG-WVR param
GET-APLCY-SOI-MNG-CVRW param
You appear to be looping over function names, and calling s/// for each one. An alternative is to use the e option, and do them all in one go (without a loop):
my %hash = (
'GET-APLCY' => 'replacement 1',
'GET-APLCY-SOI-CVG-WVR' => 'replacement 2',
);
s{
PERFORM \s+ # 'PERFORM' keyword
([A-Z-]+) # the original function name
\.? # an optional period
}{
"# PERFORM $1.\n" . $hash{$1};
}xmsge;
The e causes the replacement part to be evaluated as an expression. Basically, the first part finds all PERFORM calls (I'm assuming that the function names are all upper case with '-' between them – adjust otherwise). The second part replaces that line with the text you want to appear.
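I've also used the x, m, and s options: x is what allows the whitespace and comments inside the regular expression, m makes ^ and $ match at line boundaries, and s makes . match newlines too. You can find more about these in perldoc perlre (the s/// operator itself is described in perldoc perlop).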
A plain version of the s-line would be:
s/PERFORM ([A-Z-]+)\.?/"# PERFORM $1.\n" . $hash{$1}/eg;
I guess that $func[$i] contains "GET-APLCY". If so, this is because the star only applies to the dot, an actual dot, not "any character". Try
s/PERFORM $func[$i].*/# PERFORM $func[$i]\.\n $hash{$func[$i]}/g;
I'm pretty sure you are doing some kind of loop over $i. In that case, GET-APLCY is most likely located in the @func array before GET-APLCY-SOI-CVG-WVR, so I recommend reverse-sorting @func before entering the loop.
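Combining the sorting advice with a single substitution gives a loop-free variant. This is a sketch with sample data modelled on the question: all function names go into one alternation, longest first, so the regex engine tries GET-APLCY-SOI-CVG-WVR before its prefix GET-APLCY, and quotemeta escapes any regex metacharacters in the names:

```perl
use strict;
use warnings;

# Replacement table (sample data modelled on the question).
my %hash = (
    'GET-APLCY'             => 'REP1',
    'GET-APLCY-SOI-CVG-WVR' => 'REP2',
);

# Longest names first, so the alternation prefers the full name over its prefix.
my $names = join '|', map { quotemeta($_) } sort { length $b <=> length $a } keys %hash;

my $text = "PERFORM GET-APLCY-SOI-CVG-WVR.\nPERFORM GET-APLCY.\n";
(my $out = $text) =~ s/PERFORM ($names)\.?/# PERFORM $1.\n $hash{$1}/g;
print $out;
```

A name that is not in the table but starts with a known name (e.g. GET-APLCY-SOI-MNG-CVRW here) would still have its GET-APLCY prefix replaced; adding a lookahead such as (?![\w-]) after the group is one way to require a whole name.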

foreach loop with a condition perl?

Is it possible to have a foreach loop with a condition in Perl?
I have to do a lot of character-by-character processing, where a foreach loop is very convenient. Note that I cannot use some libraries, since this is for school.
I could write a for loop using substr with a condition if necessary, but I'd like to avoid that!
You should show us some code, including the sort of thing you would like to do.
In general, character-by-character processing of a string would be done in Perl by writing
for my $ch (split //, $string) { ... }
or, if it is more convenient
my @chars = split //, $string;
for (@chars) { ... }
or
my $i = 0;
while ($i < @chars) { my $char = $chars[$i++]; ... }
and the latter form can support multiple expressions in the while condition. Perl is very rich in different ways to do similar things, and without knowing more about your problem it is impossible to say which is best for you.
Edit
It is important to note that none of these methods allow the original string to be modified. If that is the intention then you must use s///, tr/// or substr.
Note that substr has a fourth parameter that will replace the specified part of the original string. Note also that it can act as an lvalue and so take an assignment. In other words
substr $string, 0, 1, 'X';
can be written equivalently as
substr($string, 0, 1) = 'X';
If split is used to convert a string into a list of characters (actually one-character strings) then it can be modified in this state and recombined into a string using join. For instance
my @chars = split //, $string;
$chars[0] = 'X';
$string = join '', @chars;
does a similar thing to the above code using substr.
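To check that the three forms described above really are interchangeable, here is a small sketch that applies each one to its own copy of a sample string ("hello" is just an illustrative value):

```perl
use strict;
use warnings;

my $s1 = my $s2 = my $s3 = "hello";

# 1. Four-argument substr replaces 1 character at offset 0.
substr $s1, 0, 1, 'X';

# 2. substr as an lvalue: assignment replaces the same slice.
substr($s2, 0, 1) = 'X';

# 3. Split into characters, change one, and join them back.
my @chars = split //, $s3;
$chars[0] = 'X';
$s3 = join '', @chars;

print "$s1 $s2 $s3\n";    # Xello Xello Xello
```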
For example:
foreach my $l (@something) {
    last if (condition);
    # ...
}
This exits the loop as soon as condition is true.
You might investigate the next and last directives. More info in perldoc perlsyn.
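Putting the suggestions together, a foreach over the characters of a string can be "conditional" through next and last. A sketch with made-up conditions (skip digits, stop at the first 'e'):

```perl
use strict;
use warnings;

my $string  = "abc123def";
my $letters = '';

# Walk the string one character at a time; skip digits with next
# and leave the loop entirely at the first 'e' with last.
for my $ch (split //, $string) {
    last if $ch eq 'e';
    next if $ch =~ /\d/;
    $letters .= $ch;
}
print "$letters\n";    # abcd
```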