How to replace a set of search/replace pairs? - perl

I have a dictionary of translations as a hash:
my %dict = { hello => 'hola', goodbye => 'adios' , ... }
(The actual use-case is not a human language translation! I'm replacing a load of tokens with some other values. This is just for example.)
How can I apply each of these to a string? Obviously I could loop them and pass each to s/$key/$value/ but then I'd have to quote them so it wouldn't break if a search or replacement had (for example) / in it.
In PHP there's strtr($subject, $replacement_pairs_array) - is there anything similar in Perl?

First, your hash initialization is off: A hash is initialized as a list:
my %dict = ( hello => 'hola', goodbye => 'adios' , ... );
Or you can use a hash reference:
my $dict = { hello => 'hola', goodbye => 'adios' , ... };
which is a scalar.
Replacing the keys with the values in a string is easy:
$string =~ s/$_/$dict{$_}/g for keys %dict;
unless one of the following applies:
Text that has already been substituted must not be replaced again, e.g. %dict = (a => b, b => c) should transform "ab" to "bc" (not to "cc", which the above solution may or may not produce, because hash order is random).
The keys can contain regex metacharacters like ., +, or (). This can be worked around by escaping the metacharacters with the quotemeta function.
The traditional approach is to build a regex that matches all keys:
my $keys_regex = join '|', map quotemeta, keys %dict;
Then:
$string =~ s/($keys_regex)/$dict{$1}/g;
which solves all these issues.
In the regex building code, we first escape all keys with map quotemeta, and then join the strings with | to build the regex that matches all keys. The resulting regex is quite efficient.
This guarantees that each part of the string is only translated once.
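Put together, the approach above could look like the following minimal sketch. The helper name replace_pairs and the sample data are assumptions made for illustration; only quotemeta, join and s///g come from the answer itself.

```perl
use strict;
use warnings;

# Hypothetical helper: applies every key => value pair in %$dict to $string
# in a single pass, so replaced text is never translated a second time.
sub replace_pairs {
    my ($string, $dict) = @_;
    return $string unless %$dict;    # an empty pattern would match everywhere

    # Escape each key, then join the keys into one alternation.
    my $keys_regex = join '|', map quotemeta, keys %$dict;
    $string =~ s/($keys_regex)/$dict->{$1}/g;
    return $string;
}

my %dict = ( a => 'b', b => 'c', '1+1' => '2/1' );
print replace_pairs('ab 1+1', \%dict), "\n";    # prints "bc 2/1"
```

Note that this sketch assumes no key is a prefix of another key; if some are, the keys would need to be sorted longest first before joining.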

my %dict = ( 'hello' => 'hola', 'goodbye' => 'adios' );
my $x = "hello bob, goodbye sue";
my $r = join("|", keys %dict);   # all keys as one alternation
$x =~ s/($r)/$dict{$1}/ge;
print $x;
This shows one way to do it.
Convert the hash keys to an alternation regexp, i.e. "hello|goodbye", look for matches with that expression, then use the matched key to look up the value in the hash. With the g flag the regexp is applied globally (repeatedly) to the string, and with the e flag the replacement is evaluated as Perl code instead of being a literal replacement string.

There appears to be a CPAN module that'll do this

Related

How to let Perl recognize both lower and uppercase input?

I'm currently trying to figure out how I can make my perl script accept both the lowercase and uppercase variant of a letter.
For Example.
my %channels = (a => 'test', b => 'test2', c => 'test3', d => 'test4', e => 'test5', f => 'test6', g => 'test7');
So when I want to enter test I can either do a or A and it will accept it
To Sum up the problem:
When running the script I ran into an issue where you had to input a if you wanted test. This is all fine to me but other people wanted the option to do capital A instead. I am trying to give them the option to do either.
Thanks All
For your hash keys (a single letter as a key) try the defined-or operator.
// => defined-or operator. If the left-hand side is defined, it gives that value; otherwise it evaluates the right-hand side.
print $channels{$input} // $channels{lc($input)};
One simple solution for your input
print $channels{lc($input)};
If the input is uppercase it is converted to lowercase. Lowercase input is left unchanged, so you don't need to worry about it.
It is not fully clear what your requirement is but from the example it looks like you are asking for a hash which is case insensitive regarding the keys, i.e. that $hash{foo}, $hash{FoO} etc all result in the same value.
This can be implemented with tied hashes by defining appropriate FETCH,STORE and DELETE methods for the hash. And there are already implementations which do this, like Hash::Case.
Of course you could also simply normalize all keys (lower case, upper case etc) before accessing the hash.
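To illustrate the tied-hash idea without an extra CPAN dependency, here is a rough sketch built on Tie::StdHash from the core Tie::Hash module. The package name CaseInsensitiveHash is made up for this example, and this is not how Hash::Case itself is implemented; it only shows the FETCH/STORE/DELETE overriding described above.

```perl
use strict;
use warnings;

# Hypothetical package: a hash that lower-cases every key on access.
# Tie::StdHash provides the default methods; only the ones that take
# a key are overridden here.
package CaseInsensitiveHash;
use Tie::Hash;
use parent -norequire, 'Tie::StdHash';

sub STORE  { my ($self, $key, $value) = @_; $self->{ lc $key } = $value }
sub FETCH  { my ($self, $key) = @_; return $self->{ lc $key } }
sub EXISTS { my ($self, $key) = @_; return exists $self->{ lc $key } }
sub DELETE { my ($self, $key) = @_; return delete $self->{ lc $key } }

package main;

tie my %channels, 'CaseInsensitiveHash';
%channels = (a => 'test', b => 'test2', c => 'test3');

print $channels{A}, "\n";    # same entry as $channels{a}: prints "test"
```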
I would keep all the hash keys lower case and convert the input value keys to lower case:
for my $inputKey ('a', 'A', 'b', 'B') {
print $channels{lc($inputKey)}, "\n";
}
With a regex it's not shorter but another way to solve it:
my %hash = (a => 'test', b => 'test2');
my $read_input = 'A'; # stands in for the user's input
foreach my $key (keys %hash){
    if($key =~ /^\Q$read_input\E$/i){ #note the i for 'ignore case'
        print $hash{$key};
    }
}
You basically iterate through the keys of your hash and compare them against the input. If you found the right one you can print the value.
But nevertheless: You may just convert every input to lower case and then access the hash.

Perl replace multiple strings simultaneously (case insensitive)

Consider the following perl code which works perfectly:
%replacements = ("what" => "its", "lovely" => "bad");
($val = $sentence) =~ s/(@{[join "|", keys %replacements]})/$replacements{$1}/g;
stackoverflow user sresevoir brilliantly came up with that replacement code that involved using a hash, allowing you to find and replace multiple terms without iterating through a loop.
I've been throwing other various search and replace terms at it programmatically and I've started using it to highlight words that are the result of a search.
The problem (refer to problem code shown below):
Make it case insensitive by adding an "i" before the "g" at the end.
If the search term $thisterm and the search term word contained in $sentence has no difference in case, there are no problems. If the search term $thisterm (i.e. Stackoverflow) and the search term word contained in $sentence is a different case (i.e. stackoverflow), then the result returned is nothing for that term. It's as if I told it to
$sentence =~ s/$thisterm//g;
Here's the problem code:
foreach $thisterm (@searchtermarray) {
# The variable $thisterm has already gone through a filter to remove special characters.
$thistermtochange = $thisterm;
$replacements{$thistermtochange} = "<span style=\"background-color:#FFFFCC;\">$thistermtochange<\/span>";
}
$sentence =~ s/(@{[join "|", keys %replacements]})/$replacements{$1}/ig;
I also went back and duplicated the problem with the above original code. It seems the combination of adding the i modifier, using a hash reference, and different case is something Perl doesn't like.
What am I missing?
Thanks,
DB
P. S. I've benefited from stackoverflow for years; but I just signed up for this question and the site wouldn't let me directly comment to sresevoir. As a "brand new" user I don't have enough reputation points.
Keep all the keys of the hash in lower case, and do this:
s/(@{[join "|", keys %replacements]})/$replacements{ lc $1 }/ig
(note the addition of lc)
There are a few other things you ought to consider.
First, as is, if you are trying to replace both lovely and love with different replacements, lovely may or may not ever be found, depending on which key is returned by keys first. To prevent this, it's a good idea to sort by descending length:
s/(@{[join "|", sort { length $b <=> length $a } keys %replacements]})/$replacements{ lc $1 }/ig
Second, this technique only works with fixed strings; if your keys contain any regex metacharacters, for instance replacing how? with why?, it will fail, because $1 will never be how?. To allow metacharacters (interpreted as literal characters), quote them:
s/(@{[join "|", map quotemeta, sort { length $b <=> length $a } keys %replacements]})/$replacements{ lc $1 }/ig
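As a runnable illustration of the three points so far (lower-case keys with lc, longest keys first, quotemeta), here is a small sketch. The sample sentence and the extra keys love and how? are invented for the demonstration, and the pattern is built in a separate $pattern variable purely for readability.

```perl
use strict;
use warnings;

# All keys are kept in lower case; the lookup lower-cases the match to agree.
my %replacements = ( what => 'its', lovely => 'bad', love => 'hate', 'how?' => 'why?' );

# Longest keys first so "lovely" wins over "love"; quotemeta makes "?" literal.
my $pattern = join '|', map quotemeta,
              sort { length $b <=> length $a } keys %replacements;

my $sentence = 'What a LOVELY day, how? I love it.';
(my $val = $sentence) =~ s/($pattern)/$replacements{ lc $1 }/ig;

print "$val\n";    # prints "its a bad day, why? I hate it."
```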
From your comment, it seems to me that you want to find certain strings, all in one pass, and add stuff around them (that doesn't vary by which string). If so, you are going about it the hard way and shouldn't be using a hash at all. Have an array of the strings you want to search for and replace them:
s/(@{[join "|", map quotemeta, sort { length $b <=> length $a } @search_strings]})/<span style="background-color:#FFFFCC;">$1<\/span>/ig;
The problem is that, if you have a hash like this
my %replacements = (
word => '<span style="background-color:#FFFFCC;">word</span>'
)
then the substitution will look like
s/(word)/$replacements{$1}/ig;
But a case-independent regex pattern will match WORD as well, so the replacement expression $replacements{$1} will be $replacements{'WORD'} which doesn't exist.
While you may be pleased with his solution, sresevoir uses an ugly way of embedding a string expression within a regex. This
($val = $sentence) =~ s/(@{[join "|", keys %replacements]})/$replacements{$1}/g;
would be much better as
my $pattern = join '|', keys %replacements;
($val = $sentence) =~ s/($pattern)/$replacements{$1}/g;
But you have generalised this hash idea too far and it is the wrong way to make the changes that you need. If your replacement string is a simple function of the original string, as in this case, then it is best written directly as a replacement string using captures from the pattern. I would write it like this
my $pattern = join '|', @searchtermarray;
$sentence =~ s{($pattern)}{<span style="background-color:#FFFFCC;">$1</span>\n}ig;
But note that, as it stands, the search will find any terms that are substrings of longer words in the text, and will also go awry if @searchtermarray has any strings that contain regex metacharacters. You don't say anything about your actual data so I can't really help you to resolve this.
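One possible way to address both caveats, offered only as a sketch since the real data is unknown: escape each term with quotemeta and require that a match is not surrounded by word characters. The sample terms and sentence are made up, and the assumption that only whole-word matches should be highlighted is mine, not the asker's. Lookarounds are used instead of \b because a term such as c++ ends in non-word characters.

```perl
use strict;
use warnings;

my @searchtermarray = ('stackoverflow', 'c++');    # hypothetical search terms

# Escape metacharacters and put longer terms first.
my $pattern = join '|', map quotemeta,
              sort { length $b <=> length $a } @searchtermarray;

my $sentence = 'I like StackOverflow and C++ questions.';

# (?<!\w) and (?!\w) reject matches embedded in a longer word.
$sentence =~ s{(?<!\w)($pattern)(?!\w)}{<span style="background-color:#FFFFCC;">$1</span>}ig;

print "$sentence\n";
```

Because the replacement reuses $1, the original capitalisation of each matched term is preserved in the output.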

Hash Key and Value in Perl

I have this question in Perl: Read a series of last names and phone numbers from the given input. The names and numbers should be separated by a comma. Then print the names and numbers alphabetically according to last name. Use hashes.
#!usr/bin/perl
my %series = ('Ashok','4365654435' 'Ramnath','4356456546' 'Aniketh','4565467577');
while (($key, $value) = each(sort %series))
{
print $key.",".$value."\n";
}
I am not getting the output. Where am I going wrong? Please help. Thanks in advance
#!usr/bin/perl
my %series = ('Ashok','4365654435' 'Ramnath','4356456546' 'Aniketh','4565467577');
print $_.",".$series{$_}."\n" for sort keys %series;
If I execute any of the above 2 programs, I get the same output as:
String found where operator expected at line 2, near "'4365654435' 'Ramnath'" (Missing operator before 'Ramnath'?)
String found where operator expected at line 2, near "'4356456546' 'Aniketh'" (Missing operator before 'Aniketh'?)
syntax error at line 2, near "'4365654435' 'Ramnath'"
Execution aborted due to compilation errors
But according to the question, I think I cannot store the input as my %series = ('Ashok','4365654435','Ramnath','4356456546','Aniketh','4565467577');
each only operates on hashes. You can't use sort like that, it sorts lists not hashes.
Your loop could be:
foreach my $key (sort keys %series) {
print $key.",".$series{$key}."\n";
}
Or in shorthand:
print $_.",".$series{$_}."\n" for sort keys %series;
In your hash declaration you have:
my %series = ('Ashok','4365654435' 'Ramnath','4356456546' 'Aniketh','4565467577');
This is what causes the syntax errors.
A hash is initialized from a list with an even number of scalars. Therefore, you have to put a comma between each pair:
my %series = ('Ashok','4365654435', 'Ramnath','4356456546', 'Aniketh','4565467577');
#                                 ^---                    ^---
If you want visual distinction between the pairs, you can use the => operator. This behaves the same as the comma. Additionally, if the left hand side is a legal bareword, it is treated as a quoted string. Therefore, we could write any of these:
# it is just a comma after all, with autoquoting
my %series = (Ashok => 4365654435 => Ramnath => 4356456546 => Aniketh => 4565467577);
# using it as a visual "pair" constructor
my %series = ('Ashok'=>'4365654435', 'Ramnath'=>'4356456546', 'Aniketh'=>'4565467577');
# as above, but using autoquoting. Numbers don't have to be quoted.
my %series = (
Ashok => 4365654435,
Ramnath => 4356456546,
Aniketh => 4565467577,
);
This last solution is the best. The last comma is optional, but I consider it good style: it makes it easy to add another entry. You can use autoquoting whenever the bareword on the left would be a legal variable name. E.g. a_bc => 1 is valid, but a bc => 1 is not (whitespace is not allowed in variable names), and +/- => 1 is not allowed (reserved characters). However Ünıçøðé => 1 is allowed when your source code is encoded in UTF-8 and you use utf8 in your script.
Besides what amon and Mat said, I'd like to point out other issues in your code:
your shebang is wrong it should be #!/usr/bin/perl - notice the first /
you don't have use strict; and use warnings; in your code - although this is not strictly a mistake, I consider this to be an issue. Those 2 commands will save you from a lot of trouble later on.
PS: you have to use commas between your number and names also, not only between names and numbers - you have to, because otherwise you get a compile error
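Pulling the corrections from these answers together, a complete version might look like the sketch below. The @input array is a stand-in for "the given input", whose real source the question does not specify, and the helper name read_series is invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turns "name,number" lines into a hash of name => number.
sub read_series {
    my @lines = @_;
    my %series;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my ($name, $number) = split /\s*,\s*/, $line, 2;
        $series{$name} = $number;
    }
    return %series;
}

# Stand-in for the real input (assumption: one "name,number" pair per line).
my @input  = ("Ashok,4365654435\n", "Ramnath,4356456546\n", "Aniketh,4565467577\n");
my %series = read_series(@input);

# Print alphabetically by last name.
print "$_,$series{$_}\n" for sort keys %series;
```

To read from a file or standard input instead, the lines could be collected with something like my @input = <STDIN>; before calling the helper.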

Perl: Greedy nature refuses to work

I am trying to replace a string with another string, but the greedy nature doesn't seem to be working for me. Below is my code where "PERFORM GET-APLCY" is identified and replaced properly, but string "PERFORM GET-APLCY-SOI-CVG-WVR" and many other such strings are being replaced by the replacement string for "PERFORM GET-APLCY".
s/PERFORM $func[$i]\.*/# PERFORM $func[$i]\.\n $hash{$func[$i]}/g;
where the full stop is optional during string match and replacement. I have also tried giving the pattern to be matched as $func[$i]\b
Please help me understand what the issue could be.
Thanks in advance,
Faez
Why shouldn't GET-APLCY- be matched by the pattern for GET-APLCY., if the dot is optional?
Easy solution: sort your array by length in descending order.
@func = sort { length $b <=> length $a } @func;
Testing script:
#!/usr/bin/perl
use warnings;
use strict;
use feature 'say';
my %hash = ('GET-APLCY' => 'REP1',
'GET-APLCY-SOI-CVG-WVR' => 'REP2',
'GET-APLCY-SOI-MNG-CVRW' => 'REP3',
);
my @func = sort { length $b <=> length $a } keys %hash;
while (<DATA>) {
chomp;
print;
print "\t -> \t";
for my $i (0 .. $#func) {
s/$func[$i]/$hash{$func[$i]}/;
}
say;
}
__DATA__
GET-APLCY param
GET-APLCY- param
GET-APLCY. param
GET-APLCY-SOI. param
GET-APLCY-SOI-CVG-WVR param
GET-APLCY-SOI-MNG-CVRW param
You appear to be looping over function names, and calling s/// for each one. An alternative is to use the e option, and do them all in one go (without a loop):
my %hash = (
'GET-APLCY' => 'replacement 1',
'GET-APLCY-SOI-CVG-WVR' => 'replacement 2',
);
s{
PERFORM \s+ # 'PERFORM' keyword
([A-Z-]+) # the original function name
\.? # an optional period
}{
"# PERFORM $1.\n" . $hash{$1};
}xmsge;
The e causes the replacement part to be evaluated as an expression. Basically, the first part finds all PERFORM calls (I'm assuming that the function names are all upper case with '-' between them – adjust otherwise). The second part replaces that line with the text you want to appear.
I've also used the x, m, and s options, which is what allows the comments in the regular expression, among other things. You can find more about these under perldoc perlop.
A plain version of the s-line should be:
s/PERFORM ([A-Z-]+)\.?/"# PERFORM $1.\n" . $hash{$1}/eg;
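For a self-contained run of this single-pass idea, here is a sketch with made-up input text. One detail is my own addition rather than part of the answer: calls whose name is not in %hash are left untouched instead of being replaced with an undefined value.

```perl
use strict;
use warnings;

my %hash = (
    'GET-APLCY'             => 'replacement 1',
    'GET-APLCY-SOI-CVG-WVR' => 'replacement 2',
);

# Hypothetical input: two known calls and one unknown call.
my $code = "PERFORM GET-APLCY.\nPERFORM GET-APLCY-SOI-CVG-WVR\nPERFORM OTHER-THING.\n";

$code =~ s{
    ( PERFORM \s+    # capture 1: the whole call
      ([A-Z-]+)      # capture 2: the function name, matched greedily
      \.?            # an optional period
    )
}{
    exists $hash{$2} ? "# PERFORM $2.\n$hash{$2}" : $1
}xge;

print $code;
```

Because [A-Z-]+ is greedy, the full name GET-APLCY-SOI-CVG-WVR is captured before the hash lookup happens, so the shorter GET-APLCY entry cannot shadow it.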
I guess that $func[$i] contains "GET-APLCY". If so, this is because the star only applies to the dot, an actual dot, not "any character". Try
s/PERFORM $func[$i].*/# PERFORM $func[$i]\.\n $hash{$func[$i]}/g;
I'm pretty sure you are running some kind of loop over $i. In that case, most likely GET-APLCY is located in the @func array before GET-APLCY-SOI-CVG-WVR. So I recommend reverse-sorting @func before entering the loop, so that longer names come before their prefixes.

In perl, how do I replace a set of characters with a different set of characters in a single pass?

Given ...
Ax~B~xCx~xDx
... emit ...
A~-B-~C~-~D~
I want to replace the ~ characters with - and the x characters with ~.
I could write ...
s/~/-/g;s/x/~/g;
... but that (looks like it) passes over the string twice.
Use "transliterate" for replacement based on characters. Try this:
tr/~x/\-~/;
Since you're dealing with single characters, tr/// is the obvious answer:
tr/~x/-~/;
However, you're going to need s/// to deal with longer sequences:
my %subs = ( '~' => '-', 'x' => '~' );
my $pat = join '|', map quotemeta, keys %subs;
s/($pat)/$subs{$1}/g;
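Both variants can be checked against the example from the question with a short script like this one (the variable names $input, $by_tr and $by_regex are just for this demonstration):

```perl
use strict;
use warnings;

my $input = 'Ax~B~xCx~xDx';

# Single characters: tr/// maps ~ to - and x to ~ in one pass.
(my $by_tr = $input) =~ tr/~x/-~/;

# Longer sequences: one alternation regex with a hash lookup.
my %subs = ( '~' => '-', 'x' => '~' );
my $pat  = join '|', map quotemeta, keys %subs;
(my $by_regex = $input) =~ s/($pat)/$subs{$1}/g;

print "$by_tr\n$by_regex\n";    # both lines print A~-B-~C~-~D~
```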