Instr equivalent in Perl?

A variable named $RestrictedNames holds the list of restricted user names. @SplitNames is an array that holds the complete set of user names. I need to check whether the current name is found in $RestrictedNames, much like using InStr.
For example, given @SplitNames = ("naag algates","arvind singh","abhay avasti","luv singh","new algates"), I want to block every name whose surname is "singh", "algates", etc.
@SplitNames = ("naag algates","arvind singh","abhay avasti","luv singh","new algates");
$RestrictedNames="tiwary singh algates n2 n3 n4 n5 n6";
for(my $i=0;$i<@SplitNames;$i++)
{
    if($RestrictedNames =~ m/^$SplitNames[$i]/ ) # googled this condition, still fails
    {
        print "$SplitNames[$i] is a restricted person";
    }
}

You should modify this line:
if($RestrictedNames =~ m/^$SplitNames[$i]/ )
to
if($RestrictedNames =~ m/$SplitNames[$i]/ )
^ anchors the match to the beginning of the string.
For more details about Perl metacharacters, see the perlre documentation.
EDIT:
If you need blocking based on surnames, try this code in the for-loop body.
my @tokens = split(' ', $SplitNames[$i]); # splits name on basis of spaces
my $surname = $tokens[$#tokens]; # takes the last token
if($RestrictedNames =~ m/$surname/ )
{
    print "$SplitNames[$i] is a restricted person\n";
}
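One caveat with interpolating a name straight into a pattern is that regex metacharacters in it are interpreted, and partial words also match. Here is a minimal sketch of a stricter check, assuming the restricted names are whitespace-separated whole words; the helper name is_restricted_surname is hypothetical and not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the last word of $name appears as a whole
# word in the space-separated $restricted string.
sub is_restricted_surname {
    my ($name, $restricted) = @_;
    my $surname = (split ' ', $name)[-1];
    return 0 unless defined $surname;
    # \Q...\E quotes metacharacters; the lookarounds require whitespace
    # (or the string boundary) on both sides of the surname.
    return $restricted =~ /(?<!\S)\Q$surname\E(?!\S)/ ? 1 : 0;
}

my $RestrictedNames = "tiwary singh algates n2 n3 n4 n5 n6";
for my $name ("naag algates", "abhay avasti", "raj sing") {
    print "$name is a restricted person\n"
        if is_restricted_surname($name, $RestrictedNames);
}
```

With these sample names only "naag algates" is reported; "raj sing" is not, because "sing" is only a fragment of "singh".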

Don't work with a string of restricted names; work with an array.
Then use the smart match operator (~~, two tilde characters) to see whether a given string is in it.
#!/usr/bin/perl
use v5.12;
use strict;
use warnings;
my $RestrictedNames="n1 n2 n3 n4 n5 n6 n7 n8 n9";
my @restricted_names = split " ", $RestrictedNames;
say "You can't have foo" if 'foo' ~~ @restricted_names;
say "You can't have bar" if 'bar' ~~ @restricted_names;
say "You can't have n1" if 'n1' ~~ @restricted_names;
say "You can't have n1a" if 'n1a' ~~ @restricted_names;
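The smart match operator has been flagged as experimental since Perl 5.18, so it warns on newer perls. As a hedged alternative, the same membership test can be sketched with any from List::Util (assuming List::Util 1.33 or later); the helper name is_restricted is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(any);

my $RestrictedNames  = "n1 n2 n3 n4 n5 n6 n7 n8 n9";
my @restricted_names = split ' ', $RestrictedNames;

# Hypothetical helper: exact string comparison against each list element.
sub is_restricted {
    my ($candidate) = @_;
    return (any { $_ eq $candidate } @restricted_names) ? 1 : 0;
}

for my $name (qw(foo bar n1 n1a)) {
    print "You can't have $name\n" if is_restricted($name);
}
```

Only n1 is rejected here, since the comparison is on whole strings and n1a is not in the list.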

Try something like below using Hash Slice:
my @users = ( "n10", "n12", "n13", "n4", "n5" );
my @r_users = ( "n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8", "n9" ) ;
my %check;
@check{@r_users} = ();
foreach my $user ( @users ) {
    if ( exists $check{$user} ) {
        print"Restricted User: $user \n";
    }
}

Most idiomatic way would be to create a hash of the restricted names, then split the surname from the name and check if the surname is in the hash.
use strict;
use warnings;
my @SplitNames = ("naag algates","arvind singh","abhay avasti","luv singh","new algates");
my $RestrictedNames = "tiwary singh algates n2 n3 n4 n5 n6";
# Create hash of restricted names
my %restricted;
map { $restricted{$_}++ } split(' ', $RestrictedNames);
# Loop over names and check if surname is in the hash
for my $name (@SplitNames) {
    my $surname = (split(' ', $name))[-1];
    if ( $restricted{$surname} ) {
        print "$name is a restricted person\n";
    }
}
Please note that the split function normally takes a RegEx. However using ' ' with split is a special case. It splits on any length of whitespace, and also ignores any leading whitespace, so it's useful for splitting strings of individual words.
FYI, the equivalent to instr in perl is to use index($string, $substring). If $substring does not occur inside $string it will return -1. Any other value means $string contains $substring. However, when comparing lists it's much less hassle to use a hash like I have shown above... and unlike index, it won't match 'joyce' when you really only meant to match 'joy'.
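To make the index behaviour concrete, here is a small sketch; the wrapper name contains is hypothetical, and the sample string is the one from the question.

```perl
use strict;
use warnings;

my $RestrictedNames = "tiwary singh algates n2 n3 n4 n5 n6";

# Hypothetical wrapper: true if $substring occurs anywhere in $string.
sub contains {
    my ($string, $substring) = @_;
    return index($string, $substring) != -1 ? 1 : 0;
}

# index returns the zero-based position of the first match, or -1 if absent.
print index($RestrictedNames, "singh"), "\n";    # 7
print index($RestrictedNames, "avasti"), "\n";   # -1

# Substring matching is not word matching: "sing" is found inside "singh".
print contains($RestrictedNames, "sing"), "\n";  # 1
```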

Use of hash of hashes for extracting data

I am new to Perl and am trying to understand hashes. I've tried a basic hash and it works. I am now trying to extract data using a hash of hashes. For example, I have a text file (input.txt) that contains some random information. How can I extract the required information using a hash-of-hashes structure?
input.txt
hi how r you this is sample .txt. you can use it for learning hash and hash of hashes. Let say I have cell ("name") and it has value as below
cell ("name"){
pin : A, B;
function: A+B;
value: 0.435;
}
I want to extract cell data in following format.
Output
Cell Pin Value
name A 0.435
I tried this:
while(<$fh>)
{
if(/ cell \(/){$hash{cell} //=$_;}
elsif(/ pin \(/){$hash{pin} //=$_;}
elsif(/ value :/){$hash{value} //=$_;}
}
use Data::Dump;
dd \%hash;
This gives only one entry in the hash. How can I get all the matches available in the input file?
Firstly, you need some way to avoid the text commentary at the start of the file. You could just skip the first line, but then random text that appears elsewhere will mess things up. What would be better is to look for the relevant data but happily ignore any other text no matter where it appears.
Notice that the text commentary contains relevant-looking data: cell ("name") but there is no { on the end of the line. You could use that to distinguish between the commentary and the data but that's perhaps a little too flexible. Probably better to insist on the { as well as whitespace only before the cell declaration.
Once inside a cell, it's reasonable to insist on having no comments. We can then just iteratively read lines and split on the ":" until we reach the }. Combine that with some general advice:
Separate regex definition from regex use;
Test your matches before using the capture variables; and
Use 'extended mode' regexes, which allow whitespace in the regex.
This all gives us:
#!/usr/bin/env perl
use v5.12;
use Data::Dumper qw(Dumper);
my $cell_name_re = qr/ ^ \s* cell \s* \( \s* "(\w+)" \) \s* \{ /x;
my $cell_data_re = qr/ ^ \s* ([^:]+) : (\N+) \n /x;
my $closing_curly_re = qr/ ^ \s* } /x;
my %data ;
while (<>) {
    next unless /$cell_name_re/ ;
    my $cell_name = $1 ;
    my %cell_hash ;
    while (<>) {
        if ( /$cell_data_re/ ) {
            $cell_hash{ $1 } = $2 ;
        }
        elsif ( /$closing_curly_re/ ) {
            $data{ $cell_name } = \%cell_hash ;
            last ; # exit the inner loop
        }
        else {
            warn "Don't understand line $. - ignoring" ;
        }
    }
}
print Dumper( \%data );
exit 0;
There are two key things here. Firstly, %cell_hash is declared inside the first loop, which ensures we get a new %cell_hash each time through. Secondly, when we insert %cell_hash into the global %data, we take a reference to it with \. Running it on the input data above yields:
{
'name' => {
'function' => ' A+B;',
'value' => ' 0.435;',
'pin ' => ' A, B;'
}
};
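To get from that structure to the "Cell Pin Value" layout asked for in the question, the keys and values still need trimming. The following is only a sketch under stated assumptions: the helper names clean and table_rows are hypothetical, the sample hash is copied from the dump above, and showing only the first pin is a guess based on the requested output.

```perl
use strict;
use warnings;

# Sample structure, copied from the dump above (keys and values untrimmed).
my %data = (
    name => {
        'function' => ' A+B;',
        'value'    => ' 0.435;',
        'pin '     => ' A, B;',
    },
);

# Hypothetical helper: strip surrounding whitespace and a trailing semicolon.
sub clean {
    my ($text) = @_;
    $text =~ s/^\s+|\s*;?\s*$//g;
    return $text;
}

# Hypothetical helper: one "cell pin value" row per cell in the outer hash.
sub table_rows {
    my ($cells) = @_;
    my @rows;
    for my $cell (sort keys %$cells) {
        # Re-key the inner hash with cleaned keys and values.
        my %attr;
        $attr{ clean($_) } = clean($cells->{$cell}{$_})
            for keys %{ $cells->{$cell} };
        # Assumption: only the first pin is wanted in the table.
        my ($first_pin) = split /\s*,\s*/, $attr{pin} // '';
        push @rows, join ' ', $cell, $first_pin // '', $attr{value} // '';
    }
    return @rows;
}

print "Cell Pin Value\n";
print "$_\n" for table_rows(\%data);
```

For the sample data this prints the header followed by "name A 0.435".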

Want to extract the first letter of each word

I have a variable COUNTRY along with variables SUBJID and TREAT, and I want to concatenate them like this: ABC002-123 /NZ/ABC.
Suppose the COUNTRY variable has the value 'New Zealand'. I want to extract the first letter of each word, but only the first two letters of the value when COUNTRY contains a single word. I would like to know how to simplify the code below, if possible in Perl.
If COUNTW(COUNTRY) GT 1 THEN
CAT_VAR=
UPCASE(SUBJID||"/"||CAT(SUBSTR(SCAN(COUNTRY,1,' '),1,1),
SUBSTR(SCAN(COUNTRY,2,' '),1,1))||"/"||TREAT);
my @COUNTRY = ("New Zealand", "Germany");
# 'NZ', 'GE'
my @two_letters = map {
    my @r = /\s/ ? /\b(\w)/g : /(..)/;
    uc(join "", @r);
} @COUNTRY;
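Building on the map above, the full concatenation from the question could be sketched as follows. The helper names country_code and cat_var are hypothetical, and the sample SUBJID and TREAT values are taken from the example string in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: initials for multi-word names, first two letters otherwise.
sub country_code {
    my ($country) = @_;
    my @letters = $country =~ /\s/
        ? $country =~ /\b(\w)/g   # first letter of every word
        : $country =~ /(..)/;     # first two characters
    return uc join '', @letters;
}

# Hypothetical helper: SUBJID/COUNTRY-CODE/TREAT, upper-cased like UPCASE().
sub cat_var {
    my ($subjid, $country, $treat) = @_;
    return uc join '/', $subjid, country_code($country), $treat;
}

print cat_var('ABC002-123', 'New Zealand', 'ABC'), "\n";  # ABC002-123/NZ/ABC
print cat_var('ABC002-123', 'Germany',     'ABC'), "\n";  # ABC002-123/GE/ABC
```

Unlike the two-word SAS expression, this handles any number of words, so 'Papua New Guinea' would give PNG.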
The SAS Perl Regular Expression solution is to use CALL PRXNEXT along with PRXPOSN or CALL PRXPOSN (or a similar function, if you prefer):
data have;
infile datalines truncover;
input @1 country $20.;
datalines;
New Zealand
Australia
Papua New Guinea
;;;;
run;
data want;
set have;
length country_letter $5.;
prx_1 = prxparse('~(?:\b([a-z])[a-z]*\b)+~io');
length=0;
start=1;
stop = length(country);
position=0;
call prxnext(prx_1,start,stop,country,position,length);
do while (position gt 0);
matchletter = prxposn(prx_1,1,country);
country_letter = cats(country_letter,matchletter);
call prxnext(prx_1,start,stop,country,position,length);
put i= position= start= stop=;
end;
run;
I realize the OP might not be interested in another answer, but for other users browsing this thread and not wanting to use Perl expressions I suggest the following simple solution (for the original COUNTRY variable):
FIRST_LETTERS = compress(propcase(COUNTRY),'','l');
The propcase function capitalizes the first letter of each word and puts the other letters in lower case. The compress function with the 'l' modifier deletes all lower-case letters.
COUNTRY may have any number of words.
How about this:
#!/usr/bin/perl
use warnings;
use strict;
my @country = ('New Zealand', 'Germany', 'Tanzania', 'Mozambique', 'Irish Republic');
my ($one_word_letters, $two_word_letters, @initials);
foreach (@country){
    if ($_ =~ /\s+/){ # Captures CAPs if 'country' contains a space
        my ($first_letter, $second_letter) = ($_ =~ /([A-Z])/g);
        my ($two_word_letters) = ($first_letter.$second_letter);
        push @initials, $two_word_letters; # Add to array for later
    }
    else {
        ($one_word_letters) = ($_ =~ /([A-Z][a-z])/); # If 'country' is only one word long, then capture first two letters (CAP+noncap)
        push @initials, $one_word_letters; # Add this to the same array
    }
}
foreach (@initials){ # Print contents of the capture array:
    print "$_\n";
}
Outputs:
NZ
Ge
Ta
Mo
IR
This should do the job provided there really are no 3 word countries. Easily fixed if there are though...
This should do.
#!/usr/bin/perl
$init = &getInitials($ARGV[0]);
if($init)
{
    print $init . "\n";
    exit 0;
}
else
{
    print "invalid name\n";
    exit 1;
}
1;
sub getInitials {
    $name = shift;
    $name =~ m/(^(\S)\S*?\s+(\S)\S*?$)|(^(\S\S)\S*?$)/ig;
    if( defined($1) and $1 ne '' ) {
        return uc($2.$3);
    } elsif( defined($4) and $4 ne '' ) {
        return uc($5);
    } else {
        return 0;
    }
}

Convert a string into a hash in Perl using split()

$hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";
%hash = split /,|=>/, $hashdef;
print "$_=>$hash{$_}" foreach(keys %hash);
Mouse=>JerryDog=>SpikeCat=>Tom
I am new to Perl. Can anyone explain the regular expression inside the split function? I understand that | means a choice between the two alternatives, but I am still confused.
%hash = split /|=>/, $hashdef;
I get the output
S=>pe=>J=>eT=>or=>rm=>,y=>,u=>sM=>og=>D=>oC=>ai=>kt
%hash = split /,/, $hashdef;
Mouse=>Jerry=>Cat=>TomDog=>Spike=>
Please explain the above condition.
split's first argument defines what separates the elements you want.
/,|=>/ matches a comma (,) or an equals sign followed by a greater-than sign (=>). They're just literals here, there's nothing special about them.
/|=>/ matches the zero-length string or an equals sign followed by a greater-than sign, and splitting on a zero-length string just splits a string up into individual characters; therefore, in your hash, M will map to o, u will map to s, etc. They appear jumbled up in your output because hashes don't have a definite ordering.
/,/ just splits on a comma. You're creating a hash that maps Mouse=>Jerry to Cat=>Tom and Dog=>Spike to nothing.
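One detail the patterns above leave behind: because the input has a space after each comma, splitting on /,|=>/ yields the keys " Cat" and " Dog" with a leading space. A minimal sketch that absorbs the whitespace in the separator follows; the helper name parse_pairs is hypothetical.

```perl
use strict;
use warnings;

my $hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";

# Hypothetical helper: split on "=>" or on a comma plus any surrounding
# whitespace, so keys do not keep the space that follows each comma.
sub parse_pairs {
    my ($text) = @_;
    my %hash = split /\s*,\s*|=>/, $text;
    return %hash;
}

my %hash = parse_pairs($hashdef);
# Sort the keys for a stable order; hashes themselves are unordered.
print "$_=>$hash{$_}\n" for sort keys %hash;
```

This prints Cat=>Tom, Dog=>Spike and Mouse=>Jerry on separate lines.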
$hashdef = "Mouse=>Jerry, Cat=>Tom, Dog=>Spike";
my %hash = eval( "( $hashdef )" );
print $hash{'Mouse'}."\n";
eval executes a string as a Perl expression. This doesn't use split, but I think would be a good way to handle the case outlined in your post of getting a hash from your string, seeing as your string happens to be well formed Perl, so I've added it here.
sub hash2string {
    my $href = $_[0];
    my $hstring = "";
    foreach (keys %{$href}) {
        $hstring .= "$_=>$href->{$_}, ";
    }
    return substr($hstring, 0, -2);
}
sub string2hash {
    my %lhash;
    my @lelements = split(/, /, $_[0]);
    foreach (@lelements) {
        my ($skey,$svalue) = split(/=>/, $_);
        $lhash{$skey} = $svalue;
    }
    return %lhash;
}

Parsing files that use synonyms

If I had a text file with the following:
Today (is|will be) a (great|good|nice) day.
Is there a simple way I can generate a random output like:
Today is a great day.
Today will be a nice day.
Using Perl or UNIX utils?
Closures are fun:
#!/usr/bin/perl
use strict;
use warnings;
my @gens = map { make_generator($_, qr~\|~) } (
    'Today (is|will be) a (great|good|nice) day.',
    'The returns this (month|quarter|year) will be (1%|5%|10%).',
    'Must escape %% signs here, but not here (%|#).'
);
for ( 1 .. 5 ) {
    print $_->(), "\n" for @gens;
}
sub make_generator {
    my ($tmpl, $sep) = @_;
    my @lists;
    while ( $tmpl =~ s{\( ( [^)]+ ) \)}{%s}x ) {
        push @lists, [ split $sep, $1 ];
    }
    return sub {
        sprintf $tmpl, map { $_->[rand @$_] } @lists
    };
}
Output:
C:\Temp> h
Today will be a great day.
The returns this month will be 1%.
Must escape % signs here, but not here #.
Today will be a great day.
The returns this year will be 5%.
Must escape % signs here, but not here #.
Today will be a good day.
The returns this quarter will be 10%.
Must escape % signs here, but not here %.
Today is a good day.
The returns this month will be 1%.
Must escape % signs here, but not here %.
Today is a great day.
The returns this quarter will be 5%.
Must escape % signs here, but not here #.
Code:
#!/usr/bin/perl
use strict;
use warnings;
my $template = 'Today (is|will be) a (great|good|nice) day.';
for (1..10) {
print pick_one($template), "\n";
}
exit;
sub pick_one {
    my ($template) = @_;
    $template =~ s{\(([^)]+)\)}{get_random_part($1)}ge;
    return $template;
}
sub get_random_part {
    my $string = shift;
    my @parts = split /\|/, $string;
    return $parts[rand @parts];
}
Logic:
Define template of output (my $template = ...)
Enter loop to print random output many times (for ...)
Call pick_one to do the work
Find all "(...)" substrings, and replace them with random part ($template =~ s...)
Print generated string
Getting random part is simple:
receive extracted substring (my $string = shift)
split it using | character (my @parts = ...)
return random part (return $parts[...)
That's basically all. Instead of using function you could put the same logic in s{}{}, but it would be a bit less readable:
$template =~ s{\( ( [^)]+ ) \)}
              { my @parts = split /\|/, $1;
                $parts[rand @parts];
              }gex;
Sounds like you may be looking for Regexp::Genex. From the module's synopsis:
#!/usr/bin/perl -l
use Regexp::Genex qw(:all);
$regex = shift || "a(b|c)d{2,4}?";
print "Trying: $regex";
print for strings($regex);
# abdd
# abddd
# abdddd
# acdd
# acddd
# acdddd
Use a regex to match each parenthetical (and the text inside it).
Use a string split operation (pipe delimiter) on the text inside of the matched parenthetical to get each of the options.
Pick one randomly.
Return it as the replacement for that capture.
Smells like a recursive algorithm
Edit: misread and thought you wanted all possibilities
#!/usr/bin/python
import re, random
def expand(line, all):
    result = re.search(r'\([^\)]+\)', line)
    if result:
        variants = result.group(0)[1:-1].split("|")
        for v in variants:
            expand(line[:result.start()] + v + line[result.end():], all)
    else:
        all.append(line)
    return all
line = "Today (is|will be) a (great|good|nice) day."
all = expand(line, [])
# choose a random possibility at the end:
print(random.choice(all))
A similar construct that produces a single random line:
def expand_rnd(line):
    result = re.search(r'\([^\)]+\)', line)
    if result:
        variants = result.group(0)[1:-1].split("|")
        choice = random.choice(variants)
        return expand_rnd(
            line[:result.start()] + choice + line[result.end():])
    else:
        return line
This will fail on nested constructs, however.
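Nested groups can be handled by always resolving an innermost group first. The following Perl sketch is one possible approach, not taken from any of the answers; the helper names pick and expand_nested and the nested sample template are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pick one alternative from "a|b|c".
sub pick {
    my ($alternatives) = @_;
    my @parts = split /\|/, $alternatives, -1;
    return @parts ? $parts[ rand @parts ] : '';
}

# Resolve innermost groups first: [^()]* cannot contain parentheses,
# so each pass replaces a group that has no nesting left inside it.
sub expand_nested {
    my ($template) = @_;
    1 while $template =~ s/\(([^()]*)\)/pick($1)/e;
    return $template;
}

print expand_nested('Today (is|will be) a (great|(very|quite) good) day.'), "\n";
```

Each pass removes one pair of parentheses, so the loop ends once no balanced group remains.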

How can I expand a string like "1..15,16" into a list of numbers?

I have a Perl application that takes from command line an input as:
application --fields 1-6,8
I am required to display the fields as requested by the user on command line.
I thought of substituting '-' with '..' so that I can store them in array e.g.
$str = "1..15,16" ;
@arr2 = ( $str ) ;
@arr = ( 1..15,16 ) ;
print "@arr\n" ;
print "@arr2\n" ;
The problem here is that @arr works fine (as it should), but in @arr2 the string is not expanded into array elements.
I have tried using escape sequences but no luck.
Can it be done this way?
If this is user input, don't use string eval on it if you have any security concerns at all.
Try using Number::Range instead:
use Number::Range;
$str = "1..15,16" ;
@arr2 = Number::Range->new( $str )->range;
print for @arr2;
To avoid dying on an invalid range, do:
eval { @arr2 = Number::Range->new( $str )->range; 1 } or your_error_handling
There's also Set::IntSpan, which uses - instead of ..:
use Set::IntSpan;
$str = "1-15,16";
@arr2 = Set::IntSpan->new( $str )->elements;
but it requires the ranges to be in order and non-overlapping (it was written for use on .newsrc files, if anyone remembers what those are). It also allows infinite ranges (where the string starts -number or ends number-), which the elements method will croak on.
You're thinking of @arr2 = eval($str);
Since you're taking input and evaluating that, you need to be careful.
You should probably @arr2 = eval($str) if ($str =~ m/^[0-9.,]+$/)
P.S. I didn't know about the Number::Range package, but it's awesome. Number::Range ftw.
I had the same problem in dealing with the output of Bit::Vector::to_Enum. I solved it by doing:
$range_string =~ s/\b(\d+)-(\d+)\b/expand_range($1,$2)/eg;
then also in my file:
sub expand_range
{
return join(",",($_[0] .. $_[1]));
}
So "1,3,5-7,9,12-15" turns into "1,3,5,6,7,9,12,13,14,15".
I tried really hard to put that expansion in the 2nd part of the s/// so I wouldn't need that extra function, but I couldn't get it to work. I like this because while Number::Range would work, this way I don't have to pull in another module for something that should be trivial.
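For what it's worth, the expansion can live directly in the replacement part when the /e modifier is used, because the replacement is then evaluated as Perl code. A minimal sketch, with the wrapper name expand_ranges being hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: expand every "M-N" span into a comma-separated run.
sub expand_ranges {
    my ($range_string) = @_;
    # With /e the replacement is Perl code; $1 .. $2 builds the list inline.
    $range_string =~ s/\b(\d+)-(\d+)\b/join ',', $1 .. $2/eg;
    return $range_string;
}

print expand_ranges("1,3,5-7,9,12-15"), "\n";   # 1,3,5,6,7,9,12,13,14,15
```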
@arr2 = ( eval $str ) ;
Works, though of course you have to be very careful with eval().
You could use eval:
$str = "1..15,16" ;
@arr2 = ( eval $str ) ;
@arr = ( 1..15,16 ) ;
print "@arr\n" ;
print "@arr2\n" ;
Although if this is user input, you'll probably want to do some validation on the input string first, to make sure they haven't input anything dodgy.
Use split:
@parts = split(/\,/, $fields);
print $parts[0];
1-6
print $parts[1];
8
You can't just put a string containing ',' in an array, and expect it to turn to elements (except if you use some Perl black magic, but we won't go into that here)
But Regex and split are your friends.
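Putting split and a regex together, the --fields value can be expanded without eval. This is a sketch under assumptions: the helper name parse_fields is hypothetical, and rejecting descending ranges is a design choice rather than something the question requires.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "1-6,8" into (1, 2, 3, 4, 5, 6, 8) without eval.
sub parse_fields {
    my ($spec) = @_;
    my @fields;
    for my $part (split /,/, $spec) {
        if ($part =~ /^(\d+)-(\d+)$/) {
            # Assumption: a descending range such as "6-1" is an error.
            die "Descending range '$part'\n" if $1 > $2;
            push @fields, $1 .. $2;
        }
        elsif ($part =~ /^\d+$/) {
            push @fields, $part;
        }
        else {
            die "Invalid field specification '$part'\n";
        }
    }
    return @fields;
}

my @fields = parse_fields("1-6,8");
print "@fields\n";   # 1 2 3 4 5 6 8
```

Because only digits, hyphens and commas are accepted, nothing from the command line is ever executed as code.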