How to match sequential words in a sentence using Perl?

Is there a better way to match words than this method? I'm trying to find the words in the array that occur in any of the sentences.
use strict;
use warnings;
my $count = 0;
my @strings = (
"i'm going to find the occurrence of two words going if possible",
"i'm going to find the occurrence of two words if impossible",
"to find a solution to this problem",
"i will try my best for a way to match this problem"
);
my @neurot = qw(going match possible);
# one alternation: \bgoing\b|\bmatch\b|\bpossible\b
my $com_neu = '\b'.join('\b|\b', @neurot).'\b';
foreach my $sentence (@strings){
    my @l = $sentence =~ /($com_neu)/gi;
    foreach my $list (@l){
        if($list =~ m/\w['\w-]*/){
            print "$list ";
            $count++;
        }
    }
    print "\n";
}
print "Total: $count\n";
Output:
String 1: going going possible
String 2: going
String 3:
String 4: match
Please help me find a faster way.
Thanks.

Another approach is to use a hash to match the words:
my %neurot_hash = map { lc($_) => 1 } qw(going match possible);
for my $sentence (@strings) {
    for my $found (grep { $neurot_hash{ lc($_) } } $sentence =~ /\w['\w-]*/gi) {
        print $found, " ";
    }
    print "\n";
}
For the data you provided, this method is about 7% faster. But keep in mind that the data set is very small, so YMMV.
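Speed claims like this are easy to check on your own data with the core Benchmark module. A sketch, assuming the sentences and keywords from the question: with_regex and with_hash are hypothetical names for the two approaches, the iteration count of 20_000 is arbitrary, and a sanity check first confirms that both return the same words.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @strings = (
    "i'm going to find the occurrence of two words going if possible",
    "i'm going to find the occurrence of two words if impossible",
    "to find a solution to this problem",
    "i will try my best for a way to match this problem",
);
my @neurot = qw(going match possible);

# Approach 1: a single precompiled, case-insensitive alternation
my $alt = join '|', map { quotemeta } @neurot;
my $re  = qr/\b(?:$alt)\b/i;

# Approach 2: tokenize each sentence, then look every token up in a hash
my %lookup = map { lc($_) => 1 } @neurot;

# Both subs return one array ref of found words per sentence
sub with_regex {
    return map { [ $_ =~ /($re)/g ] } @strings;
}

sub with_hash {
    return map { [ grep { $lookup{ lc($_) } } /\w['\w-]*/g ] } @strings;
}

# Sanity check before timing: both approaches must agree
my @r = with_regex();
my @h = with_hash();
for my $i (0 .. $#strings) {
    die "approaches disagree on sentence $i\n"
        unless "@{ $r[$i] }" eq "@{ $h[$i] }";
}

cmpthese(20_000, { regex => \&with_regex, hash => \&with_hash });
```

Benchmark may warn that the count is too low for reliable numbers; raise it, or pass a negative count such as -2 to run each sub for at least 2 CPU seconds.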

What about the smartmatch operator?
foreach my $elem (@neurot){
if(/$elem/i ~~ @strings){
print "Found $elem\n";
}
}
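The smartmatch operator has been experimental since Perl 5.18 and deprecated since 5.38, so the same check is often spelled out with grep instead. A sketch, assuming the @strings and @neurot from the question; note that it adds \b word boundaries, which the smartmatch version lacks, so "possible" no longer matches inside "impossible".

```perl
use strict;
use warnings;

my @strings = (
    "i'm going to find the occurrence of two words going if possible",
    "i'm going to find the occurrence of two words if impossible",
    "to find a solution to this problem",
    "i will try my best for a way to match this problem",
);
my @neurot = qw(going match possible);

# Keep each keyword for which at least one sentence matches.
# The inner grep runs in boolean (scalar) context, so it yields the number
# of matching sentences; \Q...\E quotes any regex metacharacters.
my @found = grep {
    my $elem = $_;
    grep { /\b\Q$elem\E\b/i } @strings;
} @neurot;

print "Found $_\n" for @found;
```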

The same as bvr's answer, but perhaps cleaner:
my %neurot_hash = map { lc($_) => 1 } qw(going match possible);
for my $sentence (@strings) {
    my @words = split /[^\w']/, $sentence;
    # I am not sure whether you want to treat "i'm" as a separate word.
    my @found = grep { exists $neurot_hash{ lc($_) } } @words;
    print join(" ", @found);
    print "\n";
}

Related

Why do you need this line of code?

Here is the code
#!/usr/bin/perl
my ($items,
@aryitems,
@aryitems2,
$search,
$size,
$num);
@aryitems2=('Chattahoochee','committee','bookkeeper'
,'millennium','cappuccino','Tattle','Illiad','Mississippi',
'Tennessee');
$size=@aryitems2;
print "The size of the array is $size\n";
print "Enter the string to search:";
chomp ($search=<STDIN>);
foreach (@aryitems2)
{
$num=0;
$pos=0;
print "The word is $_\n";
if (/$search/i)
{
print "The pattern is '$search' and found in the word $_\n";
while ($pos<(length $_) and $pos!=-1)
{
$pos=index ($_,$search,$pos);
if ($pos!=-1)
{
$pos++;
$num++;
}
}
print "the number of times '$search' is found in this word is $num\n";
}
else
{
print "the pattern is '$search' and is not found in $_\n";
}
}
The part I don't get is
$pos=index ($_,$search,$pos);
What is the purpose of this code? How come it's not
$pos=index ($_,$pos);
or
$pos=index($search,$pos);
etc...
Why do you need it?

You really can't get around specifying both the string you want to find ($search) and the string in which to look ($_). So, at a minimum, you need
$pos = index($_, $search);
Why wasn't that used? Because that finds only the first match, but the goal is to find all matches. index starts looking at the position given by its third parameter, if given, which allows the loop to find every instance of $search in $_.
Note that because $pos++; was used instead of $pos += length($search);, overlapping matches are possible. For example, if you were searching for abab in abababab, your algorithm would find 3 matches instead of 2.
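To see the difference concretely, here is a sketch of the same index() loop with the step size made a parameter (count_index is a hypothetical name): stepping by 1 after each hit counts overlapping matches, while stepping by length($needle) counts only non-overlapping ones.

```perl
use strict;
use warnings;

# count_index (hypothetical helper): count $needle in $haystack with index().
# $step (at least 1) sets where the next search starts after a hit:
#   1               -> overlapping matches are counted (as in the question)
#   length $needle  -> only non-overlapping matches are counted
sub count_index {
    my ($haystack, $needle, $step) = @_;
    die "empty search string\n" unless length $needle;
    my ($count, $pos) = (0, 0);
    while ((my $found = index($haystack, $needle, $pos)) >= 0) {
        $count++;
        $pos = $found + $step;
    }
    return $count;
}

my ($word, $search) = ('abababab', 'abab');
printf "overlapping:     %d\n", count_index($word, $search, 1);                # 3
printf "non-overlapping: %d\n", count_index($word, $search, length $search);   # 2
```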
Cleaned-up code:
#!/usr/bin/perl
use strict;
use warnings qw( all );
use feature qw( say );
my @words = qw( Chattahoochee committee bookkeeper millennium cappuccino Tattle Illiad Mississippi Tennessee );
my ($search) = @ARGV
    or die("usage\n");
for my $word (@words) {
    my $count = 0;
    my $pos = 0;
    while (1) {
        $pos = index($word, $search, $pos);
        last if $pos < 0;
        ++$count;
        ++$pos;
    }
    say "$word contains $count instances of $search";
}
Using the match operator instead of index:
#!/usr/bin/perl
use strict;
use warnings qw( all );
use feature qw( say );
my @words = qw( Chattahoochee committee bookkeeper millennium cappuccino Tattle Illiad Mississippi Tennessee );
my ($search) = @ARGV
    or die("usage\n");
for my $word (@words) {
    my $count = 0;
    ++$count while $word =~ /\Q$search/g;
    say "$word contains $count instances of $search";
}
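The m//g version counts non-overlapping matches, because each match resumes the search after the text it consumed: for abab in abababab it finds 2, whereas the index() loop with ++$pos finds 3. If you want the regex version to count overlapping matches too, one sketch wraps the pattern in a zero-width lookahead, which consumes nothing (count_matches is a hypothetical name).

```perl
use strict;
use warnings;

# count_matches (hypothetical helper). A plain pattern consumes the text it
# matches, so m//g resumes after it and counts non-overlapping matches.
# A zero-width lookahead consumes nothing, so every start position is tried
# and overlapping matches are counted, like index() with ++$pos.
sub count_matches {
    my ($word, $search, $overlap) = @_;
    my $re = $overlap ? qr/(?=\Q$search\E)/ : qr/\Q$search\E/;
    my $count = () = $word =~ /$re/g;
    return $count;
}

for my $pair (['abababab', 'abab'], ['Mississippi', 'issi']) {
    my ($word, $search) = @$pair;
    printf "%-11s %-5s non-overlapping: %d, overlapping: %d\n",
        $word, $search,
        count_matches($word, $search, 0), count_matches($word, $search, 1);
}
```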

It's there so you can count the number of times the pattern is found in the word. It keeps track of where the pattern was previously found so that subsequent searches start further into the word and don't find occurrences of the pattern that have already been discovered.

Perl Error: Argument isn't numeric in array or hash lookup

I was writing a simple program to match words against a regex pattern, but I keep receiving the error above. This is my code:
my @words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
foreach my $index (@words) {
if ($words[$index] =~ m/ord/) {
print "match\n";
} else {print "no match\n";}
}
Error I received:
Argument "Ordinary" isn't numeric in array or hash lookup at test.pl line 6.
Argument "order" isn't numeric in array or hash lookup at test.pl line 6.
Argument "afford" isn't numeric in array or hash lookup at test.pl line 6.
Argument "cordford" isn't numeric in array or hash lookup at test.pl line 6.
Argument "'ORD airport'" isn't numeric in array or hash lookup at test.pl line 6.
no match
no match
no match
no match
no match
Can anyone explain to me what's causing the error and why?

This is the code that you show (improved a little):
my @words = ( 'Ordinary', 'order', 'afford', 'cordford', q{'ORD airport'} );
for my $index ( @words ) {
    if ( $words[$index] =~ /ord/ ) {
        print "match\n";
    }
    else {
        print "no match\n";
    }
}
This for loop will set $index to each value in the @words array. So, for instance, the first time the loop is executed $index will be set to Ordinary; the second time it will be set to order, etc.
Naming it $index shows clearly that you expected it to contain all the indices for @words. You can do that like this:
for my $index ( 0 .. $#words ) { ... }
and your program will work fine if you make just that change. The output is
no match
match
match
match
no match
But you had the right idea from the start. Most often an array is just a list of values and the indices have no relevance. That applies to your case, and you can write
for my $word ( @words ) {
if ( $word =~ m/ord/ ) {
print "match\n";
}
else {
print "no match\n";
}
}
Or using Perl's default variable $_ it can be written
for ( @words ) {
if ( m/ord/ ) {
print "match\n";
}
else {
print "no match\n";
}
}
or even just
print /ord/ ? "match\n" : "no match\n" for @words;
Every example above is exactly equivalent and so produces identical output.

The reason is that your $index takes the elements of the array, not the index values.
It should be foreach my $index (0..$#words); now $index will hold an array index in every iteration.
use strict;
use warnings;
my @words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
foreach my $index (0..$#words) {
if ($words[$index] =~ m/ord/) {
print "match\n";
}
else {print "no match\n";}
}
Alternatively, simply check the condition with $index itself:
use strict;
use warnings;
my @words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
foreach my $index (@words) {
if ($index =~ m/ord/) {
print "match\n";
}
else {print "no match\n";}
}

This is an array lookup:
$words[$index];
If it were a hash it would be
$words{$index};
Arrays expect integer indexes, but you're using strings that look nothing like integers.
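To see what Perl actually does with those strings, here is a small sketch (the variable name @looked_up is illustrative): none of the words starts with digits, so each one numifies to 0 and every lookup returns $words[0], which is "Ordinary". Because /ord/ is case-sensitive it never matches "Ordinary", which is why the original loop printed "no match" five times.

```perl
use strict;
use warnings;

my @words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");

# Each string used as an index is converted to a number. None of them starts
# with digits, so each becomes 0 (and would trigger the "isn't numeric"
# warning). The numeric warning is silenced only to keep this demo quiet.
my @looked_up = do {
    no warnings 'numeric';
    map { $words[$_] } @words;
};
print "$_\n" for @looked_up;    # prints "Ordinary" five times
```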
If you are iterating over arrays in Perl, you don't need the index:
#!/usr/bin/perl
use strict;
use warnings;
my @words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
foreach my $word (@words) {
if($word =~ m/ord/) {
print "$word match\n";
} else {
print "$word no match\n";
}
}
Note: I've used foreach because you see it in more languages; you could also use for.
You can also try something a little different. Note that this loop never ends, but it's worth studying:
#!/usr/bin/perl
use strict;
use warnings;
my @words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
my $iterator = sub {
my $item = shift(@words);
push(@words, $item);
return $item;
};
while(my $item = $iterator->()) {
print("$item\n");
}
I do love Perl.
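The rotating iterator above never terminates. If you want a closure-based iterator that does stop, one sketch (make_iterator is a hypothetical helper name) keeps an index inside the closure and returns undef once the items run out. Testing with defined matters here: a plain while (my $item = ...) would also stop early on a false value such as "0" or an empty string.

```perl
use strict;
use warnings;

# make_iterator (hypothetical helper): returns a closure that hands out the
# items one at a time and returns undef once all of them have been returned.
sub make_iterator {
    my @items = @_;    # private copy; the caller's array is left untouched
    my $i = 0;
    return sub {
        return if $i > $#items;    # exhausted
        return $items[$i++];
    };
}

my @words = ("Ordinary", "order", "afford", "cordford", "'ORD airport'");
my $next = make_iterator(@words);
while (defined(my $item = $next->())) {
    print "$item\n";
}
```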

How to skip splitting for some part of the line

Say I have a line lead=george wife=jane "his boy"=elroy. I want to split on spaces, but not inside the "his boy" part; it should be treated as one unit.
With a normal split, "his boy" is split as well, with "his" as one part and "boy" as the second. How can I avoid this?
Following this, I tried
split " ", $_
I just learned that this will work:
use strict; use warnings;
my $string = q(hi my name is 'john doe');
my @parts = $string =~ /'.*?'|\S+/g;
print map { "$_\n" } @parts;
But it does not look good. Is there anything simpler with split itself?

You could use Text::ParseWords for this:
use Text::ParseWords;
$list = "lead=george wife=jane \"his boy\"=elroy";
@words = quotewords('\s+', 0, $list);
$i = 0;
foreach (@words) {
    print "$i: <$_>\n";
    $i++;
}
Output:
0: <lead=george>
1: <wife=jane>
2: <his boy=elroy>
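If the end goal is a key/value lookup, the quotewords() result can be fed straight into a hash. A sketch, assuming each field holds exactly one key=value pair; the limit of 2 on split keeps any further = signs inside the value.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $line = q{lead=george wife=jane "his boy"=elroy};

# quotewords() splits on the delimiter regex but keeps quoted text together;
# with $keep = 0 it also strips the quotes. Each field is then split on its
# first '=' into a key and a value.
my %pairs = map { split /=/, $_, 2 } quotewords('\s+', 0, $line);

print "$_ => $pairs{$_}\n" for sort keys %pairs;
```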

sub split_space {
my ( $text ) = @_;
while (
$text =~ m/
( # group ($1)
\"([^\"]+)\" # first try find something in quotes ($2)
|
(\S+?) # else minimal non-whitespace run ($3)
)
=
(\S+) # then maximum non-whitespace run ($4)
/xg
) {
my $key = defined($2) ? $2 : $3;
my $value = $4;
print( "key=$key; value=$value\n" );
}
}
split_space( 'lead=george wife=jane "his boy"=elroy' );
Outputs:
key=lead; value=george
key=wife; value=jane
key=his boy; value=elroy

PP posted a good solution, but just to show that there is another cool way to do it, here is mine:
my $string = q~lead=george wife=jane "his boy"=elroy~;
my @split = split / (?=")/,$string;
my @split2;
foreach my $sp (@split) {
if ($sp !~ /"/) {
push @split2, $_ foreach split / /, $sp;
} else {
push @split2,$sp;
}
}
use Data::Dumper;
print Dumper @split2;
Output:
$VAR1 = 'lead=george';
$VAR2 = 'wife=jane';
$VAR3 = '"his boy"=elroy';
I use a lookahead here to first split off the parts whose keys are inside quotes (" "). After that, I loop through the whole array and split all the other parts, which are normal key=value pairs.

You can get the required result using a single regexp, which extracts the keys and the values and puts the result into a hash.
(\w+|"[\w ]+") will match both a single word and multiple words on the key side.
The regexp captures only the key and the value, so the result of the match operation is a list with the following content: key #1, value #1, key #2, value #2, etc.
The hash is automatically populated with the appropriate keys and values when the match result is assigned to it.
Here is the code:
my $str = 'lead=george wife=jane "hello boy"=bye hello=world';
my %hash = ($str =~ m/(?:(\w+|"[\w ]+")=(\w+)(?:\s|$))/g);
## outputs the hash content
foreach $key (keys %hash) {
print "$key => $hash{$key}\n";
}
And here is the output of this script (the order of hash keys can vary between runs):
lead => george
wife => jane
hello => world
"hello boy" => bye

Parsing a string into a hash structure in perl

I have the following string:
$str = "list
XYZ
status1 : YES
value1 : 100
status2 : NO
value2 : 200
Thats all";
I want to convert it into a hash using a function that takes this string as input and returns a hash with, for example, status1 as a key and YES as its value.
How to do so?
And how to reference the returned hash?

Like always, there's more than one way to do it. Here are five.
Pure regular expressions (YEAH!)
I think this is the coolest one. The regex returns a list of all captures which is exactly the list we want to initialize the hash with:
my %regex = $str =~ /(\S+)\s*:\s*(\S+)/g;
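The question also asks how to return the hash from a function and use it. One sketch (parse_status is a hypothetical name) wraps the regex above in a sub that returns a reference; the caller then reads single entries with $href->{key} and the whole hash with %$href.

```perl
use strict;
use warnings;

# parse_status (hypothetical helper): applies the regex to the string and
# returns a reference to a new hash, so the caller gets one scalar back
# instead of a flattened list.
sub parse_status {
    my ($text) = @_;
    my %h = $text =~ /(\S+)\s*:\s*(\S+)/g;
    return \%h;
}

my $str = "list
XYZ
status1 : YES
value1 : 100
status2 : NO
value2 : 200
Thats all";

my $href = parse_status($str);
print "status1 is $href->{status1}\n";             # one entry via the arrow
print "$_ => $href->{$_}\n" for sort keys %$href;  # whole hash via %$href
```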
Iterative
This is the most straightforward way for most programmers, I think:
my @lines = split /\R/ => $str;
my %iterative = ();
for (@lines) {
next unless /(\S+)\s*:\s*(\S+)/;
$iterative{$1} = $2;
}
Nothing to explain here. I first split the string into lines, then iterate over them, skipping lines that don't look like foo : bar. Done.
List processing
Writing everything as one big list expression feels a little hackish, but it may be an interesting way to learn more ways of expressing things:
my %list = map { /(\S+)\s*:\s*(\S+)/ and $1 => $2 }
grep { /:/ }
split /\R/ => $str;
Read from right to left: as in the example above, we start by splitting the string into lines. grep keeps only the lines containing :, and the final map transforms each matching line into a list of length two, with a key and a value.
List reducing
Non-trivial use cases of List::Util's reduce function are very rare. Here's one, based on the list approach from above, returning a hash reference:
use List::Util qw(reduce);
my $reduced = reduce {
$a = { $a =~ /(\S+)\s*:\s*(\S+)/ } unless ref $a;
$a->{$1} = $2 if $b =~ /(\S+)\s*:\s*(\S+)/;
return $a;
} grep { /:/ } split /\R/ => $str;
State machine
Here's a funny one with regex usage for white-space separation only. It needs to keep track of a state:
# preparations
my $state = 'idle';
my $buffer = undef;
my %state = ();
my @words = split /\s+/ => $str;
# loop over words
for my $word (@words) {
# last word was a key
if ($state eq 'idle' and $word eq ':') {
$state = 'got_key';
}
# this is a value for the key in buffer
elsif ($state eq 'got_key') {
$state{$buffer} = $word;
$state = 'idle';
$buffer = undef;
}
# remember this word
else {
$buffer = $word;
}
}

Just for fun (note that I recommend using one of memowe's solutions instead), here is one that (ab)uses YAML:
#!/usr/bin/env perl
use strict;
use warnings;
use YAML;
my $str = "list
XYZ
status1 : YES
value1 : 100
status2 : NO
value2 : 200
Thats all";
$str = join "\n", grep { /:/ } split "\n", $str;
my $hash = Load "$str\n";

#!/usr/bin/perl
use warnings;
$\="\n";
sub convStr {
my $str = $_[0];
my %h1=();
while ($str =~m/(\w+)\s+:\s+(\w+)/g) {
$h1{$1} =$2;
}
return \%h1;
}
my $str = "list
XYZ
status1 : YES
value1 : 100
status2 : NO
value2 : 200
Thats all";
my $href=convStr($str);
foreach (keys(%$href)) {
print $_ , "=>", $href->{$_};
}
On running this, I get:
status2=>NO
value1=>100
status1=>YES
value2=>200

my %hhash;
my @lines = split /\n/, $str;
foreach (@lines)
{
    s/^\s+//;    # strip leading indentation
    if (/:/)
    {
        # split on the first colon only, dropping the spaces around it
        my ($key, $value) = split /\s*:\s*/, $_, 2;
        $hhash{$key} = $value;
    }
}

perl search sentence for keywords

If I want to find a keyword in a sentence using Perl, I have something like this:
foreach $line (@lines)
{
if ($line =~/keyword/)
{
print $line;
}
}
If I want to see if more keywords are in the sentence how should I change the matching?

There are several solutions. The easiest is to use something like /keyword.*keyword/.
When you want to count the number of keywords in a string (not simply check whether there are two), you can do something like:
for(@lines){
my $n = 0;
$n++ while /(keyword)/g;
print if ($n>2);
}
By the way, your code can be more concise:
for (@lines) {
print if /keyword/;
}
That does the same thing.

If you want to process each of the matches (g modifier):
my $number_of_matches = 0;
foreach $line (@lines)
{
while ( $line =~ m/keyword/g )
{
do_something_you_need();
$number_of_matches++;
}
}

my @words = map { split / /; } @lines;
my @match;
foreach my $el (@keywords) {
    push @match, grep { $el eq $_ } @words;
}

Do you want to see whether the sentence contains other (different) keywords, or do you want to check whether it contains the same keyword multiple times?
For the first, you can write
if ($line =~ /keyword1|keyword2|keyword3/) { ... }
and for the second, it looks like this
my $n = () = $line =~ /keyword/g;
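To combine both ideas (several different keywords, each possibly repeated), one sketch builds the alternation from a list and counts every hit per keyword. The keyword list, the sample lines and the keyword_counts helper are illustrative assumptions; \b keeps a keyword from matching inside a longer word, and quotemeta protects keywords that contain regex metacharacters.

```perl
use strict;
use warnings;

my @keywords = qw(match find problem);    # illustrative keyword list

# One alternation built from the list: quotemeta escapes metacharacters,
# \b stops "find" from matching inside "finder".
my $alt = join '|', map { quotemeta } @keywords;
my $re  = qr/\b($alt)\b/;

# keyword_counts (hypothetical helper): keyword => number of occurrences
sub keyword_counts {
    my ($line) = @_;
    my %seen;
    $seen{$1}++ while $line =~ /$re/g;
    return \%seen;
}

my @lines = (
    "to find a solution to this problem",
    "i will try my best for a way to match this problem",
    "nothing relevant here",
);

for my $line (@lines) {
    my $counts   = keyword_counts($line);
    my $distinct = keys %$counts;
    # report lines that contain at least two different keywords
    print "$line\n" if $distinct >= 2;
}
```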
