How to get many matched patterns from multiple patterns match in perl - perl

Is there a way to get multi-matched-patterns from "many '|' regex" matching.
Here is my code,
#! /usr/bin/perl
#matches = qw(google intel hp qualcomm app);
$keyword = join('|', #matches);
$string = "hello google app";
#founded = ($string =~ /($keyword)/);
print "Founded keyword is:" . join(" ", #founded);
I hope to get 'google and app', because this keywords are both matched in strings. But How sad, just get 'google'

Simply add a /g modifier to your match:
#found = ($string =~ /($keyword)/g);
You'll get all the matches that way.

I think you are looking for the intersection of two lists:
use Array::Utils qw(:all);
my #matches = qw(google intel hp qualcomm app);
my #find = qw(hello google app);
my #result = intersect(#matches, #find);
print "Founded keyword(s): ", join(" ", #result) . "\n";
This solution uses the Array::Utils module

Related

perl split but keep special separator with second string

I have the following string below that I need to split up but I need to keep the separator with the second portion of the string. So that I end up with:
$a = 'State of mind is primary';
$b = '\098\455\098evertyhing else is secondary.-Eckhart Tolle '
My attempt:
my $string = 'State of mind is primary\098\455\098evertyhing else is secondary. -Eckhart Tolle';
my $separator = '\098\455\098';
my ($a, $b) = split($separator, $string);
print "$a\n";
print "$b\n";
How can I accomplish this with perl split?
Split on the 0-length string that precedes the sequence.
my ($x, $y) = split(/(?=\\098\\455\\098)/, $string);
Use a positive lookahead.
(I've simplified your separator to remove one potential source of confusion)
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $string = 'State of mind is primaryXXXevertyhing else is secondary. -Eckhart Tolle';
my $separator = 'XXX';
my ($first, $second) = split /(?=$separator)/, $string;
say "$first / $second";
And the output:
$ perl split
State of mind is primary / XXXevertyhing else is secondary. -Eckhart Tolle
Don't use split() use pattern matching.
EDIT I guess you really meant \ not octal characters:
my $string = "State of mind is primary\\098\\455\\098\everything else is secondary. -Eckhart Tolle";
my($a, $b) = ($string =~ /^(.+)(\\098\\455\\098.+)$/);
print "A: '$a'\n";
print "B: '$b'\n";
output:
A: 'State of mind is primary'
B: '\098\455\098ertyhing else is secondary. -Eckhart Tolle'

Splitting and printing with Perl

My Perl script is attempting to take in a command line argument that is a file name such as name.txt or hello.txt.exe and parse out the file extension based on the . delimiter, and print only the extension like exe or txt. Here's what I currently have which doesn't print anything and I'm not entirely sure why.
usr/bin/perl -w
use strict;
my ($user_arg) = shift #ARGV;
my ($ext);
if ( ($ext) = $user_arg =~ /(\.[^.].+)$/)
{
print "Ends in ", ($ext) = $user_arg =~ /(\.[^.].+)$/ , "\n";
print "Ends in" , ($ext) = $user_arg =~ /(\.[^.]+)$/, "\n";
}
elsif( ($ext) = $user_arg =~ /(\.[^.]+)$/)
{
print"Ends in " , ($ext), "\n";
}
else
{
print "No Extension";
}
*Updated, now my problem is the first statement will print both conditions if it's something like name.txt it will print .txt twice, where I want it to only print .txt once UNLESS it's name.txt.exe where I'd like it to print .txt.exe then .exe
There's two main issues here:
1) You need to shift off #ARGV
my $arg = shift #ARGV;
2) You need to escape the 'dot'
my #values = split /\./, $user_arg;
Other things...
You usually want to sanitize user input:
die "usage: $0 filename\n" if {some condition}
I think you mean chomp $val; in your foreach.
It wouldn't hurt to be familiar with File::Basename, fileparse could make your life easier. Although it might be overkill here.
UPDATE
You should be able to integrate this yourself. In your case you won't need to loop
over a list of files, you'll just have one.
This doesn't do what you want where it prints "txt.exe", "exe". But you can fine tune this to your liking.
my #file_tests = qw(nosuffix testfile.txt /path/to/file.exe foo.bar.baz);
for my $fullname (#file_tests) {
my #names = split /\./, $fullname;
# shift off the first element, which will
# give you the list of suffixs or an empty list
shift #names;
# you can decide how you want to print this list
# if scalar #names is 0 don't print anything
print "list of suffixes: " . join( ', ', #names ) . "\n"
if scalar(#names) > 0;
}
OUTPUT:
list of suffixes: txt
list of suffixes: exe
list of suffixes: bar, baz

Deparsing/Decomposing - step-by-step this obfuscated perl script

As the title - please can anyone explain how the next scripts works
this prints the text: "Perl guys are smart"
''=~('(?{'.('])##^{'^'-[).*[').'"'.('-[)#{:__({:)[{(-:)^}'^'}>[,[]*&[[[[>[[#[[*_').',$/})')
this prints only "b"
use strict;
use warnings;
''=~('(?{'.('_/).+{'^'/]##_[').'"'.('=^'^'_|').',$/})')
the perl -MO=Deparse shows only this:
use warnings;
use strict 'refs';
'' =~ m[(?{print "b",$/})];
but havent any idea why... ;(
What is the recommended way decomposing like scripts? How to start?
so, tried this:
'' =~
(
'(?{'
.
(
'])##^{' ^ '-[).*['
)
.
'"'
.
(
'-[)#{:__({:)[{(-:)^}' ^ '}>[,[]*&[[[[>[[#[[*_'
)
.
',$/})'
)
several parts are concatenated by .. And the result of the bitwise ^ probably gives the text parts. The:
perl -e "print '-[)#{:__({:)[{(-:)^}' ^ '}>[,[]*&[[[[>[[#[[*_'"
prints "Perl guys are smart" and the first ^ generating "print".
But when, i rewrite it to:
'' =~
(
'(?{'
.
(
'print'
)
.
'"'
.
(
'Perl guys are smart'
)
.
',$/})'
)
My perl told me:
panic: top_env
Strange, first time i saw like error message...
Thats mean: it isn't allowed replace the 'str1' ^ 'str2' with the result, (don't understand why) and why the perl prints the panic message?
my perl:
This is perl 5, version 12, subversion 4 (v5.12.4) built for darwin-multi-2level
Ps: examples are generated here
In the line
.('_/).+{' ^ '/]##_[
when you evaluate ']' ^ '-', the result will be the letter p. ^ is a bitwise string operation, so after that we follow letter by letter to get result string.
Check my script, it works like your example. I hope it will help you.
use v5.14;
# actually we obfuscated print and your word + "
# it looks like that (print).'"'.(yor_word")
my $print = 'print';
my $string = 'special for stackoverflow by fxzuz"';
my $left = get_obfuscated($print);
my $right = get_obfuscated($string);
# prepare result regexp
my $result = "'' =~ ('(?{'.($left).'\"'.($right).',\$/})');";
say 'result obfuscated ' . $result;
eval $result;
sub get_obfuscated {
my $string = shift;
my #letters = split //, $string;
# all symbols like :,&? etc (exclude ' and \)
# we use them for obfuscation
my #array = (32..38, 40..47, 58..64, 91, 93..95, 123..126);
my $left_str = '';
my $right_str = '';
# obfuscated letter by letter
for my $letter (#letters) {
my #result;
# get right xor letters
for my $symbol (#array) {
# prepare xor results
my $result = ord $letter ^ $symbol;
push #result, { left => $result, right => $symbol } if $result ~~ #array;
}
my $rand_elem = $result[rand $#result];
$left_str .= chr $rand_elem->{left};
$right_str .= chr $rand_elem->{right};
}
my $obfuscated = "'$left_str' ^ '$right_str'";
say "$string => $obfuscated";
return $obfuscated;
}
The trick to understanding what's going on here is to look at the string being constructed by the XORs and concatenations:
(?{print "Perl guys are smart",$/})
This is an experimental regular expression feature of the form (?{ code }). So what you see printed to the terminal is the result of
print "Perl guys are smart",$/
being invoked by ''=~.... $/ is Perl's input record separator, which by default is a newline.

Reformulate a string query in perl

How do i reformulate a string in perl?
For example consider the string "Where is the Louvre located?"
How can i generate strings like the following:
"the is Louvre located"
"the Louvre is located"
"the Louvre located is"
These are being used as queries to do a web search.
I was trying to do something like this:
Get rid of punctuations and split the sentence into words.
my #words = split / /, $_[0];
I don't need the first word in the string, so getting rid of it.
shift(#words);
And then i need move the next word through out the array - not sure how to do this!!
Finally convert the array of words back to a string.
How can I generate all permutations of an array in Perl?
Then use join to glue each permutation array back together into a single string.
Somewhat more verbose example:
use strict;
use warnings;
use Data::Dumper;
my $str = "Where is the Louvre located?";
# split into words and remove the punctuation
my #words = map {s/\W+//; $_} split / /, $str;
# remove the first two words while storing the second
my $moving = splice #words, 0 ,2;
# generate the variations
my #variants;
foreach my $position (0 .. $#words) {
my #temp = #words;
splice #temp, $position, 0, $moving;
push #variants, \#temp;
}
print Dumper(\#variants);
my #head;
my ($x, #tail) = #words;
while (#tail) {
push #head, shift #tail;
print join " ", #head, $x, #tail;
};
Or you can just "bubble" $x through the array: $words[$n-1] and words[$n]
foreach $n (1..#words-1) {
($words[$n-1, $words[$n]) = ($words[$n], $words[$n-1]);
print join " ", #words, "\n";
};

How can I split a pipe-separated string in a list?

Here at work, we are working on a newsletter system that our clients can use. As an intern one of my jobs is to help with the smaller pieces of the puzzle. In this case what I need to do is scan the logs of the email server for bounced messages and add the emails and the reason the email bounced to a "bad email database".
The bad emails table has two columns: 'email' and 'reason'
I use the following statement to get the information from the logs and send it to the Perl script
grep " 550 " /var/log/exim/main.log | awk '{print $5 "|" $23 " " $24 " " $25 " " $26 " " $27 " " $28 " " $29 " " $30 " " $31 " " $32 " " $33}' | perl /devl/bademails/getbademails.pl
If you have sugestions on a more efficient awk script, then I would be glad to hear those too but my main focus is the Perl script. The awk pipes "foo#bar.com|reason for bounce" to the Perl script. I want to take in these strings, split them at the | and put the two different parts into their respective columns in the database. Here's what I have:
#!usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbpath = "dbi:mysql:database=system;host=localhost:3306";
my $dbh = DBI->connect($dbpath, "root", "******")
or die "Can't open database: $DBI::errstr";
while(<STDIN>) {
my $line = $_;
my #list = # ? this is where i am confused
for (my($i) = 0; $i < 1; $i++)
{
if (defined($list[$i]))
{
my #val = split('|', $list[$i]);
print "Email: $val[0]\n";
print "Reason: $val[1]";
my $sth = $dbh->prepare(qq{INSERT INTO bademails VALUES('$val[0]', '$val[1]')});
$sth->execute();
$sth->finish();
}
}
}
exit 0;
Something like this would work:
while(<STDIN>) {
my $line = $_;
chomp($line);
my ($email,$reason) = split(/\|/, $line);
print "Email: $email\n";
print "Reason: $reason";
my $sth = $dbh->prepare(qq{INSERT INTO bademails VALUES(?, ?)});
$sth->execute($email, $reason);
$sth->finish();
}
You might find it easier to just do the whole thing in Perl. "next unless / 550 /" could replace the grep and a regex could probably replace the awk.
I'm not sure what you want to put in #list? If the awk pipes one line per entry, you'll have that in $line, and you don't need the for loop on the #list.
That said, if you're going to pipe it into Perl, why bother with the grep and AWK in the first place?
#!/ust/bin/perl -w
use strict;
while (<>) {
next unless / 550 /;
my #tokens = split ' ', $_;
my $addr = $tokens[4];
my $reason = join " ", #tokens[5..$#tokens];
# ... DBI code
}
Side note about the DBI calls: you should really use placeholders so that a "bad email" wouldn't be able to inject SQL into your database.
Have you considered using App::Ack instead? Instead of shelling out to an external program, you can just use Perl instead. Unfortunately, you'll have to read through the ack program code to really get a sense of how to do this, but you should get a more portable program as a result.
Why not forgo the grep and awk and go straight to Perl?
Disclaimer: I have not checked if the following code compiles:
while (<STDIN>) {
next unless /550/; # skips over the rest of the while loop
my #fields = split;
my $email = $fields[4];
my $reason = join(' ', #fields[22..32]);
...
}
EDIT: See #dland's comment for a further optimisation :-)
Hope this helps?
my(#list) = split /\|/, $line;
This will generate more than two entries in #list if you have extra pipe symbols in the tail of the line. To avoid that, use:
$line =~ m/^([^|]+)\|(.*)$/;
my(#list) = ($1, $2);
The dollar in the regex is arguably superfluous, but also documents 'end of line'.