preg_match and preg_replace in string php - preg-replace

$string = "Lorem Ipsum is #simply# dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard #dummy# text ever since the 1500s, when an unknown printer took a galley of #type1# and scrambled it to #make# a #type2# specimen book. It has survived not only #five# centuries, but also the #leap# into electronic typesetting, remaining essentially unchanged";
I have a set of hard-coded words like "#simply# | #dummy# | #five# | #type1#".
What I expect as output is:
If a hard-coded word is found in $string, it should be highlighted in black, like "<strong>...</strong>".
If a word in $string is within #...# but is not in the hard-coded word list, it should be highlighted in red.
Please note that even though only #type1# is in the hard-coded words, if $string contains #type2# or #type3#, those should be highlighted too.
To do this I have tried the following:
$pattern = "/#(\w+)#/";
$replacement = '<strong>$1</strong>';
$new_string = preg_replace($pattern, $replacement, $string);
This highlights every word that is within #...# tags.
I'm bad at preg_; can someone help? Thanks in advance.

You have to use preg_replace_callback, which takes a callback function as the replacement parameter. In the function you can test which capture group succeeded and return the corresponding string.
$pattern = '~#(?:(simply|dummy|five|type[123])|(\w+))#~';
$replacement = function ($match) {
    // Group 2 is set only when the word is not in the hard-coded list.
    if (empty($match[2])) {
        return '<strong>' . $match[1] . '</strong>';
    }
    return '<strong style="color:red">' . $match[2] . '</strong>';
};
// $string is the input string from the question.
$result = preg_replace_callback($pattern, $replacement, $string);

Not sure I really understand your needs, but what about:
$pat = '#(simply|dummy|five|type\d)#';
$repl = '<strong>$1</strong>';
$new_str = preg_replace("/$pat/", $repl, $string);


Counting occurrences of the word "the" gives me all different answers

So I have a simple script to read in a text file from the command line, and I want to count the number of "the"s but I've been getting weird numbers.
while(<>){
$wordcount= split(/\bthe\b/, $_);}
print "\"the\" occurs $wordcount times in $ARGV";
So using that I get 10 occurrences, but if I use /\bthe\b/i I get 12. /\Bthe\b/ gives me 6 I believe. There are 11 occurrences in my test txt. Am I just an idiot? Should $wordcount just be started at 1 or 0? Also is it bad practice to use split this way? The code works fine for actually counting the words, but not when counting an exact string. New to perl so any and all abuse is appreciated. Thanks
Edit: also I know it's not adding, but now I get that $wordcount is being treated more like an array, so it worked for a previous iteration, though it was definitely poor form.
Use a regex in a list context to pull the count of matches:
my $wordcount = 0;
while (<>) {
$wordcount += () = /\bthe\b/g;
}
print qq{"the" occurs $wordcount times in $ARGV\n};
Reference: perlfaq4 - How can I count the number of occurrences of a substring within a string?
split splits the string into a list based on the regex provided. Your count comes from the fact you've put split in scalar context. From perldoc -f split:
split Splits the string EXPR into a list of strings and returns the
list in list context, or the size of the list in scalar context.
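Given the string "The quick brown fox jumps over the lazy dog", your case-sensitive pattern matches only the lowercase "the", so split returns two fields and $wordcount is 2. That equals the number of "the"s if you also count "The", but only by coincidence.
The quick brown fox jumps over the lazy dog
===============================^^^========= -> two fields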
However if you had "A bird and the quick brown fox jumps over the lazy dog" you'd end up with 3 which is not correct.
A bird and the quick brown fox jumps over the lazy dog
===========^^^============================^^^========= -> three fields
First of all, you absolutely want \b, as that matches a word boundary. \B matches where there is no word boundary, so /\Bthe\b/ only matches "the" at the end of a longer word instead of the word "the" itself.
Secondly, you just want to count the occurrences. You do that by counting the matches over the entire string:
$wordcount = () = $string =~ /\bthe\b/gi;
Assigning to the empty list () puts the match in list context and discards the matches themselves, since you don't need them. Assigning that list assignment to the scalar $wordcount then yields the number of elements on its right-hand side, that is, the number of matches. $string is the string to match against. The pattern matches "the" at word boundaries; the g flag finds every match in the string (global) and the i flag makes it case insensitive.
With the /i flag, 'The' would be included, but not without it.
\B is a non-word boundary, so would only find things like "clothe", and not "the".
Yes, it is bad practice to use split that way. Properly, if you just want a count, do this:
$wordcount = () = split ...;
split in scalar context does something that seemed like a good idea originally, but doesn't seem so good anymore, so avoid it. The above incantation calls it in list context but assigns the number of elements found to $wordcount.
But the elements produced by splitting on "the" aren't what you want; you want a count of the times "the" was found. So do this (possibly with /ig instead of just /g):
$wordcount = () = /\bthe\b/g;
Note that you probably want +=, not =, to get a total for all lines.
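To make the difference between the two approaches concrete, here is a minimal sketch that compares the match count with the number of fields split returns for the same pattern. The helper names count_the and split_fields are made up for this illustration; they are not from the answers above.

```perl
use strict;
use warnings;

# Hypothetical helper: count whole-word, case-insensitive matches of "the".
sub count_the {
    my ($text) = @_;
    # Assigning the list of matches to () and then to a scalar
    # yields the number of matches.
    my $count = () = $text =~ /\bthe\b/gi;
    return $count;
}

# Hypothetical helper: number of fields split produces for the same pattern.
sub split_fields {
    my ($text) = @_;
    my @fields = split /\bthe\b/i, $text;
    return scalar @fields;
}

my $line = "A bird and the quick brown fox jumps over the lazy dog";
# prints "matches: 2, split fields: 3"
printf "matches: %d, split fields: %d\n", count_the($line), split_fields($line);
```

split returns the pieces between the matches, so its count is generally off by one and also depends on where the matches fall in the string.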
sample.txt
Ajith
kumar
Ajith
my name is Ajith and Ajith
lastname is kumar
code
use strict;
use warnings;
use Data::Dumper;
print "Enter your string = ";
my $input = <STDIN>; ## User input
chomp $input; ## chomp() removes a trailing newline from the string
my %count;
open my $fh, '<', 'sample.txt' or die $!; ## Read the data from a file
my @data = <$fh>;
for my $d (@data) {
    my @words = split ' ', $d; ## Split the line into words on whitespace
    for my $w (@words) {
        $count{$w}++; ## Counter
    }
}
print Dumper "Result: " . ($count{$input} // 0); ## 0 if the word never appears
The above code gets the input via the command prompt, searches for the word in the text file "sample.txt", and then displays how many times it appears in that file.
Note: the user input is case sensitive.
INPUT from the USER
Enter your string = Ajith
OUTPUT
$VAR1 = 'Result: 4';
print "Enter the string: ";
chomp(my $string = <STDIN>);
open(my $fh, '<', 'filename.txt') or die "Error opening file: $!";
my @file = <$fh>;
my @mt;
foreach (@file) {
    my @s = split;    # split $_ on whitespace
    push(@mt, @s);
}
# Count the words that contain $string, ignoring case
my $s = grep { /\Q$string\E/i } @mt;
print "Total no. of $string is: $s\n";
This gives the output you expect.

simple preg_replace rule that I can't get to work

I can't understand how to do this preg_replace. I haven't tried anything because I don't know what to try; it's too hard to understand.
The string is index-D.html, where D is a number from 0 to 99999.
How do I replace occurrences of that string, index-D.html, with an empty string?
The manual is pretty clear and provides examples as well:
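// Hypothetical sample input; D is a number from 0 to 99999, so 1 to 5 digits.
$string = "see index-7.html and index-99999.html";
// A PCRE pattern needs delimiters; \d{1,5} matches the digits, \. a literal dot.
$pattern = '/index-\d{1,5}\.html/';
$new_string = preg_replace($pattern, "", $string);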

How can I replace a particular character with its upper-case counterpart?

Consider the following string
String = "this is for test. i'm new to perl! Please help. can u help? i hope so."
In the above string, the next character after . or ? or ! should be in upper case. How can I do that?
I'm reading from a text file line by line and I need to write the modified data to another file.
Your help will be greatly appreciated.
You could use a regular expression.
Try this:
my $s = "...";
$s =~ s/([\.\?!]\s*[a-z])/uc($1)/ge; # of course $1 , thanks to plusplus
The g flag replaces all matches, and the e flag evaluates uc($1) as Perl code to convert the match to uppercase.
Explanation:
with [\.\?!] you search for your punctuation marks
\s* is for whitespace between the marks and the first letter of your next word, and
[a-z] matches a single lowercase letter (in this case the first one of the next word)
The regular expression above searches with these patterns for every appearance of a punctuation mark followed by (optional) whitespace and a letter, and replaces it with the result of uc (which converts the match to uppercase).
For example:
my $s = "this is for test. i'm new to perl! Please help. can u help? i hope so.";
$s =~ s/([\.\?!]\s*[a-z])/uc($1)/ge;
print $s;
will find ". i", ". c" and "? i" and replace them (the "P" after "!" is already upper case, so [a-z] does not match it), so the printed result is:
this is for test. I'm new to perl! Please help. Can u help? I hope so.
You can use the substitution operator s///:
$string =~ s/([.?!]\s*\S)/ uc($1) /ge;
Here's a split solution:
$str = "this is for test. im new to perl! Please help. can u help? i hope so.";
say join "", map ucfirst, split /([?!.]\s*)/, $str;
If all you are doing is printing to a new file, you don't need to join the string back up. E.g.
while ($line = <$input>) {
print $output map ucfirst, split /([?!.]\s*)/, $line;
}
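The substitution from the answers above can also be wrapped in a small helper and applied to each line as it is read. This is only a sketch under a few assumptions: the name capitalize_sentences is made up for this example, \K needs Perl 5.10 or later, and a sentence that starts on a new line after punctuation on the previous line is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case the first lowercase letter that follows
# ., ? or ! (with optional whitespace in between).
sub capitalize_sentences {
    my ($line) = @_;
    # \K drops the punctuation and whitespace from the matched text,
    # so only the captured letter is replaced; \u upper-cases it.
    $line =~ s/[.?!]\s*\K([a-z])/\u$1/g;
    return $line;
}

print capitalize_sentences(
    "this is for test. i'm new to perl! please help. can u help? i hope so.\n"
);
```

Inside a read loop this would be used as print $output capitalize_sentences($line). Note that it also capitalizes after abbreviations such as "e.g. foo", which may or may not be what you want.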
edit - completely misread the question, thought you were just asking to uppercase the i's for some reason, apologies for any confusion!
As the answers so far state, you could look at regular expressions and the substitution operator (s///). No one has mentioned the \b (word boundary) assertion though, which may be useful to find the single i's; otherwise you are going to have to keep adding punctuation characters that you find to the character class match (the [ ... ]).
e.g.
my $x = "this is for test. i'm new to perl! Please help. can u help? i hope so. " .
"\"i want it to work!\". Dave, Bob, Henry viii and i are friends. foo i bar.";
$x =~ s/\bi\b/I/g; # or could use the capture () and uc($1) in eugene's answer
gives:
# this is for test. I'm new to perl! Please help. can u help? I hope so.
# "I want it to work!". Dave, Bob, Henry viii and I are friends. foo I bar.

How can I wrap lines to 45 characters in a Perl program?

I have this text I am writing in a Perl CGI program:
$text = $message;
@lines = split(/\n/, $text);
$lCnt .= $#lines+1;
$lineStart = 80;
$lineHeight = 24;
I want to force a return after 45 characters. How do I do that here?
Thanks in advance for your help.
Look at the core Text::Wrap module:
use Text::Wrap;
my $longstring = "this is a long string that I want to wrap it goes on forever and ever and ever and ever and ever";
$Text::Wrap::columns = 45;
print wrap('', '', $longstring) . "\n";
Check out Text::Wrap. It will do exactly what you need.
Since Text::Wrap for some reason doesn't work for the OP, here is a solution using a regex:
my $longstring = "lots of text to wrap, and some more text, and more "
. "still. thats right, even more. lots of text to wrap, "
. "and some more text.";
my $wrap_at = 45;
(my $wrapped = $longstring) =~ s/(.{0,$wrap_at}(?:\s|$))/$1\n/g;
print $wrapped;
which prints:
lots of text to wrap, and some more text, and
more still. thats right, even more. lots of
text to wrap, and some more text.
The Unicode::LineBreak module can do more sophisticated wrapping of non-English text (Especially East Asian scripts) than Text::Wrap, and has some nice features like optionally being able to recognize URIs and avoid splitting them.
Example:
#!/usr/bin/env perl
use warnings;
use strict;
use Unicode::LineBreak;
my $longstring = "lots of text to wrap, and some more text, and more "
. "still. thats right, even more. lots of text to wrap, "
. "and some more text.";
my $wrapper = Unicode::LineBreak->new(ColMax => 45, Format => "NEWLINE");
print for $wrapper->break($longstring);

Why can't I match my string from standard input in Perl?

Why will my script not work correctly?
I followed a YouTube video, and it worked for the guy in the video.
I am running Perl on Windows using ActiveState ActivePerl 5.12.2.1202
Here is my tiny tiny code block.
print "What is your name?\n";
$name = <STDIN>;
if ($name eq "Jon") {
print "We have met before!\n";
} else {
print "We have not met before.\n";
}
The code automatically jumps to the else statement and does not even check the if statement.
The statement $name = <STDIN>; reads from standard input and includes the terminating newline character "\n". Remove this character using the chomp function:
print "What is your name?\n";
$name = <STDIN>;
chomp($name);
if ($name eq "Jon") {
print "We have met before!\n";
} else {
print "We have not met before.\n";
}
The trick in programming is to know what your data are. When something's not acting like you expect, look at the data to see if they are what you expect. For instance:
print "The name is [$name]\n";
You put the brackets around it so you can see any extra whitespace that might be there. In this case, you would have seen:
The name is [Jon
]
That's your clue that there is extra stuff. Since the eq has to match exactly, it fails to match.
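If the line break inside the brackets is easy to miss, a variation on the same idea is to print the invisible characters as escapes. The helper show_raw below is a made-up name for this sketch, and it only handles newline and carriage return.

```perl
use strict;
use warnings;

# Hypothetical helper: show newlines and carriage returns as visible escapes.
sub show_raw {
    my ($value) = @_;
    (my $shown = $value) =~ s/\n/\\n/g;
    $shown =~ s/\r/\\r/g;
    return "[$shown]";
}

my $name = "Jon\n";             # stands in for what <STDIN> returns
print show_raw($name), "\n";    # prints [Jon\n]
chomp $name;
print show_raw($name), "\n";    # prints [Jon]
```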
If you're just starting with Perl, try Learning Perl. It's much better than random videos from YouTube. :)
When you read the name from standard input with $name = <STDIN>;
$name will have a trailing newline. So if I enter foo, $name will actually contain foo\n.
To get rid of this newline you can make use of the chomp function:
chomp($name = <STDIN>);