Perl: get the first matched word in a line

Sorry, I'm new to Perl and couldn't find a similar answer.
HTML file:
<div class="user_rating">
.
.
<span class="genre">
.
.
.
<span class="genre">
.
.
.
<span class="genre">
.
.
.
<span class="genre">
Perl file:
$content =~ /<div class="user_rating">(.*)<span class="genre">/gs;
$empty = $1;
This $empty variable contains everything from <div class="user_rating"> to the last <span class="genre">.
But I only want the content from <div class="user_rating"> to the first <span class="genre">.
How should I modify my code? I know it is a regular-expression problem.
Any help, please...

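Modify your regex: .* is greedy, so it matches as much as possible and only stops at the last <span class="genre">. Make it non-greedy with .*? so it stops at the first one:
my ($empty) = $content =~ /<div class="user_rating">(.*?)<span class="genre">/s;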
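The greedy-versus-lazy difference is not specific to Perl. A small Python sketch, for illustration only, on a trimmed-down version of the HTML above (re.DOTALL plays the role of Perl's /s flag, letting . match newlines):

```python
import re

page = (
    '<div class="user_rating">4.5</div>\n'
    '<span class="genre">Drama</span>\n'
    '<span class="genre">Crime</span>\n'
)

# Greedy: (.*) grabs as much as possible, so it ends just before the LAST genre span.
greedy = re.search(r'<div class="user_rating">(.*)<span class="genre">', page, re.DOTALL)
# Lazy: (.*?) grabs as little as possible, so it ends just before the FIRST genre span.
lazy = re.search(r'<div class="user_rating">(.*?)<span class="genre">', page, re.DOTALL)

print(repr(greedy.group(1)))
print(repr(lazy.group(1)))
```

Only the lazy capture stops at the first <span class="genre">.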

Related

Break lines in a PowerShell array and then convert it to HTML?

I swear I've been searching a lot and still haven't found any solution to this thing.
What I'm doing is converting arrays to HTML, but I can't find any way to put each element of the array on its own line.
For example:
$Member.ExpireOn=((Invoke-SqliteQuery -SQLiteConnection $C -Query "SELECT expireon FROM exceptions_test WHERE (identity='$User')").ExpireOn -join "`r`n")
$Member.ExpireOn is basically this:
31/01/2019 00:00:00 31/03/2019 00:00:00 31/03/2019 00:00:00
So even if I join it with -join "`r`n" or with <br>, I can't find any way to get one line per element of the array.
I need this because afterwards I write the whole array to an HTML file, and what I get is all the values on a single line instead of separated by a <br>.
Hopefully somebody can help me :)
even if I join it with -join "`r`n"
Whitespace in HTML is insignificant, so you cannot force line breaks this way; they will show in the HTML markup (source code), but not when rendered.
or with <br>
The problem is that ConvertTo-Html, which I assume you're using, escapes any HTML markup in the input objects' property values (it assumes you want to use the values verbatim), so you cannot pass <br> through to make a single table cell span multiple lines: you'll see literal <br> strings in the table, because ConvertTo-Html has escaped them as &lt;br&gt;.
A quick and dirty workaround is to manually convert the escaped &lt;br&gt; sequences back to real <br> tags:
# Sample input objects
$o = [pscustomobject] @{
Foo = 'Cert A'
# Join the array elements with <br>
ExpireOn = (Get-Date), (Get-date).AddDays(1) -join '<br>'
}, [pscustomobject] @{
Foo = 'Cert B'
ExpireOn = (Get-Date).AddDays(2), (Get-date).AddDays(3) -join '<br>'
}
# Convert the *escaped* '&lt;br&gt;' sequences back to literal '<br>'
# NOTE: This will replace *all* instances of '&lt;br&gt;' in the
# document text, wherever they may occur.
($o | ConvertTo-Html) -replace '&lt;br&gt;', '<br>'
This yields the following, showing that the <br> was effectively passed through, which should make the input date values render on individual lines:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>HTML TABLE</title>
</head><body>
<table>
<colgroup><col/><col/></colgroup>
<tr><th>Foo</th><th>ExpireOn</th></tr>
<!-- Note the <br> elements -->
<tr><td>Cert A</td><td>2/5/2020 12:51:45 PM<br>2/6/2020 12:51:45 PM</td></tr>
<tr><td>Cert B</td><td>2/7/2020 12:51:45 PM<br>2/8/2020 12:51:45 PM</td></tr>
</table>
</body></html>
In Chrome on my Mac, this renders with each date on its own line inside the table cell.
If you need a more robust solution, you'll have to use an HTML parser - see this answer.
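For comparison, here is a Python sketch of a different technique (not ConvertTo-Html's own behavior): escape each value on its own and only then join the escaped values with a literal <br>, so the separators are the only unescaped markup and no blanket search-and-replace over the finished document is needed. The helper name cell_with_line_breaks is hypothetical; html.escape is from Python's standard library.

```python
import html

def cell_with_line_breaks(values):
    # Escape each value individually, then join with a real <br> tag.
    return "<br>".join(html.escape(str(v)) for v in values)

row = "<tr><td>{}</td><td>{}</td></tr>".format(
    html.escape("Cert A"),
    cell_with_line_breaks(["31/01/2019 00:00:00", "31/03/2019 00:00:00"]),
)
print(row)
```

Any < or & inside a value is still escaped, while the <br> separators pass through untouched.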

Remove spaces that are not in front of one of these characters: !?:;%

I would like to remove every space that is not in front of one of these characters: !?:;%, using preg_replace (I suppose).
<div> Hello !
Am I 100 % clear ? </div>
It should give me
<div>Hello ! Am I 100 % clear ?</div>
Thanks in advance
Use a negative lookahead, so the space is only matched (and removed) when the next character is not one of !?:;%:
$str = preg_replace('/ (?![!?:;%])/', '', $str);
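The same negative lookahead can be checked in Python, whose re module supports (?!...) as well. This sketch mirrors the PHP pattern exactly. Note that, applied to the sample input, a literal-space pattern also deletes the spaces between ordinary words ("Am I" becomes "AmI"), so if those must survive, the pattern has to target only the character that should actually be removed.

```python
import re

# Mirrors the PHP pattern '/ (?![!?:;%])/': a space is removed unless
# the character right after it is one of ! ? : ; %
SPACE_NOT_BEFORE_PUNCT = re.compile(r" (?![!?:;%])")

def remove_spaces(text: str) -> str:
    return SPACE_NOT_BEFORE_PUNCT.sub("", text)

print(remove_spaces("<div> Hello !</div>"))  # <div>Hello !</div>
print(remove_spaces("100 % and 99 ?"))       # 100 %and99 ?
```

A space at the very end of the string is also removed, because the lookahead succeeds when there is no next character at all.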

Convert \x3c/div\x3e to </div> using Perl

I have a string which contains sequences like \x3c/div\x3e, and I am trying to convert this to </div>. Is there any module that helps solve this? I tried HTML::Entities but couldn't solve the issue, so I need some suggestions.
One method:
my $str = '\x3c/div\x3e';
$str =~ s{\\x([[:xdigit:]]{2})}{chr hex $1}eg;
print $str;
Outputs:
</div>
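Perl's s///e evaluates chr hex $1 for every match. A rough Python counterpart, for illustration only, passes a function to re.sub. Since 0x3c is < and 0x3e is >, the sample string decodes to the closing tag </div>.

```python
import re

def decode_hex_escapes(s: str) -> str:
    # Replace each literal backslash + 'x' + two hex digits with the character
    # it encodes, like Perl's s{\\x([[:xdigit:]]{2})}{chr hex $1}eg
    return re.sub(r"\\x([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), s)

print(decode_hex_escapes(r"\x3c/div\x3e"))  # </div>
```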

preg_replace new statement to find variables

Hi, at the moment I use preg_replace to replace a variable like "$money" with some text. Now I have changed my variables to "%%money%%". What must I change in the preg_replace statement?
preg_replace("#\\$" . $var . "([^a-zA-Z_0-9\x7f-\xff]|$)#", $value . "\\1", $text);
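Because the %% delimiters on both sides already mark where the variable name ends, you no longer need the character-class check after the name (and a trailing $ anchor would only match at the very end of the text). Match the delimiters directly, escaping the name with preg_quote:
preg_replace("#%%" . preg_quote($var, "#") . "%%#", $value, $text);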
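In Python the same placeholder substitution could be sketched as below (fill_placeholder is a hypothetical helper name). re.escape plays the role of PHP's preg_quote, and passing a function as the replacement keeps any backslashes or group references in the value from being interpreted. Because no pattern features are really needed here, a plain string replacement (str.replace in Python, str_replace in PHP) would work just as well.

```python
import re

def fill_placeholder(text: str, name: str, value: str) -> str:
    # %%name%% is delimited on both sides, so no boundary check is needed.
    pattern = "%%" + re.escape(name) + "%%"
    # A function replacement inserts value literally (no \1 or \g<...> expansion).
    return re.sub(pattern, lambda _m: value, text)

print(fill_placeholder("You have %%money%% left; %%moneybag%% stays.", "money", "5 EUR"))
```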

Perl get text between tags

I tried lots of code I found on the internet, but none of it worked.
I have some HTML like this:
<div class="usernameHolder">Username: user123</div>
What I want is to get the text user123 from this line. Of course, this line is embedded in the rest of the HTML content (a full HTML page). Can anyone point me in the right direction?
$text = @source =~ /Username:\s+(.*)\s+</;
print $text;
but it won't return anything.
If the HTML is in a string:
$source = '<div class="usernameHolder">Username: user123</div>';
# Allow optional whitespace before or after the username value.
$text = $source=~ /Username:\s*(.*?)\s*</;
print $1 . "\n"; # user123
If the HTML is in an array:
@source = (
'<p>Some text</p>',
'<div class="usernameHolder">Username: user123</div>',
'<p>More text</p>'
);
# Combine the matching array elements into a string.
$matching_lines = join "",grep(/Username:\s*(.*?)\s*</, @source);
# Extract the username value.
$text = $matching_lines =~ /Username:\s*(.*?)\s*</;
print $1 . "\n"; # user123
A more-compact version using an array:
@source = (
'<p>Some text</p>',
'<div class="usernameHolder">Username: user123</div>',
'<p>More text</p>'
);
# Combine the matching array elements in a string, and extract the username value.
$text = (join "",grep(/Username:\s*(.*?)\s*</, @source)) =~ /Username:\s*(.*?)\s*</;
print $1 . "\n"; # user123
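The Perl examples above join the matching lines and then match again. The same idea in Python can simply stop at the first matching line instead of building an intermediate string. This is a sketch; find_username is a hypothetical helper name.

```python
import re

USERNAME = re.compile(r"Username:\s*(.*?)\s*<")

def find_username(lines):
    # Return the captured username from the first matching line, or None.
    for line in lines:
        m = USERNAME.search(line)
        if m:
            return m.group(1)
    return None

source = [
    "<p>Some text</p>",
    '<div class="usernameHolder">Username: user123</div>',
    "<p>More text</p>",
]
print(find_username(source))  # user123
```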
Your second \s+ requires at least one whitespace character, but there is no space between user123 and the following tag, so the whole match fails.
How about this?
/Username:\s*(.*?)\s*</
Here, \s* is discarding spaces if there are any, and .*? is there so that you don't grab most of the document in the process. (See greedy vs. non-greedy)
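A quick Python check, for illustration only (Python's re module treats \s, *, + and *? the same way for this input), of why the question's pattern fails on the sample line while the suggested one succeeds:

```python
import re

line = '<div class="usernameHolder">Username: user123</div>'

# The question's pattern: the trailing \s+ needs at least one whitespace
# character before '<', but "user123" is followed directly by "</div>".
print(re.search(r"Username:\s+(.*)\s+<", line))  # None

# The answer's pattern: \s* allows zero whitespace, and .*? stops at the first '<'.
m = re.search(r"Username:\s*(.*?)\s*<", line)
print(m.group(1))  # user123
```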