How to remove quotes in my product description string?

I'm using OSCommerce for my online store and I'm currently optimizing my product page for rich snippets.
Some of my Google Indexed pages are being marked as "Failed" by Google due to double quotes in the description field.
I'm using existing code which strips the HTML and truncates everything after 197 characters:
<?php echo substr(trim(preg_replace('/\s\s+/', ' ', strip_tags($product_info['products_description']))), 0, 197); ?>
How can I include the removal of quotes in that code so that the following string:
<strong>This product is the perfect "fit"</strong>
becomes:
This product is the perfect fit

This happened to me too. Try using:
tep_output_string($product_info['products_description'])
With it, " becomes &quot;

We can try using preg_replace_callback here:
$input = "SOME TEXT HERE <strong>This product is the perfect \"fit\"</strong> SOME MORE TEXT HERE";
$output = preg_replace_callback(
"/<([^>]+)>(.*?)<\/\\1>/",
function($m) {
return str_replace("\"", "", $m[2]);
},
$input);
echo $output;
This prints:
SOME TEXT HERE This product is the perfect fit SOME MORE TEXT HERE
The regex pattern used does the following:
<([^>]+)> match an opening HTML tag, and capture the tag name (this assumes the tag has no attributes)
(.*?) then match and capture the content inside the tag
<\/\\1> finally match the same closing tag
Then, we use a callback function which does an additional replacement to strip off all double quotes.
Note that in general using regex against HTML is bad practice. But, if your text only has single level/occasional HTML tags, then the solution I gave above might be viable.
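If the goal is only the plain-text snippet from the question, a simpler route may be to add a str_replace call to the existing chain. The following is a minimal sketch, not osCommerce code: clean_description is a hypothetical helper name, and decoding entities before stripping quotes is an assumption, in case the stored description contains &quot; rather than a literal quote.

```php
<?php
// Hypothetical helper (not part of osCommerce): builds a plain-text snippet.
function clean_description(string $html, int $limit = 197): string
{
    $text = strip_tags($html);                               // drop HTML tags
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');  // assumption: turn &quot; back into "
    $text = str_replace('"', '', $text);                     // remove double quotes
    $text = trim(preg_replace('/\s\s+/', ' ', $text));       // collapse whitespace runs, as in the question
    return substr($text, 0, $limit);                         // truncate, as in the question
}

echo clean_description('<strong>This product is the perfect "fit"</strong>');
// prints: This product is the perfect fit
```

In the page template this would be called with $product_info['products_description'] as the argument. If descriptions can contain multibyte characters, mb_substr may be a safer choice than substr for the final cut.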

Related

XPath nodes text joined by br

How can I join the text nodes between br tags, keeping the br tags as separators?
Here is the XML:
<div>
text1.
<br>
text2.
<br>
text3.
<div>ad sense code</div>
<br>
text4.
<div>ad sense code</div>
<br>
textxx.
<br>
</div>
I need to get all the text nodes from text2 to textxx, joined by a br tag or \n\n.
I can get all the text using
//div/text()[position()>1] but it is joined without any separator, like this:
text1.text2.text3.text4.textxx.
while I want it like this:
text1.<br>text2.<br>text3.<br>text4.<br>textxx.<br>
In short, I need to keep the br tags.
I am using Perl HTML::TreeBuilder::LibXML module.
XPath can be used (a) to select nodes from the input document, or (b) to compute atomic values such as strings, booleans, or numbers from the nodes in the input document. It can never [with very edge-case exceptions] return nodes that weren't present in the input.
It's not entirely clear what you mean by your desired output of
text1.<br>text2.<br>text3.<br>text4.<br>textxx.<br>
Are you looking for this as a string? Or a sequence of text nodes and element nodes, interspersed?
Returning it as a string is possible in XPath 3.1 using the serialize() function, but in Perl you only have access to the venerable and limited XPath 1.0.
Returning it as a set of nodes isn't possible because the nodes aren't there in the source: the source contains text nodes that have values such as "__text1__" where underscores represent whitespace, and your desired output drops the whitespace.
You appear to be doing a transformation rather than merely a selection, so you are out of XPath territory and into XSLT.
The solution that did what I wanted in Perl looks like this:
my $text = "";
my $tree = HTML::TreeBuilder::LibXML->new_from_content($content);
foreach my $node ($tree->findnodes("./div/text()[position()>1]")) {
    # append each text node followed by a <br> separator
    $text .= $node->findvalue('string(.)') . "<br>";
}
$text =~ s/<br>$//;    # drop the trailing <br>

PHP do 2 preg_replace in link tag

I am trying to do a multi preg_replace, and I'm not sure if that's the correct function.
I want this input
[link]www.mynewhomepage.com(My new homepage)[/link]
to become <a href=mynewhomepage.com>My New homepage</a>
I have written this code, which doesn't give me what I want:
$string = 'i have made a new homepage visit [link]http://myhomepage.dk(My New homepage)[/link]';
$find = array('#\[link\](.+)\[\/link\]#iUs', '#\((.+)\)#iUs');
$replace = array('<a href=$1>', '</a>');
$result = preg_replace($find, $replace, $string);
echo $result;
And it gives me this outcome: http://myhomepage.dk>
Can anyone guide me or point me in the right direction about what I am doing wrong? :)
Thanks and happy summer for you :)
This should work:
\[link\](.*?)\((.*)\)\[\/link\]
https://regexr.com/3sc23
It basically matches up to first left parenthesis and also from last parenthesis to the end. Then you put these pieces into capturing groups for referencing them later.
Use <a href="$1">$2</a> as the substitution.
About your original question, your solution had these problems:
For the first replacement we should use <a href="$1. Note that we put " at the beginning of the link but, for the moment, we do not add it at the end. That way, the next regexp will be easier.
Then, in the second regexp's replacement, we should add "> at the beginning in order to close the tag. Also, you were not using the captured group at all. The replacement would be: ">$1</a>
That is, change this line:
$replace = array('<a href=$1>', '</a>');
to this:
$replace = array('<a href="$1', '">$1</a>');
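For comparison, the single-pattern approach can be sketched as one preg_replace call. This is a minimal sketch using the sample string from the question; it assumes one [link] per string (the greedy second group would span several links) and that the label contains no stray ")[/link]".

```php
<?php
$string = 'i have made a new homepage visit [link]http://myhomepage.dk(My New homepage)[/link]';

// Group 1: URL up to the first "(". Group 2: label up to the last ")" before [/link].
$pattern = '#\[link\](.*?)\((.*)\)\[/link\]#is';
$result  = preg_replace($pattern, '<a href="$1">$2</a>', $string);

echo $result;
// prints: i have made a new homepage visit <a href="http://myhomepage.dk">My New homepage</a>
```

If the text comes from users, the captured URL and label should also be escaped (for example with htmlspecialchars inside a preg_replace_callback) before being placed in HTML.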

php gettext include string with phpcode

I'm trying to use gettext to translate the strings on my site.
gettext has no problem detecting strings such as
<? echo _("Donations"); ?>
or
<? echo _("Donate to this site");?>
but obviously we usually use code like this on our site:
<? echo _("$siteName was developed with one thing in mind"); ?>
Of course in the website, the $siteName is displayed correctly as
My Website was developed with one thing in mind
if we put
$siteName = "My Website";
previously.
My problem is that I'm using Poedit to extract all the strings in my code that need to be translated, and it seems Poedit doesn't extract strings that contain PHP variables, like the one I described above. So how do I get Poedit to extract strings with PHP code inside them too? Or is there another tool I should use?
One possibility is to use sprintf. Just make sure you keep the %s placeholder in the translated string in Poedit!
echo sprintf( _("This %s can be translated "), 'string');
Or when using multiple variables
echo vsprintf( _("This %s can be %s"), ['string', 'translated']);
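Applied to the $siteName example from the question, the idea might look like the sketch below. The fallback definition of _() is only there so the snippet also runs where the gettext extension is not loaded; it is an assumption for illustration and simply returns the untranslated string. The second message and the 'the site team' value are made up to show numbered placeholders.

```php
<?php
// Fallback so this sketch runs without the gettext extension (illustration only).
if (!function_exists('_')) {
    function _($message) { return $message; }
}

$siteName = 'My Website';

// The extractor now sees a constant string with a placeholder,
// instead of a string with "$siteName" interpolated into it.
echo sprintf(_('%s was developed with one thing in mind'), $siteName), "\n";

// Numbered placeholders let a translation reorder the arguments.
echo sprintf(_('%1$s was developed by %2$s'), $siteName, 'the site team'), "\n";
```

The single quotes matter here: inside double quotes PHP would try to interpolate $s in '%1$s'.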

Why won't my extension render umlauts?

I am working on an extension to display downloads on a website. You can view the full, current source over on GitHub.
Given this piece of code in my controller:
$linkName = Tx_Downloads_Utility_Filename::construct( $download );
$download->setLinkText( $linkName );
This is where I want to set the label for a download. Sadly, when it is later rendered, the result will be blank if $linkName contained an umlaut (umlauts were just my test subject, the actual scope is unknown).
For debugging purposes, I have extended that section to look like this:
$linkName = Tx_Downloads_Utility_Filename::construct( $download );
$download->setLinkText( $linkName );
$this->flashMessages->add( "'" . strlen( $linkName ) . "'" );
$this->flashMessages->add( urlencode( $linkName ) );
$this->flashMessages->add( $linkName );
The resulting output of that is:
Please note that no third flash message is rendered.
But it's not like no umlauts would be rendered. For example, this is the record I am debugging with:
The link field (between the image icon and the 31.06KB) is blank but should say Text_File_Sömething.jpg. The string Sömething is rendered perfectly fine in another place of the template.
Is the problem with my Fluid template?
Sorry, that was not really clear. Next try:
You call Tx_Downloads_Utility_Filename::construct($linkName) which (by default) calls Tx_Downloads_Utility_Filename::clean($linkName), which in turn removes all the special characters by replacing anything that doesn't match the regex pattern /([[:alnum:]_\.-]*)/ with underscores.
There seems to be a problem with encoding (maybe your db is not set to UTF-8 encoding), so Text_File_Sömething is actually turned into a garbled form such as Text_File_SÃ¶mething, and the clean() method turns that into an invalid string. Try using utf8_encode() on the $filename first.

Reading custom values in Ebay RSS feed (XML::RSS module)

I've spent way too long trying to figure this out. I'm using XML::RSS and Perl to read and parse an eBay RSS feed. Within the <item></item> area, I see these entries:
<rx:BuyItNowPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1395</rx:BuyItNowPrice>
<rx:CurrentPrice xmlns:rx="urn:ebay:apis:eBLBaseComponents">1255</rx:CurrentPrice>
However, I can't figure out how to grab the details during the loop. I wrote a regex to grab them:
@current_price = $item =~ m/\<rx\:CurrentPrice.*\>(\d+)\<\/rx\:CurrentPrice\>/g;
Which works if you place the above 'CurrentPrice' entry into a standalone string, but not while the script is reading through the RSS feed.
I can grab most of the information I want out of the item->description area (# bids, auction end time, BIN price, thumbnail image, etc.), but it would be nicer if I could grab the info from the feed without me having to deal with grabbing all that information manually.
How to grab custom fields from an RSS feed (short of writing regexes to parse the entire feed w/o a module)?
Here's the code I'm working with:
$my_limit = 0;
use LWP::Simple;
use XML::RSS;
$rss = XML::RSS->new();
$data = get( $mylink );
$rss->parse( $data );
$channel = $rss->{channel};
$NumItems = 0;
foreach $item (@{$rss->{'items'}}) {
    if($NumItems > $my_limit){
        last;
    }
    @current_price = $item =~ m/\<rx\:CurrentPrice.*\>(\d+)\<\/rx\:CurrentPrice\>/g;
    print "$current_price[0]";
}
If you have the rss/xml document and want specific data you could use XPATH:
Perl CPAN XPATH
XPath Introduction
What is the way in which "it doesn't work" from an RSS feed? Do you mean no matches when there should be matches? Or one match where there should be several matches?
One thing that jumps out at me about your regular expression is that you use .*, which can sometimes be greedier than you want. That is, if $item contained the expression
<rx:BuyItNowPrice xmlns:rx="urn:...nts">1395</rx:BuyItNowPrice>
<rx:CurrentPrice xmlns:rx="urn:...nts">1255</rx:CurrentPrice>
<rx:BuyItNowPrice xmlns:rx="urn:...nts">1395</rx:BuyItNowPrice>
<rx:SomeMoreStuff xmlns:rx="urn:...nts">zzz</rx:SomeMoreStuff>
<rx:CurrentPrice xmlns:rx="urn:...nts">1255</rx:CurrentPrice>
then the first part of your regular expression (\<rx\:CurrentPrice.*\>) will wind up matching everything on lines 2, 3, and 4, plus the first part of line 5 (up to the >). This assumes . is allowed to match newlines, for example with the /s modifier, or that the feed contains no line breaks. Instead, you might want to use the regular expression [1]
m/\<rx:CurrentPrice[^>]*>(\d+)\<\/rx:CurrentPrice\>/
which will only match up to the closing </rx:CurrentPrice> tag after a single instance of an opening <rx:CurrentPrice> tag.
[1] The other obvious answer is that you really don't want to use a regular expression at all, that regular expressions are inferior tools for parsing XML compared to customized parsing modules, and that all the special cases you will have to deal with using regular expressions will eventually render you unconscious from having repeatedly beaten your head against your desk. See Salgar's answer, for example.
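The difference between .* and [^>]* can be seen in a small experiment. It is sketched here in PHP rather than Perl, since the pattern syntax is the same for this case; the sample feed string is made up, with the items flattened onto one line so that the greedy match can run across them.

```php
<?php
// Made-up sample data: three elements on a single line.
$feed = '<rx:CurrentPrice xmlns:rx="urn:x">1255</rx:CurrentPrice>'
      . '<rx:BuyItNowPrice xmlns:rx="urn:x">1395</rx:BuyItNowPrice>'
      . '<rx:CurrentPrice xmlns:rx="urn:x">1300</rx:CurrentPrice>';

// Greedy: .* runs to the last ">" that still lets the rest of the pattern match.
preg_match_all('/<rx:CurrentPrice.*>(\d+)<\/rx:CurrentPrice>/', $feed, $greedy);

// Bounded: [^>]* cannot leave the opening tag.
preg_match_all('/<rx:CurrentPrice[^>]*>(\d+)<\/rx:CurrentPrice>/', $feed, $bounded);

echo implode(',', $greedy[1]), "\n";   // prints: 1300
echo implode(',', $bounded[1]), "\n";  // prints: 1255,1300
```

The greedy pattern reports a single match whose captured price belongs to the last element, while the bounded pattern finds both CurrentPrice values.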