Why won't my extension render umlauts? - typo3

I am working on an extension to display downloads on a website. You can view the full, current source over on GitHub.
Given this piece of code in my controller:
$linkName = Tx_Downloads_Utility_Filename::construct( $download );
$download->setLinkText( $linkName );
This is where I want to set the label for a download. Sadly, when it is later rendered, the result will be blank if $linkName contained an umlaut (umlauts were just my test subject, the actual scope is unknown).
For debugging purposes, I have extended that section to look like this:
$linkName = Tx_Downloads_Utility_Filename::construct( $download );
$download->setLinkText( $linkName );
$this->flashMessages->add( "'" . strlen( $linkName ) . "'" );
$this->flashMessages->add( urlencode( $linkName ) );
$this->flashMessages->add( $linkName );
The resulting output of that is:
Please note that no third flash message is rendered.
It is not the case that umlauts are never rendered, though. For example, this is the record I am debugging with:
The link field (between the image icon and the 31.06KB) is blank but should say Text_File_Sömething.jpg. The string Sömething is rendered perfectly fine in another place of the template.
Is the problem with my Fluid template?

Sorry, that was not really clear. Next try:
You call Tx_Downloads_Utility_Filename::construct($linkName), which (by default) calls Tx_Downloads_Utility_Filename::clean($linkName), which in turn removes all special characters by replacing anything that doesn't match the regex pattern /([[:alnum:]_\.-]*)/ with underscores.
There seems to be a problem with encoding (maybe your db is not set to UTF-8 encoding), so Text_File_Sömething arrives as a differently encoded byte sequence and the clean() method turns that into an invalid string. Try using utf8_encode() on the $filename first.
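One way a label can end up blank, sketched under assumptions: if the cleaning step runs a regex with the /u modifier on a string that is not valid UTF-8, preg_replace() returns null rather than a string. The function name cleanFilename() and the use of /u are hypothetical stand-ins here, not the extension's actual code, and iconv() is used in place of utf8_encode() for the ISO-8859-1 to UTF-8 conversion.

```php
<?php
// Hypothetical stand-in for the extension's clean() step; the real
// implementation may differ. The /u modifier is an assumption.
function cleanFilename(string $name): ?string
{
    // Replace every character outside the allowed set with an underscore.
    // With /u, preg_replace() returns null if $name is not valid UTF-8.
    return preg_replace('/[^[:alnum:]_.-]/u', '_', $name);
}

$latin1 = "Text_File_S\xF6mething.jpg";          // "ö" as one ISO-8859-1 byte
$utf8   = iconv('ISO-8859-1', 'UTF-8', $latin1); // same text, re-encoded

var_dump(cleanFilename($latin1));                    // NULL
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
var_dump(is_string(cleanFilename($utf8)));           // bool(true)
```

Checking preg_last_error() after a null result is a quick way to tell an encoding failure apart from an empty input.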

Related

FPDF library not showing special characters like '✓'

I am trying to write special characters like '✓' to a PDF with FPDF, and it does not work. Only plain strings work.
I have a checkbox in the PDF and I want to fill it with '✓'.
I tried it like this:
$value = iconv('UTF-8', 'windows-1255', html_entity_decode('✓'));
$pdf->Write(0, $value);
But when I open the PDF, the string is broken and does not match what I wrote.
Thanks
This character is not included in windows-1255. You may use "ZapfDingbats" and use chr(51) or chr(52).
$pdf->SetFont('ZapfDingbats', '', 12);
$pdf->Write(0, chr(51));
See here for a font dump of all standard fonts.

How to remove quotes in my product description string?

I'm using OSCommerce for my online store and I'm currently optimizing my product page for rich snippets.
Some of my Google Indexed pages are being marked as "Failed" by Google due to double quotes in the description field.
I'm using existing code that strips the HTML tags and truncates the result to 197 characters.
<?php echo substr(trim(preg_replace('/\s\s+/', ' ', strip_tags($product_info['products_description']))), 0, 197); ?>
How can I include the removal of quotes in that code so that the following string:
<strong>This product is the perfect "fit"</strong>
becomes:
This product is the perfect fit
This happened to me too. Try using:
tep_output_string($product_info['products_description'])
With this, " becomes &quot;
We can try using preg_replace_callback here:
$input = "SOME TEXT HERE <strong>This product is the perfect \"fit\"</strong> SOME MORE TEXT HERE";
$output = preg_replace_callback(
"/<([^>]+)>(.*?)<\/\\1>/",
function($m) {
return str_replace("\"", "", $m[2]);
},
$input);
echo $output;
This prints:
SOME TEXT HERE This product is the perfect fit SOME MORE TEXT HERE
The regex pattern used does the following:
<([^>]+)> match an opening HTML tag, and capture the tag name
(.*?) then match and capture the content inside the tag
<\/\\1> finally match the same closing tag
Then, we use a callback function which does an additional replacement to strip off all double quotes.
Note that in general using regex against HTML is bad practice. But, if your text only has single level/occasional HTML tags, then the solution I gave above might be viable.
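For the one-liner from the question, a simpler route may be to drop the quotes with str_replace() once strip_tags() has removed the markup. The sketch below assumes the description contains literal " characters (not &quot; entities); the helper name plainDescription() is hypothetical and only mirrors the original chain of calls.

```php
<?php
// Hypothetical helper name; the steps mirror the original one-liner.
function plainDescription(string $html, int $maxLength = 197): string
{
    $text = strip_tags($html);                    // drop HTML tags
    $text = str_replace('"', '', $text);          // drop double quotes
    $text = preg_replace('/\s\s+/', ' ', $text);  // collapse whitespace runs
    return substr(trim($text), 0, $maxLength);    // trim, then truncate
}

echo plainDescription('<strong>This product is the perfect "fit"</strong>');
// prints: This product is the perfect fit
```

In the template this would be called as plainDescription($product_info['products_description']).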

Perl CGI - How can I delete contents of text fields?

I am totally new to CGI programming in Perl.
The question is simple: is there any way to delete the content of a text field in CGI?
I have to write code that has a popup_menu, a submit button and text fields (text areas).
When I click the submit button, the program reads the value from one of the popup_menus.
The task is to copy this content into the text field. Then, when I choose another element from the popup_menu (and click the submit button, of course), the new content should be written into the text field, replacing the old one.
I think perldoc.perl.org gives only a little information about CGI programming. I'd have a lot of questions on the topic... :(
Any help would be appreciated!
I guess what you describe is this: when you click the submit button, your CGI script runs with the parameters you entered in the form. What it then has to do is write something back and print the form again, with different values.
This is not the ideal way of doing such things (for simple form element substitution you should do it client side with JavaScript; you don't need a CGI backend script for this), but let's see what such a CGI script might look like.
First, it's important to know how you write your form. Let's assume you write it "the hard way" with print.
What your script has to do is parse the input and then add it as a value to the output.
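use strict;
use warnings;
use CGI;
my $q = CGI->new;
# get the value from the popup / html select
my $popup_value = $q->param('popup_menu'); # name of the <select name="..."> in your html
# ...
# writing the form
print $q->header;
# some more prints with form etc.
# textarea() is called as a method, because "use CGI;" alone does not import it
print $q->textarea( -name => 'text_area',
-default => $popup_value // '', # will use empty string on first call
-override => 1, # CGI.pm fields are sticky; force the new value over the submitted one
);
# Don't turn off autoescaping !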
BTW, the value of a select option is meant to be a short indicator, not a full text (although longer values are possible up to a certain number of characters). So you might think of building a hash or an array with the appropriate values to be printed in the text area and giving your select options the values 0, 1, 2 ...
my @text_values = ('', 'First text', 'second text', 'third text');
my $popup_value = $q->param('popup_menu') || 0; # default index.
# now use 1,2,3, ... as values in your popup_menu options
# ...
print $q->textarea( -name => 'text_area',
-default => $text_values[$popup_value],
-override => 1 ); # replace the previously submitted content

preg_match a keyword variable against a list of latin and non-latin chars keywords in a local UTF-8 encoded file

I have a bad words filter that uses a list of keywords saved in a local UTF-8 encoded file. This file includes both Latin and non-Latin chars (mostly English and Arabic). Everything works as expected with Latin keywords, but when the variable includes non-Latin chars, the matching does not seem to recognize these existing keywords.
How do I go about matching both Latin and non-Latin keywords?
The badwords.txt file includes one word per line as in this example
bad
nasty
racist
سفالة
وساخة
جنس
Code used for matching:
$badwords = file_get_contents("badwords.txt");
$badtemp = explode("\n", $badwords);
$badwords = array_unique($badtemp);
$badFlag = 0; // initialise the flag that is actually tested below
$query = strtolower($query);
foreach ($badwords as $key => $val) {
    if (!empty($val)) {
        $val = trim($val);
        $regexp = "/\b" . $val . "\b/i";
        if (preg_match($regexp, $query))
            $badFlag = 1;
        if ($badFlag == 1) {
            // Bad word detected die...
        }
    }
}
I've read that iconv, multibyte functions (mbstring) and using the operator /u might help with this, and I tried a few things but do not seem to get it right. Any help would be much appreciated in resolving this, and having it match both Latin and non-Latin keywords.
The problem seems to relate to recognizing word boundaries; the \b construct is apparently not “Unicode aware.” This is what the answers to question php regex word boundary matching in utf-8 seem to suggest. I was able to reproduce the problem even with text containing Latin letters like “é” when \b was used. And the problem seems to disappear (i.e., Arabic words get correctly recognized) when I set
$wstart = '(^|[^\p{L}])';
$wend = '([^\p{L}]|$)';
and modify the regexp as follows:
$regexp = "/" . $wstart . $val . $wend . "/iu";
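Put together, the letter-based boundaries might look like the sketch below. The helper name containsBadWord() and its signature are hypothetical, and preg_quote() is an addition so that a keyword containing regex metacharacters cannot break the pattern. Case folding is left to the /i and /u modifiers instead of strtolower().

```php
<?php
// Hypothetical helper; name and signature are not from the original post.
function containsBadWord(string $query, array $badwords): bool
{
    foreach ($badwords as $word) {
        $word = trim($word);
        if ($word === '') {
            continue; // skip blank lines from the word list
        }
        // A "boundary" here is start/end of string or any non-letter.
        $regexp = '/(^|[^\p{L}])' . preg_quote($word, '/') . '([^\p{L}]|$)/iu';
        if (preg_match($regexp, $query) === 1) {
            return true;
        }
    }
    return false;
}

$badwords = ['bad', 'nasty', 'جنس'];
var_dump(containsBadWord('this is nasty', $badwords));       // bool(true)
var_dump(containsBadWord('هذا جنس آخر', $badwords));          // bool(true)
var_dump(containsBadWord('a badly worded line', $badwords)); // bool(false)
```

The word list could be loaded with file('badwords.txt', FILE_IGNORE_NEW_LINES) and passed in as the second argument.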
Some string functions in PHP cannot safely be used on UTF-8 strings. Native Unicode support was planned for PHP 6, but that version was never released, so you need to be careful what you do with a string.
It looks like strtolower() is one of them; you need to use mb_strtolower($query, 'UTF-8') instead. If that doesn't fix it, you'll need to read through the code, find every point where you process $query or badwords.txt, and check the documentation for UTF-8 bugs.
As far as I know, preg_match() is ok with UTF-8 strings, but there are some features disabled by default to improve performance. I don't think you need any of them.
Please also double check that badwords.txt is a UTF-8 file and that $query contains a valid UTF-8 string (if it's coming from the browser, you set it with a <meta> tag).
If you're trying to debug UTF-8 text, remember most web browsers do not default to the UTF-8 text encoding, so any PHP variable you print out for debugging will not be displayed correctly by the browser, unless you select UTF-8 (in my browser, with View -> Encoding -> Unicode).
You shouldn't need to use iconv or any of the other conversion APIs. Most of them will simply replace all of the non-Latin characters with Latin ones, which is obviously not what you want.

Print freeform text in a Perl function?

I have been getting a very odd error when trying to print freeform text in a subroutine in Perl. Below is the code I am calling
print OUTFILE <<"HEADER";
The freeform text would go here
HEADER
The odd thing is that this only works in the main part of my script. As soon as I place it in a subroutine, I get this error:
Can't find string terminator "HEADER" anywhere before EOF
Meaning it can't find the HEADER, even though it is there. Can you not use freeform text within a function (subroutine)?
Make sure that there is no space, tab, or other indentation before the terminating identifier, that is, HEADER. Your code should look like this:
sub someFunc {
print OUTFILE <<"HEADER";
The freeform text would go here
HEADER
}
Notice that there is no space/tab/indentation before HEADER there. It should start from first character of its line.
Check this tutorial out for more information:
Perl Here-Doc Tutorial
Quoting:
The important rule to remember is that you finish a here-doc using the same word you started, and it must be by itself on the line