How to make Zend_Form display special characters in text elements? - zend-framework

I am using a Zend_Form subclass to add and edit records in a database. The layout uses the ISO-8859-1 encoding and charset, and the table records use the latin1_spanish_ci collation.
The form's text input element doesn't display anything at all when the record contains special characters like accented letters. If there are no special characters, the text input displays the record correctly. Curiously enough, the special characters display correctly when they appear outside the text input field, for example inside an HTML h2 heading or a paragraph.
I have tried inserting the following in application.ini:
resources.db.params.charset=iso-8859-1
but I get an error message:
SQLSTATE[42000] [1115] Unknown character set: 'iso-8859-1'
I have also tried changing the DB charset to utf8 in the same way. The form text element then displays the string, but I get strange characters instead of the original ones.
I have tried almost everything, but I haven't solved the problem. It seems that text input elements generated with Zend_Form hate Latin characters.
Have you had the same problem?

I found this simple solution in a ZF forum:
Add the following to the _initView method in your Bootstrap.php and forget about everything else:
$view->setEncoding('iso-8859-1');
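In context, that looks roughly like this (a sketch; the class and method names follow the standard Zend_Application conventions, so adjust to your app):

class Bootstrap extends Zend_Application_Bootstrap_Bootstrap
{
    protected function _initView()
    {
        $view = new Zend_View();
        // Tell Zend_View (and therefore Zend_Form's rendering and escaping)
        // which encoding the application uses instead of the default UTF-8.
        $view->setEncoding('iso-8859-1');
        // Register the view with the ViewRenderer so controllers pick it up.
        $viewRenderer = Zend_Controller_Action_HelperBroker::getStaticHelper('ViewRenderer');
        $viewRenderer->setView($view);
        return $view;
    }
}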

Related

pdfbox generates garbage japanese text if it contains kangxi radicals

We use PDFBox 2.0.20 and try to generate a PDF file that contains the following text with NotoSansJP (https://fonts.google.com/noto/specimen/Noto+Sans+JP). Note that the ⽤ in the text is not the valid kanji (0xe794a8); it is the Kangxi radical "use" (0xe2bda4):
注文書兼注文請書Webページのボタン⽤GIF画像編集
The result becomes garbled text.
The strange thing is that if I copy and paste the garbled text from the PDF here, the result seems correct:
注⽂書兼注⽂請書Webページのボタン用GIF画像編集
except that the 用 in the text has become the valid kanji (0xe794a8).
So, to me it seems that when the text contains an invalid kanji such as a Kangxi radical, different code pages are used.
But the fact that the Kangxi radical character seems to have been turned into a valid kanji suggests it may be related to Unicode normalization.
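For what it's worth, the normalization suspicion is easy to test in isolation. A minimal sketch (in PHP with the intl extension, purely for illustration, since PDFBox itself is Java):

// U+2F64 KANGXI RADICAL USE, encoded as 0xe2bda4 in UTF-8
$radical = "\u{2F64}";
// NFKC compatibility normalization maps Kangxi radicals onto the
// corresponding unified CJK ideographs, here U+7528 用 (0xe794a8).
var_dump(Normalizer::normalize($radical, Normalizer::FORM_KC) === "\u{7528}"); // bool(true)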
Does anyone experience the same situation? Any thoughts about the reason?
regards,
EDIT: our nervous customer complains that the problematic text contains sensitive data, so I have changed the text a bit.

Unicode converted text isn't shown properly in MS-Word

In a mapping editor, the display is correct after the legacy-to-Unicode conversion for Devanagari text shown using a Unicode font (Arial Unicode MS). However, in MS Word the display isn't as expected for the same Unicode text, whether in that font (Arial Unicode MS) or any other Devanagari Unicode font. The expected sequence of code points is provided as per the documentation; the sequence can be seen in the left-hand table.
Please let me know where I am going wrong.
Thanks for your help!
Does your map have to insert the ZERO WIDTH JOINER? The halant (virama) by itself is enough to get the half-consonant (for some combinations), and in particular it may be that Word is using the presence of the ZWJ to keep them separate.
If getting rid of the ZWJ doesn't help, another possibility is that Word may be treating the individual characters of the text string as individual "runs" of text.
If those first 4 characters are not in a single run, this can happen.
[Aside: the way to tell if it's being treated as a single run is to save the document as an XML file, open it with something like Notepad++, and look at the XML "w:t" elements (IIRC) associated with these characters. If they're all in separate w:t elements, it means they're in separate runs. In that case, you might need to copy the text from Word to some other tool (e.g. Notepad++) and then copy it from there and paste it back into Word; that might cause it to be imported into Word as a single run.]
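A variation on that check, if you'd rather script it: a .docx file is a ZIP container, so you can read word/document.xml straight out of it and count the runs. A sketch in PHP (the file name is a placeholder):

$zip = new ZipArchive();
if ($zip->open('document.docx') === true) {
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    // Each <w:t> element holds the text of one run; many small runs
    // around the conjunct characters suggest Word has split them up.
    preg_match_all('/<w:t[ >]/', $xml, $matches);
    echo count($matches[0]) . " text runs found\n";
}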

Zend Framework Form not rendering special characters like (ä, ö, ü etc) - makes the form element value empty

I am trying to get Zend Form working for me. I am using the same form for inserting and editing a particular database object. The object has a name, and I can easily create a new object with the name "Ülo". It saves correctly in the database, and when I fetch it to display in a report it shows "Ülo" correctly. The problem is with forms. When I open the edit form, the name element is empty. All other elements show correctly, and if I change them to contain "ü" they are displayed empty too. The same thing happens with form element labels: when I set a label to contain "ü", the label no longer shows.
For example, if I have $name->setLabel('Nameü: '); then it does not show the label, but when I change it back to $name->setLabel('Name: '); it shows correctly.
Likewise, when I have $bcrForm->name->setValue('Ülo'); it does not show the value, but when I change it to $bcrForm->name->setValue('Alo'); it shows correctly.
How can I fix it to display correctly? It seems like it is some kind of form rendering issue.
This one helped solve it for me:
Make sure this setting is in both /etc/php5/apache2/php.ini and /etc/php5/cli/php.ini:
default_charset = utf-8
Did you check the encoding? Try adding this to the head...
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Make sure that both your script and view files are encoded in UTF-8 and that your connection to the DB is set to UTF-8 too.
If you are using MySQL, you can force it to return UTF-8 data by opening a DB connection and running: SET NAMES 'utf8'
Or, using mysqli: mysqli_set_charset($link, 'utf8');
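In a Zend Framework app the same thing can be set on the database adapter itself. A minimal sketch, assuming the Pdo_Mysql adapter (all connection parameters are placeholders):

$db = Zend_Db::factory('Pdo_Mysql', array(
    'host'     => 'localhost',
    'username' => 'webuser',
    'password' => 'secret',
    'dbname'   => 'mydb',
    'charset'  => 'utf8',  // equivalent of SET NAMES 'utf8' for this connection
));

With Zend_Application, the matching line in application.ini is resources.db.params.charset = "utf8" (the same key the first question tried, but with a charset name MySQL doesn't recognize).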
I would check:
view charset
database charset (backend)
Zend_Db_Adapter charset
file charset
The view's escape method is set to expect UTF-8 characters and may strip anything else (i.e. stray single-byte characters) :)
It should be as simple as setting the escape flag to false on the element's Label decorator:
$name->addDecorator('Label', array('escape' => false));
Or see setEscape(). http://framework.zend.com/manual/1.12/en/zend.form.standardDecorators.html
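Putting that together with the element from the question (a sketch; it assumes $name is the Zend_Form_Element_Text shown above, with the default decorators in place):

$name->setLabel('Nameü: ');
// Fetch the default Label decorator and stop it from escaping the label,
// so the multibyte character survives rendering.
$name->getDecorator('Label')->setOption('escape', false);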

formatting text in a csv export

I'm having trouble with a .csv export which is being uploaded to a website. There must be some hidden or illegal characters in a description field I have in the database, and I'm having a tough time getting the text to format correctly and not break a PHP script.
If I use the GetAsCSS() function in a calculation, the text works fine. Obviously this won't do for the working file, but it at least validates that something in the formatting of the description field is breaking the export. I also used Excel's CLEAN(text) function, and that fixes the issue as well; I just need to find a way to do this in FileMaker.
Any suggestions? Maybe a custom function that strips out bad characters?
You can filter invalid characters out of text using the Filter function. If you only want a minimal set of ASCII characters, use it like:
Filter ( mytable::myfield ; "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789.!?" )
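If the cleanup ends up happening on the other side of the export instead, e.g. in the PHP script that consumes the CSV, the same idea looks like this (a sketch; $description is a placeholder for the imported field value):

// Keep printable ASCII plus tab/CR/LF; strip any control or other bytes
// of the kind that were breaking the import script.
$description = preg_replace('/[^\x20-\x7E\t\r\n]/', '', $description);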

UTF8 charset, diacritical elements, conversion problems - and Zend Framework form escaping

I am writing a webapp in ZF and am having serious issues with UTF-8. It uses multilingual content through Zend Form, and it seems that ZF heavily escapes all of these characters and basically just won't show a field if it contains a diacritical character like 'é'; and if I use the HTML entity equivalent, e.g. &eacute;, it gets escaped so that the user literally sees '&eacute;'.
Zend Form allows for non-escaped data, but trying to use this is confusing, and it seems it would need to be used all over the place.
So, I have been told that if the page and the text are in UTF-8, no conversion to HTML entities is required. Is this true?
And if the last question is true, then how do I convert the source text to UTF-8? I am comfortable setting up Apache so that it sends a default UTF-8 charset header, and also adding the charset meta tag to the HTML, but doing this I am still getting messed-up encoding. I have also tried opening the translation CSV file in TextWrangler on OS X as UTF-8, but it has done nothing.
Thanks!
L
'é' and if I use the HTML entity equivalent, e.g. &eacute;, it gets escaped so that the user literally sees '&eacute;'.
This I don't understand. Can you show an example of how it is displayed, as opposed to how it should be displayed?
So, I have been told that if the page and the text are in UTF-8, no conversion to HTML entities is required. Is this true?
Yup. In more detail: If the data you're displaying and the encoding of the HTML page are both UTF-8, the multi-byte special characters will be displayed correctly.
And if the last question is true, then how do I convert the source text to UTF8?
Advanced editors and IDEs enable you to define what encoding the source file is saved in. You would need to open the file in its current encoding (with special characters being displayed correctly) and save it as UTF-8.
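If the source files turn out to be, say, Latin-1, the conversion can also be scripted instead of done file by file in an editor. A minimal sketch, assuming an ISO-8859-1 CSV (the file name and source encoding are assumptions):

$text = file_get_contents('translations.csv');
// Re-encode from ISO-8859-1 to UTF-8; change the source encoding if yours differs.
file_put_contents('translations.csv', mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1'));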
If the content is messed up when you have the right content-type header and/or meta tag specified, then the content is not UTF-8 yet. If you don't get it sorted, post an example of what it looks like here.