WkHTMLtoPDF Unicode Issue

I've already read several similar StackOverflow posts and none of them were able to resolve my issue.
The Issue
I have a PDF that's being generated by WkHTMLtoPDF which contains a Unicode RIGHT SINGLE QUOTATION MARK (U+2019, or ’) character. Rendered in a browser, the character displays correctly.
When I run the same HTML through WkHTMLtoPDF, several garbage characters appear in its place.
The Code
I'm using the following for my CSS:
@font-face {
font-family: localGeorgia;
src: url("file:///usr/share/fonts/truetype/georgia/GEORGIA.TTF");
}
body {
overflow: visible !important;
font-family: localGeorgia, Georgia, Times, "Times New Roman", serif;
font-size: 12px;
}
I have also copied the Georgia font from my local computer to the server (there are several files in the /usr/share/fonts/truetype/georgia/ directory) and I have run fc-cache -fv to clear the font cache and run fc-list to verify that Georgia was properly installed. The localGeorgia font family was added as a formality because I still wasn't getting a working display.
I've verified both via the online docs and my operating system's character map that the Georgia font does support the RIGHT SINGLE QUOTATION MARK, although I don't know how to prove definitively that this glyph is in the TrueType file (I'm not familiar with opening or parsing TrueType files).
At this point it's unclear to me why WkHTMLtoPDF is displaying this mess of characters instead of the proper Unicode glyph.
Additional details (environment and such)
I'm running Ubuntu 16.04
Laravel version 5.3
I'm using Laravel-Snappy version 0.3.3 (which is using KNP-Snappy version 0.4.3)
My config for Snappy is pretty straight-forward:
<?php
return array(
'pdf' => array(
'enabled' => true,
'binary' => base_path('vendor/h4cc/wkhtmltopdf-amd64/bin/wkhtmltopdf-amd64'),
'timeout' => false,
'options' => array(),
'env' => array(),
),
'image' => array(
'enabled' => false,
'binary' => '/usr/local/bin/wkhtmltoimage',
'timeout' => false,
'options' => array(),
'env' => array(),
),
);
The installed wkhtmltopdf version is 0.12.3 (with patched qt)
To generate the PDF I'm calling ->render() on the View, passing this to PDF::loadHTML, then calling ->inline() on the result and returning a response. Here's a minimal example of how I'm generating the PDF:
$property = Property::find(1);
$view = View::make("pdf.flier")->with(["property" => $property]);
$pdf = PDF::loadHTML($view->render())->inline();
return response($pdf)->header("Content-Type", "application/pdf")->header("Content-Disposition", "attachment; filename=flier.pdf");
The HTML is incredibly simple:
<html>
<head>
<base href="{{ url("/") }}" />
<link rel="stylesheet" type="text/css" href="css/flier.css" />
</head>
<body>
<img src="{{ $property->image }}" />
<h1>{{ $property->title }}</h1>
</body>
</html>
The CSS gives the h1 an absolute position over top of the image

After a couple of days, I've finally figured this out.
The issue does not lie with the font. If it did, I would see a glyph fail to load (e.g. a box or a question mark would appear in place of the Unicode character).
Instead, what I'm seeing is several incorrect glyphs in place of the desired character. This indicates an encoding issue, not a font issue. WkHTMLtoPDF is interpreting the 3-byte UTF-8 sequence for the character as 3 individual single-byte characters.
The problem is that my browser has a default encoding of UTF-8, but WkHTMLtoPDF does not (at least not in version 0.12.3). The fix was simple: update my config file to pass the encoding option:
<?php
return array(
'pdf' => array(
'enabled' => true,
'binary' => base_path('vendor/h4cc/wkhtmltopdf-amd64/bin/wkhtmltopdf-amd64'),
'timeout' => false,
'options' => array(
'encoding' => 'utf-8'
),
'env' => array(),
),
'image' => array(
'enabled' => false,
'binary' => '/usr/local/bin/wkhtmltoimage',
'timeout' => false,
'options' => array(
'encoding' => 'utf-8'
),
'env' => array(),
),
);
Note: In my research I found some examples of people claiming the "--encoding" option did not work for them, but that adding a meta charset tag to the HTML did:
<meta charset="utf-8">
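The misreading described in this answer (one 3-byte character shown as three single-byte characters) can be reproduced outside WkHTMLtoPDF. The following is a minimal Python sketch, not part of the original setup. It assumes the fallback encoding behaves like Windows-1252; the thread does not say which single-byte encoding WkHTMLtoPDF 0.12.3 actually falls back to.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the character from the question.
quote = "\u2019"

# UTF-8 stores this character as three bytes.
utf8_bytes = quote.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x99'

# Assumption: the consumer treats each byte as one Windows-1252 character.
# The three bytes then become three unrelated glyphs (they display as "â€™").
garbled = utf8_bytes.decode("cp1252")
print(ascii(garbled), len(garbled))

# Decoding the same bytes as UTF-8 recovers the single original character.
restored = utf8_bytes.decode("utf-8")
print(restored == quote)  # True
```

Seeing three wrong glyphs rather than a box or a question mark is what separates an encoding problem from a missing-glyph problem.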

Related

ZF3 form element classes get encoded with unicode entities

I'm trying to figure out why ZF3 encodes my element's class string, but can't find anything about that behaviour on the internet.
$this->add([
'type' => 'Button',
'name' => 'submitLogin',
'options' => [
'label' => '<i class="zmdi zmdi-check"></i>',
'label_options' => [
'disable_html_escape' => true,
]
],
'attributes' => [
'type' => 'submit',
'class' => 'btn btn--icon login__block__btn',
],
]);
becomes
<button type="submit" name="submitLogin" class="btn btn--icon login__block__btn" value=""><i class="zmdi zmdi-check"></i></button>
I think this comes down to a general principle. When we work with data, we filter input values and escape output. This is a security philosophy.
Zend Framework follows the same principle wherever security is concerned, so this behavior is the default. ZF escapes attribute values when they are rendered for the browser, and only skips escaping through explicit options like the one you used for the label's content above.
You can get some background from this issue on GitHub, where Matthew said:
Secure by default is the mantra

Anchor tag image joomla

How can I use an image button instead of the text in the following code?
JHTML::link ($product->link, JText::_ ('COM_VIRTUEMART_PRODUCT_DETAILS'), array('title' => $product->product_name, 'class' => 'product-details'))
In HTML it renders as <a href="blah blah">Product details
$product is a PHP object, and JHTML is used by the Joomla! framework to output links.
Actually, your code:
JHTML::link ($product->link, JText::_ ('COM_VIRTUEMART_PRODUCT_DETAILS'), array('title' => $product->product_name, 'class' => 'product-details'))
is equivalent to this plain HTML:
<a href="<?php echo $product->link; ?>" title="<?php echo $product->product_name; ?>" class="product-details"><?php echo JText::_ ('COM_VIRTUEMART_PRODUCT_DETAILS'); ?></a>
JText outputs words and phrases from a language INI file located in the language/LANGUAGE_ID/ folder. For the English language and the VirtueMart component, your string is located in /language/en-GB/en-GB.com_virtuemart.ini
To use an image as the link, use an img tag in place of the link text:
<img src="myimage.png" />

Cakephp Form Labels Encoding Utf8

In my PHP application I have set everything to UTF-8 from the beginning to avoid future problems. I set my database:
class DATABASE_CONFIG {
public $default = array(
'datasource' => 'Database/Mysql',
'persistent' => false,
'host' => 'localhost',
'login' => 'root',
'password' => '',
'database' => 'aquitex',
'prefix' => '',
'encoding' => 'utf8',
);
public $test = array(
'datasource' => 'Database/Mysql',
'persistent' => false,
'host' => 'localhost',
'login' => 'root',
'password' => '',
'database' => 'aquitex',
'prefix' => '',
'encoding' => 'utf8',
);
}
The file core.php:
Configure::write('App.encoding', 'UTF-8');
And the default layout of the views:
<?php echo $this->Html->charset(); ?>
However, I'm still having problems with some elements, such as form labels.
In my index.ctp file, this line:
echo $this->Html->link("Segurança", array('controller' => 'Posts','action'=> 'add'), array( 'class' => 'button'));
works perfectly and there's no problem with the 'ç' character.
But in forms, like this:
echo $this->Form->create('Post');
echo $this->Form->input('Nome Produto');
echo $this->Form->input(utf8_encode("Código Produto"));
echo $this->Form->input("Versão");
echo $this->Form->input('Data');
//echo $this->Form->input('body', array('rows' => '3'));
echo $this->Form->end('Criar Ficha');
there's no way I can get the labels of the form to show the 'ó' or 'ç' characters properly. As you can see, I even tried utf8_encode() on one of them.
Any hints? Thank you!
There is no need to use utf8_encode() in your views.
You simply forgot to save the view file with the right encoding.
Save it as "UTF-8 without BOM" and you will be fine.
Files that do not contain any non-ASCII characters can stay as ANSI (for those files the two encodings produce identical bytes).
But every file that does contain such a character needs to be saved as UTF-8 (even controllers and models, if you plan on using UTF-8 characters there for error messages etc.).
PS: In general it is wiser to write the strings in English and translate them into your language via a PO file.
This way you can leave the files as they are and you are more flexible (you can add new languages on the fly just by creating a new PO file).
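The point about file encodings can be checked with a short script. This is a minimal Python sketch rather than CakePHP code, and it assumes "ANSI" means Windows-1252, which is what that label usually refers to in Western European editor setups.

```python
import codecs

plain = "Data"     # ASCII-only label from the question
label = "Versão"   # label containing a non-ASCII character

# ASCII-only text is byte-for-byte identical in both encodings,
# which is why such files can stay "ANSI".
print(plain.encode("utf-8") == plain.encode("cp1252"))  # True

# Text with "ã" differs: two bytes in UTF-8, one byte in Windows-1252.
utf8_saved = label.encode("utf-8")    # b'Vers\xc3\xa3o'
ansi_saved = label.encode("cp1252")   # b'Vers\xe3o'
print(utf8_saved, ansi_saved)

# A file saved as ANSI but read as UTF-8 contains an invalid byte sequence,
# so the character is lost (shown here as U+FFFD, the replacement character).
broken = ansi_saved.decode("utf-8", errors="replace")
print(ascii(broken))  # 'Vers\ufffdo'

# "With BOM" means the file starts with these three extra bytes.
with_bom = label.encode("utf-8-sig")
print(with_bom.startswith(codecs.BOM_UTF8))  # True
```

A likely reason for the "without BOM" advice is that the BOM bytes sit in front of the opening PHP tag and are sent to the browser as output.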
EDIT
After figuring out together that your input field names themselves contain UTF-8 characters, I need to update my answer:
It is wise to use "underscore_field_names" for your db fields (and therefore your input fields), and to write them in English:
echo $this->Form->input("version");
You can easily translate them via a PO file afterwards, or by specifying the label:
echo $this->Form->input("version", array('label' => 'Versão'));
but the first way is recommended to keep it DRY.
App.encoding just tells Cake to send data in UTF8. If you're using MySQL, make sure the database itself is set to utf8_general_ci collation.

How to properly use UTF-8-encoded data from Schema inside Catalyst app?

Data defined inside the Catalyst app or in templates has the correct encoding and is displayed well, but everything non-Latin1 coming from the database is converted to ?. I suppose the problem is in the model class, which looks like this:
use strict;
use base 'Catalyst::Model::DBIC::Schema';
__PACKAGE__->config(
schema_class => 'vhinnad::Schema::DB',
connect_info => {
dsn => 'dbi:mysql:test',
user => 'user',
password => 'password',
{
AutoCommit => 1,
RaiseError => 1,
mysql_enable_utf8 => 1,
},
'on_connect_do' => [
'SET NAMES utf8',
],
}
);
1;
I see no flaws here, but something must be wrong. I also used my schema with test scripts, where the data was well encoded and the output was correct, but inside the Catalyst app I did not get the encoding right. Where might the problem be?
EDIT
For future reference I put the solution here: I mixed the old and new styles in connect_info.
The old style is (dsn, username, password, hashref_options, hashref_other_options).
The new style is (dsn => dsn, username => username, etc.), so the right way is:
connect_info => {
dsn => 'dbi:mysql:test',
user => 'user',
password => 'password',
AutoCommit => 1,
RaiseError => 1,
mysql_enable_utf8 => 1,
on_connect_do => [
'SET NAMES utf8',
],
}
In a typical Catalyst setup with Catalyst::View::TT and Catalyst::Model::DBIC::Schema you'll need several things for UTF-8 to work:
add Catalyst::Plugin::Unicode::Encoding to your Catalyst app
add encoding => 'UTF-8' to your app config
add ENCODING => 'utf-8' to your TT view config
add <meta http-equiv="Content-type" content="text/html; charset=UTF-8"/> to the <head> section of your html to satisfy old IEs which don't care about the Content-Type:text/html; charset=utf-8 http header set by Catalyst::Plugin::Unicode::Encoding
make sure your text editor saves your templates in UTF-8 if they include non ASCII characters
configure your DBIC model according to DBIx::Class::Manual::Cookbook#Using Unicode
if you use Catalyst::Authentication::Store::LDAP configure your LDAP stores to return UTF-8 by adding ldap_server_options => { raw => 'dn' }
According to Catalyst::Model::DBIC::Schema#connect_info:
The old arrayref style with hashrefs for DBI then DBIx::Class options is also supported.
But you are already using the 'new' style so you shouldn't nest the dbi attributes:
connect_info => {
dsn => 'dbi:mysql:test',
user => 'user',
password => 'password',
AutoCommit => 1,
RaiseError => 1,
mysql_enable_utf8 => 1,
on_connect_do => [
'SET NAMES utf8',
],
}
This advice assumes you have fairly up to date versions of DBIC and Catalyst.
This is not necessary: on_connect_do => [ 'SET NAMES utf8' ]
Ensure the table|column charsets are UTF-8 in your DB. You can achieve things that sometimes look right even when parts are broken. The DB must be saving the character data as UTF-8 if you expect the entire chain to work.
Ensure you're using and configuring Catalyst::Plugin::Unicode::Encoding in your Catalyst app. It did have serious-ish bugs in the not too distant past so get the newest.
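The remark that things can "sometimes look right even when parts are broken" can be illustrated with a small simulation. This is a Python sketch, not Catalyst or DBIC code, and it assumes a connection wrongly treated as Latin-1 in front of UTF-8 storage; real MySQL charsets differ in detail.

```python
text = "ç"
raw = text.encode("utf-8")  # b'\xc3\xa7', what the application sends

# Assumption: the connection is taken to be Latin-1, so each byte is read
# as one character and re-encoded to UTF-8 for storage (double encoding).
stored = raw.decode("latin-1").encode("utf-8")
print(stored)  # b'\xc3\x83\xc2\xa7'

# Reading back over the same misconfigured connection undoes the damage,
# so this one application appears to work.
read_back = stored.decode("utf-8").encode("latin-1")
print(read_back == raw)  # True

# A correctly configured client sees what is really stored:
# two characters (they display as "Ã§") instead of one.
seen = stored.decode("utf-8")
print(ascii(seen), len(seen))

# The "?" symptom from the question: a character with no Latin-1 equivalent
# is replaced when it is forced into Latin-1.
print("ж".encode("latin-1", errors="replace"))  # b'?'
```

The round trip only hides the problem while every reader and writer shares the same misconfiguration, which is why the stored data itself has to be UTF-8.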

Non-english characters in Zend_Form

I'm creating a form using Zend_Form, and all the words that contain diacritics are not rendered. The encoding is set to UTF-8, and the collation of the database is set to utf-8_unicode. What else should I do/check?
The page header:
<meta content="text/html; charset=utf-8" http-equiv="content-type">
The Zend_form part:
$user = Doctrine::getTable("aclUser")->find(1, Doctrine_Core::HYDRATE_ARRAY);
$this->addElement('text','providerName',
array(
'label' => 'Provider_name',
'required' => false,
'readonly' => true,
'value' => $user['name'],
'filters' => array('StringTrim'),
'decorators'=> array(new Application_Form_Decorators_Custom())
)
);
It could be an issue with submitting the form itself.
Does setting your form's accept-charset to UTF-8 help?
<form accept-charset="UTF-8">
[form elements]
</form>
Solved. I added these settings to my.cnf and now everything renders correctly:
[client]
default-character-set=utf8
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
default-character-set=utf8
character-set-server = utf8
collation-server = utf8_unicode_ci
[mysql]
default-character-set=utf8