Using mPDF to create a PDF from a HTML form - forms

I want to use mpdf to create my PDF-files because I use norwegian letters such as ÆØÅ. The information on the PDF-file would mostly consist of text written by the user in a HTML form. But, I have some problems.
When using this code:
$mpdf->WriteHTML('Text with ÆØÅ');
The PDF will show the special characters.
But when using this:
<?php
include('mpdf/mpdf.php');
$name = 'Name - <b>' . $_POST['name'] . '</b>';
$mpdf = new mPDF();
$mpdf->WriteHTML($name);
$mpdf->Output();
exit;
?>
The special characters will not show.
The HTML form looks like this:
<form action="hidden.php" method="POST">
<p>Name:</p>
<input type="text" name="name">
<input type="submit" value="Send"><input type="reset" value="Clear">
</form>
Why won't the special characters show with this method? And which method should I use?

Since echoing the POST-data back onto the website does not show the characters as well, this clearly isn't an issue with mpdf. When using content including non-Ascii characters, special care about the websites character encoding has to be taken.
From the mpdf-documentation it can be seen that it supports UTF-8 encoding, so you might want to use that for your data. POST-data is received in the same encoding that is used by the website. So if the website is in latin-1, you will need to call utf8_encode() to convert the POST-data to unicode. If the website already uses UTF-8 you should be just fine.
If you don't set a specific encoding in the website header (which you should always to avoid this kind of trouble), encoding might depend on several factors such as the operating system and configuration on the server or the encoding of the original php sourcefile which, as it turns out, is influenced by your own OS configuration and choice of editor.

Related

How to properly encode html entities in emails? e.g. &nearr; for Gmail

So I modified some emails I send to get rid of images and replace them by special unicode characters. For example I had an arrow image and replaced it with &nearr; while wrapping it in a <span> to give it the color I want.
When I look at the source in Gmail (3 dots > Show Original) I see this:
...
--1234567890123456789012345678
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.=
w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns=3D"http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8" />
</head>
<body>
...
... <span style=3D"font-family:arial,verdana;font-weight:bold;color:#209a20">&nearr;</span> ...
...
</body>
</html>
--1234567890123456789012345678--
Which is what I'd expect since that's what I wrote in my code.
Now the problem is that it displays like this in the Gmail web interface:
What am I doing wrong? Isn't UTF-8 a unicode encoding that should support this character?
I would understand if some of these special characters are displayed as square boxes or something, but I do not understand how they can remain encoded while the turns into a space correctly.
It also makes me question whether other email clients will display these correctly (would love feedback on that too).
In the 1950's computers could handle only capital letters, digits and some punctuation.
Before 1970, EBCDIC was invented (only to later die out) for handling lower case and a few more punctuation characters.
Then came a plethora of encodings to handle European accents, Cyrillic, Greek, and eventually Chinese. (There are some interesting stories on the invention of typewriters for handling Chinese!)
Eventually, the Unicode group got together and slowly created a universal standard. It has been evolving for a few decades and continues to enhance it -- emojis are a big addition that is ongoing.
But, meanwhile, how does one put Emoji, etc, in URLs, type them on a keyboard, etc, etc? Those standards are lagging way behind. So, there are kludges in place.
HTML allows "entities", such as &nearr; for that arrow.
Putting such in a URL would require something like %E2%86%97.
Several encodings also base their kludge on the hex encoding of the utf8.
Unicode allows \U8599 which is based on the decimal value of the "codepoint". (I think Java goes that direction.)
MySQL INSERT: UNHEX('E28697')
Keyboards -- good luck.
I don't know of anything other than HTML that reacts favorably to &nearr;
Ever notice a + in a URL? That is the encoding for a single space. (Also %20 works there.)
Try the HTML code rather than the HTML entity.
So ↗ for the north east arrow, as per
https://www.toptal.com/designers/htmlarrows/arrows/north-east-arrow/
Best reference for this is usually https://unicode-table.com/en/

escaping user input using express.js

When a user fills out a form how do I go about escaping the user input in express.js?
Does express.js do this by default? I can't find a source.
Do I have to use a third-party module like express-validator.js?
UPDATE
I figured out the difference between escaping and validating.
What I wanted to do was escape user input but what I should be doing is validating it, making sure it's in a valid format and then escape the output to the form if it is not valid providing the user exactly what they inputted.
<%= some_html %> will automatically escape it. <%- some_html %> will output html intact.
Exactly what kind of escaping do you need to do? Express will automatically decode (not unescape) the query string for you and make it available as req.query. URL params will also be unencoded for you automatically.
If you need to escape HTML that includes user input when rendering, you should do that via your template engine. Most template engines such as jade (= value) or handlebars or mustache ({{value}}) will escape HTML by default, and require an explicit syntax to pass data through unescaped ( != value in jade or {{{value}}} in handlebars/mustache).

UIWebView, quote characters with Arial font not showing up correctly

I have some .html with the font defined as:
<font color="white" face="Arial">
I have no other style applied to my tag. In it, when I display data like:
<b> “Software” </b>
or
<b>“Software”</b>
they both display characters I do not want in the UIWebView. It looks like this on a black background:
How do I avoid that? If I don't use font face="arial", it works fine.
This is an encoding issue. Make sure you use the same encoding everywhere. UTF8 is probably the best choice.
You can put a line
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
in your html to tell UIWebView about the encoding.
To be precise, “ is what you get when you take the UTF-8 encoding of “, and interpret it as ISO-8859-1. So your data is encoded in UTF-8, which is good, and you just need to set the content type to UTF-8 instead of ISO-8859-1 (e.g. using the <meta> tag above)
You shouldn’t generally use the curly quote characters themselves—character encodings will always mess you up somehow. No idea why it works correctly when you don’t use Arial (though that suggests a great idea: don’t use Arial), but your best bet is to use the HTML entities “ and ” instead.

Perl CGI.pm encoding - wrong encoding for "ě"

I have a simple web page that uses CGI.pm This is what I do:
when I call any perl CGI.pm function and use czech character "ě" for value of a textfield, label of radio_group or anything else I get �› insetad of "ě"
this is extremly weird - since the whole page is utf8 (<meta name="charset" content="utf-8"/> ). Especially since this works
print '<textfield value="ěěěě" >';
therefore I am positive - it has to be CGI.pm causing the problem... I tried to put
use utf8;
utf8::decode($textfield_value);
at the beginning of my scirpt and it fixed the CGI.pm problem but made all other characters in the script (those that are regulary printed) look funny..
Any ideas???
Set the accept-charset attribute in your form fields to UTF-8?
<form action="/..." accept-charset="UTF-8">
This might not be sufficient to solve your problem, but it is often necessary to force the client browser to utf8-encode the form data that gets sent to the server.
Have you tried replacing the ě's with their octal or hex escapes? Unfortunately, there doesn't seem to be an HTML code for the character.

UTF8 charset, diacritical elements, conversion problems - and Zend Framework form escaping

I am writing a webapp in ZF and am having serious issues with UTF8. It's using multi lingual content through Zend Form and it seems that ZF heavily escapes all of these characters and basically just won't show a field if there's diacritical elements 'é' and if I use the HTML entity equivalent e.g. é it gets escaped so that the user will see 'é'.
Zend Form allows for having non escaped data, but trying to use this is confusing, and it seems it'd need to be used all over the place.
So, I have been told that if the page and the text is in UTF8, no conversion to htmlentities is required. Is this true?
And if the last question is true, then how do I convert the source text to UTF8? I am comfortable setting up apache so that it sends a default UTF8 charset heading, and also adding the charset meta tag to the html, but doing this I am still getting messed up encoding. I have also tried opening the translation csv file in TextWrangler on OSX as UTF8, but it has done nothing.
Thanks!
L
'é' and if I use the HTML entity equivalent e.g. é it gets escaped so that the user will see 'é'.
This I don't understand. Can you show an example of how it is displayed, as opposed to how it should be displayed?
So, I have been told that if the page and the text is in UTF8, no conversion to htmlentities is required. Is this true?
Yup. In more detail: If the data you're displaying and the encoding of the HTML page are both UTF-8, the multi-byte special characters will be displayed correctly.
And if the last question is true, then how do I convert the source text to UTF8?
Advanced editors and IDEs enable you to define what encoding the source file is saved in. You would need to open the file in its current encoding (with special characters being displayed correctly) and save it as UTF-8.
If the content is messed up when you have the right content-type header and/or meta tag specified, then the content is not UTF-8 yet. If you don't get it sorted, post an example of what it looks like here.