Rendering unicode characters in Markdown from Emacs - emacs

I have a Markdown text file, utf-8-encoded, that has some non-ASCII characters such as ’. I couldn't get the reference Perl implementation to handle these characters correctly, but I can get it to work with Pandoc. I'd like to be able to render my Markdown file straight from Emacs, using C-c C-c p from Markdown mode, and that's still not working for me. I get what looks like a blank space instead of the non-ASCII character. For example,
I love apostrophe’s.
turns into
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<title></title>
<style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<p>I love apostrophe s.</p>
</body>
</html>
Note that the HTML contains charset=utf-8. It's nearly identical to what I get running Pandoc from the command line, except for the missing apostrophe. I'm invoking Pandoc, whether from Emacs or the command prompt, using pandoc -f markdown -t html -s --mathjax --highlight-style pygments, which I got from here.
Can I get the apostrophe and other unicode characters to render properly from Emacs?
EDIT: using the C-u C-x = command that @db48x suggested, I verified that the blank character is a regular space (#x20).

I had the same problem with German Umlaut characters and figured out a solution: If I add the line
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
to the beginning of the Markdown file, it ends up in the HTML file and, although embedded in <p></p> tags, works. C-c C-c e nicely exports my umlauts and your apostrophe’s, too.

To avoid typing <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> at the top of every Markdown file (which is not convenient), you can tell Emacs's Markdown mode to add it for you at export time. You just have to customize the Markdown Xhtml Header Content variable and set it to <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />.
To do that, run M-x customize-mode, look for Markdown Xhtml Header Content, enter the new value, and save.

Running M-x customize-variable and toggling markdown-command-needs-filename to on solved the problem for me.

Related

Set Chinese Fonts on HTML Emails (Outlook)

Is it possible to set a Chinese font on HTML Emails for Outlook 2013? I want to be able to change the style of the punctuation for commas and full stop.
So it'll look similar to the Microsoft JhengHei font instead of the SimSun font.
There are a couple things you can do to make sure Chinese characters display in web or email. First, some code for the email <head>:
<!DOCTYPE html>
<!--
Set HTML language attribute
zh = Chinese
zh-Hans = Chinese (Simplified)
zh-Hant = Chinese (Traditional)
-->
<html lang="zh" xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office">
<head>
<!--
utf-8 works for most cases, including Chinese
-->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
</html>
You must make sure that you save your document in UTF-8 format and upload it to your server or ESP in a way that preserves that encoding. Some editors don't do this or aren't configured this way by default, so you may need to check.
But ultimately these fonts won't display if a user doesn't have them installed on their local system. Specifying an appropriate font stack behind Microsoft JhengHei will help ensure that something shows up.
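Whether a file really was saved as UTF-8 can be checked mechanically. Here is a minimal Python sketch; the helper name is_valid_utf8 is made up for illustration, and a real check would read the bytes from your saved file instead of building them inline:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Chinese text saved as UTF-8 passes the check.
good = "微軟正黑體".encode("utf-8")

# A Latin-1 encoded "é" is the single byte 0xE9, which is not valid UTF-8.
bad = "é".encode("latin-1")

print(is_valid_utf8(good), is_valid_utf8(bad))
```

Note that a passing check only shows the bytes are valid UTF-8; it cannot prove the editor did not mangle characters before saving.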

A trouble with czech encoding

I'm new here and I have a question about an encoding.
I created a simple html page and I use czech characters in it (ěščřžýáí)
But when I open it in a browser, the characters are garbled and they look... Russian... and the encoding is set to "windows-1251" instead of "windows-1250" as it should be.
So I added this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//CZ" "http://www.w3.org/TR/html4/strict.dtd">
And this:
<meta charset="windows-1250">
But it didn't help. It still looks Russian. So, could you please help me?
TL;DR version:
Shows "dnщ zbэvб do zaибtku novй шady" instead of "dnů zbývá do začátku nové řady"
Thank you very much!
You could use UTF-8. Make sure your editor is also saving the file as UTF-8. Read this; it helped me a lot.
Also, for HTML-4, you need something more like this <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
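The garbled sample in the question is consistent with bytes saved as windows-1250 (Central European) being decoded as windows-1251 (Cyrillic). A small Python sketch reproduces it; cp1250 and cp1251 are Python's codec names for those two code pages, and the variable names are only illustrative:

```python
# The intended Czech text from the question.
czech = "dnů zbývá do začátku nové řady"

# Save it the way the editor apparently did: as windows-1250 bytes.
raw = czech.encode("cp1250")

# Decode those bytes the way the browser did: as windows-1251.
garbled = raw.decode("cp1251")
print(garbled)  # dnщ zbэvб do zaибtku novй шady

# Decoding with the matching code page gives the text back unchanged.
assert raw.decode("cp1250") == czech
```

So the file itself is fine; the browser is simply being told (or is guessing) the wrong code page. Saving as UTF-8 and declaring UTF-8 avoids the ambiguity altogether.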

Phantomjs text symbols not displayed with html to pdf

Hi, I'm running PhantomJS on CentOS 6, and special text symbols such as ⊥ (up tack, U+22A5) and ∩ (intersection, U+2229) are not displayed in the PDF output. PhantomJS on my old server worked fine. Do I need to install special fonts on the new server?
I found my answer by adding this:
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
My reference:
https://jsreport.net/blog/national-characters-in-phantom-pdf-recipe

Encoding UTF-8 for Czech chars

As a beginner, I want to ask which basic document-encoding settings you use for UTF-8.
Below is how I do it; please correct me if something is wrong. I want the text to render correctly on all devices, in different browsers, and with different user settings, so I do the following:
I use Notepad++; first, in the Format tab, I choose "change the encoding to UTF-8" (if it isn't UTF-8 already).
Because I mostly use <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> or <!DOCTYPE html>, I then pick the matching meta tag in the head: either <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/> or <meta charset="UTF-8" />, respectively.
I'm concerned mainly about the Czech characters
Am I right, or isn't it that simple if I expect HTML, PHP, JS, and maybe MySQL to work together?
Thank you for your answers, and sorry for my imperfect English.
If you read text from a database, make sure that the database is set to utf8 and that the columns are as well. Then you can use SET NAMES UTF8 to make sure the connection encoding is utf8 too. Just make it your first query to the database.

why "»" shows as a question mark("?") in my page?

Are there any restrictions that keep it from showing normally?
Sounds like an encoding problem. For special characters like that, I prefer to use HTML entities. In this case, try &raquo;
In my experience, a question mark usually replaces special characters that cannot be decoded, which happens when the page's actual encoding and the encoding the browser assumes (often ISO Latin-1 by default) do not match. You can, and should, explicitly declare the encoding of your web page using the following directive:
<?xml version="1.0" encoding="UTF-8" ?>
for xhtml, or
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
(inside the <head> element), for HTML.
Regard this post as a supplement, because I guess that using XML/HTML entities like &raquo; or &#187;, as mentioned above, is the better way to go.
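What the reader actually sees depends on which way the mismatch goes. The following Python sketch shows both directions for »; it illustrates the general mechanism and is not a diagnosis of the asker's page, and the variable names are only illustrative:

```python
raquo = "\u00bb"  # the character from the question: »

utf8_bytes = raquo.encode("utf-8")      # two bytes: 0xC2 0xBB
latin1_bytes = raquo.encode("latin-1")  # one byte:  0xBB

# Page saved as UTF-8 but decoded as Latin-1: an extra character appears.
seen_as_latin1 = utf8_bytes.decode("latin-1")
print(seen_as_latin1)  # Â»

# Page saved as Latin-1 but decoded as UTF-8: the lone 0xBB byte is invalid,
# so the decoder substitutes U+FFFD, which browsers draw as a question mark
# (often inside a diamond).
seen_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(seen_as_utf8 == "\ufffd")  # True
```

A question mark therefore tends to point at a file saved in a single-byte encoding that is being served or declared as UTF-8.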
You can also use &#187;
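If you want to sidestep the encoding question for a handful of characters, every non-ASCII character can be written as a numeric character reference, which leaves the page pure ASCII. A short Python sketch using only the standard library; the helper name to_ascii_html is hypothetical:

```python
import html


def to_ascii_html(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_html("next »"))  # next &#187;

# The named and the numeric form decode to the same character.
assert html.unescape("&raquo;") == html.unescape("&#187;") == "»"
```

This only changes non-ASCII characters; markup characters such as < and & still need their usual escaping.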
If your Apache server is configured with...
AddDefaultCharset UTF-8
...in the httpd.conf file (which, strangely, was the default on my server), then Content-Type specs in the .html files (e.g., <meta http-equiv=Content-Type content="text/html; charset=windows-1252">) will be ignored, causing character codes above 127 to be interpreted incorrectly.
Comment out the AddDefaultCharset line and restart Apache.
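The reason the meta tag is ignored is precedence: when the HTTP Content-Type header carries a charset, browsers use it in preference to a meta declaration in the document. A simplified Python sketch of that rule follows; it ignores byte order marks and browser sniffing, and the function names are made up for illustration:

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased, or None if absent


def effective_charset(http_header: Optional[str], meta_charset: Optional[str]) -> Optional[str]:
    """Simplified precedence: HTTP header charset first, then the meta tag."""
    header_charset = charset_from_content_type(http_header) if http_header else None
    return header_charset or meta_charset


# With AddDefaultCharset UTF-8, the server sends a charset and the meta tag loses.
print(effective_charset("text/html; charset=UTF-8", "windows-1252"))  # utf-8

# Without a charset in the header, the meta declaration is used.
print(effective_charset("text/html", "windows-1252"))  # windows-1252
```

An alternative to removing AddDefaultCharset is to keep it and save the pages as UTF-8, so that the header and the file contents agree.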