Markdown: less sign and back slash interpretation - github

I'm writing a README.md file on GitHub.
In my project explanation I would like to refer to a Windows user path, in particular:
C:\Users\<username>\AppData\Local
I just see here on SO that the formatting is working fine. On Github, instead, that row is shown like this:
C:\Users<username>\AppData\Local\
even if the md file contains the correct format. It seems that the \ and the < are (due to some rule I don't know) simply converted into <.
Do you have any clue why is happening?

Markdown allowd HTML entities in it, so '<' is <, '\' is e.g. \ (also \\ works for me on GitHub) and '>' is > etc.; search for HTML special characters if you need more. you can encode verything in UTF8 easily by the notation with &#dddd; where d stands for a digit.
unfortunately often in code segments or other interpreters it is different.
examples working in GitHub editor with preview:
C:\\Users\\\<username\>\\AppData\\Local
C:\\Users\\<username>\\AppData\\Local
C:\Users\<username>\AppData\Local
all result to the same output.

Related

GitHub ReadMe.md Showing Some Images but not Others

So I was working on a little personal project with lots of pretty graphs in Jupyter Notebook. I made a .ipynb markdown file to make the ReadMe, since I can actually view the rReadMe as it will appear in the .md file when I move it over to GitHub, and it all worked fine and dandy.
And then I transferred it all to GitHub, and for some reason, 5 of the 7 graphs are refusing to appear.
https://github.com/nmwhitehead/Healthcare-Costs
If you can't view the code in the Read Me.ipnyb file, you'll have to take me work that all of the images would input with the same format of <img src='Graphs/Healthcare_Cost.png'/>. Every time I search how to fix this problem, everything says it's gotta be a formatting thing, but <img src='Graphs/50%_Predictor.png'/> is the same format and it's not appearing and I am losing my mind here.
Your files contain a % symbol in their name. This file name is simply copied in the URL link of the image.
However, in URL encoding, the percent symbol is special. You need to use %25 in place of %. Doing so in the URL fixes the link.
% is a URL metacharacter and needs to be escaped. It lets you represent any ASCII character by following with the hex representation of an ASCII character. You use it to represent metacharacters like a space %20 or / as %2f or % as %25.
https://github.com/nmwhitehead/Healthcare-Costs/raw/master/Graphs/50%_Predictor.png means... well %_P is nonsense. Though some servers might accept it as a literal %_P, Github is following the standard and does not. The URL has incorrect syntax so you're getting a 400 Bad Request.
The % needs to be encoded as %25. https://raw.githubusercontent.com/nmwhitehead/Healthcare-Costs/master/Graphs/50%25_Predictor.png
It's probably best to avoid this gotcha and rename the files. 50pct_Predictor.png, for example.

Escape newline in source

There's a huge wiki page in our dokuwiki I have the pleasure of having to edit, the problem is that most of it is a giant table.
I know you can insert a newline into the result without writing a newline in the markup (which would be interpreted as a paragraph change), but all I want to do is put line breaks in the source and not have it affect the wiki page at all (so it's easier for editors to read, like an html table I suppose, where literal newlines are ignored).
So is there some syntax available to escape a newline in dokuwiki, not unlike \ for bash or ^ for DOS?
What you ask is afaik not possible in the dokuwiki core.
One alternative would be the edittable-plugin. It lets you edit your table within an actual table-editor. However the current version is somewhat flaky in very wide tables. Should that be an issue for you, then you could try an older version from June 2015.

Superscript in markdown (Github flavored)?

Following this lead, I tried this in a Github README.md:
<span style="vertical-align: baseline; position: relative;top: -0.5em;>text in superscript</span>
Does not work, the text appears as normal. Help?
Use the <sup></sup>tag (<sub></sub> is the equivalent for subscripts). See this gist for an example.
You have a few options for this. The answer depends on exactly what you're trying to do, how readable you want the content to be when viewed as Markdown and where your content will be rendered:
HTML Tags
As others have said, <sup> and <sub> tags work well for arbitrary text. Embedding HTML in a Markdown document like this is well supported so this approach should work with most tools that render Markdown.
Personally, I find HTML impairs the readable of Markdown somewhat, when working with it "bare" (eg. in a text editor) but small tags like this aren't too bad.
LaTeX (New!)
As of May 2022, GitHub supports embedding LaTeX expressions in Markdown docs directly. This gives us new way to render arbitrary text as superscript or subscript in GitHub flavoured Markdown, and it works quite well.
LaTeX expressions are delineated by $$ for blocks or $ for inline expressions. In LaTeX you indicate superscript with the ^ and subscript with _. Curly braces ({ and }) can be used to group characters. You also need to escape spaces with a backslash. The GitHub implementation uses MathJax so see their docs for what else is possible.
You can use super or subscript for mathematical expressions that require it, eg:
$$e^{-\frac{t}{RC}}$$
Which renders as..
Or render arbitrary text as super or subscript inline, eg:
And so it was indeed: she was now only $_{ten\ inches\ high}$, and her face brightened up at the thought that she was now the right size for going through the little door into that lovely garden.
Which renders as..
I've put a few other examples here in a Gist.
Unicode
If the superscript (or subscript) you need is of a mathematical nature, Unicode may well have you covered.
I've compiled a list of all the Unicode super and subscript characters I could identify in this gist. Some of the more common/useful ones are:
⁰ SUPERSCRIPT ZERO (U+2070)
¹ SUPERSCRIPT ONE (U+00B9)
² SUPERSCRIPT TWO (U+00B2)
³ SUPERSCRIPT THREE (U+00B3)
ⁿ SUPERSCRIPT LATIN SMALL LETTER N (U+207F)
People also often reach for <sup> and <sub> tags in an attempt to render specific symbols like these:
™ TRADE MARK SIGN (U+2122)
® REGISTERED SIGN (U+00AE)
℠ SERVICE MARK (U+2120)
Assuming your editor supports Unicode, you can copy and paste the characters above directly into your document or find them in your systems emoji and symbols picker.
On MacOS, simultaneously press the Command ⌘ + Control + Space keys to open the emoji picker. You can browse or search, or click the small icon in the top right to open the more advanced Character Viewer.
On Windows, you can a emoji and symbol picker by pressing ⊞ Windows + ..
Alternatively, if you're putting these characters in an HTML document, you could use the hex values above in an HTML character escape. Eg, ² instead of ². This works with GitHub (and should work anywhere else your Markdown is rendered to HTML) but is less readable when presented as raw text.
Images
If your requirements are especially unusual, you can always just inline an image. The GitHub supported syntax is:
![Alt text goes here, if you'd like](path/to/image.png)
You can use a full path (eg. starting with https:// or http://) but it's often easier to use a relative path, which will load the image from the repo, relative to the Markdown document.
If you happen to know LaTeX (or want to learn it) you could do just about any text manipulation imaginable and render it to an image. Sites like Quicklatex make this quite easy. Of course, if you know your document will be rendered on GitHub, you can use the new (2022) embedded LaTeX syntax discussed earlier)
Comments about previous answers
The universal solution is using the HTML tag <sup>, as suggested in the main answer.
However, the idea behind Markdown is precisely to avoid the use of such tags:
The document should look nice as plain text, not only when rendered.
Another answer proposes using Unicode characters, which makes the document look nice as a plain text document but could reduce compatibility.
Finally, I would like to remember the simplest solution for some documents: the character ^.
Some Markdown implementation (e.g. MacDown in macOS) interprets the caret as an instruction for superscript.
Ex.
Sin^2 + Cos^2 = 1
Clearly, Stack Overflow does not interpret the caret as a superscript instruction. However, the text is comprehensible, and this is what really matters when using Markdown.
If you only need superscript numbers, you can use pure Unicode. It provides all numbers plus several additional characters as superscripts:
x⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿⁱ
However, it might be that the chosen font does not support them, so be sure to check the rendered output.
In fact, there are even quite a few superscript letters, however, their intended use might not be for superscript, and font support might be even worse. Use your own judgement.

Decoding Korean text files from the 90s

I have a collection of .html files created in the mid-90s, which include a significant ammount of Korean text. The HTML lacks character set metadata, so of course all of the Korean text now does not render properly. The following examples will all make use of the same excerpt of text .
In text editors such as Coda and Text Wrangler the text displays as
╙╦ ╝№бя└К ▓щ╥НВь╕цль▒Ф ▓щ╥НВь╕цль▒Ф
Which in the absence of character set metadata in < head > is rendered by the browser as:
ÓË ¼ü¡ïÀŠ ²éÒ‚ì¸æ«ì±” ²éÒ‚ì¸æ«ì±”
Adding euc-kr metadata to < head >
<meta http-equiv="Content-Type" content="text/html; charset=euc-kr">
Yields the following, which is illegible nonsense (verified by a native speaker):
沓 숩∽핅 꿴�귥멩レ콛 꿴�귥멩レ콛
I have tried this approach with all historic Korean character sets, each yielding similarly unsuccessful results. I also tried parsing and upgrading to UTF-8, via Beautiful Soup, which also failed.
Viewing the files in Emacs seems promising, as it reveals the text encoding a lower level. The following is the same sample of text:
\323\313 \274\374\241\357\300\212
\262\351\322\215\202\354\270\346\253\354\261\224 \262\3\
51\322\215\202\354\270\346\253\354\261\224
How can I identify this text encoding and promote it to UTF-8?
All of those octal codes that emacs revealed are less than 254 (or \376 in octal), so it looks like one of those old pre-Unicode fonts that just used it's own mapping in the ASCII range. If this is right, you'll just have to try to figure out what font it was intended for, find it and perhaps do the conversion yourself.
It's a pain. Many years ago I did something similar for some popular pre-Unicode Greek fonts: http://litot.es/unicode-converter/ (the code: https://github.com/seanredmond/Encoding-Converter)
In the end, it is about finding the correct character encoding and using iconv.
iconv --list
displays all available encodings. Grepping for "KR" reveals at least my system can do CSEUCKR, CSISO2022KR, EUC-KR, ISO-2022-KR and ISO646-KR. Korean is also BIG5HKSCS, CSKSC5636 and KSC5636 according to Wikipedia. Try them all until something reasonable pops out.
Even if this thread is old, it's still an issue, and not having found a way to convert the files in bulk (outside of using a Korean version of Windows7), now I'm using Naver, which has a cloud service like Google docs and if you upload those weirdly encoded files there, it deals with them very well. I just edit and copy the text, and it's back to being standard when I copy it elsewhere.
Not the kind of solution I like, but it might save a few passers-by.
You can register for the cloud account with an ID, even if you do not live in SKorea by the way, there's some minimal english to get by.

Japanese characters in a latex \section{} cause an error

I am working on getting Japanese documents created with latex. I have installed the latest version of texlive-2008 which includes CJK.
In my document I have the following:
\documentclass{class}
\usepackage{CJK}
\begin{document}
\begin{CJK*}{UTF8}{min}
\title{[Japanese Characters here 1]}
\maketitle
\section{[Japanese Characters here 2]}
[Japanese Characters here 3]
\end{CJK*}
\end{document}
In the above code there are 3 locations Japanese characters are used.
1 + 3 work fine whereas 2, which contains Japanese characters in a \section{} fails with the following error.
! Argument of \#sect has an extra }.
After some research it turns out this error manifests when you’ve put a fragile command inside a moving argument. A moving argument because section can be moved to a contents page for example.
Does anyone know how to get this to work and why latex thinks Japanese characters are "fragile".
Sorry to post this as an answer rather than a comment to your answer; I don't have enough rep yet to comment. (EDIT: Now I have enough rep to comment, but I'm not sorry anymore. Thanks Will.)
Your solution of replacing
\section{[Japanese Text]}
with
\section{\texorpdfstring{[Japanese Text]}{}}
suggests that you're using the hyperref package. When you use the hyperref package, any sort of not-totally-boring text (e.g. math) within \section causes a problem because \section is having trouble generating pdf bookmarks. \texorpdfstring allows you to specify how you want the section title to appear in the pdf bookmark. For example, I might write
\section{Calculation of \texorpdfstring{$H_2(\mathcal{X})$}{H\_2(X)}}
if I want the section title to be "Calculation of $H_2(\mathcal{X})$" but I want the pdf bookmark to be "Calculation of H_2(X)".
You should probably use xetex/xelatex, as it has been created to support unicode. The change is sometimes not easy for already existing documents, though. (xelatex should be included in texlive, it is just different executable to call -- this is how it is done in Debian).
I have managed to get this working now!
Using Latex and CJK as before.
\section{[Japanese Text]}
was replaced with
\section{\texorpdfstring{[Japanese Text]}{}}
Now the contents pages and section titles work and update fine.