Tumblr: PostSummary Outputs HTML Codes - tumblr

When I am trying to use these lines:
{block:PostSummary}
{PostSummary}
{/block:PostSummary}
I got HTML codes, not an excerpted pure text!
It's a big problem on Title and Meta:Description elements.
What should I do?
Thank you!!

Have a a look at http://www.tumblr.com/docs/en/custom_themes#variable-transformations
By prefixing variables with special transformation keywords, Tumblr will output variables in specialized formats — useful when passing data to Javascript, etc.
[…]
Plaintext
Prefix any theme variable with Plaintext to output the string with HTML-tags stripped and appropriate characters converted to HTML-entities so they’re safe to include in HTML attributes, etc..
So it'd probably look like:
{block:PostSummary}
{PlaintextPostSummary}
{/block:PostSummary}

Related

strikethrough code in markdown on github

I am talking about github markdown here, for files like README.md.
Question:
Is it possible to strikethrough a complete code block in markdown on github?
I know how to mark text as a block of code
this is
multiline code
and
this
this
also
by indenting by 4 spaces or by using ``` or `...
I also know how to strike through texts using
del tag
s tag
~~
Temporary solution:
Independently they work fine, but together not as expected or desired. I tried several combinations of the above mentioned.
For now, I use this:
striked
through
by using ~~ and ` for every single line.
Requirement:
I would like to have a code formatted text striked through, where the code block is continuous:
unfortunately, this is
not striked through
or at least with only a small paragraph in between:
unfortunately, also not
striked through
Is this possible at all?
I found some old posts and hints on using jekyll, but what I was searching for is a simple way, preferably in markdown.
This would only be possible with raw HTML, which GitHub doesn't allow. But you may be able to use a diff instead.
Code blocks are for "pre-formatted" text only. The only formatting you can get in a code block is the formatting that can be represented in plain text (indentation, capitalization, etc). There is no mechanism to mark up the content of a code block (as bold, italic, stricken, underlined, etc). This was an intentional design decision. Otherwise, how would you be able to show Markdown text in a code block? If you want formatted text, then you need to use something other than a code block.
As the rules state:
HTML is a publishing format; Markdown is a writing format. Thus, Markdown’s formatting syntax only addresses issues that can be conveyed in plain text.
For any markup that is not covered by Markdown’s syntax, you simply use HTML itself.
Therefore you would need to format your own custom HTML code block with the various bits marked up properly:
<pre><code><del>some stricken code</del>
<del>A second line of stricken code</del>
</code></pre>
However, for security reasons, GitHub will strip out any such raw HTML in your Markdown. So while this works where you have full control of the entire stack, on a hosted service it is most likely not possible.
However, I'm assuming you want to show some changes made to a block of code. As it turns out, a specific format already exists for that, namely, a diff. Just use a fenced code block with diff as the language and GitHub will format it correctly:
```diff
Unchanged Line
- Removed Line
+ Added Line
```
You can see how GitHub displays the above code block live (you can also see that in raw), but I've included a screenshot below for convenience.
I realize that the formatting does not use strike-through, but it does use a commonly used and understood format. For more complex blocks, you should probably use the diff utility program to generate the diff for you.
Expanding on Waylan's answer:
This may be obvious to others, but it caught me. When you have indented lines, be sure + or - is the first character on the line or it won't highlight.
```diff
<div>
Unchanged Line
<ul>
- <li>This won't work</li>
- <li>This will</li>
+ <li>1st character, then indent</li>
</ul>
</div>
```
After much much trying, I finally got it to work! It boils down to this:
inside ``` block, nothing is rendered (other than syntax highlights for language specified)
inside <code> block, markdown won't render, only HTML. You can use <strike>. It's fine, but you don't get the syntax coloring
now for the magic: use HTML for striking, and markdown for coloring:
<strike>
```language
this is
multiline code
```
</strike>
P.S. ``` blocks should always be surrounded by blank lines to work
On the subject of marking up the content of a code block, to tack an italicized string on to the end of a line of "code", try something like:
<code>id\_pn\_aside\_subscriber\_form\__form\_id_</code>
(You can see this in action at: https://github.com/devonostendorf/post-notif#how-do-you-use-the-stylesheet_filename-attribute-with-the-shortcode)
I had a hard time finding an example that matched this precise use case, so I hope this proves useful for anyone else trying to accomplish a similar effect.

Superscript within code block in Github Markdown

The <sup></sup> tag is used for superscripts. Creating a code block is done with backticks. The issue I have is when I try to create a superscript within a code block, it prints out the <sup></sup> tag instead of formatting the text between the tag.
How do I have superscript text formatted correctly when it's between backticks?
Post solution edit
Desired output:
A2 instead of A<sup>2</sup>
This is not possible unless you use raw HTML.
The rules specifically state:
With a code span, ampersands and angle brackets are encoded as HTML entities automatically, which makes it easy to include example HTML tags.
In other words, it is not possible to use HTML to format text in a code span. In fact, a code span is plain, unformatted text. Having any of that text appear as a superscript would mean it is not plain, unformatted text. Thus, this is not possible by design.
However, the rules also state:
Markdown is not a replacement for HTML, or even close to it. Its
syntax is very small, corresponding only to a very small subset of
HTML tags. The idea is not to create a syntax that makes it easier
to insert HTML tags. In my opinion, HTML tags are already easy to
insert. The idea for Markdown is to make it easy to read, write, and
edit prose. HTML is a publishing format; Markdown is a writing
format. Thus, Markdown's formatting syntax only addresses issues that
can be conveyed in plain text.
For any markup that is not covered by Markdown's syntax, you simply
use HTML itself. ...
So, if you really need some text in a code span to be in superscript, then use raw HTML for the entire span (be sure to escape things manually as required):
<code>A code span with <sup>superscript</sup> text and escaped characters: "<&>".</code>
Which renders as:
A code span with superscript text and escaped characters: "<&>".
This is expected behaviour:
Markdown wraps a code block in both <pre> and <code> tags.
You can use Unicode superscript and subscript characters within code blocks:
class SomeClass¹ {
}
Inputting these characters will depend on your operating system and configuration. I like to use compose key sequences on my Linux machines. As a last resort you should be able to copy and paste them from something like the Wikipedia page mentioned above.
¹Some interesting footnote, e.g. referencing MDN on <pre> and <code> tags.
If you're luck, the characters you want to superscript (or subscript) may have dedicated codepoints in Unicode. These will work inside codeblocks, as demonstrated in your question, where you include A² in backticks. Eg:
Water (chemical formula H₂O) is transparent, tasteless and odourless.
I've listed out the super and subscript Unicode characters in this Gist. You should be able to copy and paste any you need from there.

Is sanitizing html by removing angle brackets safe?

I want to sanitize a simple text field with a person's name, to protect from XSS and such. Stackoverflow pretty much says I must whitelist. I don't understand this. If I simply remove all < and > from the input value, or replace them with > and &ls;, does not that rule out code injection? Or am I missing something? Perhaps you only need to whitelist in more complex scenarios where you have to put up with angular brackets?
Sorry if it's a silly question, it's important to get this right.
Whether to whitelist or encode depends on how you want to use the text.
If you intend to treat the input as plain text, then encoding special characters is enough, and any HTML code entered will display as text only as long as you are careful not to allow unencoded text to end up anywhere in your HTML output. (This includes making sure any other systems you interface with don’t inappropriately use the unencoded text.)
If you want to allow some markup in the input, such as text styling or links, then you must whitelist the tags that you allow and get rid of all others.
No, it's not sufficient because if you were to include the person's name in an html attribute, you would also need to escape any double-quotes contained therein.

How to discover what codepage to use when converting RTF hex literals to Unicode

I'm parsing RTF 1.5+ files generated by Word 2003+ that may have content from other languages. This content is usually encoded as hex literals (\'xx). I would like to convert these literals to unicode values.
I know my document's code page by looking for ansicpg (\ansi\ansicpg1252).
When I use the ansicpg codepage to decode to Unicode, many languages (like French) seem to convert to the Unicode char values that I expect.
However when I see Russian text (like below), codepage 1252 decodes the content to jibberish.
\f277\lang1049\langfe1033\langnp1049\insrsid5989826\charrsid6817286
\'d1\'f2\'f0\'e0\'ed\'e8\'f6\'fb \'e1\'e5\'e7 \'ed\'e0\'e7\'e2\'e0\'ed\'e8\'ff. \'dd\'f2
\'e0 \'f1\'f2\'f0\'e0\'ed\'e8\'f6\'e0 \'ed\'e5 \'e4\'ee\'eb\'e6\'ed\'e0
\'ee\'f2\'ee\'e1\'f0\'e0\'e6\'e0\'f2\'fc\'f1\'ff \'e2 \'f2\'e0\'e1\'eb\'e8\'f6\'e5
\'e2 \'f1\'ee\'e4\'e5\'f0\'e6\'e0\'ed\'e8\'e8.
I assume that lang1049, langfe1033, langnp1049 should provide me clues so I can programmatically choose a different (non-default) code page for the text that they reference? If so, where can I find information that explains how to map a lang* code to a codepage? Or should I be looking for some other RTF command/directive to provide me with the information I'm looking for? (Or must I use \f277 as a font reference and see if it has an associated codepage?)
\lang really only marks up particular stretches of the text as being in a particular language, and shouldn't impact what code page is to be used for the old non-Unicode \' escapes.
Putting an \ansicpg token in the header should perhaps do it, but seems to be ignored by Word (for both raw bytes and \' escapes.
Or must I use \f277 as a font reference and see if it has an associated codepage?
It looks that way. Changing the \fcharset of the font assigned to a particular stretch of text is the only way I can get Word to change how it treats the bytes, anyway. The codes in this token (see eg here for list) are, aggravatingly, different again from either the language ID or the code page number.
It is not so clear but you can use the RichEdit control in order to convert the RTF to UTF-8 format according to the MSDN:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb774304(v=vs.85).aspx
Take a look to the SF_USECODEPAGE for the EM_STREAMOUT message.

NSURL doesn't work any time

i have the following problem sometimes my openURL-Dialog works perfectly, then i looked at the variable from the url and that is the variable:
www.brehm-gmbh.de
but some other times there are some crazy elements at the end of the variable like this:
www.adamczyk-fenster.de%E2%80%8E
i get this pages from an .asc file and both are in this file normal without this elements,
what can i do to solve this problem?
thank you all for helping beforehand
From Wikipedia:
The left-to-right mark (LRM) is a
control character or non-printing
character, used in the computerized
typesetting of bi-directional text,
containing mixed left-to-right scripts
(such as English and Russian) and
right-to-left scripts (such as Arabic
and Hebrew). It is used to change the
way adjacent characters are grouped
with respect to text direction.
You're getting this because (1) you've got non-English URLs, are composing URLs from non-English strings or you have some other non-English elements and the string encoding is attempting to compensate or (2) it's garbarge being interpreted as an encoding (unlikely if it is consistant.)
Call -[NSString localizedNameOfStringEncoding] on the string before you use it see what encoding it is using. You probably need to explicitly establish an encoding when you read in the strings before you put them in the NSURL.