regex_replace to replace certain html tags - perl

Is there a way to convert BR tags and/or DIV tags to new lines so the text will format correctly when I use it in a mailto? I was thinking I should look for any P, DIV, and BR tags and replace them with a newline character: anywhere there is a closing tag, put the newline character, and remove the opening tag. After that I will remove the rest of the HTML with remove_html="1", but I want to keep the paragraph format.
I thought it could be done using regex_replace, but I'm not sure how to write it. Does anyone know?

Do not parse HTML with regular expressions. Use an HTML parser module (HTML::TreeBuilder or something similar that can make in-place changes) or, in this case, even better, use XSLT transformations.
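To illustrate the parser-based approach, here is a minimal sketch in Python rather than Perl, using only the standard library's html.parser. The names NewlineExtractor, BLOCK_TAGS and html_to_text are made up for this example, and the rule "newline on <br> and on closing </p> or </div>" is taken from the question, not from any Movable Type feature.

```python
from html.parser import HTMLParser

# Closing one of these tags ends a paragraph-like block (assumption from the question).
BLOCK_TAGS = {"p", "div"}


class NewlineExtractor(HTMLParser):
    """Collect text content, emitting a newline for <br> and for </p> / </div>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> and <br /> both arrive here; opening <p>/<div> are simply dropped.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    """Return the text of `markup` with paragraph breaks kept as newlines."""
    parser = NewlineExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)
```

For a mailto link the resulting text would still need URL-encoding (for example with urllib.parse.quote) so that the newlines survive.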

Superscript within code block in Github Markdown

The <sup></sup> tag is used for superscripts. Creating a code block is done with backticks. The issue I have is when I try to create a superscript within a code block, it prints out the <sup></sup> tag instead of formatting the text between the tag.
How do I have superscript text formatted correctly when it's between backticks?
Post solution edit
Desired output:
A² instead of A<sup>2</sup>
This is not possible unless you use raw HTML.
The rules specifically state:
With a code span, ampersands and angle brackets are encoded as HTML entities automatically, which makes it easy to include example HTML tags.
In other words, it is not possible to use HTML to format text in a code span. In fact, a code span is plain, unformatted text. Having any of that text appear as a superscript would mean it is not plain, unformatted text. Thus, this is not possible by design.
However, the rules also state:
Markdown is not a replacement for HTML, or even close to it. Its
syntax is very small, corresponding only to a very small subset of
HTML tags. The idea is not to create a syntax that makes it easier
to insert HTML tags. In my opinion, HTML tags are already easy to
insert. The idea for Markdown is to make it easy to read, write, and
edit prose. HTML is a publishing format; Markdown is a writing
format. Thus, Markdown's formatting syntax only addresses issues that
can be conveyed in plain text.
For any markup that is not covered by Markdown's syntax, you simply
use HTML itself. ...
So, if you really need some text in a code span to be in superscript, then use raw HTML for the entire span (be sure to escape things manually as required):
<code>A code span with <sup>superscript</sup> text and escaped characters: "&lt;&amp;&gt;".</code>
Which renders as:
A code span with superscript text and escaped characters: "<&>".
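If you generate such spans programmatically, the manual escaping can be automated. This is only a sketch in Python: code_span_with_sup is a hypothetical helper, not part of any Markdown library, and it assumes a single superscript run per span.

```python
import html


def code_span_with_sup(before, sup, after=""):
    """Build a raw <code> span: text is escaped by hand, <sup> stays as markup."""
    return "<code>{}<sup>{}</sup>{}</code>".format(
        html.escape(before, quote=False),  # escapes &, < and >
        html.escape(sup, quote=False),
        html.escape(after, quote=False),
    )
```

The returned string is raw HTML, so it goes into the Markdown source as-is, without backticks.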
This is expected behaviour:
Markdown wraps a code block in both <pre> and <code> tags.
You can use Unicode superscript and subscript characters within code blocks:
class SomeClass¹ {
}
Inputting these characters will depend on your operating system and configuration. I like to use compose key sequences on my Linux machines. As a last resort you should be able to copy and paste them from something like the Wikipedia page mentioned above.
¹Some interesting footnote, e.g. referencing MDN on <pre> and <code> tags.
If you're lucky, the characters you want to superscript (or subscript) may have dedicated code points in Unicode. These will work inside code blocks, as demonstrated in your question, where you include A² in backticks. E.g.:
Water (chemical formula H₂O) is transparent, tasteless and odourless.
I've listed out the super and subscript Unicode characters in this Gist. You should be able to copy and paste any you need from there.
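If copying and pasting gets tedious, the substitution can be scripted. The sketch below is Python and covers only digits and a few operators, which do have dedicated Unicode code points; the names SUPERSCRIPT, SUBSCRIPT, to_superscript and to_subscript are invented for this example.

```python
# Translation tables from ASCII characters to their Unicode super/subscript forms.
SUPERSCRIPT = str.maketrans("0123456789+-=()", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾")
SUBSCRIPT = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")


def to_superscript(text):
    """Replace every character that has a superscript form; leave the rest alone."""
    return text.translate(SUPERSCRIPT)


def to_subscript(text):
    """Replace every character that has a subscript form; leave the rest alone."""
    return text.translate(SUBSCRIPT)


# Example: build the strings used in this thread.
squared = "A" + to_superscript("2")
water = "H" + to_subscript("2") + "O"
```

Characters without a dedicated code point pass through unchanged, so most letters cannot be converted this way.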

Minify HTML files in text/html templates

I use mustache/handlebar templates.
eg:
<script id="contact-detail-template" type="text/html">
<div>... content to be compressed </div>
</script>
I am looking to compress/minify my HTML files in the templates for the best compression.
YUI Compressor and Closure do not work: they treat the content as script and give me script errors.
HTMLCompressor does not touch the templates either, as it also thinks the content is a script.
How do I minify the content in the script tags with type text/html?
Can I use a library?
If not, is sed or egrep a preferable way? Do you have sed/egrep syntax to remove empty lines (with just spaces or tabs), remove all tabs, trim extra spaces?
Thanks.
sed -e 's/^[ \t]*//' -e '/^$/d' yourfile
This will remove all the leading spaces and tabs from each line, and remove all empty lines.
sed -e ':a;N;$!ba;s/\n[ \t]*//g' -e 's/^[ \t]*//' yourfile
This will remove all the leading spaces and tabs, and concatenate all your code onto one line.
Sorry if I missed something.
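If sed is not available in your build, the same steps (strip leading spaces and tabs, drop empty lines, optionally join everything) are easy to sketch in Python. strip_lines and its join parameter are names invented for this example.

```python
def strip_lines(text, join=False):
    """Strip leading spaces/tabs from each line and drop lines left empty.

    With join=True the remaining lines are concatenated without newlines.
    """
    lines = [line.lstrip(" \t") for line in text.splitlines()]
    lines = [line for line in lines if line]  # discard empty lines
    separator = "" if join else "\n"
    return separator.join(lines)
```

Joining lines is only safe when no template relies on a line break as whitespace between words.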
Use sed ':a;N;$!ba;s/>\s*</></g' file to remove whitespace and newlines where they are not needed. Unlike ghaschel's example, this doesn't remove the useful whitespace at the beginning of a line, so the text inside <pre> and <p> tags is preserved.
This is useful because it removes the whitespace between > and <, which is a common cause of enlarged HTML files. The same command could also be used on an XML file, such as an Atom or RSS feed.
I personally use this as a pipe in my site generator; it reduces the size of a normal file and can be used in conjunction with gzip.
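The same substitution, sketched in Python for anyone who wants it inside a build script. collapse_between_tags is a hypothetical name; the pattern is the one from the sed command, except that it only fires when there is at least one whitespace character to remove.

```python
import re

# Whitespace (including newlines) that sits directly between two tags.
_BETWEEN_TAGS = re.compile(r">\s+<")


def collapse_between_tags(markup):
    """Remove whitespace between a closing '>' and the next opening '<'."""
    return _BETWEEN_TAGS.sub("><", markup)
```

Note that this also removes a single space between two inline elements, which can change how the text renders, so check the output on a real template first.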
Try using Pretty Diff to minify this kind of code. It will only assume the stuff inside script tags is JavaScript if there is no mime type or if the type is one of the various JavaScript types. It is also intelligent enough to know which white space is okay to remove without corrupting the output of content or the recursive beautification of code later.

Only display one paragraph of text

You can set what the Facebook Share preview says. I would like it to be the first paragraph of my Movable Type entry. The people who make entries sometimes use <p> tags, or they use the rich editor, which puts in two <br /><br /> tags to separate paragraphs.
Is there a way I can have Movable Type detect when the first paragraph ends and display only the first paragraph? I would like to add that to my entry template so it adds some information to my head.
EntryBody has a lot of attributes to help format the output of the tag. You can use those to change the content so it shows up correctly in HTML, JavaScript, PHP, XML or other forms of output.
If you understand how to use regular expressions, you can use that and an additional language, say PHP, to break the body up into an array and only output the first paragraph or element of the array.
The simplest thing, though, I would think, would be to do something like
<mt:EntryBody words="100">
That will cut off the entry body after the first 100 words. You could also require users to upload an excerpt with the entry and use the entry excerpt for Facebook, instead.
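For the regular-expression route, here is a rough sketch in Python rather than PHP. It assumes a paragraph ends either at </p> or at two consecutive <br> tags, as described in the question; first_paragraph and the pattern names are made up for this example, and the tag stripping is deliberately naive.

```python
import re

# A paragraph ends at </p> or at two (or more) consecutive <br> tags.
_PARAGRAPH_BREAK = re.compile(r"</p\s*>|(?:<br\s*/?>\s*){2,}", re.IGNORECASE)
# Naive tag matcher, good enough for simple entry bodies.
_TAG = re.compile(r"<[^>]+>")


def first_paragraph(body):
    """Return the text of the first non-empty paragraph in an entry body."""
    for chunk in _PARAGRAPH_BREAK.split(body):
        text = _TAG.sub("", chunk).strip()
        if text:
            return text
    return ""
```

Splitting gives the "array" mentioned above; the function just returns its first non-empty element.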

Decode HTML from XML with NewLine

First I parse XML and retrieve this:
&lt;p&gt;&lt;strong&gt;Berns Salonger - the City's
Then I decode it with MWFeedParser (stringByDecodingHTMLEntities) and retrieve this:
<p><strong>Berns Salonger - the City's Ideal Meeting Place
Note that this is only one of many, many lines, which include a lot of tags.
Then I replace <br> with \n and the console writes out the text with new lines. Everything is great except that all the other HTML tags are still there.
So I then run stringByConvertingHTMLToPlainText and all the HTML tags disappear, but so do my replaced newlines.
How can I decode the HTML without the tags and at the same time replace <br> with \n to print out nicely formatted text in a UITextView?
Instead of replacing <br> with \n, try replacing it with an HTML entity for newline: &#10;. Then, when you call stringByConvertingHTMLToPlainText, it will convert the entity to an actual newline character.
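The idea is independent of MWFeedParser, so here is a language-neutral sketch of it in Python using only the standard library. html_to_plain_text is a made-up name, and the regex tag stripping is a naive stand-in for a real HTML-to-text conversion, not what stringByConvertingHTMLToPlainText actually does internally.

```python
import html
import re

_BR = re.compile(r"<br\s*/?>", re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")


def html_to_plain_text(markup):
    """Strip tags but keep <br> line breaks by routing them through an entity."""
    # 1. Protect line breaks as a numeric entity so tag stripping cannot remove them.
    protected = _BR.sub("&#10;", markup)
    # 2. Strip the remaining tags (naive, for illustration only).
    stripped = _TAG.sub("", protected)
    # 3. Decode entities; the protected &#10; becomes a real newline here.
    return html.unescape(stripped)
```

The order matters: the line breaks have to be converted before the tags are stripped, and entities decoded last.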

with tinymce, how to convert an html tag into a different format

I want to convert an HTML tag that tinymce returns into a different format.
e.g.
The italics tag I want to convert to #i#
Is that possible with the editor itself?
During postback I strip all html tags, so I need it in a different safer format.
Add an onsubmit call to your form and use a simple javascript function to string replace the html tags you want to keep.
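The replacement itself is a one-liner in most languages. Below is a sketch in Python (for example, to run on the server before stripping tags) rather than JavaScript. The #i# form comes from the question; the closing form #/i#, the list of kept tags, and the name tags_to_tokens are assumptions made for this example.

```python
import re

# Opening or closing <i>, <b> or <u> tags without attributes.
_KEEP = re.compile(r"<(/?)(i|b|u)\s*>", re.IGNORECASE)


def tags_to_tokens(markup):
    """Rewrite kept tags as #tag# / #/tag# tokens; other markup is left untouched."""
    return _KEEP.sub(
        lambda m: "#{}{}#".format(m.group(1), m.group(2).lower()),
        markup,
    )
```

Anything still in angle brackets after this step can then be stripped as before, and the tokens converted back to tags on output.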
A more constructive method that might achieve what you want is to use the built-in 'Valid elements' feature of TinyMCE. You can specify exactly which HTML tags you want to keep, and it will strip out anything else. It might also save you the step of stripping out the HTML yourself.
e.g.
valid_elements : "i,b,u",
http://wiki.moxiecode.com/index.php/TinyMCE:Configuration/valid_elements