Constraining block types in draftjs editor - draftjs

In draftjs, plain paragraphs are given a block type of unstyled, although draftjs does define a paragraph type. Similarly, when pasting, the default block type for p or div tags is unstyled.
What is the reason for this design? Is there a way to instead use paragraph as the default block type for normal paragraphs?

Using characters larger than 0xFFFF

I have an OpenType font with some optional glyphs selected by features. I've opened it in FontForge and I can see that the associated unicode code point is, for example, 0x1002a.
Is it possible to use this value to render the glyph in iText? I've tried calling showText() with a string containing the corresponding surrogate pairs ("\uD800\uDC2A") but nothing appears.
Is there another way to do this, or am I barking up the wrong tree?
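As a sanity check on the escape sequence above, here is a minimal Python sketch of the standard UTF-16 surrogate-pair arithmetic. The function name to_surrogate_pair is made up for illustration. It only confirms that the string encodes U+1002A; it says nothing about whether the font or iText maps that code point to a glyph.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


high, low = to_surrogate_pair(0x1002A)
# Format the pair the way it would be written in a Java string literal.
java_escape = "\\u{:04X}\\u{:04X}".format(high, low)
```

For 0x1002a this gives the same pair as in the question, so the string itself looks right and the problem is more likely in the font or in how it is loaded.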

How to generalize special entities

We use Apache UIMA Ruta for processing our documents. The input documents contain all kinds of patterns that we try to recognize and translate to a hierarchy of annotations.
One of the things we will do with the result is to decorate the input text with links. For that it's important that we know the original position information of the found annotations.
Some of the annotations are based on value lists. We use MarkTable to resolve them.
The problem we have is that the input document can contain different kinds of special entities. For example, the document can also contain words that include entities like &amp; or a numeric character reference for 𝌆. These can also occur in words / sentences that will be looked up in the value lists.
We are looking for a way to generalize (convert) all such variants to a normal "plain text" format, so that we don't have to add every variant containing special entities to the value lists.
Pre-processing the document to replace them all (for example with the HTMLConverter engine) is AFAIK not a good option, because that would also change the position info. &amp; should match &, but should still be seen as having size 5.
I tried to use the replace action, which adds an extra "replacement" attribute to the annotation. When I add an interceptor (aspect) to the getCoveredText of the annotation class and return the replacement instead of the real text if available, the matching succeeds. But this gives problems if the replacement text contains spacers (the end position is still equal to that of the original text / first RutaBasic).
Any suggestions how we can solve this?
I solved this issue by building a pre- and post processor for the content.
In the pre-processor I replace text fragments with other text. For example, &amp; (and equivalent entity forms of the ampersand) will be replaced by a normal &. While pre-processing I store the details of each replacement in a replacement object, which is added to an ordered list. A replacement object contains the original text and the difference in length (&amp; is 4 characters longer than a single &).
After annotating with RUTA (and other annotators) I correct all the found annotation values (text) to the original value and I fix the position information (begin and end) of the annotations, so that they match the original content. I use the list of replacement details for this process.
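The idea can be sketched in a few lines of Python. This is not the actual UIMA code, only an illustration of the bookkeeping. The names Replacement, preprocess and to_original_offset are made up for the example, and it assumes annotations never begin or end in the middle of a substituted fragment.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Replacement:
    position: int    # start offset of the substitute in the processed text
    original: str    # fragment as it appeared in the original document
    substitute: str  # text that was put in its place

    @property
    def delta(self) -> int:
        # How much longer the original fragment is than its substitute.
        return len(self.original) - len(self.substitute)


def preprocess(text: str, table: Dict[str, str]) -> Tuple[str, List[Replacement]]:
    """Replace every key of `table` (keys must be non-empty) by its value
    and record each replacement in document order."""
    pieces: List[str] = []
    replacements: List[Replacement] = []
    i = 0        # read position in the original text
    out_len = 0  # length of the processed text so far
    while i < len(text):
        for original, substitute in table.items():
            if text.startswith(original, i):
                replacements.append(Replacement(out_len, original, substitute))
                pieces.append(substitute)
                out_len += len(substitute)
                i += len(original)
                break
        else:
            pieces.append(text[i])
            out_len += 1
            i += 1
    return "".join(pieces), replacements


def to_original_offset(offset: int, replacements: List[Replacement]) -> int:
    """Map a begin or end offset in the processed text back to the original."""
    shift = 0
    for r in replacements:
        if r.position + len(r.substitute) <= offset:
            shift += r.delta  # this replacement lies entirely before the offset
        else:
            break             # the list is ordered, so later ones are after it
    return offset + shift


original = "Tom &amp; Jerry &amp; Co"
processed, log = preprocess(original, {"&amp;": "&"})

# Pretend an annotator matched "Jerry & Co" in the processed text.
begin = processed.index("Jerry")
end = len(processed)
orig_begin = to_original_offset(begin, log)
orig_end = to_original_offset(end, log)
```

Here the annotation found at 6..16 in the processed text is mapped back to 10..24, which covers "Jerry &amp; Co" in the original content.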

pandoc markdown to docx - keep list on one page

I have a markdown list like so:
* Question A
    - Answer 1
    - Answer 2
    - Answer 3
I need to ensure that all the answers (1 - 3) appear on the same page as Question A when I convert the markdown document to docx using pandoc. How can I do this?
Use custom styles in your Markdown and then define those styles in a custom docx template.
It's important to note that Pandoc's documentation states (emphasis added):
Because pandoc’s intermediate representation of a document is less
expressive than many of the formats it converts between, one should
not expect perfect conversions between every format and every other.
Pandoc attempts to preserve the structural elements of a document, but
not formatting details...
Of course, Markdown has no concept of "pages" or "page breaks," so that is not something Pandoc can handle by default. However, Pandoc is aware of docx styles. As the documentation explains:
By default, pandoc’s docx output applies a predefined set of styles
for blocks such as paragraphs and block quotes, and uses largely
default formatting (italics, bold) for inlines. This will work for
most purposes, especially alongside a reference.docx file. However, if
you need to apply your own styles to blocks, or match a preexisting
set of styles, pandoc allows you to define custom styles for blocks
and text using divs and spans, respectively.
If you define a div or span with the attribute custom-style, pandoc
will apply your specified style to the contained elements. So, for
example using the bracketed_spans syntax,
[Get out]{custom-style="Emphatically"}, he said.
would produce a docx file with “Get out” styled with character style
Emphatically. Similarly, using the fenced_divs syntax,
Dickinson starts the poem simply:
::: {custom-style="Poetry"}
| A Bird came down the Walk---
| He did not know I saw---
:::
would style the two contained lines with the Poetry paragraph style.
If the styles are not yet in your reference.docx, they will be defined
in the output file as inheriting from normal text. If they are already
defined, pandoc will not alter the definition.
If you don't want to apply the style manually, but would like it applied to every list automatically (or perhaps to every list which follows a specific pattern), you could define a custom filter which applies the style(s) to every matching element in the document.
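A rough sketch of such a filter, working directly on pandoc's JSON AST with only the Python standard library, might look like the following. The style name KeepList and the function names are hypothetical, and it only wraps top-level lists. Whether the docx writer applies a div's custom-style to the paragraphs inside list items is something to verify with your pandoc version.

```python
import json
from typing import Any, Dict, List

STYLE_NAME = "KeepList"  # hypothetical style; define it in your reference docx


def wrap_lists(blocks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Wrap every top-level list in a Div carrying a custom-style attribute."""
    wrapped = []
    for block in blocks:
        if block.get("t") in ("BulletList", "OrderedList"):
            # Attr is [identifier, classes, key-value pairs].
            attr = ["", [], [["custom-style", STYLE_NAME]]]
            wrapped.append({"t": "Div", "c": [attr, [block]]})
        else:
            wrapped.append(block)
    return wrapped


def apply_filter(document: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the document with its top-level lists wrapped."""
    result = dict(document)
    result["blocks"] = wrap_lists(document["blocks"])
    return result


def run(stdin, stdout) -> None:
    """Read a pandoc JSON document from stdin and write the filtered one."""
    json.dump(apply_filter(json.load(stdin)), stdout)


# A tiny hand-written document containing a single bullet list.
example = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "BulletList",
         "c": [[{"t": "Plain", "c": [{"t": "Str", "c": "Question"}]}]]},
    ],
}
result = apply_filter(example)
```

To use it as a real filter, you would add a call to run(sys.stdin, sys.stdout) at the end of the script (with import sys) and pass the script to pandoc with --filter.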
Of course, that only adds the style names to the output. You still need to define the styles (tell Word how to display elements assigned those styles). As the documentation for the --reference-doc option explains:
For best results, the reference docx should be a modified version of a
docx file produced using pandoc. The contents of the reference docx
are ignored, but its stylesheets and document properties (including
margins, page size, header, and footer) are used in the new docx. If
no reference docx is specified on the command line, pandoc will look
for a file reference.docx in the user data directory (see --data-dir).
If this is not found either, sensible defaults will be used.
To produce a custom reference.docx, first get a copy of the default
reference.docx: pandoc --print-default-data-file reference.docx >
custom-reference.docx. Then open custom-reference.docx in Word, modify
the styles as you wish, and save the file.
Of course, when modifying the custom-reference.docx in Word, you can add the new custom styles which you have used in your Markdown. As @CindyMeister points out in a comment:
Word would handle this using styles, where the Question style would
have the paragraph setting "Keep with Next". the Answer style would
have this as well. A third style, for the last entry, would NOT have
the setting activated. In addition, all three styles would have the
paragraph setting "Keep together" activated.
Finally, when using pandoc to convert your Markdown to a Word docx file, use the option --reference-doc=custom-reference.docx and your custom style definitions will be included in the generated docx file. As long as you also properly identify which elements in the Markdown document get which styles, you should have a list which doesn't get broken across a page break, as long as the entire list fits on one page.

eLisp. check if return value of (read-event) is a graphical character

I'm trying to check if the return value of (read-event) is a graphical character. Example: a (97) is a graphical character. return is not a graphical character. f1 is not a graphical character and so on. I tried a lot of ways to do that, but nothing works.
Did you try char-displayable-p? C-h f tells you:
char-displayable-p is an autoloaded Lisp function in mule-util.el.
(char-displayable-p CHAR)
Return non-nil if we should be able to display CHAR.
On a multi-font display, the test is only whether there is an
appropriate font from the selected frame's fontset to display
CHAR's charset in general. Since fonts may be specified on a
per-character basis, this may not be accurate.
But that says that it expects CHAR to be a character. So you might want to also test to make sure that it is, using characterp.
(In fact, characterp might be all you need: (characterp (read-event)). It depends on whether you care if a given character is displayable in your environment, i.e., given the fonts you have.)
You can often find a function with a name like char-displayable-p using apropos. Try, for instance:
M-x apropos RET char display RET
That shows you something like this:
Type RET on a type label to view its full documentation.
char-displayable-p
Function: Return non-nil if we should be able to display CHAR.
Properties: autoload
glyphless-char-display
Variable: Char-table defining glyphless characters.
Properties: char-table-extra-slots variable-documentation
glyphless-char-display-control
User option: List of directives to control display of glyphless characters.
Properties: standard-value custom-version custom-type custom-options custom-set custom-requests variable-documentation
nobreak-char-display
Variable: Control highlighting of non-ASCII space and hyphen chars.
Properties: variable-documentation
tabulated-list-glyphless-char-display
Variable: The glyphless-char-display table in Tabulated List buffers.
Properties: variable-documentation
update-glyphless-char-display
Function: Make the setting of glyphless-char-display-control take effect.

Superscript within code block in Github Markdown

The <sup></sup> tag is used for superscripts. Creating a code block is done with backticks. The issue I have is when I try to create a superscript within a code block, it prints out the <sup></sup> tag instead of formatting the text between the tag.
How do I have superscript text formatted correctly when it's between backticks?
Post solution edit
Desired output:
A² instead of A<sup>2</sup>
This is not possible unless you use raw HTML.
The rules specifically state:
With a code span, ampersands and angle brackets are encoded as HTML entities automatically, which makes it easy to include example HTML tags.
In other words, it is not possible to use HTML to format text in a code span. In fact, a code span is plain, unformatted text. Having any of that text appear as a superscript would mean it is not plain, unformatted text. Thus, this is not possible by design.
However, the rules also state:
Markdown is not a replacement for HTML, or even close to it. Its
syntax is very small, corresponding only to a very small subset of
HTML tags. The idea is not to create a syntax that makes it easier
to insert HTML tags. In my opinion, HTML tags are already easy to
insert. The idea for Markdown is to make it easy to read, write, and
edit prose. HTML is a publishing format; Markdown is a writing
format. Thus, Markdown's formatting syntax only addresses issues that
can be conveyed in plain text.
For any markup that is not covered by Markdown's syntax, you simply
use HTML itself. ...
So, if you really need some text in a code span to be in superscript, then use raw HTML for the entire span (be sure to escape things manually as required):
<code>A code span with <sup>superscript</sup> text and escaped characters: "&lt;&amp;&gt;".</code>
Which renders as:
A code span with superscript text and escaped characters: "<&>".
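If you generate your Markdown from a script, the manual escaping can be automated. Here is a small Python sketch using the standard library's html.escape. The helper name code_span_with_sup is made up for illustration.

```python
from html import escape


def code_span_with_sup(before: str, sup: str, after: str = "") -> str:
    """Build a raw HTML code span with one superscripted fragment.

    The text parts are escaped, so that &, < and > show up literally
    instead of being interpreted as HTML.
    """
    return "<code>{}<sup>{}</sup>{}</code>".format(
        escape(before, quote=False),
        escape(sup, quote=False),
        escape(after, quote=False),
    )


simple = code_span_with_sup("A", "2")
with_specials = code_span_with_sup("x < y & z", "2")
```

The first call gives <code>A<sup>2</sup></code>, and the second one escapes the angle bracket and the ampersand before wrapping the text.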
This is expected behaviour:
Markdown wraps a code block in both <pre> and <code> tags.
You can use Unicode superscript and subscript characters within code blocks:
class SomeClass¹ {
}
Inputting these characters will depend on your operating system and configuration. I like to use compose key sequences on my Linux machines. As a last resort you should be able to copy and paste them from something like the Wikipedia page mentioned above.
¹Some interesting footnote, e.g. referencing MDN on <pre> and <code> tags.
If you're lucky, the characters you want to superscript (or subscript) may have dedicated codepoints in Unicode. These will work inside code blocks, as demonstrated in your question, where you include A² in backticks. E.g.:
Water (chemical formula H₂O) is transparent, tasteless and odourless.
I've listed out the super and subscript Unicode characters in this Gist. You should be able to copy and paste any you need from there.
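If you'd rather not copy and paste, a short Python sketch can do the conversion for the characters that have dedicated codepoints. The table names and helper functions are made up for the example. Only digits and a few symbols are covered, and anything else is passed through unchanged.

```python
# Translation tables for the characters that have dedicated Unicode
# superscript / subscript codepoints.
SUPERSCRIPTS = str.maketrans("0123456789+-=()n", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ")
SUBSCRIPTS = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")


def superscript(text: str) -> str:
    """Replace each supported character by its superscript form."""
    return text.translate(SUPERSCRIPTS)


def subscript(text: str) -> str:
    """Replace each supported character by its subscript form."""
    return text.translate(SUBSCRIPTS)


squared = "A" + superscript("2")
water = "H" + subscript("2") + "O"
```

The resulting strings are plain text, so they can be pasted between backticks without any HTML.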