Emacs fill-paragraph not breaking lines where expected - emacs

I have the following HTML code (a list item). The content isn't important--the problem is the end of line 2.
<li>Yes, you can learn how to play piano without becoming a
great notation reader,
however, <strong class="warning">you <em class="emphatic">will</em>
have to acquire a <em class="emphatic">very</em>basic amount
of notation reading skill</strong>. But the extremely
difficult task of honing your note reading skills that
classical students are required to endure for years and years
is <em class="emphatic">totally non-existant</em>as a
requirement for playing non-classical piano.</li>
The command fill-paragraph (M-q) has been applied. I can't for the life of me figure out why a line break is being placed on the second line after "reader," since there's more space available on that line to put "however,". Another weird thing I've noticed is that when I delete and then reapply the tab characters on lines 4 and 5 (starting with "have" and "of" respectively), two space characters are automatically inserted as well, like so:
<li>Yes, you can learn how to play piano without becoming a
great notation reader,
however, <strong class="warning">you <em class="emphatic">will</em>
have to acquire a <em class="emphatic">very</em>basic amount
of notation reading skill</strong>. But the extremely
difficult task of honing your note reading skills that
classical students are required to endure for years and years
is <em class="emphatic">totally non-existant</em>as a
requirement for playing non-classical piano.</li>
I don't know if this is some kind of clue or not. This doesn't happen with any of the other lines.
Is this just a bug, or does any experienced Emacs person know what might be going on here?
Thank you

This is intentional. Lines that start with an XML or SGML tag are paragraph separator lines. If Emacs broke the paragraph in such a way that the tag ended up at the start of a line, subsequent applications of fill-paragraph would stop at that line. This is to ensure that, for instance,
<p>a paragraph</p>
<!-- no blank line -->
<p>another paragraph</p>
does not turn into
<p>a paragraph</p> <!-- no blank line --> <p>another paragraph</p>
For the same reason, Emacs will not break a line after a period unless there are two or more spaces after the period, because it uses a double space to distinguish between a period that ends a sentence and a period that ends an abbreviation, and breaking a line after the period that ends an abbreviation would create an ambiguous situation.
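The effect described in this answer can be imitated with a toy filler. This is only a rough sketch in Python, not Emacs's actual fill code: the function name `fill_paragraph_sketch` and the single rule "never break the line before a word that starts with `<`" are assumptions made for illustration.

```python
def fill_paragraph_sketch(text, width=70):
    """Greedy line filler that never starts a continuation line with a tag."""
    # Glue every word that starts with "<" onto the word before it,
    # so the two can only move to a new line together.
    chunks = []
    for word in text.split():
        if chunks and word.startswith("<"):
            chunks[-1] += " " + word
        else:
            chunks.append(word)

    # Ordinary greedy fill over the unbreakable chunks.
    lines, current = [], ""
    for chunk in chunks:
        if current and len(current) + 1 + len(chunk) > width:
            lines.append(current)
            current = chunk
        else:
            current = chunk if not current else current + " " + chunk
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    for line in fill_paragraph_sketch("aaa bbb <b>ccc</b> ddd", width=8):
        print(line)
```

With a width of 8, "bbb" would fit after "aaa" on the first line, but it is pulled down to the next line because it cannot be separated from the tag that follows it. That is the same symptom as "however," in the question.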

Looks like a bug to me.
I was able to trim down your example to something like this:
<li>blabla
blabla <b>some_long_text_here</b> <b>more_long_text_here</b>
If I remove a single character of text from it, fill-paragraph works as expected. The same happens if I add a character between the two consecutive <b> elements.

Related

Is it correct to use Word Joiner (U+2060) in the same word?

In Bangla, Hosonto (U+09CD) is used to create a ligature, which joins adjacent letters. For example ক্ক is created using ক + ্ + ক. But sometimes we need a non-joining Hosonto (ক্‌ক). To make it possible, traditionally we use a Zero-width non-joiner (‌‌‌‌‌U+200C‌).
The problem with ‌‌‌‌‌ZWNJ is that, when the line is too long and line wrapping occurs, the word is broken into two lines. To keep the word as a whole, I need a character, something like “Zero-width non-breaking non-joiner”. But I don’t see such character in Unicode. So I think, Word Joiner (U+2060) is the best option.
To me, Word Joiner sounds like “joins two words”. But in my case, I need to join two parts of a single word. So, the question is, is it correct to use Word Joiner here?
U+200C ZERO WIDTH NON-JOINER has no effect on line breaking. Its absence or presence does not change where line wrapping can occur. If inserting a ZWNJ within a word causes that word to be broken across lines, then whatever application you are using to view your text does not implement the standard correctly.
ZWNJ is the only correct character for your purposes. More than that, using U+2060 WORD JOINER could in fact lead to inconsistent results. Much like ZWNJ does not affect line breaks, WJ is not supposed to affect joining behaviour (it is defined as “transparent” in that regard). While the standard doesn’t explicitly mention cases like this to the best of my knowledge, one could reasonably argue that inserting a WJ between the two letters in your example should not change the way they are displayed.
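To check which characters actually end up in a string, Python's standard `unicodedata` module can be used. The following is a small sketch; the helper name `describe` and the variable names are made up for this example.

```python
import unicodedata

KA = "\u0995"       # BENGALI LETTER KA
HOSONTO = "\u09cd"  # BENGALI SIGN VIRAMA (hosonto)
ZWNJ = "\u200c"     # ZERO WIDTH NON-JOINER
WJ = "\u2060"       # WORD JOINER


def describe(ch):
    """Return the code point, Unicode name and general category of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


conjunct = KA + HOSONTO + KA           # renders as a ligature
non_joined = KA + HOSONTO + ZWNJ + KA  # ZWNJ asks for the non-joined form

if __name__ == "__main__":
    for ch in non_joined:
        print(describe(ch))
```

Both ZWNJ and WJ are invisible format characters (category Cf), so listing the code points like this is an easy way to see which one an editor really inserted.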

Is it possible to break a line on gist?

When writing in a gist in Markdown mode, we need to enter a blank line between two lines if we want to break a line (add a new line).
Is it possible to break a line without the need of a space (blank line between them)?
As Ryan said, in markdown the most conventional way of creating a linebreak is by adding two spaces at the end of a line. Though GFM also supports the use of basic HTML blocks, so <br> can also be used to create linebreaks, which can be helpful where multiple linebreaks are needed.

How to add empty spaces into MD markdown readme on GitHub?

I'm struggling to add empty spaces before the string starts to make my GitHub README.md looks something like this:
Right now it looks like this:
I tried adding a <br /> tag to fix the new string start, and now it works, but I don't understand how to add spaces before the string starts without changing everything to &nbsp;. Maybe there's a more elegant way to format it?
You can use <pre> to display all spaces & blanks you have typed. E.g.:
<pre>
hello, this is
just an example
....
</pre>
Markdown really does convert everything to HTML, and HTML collapses spaces, so you can't do much about it. You have to use &nbsp; for it. A funny example: I'm writing this in Markdown, and I'll use a couple of &nbsp; here.
Above there are some &nbsp; without backticks.
Instead of using HTML entities like &nbsp; and &emsp; (as others have suggested), you can use the Unicode em space (U+2003, decimal 8195) directly. Try copy-pasting the following into your README.md. The spaces at the start of the lines are em spaces.
The action of every agent <br />
  into the world <br />
starts <br />
  from their physical selves. <br />
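If typing the em space by hand is awkward, the lines can be generated. This is a minimal sketch in Python; the function name `indent_with_em_spaces` and the (depth, text) input format are assumptions made for this example.

```python
EM_SPACE = "\u2003"  # Unicode em space, decimal code point 8195


def indent_with_em_spaces(lines_with_depth):
    """Build README lines indented with em spaces and ended with <br />."""
    return "\n".join(
        f"{EM_SPACE * depth}{text} <br />" for depth, text in lines_with_depth
    )


if __name__ == "__main__":
    print(indent_with_em_spaces([
        (0, "The action of every agent"),
        (1, "into the world"),
        (0, "starts"),
        (1, "from their physical selves."),
    ]))
```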
I'm surprised no one mentioned the HTML entities &ensp; and &emsp;, which produce horizontal white space equivalent to the characters n and m, respectively. If you want to accumulate horizontal white space quickly, those are more efficient than &nbsp;.
no space
 
  
  
Along with <space> and &thinsp;, these are the five entities HTML provides for horizontal white space.
Note that except for &nbsp;, all entities allow breaking. Whatever text surrounds them will wrap to a new line if it would otherwise extend beyond the container boundary. With &nbsp; it would wrap to a new line as a block even if the text before &nbsp; could fit on the previous line.
Depending on your use case, that may be desired or undesired. For me, unless I'm dealing with things like names (John Doe), addresses or references (see eq. 5), breaking as a block is usually undesired.
Markdown gets converted into HTML/XHTML.
John Gruber created the Markdown language in 2004 in collaboration with Aaron Swartz on the syntax, with the goal of enabling people to write using an easy-to-read, easy-to-write plain text format, and optionally convert it to structurally valid HTML (or XHTML).
Plain HTML relies on &nbsp; for adding extra spaces unless JavaScript or CSS is used to lay out the elements.
Markdown is a lightweight markup language with plain text formatting syntax. It is designed so that it can be converted to HTML and many other formats using a tool by the same name.
If you want to use »
only one space » either use &nbsp; or just hit Spacebar (the second is the better choice in this case)
more than one space » use &nbsp;+space (for 2 consecutive spaces)
e.g. If you want to add 10 spaces contiguously, you should use
&nbsp; space &nbsp; space &nbsp; space &nbsp; space &nbsp; space
instead of using 10 &nbsp; one after another.
For more details check
Adding multiple spaces between text in Markdown,
How to create extra space in HTML or web page.
After several attempts, I ended up with a solution that works because most Markdown interpreters support a math environment.
The following adds one white space:
$~$
And here ten:
$~~~~~~~~~~~$
As a workaround, you can use a code block to render the code literally. Just surround your text with triple backticks ```. It will look like this:
2018-07-20 Wrote this answer
Can format it without
Also don't need <br /> for new line
Note that using <pre> and <code> you get slightly different behaviour: &nbsp; and <br /> will be parsed rather than inserted literally.
<pre>:
2018-07-20 Wrote this answer
Can format it without
Also don't need for new line
<code>:
2018-07-20 Wrote this answer
Can format it without
Also don't need for new line
You can also use spaces from the known list:
  &hairsp;
'6-per-em space'  
'narrow no-break space'  
'thin space'    
'4-per-em space'   &emsp14;
'no breaking space'  
'punctuation space'   &puncsp;
'3-per-em space'   &emsp13;
'en space'    
'figure space'   &numsp;
'em space'    
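To see which character each named entity stands for, the entities can be decoded with Python's standard library. This is just a sketch; the list `ENTITIES` and the function `entity_table` are names chosen for this example, and the list only covers the entity names that appear in this thread.

```python
import html
import unicodedata

ENTITIES = [
    "&hairsp;", "&thinsp;", "&emsp14;", "&nbsp;", "&puncsp;",
    "&emsp13;", "&ensp;", "&numsp;", "&emsp;",
]


def entity_table(entities=ENTITIES):
    """Map each HTML entity to its code point and Unicode character name."""
    rows = []
    for entity in entities:
        ch = html.unescape(entity)  # decode the entity to a single character
        rows.append((entity, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return rows


if __name__ == "__main__":
    for entity, code_point, name in entity_table():
        print(f"{entity:10} {code_point}  {name}")
```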
I have tried many methods in GitHub Markdown.
Only starting the line with </br>, with a normal empty line underneath, works for me.
(So two lines in total: one with just </br> and one that is empty.)
One line of </br> will do the line break. The empty line underneath is there so that it won't mess up the formatting of the content that follows.

Why is this LSEP symbol showing up on Chrome and not Firefox or Edge?

So this web page is rendering with these symbols and they are found throughout this website/application but on no other sites. Can anyone tell me
What this symbol is?
Why it is showing up only in one browser?
That character is U+2028 Line Separator, which is a kind of newline character. Think of it as the Unicode equivalent of HTML’s <br>.
As to why it shows up here: my guess would be that an internal database uses LSEP to not conflict with literal newlines or HTML tags (which might break the database or cause security errors), and either:
The server-side scripts that convert the database to HTML neglected to replace LSEP with <br>
Chrome just breaks standards by displaying LSEP as a printing (visible) character, or
You have a font installed that displays LSEP as a printing character that only Chrome detects. To figure out which font it is, right click on the offending text and click “Inspect”, then switch to the “Computed” tab on the right-hand panel. At the very bottom you should see a section labeled “Rendered Fonts” which will help you locate the offending font.
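To check whether a piece of text really contains these characters, and to do the replacement the server-side script may have skipped, something like the following can be used. It is a sketch in Python; the function names are invented for this example, and turning U+2028 into `<br>` is an assumption about what the page intended.

```python
LSEP = "\u2028"  # LINE SEPARATOR
PSEP = "\u2029"  # PARAGRAPH SEPARATOR


def find_separators(text):
    """Return (index, code point) for every U+2028 / U+2029 in the text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in (LSEP, PSEP)
    ]


def lsep_to_br(text):
    """Replace each line separator with an HTML <br> tag."""
    return text.replace(LSEP, "<br>")


if __name__ == "__main__":
    sample = "first line\u2028second line"
    print(find_separators(sample))
    print(lsep_to_br(sample))
```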
More information on the line separator, excerpted from the Unicode standard, Chapter 5.8, Newline Guidelines (on p. 12 of this PDF):
Line Separator and Paragraph Separator
A paragraph separator—independent of how it is encoded—is used to indicate a
separation between paragraphs. A line separator indicates where a line break
alone should occur, typically within a paragraph. For example:
This is a paragraph with a line separator at this point,
causing the word “causing” to appear on a different line, but not causing
the typical paragraph indentation, sentence breaking, line spacing, or
change in flush (right, center, or left paragraphs).
For comparison, line separators basically correspond to HTML <BR>, and
paragraph separators to older usage of HTML <P> (modern HTML delimits
paragraphs by enclosing them in <P>...</P>). In word processors, paragraph
separators are usually entered using a keyboard RETURN or ENTER; line
separators are usually entered using a modified RETURN or ENTER, such as
SHIFT-ENTER.
A record separator is used to separate records. For example, when exchanging
tabular data, a common format is to tab-separate the cells and to use a CRLF
at the end of a line of cells. This function is not precisely the same as line
separation, but the same characters are often used.
Traditionally, NLF started out as a line separator (and sometimes record
separator). It is still used as a line separator in simple text editors such as
program editors. As platforms and programs started to handle word processing
with automatic line-wrap, these characters were reinterpreted to stand for
paragraph separators. For example, even such simple programs as the Windows
Notepad program and the Mac SimpleText program interpret their platform’s NLF
as a paragraph separator, not a line separator. Once NLF was reinterpreted to
stand for a paragraph separator, in some cases another control character was
pressed into service as a line separator. For example, vertical tabulation VT
is used in Microsoft Word. However, the choice of character for line separator
is even less standardized than the choice of character for NLF. Many Internet
protocols and a lot of existing text treat NLF as a line separator, so an
implementer cannot simply treat NLF as a paragraph separator in all
circumstances.
Further reading:
Unicode Technical Report #13: Newline Guidelines
General Punctuation (U+2000–U+206F) chart PDF
SE: Why are there so many spaces and line breaks in Unicode?
SO: What is unicode character 2028 (LS / Line Separator) used for?
U+2028 on codepoints.net A misprint here says that U+2028 was added in v. 1.1 of the Unicode standard, which is false — it was added in 1.0
I found that in WordPress the easiest way to remove "L SEP" and "P SEP" characters is to execute these two SQL queries:
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('e280a9'), '');
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('e280a8'), '');
The JavaScript way (mentioned in some of the answers) can break some things (in my case some modal windows stopped working).
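The hex strings passed to UNHEX in the queries above are the UTF-8 encodings of U+2029 and U+2028. A quick way to confirm that, and to do the same clean-up outside the database, is sketched below in Python; the helper names are made up for this example.

```python
def utf8_hex(ch):
    """Return the UTF-8 encoding of a character as a lowercase hex string."""
    return ch.encode("utf-8").hex()


def strip_separators(text):
    """Remove U+2028 (line separator) and U+2029 (paragraph separator)."""
    return text.replace("\u2028", "").replace("\u2029", "")


if __name__ == "__main__":
    print(utf8_hex("\u2028"), utf8_hex("\u2029"))
    print(strip_separators("Hello\u2028 world\u2029"))
```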
You can use this tool...
http://www.nousphere.net/cleanspecial.php
...to remove all the special characters that Chrome displays.
Steps:
Paste your HTML and Clean using HTML option.
You can manually delete the characters in the editor on this page and see the result.
Paste back your HTML in file and save :)
I recently ran into this issue, tried a number of fixes but ultimately I had to paste the text into VIM and there was an extra space I had to delete. I tried a number of HTML cleaners but none of them worked, VIM was the key!
9999years' answer is great.
If you use Symfony with Twig templates, I would recommend checking for an empty Twig block. In my case it was an empty Twig block with an invisible character inside.
The LSEP character was only displayed on certain devices/browsers.
On the others I had a blank space above the header and could not see any invisible character.
I had to inspect the GET request to see that the value 1f18 came before the opening html tag.
Once I removed the empty Twig block it was gone.
Hope this can help someone one day ...
My problem was similar, it was "PSEP" or "P SEP". Similar issue, an invisible character in my file.
I replaced \x{2029} with a normal space. Fixed. This problem only appeared on Windows Chrome. Not on my Mac.
I agree with @Kapil Bathija. Basically you can copy and paste your HTML code into http://www.nousphere.net/cleanspecial.php and convert it.
It will convert the special characters for you. Then remove the spaces between the words: you will notice you have to press backspace twice, meaning there is an invalid character that can't be translated.
I had the same issue and it worked just fine afterwards.
You can also copy the text, paste it into a HTML editor such as Coda, remove the linebreak, copy it and paste it back into your site.
Video here: https://www.loom.com/share/501498afa7594d95a18382f1188f33ce
Looks like my client pasted HTML into Wordpress after initially creating it with MS-Word. Even deleting the and visible spaces did not fix the issue. The extended characters became visible in vi/vim.
If you don't have vi/vim available, try highlighting from 2 chars before the LSEP to 2 chars after the LSEP; delete that chunk, and re-type the correct characters.

What is difference between \n and \r? [duplicate]

Possible Duplicate:
What is the difference between \r and \n?
Hi,
What is difference between \n (newline) and \r (carriage return)? They both move current cursor to the next line. Are they same?
\r returns the cursor to the beginning of the line, NOT to the next line. When you use \n in Linux, \r is implied; in Windows, it is not.
Using \r in Unix-like systems may result in overwriting the same line.
I suggest you read this.
In short, a newline in Windows is "\r\n", while a newline in Unix is just "\n" (and, just to make life difficult, a newline in older Macs is "\r")
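When text can come from any of these platforms, a common approach is to normalize every convention to "\n" before processing. A minimal sketch in Python follows; the function name `normalize_newlines` is made up for this example.

```python
def normalize_newlines(text):
    """Convert Windows (\\r\\n) and old Mac (\\r) line endings to Unix (\\n)."""
    # Replace \r\n first so the pair is not turned into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    mixed = "windows\r\nold mac\runix\nend"
    print(normalize_newlines(mixed).split("\n"))
    # str.splitlines() also recognizes all three conventions.
    print(mixed.splitlines())
```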
Actually, a carriage return is supposed to move the cursor to the beginning of the current line. Then, newline moves the cursor exactly down one.
Nowadays, compilers and runtime libraries will often automatically convert \n to \r\n on Windows when writing in text mode, and leave it as \n on Linux. Mac used to use \r but has changed to the \n convention.
(edit: removed false/untested statements)
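The original meaning of the two characters can be illustrated with a tiny "terminal" that treats \r as "go to column 0" and \n as "go down one row, keeping the column". This is a sketch in Python of those typewriter-style semantics only; real terminals and operating systems add their own translation on top, and the function name `render` is made up for this example.

```python
def render(text):
    """Render text on a grid where \\r resets the column and \\n moves down a row."""
    rows = [[]]
    row, col = 0, 0
    for ch in text:
        if ch == "\r":
            col = 0                      # carriage return: back to the left edge
        elif ch == "\n":
            row += 1                     # line feed: one row down, same column
            if row == len(rows):
                rows.append([])
        else:
            line = rows[row]
            while len(line) <= col:      # pad with spaces up to the cursor
                line.append(" ")
            line[col] = ch               # overwrite whatever was there
            col += 1
    return ["".join(r) for r in rows]


if __name__ == "__main__":
    print(render("hello\rHE"))   # \r alone overwrites the same line
    print(render("ab\ncd"))      # \n alone keeps the column
    print(render("ab\r\ncd"))    # \r\n starts a fresh line at column 0
```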
Read The Great Newline Schism; it explains everything in deep detail with great humor.
Ah the old days of the typewriter...
The difference between the two stems from days of yonder when typing was done directly to paper. It required two actions to go to the next line:
pushing the 'carriage' (big cylinder on the top) back to the left (this is where the character would end up).
shifting the paper one line up. (thus going down one line)
Splitting these two actions facilitated going back to a precise character position to correct it (there was no way to go up one line, or left one character!). Holding paper whiteout on the erroneous character and hitting that key would neatly whiteout exactly that erroneous character, then you could go back again and hit the correct key
(there was a key for not moving the carriage though).
In the young computer age these actions were translated 1 to 1 into \r for carriage return and \n for shifting the 'paper'.
Nowadays the major operating systems apparently have differing opinions on whether this is still necessary for computer technology where going back to previous position is much easier. However, in modern programming languages you'll generally see that \n is assumed to mean \r\n.
No they're not. Modern text editors often treat them the same however because their old uses don't make much sense for digital word processors.
For example \r literally means "return to the beginning of the line". While this might have been useful for a typewriter if you just wanted to overwrite everything on that line this sort of functionality doesn't make much sense for digital type.
\n on the other hand would simply move down a line without returning to the beginning. This was also useful on a typewriter for indentation or bulleting. Again, not something that makes much sense for digital type.
Telnet is one example where both characters are still used in this manner.
Both characters were included in ASCII simply because, when it was being specified, nobody had realized that functionality useful on a typewriter wouldn't make much sense on a computer.