Unexpected whitespace handling in Rythm removes wanted blanks

While migrating from Freemarker to Rythm, I am finding that Rythm removes a lot of whitespace.
I am aware that there are #compact, #nocompact and #escape options. I tried some of these, but they seem to have no effect on the whitespace handling.
According to http://rythmengine.org/doc/configuration.md#codegen_compact_enabled, compacting is the default.
Rythm seems to remove whitespace that I actively try to insert, e.g.:
no whitespace here
#nocompact() {
#for (int i=0;i<2;i++) {
please keep the whitespace
}
}
no whitespace here
leads to:
no whitespace here
please keep the whitespace
please keep the whitespace
no whitespace here
effectively changing the whitespace to a single space.
How can the original whitespace setting be kept?
Is the non-functioning #nocompact() a bug?

I think it's a bug. Please file an issue at https://github.com/greenlaw110/rythm/issues

Related

Prevent newline in (.md) files

How do I prevent newlines in readme.md files on GitHub?
We can always write the whole thing on one line to prevent it, but is there a dedicated tag or option for this, especially for elements that create newlines (such as headings), something like span in HTML?
Doesn't a space followed by a backslash do the concatenation you want? It does for me. That way I can break a paragraph into one sentence per line.

Flex: easy way to see if a line has any content?

Among many rules in my Altair BASIC Flex file is this one:
[\n]    {
            ++num_lines;
            ++num_statements;
            return '\n';
        }
++num_statements; is not actually correct: in theory the line might be empty (due to bad data in the .BAS file, for instance) and thus contain no statements. So is there any way to know whether there are any tokens before the \n since the previous \n? I know you can do this with BEGIN() et al., but that seems like a LOT of work for a simple problem! Is there an easier way?
It's easy to match a blank line, although I'm not sure that's really what you're looking for.
The first pattern matches a line which only contains space and tab characters (adjust as necessary to match other whitespace). The second pattern matches the same whitespace when it's not at the beginning of a line. (Actually, it would match the whitespace anywhere, but at the beginning of a line, the first pattern wins.)
^[ \t]*\n ;
[ \t]*\n { ++num_statements; return '\n'; }
Instead of counting lines yourself, I suggest you use %option yylineno so that flex counts them for you (the current line number is kept in yylineno).
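To make the effect of the two rules concrete outside flex, here is a rough Python analogue (Python rather than flex; the function name count_lines_and_statements is hypothetical). A line made only of spaces and tabs produces no statement, any other line end counts as one statement, and every newline is counted as a line, which is what yylineno would track for you.

```python
import re

# Same idea as the flex pattern ^[ \t]*\n: a line containing only spaces/tabs.
BLANK_LINE = re.compile(r"[ \t]*")

def count_lines_and_statements(source: str) -> tuple[int, int]:
    """Return (num_lines, num_statements) for newline-terminated source."""
    num_lines = 0
    num_statements = 0
    # Only text followed by '\n' forms a complete line, as in the flex rules,
    # so the trailing piece after the last '\n' is ignored.
    for line in source.split("\n")[:-1]:
        num_lines += 1  # what yylineno would count
        if not BLANK_LINE.fullmatch(line):
            num_statements += 1  # the [ \t]*\n rule: a line that had content
    return num_lines, num_statements

print(count_lines_and_statements('10 PRINT "HI"\n  \n20 END\n'))  # (3, 2)
```

The blank middle line still advances the line count but not the statement count, which is exactly the distinction the question asks for.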

Why is this LSEP symbol showing up on Chrome and not Firefox or Edge?

This web page renders with these symbols, which appear throughout this website/application but on no other sites. Can anyone tell me:
What is this symbol?
Why is it showing up in only one browser?
That character is U+2028 Line Separator, which is a kind of newline character. Think of it as the Unicode equivalent of HTML’s <br>.
As to why it shows up here: my guess is that an internal database uses LSEP to avoid conflicting with literal newlines or HTML tags (which might break the database or cause security errors), and either:
The server-side scripts that convert the database to HTML neglected to replace LSEP with <br>,
Chrome just breaks standards by displaying LSEP as a printing (visible) character, or
You have a font installed that displays LSEP as a printing character that only Chrome detects. To figure out which font it is, right click on the offending text and click “Inspect”, then switch to the “Computed” tab on the right-hand panel. At the very bottom you should see a section labeled “Rendered Fonts” which will help you locate the offending font.
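If the first explanation applies, the fix belongs in the server-side conversion step. Below is a minimal Python sketch of that step, assuming the stored value is plain text that still needs HTML escaping. The function name separators_to_html is made up, and mapping PSEP to <p>...</p> follows the HTML correspondence described in the Unicode excerpt below rather than any particular site's code.

```python
import html

LINE_SEPARATOR = "\u2028"       # LSEP: a line break within a paragraph (like <br>)
PARAGRAPH_SEPARATOR = "\u2029"  # PSEP: a break between paragraphs (like <p>)

def separators_to_html(text: str) -> str:
    """Convert plain text containing LSEP/PSEP into escaped HTML."""
    paragraphs = []
    for para in text.split(PARAGRAPH_SEPARATOR):
        escaped = html.escape(para)  # escape &, <, > and quotes first
        paragraphs.append("<p>" + escaped.replace(LINE_SEPARATOR, "<br>") + "</p>")
    return "".join(paragraphs)

print(separators_to_html("a < b\u2028c\u2029d"))
# <p>a &lt; b<br>c</p><p>d</p>
```

Escaping before inserting the tags matters: doing it the other way round would turn the generated <br> into visible text.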
More information on the line separator, excerpted from the Unicode Standard, Section 5.8, Newline Guidelines (on p. 12 of the chapter PDF):
Line Separator and Paragraph Separator
A paragraph separator—independent of how it is encoded—is used to indicate a
separation between paragraphs. A line separator indicates where a line break
alone should occur, typically within a paragraph. For example:
This is a paragraph with a line separator at this point,
causing the word “causing” to appear on a different line, but not causing
the typical paragraph indentation, sentence breaking, line spacing, or
change in flush (right, center, or left paragraphs).
For comparison, line separators basically correspond to HTML <BR>, and
paragraph separators to older usage of HTML <P> (modern HTML delimits
paragraphs by enclosing them in <P>...</P>). In word processors, paragraph
separators are usually entered using a keyboard RETURN or ENTER; line
separators are usually entered using a modified RETURN or ENTER, such as
SHIFT-ENTER.
A record separator is used to separate records. For example, when exchanging
tabular data, a common format is to tab-separate the cells and to use a CRLF
at the end of a line of cells. This function is not precisely the same as line
separation, but the same characters are often used.
Traditionally, NLF started out as a line separator (and sometimes record
separator). It is still used as a line separator in simple text editors such as
program editors. As platforms and programs started to handle word processing
with automatic line-wrap, these characters were reinterpreted to stand for
paragraph separators. For example, even such simple programs as the Windows
Notepad program and the Mac SimpleText program interpret their platform’s NLF
as a paragraph separator, not a line separator. Once NLF was reinterpreted to
stand for a paragraph separator, in some cases another control character was
pressed into service as a line separator. For example, vertical tabulation VT
is used in Microsoft Word. However, the choice of character for line separator
is even less standardized than the choice of character for NLF. Many Internet
protocols and a lot of existing text treat NLF as a line separator, so an
implementer cannot simply treat NLF as a paragraph separator in all
circumstances.
Further reading:
Unicode Technical Report #13: Newline Guidelines
General Punctuation (U+2000–U+206F) chart PDF
SE: Why are there so many spaces and line breaks in Unicode?
SO: What is unicode character 2028 (LS / Line Separator) used for?
U+2028 on codepoints.net (a misprint there says that U+2028 was added in version 1.1 of the Unicode Standard, which is false; it was added in 1.0)
I found that in WordPress the easiest way to remove the "L SEP" and "P SEP" characters is to execute these two SQL queries:
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('e280a9'), '');
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('e280a8'), '');
The JavaScript approach (mentioned in some of the other answers) can break things; in my case, some modal windows stopped working.
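The hex strings in those queries are simply the UTF-8 encodings of U+2029 (PSEP) and U+2028 (LSEP). This short Python sketch checks that and applies the same removal to an ordinary string; the helper name strip_separators is illustrative only.

```python
LSEP = "\u2028"  # LINE SEPARATOR
PSEP = "\u2029"  # PARAGRAPH SEPARATOR

# UNHEX('e280a8') and UNHEX('e280a9') are the UTF-8 bytes of these characters.
print(LSEP.encode("utf-8").hex())  # e280a8
print(PSEP.encode("utf-8").hex())  # e280a9

def strip_separators(text: str) -> str:
    """Remove LSEP and PSEP, like the two REPLACE(...) queries do."""
    return text.replace(LSEP, "").replace(PSEP, "")

print(strip_separators("Hello\u2028world"))  # Helloworld
```

As the last line shows, deleting the character can glue two words together; replacing it with a space instead avoids that.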
You can use this tool, http://www.nousphere.net/cleanspecial.php, to remove all the special characters that Chrome displays.
Steps:
Paste your HTML and clean it using the HTML option.
You can manually delete the characters in the editor on this page and see the result.
Paste your HTML back into the file and save :)
I recently ran into this issue and tried a number of fixes, but ultimately I had to paste the text into Vim, where there turned out to be an extra space I had to delete. I tried several HTML cleaners, but none of them worked; Vim was the key!
9999years' answer is great.
If you use Symfony with Twig templates, I would recommend checking for an empty Twig block. In my case it was an empty Twig block with an invisible character inside.
The LSEP character was only displayed on certain devices/browsers.
On the others, I had a blank space above the header and could not see any invisible character.
I had to inspect the GET request to see that the value 1f18 appeared before the opening html tag.
Once I removed the empty Twig block, it was gone.
Hope this helps someone one day.
My problem was similar, but with "PSEP" (or "P SEP"): again, an invisible character in my file.
I replaced \x{2029} with a normal space, which fixed it. The problem only appeared in Chrome on Windows, not on my Mac.
I agree with @Kapil Bathija: you can copy and paste your HTML code into http://www.nousphere.net/cleanspecial.php and convert it.
It will convert the special characters for you. Then delete the spaces between the words: wherever you have to press Backspace twice, there is an invalid character that can't be translated.
I had the same issue and it worked just fine afterwards.
You can also copy the text, paste it into an HTML editor such as Coda, remove the line break, then copy it and paste it back into your site.
Video here: https://www.loom.com/share/501498afa7594d95a18382f1188f33ce
It looks like my client pasted HTML into WordPress after initially creating it in MS Word. Even deleting the visible spaces did not fix the issue. The extended characters became visible in vi/vim.
If you don't have vi/vim available, try highlighting from two characters before the LSEP to two characters after it, deleting that chunk, and retyping the correct characters.
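Several answers found the culprit only by pasting the text into Vim or an online cleaner. As an alternative, this Python sketch reports where invisible characters sit in a piece of text. It flags the Unicode categories Cf (format characters such as ZWJ and the BOM), Zl (LSEP) and Zp (PSEP); choosing those three categories is an assumption about what counts as "invisible", and find_invisible is a hypothetical name.

```python
import unicodedata

# Cf = format characters (U+200D ZWJ, U+FEFF BOM, ...), Zl = LSEP, Zp = PSEP.
INVISIBLE_CATEGORIES = {"Cf", "Zl", "Zp"}

def find_invisible(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for each invisible character.

    Lines and columns are 1-based. Lines are split on '\\n' only, so LSEP/PSEP
    are reported rather than treated as line breaks (str.splitlines would
    split on them).
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) in INVISIBLE_CATEGORIES:
                hits.append((line_no, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits

print(find_invisible("<p>Hello\u2028world</p>"))
# [(1, 9, 'LINE SEPARATOR')]
```

For a file, something like find_invisible(open("page.html", encoding="utf-8").read()) would work, where page.html stands in for your own template.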

iText Chinese punctuation at the beginning of a line

Do you know how to solve this problem: when a line is full, the Chinese punctuation is placed at the beginning of the next line, as shown in (1)? We want the punctuation to stay at the end of each line instead, as shown in (2).
(1)
你好你好
,你好你好
(2)
你好你好,
你好你好
Thank you very much for your help in advance!
You are placing a space between the last character and the punctuation mark, and that space is a split point. The simplest fix is to remove the space before the punctuation and add it after. Another option is to replace the space with a non-breaking space, \u00a0, to avoid a split at that point.
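The second option can be done as a preprocessing step on the string before it is handed to iText. The sketch below is in Python rather than iText's Java and only shows the string transformation; the punctuation set and the name glue_punctuation are illustrative assumptions, not part of iText.

```python
import re

# Marks that should not start a line (illustrative, not exhaustive).
NO_LINE_START = (
    "\uff0c\u3002\u3001\uff1b\uff1a\uff01\uff1f\uff09"  # full-width ，。、；：！？）
    ",.;:!?)"                                          # ASCII equivalents
)

# A plain space immediately followed by one of those marks.
SPACE_BEFORE_PUNCT = re.compile(" (?=[" + re.escape(NO_LINE_START) + "])")

def glue_punctuation(text: str) -> str:
    """Replace a space before punctuation with U+00A0 so it is not a break point."""
    return SPACE_BEFORE_PUNCT.sub("\u00a0", text)

print(repr(glue_punctuation("你好你好 ,你好你好")))
```

Spaces elsewhere are left alone, so the line can still break at ordinary spaces.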

Line 1, Column 1: character "‍" not allowed in prolog

When I validate my page with the W3C validator, I get the error: Line 1, Column 1: character "‍" not allowed in prolog.
There is a character, or data interpreted as a character, in the document before the doctype declaration. In the error message quoted, there is the character U+200D ZERO WIDTH JOINER (ZWJ) between the quotation marks, so this seems to be the culprit. ZWJ is an invisible control character. There is no point in having it at the start of a file, as it is supposed to cause ligature or joining behavior for the characters (usually letters) around it. ZWJ is invalid at the start of a document by HTML rules.
You may need a good editor, like BabelPad, to detect and remove the ZWJ.
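If no such editor is at hand, a small script can do the same cleanup. This Python sketch drops invisible format characters (Unicode category Cf, which includes U+200D ZWJ and the U+FEFF byte order mark) and whitespace that appear before the first visible character. Also dropping the leading whitespace is a choice made here, and strip_leading_invisibles is a hypothetical name.

```python
import unicodedata

def strip_leading_invisibles(html_text: str) -> str:
    """Remove format characters (e.g. ZWJ, BOM) and whitespace before the doctype."""
    i = 0
    while i < len(html_text) and (
        html_text[i].isspace() or unicodedata.category(html_text[i]) == "Cf"
    ):
        i += 1
    return html_text[i:]

print(strip_leading_invisibles("\u200d<!DOCTYPE html>"))  # <!DOCTYPE html>
```

Only the start of the document is touched, so a ZWJ used legitimately inside the text (for example in an emoji sequence) is preserved.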
I copied all my code into a fresh file and used that file instead. It worked for me.