Unicode character for marking - unicode

We are going to digitize a lot of books. We want to mark place of line break in original book without influencing the flow of digital book. Which invisible Unicode charter can be used to mark some special places in a raw file?
(\n will used to indicate end of paragraph)
This is a sentence
in the original book that
I want to mark line
break places.
What is the proper character to replace *:
This is a sentence * in the original book that * I want to mark line *break places.

Unicode has no concept of a hidden character that represents a line break in some original but does not cause line break in rendering. Unicode encodes plain text data, and its control characters for line breaks have an effect when plain text is rendered.
What matters here is how the files will be used. If they need to be processable with plain text editors, then you need to decide: either the line breaks are replicated in default rendering, or they are omitted when creating the file. You can’t make them invisible. And different text editors, like Notepad and Emacs, may well use different line control conventions; one program’s end of line is another program’s end of paragraph.
If the files will only be processed by programs that you create, then you can use whatever conventions you like. The most logical one is this:
“Line and Paragraph Separator. The Unicode Standard provides two unambiguous characters,
U+2028 line separator and U+2029 paragraph separator, to separate lines and
paragraphs. They are considered the default form of denoting line and paragraph boundaries
in Unicode plain text. A new line is begun after each line separator. A new paragraph
is begun after each paragraph separator. As these characters are separator codes, it is not necessary either to start the first line or paragraph or to end the last line or paragraph with them. Doing so would indicate that there was an empty paragraph or line following. The paragraph separator can be inserted between paragraphs of text. Its use allows the creation of plain text files, which can be laid out on a different line width at the receiving end. The line separator can be used to indicate an unconditional end of line.”
http://www.unicode.org/versions/Unicode6.1.0/ch16.pdf (pages 6 and 7 in the PDF)
Beware that U+2028 and U+2029 are generally not understood by text editors. They are suitable for storing data in plain text format. When the text is to be rendered, the rendering software has the option of ignoring the original division into lines and treating U+2028 as equivalent to a space, except if preceded by a hyphen (which poses a problem that cannot be resolved without higher level information: a line that ends with “foo-” and is follod by a line beginning with “bar” could represent the word “foobar” as hyphenated for line breaking, or a hyphenated compound “foo-bar” or, in some cases, the combination “foo- bar”).

Use the line feed character (LF, "\n", 0x0A) and/or maybe carriage return (CR, "\r", 0x0D).
I.e., the regular characters for this purpose.

Related

Why is this LSEP symbol showing up on Chrome and not Firefox or Edge?

So this web page is rendering with these symbols and they are found throughout this website/application but on no other sites. Can anyone tell me
What this symbol is?
Why it is showing up only in one browser?
That character is U+2028 Line Separator, which is a kind of newline character. Think of it as the Unicode equivalent of HTML’s <br>.
As to why it shows up here: my guess would be that an internal database uses LSEP to not conflict with literal newlines or HTML tags (which might break the database or cause security errors), and either:
The server-side scripts that convert the database to HTML neglected to replace LSEP with <br>
Chrome just breaks standards by displaying LSEP as a printing (visible) character, or
You have a font installed that displays LSEP as a printing character that only Chrome detects. To figure out which font it is, right click on the offending text and click “Inspect”, then switch to the “Computed” tab on the right-hand panel. At the very bottom you should see a section labeled “Rendered Fonts” which will help you locate the offending font.
More information on the line separator, excerpted from the Unicode standard, Chapter 5.8, Newline Guidelines (on p. 12 of this PDF):
Line Separator and Paragraph Separator
A paragraph separator—independent of how it is encoded—is used to indicate a
separation between paragraphs. A line separator indicates where a line break
alone should occur, typically within a paragraph. For example:
This is a paragraph with a line separator at this point,
causing the word “causing” to appear on a different line, but not causing
the typical paragraph indentation, sentence breaking, line spacing, or
change in flush (right, center, or left paragraphs).
For comparison, line separators basically correspond to HTML <BR>, and
paragraph separators to older usage of HTML <P> (modern HTML delimits
paragraphs by enclosing them in <P>...</P>). In word processors, paragraph
separators are usually entered using a keyboard RETURN or ENTER; line
separators are usually entered using a modified RETURN or ENTER, such as
SHIFT-ENTER.
A record separator is used to separate records. For example, when exchanging
tabular data, a common format is to tab-separate the cells and to use a CRLF
at the end of a line of cells. This function is not precisely the same as line
separation, but the same characters are often used.
Traditionally, NLF started out as a line separator (and sometimes record
separator). It is still used as a line separator in simple text editors such as
program editors. As platforms and programs started to handle word processing
with automatic line-wrap, these characters were reinterpreted to stand for
paragraph separators. For example, even such simple programs as the Windows
Notepad program and the Mac SimpleText program interpret their platform’s NLF
as a paragraph separator, not a line separator. Once NLF was reinterpreted to
stand for a paragraph separator, in some cases another control character was
pressed into service as a line separator. For example, vertical tabulation VT
is used in Microsoft Word. However, the choice of character for line separator
is even less standardized than the choice of character for NLF. Many Internet
protocols and a lot of existing text treat NLF as a line separator, so an
implementer cannot simply treat NLF as a paragraph separator in all
circumstances.
Further reading:
Unicode Technical Report #13: Newline Guidelines
General Punctuation (U+2000–U+206F) chart PDF
SE: Why are there so many spaces and line breaks in Unicode?
SO: What is unicode character 2028 (LS / Line Separator) used for?
U+2028 on codepoints.net A misprint here says that U+2028 was added in v. 1.1 of the Unicode standard, which is false — it was added in 1.0
I found that in WordPress the easiest way to remove "L SEP" and "P SEP" characters is to execute this two SQL queries:
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('e280a9'), '')
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('e280a8'), '')
The javascript way (mentioned in some of the answers) can break some things (in my case some modal windows stopped working).
You can use this tool...
http://www.nousphere.net/cleanspecial.php
...to remove all the special characters that Chrome displays.
Steps:
Paste your HTML and Clean using HTML option.
You can manually delete the characters in the editor on this page and see the result.
Paste back your HTML in file and save :)
I recently ran into this issue, tried a number of fixes but ultimately I had to paste the text into VIM and there was an extra space I had to delete. I tried a number of HTML cleaners but none of them worked, VIM was the key!
9999years answers is great.
In case you use Symfony with Twig template I would recommend to check for an empty Twig block. In my case it was an empty Twig block with an invisible char inside.
The LSEP char was only displayed on certain device / browser.
On the other I had a blank space above the header and I could not see any invisible char.
I had to inspect the GET request to see that the value 1f18 was before the open html tag.
Once I removed an empty Twig block it was gone.
hope this can help someone one day ...
My problem was similar, it was "PSEP" or "P SEP". Similar issue, an invisible character in my file.
I replaced \x{2029} with a normal space. Fixed. This problem only appeared on Windows Chrome. Not on my Mac.
I agree with #Kapil Bathija - Basically you can copy & paste your HTML code into http://www.nousphere.net/cleanspecial.php and convert it.
Then it will convert the special characters for you - Just remove the spaces in between the words and you will realize you have to press backspace 2x meaning there is an invalid character that can't be translated.
I had the same issue and it worked just fine afterwards.
You can also copy the text, paste it into a HTML editor such as Coda, remove the linebreak, copy it and paste it back into your site.
Video here: https://www.loom.com/share/501498afa7594d95a18382f1188f33ce
Looks like my client pasted HTML into Wordpress after initially creating it with MS-Word. Even deleting the and visible spaces did not fix the issue. The extended characters became visible in vi/vim.
If you don't have vi/vim available, try highlighting from 2 chars before the LSEP to 2 chars after the LSEP; delete that chunk, and re-type the correct characters.

Do Unicode's line breaking rules require the last character to be a mandatory break?

I'm trying to use libunibreak (https://github.com/adah1972/libunibreak) to mark the possible line breaks in some given unicode text.
Libunibreak gives back four possible options for each code unit in some text:
LINEBREAK_MUSTBREAK
LINEBREAK_ALLOWBREAK
LINEBREAK_NOBREAK
LINEBREAK_INSIDEACHAR
Hopefully these are self explanatory. I would expect that MUSTBREAK corresponds to newline characters like LF. However, for any given text Libunibreak always indicates that the last character is MUSTBREAK
So for example with the string "abc", the output would be [NOBREAK,NOBREAK,MUSTBREAK]. For "abc\n" the output would be [NOBREAK,NOBREAK,NOBREAK,MUSTBREAK]. I use the MUSTBREAK attribute to start a new line when drawing text so the first case ("abc") creates an extra linebreak that shouldn't be there.
Is this behaviour what Unicode specifies or is this a quirk of the library implementation I'm using?
Yes, this is what the Unicode line breaking algorithm specifies. Rule LB3 in UAX #14: Unicode Line Breaking Algorithm, section 6.1 "Non-tailorable Line Breaking Rules" says:
Always break at the end of text.
The spec further explains:
[This rule is] designed to deal with degenerate cases, so that there is [...] at least one line break for the whole text.

Invisible characters - ASCII

Are there any invisible characters? I have checked Google for invisible characters and ended up with many answers but I'm not sure about those. Can someone on Stack Overflow tell me more about this?
Also I have checked a profile on Facebook and found that the user didn't have any name to his profile? How can this be possible? Is it some database issue? Hacking or something?
When I searched over Internet, I found that 200D is an ASCII value with an invisible character. Is it true?
I just went through the character map to get these.
They are all in Calibri.
Number    Name      HTML Code    Appearance
------    --------------------  ---------    ----------
U+2000    En Quad           " "
U+2001    Em Quad           " "
U+2002    En Space        " "
U+2003    Em Space        " "
U+2004  Three-Per-Em Space      " "
U+2005  Four-Per-Em Space         " "
U+2006 Six-Per-Em Space       " "
U+2007 Figure Space         " "
U+2008 Punctuation Space        " "
U+2009 Thin Space         " "
U+200A Hair Space        " "
U+200B Zero-Width Space ​      "​"
U+200C Zero Width Non-Joiner ‌   "‌"
U+200D Zero Width Joiner ‍      "‍"
U+200E Left-To-Right Mark ‎      "‎"
U+200F Right-To-Left Mark ‏      "‏"
U+202F Narrow No-Break Space        " "
How a character is represented is up to the renderer, but the server may also strip out certain characters before sending the document.
You can also have untitled YouTube videos like https://www.youtube.com/watch?v=dmBvw8uPbrA by using the Unicode character ZERO WIDTH NON-JOINER (U+200C), or ‌ in HTML. The code block below should contain that character:
‌‌
There is actually a truly invisible character: U+FEFF.
This character is called the Byte Order Mark and is related to the Unicode 8 system. It is a really confusing concept that can be explained HERE The Byte Order Mark or BOM for short is an invisible character that doesn't take up any space. You can copy the character bellow between the > and <.
Here is the character:
> <
How to catch this character in action:
Copy the character between the > and <,
Write a line of text, then randomly put your caret in the line of text
Paste the character in the line.
Go to the beginning of the line and press and hold the right arrow key.
You will notice that when your caret gets to the place you pasted the character, it will briefly stop for around half a second. This is becuase the caret is passing over the invisible character. Even though you can't see it doesn't mean it isn't there. The caret still sees that there is a character in that area that you pasted the BOM and will pass through it. Since the BOM is invisble, the caret will look like it has paused for a brief moment. You can past the BOM multiple times in an area and redo the steps above to really show the affect. Good luck!
EDIT: Sadly, Stackoverflow doesn't like the character. Here is an example from w3.org: https://www.w3.org/International/questions/examples/phpbomtest.php
Other answers are correct - whether a character is invisible or not depends on what font you use. This seems to be a pretty good list to me of characters that are truly invisible (not even space). It contains some chars that the other lists are missing.
'\u2060', // Word Joiner
'\u2061', // FUNCTION APPLICATION
'\u2062', // INVISIBLE TIMES
'\u2063', // INVISIBLE SEPARATOR
'\u2064', // INVISIBLE PLUS
'\u2066', // LEFT - TO - RIGHT ISOLATE
'\u2067', // RIGHT - TO - LEFT ISOLATE
'\u2068', // FIRST STRONG ISOLATE
'\u2069', // POP DIRECTIONAL ISOLATE
'\u206A', // INHIBIT SYMMETRIC SWAPPING
'\u206B', // ACTIVATE SYMMETRIC SWAPPING
'\u206C', // INHIBIT ARABIC FORM SHAPING
'\u206D', // ACTIVATE ARABIC FORM SHAPING
'\u206E', // NATIONAL DIGIT SHAPES
'\u206F', // NOMINAL DIGIT SHAPES
'\u200B', // Zero-Width Space
'\u200C', // Zero Width Non-Joiner
'\u200D', // Zero Width Joiner
'\u200E', // Left-To-Right Mark
'\u200F', // Right-To-Left Mark
'\u061C', // Arabic Letter Mark
'\uFEFF', // Byte Order Mark
'\u180E', // Mongolian Vowel Separator
'\u00AD' // soft-hyphen
The question about invisible characters in Unicode deserves a more thorough explanation.
Short answer - there are lots
Here are 134 invisible characters →­؜᠎​‌‍‎‏‪‫‬‭‮⁠⁡⁢⁣⁤⁧⁦⁨⁩𝅳𝅴𝅵𝅶𝅷𝅸𝅹𝅺󠀁󠀠󠀡󠀢󠀣󠀤󠀥󠀦󠀧󠀨󠀩󠀪󠀫󠀬󠀭󠀮󠀯󠀰󠀱󠀲󠀳󠀴󠀵󠀶󠀷󠀸󠀹󠀺󠀻󠀼󠀽󠀾󠀿󠁀󠁁󠁂󠁃󠁄󠁅󠁆󠁇󠁈󠁉󠁊󠁋󠁌󠁍󠁎󠁏󠁐󠁑󠁒󠁓󠁔󠁕󠁖󠁗󠁘󠁙󠁚󠁛󠁜󠁝󠁞󠁟󠁠󠁡󠁢󠁣󠁤󠁥󠁦󠁧󠁨󠁩󠁪󠁫󠁬󠁭󠁮󠁯󠁰󠁱󠁲󠁳󠁴󠁵󠁶󠁷󠁸󠁹󠁺󠁻󠁼󠁽󠁾󠁿← and here is their escaped ASCII representation: U+00AD U+061C U+180E U+200B U+200C U+200D U+200E U+200F U+202A U+202B U+202C U+202D U+202E U+2060 U+2061 U+2062 U+2063 U+2064 U+2067 U+2066 U+2068 U+2069 U+206A U+206B U+206C U+206D U+206E U+206F U+FEFF U+1D173 U+1D174 U+1D175 U+1D176 U+1D177 U+1D178 U+1D179 U+1D17A U+E0001 U+E0020 U+E0021 U+E0022 U+E0023 U+E0024 U+E0025 U+E0026 U+E0027 U+E0028 U+E0029 U+E002A U+E002B U+E002C U+E002D U+E002E U+E002F U+E0030 U+E0031 U+E0032 U+E0033 U+E0034 U+E0035 U+E0036 U+E0037 U+E0038 U+E0039 U+E003A U+E003B U+E003C U+E003D U+E003E U+E003F U+E0040 U+E0041 U+E0042 U+E0043 U+E0044 U+E0045 U+E0046 U+E0047 U+E0048 U+E0049 U+E004A U+E004B U+E004C U+E004D U+E004E U+E004F U+E0050 U+E0051 U+E0052 U+E0053 U+E0054 U+E0055 U+E0056 U+E0057 U+E0058 U+E0059 U+E005A U+E005B U+E005C U+E005D U+E005E U+E005F U+E0060 U+E0061 U+E0062 U+E0063 U+E0064 U+E0065 U+E0066 U+E0067 U+E0068 U+E0069 U+E006A U+E006B U+E006C U+E006D U+E006E U+E006F U+E0070 U+E0071 U+E0072 U+E0073 U+E0074 U+E0075 U+E0076 U+E0077 U+E0078 U+E0079 U+E007A U+E007B U+E007C U+E007D U+E007E U+E007F
Are there more? Yes.
Are there invisible characters in the ASCII range? Depends on the font.
Long answer - ready? set. go!
The Unicode Standard enables anyone to read and write in their own language. To do that, it lists unique code points󠁗󠁲󠁩󠁴󠁴󠁥󠁮󠀠󠁢󠁹󠀠󠁚󠁶󠁩󠀠󠁁󠁺󠁲󠁡󠁮󠀠󠀻󠀩 (U+hex), that are categorized into letters (D,ž,Dž,ʶ,愛,𓂀), symbols (+∊≠,£¥₪,҂˚˟˿), marks (ם֑֟֯ ,ী,◌҉ ), separators ( , , , ,  ), emojis (😊,🙏,👍), and much more. ASCII/Basic Latin is the very beginning of the table and more code points are added every update.
Simply listing unique numbers for characters is not enough. Characters can change their shape or change the sentence depending on the context. To support that, every code point comes with a list of properties . These properties may define the width (AA), its role in the sentence (-“.), its direction (cכ), and much more.
Most invisible characters have the property General_Category=Format (other answers here included Spaces as well). Theis characters have a supporting role to a word/sentence. Here are some examples:
General Punctuation Block -
Invisible characters that are an integral part of some writing systems and emojis. Common ones are Zero width joiner (U+200D), Zero width non joiner (U+200C), Word joiner (U+2060)
Explicit Bidirectional Formatting characters - 12 invisible characters󠁗󠁲󠁩󠁴󠁴󠁥󠁮󠀠󠁢󠁹󠀠󠁚󠁶󠁩󠀠󠁁󠁺󠁲󠁡󠁮󠀠󠀻󠀩 used to enforce different direction constraints on the sentence. Helping present text to more than 300 million speakers of right-to-left languages e.g. Hebrew or Arabic.
Tags - 97 invisible characters that mirror ASCII (just drop the E and you get characters in the ASCII range). These are used as emoji modifiers and digital signatures to prove who copied your text.
This all leads to talk about exploiting invisible characters for homograph attack/visual spoofing. Sometimes it's harmless like invisible names and titles but in lots of cases they are used maliciously. For example U+202E is one invisible character that keeps doing more harm than good for decades!!
Last point, there is another way to make invisible characters using fonts. Fonts are files that store glyphs (pictures of characters), that present the characters' look. If the font does not contain a glyph for a codepoint, a substitute/replacement󠁗󠁲󠁩󠁴󠁴󠁥󠁮󠀠󠁢󠁹󠀠󠁚󠁶󠁩󠀠󠁁󠁺󠁲󠁡󠁮󠀠󠀻󠀩 character is displayed (e.g. �, □). But if the font contains a transparent glyph for a codepoint, then the character is invisible, only when displayed by that font. This is the only way to have invisible characters in the ASCII range (for example can you see →``← U+000C Form Feed).
Hope you find this explanation helpful and may you check strings for invisible characters more often 󠁗󠁲󠁩󠁴󠁴󠁥󠁮󠀠󠁢󠁹󠀠󠁚󠁶󠁩󠀠󠁁󠁺󠁲󠁡󠁮󠀠󠀻󠀩😉
Yes you can use invisible or blank name on facebook by using some HTML code/symbols.
Method 1:
Copy and paste (ﹺ                         ﹺ) symbols without brackets in your first and last name field.
Method 2:
Click on edit name. Now copy and paste following symbol in first and last name.
ՙՙ ՙՙ
An invisible Character is ​, or U+200b
​

What is the syntax for new line in Objective-C?

Can anyone tell me what is the symbol used for new line?
In the C language we use '\n' for new line. What do we use in Objective-C?
is it same?
Objective-C is an extension of C. So '\n' works too in Objective-C.
It's the same (\n), but there's a lot more to the topic depending on whether it's just a new line or a new paragraph, what context the text will be processed in, etc. From the documentation (referencing the Cocoa docs here because they cover both Objective-C [implicitly] and Cocoa, since you have the iphone tag on your question):
There are a number of ways in which a line or paragraph break may be represented. Historically \n, \r, and \r\n have been used. Unicode defines an unambiguous paragraph separator, U+2029 (for which Cocoa provides the constant NSParagraphSeparatorCharacter), and an unambiguous line separator, U+2028 (for which Cocoa provides the constant NSLineSeparatorCharacter).
In the Cocoa text system, the NSParagraphSeparatorCharacter is treated consistently as a paragraph break, and NSLineSeparatorCharacter is treated consistently as a line break that is not a paragraph break—that is, a line break within a paragraph. However, in other contexts, there are few guarantees as to how these characters will be treated. POSIX-level software, for example, often recognizes only \n as a break. Some older Macintosh software recognizes only \r, and some Windows software recognizes only \r\n. Often there is no distinction between line and paragraph breaks.
Which line or paragraph break character you should use depends on how your data may be used and on what platforms. The Cocoa text system recognizes \n, \r, or \r\n all as paragraph breaks—equivalent to NSParagraphSeparatorCharacter. When it inserts paragraph breaks, for example with insertNewline:, it uses \n. Ordinarily NSLineSeparatorCharacter is used only for breaks that are specifically line breaks and not paragraph breaks, for example in insertLineBreak:, or for representing HTML <br> elements.
If your breaks are specifically intended as line breaks and not paragraph breaks, then you should typically use NSLineSeparatorCharacter. Otherwise, you may use \n, \r, or \r\n depending on what other software is likely to process your text. The default choice for Cocoa is usually \n.
It's the same, but if you are printing to the console, you should use
NSLog(#"This is a console statement\n on two different lines");
Hope this helps.
It's same dude. Objective-c is superset of c so most of the things from c will work in objective-c too.
Its same.In Objective c "\n" use for new line.
Straight from the dragons mouth.
http://developer.apple.com/library/mac/#documentation/cocoa/conceptual/Strings/Articles/stringsParagraphBreaks.html

What is the difference between \r and \n?

How are \r and \n different? I think it has something to do with Unix vs. Windows vs. Mac, but I'm not sure exactly how they're different, and which to search for/match in regexes.
They're different characters. \r is carriage return, and \n is line feed.
On "old" printers, \r sent the print head back to the start of the line, and \n advanced the paper by one line. Both were therefore necessary to start printing on the next line.
Obviously that's somewhat irrelevant now, although depending on the console you may still be able to use \r to move to the start of the line and overwrite the existing text.
More importantly, Unix tends to use \n as a line separator; Windows tends to use \r\n as a line separator and Macs (up to OS 9) used to use \r as the line separator. (Mac OS X is Unix-y, so uses \n instead; there may be some compatibility situations where \r is used instead though.)
For more information, see the Wikipedia newline article.
EDIT: This is language-sensitive. In C# and Java, for example, \n always means Unicode U+000A, which is defined as line feed. In C and C++ the water is somewhat muddier, as the meaning is platform-specific. See comments for details.
In C and C++, \n is a concept, \r is a character, and \r\n is (almost always) a portability bug.
Think of an old teletype. The print head is positioned on some line and in some column. When you send a printable character to the teletype, it prints the character at the current position and moves the head to the next column. (This is conceptually the same as a typewriter, except that typewriters typically moved the paper with respect to the print head.)
When you wanted to finish the current line and start on the next line, you had to do two separate steps:
move the print head back to the beginning of the line, then
move it down to the next line.
ASCII encodes these actions as two distinct control characters:
\x0D (CR) moves the print head back to the beginning of the line. (Unicode encodes this as U+000D CARRIAGE RETURN.)
\x0A (LF) moves the print head down to the next line. (Unicode encodes this as U+000A LINE FEED.)
In the days of teletypes and early technology printers, people actually took advantage of the fact that these were two separate operations. By sending a CR without following it by a LF, you could print over the line you already printed. This allowed effects like accents, bold type, and underlining. Some systems overprinted several times to prevent passwords from being visible in hardcopy. On early serial CRT terminals, CR was one of the ways to control the cursor position in order to update text already on the screen.
But most of the time, you actually just wanted to go to the next line. Rather than requiring the pair of control characters, some systems allowed just one or the other. For example:
Unix variants (including modern versions of Mac) use just a LF character to indicate a newline.
Old (pre-OSX) Macintosh files used just a CR character to indicate a newline.
VMS, CP/M, DOS, Windows, and many network protocols still expect both: CR LF.
Old IBM systems that used EBCDIC standardized on NL--a character that doesn't even exist in the ASCII character set. In Unicode, NL is U+0085 NEXT LINE, but the actual EBCDIC value is 0x15.
Why did different systems choose different methods? Simply because there was no universal standard. Where your keyboard probably says "Enter", older keyboards used to say "Return", which was short for Carriage Return. In fact, on a serial terminal, pressing Return actually sends the CR character. If you were writing a text editor, it would be tempting to just use that character as it came in from the terminal. Perhaps that's why the older Macs used just CR.
Now that we have standards, there are more ways to represent line breaks. Although extremely rare in the wild, Unicode has new characters like:
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
Even before Unicode came along, programmers wanted simple ways to represent some of the most useful control codes without worrying about the underlying character set. C has several escape sequences for representing control codes:
\a (for alert) which rings the teletype bell or makes the terminal beep
\f (for form feed) which moves to the beginning of the next page
\t (for tab) which moves the print head to the next horizontal tab position
(This list is intentionally incomplete.)
This mapping happens at compile-time--the compiler sees \a and puts whatever magic value is used to ring the bell.
Notice that most of these mnemonics have direct correlations to ASCII control codes. For example, \a would map to 0x07 BEL. A compiler could be written for a system that used something other than ASCII for the host character set (e.g., EBCDIC). Most of the control codes that had specific mnemonics could be mapped to control codes in other character sets.
Huzzah! Portability!
Well, almost. In C, I could write printf("\aHello, World!"); which rings the bell (or beeps) and outputs a message. But if I wanted to then print something on the next line, I'd still need to know what the host platform requires to move to the next line of output. CR LF? CR? LF? NL? Something else? So much for portability.
C has two modes for I/O: binary and text. In binary mode, whatever data is sent gets transmitted as-is. But in text mode, there's a run-time translation that converts a special character to whatever the host platform needs for a new line (and vice versa).
Great, so what's the special character?
Well, that's implementation dependent, too, but there's an implementation-independent way to specify it: \n. It's typically called the "newline character".
This is a subtle but important point: \n is mapped at compile time to an implementation-defined character value which (in text mode) is then mapped again at run time to the actual character (or sequence of characters) required by the underlying platform to move to the next line.
\n is different than all the other backslash literals because there are two mappings involved. This two-step mapping makes \n significantly different than even \r, which is simply a compile-time mapping to CR (or the most similar control code in whatever the underlying character set is).
This trips up many C and C++ programmers. If you were to poll 100 of them, at least 99 will tell you that \n means line feed. This is not entirely true. Most (perhaps all) C and C++ implementations use LF as the magic intermediate value for \n, but that's an implementation detail. It's feasible for a compiler to use a different value. In fact, if the host character set is not a superset of ASCII (e.g., if it's EBCDIC), then \n will almost certainly not be LF.
So, in C and C++:
\r is literally a carriage return.
\n is a magic value that gets translated (in text mode) at run-time to/from the host platform's newline semantics.
\r\n is almost always a portability bug. In text mode, this gets translated to CR followed by the platform's newline sequence--probably not what's intended. In binary mode, this gets translated to CR followed by some magic value that might not be LF--possibly not what's intended.
\x0A is the most portable way to indicate an ASCII LF, but you only want to do that in binary mode. Most text-mode implementations will treat that like \n.
"\r" => Return
"\n" => Newline or Linefeed
(semantics)
Unix based systems use just a "\n" to end a line of text.
Dos uses "\r\n" to end a line of text.
Some other machines used just a "\r". (Commodore, Apple II, Mac OS prior to OS X, etc..)
\r is used to point to the start of a line and can replace the text from there, e.g.
main()
{
printf("\nab");
printf("\bsi");
printf("\rha");
}
Produces this output:
hai
\n is for new line.
In short \r has ASCII value 13 (CR) and \n has ASCII value 10 (LF).
Mac uses CR as line delimiter (at least, it did before, I am not sure for modern macs), *nix uses LF and Windows uses both (CRLF).
In addition to #Jon Skeet's answer:
Traditionally Windows has used \r\n, Unix \n and Mac \r, however newer Macs use \n as they're unix based.
\r is Carriage Return; \n is New Line (Line Feed) ... depends on the OS as to what each means. Read this article for more on the difference between '\n' and '\r\n' ... in C.
in C# I found they use \r\n in a string.
\r used for carriage return. (ASCII value is 13)
\n used for new line. (ASCII value is 10)