We are looking for a "BREAK NO-SPACE" character, the reverse of NO-BREAK SPACE. It should not print anything; it should only mark the positions between components where the word can be split and line-broken.
Is there anything similar to this in Unicode or any other encoding scheme? It would make life easier since we could then rely on built-in methods for line split in our framework instead of introducing custom logic and some "Magic Character".
Soft hyphen U+00AD is invisible but indicates where a word may be broken; a hyphen is displayed only when the line is actually broken there.
So I found the Zero Width Space character, U+200B. Its documentation describes exactly what I was looking for.
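As a small illustration, here is a Python sketch of marking break opportunities with U+200B. The helper names `mark_break_points` and `strip_break_points` and the sample word are made up for this example, and whether a line is actually broken at the marked positions depends on the layout engine that renders the text.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE


def mark_break_points(components):
    """Join word components with U+200B so a renderer may break between them."""
    return ZWSP.join(components)


def strip_break_points(text):
    """Remove the markers again, e.g. before comparing or storing the text."""
    return text.replace(ZWSP, "")


# Hypothetical compound word split into its components.
word = mark_break_points(["Donau", "dampf", "schiff", "fahrt"])

print(unicodedata.name(ZWSP))      # ZERO WIDTH SPACE
print(word.count(ZWSP))            # 3
print(strip_break_points(word))    # Donaudampfschifffahrt
```

Keep in mind that the markers are real characters: they count towards the string length and make otherwise equal strings compare unequal, which is why the sketch includes a helper to strip them.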
I'm developing an Android app with Galaxy S8.
When I write some special characters, they appear to be Galaxy emoji. Things like these: ↔♥◀▶
How can I prevent them from turning into emoji?
Typically, to select between the emoji and the text version of a character, you have to use a variation selector.
Unicode defines variation sequences for many of its emoji to indicate their desired presentation.
Emoji characters can have two main kinds of presentation:
an emoji presentation, with colorful and perhaps whimsical shapes, even animated
a text presentation, such as black & white
— Unicode Technical Report #51: Unicode Emoji
Specifying the desired presentation is done by following the base emoji with either U+FE0E VARIATION SELECTOR-15 (VS15) for text or U+FE0F VARIATION SELECTOR-16 (VS16) for emoji-style.
https://en.wikipedia.org/wiki/Emoji#Emoji_versus_text_presentation
So if you don't want them to be displayed as emoji, just attach a VS15 to the end.
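A minimal sketch of that idea in Python: append U+FE0E after each symbol that should keep its text presentation. The function name `force_text_presentation` and the default symbol set are assumptions for illustration, and whether the selector is honored still depends on the platform's fonts and text renderer.

```python
VS15 = "\ufe0e"  # VARIATION SELECTOR-15: request text presentation
VS16 = "\ufe0f"  # VARIATION SELECTOR-16: request emoji presentation


def force_text_presentation(text, symbols="↔♥◀▶"):
    """Append VS15 after every character of `text` that is in `symbols`."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        out.append(ch)
        i += 1
        if ch in symbols:
            # Skip a selector that is already attached, then add VS15.
            if i < len(text) and text[i] in (VS15, VS16):
                i += 1
            out.append(VS15)
    return "".join(out)


labelled = force_text_presentation("back ◀ ▶ forward")
print([f"U+{ord(c):04X}" for c in labelled if not c.isascii()])
# ['U+25C0', 'U+FE0E', 'U+25B6', 'U+FE0E']
```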
Are there any invisible characters? I have checked Google for invisible characters and ended up with many answers but I'm not sure about those. Can someone on Stack Overflow tell me more about this?
Also, I checked a profile on Facebook and found that the user didn't have any name on his profile. How can this be possible? Is it some database issue? Hacking or something?
When I searched over Internet, I found that 200D is an ASCII value with an invisible character. Is it true?
I just went through the character map to get these.
They are all in Calibri.
Number  Name                   HTML Code  Appearance
------  ---------------------  ---------  ----------
U+2000  En Quad                &#8192;    " "
U+2001  Em Quad                &#8193;    " "
U+2002  En Space               &#8194;    " "
U+2003  Em Space               &#8195;    " "
U+2004  Three-Per-Em Space     &#8196;    " "
U+2005  Four-Per-Em Space      &#8197;    " "
U+2006  Six-Per-Em Space       &#8198;    " "
U+2007  Figure Space           &#8199;    " "
U+2008  Punctuation Space      &#8200;    " "
U+2009  Thin Space             &#8201;    " "
U+200A  Hair Space             &#8202;    " "
U+200B  Zero-Width Space       &#8203;    ""
U+200C  Zero Width Non-Joiner  &#8204;    ""
U+200D  Zero Width Joiner      &#8205;    ""
U+200E  Left-To-Right Mark     &#8206;    ""
U+200F  Right-To-Left Mark     &#8207;    ""
U+202F  Narrow No-Break Space  &#8239;    " "
How a character is represented is up to the renderer, but the server may also strip out certain characters before sending the document.
You can also have untitled YouTube videos like https://www.youtube.com/watch?v=dmBvw8uPbrA by using the Unicode character ZERO WIDTH NON-JOINER (U+200C), written &zwnj; or &#8204; in HTML.
There is actually a truly invisible character: U+FEFF.
This character is used as the Byte Order Mark (BOM) at the start of text in Unicode encodings such as UTF-8 or UTF-16. It is a rather confusing concept, but for this purpose it is enough to know that the BOM is an invisible character that doesn't take up any space. You can copy the character below from between the > and <.
Here is the character:
> <
How to catch this character in action:
Copy the character between the > and <.
Write a line of text, then put your caret anywhere in that line.
Paste the character into the line.
Go to the beginning of the line and press and hold the right arrow key.
You will notice that when the caret reaches the place where you pasted the character, it briefly stops for around half a second. This is because the caret is passing over the invisible character: even though you can't see it, it is still there. Since the BOM is invisible, the caret looks as if it has paused for a moment. You can paste the BOM several times in one spot and repeat the steps above to make the effect more obvious. Good luck!
EDIT: Sadly, Stack Overflow doesn't like the character. Here is an example from w3.org: https://www.w3.org/International/questions/examples/phpbomtest.php
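If copying the character from a web page does not work, it can also be produced in code. The following Python sketch (the sample text "hello" is arbitrary) shows that a leading U+FEFF survives a plain UTF-8 decode as a real, invisible character, and that Python's `utf-8-sig` codec removes it.

```python
import codecs

BOM = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, used as the byte order mark

# Bytes of a UTF-8 file that starts with a BOM.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

kept = data.decode("utf-8")         # the BOM survives as the first character
dropped = data.decode("utf-8-sig")  # this codec strips a leading BOM

print(repr(kept))                # '\ufeffhello'
print(repr(dropped))             # 'hello'
print(len(kept), len(dropped))   # 6 5
```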
Other answers are correct - whether a character is invisible or not depends on what font you use. This seems to be a pretty good list to me of characters that are truly invisible (not even space). It contains some chars that the other lists are missing.
'\u2060', // Word Joiner
'\u2061', // FUNCTION APPLICATION
'\u2062', // INVISIBLE TIMES
'\u2063', // INVISIBLE SEPARATOR
'\u2064', // INVISIBLE PLUS
'\u2066', // LEFT-TO-RIGHT ISOLATE
'\u2067', // RIGHT-TO-LEFT ISOLATE
'\u2068', // FIRST STRONG ISOLATE
'\u2069', // POP DIRECTIONAL ISOLATE
'\u206A', // INHIBIT SYMMETRIC SWAPPING
'\u206B', // ACTIVATE SYMMETRIC SWAPPING
'\u206C', // INHIBIT ARABIC FORM SHAPING
'\u206D', // ACTIVATE ARABIC FORM SHAPING
'\u206E', // NATIONAL DIGIT SHAPES
'\u206F', // NOMINAL DIGIT SHAPES
'\u200B', // Zero-Width Space
'\u200C', // Zero Width Non-Joiner
'\u200D', // Zero Width Joiner
'\u200E', // Left-To-Right Mark
'\u200F', // Right-To-Left Mark
'\u061C', // Arabic Letter Mark
'\uFEFF', // Byte Order Mark
'\u180E', // Mongolian Vowel Separator
'\u00AD' // soft-hyphen
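A list like this is mostly useful for checking or cleaning strings. Here is a hedged Python sketch that uses the same 24 code points; the names `INVISIBLE`, `find_invisible` and `strip_invisible` are invented for this example. Note that stripping is not always safe: U+200C and U+200D, for instance, are meaningful in some scripts and in emoji sequences.

```python
import unicodedata

# Same code points as the list above.
INVISIBLE = set(
    "\u2060\u2061\u2062\u2063\u2064"
    "\u2066\u2067\u2068\u2069"
    "\u206a\u206b\u206c\u206d\u206e\u206f"
    "\u200b\u200c\u200d\u200e\u200f"
    "\u061c\ufeff\u180e\u00ad"
)


def find_invisible(text):
    """Return (index, 'U+XXXX', name) for each listed character in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ch in INVISIBLE
    ]


def strip_invisible(text):
    """Return text with every listed character removed."""
    return "".join(ch for ch in text if ch not in INVISIBLE)


sample = "pass\u200bword\ufeff"
for hit in find_invisible(sample):
    print(hit)
print(strip_invisible(sample))  # password
```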
The question about invisible characters in Unicode deserves a more thorough explanation.
Short answer - there are lots
Here are 134 invisible characters →← and here is their escaped ASCII representation: U+00AD U+061C U+180E U+200B U+200C U+200D U+200E U+200F U+202A U+202B U+202C U+202D U+202E U+2060 U+2061 U+2062 U+2063 U+2064 U+2067 U+2066 U+2068 U+2069 U+206A U+206B U+206C U+206D U+206E U+206F U+FEFF U+1D173 U+1D174 U+1D175 U+1D176 U+1D177 U+1D178 U+1D179 U+1D17A U+E0001 U+E0020 U+E0021 U+E0022 U+E0023 U+E0024 U+E0025 U+E0026 U+E0027 U+E0028 U+E0029 U+E002A U+E002B U+E002C U+E002D U+E002E U+E002F U+E0030 U+E0031 U+E0032 U+E0033 U+E0034 U+E0035 U+E0036 U+E0037 U+E0038 U+E0039 U+E003A U+E003B U+E003C U+E003D U+E003E U+E003F U+E0040 U+E0041 U+E0042 U+E0043 U+E0044 U+E0045 U+E0046 U+E0047 U+E0048 U+E0049 U+E004A U+E004B U+E004C U+E004D U+E004E U+E004F U+E0050 U+E0051 U+E0052 U+E0053 U+E0054 U+E0055 U+E0056 U+E0057 U+E0058 U+E0059 U+E005A U+E005B U+E005C U+E005D U+E005E U+E005F U+E0060 U+E0061 U+E0062 U+E0063 U+E0064 U+E0065 U+E0066 U+E0067 U+E0068 U+E0069 U+E006A U+E006B U+E006C U+E006D U+E006E U+E006F U+E0070 U+E0071 U+E0072 U+E0073 U+E0074 U+E0075 U+E0076 U+E0077 U+E0078 U+E0079 U+E007A U+E007B U+E007C U+E007D U+E007E U+E007F
Are there more? Yes.
Are there invisible characters in the ASCII range? Depends on the font.
Long answer - ready? set. go!
The Unicode Standard enables anyone to read and write in their own language. To do that, it lists unique code points (U+hex), that are categorized into letters (D,ž,Dž,ʶ,愛,𓂀), symbols (+∊≠,£¥₪,҂˚˟˿), marks (ם֑֟֯ ,ী,◌҉ ), separators ( , , , , ), emojis (😊,🙏,👍), and much more. ASCII/Basic Latin is the very beginning of the table and more code points are added every update.
Simply listing unique numbers for characters is not enough. Characters can change their shape or change the sentence depending on the context. To support that, every code point comes with a list of properties. These properties may define the width (AA), its role in the sentence (-“.), its direction (cכ), and much more.
Most invisible characters have the property General_Category=Format (other answers here included Spaces as well). These characters play a supporting role in a word or sentence. Here are some examples:
General Punctuation block - invisible characters that are an integral part of some writing systems and of emoji. Common ones are Zero Width Joiner (U+200D), Zero Width Non-Joiner (U+200C), and Word Joiner (U+2060).
Explicit Bidirectional Formatting characters - 12 invisible characters used to enforce different direction constraints on a sentence. They help present text to the more than 300 million speakers of right-to-left languages such as Hebrew or Arabic.
Tags - 97 invisible characters that mirror ASCII (just drop the leading E and you get characters in the ASCII range). These are used as emoji modifiers and as digital signatures to prove who copied your text.
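The General_Category property can be queried directly. Below is a short Python sketch using the standard `unicodedata` module, where `Cf` is the abbreviation for Format; the helper names `is_format_char` and `tag_to_ascii` are made up for this example, and the results reflect the Unicode version bundled with your Python.

```python
import unicodedata


def is_format_char(ch):
    """True if the code point has General_Category=Format (Cf)."""
    return unicodedata.category(ch) == "Cf"


def tag_to_ascii(ch):
    """Map a Tags-block character (U+E0020..U+E007F) to its ASCII twin."""
    cp = ord(ch)
    if not 0xE0020 <= cp <= 0xE007F:
        raise ValueError("not a tag character")
    return chr(cp - 0xE0000)


# ZWJ, ZWNJ, word joiner, RLO, a tag character, and an em space for contrast.
for ch in ["\u200d", "\u200c", "\u2060", "\u202e", "\U000E0041", "\u2003"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_format_char(ch))

print(tag_to_ascii("\U000E0041"))  # A
```

The em space (U+2003) reports `Zs` (space separator) instead of `Cf`, which is the distinction between spaces and format characters mentioned above.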
All of this leads to the topic of exploiting invisible characters for homograph attacks and visual spoofing. Sometimes it's harmless, like invisible names and titles, but in many cases they are used maliciously. For example, U+202E is one invisible character that has been doing more harm than good for decades!
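One possible defensive check, sketched in Python: flag any string that contains explicit embedding, override or isolate controls by looking at each character's bidirectional class. The function name `bidi_controls` and the sample file name are invented for illustration; the directional marks (U+200E, U+200F, U+061C) have ordinary strong classes and are not caught by this particular check.

```python
import unicodedata

# Bidi classes of the explicit embedding, override and isolate controls.
EXPLICIT_BIDI = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def bidi_controls(text):
    """Return (index, 'U+XXXX') for each explicit bidi control in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in EXPLICIT_BIDI
    ]


# Hypothetical file name: a bidi-aware renderer may show it as "photoexe.png".
name = "photo\u202egnp.exe"
print(bidi_controls(name))      # [(5, 'U+202E')]
print(name.endswith(".exe"))    # True
```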
Last point, there is another way to make invisible characters using fonts. Fonts are files that store glyphs (pictures of characters), that present the characters' look. If the font does not contain a glyph for a codepoint, a substitute/replacement character is displayed (e.g. �, □). But if the font contains a transparent glyph for a codepoint, then the character is invisible, only when displayed by that font. This is the only way to have invisible characters in the ASCII range (for example can you see →``← U+000C Form Feed).
Hope you find this explanation helpful and may you check strings for invisible characters more often 😉
Yes, you can use an invisible or blank name on Facebook by using some HTML codes or symbols.
Method 1:
Copy and paste (ﹺ ﹺ) symbols without brackets in your first and last name field.
Method 2:
Click on edit name. Now copy and paste following symbol in first and last name.
ՙՙ ՙՙ
An invisible character is &#8203;, or U+200B.
I have a little app which lists the names of certain people from around the world, and some of those names use characters that are not normal ASCII characters, like Díaz, or Thérèse for example.
The strings show up in Xcode just fine, but when I put them in a UILabel, they behave unexpectedly.
My question is: Is there a way to set up a UILabel to take the exact string in Xcode and display it properly, even if it contains UTF-8 characters (or characters in any other encoding, for that matter)?
UIKit fully supports Unicode; your problem is most likely the encoding of the source file. You can set that in the inspector (Xcode 4: ⌘⌥1) under "Text Settings". Make sure it is UTF-8 as well.
Alternative: Use Unicode escapes like @"\u2605" (should display ★).
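To see why a mismatched file encoding garbles such names, here is a language-neutral illustration in Python. The choice of Latin-1 and Mac OS Roman is an assumption picked for the demonstration; the encodings involved in the asker's project are not known.

```python
names = ["Díaz", "Thérèse"]

for name in names:
    raw = name.encode("latin-1")       # bytes as saved in one encoding
    wrong = raw.decode("mac_roman")    # the same bytes read with another
    right = raw.decode("latin-1")      # read with the encoding used to save
    print(name, "->", wrong, "| round trip ok:", right == name)

# An escape avoids the problem because the source stays pure ASCII.
print("\u2605")  # ★
```

The bytes never change; only the interpretation does, which is why fixing the file's declared encoding (or using escapes) resolves the display problem.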
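Try decoding the string with the encoding its bytes are actually in, for example UTF-8 (ASCII cannot represent these characters):
NSString *s = [NSString stringWithCString:value encoding:NSUTF8StringEncoding]; // value: const char * holding UTF-8 bytes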
I'm trying to set up a CTFrame that exactly matches my UITextView's text format in iPad.
First of all, I converted UITextView's text to an attributed string. Then I set up a width and a height of drawing box in which Core Text will draw text.
I succeeded to draw text using Core Text, but UITextView and Core Text show slightly different results even though I used the same font and size.
Specifically, when I used [UIFont systemFontOfSize:21], each space in UITextView has one more pixel than Core Text's result.
It's okay for a short sentence or word, but if UITextView and Core Text have multiple lines, their results become very different. For example, UITextView wraps a word at the end of a line, while Core Text keeps that word on the same line. If you look at the attached picture, the start positions of the last word "paragraph" are already very different (an 8-pixel gap due to 8 space characters).
Worse, if I use a different font, such as a custom font added to my project, each character in UITextView is 1 pixel wider.
I'm using Core Text to find the pixel-position of the current cursor in UITextView, so both of them should perfectly match each other, containing the same number of characters and words in each line.
Question: Is there a way to make Core Text object that perfectly matches UITextView's text format?
Thank you!
Here's the code I use to set up the attributed string. (I just followed the Core Text guide.)
CTFontRef font = CTFontCreateWithName((CFStringRef) [UIFont systemFontOfSize:21.0].fontName, 21.0, NULL);
CFMutableAttributedStringRef attrString2 = CFAttributedStringCreateMutable(kCFAllocatorDefault, 0);
CFAttributedStringReplaceString (attrString2, CFRangeMake(0, 0), (CFStringRef) string);
CFAttributedStringSetAttribute(attrString2, CFRangeMake(0, [string length]),kCTFontAttributeName, font);
Here's a picture.
Your solution might work for a specific font at a specific point size, but you can't rely on it in general.
Core Text is simply not compatible with normal UILabels, UITextViews or UIStringDrawing, so you can't mix them.
Either you have to use only CT functions for all string handling (including implementing custom input if that is what you need) or not use them at all.
Answer to myself.
I just found a very simple solution! Using any font editor, you can change the width of the space character (ASCII value 32), so that you have the original font for UITextView and a modified font for Core Text, or vice versa. I used FontForge, a free font editor. Though I still have to test some extreme cases, such as Japanese characters and English letters on the same line, it is now almost possible to find the pixel position of the cursor/caret in UITextView.