I found this regular expression on a website. It is said to be the best URL validation expression out there and I agree. Diego Perini created it.
The problem I am facing is when trying to use it with Objective-C to detect URLs in strings. I have tried using options like NSRegularExpressionAnchorsMatchLines, NSRegularExpressionIgnoreMetacharacters and others, but still no luck.
Is the expression not well formatted for Objective-C? Am I missing something? Any ideas?
I have tried John Gruber's regex, also, but it fails with some invalid URLs.
Regular Expression Explanation of expression
^ match at the beginning
//Protocol identifier
(?:
(?:https?|ftp http, https or ftp
):\\/\\/ ://
)? optional
// User:Pass authentication
(?:
\\S+ non white spaces, 1 or more times
(?:
:\\S* : non white spaces, 0 or more times, optionally
)?@ @
)? optional
//Private IP Addresses ?! Means DO NOT MATCH ahead. So do not match any of the following
(?:
(?!10 10 10.0.0.0 - 10.999.999.999
(?:
\\.\\d{1,3} . 1 to 3 digits, three times
){3}
)
(?!127 127 127.0.0.0 - 127.999.999.999
(?:
\\.\\d{1,3} . 1 to 3 digits, three times
){3}
)
(?!169\\.254 169.254 169.254.0.0 - 169.254.999.999
(?:
\\.\\d{1,3} . 1 to 3 digits, two times
){2}
)
(?!192\\.168 192.168 192.168.0.0 - 192.168.999.999
(?:
\\.\\d{1,3} . 1 to 3 digits, two times
){2}
)
(?!172\\. 172. 172.16.0.0 - 172.31.999.999
(?:
1[6-9] 1 followed by any number between 6 and 9
| or
2\\d 2 and any digit
| or
3[0-1] 3 followed by a 0 or 1
)
(?:
\\.\\d{1,3} . 1 to 3 digits, two times
){2}
)
//First Octet IPv4 // match these. Any non network or broadcast IPv4 address
(?:
[1-9]\\d? any number from 1 to 9 followed by an optional digit 1 - 99
| or
1\\d\\d 1 followed by any two digits 100 - 199
| or
2[01]\\d 2 followed by any 0 or 1, followed by a digit 200 - 219
| or
22[0-3] 22 followed by any number between 0 and 3 220 - 223
)
//Second and Third Octet IPv4
(?:
\\. .
(?:
1?\\d{1,2} optional 1 followed by any 1 or two digits 0 - 199
| or
2[0-4]\\d 2 followed by any number between 0 and 4, and any digit 200 - 249
| or
25[0-5] 25 followed by any numbers between 0 and 5 250 - 255
)
){2} two times
//Fourth Octet IPv4
(?:
\\. .
(?:
[1-9]\\d? any number between 1 and 9 followed by an optional digit 1 - 99
| or
1\\d\\d 1 followed by any two digits 100 - 199
| or
2[0-4]\\d 2 followed by any number between 0 and 4, and any digit 200 - 249
| or
25[0-4] 25 followed by any number between 0 and 4 250 - 254
)
)
//Host name
| or
(?:
(?:
[a-z\u00a1-\uffff0-9]+-? any letter, digit or character one or more times with optional -
)* zero or more times
[a-z\u00a1-\uffff0-9]+ any letter, digit or character one or more times
)
//Domain name
(?:
\\. .
(?:
[a-z\u00a1-\uffff0-9]+-? any letter, digit or character one or more times with optional -
)* zero or more times
[a-z\u00a1-\uffff0-9]+ any letter, digit or character one or more times
)* zero or more times
//TLD identifier
(?:
\\. .
(?:
[a-z\u00a1-\uffff]{2,} any letter or character (no digits), two or more times
)
)
)
//Port number
(?:
:\\d{2,5} : followed by any digit, two to five times, optionally
)?
//Resource path
(?:
\\/[^\\s]* / followed by an optional non space character, zero or more times
)? optional
$ match at the end
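For anyone who wants to poke at the IPv4 part of the breakdown in isolation, here is a minimal sketch in Python rather than Objective-C, so the backslashes are single instead of doubled. The names PRIVATE, FIRST_OCTET, MIDDLE_OCTETS, LAST_OCTET and is_public_ipv4 are made up for this example, and the pattern is anchored on both ends so it only accepts a bare address, not a whole URL.

```python
import re

# Negative lookaheads for the private/loopback/link-local ranges listed above.
PRIVATE = (
    r"(?!10(?:\.\d{1,3}){3})"
    r"(?!127(?:\.\d{1,3}){3})"
    r"(?!169\.254(?:\.\d{1,3}){2})"
    r"(?!192\.168(?:\.\d{1,3}){2})"
    r"(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})"
)
# First octet: 1 - 223 (no leading zero, no multicast/reserved ranges).
FIRST_OCTET = r"(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])"
# Second and third octets: 0 - 255.
MIDDLE_OCTETS = r"(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}"
# Fourth octet: 1 - 254 (no network or broadcast address).
LAST_OCTET = r"(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"

IPV4 = re.compile("^" + PRIVATE + FIRST_OCTET + MIDDLE_OCTETS + LAST_OCTET + "$")


def is_public_ipv4(text):
    """Return True if text is an address the IPv4 part of the pattern accepts."""
    return IPV4.match(text) is not None


if __name__ == "__main__":
    for candidate in ["8.8.8.8", "10.0.0.1", "192.168.1.1", "172.15.0.1", "224.0.0.1"]:
        print(candidate, is_public_ipv4(candidate))
```

Testing each piece on its own like this makes it easier to tell whether a failure comes from the pattern or from the way the engine is being called.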
EDIT
I think I forgot to say that I am using the expression in the following code: (partial code)
NSError *error = NULL;
NSRegularExpression *detector = [NSRegularExpression regularExpressionWithPattern:[self theRegularExpression] options:0 error:&error];
NSArray *links = [detector matchesInString:theText options:0 range:NSMakeRange(0, theText.length)];
^(?i)(?:(?:https?|ftp):\\/\\/)?(?:\\S+(?::\\S*)?@)?(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))(?::\\d{2,5})?(?:\\/[^\\s]*)?$
This is the best URL validation regular expression I found, and it is explained in my question. It is already formatted to work in Objective-C. However, using it with NSRegularExpression gave me all sorts of problems, including my app crashing. RegexKitLite had no problems handling it. I do not know if it is a size limitation or some flag not being set.
My final code looked like:
//First I take the string and put every word in an array, then I match every word with the regular expression
NSArray *splitIntoWordsArray = [textToMatch componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSMutableString *htmlString = [NSMutableString stringWithString:textToMatch];
for (NSString *theText in splitIntoWordsArray){
NSEnumerator *matchEnumerator = [theText matchEnumeratorWithRegex:theRegularExpressionString];
for (NSString *temp in matchEnumerator){
[htmlString replaceOccurrencesOfString:temp withString:[NSString stringWithFormat:@"<a href=\"%@\">%@</a>", temp, temp] options:NSLiteralSearch range:NSMakeRange(0, [htmlString length])];
}
}
[htmlString replaceOccurrencesOfString:@"\n" withString:@"<br />" options:NSLiteralSearch range:NSMakeRange(0, htmlString.length)];
//embed the text on a webView as HTML
[webView loadHTMLString:[NSString stringWithFormat:embedHTML, [mainFont fontName], [mainFont pointSize], htmlString] baseURL:nil];
The result: a UIWebView with some embedded HTML, where URLs and emails are clickable. Do not forget to set dataDetectorTypes = UIDataDetectorTypeNone
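The split-then-match idea above is not specific to Objective-C. Here is a rough sketch of the same approach in Python. SIMPLE_URL is a deliberately simplified stand-in pattern invented for this example, not the full expression from the question, and linkify is a hypothetical helper name.

```python
import re
from html import escape

# Simplified stand-in pattern (an assumption for this sketch): scheme, host
# with a dot and an alphabetic TLD, optional path. Swap in the real one.
SIMPLE_URL = re.compile(
    r"^(?:https?|ftp)://[^\s/]+\.[a-z]{2,}(?:/\S*)?$", re.IGNORECASE
)


def linkify(text):
    """Wrap every whitespace-delimited word that looks like a URL in <a>."""
    # The capturing group keeps the whitespace runs so they can be re-emitted.
    parts = re.split(r"(\s+)", text)
    out = []
    for part in parts:
        if SIMPLE_URL.match(part):
            url = escape(part, quote=True)
            out.append('<a href="{0}">{0}</a>'.format(url))
        else:
            # Escape plain text and turn newlines into <br />, as above.
            out.append(escape(part).replace("\n", "<br />"))
    return "".join(out)


if __name__ == "__main__":
    print(linkify("see http://example.com/a now"))
```

Matching word by word keeps each regex call short, which may also help with the hanging described in this thread, since the expression never sees the whole text at once.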
You can also try
NSError *error = NULL;
NSRegularExpression *expression = [NSRegularExpression regularExpressionWithPattern:@"(?i)(?:(?:https?):\\/\\/)?(?:\\S+(?::\\S*)?@)?(?:(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))|(?:(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]+-?)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))(?::\\d{2,5})?(?:\\/[^\\s]*)?" options:NSRegularExpressionCaseInsensitive error:&error];
if (error)
NSLog(@"error");
NSString *someString = @"This is a sample of a sentence with a URL http://. http://.. http://../ http://? http://?? http://??/ http://# http://-error-.invalid/ http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com within it.";
NSRange range = [expression rangeOfFirstMatchInString:someString options:0 range:NSMakeRange(0, [someString length])];
if (!NSEqualRanges(range, NSMakeRange(NSNotFound, 0))){
NSString *match = [someString substringWithRange:range];
NSLog(@"%@", match);
}
else {
NSLog(#"no match");
}
Hope it helps somebody in the future
The regular expression will sometimes cause the application to hang, so I decided to use Gruber's regular expression, modified to recognize URLs without the protocol or the www part:
(?i)\\b((?:[a-z][\\w-]+:(?:/{1,3}|[a-z0-9%])|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/?)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))*(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’])*)
Am I missing something?
You're missing the built-in stuff to do this for you. There's a handy object called NSDataDetector. You create it to look for certain data "types" (like, say, NSTextCheckingTypeLink), then ask it for its -matchesInString:options:range:.
Here's an earlier answer of mine showing how to use it.
I am trying to extract data from a string of text using PowerShell. The data I need is between the first and last bracket. What I have so far appears to work, but it fails if the data itself contains a close bracket...
$MyText = "BT /F3 8.999 Tf 0 0 0 rg 407.446 TL 64.368 772.194 Td (\(TESTJulia\) Julia's Test Company) Tj T* ET"
[regex]::match($MyText,'(?<=\().+?(?=\))')
Is this what you want?
$MyText = "BT /F3 8.999 Tf 0 0 0 rg 407.446 TL 64.368 772.194 Td (\(TESTJulia\) Julia's Test Company) Tj T* ET"
$match = [regex]::Match($MyText,'\(+?(.*)\)')
Write-Host $match.Captures.groups[1].value
Output:
\(TESTJulia\) Julia's Test Company
Regex explanation (courtesy Regex101.com):
\(+? matches the character ( literally (case sensitive)
+? Quantifier — Matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
1st Capturing Group (.*)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\) matches the character ) literally (case sensitive)
here's a slightly different way to get there ... [grin]
it depends on the td ( and ) tj being always there, but it works with your sample data.
$InStuff = "BT /F3 8.999 Tf 0 0 0 rg 407.446 TL 64.368 772.194 Td (\(TESTJulia\) Julia's Test Company) Tj T* ET"
$InStuff -match 'td \((.+)\) tj .+$'
$Matches[1]
output ...
\(TESTJulia\) Julia's Test Company
Why not remove the lazy quantifier? This will make it greedy, so that it grabs as many characters as it can until it hits the last position where the lookahead still succeeds.
PS>$MyText = "BT /F3 8.999 Tf 0 0 0 rg 407.446 TL 64.368 772.194 Td (\(TESTJulia\) Julia's Test Company) Tj T* ET"
PS>[regex]::match($MyText,'(?<=\().+(?=\))')
Groups : {0}
Success : True
Name : 0
Captures : {0}
Index : 55
Length : 35
Value : \(TESTJulia\) Julia's Test Company
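To see the lazy and greedy behaviour side by side outside PowerShell, here is a small sketch in Python. The two patterns are the ones from this thread; the variable names are just for illustration.

```python
import re

text = (
    r"BT /F3 8.999 Tf 0 0 0 rg 407.446 TL 64.368 772.194 Td "
    r"(\(TESTJulia\) Julia's Test Company) Tj T* ET"
)

# Lazy: .+? stops at the first position followed by ")".
lazy = re.search(r"(?<=\().+?(?=\))", text).group(0)

# Greedy: .+ runs to the end, then backtracks to the last ")".
greedy = re.search(r"(?<=\().+(?=\))", text).group(0)

print("lazy:  ", lazy)
print("greedy:", greedy)
```

Note that the greedy version assumes only one bracketed string per line. With two of them on the same line it would span from the first opening bracket to the very last closing one.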
Currently, I am looping through phone numbers to find matches in a database. However, I need to remove dashes and any area codes so that the database search can be exact. I am trying to use this regex on the phone numbers:
(?:\+\d{2}\s*(?:\(\d{2}\))|(?:\(\d{2}\)))?\s*(\d{4,5}\-?\d{4})
and I am trying to apply it as such
if let longNumber = (contact.phoneNumbers.first?.value as? CNPhoneNumber)?.stringValue {
let phoneNumber = longNumber.replacingOccurrences(of: "(?:\+\d{2}\s*(?:\(\d{2}\))|(?:\(\d{2}\)))?\s*(\d{4,5}\-?\d{4})", with: "$1", options: .regularExpression)
However, I receive the error Invalid escape sequence in literal and Missing argument for parameter 'for' in call
How can I properly get only the phone digits from the string? ie if it is +1 300-300-3000
I need it to return 3003003000.
Examples:
+1 390 - 456 - 8823 -> 3904568823
+92 084 765 4902 --> 0847654902
+922 (064) 864 0921 --> 0648640921
842.231.9072 --> 8422319072
+1 (972) - 864 - 0921 --> 9728640921
+92 33 - 783 - 9382 --> 337839282
From the examples you have shown, I assume the following rules:
Phone numbers are formatted in 3 or 4 parts
Part1 (optional)
+
1 to 3 digits
one or more whitespaces follow
Part2
May be enclosed in ( and )
2 or 3 digits
a hyphen or a decimal point or a whitespace with extra whitespaces at both ends follows
Part3
3 digits
a hyphen or a decimal point or a whitespace with extra whitespaces at both ends follows
Part4
4 digits
(Please remember, this sort of phone number notation rule is local to a specific region. When you want to internationalize your app, you may need many more rules. The pattern you have now may be written for some other region.)
Partial pattern for each part would be something as follows:
let part1 = "(?:\\+\\d{1,3}\\s+)?"
let part2 = "(?:\\((\\d{2,3})\\)|(\\d{2,3}))\\s*[\\s\\.-]?\\s*"
let part3 = "(\\d{3})\\s*[\\s\\.-]?\\s*"
let part4 = "(\\d{4})"
(Please do not miss that all backslashes are escaped.)
Testing code:
import Foundation
let numbers: [(long: String, expected: String)] = [
("+1 300-300-3000", "3003003000"),
("+1 390 - 456 - 8823", "3904568823"),
("+92 084 765 4902", "0847654902"),
("+922 (064) 864 0921", "0648640921"),
("842.231.9072", "8422319072"),
("+1 (972) - 864 - 0921", "9728640921"),
("+92 33 - 783 - 9382", "337839382"), //I assume your example is wrong here
]
for number in numbers {
let longNumber = number.long
let part1 = "(?:\\+\\d{1,3}\\s+)?"
let part2 = "(?:\\((\\d{2,3})\\)|(\\d{2,3}))\\s*[\\s\\.-]?\\s*"
let part3 = "(\\d{3})\\s*[\\s\\.-]?\\s*"
let part4 = "(\\d{4})"
let pattern = "^\(part1)\(part2)\(part3)\(part4)$"
let phoneNumber = longNumber.replacingOccurrences(of: pattern, with: "$1$2$3$4", options: .regularExpression)
print("\(longNumber) --> \(phoneNumber)", phoneNumber == number.expected ? "Success" : "Fail (expected \(number.expected))")
}
Output:
+1 300-300-3000 --> 3003003000 Success
+1 390 - 456 - 8823 --> 3904568823 Success
+92 084 765 4902 --> 0847654902 Success
+922 (064) 864 0921 --> 0648640921 Success
842.231.9072 --> 8422319072 Success
+1 (972) - 864 - 0921 --> 9728640921 Success
+92 33 - 783 - 9382 --> 337839382 Success
The code above may not work as expected for other possible inputs. Please try to adapt it to such inputs yourself.
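For comparison, the same four-part pattern can be sketched in Python, where raw strings avoid the doubled backslashes. This is only a transcription of the assumed rules above; national_digits is a made-up helper name, and it returns None for anything that does not fit those rules.

```python
import re

PART1 = r"(?:\+\d{1,3}\s+)?"                        # optional "+" country code
PART2 = r"(?:\((\d{2,3})\)|(\d{2,3}))\s*[\s.-]?\s*"  # 2-3 digits, maybe in ()
PART3 = r"(\d{3})\s*[\s.-]?\s*"                      # 3 digits
PART4 = r"(\d{4})"                                   # 4 digits
PHONE = re.compile(PART1 + PART2 + PART3 + PART4)


def national_digits(raw):
    """Return the digits of parts 2-4, or None if raw does not fit the rules."""
    match = PHONE.fullmatch(raw)
    if match is None:
        return None
    # Only one of the two part-2 groups participates; skip the unused one.
    return "".join(group for group in match.groups() if group is not None)


if __name__ == "__main__":
    for sample in ["+1 300-300-3000", "+922 (064) 864 0921", "842.231.9072"]:
        print(sample, "-->", national_digits(sample))
```

Returning None on a failed match makes bad input visible, whereas a plain replace leaves the string unchanged when the pattern does not match.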
What is the best way to match a string that occurs anywhere from 1 to 10000 times except prime number of times?
say so "xyz" ~~ m/ <[x y z]> ** <[ 1..10000] - [ all prime numbers ]> /
Thanks !!!
Not necessarily the best way (in particular, it will create up to 10_000 submatch objects), but a way:
$ perl6 -e 'say "$_ ", so <x y z>.roll x $_ ~~ /^ (<[xyz]>) ** 1..10_000 <!{$0.elems.is-prime}> $/ for 1..10'
1 True
2 False
3 False
4 True
5 False
6 True
7 False
8 True
9 True
10 True
If the substring of interest has fixed length, you could also capture the repetition as a whole and check its length, avoiding submatch creation.
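The "capture the whole repetition and check its length" idea is not Perl 6 specific. Here is a minimal sketch of it in Python, with is_prime and non_prime_run as hypothetical helper names and simple trial division standing in for a real primality test.

```python
import re

# One match object for the whole run instead of one per repetition.
RUN = re.compile(r"[xyz]{1,10000}")


def is_prime(n):
    """Trial division; fine for n up to 10000."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def non_prime_run(text):
    """True if text is 1..10000 chars of x/y/z and its length is not prime."""
    match = RUN.fullmatch(text)
    return match is not None and not is_prime(len(match.group(0)))


if __name__ == "__main__":
    for count in range(1, 11):
        print(count, non_prime_run("x" * count))
```

Because the repeated unit is a single character here, the length of the match is the repetition count. For a longer fixed-length unit you would divide by its length first.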
I am building a calculator. Each time I tap a digit button, its tag is appended to displayString, an NSMutableString. I have noticed that once an 11th digit is entered for the same number, floatValue and intValue keep returning the same wrong value.
self.lblDisplay.text displays the content of displayString correctly. But
intValue, floatValue, and even the NSScanner utilities all return the same wrong value once more than 10 digits have been entered.
Here are the code and the log:
// original
[self.displayString appendString: [NSString stringWithFormat: @"%i", [sender tag]]];
else { // new entry
//origin test
[self.displayString setString:@""];
[self.displayString appendString: [NSString stringWithFormat: @"%i", [sender tag]]];
bIsTypingANumber = TRUE;
}
// fCurrentNumber = [self.displayString floatValue];
// [self display:fCurrentNumber];
// self.lblDisplay.text = displayString;
NSString *numberString;
NSString *str = [NSString stringWithString:displayString];
NSScanner *theScanner = [NSScanner scannerWithString:str];
[theScanner scanUpToCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet] intoString:nil];
[theScanner scanCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet] intoString:&numberString];
NSLog(@"Attempts: %i", [numberString integerValue]);
self.lblDisplay.text = str;
Output:
2012-08-26 17:23:34.264 Pilots Fuel[4989:f803] Attempts: 1
2012-08-26 17:23:34.400 Pilots Fuel[4989:f803] Attempts: 11
2012-08-26 17:23:34.552 Pilots Fuel[4989:f803] Attempts: 111
2012-08-26 17:23:34.800 Pilots Fuel[4989:f803] Attempts: 1111
2012-08-26 17:23:34.936 Pilots Fuel[4989:f803] Attempts: 11111
2012-08-26 17:23:35.072 Pilots Fuel[4989:f803] Attempts: 111111
2012-08-26 17:23:35.216 Pilots Fuel[4989:f803] Attempts: 1111111
2012-08-26 17:23:35.344 Pilots Fuel[4989:f803] Attempts: 11111111
2012-08-26 17:23:35.488 Pilots Fuel[4989:f803] Attempts: 111111111
2012-08-26 17:23:35.632 Pilots Fuel[4989:f803] Attempts: 1111111111
2012-08-26 17:23:35.776 Pilots Fuel[4989:f803] Attempts: 2147483647
2012-08-26 17:23:36.056 Pilots Fuel[4989:f803] Attempts: 2147483647
It is because you overflow the integer data type, which can only hold values from -2147483648 to 2147483647 (32 bits in size). The double data type can hold larger values but will eventually lose precision.
Try
NSLog(@"Attempts: %lli", [numberString longLongValue]);
which will use 64 bits of precision and will allow for a larger number of digits.
Welcome to SO. Please write questions that clearly state your problem. Also, if you provide code samples, please make sure they are formatted properly so that they are more easily read by others.
Note that you are using int and float, which both have size limits. They cannot represent all possible numbers. You could use double and 64-bit integers, but they have limits as well. If you really need bigger numbers, see if NSDecimalNumber will fit your use case. It does have a limit (though quite large).
If you need even bigger numbers, you will need to use a library that provides support for arbitrarily large numbers.
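The saturation seen in the log is easy to reproduce with plain arithmetic. The sketch below is Python, whose integers are arbitrary precision, so the 32-bit and 64-bit limits are imitated by hand. clamp_parse is a made-up helper, it only handles non-negative digit strings, and it is not how NSString parses numbers internally.

```python
INT32_MAX = 2**31 - 1   # 2147483647, the value the log gets stuck at
INT64_MAX = 2**63 - 1   # upper limit of a signed 64-bit integer


def clamp_parse(digits, max_value):
    """Parse a non-negative decimal string, saturating at max_value."""
    return min(int(digits), max_value)


if __name__ == "__main__":
    for length in (10, 11, 12):
        text = "1" * length
        print(length, clamp_parse(text, INT32_MAX), clamp_parse(text, INT64_MAX))
```

Ten ones still fit in 32 bits because 1111111111 is below 2147483647. The eleventh digit pushes the value past the limit, which is where the log starts repeating.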
I'm using Perl's XML::Writer to generate an import file for a program called OpenNMS. According to the documentation I need to pre-declare all special characters as XML ENTITY declarations. Obviously I need to go through all strings I'm exporting and catalogue the special characters used. What's the easiest way to work out which characters in a Perl string are "special" with respect to UTF-8 encoding? Is there any way to work out what the entity names for those characters should be?
In order to find "special" characters, you can use ord to find out the codepoint. Here's an example:
# Create a Unicode test file with some Latin chars, some Cyrillic,
# and some outside the BMP.
# The BMP is the basic multilingual plane, see perluniintro.
# (Not sure what you mean by saying "non-basic".)
perl -CO -lwe "print join '', map chr, 97 .. 100, 0x410 .. 0x415, 0x10000 .. 0x10003" > u.txt
# Read it and find codepoints outside the BMP.
perl -CI -nlwe "print for map ord, grep ord > 0xffff, split //" < u.txt
You can get a good introduction from reading perluniintro.
I'm not sure what the docs you're referring to mean in the section "Exported XML".
Looks like some limitation of a system which is de facto ASCII and doesn't do Unicode.
Or a misunderstanding of XML. Or both.
Anyway, if you're looking for names you could use or reference the canonical ones.
See XML Entity Definitions for Characters or one of the older documents for HTML or MathML referenced therein.
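If the goal is just to catalogue which characters fall outside ASCII, the same idea as the ord approach can be sketched in Python. special_characters is a hypothetical helper, and treating everything above U+007F as "special" is an assumption about what the OpenNMS docs mean. The third field is a numeric character reference, which XML accepts without any ENTITY declaration.

```python
import unicodedata


def special_characters(text):
    """Map each non-ASCII character to (codepoint, Unicode name, numeric ref)."""
    found = {}
    for char in text:
        codepoint = ord(char)
        if codepoint > 0x7F and char not in found:
            found[char] = (
                codepoint,
                unicodedata.name(char, "UNKNOWN"),
                "&#x{:X};".format(codepoint),
            )
    return found


if __name__ == "__main__":
    for char, info in special_characters("crème brûlée").items():
        print(repr(char), info)
```

The Unicode names are not entity names, but they make it straightforward to look each character up in the entity lists mentioned above.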
You might look into the uniquote program. It has a --xml option. For example:
$ cat sample
1 NFD single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
2 NFC single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
3 NFD multiple combining characters: (hẫç̌k) and (hã̂ç̌k).
3 NFC multiple combining characters: (hẫç̌k) and (hã̂ç̌k).
5 invisible characters: (4⁄3πr³) and (4⁄3πr³).
6 astral characters: (𝐂 = sqrt[𝐀² + 𝐁²]) and (𝐂 = sqrt[𝐀² + 𝐁²]).
7 astral + combining chars: (𝐂̅ = sqrt[𝐀̅² + 𝐁̅²]) and (𝐂̅ = sqrt[𝐀̅² + 𝐁̅²]).
8 wide characters: (wide) and (wide).
9 regular characters: (normal) and (normal).
$ uniquote -x sample
1 NFD single combining characters: (cre\x{300}me bru\x{302}le\x{301}e et fiance\x{301}) and (cre\x{300}me bru\x{302}le\x{301}e et fiance\x{301}).
2 NFC single combining characters: (cr\x{E8}me br\x{FB}l\x{E9}e et fianc\x{E9}) and (cr\x{E8}me br\x{FB}l\x{E9}e et fianc\x{E9}).
3 NFD multiple combining characters: (ha\x{302}\x{303}c\x{327}\x{30C}k) and (ha\x{303}\x{302}c\x{327}\x{30C}k).
3 NFC multiple combining characters: (h\x{1EAB}\x{E7}\x{30C}k) and (h\x{E3}\x{302}\x{E7}\x{30C}k).
5 invisible characters: (4\x{2044}3\x{2062}\x{3C0}\x{2062}r\x{B3}) and (4\x{2044}3\x{2062}\x{3C0}\x{2062}r\x{B3}).
6 astral characters: (\x{1D402} = sqrt[\x{1D400}\x{B2} + \x{1D401}\x{B2}]) and (\x{1D402} = sqrt[\x{1D400}\x{B2} + \x{1D401}\x{B2}]).
7 astral + combining chars: (\x{1D402}\x{305} = sqrt[\x{1D400}\x{305}\x{B2} + \x{1D401}\x{305}\x{B2}]) and (\x{1D402}\x{305} = sqrt[\x{1D400}\x{305}\x{B2} + \x{1D401}\x{305}\x{B2}]).
8 wide characters: (\x{FF57}\x{FF49}\x{FF44}\x{FF45}) and (\x{FF57}\x{FF49}\x{FF44}\x{FF45}).
9 regular characters: (normal) and (normal).
$ uniquote -b sample
1 NFD single combining characters: (cre\xCC\x80me bru\xCC\x82le\xCC\x81e et fiance\xCC\x81) and (cre\xCC\x80me bru\xCC\x82le\xCC\x81e et fiance\xCC\x81).
2 NFC single combining characters: (cr\xC3\xA8me br\xC3\xBBl\xC3\xA9e et fianc\xC3\xA9) and (cr\xC3\xA8me br\xC3\xBBl\xC3\xA9e et fianc\xC3\xA9).
3 NFD multiple combining characters: (ha\xCC\x82\xCC\x83c\xCC\xA7\xCC\x8Ck) and (ha\xCC\x83\xCC\x82c\xCC\xA7\xCC\x8Ck).
3 NFC multiple combining characters: (h\xE1\xBA\xAB\xC3\xA7\xCC\x8Ck) and (h\xC3\xA3\xCC\x82\xC3\xA7\xCC\x8Ck).
5 invisible characters: (4\xE2\x81\x843\xE2\x81\xA2\xCF\x80\xE2\x81\xA2r\xC2\xB3) and (4\xE2\x81\x843\xE2\x81\xA2\xCF\x80\xE2\x81\xA2r\xC2\xB3).
6 astral characters: (\xF0\x9D\x90\x82 = sqrt[\xF0\x9D\x90\x80\xC2\xB2 + \xF0\x9D\x90\x81\xC2\xB2]) and (\xF0\x9D\x90\x82 = sqrt[\xF0\x9D\x90\x80\xC2\xB2 + \xF0\x9D\x90\x81\xC2\xB2]).
7 astral + combining chars: (\xF0\x9D\x90\x82\xCC\x85 = sqrt[\xF0\x9D\x90\x80\xCC\x85\xC2\xB2 + \xF0\x9D\x90\x81\xCC\x85\xC2\xB2]) and (\xF0\x9D\x90\x82\xCC\x85 = sqrt[\xF0\x9D\x90\x80\xCC\x85\xC2\xB2 + \xF0\x9D\x90\x81\xCC\x85\xC2\xB2]).
8 wide characters: (\xEF\xBD\x97\xEF\xBD\x89\xEF\xBD\x84\xEF\xBD\x85) and (\xEF\xBD\x97\xEF\xBD\x89\xEF\xBD\x84\xEF\xBD\x85).
9 regular characters: (normal) and (normal).
$ uniquote -v sample
1 NFD single combining characters: (cre\N{COMBINING GRAVE ACCENT}me bru\N{COMBINING CIRCUMFLEX ACCENT}le\N{COMBINING ACUTE ACCENT}e et fiance\N{COMBINING ACUTE ACCENT}) and (cre\N{COMBINING GRAVE ACCENT}me bru\N{COMBINING CIRCUMFLEX ACCENT}le\N{COMBINING ACUTE ACCENT}e et fiance\N{COMBINING ACUTE ACCENT}).
2 NFC single combining characters: (cr\N{LATIN SMALL LETTER E WITH GRAVE}me br\N{LATIN SMALL LETTER U WITH CIRCUMFLEX}l\N{LATIN SMALL LETTER E WITH ACUTE}e et fianc\N{LATIN SMALL LETTER E WITH ACUTE}) and (cr\N{LATIN SMALL LETTER E WITH GRAVE}me br\N{LATIN SMALL LETTER U WITH CIRCUMFLEX}l\N{LATIN SMALL LETTER E WITH ACUTE}e et fianc\N{LATIN SMALL LETTER E WITH ACUTE}).
3 NFD multiple combining characters: (ha\N{COMBINING CIRCUMFLEX ACCENT}\N{COMBINING TILDE}c\N{COMBINING CEDILLA}\N{COMBINING CARON}k) and (ha\N{COMBINING TILDE}\N{COMBINING CIRCUMFLEX ACCENT}c\N{COMBINING CEDILLA}\N{COMBINING CARON}k).
3 NFC multiple combining characters: (h\N{LATIN SMALL LETTER A WITH CIRCUMFLEX AND TILDE}\N{LATIN SMALL LETTER C WITH CEDILLA}\N{COMBINING CARON}k) and (h\N{LATIN SMALL LETTER A WITH TILDE}\N{COMBINING CIRCUMFLEX ACCENT}\N{LATIN SMALL LETTER C WITH CEDILLA}\N{COMBINING CARON}k).
5 invisible characters: (4\N{FRACTION SLASH}3\N{INVISIBLE TIMES}\N{GREEK SMALL LETTER PI}\N{INVISIBLE TIMES}r\N{SUPERSCRIPT THREE}) and (4\N{FRACTION SLASH}3\N{INVISIBLE TIMES}\N{GREEK SMALL LETTER PI}\N{INVISIBLE TIMES}r\N{SUPERSCRIPT THREE}).
6 astral characters: (\N{MATHEMATICAL BOLD CAPITAL C} = sqrt[\N{MATHEMATICAL BOLD CAPITAL A}\N{SUPERSCRIPT TWO} + \N{MATHEMATICAL BOLD CAPITAL B}\N{SUPERSCRIPT TWO}]) and (\N{MATHEMATICAL BOLD CAPITAL C} = sqrt[\N{MATHEMATICAL BOLD CAPITAL A}\N{SUPERSCRIPT TWO} + \N{MATHEMATICAL BOLD CAPITAL B}\N{SUPERSCRIPT TWO}]).
7 astral + combining chars: (\N{MATHEMATICAL BOLD CAPITAL C}\N{COMBINING OVERLINE} = sqrt[\N{MATHEMATICAL BOLD CAPITAL A}\N{COMBINING OVERLINE}\N{SUPERSCRIPT TWO} + \N{MATHEMATICAL BOLD CAPITAL B}\N{COMBINING OVERLINE}\N{SUPERSCRIPT TWO}]) and (\N{MATHEMATICAL BOLD CAPITAL C}\N{COMBINING OVERLINE} = sqrt[\N{MATHEMATICAL BOLD CAPITAL A}\N{COMBINING OVERLINE}\N{SUPERSCRIPT TWO} + \N{MATHEMATICAL BOLD CAPITAL B}\N{COMBINING OVERLINE}\N{SUPERSCRIPT TWO}]).
8 wide characters: (\N{FULLWIDTH LATIN SMALL LETTER W}\N{FULLWIDTH LATIN SMALL LETTER I}\N{FULLWIDTH LATIN SMALL LETTER D}\N{FULLWIDTH LATIN SMALL LETTER E}) and (\N{FULLWIDTH LATIN SMALL LETTER W}\N{FULLWIDTH LATIN SMALL LETTER I}\N{FULLWIDTH LATIN SMALL LETTER D}\N{FULLWIDTH LATIN SMALL LETTER E}).
9 regular characters: (normal) and (normal).
$ uniquote --xml sample
1 NFD single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
2 NFC single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
3 NFD multiple combining characters: (hâçk) and (hãçk).
3 NFC multiple combining characters: (hẫk) and (hãk).
5 invisible characters: (4⁄3r³) and (4⁄3r³).
6 astral characters: (𝐂 = sqrt[𝐀 + 𝐁]) and (𝐂 = sqrt[𝐀 + 𝐁]).
7 astral + combining chars: (𝐂 = sqrt[𝐀 + 𝐁]) and (𝐂 = sqrt[𝐀 + 𝐁]).
8 wide characters: (w) and (w).
9 regular characters: (normal) and (normal).
$ uniquote --verbose --html sample
1 NFD single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
2 NFC single combining characters: (crème brûlée et fiancé) and (crème brûlée et fiancé).
3 NFD multiple combining characters: (hẫç̌k) and (hã̂ç̌k).
3 NFC multiple combining characters: (hẫç̌k) and (hã̂ç̌k).
5 invisible characters: (4⁄3πr³) and (4⁄3πr³).
6 astral characters: (𝐂 = sqrt[𝐀² + 𝐁²]) and (𝐂 = sqrt[𝐀² + 𝐁²]).
7 astral + combining chars: (𝐂̅ = sqrt[𝐀̅² + 𝐁̅²]) and (𝐂̅ = sqrt[𝐀̅² + 𝐁̅²]).
8 wide characters: (wide) and (wide).
9 regular characters: (normal) and (normal).