Is there a way to check if a character belongs to a CharacterSet?
I want to know which CharacterSet I should use for the character -. Do I use symbols?
I've checked this documentation but still no idea. https://developer.apple.com/documentation/foundation/characterset
When removing extra whitespace at the end of a string, we do it like this:
let someString = " "
print("\(11111) - \(someString)".trimmingCharacters(in: .whitespaces))
But what if I just want to remove the -? Or any special character such as *?
EDIT: I was looking for a complete set of characters per each CharacterSet if it's possible.
What you want is defined in the Unicode standard. It is referred to as Unicode General Categories. Each Unicode character is in a category.
The Unicode website provides a complete character list showing the character's code, category, and name. You can also find a complete list of Unicode categories as well.
The - is U+002D (HYPHEN-MINUS). It is listed as being in the "Pd" (Punctuation, Dash) category.
If you look at the documentation for CharacterSet, you will see punctuationCharacters which is documented as:
Returns a character set containing the characters in Unicode General Category P*.
The "Pd" category is included in "P*" (which means any "P" category).
I also found https://www.compart.com/en/unicode/category which is a third party list of each character by category. A bit more user friendly than the Unicode reference.
To summarize: if you want to know which CharacterSet to use for a given character, look up the character's category using one of the charts I linked. Once you know its category, look at the documentation for CharacterSet to see which predefined character set applies to that category.
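The category lookup can also be scripted instead of read off a chart. Here is a minimal sketch in Python (not Swift), using the standard library's unicodedata module. The helper names general_category and is_punctuation are made up for illustration. It prints the General Category of a few characters and checks whether each falls under P*, which is what punctuationCharacters is documented to cover.

```python
import unicodedata

def general_category(ch: str) -> str:
    """Return the two-letter Unicode General Category of a single character."""
    return unicodedata.category(ch)

def is_punctuation(ch: str) -> bool:
    """True if the character is in any P* category (hypothetical helper)."""
    return general_category(ch).startswith("P")

# Hyphen-minus, asterisk, and space
for ch in "-* ":
    print(repr(ch), general_category(ch), is_punctuation(ch))
```

The hyphen-minus reports Pd and the asterisk reports Po, so both sit under P*, while the space reports Zs and does not.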
There seems to be a problem with the String library that apple uses.
Here's my Localizable.strings
"error_failed_to_retrieve_certificate" = "เกิิดผิดพลาดในการกู้คะแนน";
Here's how I set it to any view
anyView.text = NSLocalizedString("error_failed_to_retrieve_certificate", comment: "")
But somehow the string gets warped when it is displayed (the second character becomes different).
Here's what it looks like too when I search it using the Project Search.
But on the Strings it looks different (notice the third character)
Here's one image that is side by side
Note that I don't know any Thai.
It seems that your string has an extra ิ (U+0E34 THAI CHARACTER SARA I) in it. The character before that, กิ, is already two code points combined - ก (U+0E01 THAI CHARACTER KO KAI) and ิ, so the extra ิ got displayed alone. I would say it's an Xcode bug.
I've removed the extra character here:
เกิดผิดพลาดในการกู้คะแนน
Copy and paste that and it should be fine.
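If you'd rather check a string for this kind of doubled mark than spot it by eye, here is a rough sketch in Python using the standard unicodedata module. The function name repeated_marks is hypothetical, and the sample strings are only the first few code points of the text, written with escapes so the doubled U+0E34 is visible. It flags any combining mark that immediately repeats the previous code point.

```python
import unicodedata

def repeated_marks(text: str):
    """Yield (index, character name) wherever a combining mark repeats the one before it."""
    for i in range(1, len(text)):
        ch = text[i]
        # General Categories starting with "M" are combining marks (Mn, Mc, Me).
        if ch == text[i - 1] and unicodedata.category(ch).startswith("M"):
            yield i, unicodedata.name(ch)

# Start of the string with the extra SARA I, and the same start with it removed.
broken = "\u0e40\u0e01\u0e34\u0e34\u0e14"
fixed = "\u0e40\u0e01\u0e34\u0e14"

print(list(repeated_marks(broken)))  # one hit, at index 3
print(list(repeated_marks(fixed)))   # no hits
```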
You need to check that the key "error_failed_to_retrieve_certificate" appears only once in your Localizable.strings. Each key must be unique.
I'm coming across a strange situation where I cannot search on string tags that end with a special character. So far I've tried ) and ].
For example, given a Fruit index with a record with a tag apple (red), if you query (using the JS library) with tagFilters: "apple (red)", no results will be returned even if there are records with this tag.
However, if you change the tag to apple (red (not ending with a special character), results will be returned.
Is this a known issue? Is there a way to get around this?
EDIT
I saw this FAQ on special characters. However, it seems as though even if I set () as separator characters to index, that only affects the searchable attributes, not the tags. Is this correct? Can I change the separator characters to index on tags?
You should try using the array syntax for your tags:
tagFilters: ["apple (red)"]
The reason it is currently failing is because of the syntax of tagFilters. When you pass a string, it tries to parse it using a special syntax, documented here, where commas mean "AND" and parentheses delimit an "OR" group.
By the way, tagFilters is now deprecated for a much clearer syntax available with the filters parameter. For your specific example, you'd use it this way:
filters: '_tags:"apple (red)"'
Background
I have search indexes containing Greek characters. Many people don't know how to type Greek so they enter something called "beta-code". Beta-code can be converted into Greek. For example, beta-code "NO/MOU" would be converted to "νόμου". Characters such as a slash or parenthesis are used to indicate an accent.
Desired Behavior
I want users to be able to search using either beta-code or text in the Greek script. I figured out that the Whoosh Variations class provides the mechanism I need and it almost solves my problem.
Problem
The Variations class works well except when a slash or a parenthesis is used to indicate an accent in a user's query. The problem is that the query is parsed such that the special characters used to denote the accent cause the words to be split up. For example, a search for "NO/MOU" results in the Variations class being asked to find variations of "no" and "mou" instead of "NO/MOU".
Question
Is there a way to influence how the query is parsed such that slashes and parentheses are included in the search words (i.e. that a search for "NO/MOU" results in a search for a token of "NO/MOU" instead of "no" and "mou")?
The search parser uses a Tokenizer class for breaking up the search string into individual terms. Whoosh will use the class that is associated with the schema. For example, in the case below, the SimpleAnalyzer() will be used when searching the "content" field.
from whoosh.analysis import SimpleAnalyzer
from whoosh.fields import Schema, NUMERIC, TEXT
schema = Schema(verse_id=NUMERIC(unique=True, stored=True),
                content=TEXT(analyzer=SimpleAnalyzer()))
By default, the SimpleAnalyzer() uses the following regular expression to tokenize search terms: "\w+(\.?\w+)*"
To use a different regular expression, assign the first argument to the SimpleAnalyzer to another regular expression. For example, to include beta-code characters (slashes, parentheses, etc.) in tokens, use the following SimpleAnalyzer:
SimpleAnalyzer( rcompile(r"[\w/*()=\+|&']+(\.?[\w/*()=\+|&']+)*") )
Searches will now allow terms to include the special beta-code characters, and the Variations class will be able to convert the term to the Unicode version.
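To see the effect of the two patterns without building an index, here is a small sketch that uses only Python's re module rather than Whoosh itself. It assumes the default pattern is \w+(\.?\w+)* with an escaped dot, and the tokenize helper is a made-up stand-in that only mimics splitting and lowercasing, not the real analyzer.

```python
import re

# Assumed default pattern, and the beta-code pattern from the answer above.
DEFAULT_PATTERN = re.compile(r"\w+(\.?\w+)*")
BETA_CODE_PATTERN = re.compile(r"[\w/*()=\+|&']+(\.?[\w/*()=\+|&']+)*")

def tokenize(pattern, text):
    """Return each full regex match as one lowercased token (hypothetical helper)."""
    return [m.group(0).lower() for m in pattern.finditer(text)]

print(tokenize(DEFAULT_PATTERN, "NO/MOU"))    # ['no', 'mou']
print(tokenize(BETA_CODE_PATTERN, "NO/MOU"))  # ['no/mou']
```

With the default pattern the slash ends the token, which matches the behaviour described in the question. With the extended character class the whole beta-code word stays together.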
How to get stroke count of Chinese character?
Example:
一 => 1
十 => 2
日 => 4
Short answer: You can't without a hardcoded map of characters to stroke counts. And then, you'll have to assume the user is using a particular Chinese variant (e.g. traditional.)
Unicode (the basic character set used by NSString) doesn't distinguish between traditional, simplified, Japanese-specific, Korean-specific, etc. hanzi. Unicode does not encode stroke information directly. Rather, it distinguishes between characters (not their graphical representations) and a character may have different stroke counts depending on language and font used. So while the character 十 may universally have two strokes, other characters will vary.
The example Wikipedia gives is the character for "grass", U+8279, which has four strokes in traditional Chinese, but three in every other variant.
In Stata, you can install the cnstroke package with "ssc install cnstroke" and use it for this purpose.
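For illustration, the hardcoded-map approach could look something like the Python sketch below. The table name STROKE_COUNTS and the function stroke_count are hypothetical, and only the three examples from the question are filled in. A real table would need to be built for one specific variant; the Unihan database's kTotalStrokes field might be a starting point, though I haven't checked how it handles the variant differences described above.

```python
# Hypothetical lookup table; only the three examples from the question are included.
STROKE_COUNTS = {
    "一": 1,
    "十": 2,
    "日": 4,
}

def stroke_count(ch):
    """Return the stroke count for a character, or None if it is not in the table."""
    return STROKE_COUNTS.get(ch)

for ch in "一十日":
    print(ch, "=>", stroke_count(ch))
```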
Thanks, math.
First, call
NSInteger section = [[UILocalizedIndexedCollation currentCollation] sectionForObject:yourObject collationStringSelector:@selector(objectsProperty)];
then check index of section in following array
[UILocalizedIndexedCollation currentCollation].sectionTitles
Remember to add
Localized resources can be mixed = YES
in Info.plist
In my current implementation of a UISearchBarController I'm using [NSString compare:] inside the filterContentForSearchText:scope: delegate method to return relevant objects based on their name property to the results UITableView as you start typing.
So far this works great in English and Korean, but what I'd like to be able to do is search within NSString's defined character clusters. This is only applicable for a handful of languages, of which Korean is one.
In English, compare: returns new results after every letter you enter, but in Korean the results are generated once you complete a recognized grapheme cluster. I would like to be able to search through my Korean objects' name property via the individual elements that make up a syllable.
Can anyone shed any light on how to approach this? I'm sure it has something to do with searching through UTF16 characters manually, or by utilising a lower level class.
Cheers!
Here is a specific example that's just not working:
NSString *string1 = @"이";
NSString *string2 = @"ㅣ";
NSRange resultRange = [[string1 decomposedStringWithCanonicalMapping] rangeOfString: [string2 decomposedStringWithCanonicalMapping] options:(NSLiteralSearch)];
The result is always NSNotFound, with or without decomposedStringWithCanonicalMapping.
Any ideas?
I'm no expert, but I think you're very unlikely to find a clean solution for what you want. There doesn't seem to be any relationship between a Korean character's Unicode value and the graphemes that it's made up of.
e.g. "이" is \uc774 and "ㅣ" is \u3163. From the perspective of the NSString, they're just two different characters with no specific relationship to each other.
I suspect that you will have to find or create an explicit mapping between characters and their graphemes, and then write your own search function that consults this mapping.
This very long page on Unicode Korean can help you, if it comes to that. It has a table of all the characters which suggests some structured relation between the way characters are numbered and their components.
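As far as I can tell, that structured relation is the arithmetic decomposition of precomposed Hangul syllables described in the Unicode Standard. Here is a sketch of it in Python; the constant and function names are my own, so treat it as an illustration rather than a drop-in solution.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables in total

def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [ch]                      # not a precomposed syllable; leave as-is
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    tail = T_BASE + s_index % T_COUNT
    jamo = [chr(lead), chr(vowel)]
    if tail != T_BASE:                   # T_BASE itself means "no final consonant"
        jamo.append(chr(tail))
    return jamo

# U+C774 splits into U+110B (leading consonant) and U+1175 (vowel).
print([f"U+{ord(j):04X}" for j in decompose_syllable("\uc774")])
```

Note that the results are conjoining jamo (U+1100 block), not the standalone compatibility letters such as U+3163, so a search term typed as a standalone letter would still need to be mapped before comparing.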
If you use compare:options: with NSLiteralSearch, it should compare character by character, that is, by Unicode code points, regardless of the grapheme. The default behavior of compare: is to use no options. You could use -decomposedStringWithCanonicalMapping to get the decomposed form of the input string, but I'm not sure how that would interact with compare:.
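One possible reason the canonical mapping finds nothing: the standalone "ㅣ" (U+3163) is a compatibility letter, and canonical decomposition leaves it unchanged, while compatibility decomposition maps it to the same conjoining vowel that "이" decomposes into. Here is a sketch of that idea in Python, with NFKD standing in for what I'd assume decomposedStringWithCompatibilityMapping does on the NSString side. The helper names are hypothetical.

```python
import unicodedata

def jamo_form(text):
    """Decompose to conjoining jamo, also mapping compatibility jamo such as U+3163."""
    return unicodedata.normalize("NFKD", text)

def contains_jamo(haystack, needle):
    """True if the decomposed needle occurs inside the decomposed haystack."""
    return jamo_form(needle) in jamo_form(haystack)

syllable = "\uc774"   # precomposed syllable from the question
letter = "\u3163"     # standalone vowel letter from the question

# Canonical decomposition alone leaves U+3163 as-is, so it never matches.
print(unicodedata.normalize("NFD", letter) in unicodedata.normalize("NFD", syllable))
# Compatibility decomposition maps it to U+1175, which is inside the syllable.
print(contains_jamo(syllable, letter))
```

A caveat: standalone consonant letters seem to map to the leading-consonant jamo, so a consonant in final position may still not match with this approach.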