Get unicode category from rune

I'm looking for a way to get the Unicode category (RangeTable) of a rune in Go. For example, the character 'a' maps to the Ll category. The unicode package specifies all of the categories (http://golang.org/pkg/unicode/#pkg-variables), but I don't see any way to look up the category for a given rune. Do I need to manually construct the RangeTable from the rune using the appropriate offsets?

The "unicode" package has no function that returns the range tables a rune belongs to, but it is not hard to build one:
func cat(r rune) (names []string) {
	names = make([]string, 0)
	// Collect the name of every category table that contains r.
	for name, table := range unicode.Categories {
		if unicode.Is(table, r) {
			names = append(names, name)
		}
	}
	return
}

Here is an alternative version based on the accepted answer, that returns the Unicode Category:
// UnicodeCategory returns the Unicode Character Category of the given rune.
func UnicodeCategory(r rune) string {
	for name, table := range unicode.Categories {
		if len(name) == 2 && unicode.Is(table, r) {
			return name
		}
	}
	return "Cn"
}
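A complete program makes the behaviour easier to check. The sketch below is a variation on the function above, not an API of the unicode package: the name generalCategory is made up for this example, and it skips the "LC" entry on the assumption that recent Go releases list that grouping (cased letters) in unicode.Categories next to the two-letter general categories.

```go
package main

import (
	"fmt"
	"unicode"
)

// generalCategory is a hypothetical helper (not part of the unicode package).
// It returns the two-letter general category of r, or "Cn" if no table matches.
func generalCategory(r rune) string {
	for name, table := range unicode.Categories {
		// Skip one-letter major classes such as "L", and the "LC" grouping,
		// which (assumption) newer Go versions also keep in this map.
		if len(name) != 2 || name == "LC" {
			continue
		}
		if unicode.Is(table, r) {
			return name
		}
	}
	return "Cn"
}

func main() {
	for _, r := range []rune{'a', 'A', '1', ' ', '!'} {
		fmt.Printf("%q -> %s\n", r, generalCategory(r))
	}
}
```

For 'a' this reports Ll, as in the question. Map iteration order in Go is random, which is why the loop must only accept names that cannot overlap for the same rune.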

Related

Is there a way to sort string lists by numbers inside of the strings?

Is there a way to sort something like:
List<String> hi = ['1hi', '2hi','5hi', '3hi', '4hi'];
to this?
['1hi', '2hi','3hi', '4hi', '5hi']

Just calling List<String>.sort() by itself will do a lexicographic sort. That is, your strings will be sorted in character code order, and '10' will be sorted before '2'. That usually isn't expected.
A lexicographic sort will work if your numbers have leading 0s to ensure that all numbers have the same number of digits. However, if the number of digits is variable, you will need to parse the values of the numbers for sorting. A more general approach is to provide a callback to .sort() to tell it how to determine the relative ordering of two items.
Luckily, package:collection has a compareNatural function that can do this for you:
import 'package:collection/collection.dart';
List<String> hi = ['1hi', '2hi','5hi', '3hi', '4hi'];
hi.sort(compareNatural);
If your situation is a bit more complicated and compareNatural doesn't do what you want, a more general approach is to make the .sort() callback do parsing itself, such as via a regular expression:
/// Returns the integer prefix from a string.
///
/// Returns null if no integer prefix is found.
int? parseIntPrefix(String s) {
  // Anchor at the start so that only a leading integer counts as a prefix.
  var re = RegExp(r'^(-?[0-9]+)');
  var match = re.firstMatch(s);
  if (match == null) {
    return null;
  }
  return int.parse(match.group(1)!);
}

int compareIntPrefixes(String a, String b) {
  var aValue = parseIntPrefix(a);
  var bValue = parseIntPrefix(b);
  if (aValue != null && bValue != null) {
    return aValue - bValue;
  }
  if (aValue == null && bValue == null) {
    // If neither string has an integer prefix, sort the strings lexically.
    return a.compareTo(b);
  }
  // Sort strings with integer prefixes before strings without.
  if (aValue == null) {
    return 1;
  } else {
    return -1;
  }
}

void main() {
  List<String> hi = ['1hi', '2hi', '5hi', '3hi', '4hi'];
  hi.sort(compareIntPrefixes);
}

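For this particular list you can simply sort it like this:
hi.sort();
(This works because every number here has a single digit and digits sort before letters. With multi-digit numbers, '10hi' would come before '2hi'.)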

Swift - How to check that a string contains no punctuation or numbers

I want to check whether a string is suitable for use as a display name in the app. The block below only accepts English letters. How can I cover letters from all languages? Punctuation and numbers should not be allowed.
func isSuitableForDisplayName(inputString: String) -> Bool {
    let mergedString = inputString.stringByRemovingWhitespaces
    let characterset = CharacterSet(charactersIn: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
    if mergedString.rangeOfCharacter(from: characterset.inverted) != nil {
        return false
    } else {
        return true
    }
}

You can use CharacterSet.letters, which contains all the characters in the Unicode categories L and M.
Category M includes combining marks. If you don't want those, use:
CharacterSet.letters.subtracting(.nonBaseCharacters)
Also, your way of checking whether a string contains only the characters in a character set is quite weird. I would do something like this:
return mergedString.trimmingCharacters(in: CharacterSet.letters) == ""

Convert a string slice to a BSON array

I am trying to insert an array into a MongoDB instance using Go. I have a []string slice in Go and want to convert it into a BSON array to pass it to the DB using the github.com/mongodb/mongo-go-driver driver.
var result bson.Array
for _, data := range myData {
	value := bson.VC.String(data)
	result.Append(value)
}
This loops over each element of my input data and tries to append it to the BSON array. However the line with the Append() fails with panic: document is nil. How should I do this conversion?

Edit: The code in the question and this answer is no longer relevant because the bson.Array type was deleted from the package. At the time of this edit, the bson.A type and basic slice operations should be used to construct arrays.
Use the factory function NewArray to create the array:
result := bson.NewArray()
for _, data := range myData {
	value := bson.VC.String(data)
	result.Append(value)
}

As mentioned by @Cerise, bson.Array has since been deleted. I do this with multiple utility functions as follows:
func BSONStringA(sa []string) (result bson.A) {
	result = bson.A{}
	for _, e := range sa {
		result = append(result, e)
	}
	return
}
func BSONIntA(ia []int) (result bson.A) {
	// ...
}

Converting a slice of string (ids) to BSON array
var objIds bson.A
for _, val := range ids {
	objIds = append(objIds, val)
}
log.Println(objIds)

How can I check if a string contains Chinese in Swift?

How can I check whether a string contains Chinese characters in Swift?
For example, I want to check if there's Chinese inside:
var myString = "Hi! 大家好！It's contains Chinese!"
Thanks!

This answer to "How to determine if a character is a Chinese character" can also easily be translated from Ruby to Swift (now updated for Swift 3):
extension String {
    var containsChineseCharacters: Bool {
        return self.range(of: "\\p{Han}", options: .regularExpression) != nil
    }
}
if myString.containsChineseCharacters {
    print("Contains Chinese")
}
In a regular expression, "\p{Han}" matches all characters with the "Han" Unicode property, which, as I understand it, are the characters from the CJK languages.

Looking at questions on how to do this in other languages (such as this accepted answer for Ruby), it looks like the common technique is to determine whether each character in the string falls in a CJK range. The Ruby answer could be adapted to Swift strings as an extension with the following code:
extension String {
    var containsChineseCharacters: Bool {
        return self.unicodeScalars.contains { scalar in
            let cjkRanges: [ClosedInterval<UInt32>] = [
                0x4E00...0x9FFF,   // main block
                0x3400...0x4DBF,   // extended block A
                0x20000...0x2A6DF, // extended block B
                0x2A700...0x2B73F, // extended block C
            ]
            return cjkRanges.contains { $0.contains(scalar.value) }
        }
    }
}
// true:
"Hi! 大家好！It's contains Chinese!".containsChineseCharacters
// false:
"Hello, world!".containsChineseCharacters
The ranges may already exist somewhere in Foundation, which would avoid hardcoding them manually.
The above is for Swift 2.0. For earlier versions, you will have to use the free contains function rather than the protocol extension (twice):
extension String {
    var containsChineseCharacters: Bool {
        return contains(self.unicodeScalars) {
            // older version of compiler seems to need extra help with type inference
            (scalar: UnicodeScalar)->Bool in
            let cjkRanges: [ClosedInterval<UInt32>] = [
                0x4E00...0x9FFF,   // main block
                0x3400...0x4DBF,   // extended block A
                0x20000...0x2A6DF, // extended block B
                0x2A700...0x2B73F, // extended block C
            ]
            return contains(cjkRanges) { $0.contains(scalar.value) }
        }
    }
}

The accepted answer only checks whether the string contains a Chinese character. I wrote one that suits my own case, distinguishing no Chinese, some Chinese, and all Chinese:
enum ChineseRange {
    case notFound, contain, all
}
extension String {
    var findChineseCharacters: ChineseRange {
        // Find the first run of one or more Han characters.
        guard let a = self.range(of: "\\p{Han}+", options: .regularExpression) else {
            return .notFound
        }
        // The run covers the whole string only if every character is Han.
        if a == self.startIndex..<self.endIndex {
            return .all
        }
        return .contain
    }
}
if "你好".findChineseCharacters == .all {
    print("All Chinese")
}
if "Chinese".findChineseCharacters == .notFound {
    print("Not found Chinese")
}
if "Chinese你好".findChineseCharacters == .contain {
    print("Contains Chinese")
}
gist here: https://gist.github.com/williamhqs/6899691b5a26272550578601bee17f1a

Try this in Swift 2:
var myString = "Hi! 大家好！It's contains Chinese!"
var a = false
for c in myString.characters {
    let cs = String(c)
    a = a || (cs != cs.stringByApplyingTransform(NSStringTransformMandarinToLatin, reverse: false))
}
print("\(myString) contains Chinese characters = \(a)")

I have created a Swift 3 String extension for checking how many Chinese characters a String contains. It is similar to the code by Airspeed Velocity but more comprehensive: it checks various Unicode ranges to decide whether a character is Chinese. See the Chinese character ranges listed in the tables under section 18.1 of the Unicode standard: http://www.unicode.org/versions/Unicode9.0.0/ch18.pdf
The String extension can be found on GitHub: https://github.com/niklasberglund/String-chinese.swift
Usage example:
let myString = "Hi! 大家好！It contains Chinese!"
let chinesePercentage = myString.chinesePercentage()
let chineseCharacterCount = myString.chineseCharactersCount()
print("String contains \(chinesePercentage) percent Chinese. That's \(chineseCharacterCount) characters.")

Does Go allow specification of an interface for a map with particular key type?

I wrote a function that would return a sorted slice of strings from a map[string]Foo. I'm curious what is the best way to create a generic routine that can return a sorted slice of strings from any type that is a map with strings as keys.
Is there a way to do it using an interface specification? For example, is there any way to do something like:
type MapWithStringKey interface {
<some code here>
}
To implement the interface above, a type would need strings as keys. I could then write a generic function that returns a sorted list of keys for fulfilling types.
This is my current best solution using the reflect package:
func SortedKeys(mapWithStringKey interface{}) []string {
	keys := []string{}
	typ := reflect.TypeOf(mapWithStringKey)
	if typ.Kind() == reflect.Map && typ.Key().Kind() == reflect.String {
		switch typ.Elem().Kind() {
		case reflect.Int:
			for key := range mapWithStringKey.(map[string]int) {
				keys = append(keys, key)
			}
		case reflect.String:
			for key := range mapWithStringKey.(map[string]string) {
				keys = append(keys, key)
			}
		// ... add more cases as needed
		default:
			log.Fatalf("Error: SortedKeys() does not handle %s\n", typ)
		}
		sort.Strings(keys)
	} else {
		log.Fatalln("Error: parameter to SortedKeys() not map[string]...")
	}
	return keys
}
I'm forced to code type assertions for each supported type even though at compile time, we should know the exact type of the mapWithStringKey parameter.

You cannot make partial types. But you can define an interface which serves your purpose:
type SortableKeysValue interface {
	// a function that returns the strings to be sorted
	Keys() []string
}

func SortedKeys(s SortableKeysValue) []string {
	keys := s.Keys()
	sort.Strings(keys)
	return keys
}

type MyMap map[string]string

func (s MyMap) Keys() []string {
	keys := make([]string, 0, len(s))
	for k := range s {
		keys = append(keys, k)
	}
	return keys
}
Try it here: http://play.golang.org/p/vKfri-h4Cp

Hope that helps (go-1.1):
package main

import (
	"fmt"
	"reflect"
)

var m = map[string]int{"a": 3, "b": 4}

func MapKeys(m interface{}) (keys []string) {
	v := reflect.ValueOf(m)
	for _, k := range v.MapKeys() {
		keys = append(keys, k.Interface().(string))
	}
	return
}

func main() {
	fmt.Printf("%#v\n", MapKeys(m))
}
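Since Go 1.18, type parameters address the original question more directly: a function can accept any map whose key type is string, with no reflection and no per-type assertions. A minimal sketch, assuming Go 1.18 or later; the name SortedKeys mirrors the question, and the constraint V any is this example's choice rather than anything from the answers above:

```go
package main

import (
	"fmt"
	"sort"
)

// SortedKeys returns the keys of any map with string keys in ascending order.
// V is a type parameter, so the value type is resolved at compile time.
func SortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	ages := map[string]int{"carol": 41, "alice": 30, "bob": 25}
	names := map[string]string{"b": "beta", "a": "alpha"}
	fmt.Println(SortedKeys(ages))  // [alice bob carol]
	fmt.Println(SortedKeys(names)) // [a b]
}
```

Passing a map with a non-string key type is rejected by the compiler, which is the check the question wanted from an interface.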
