String to Unicode in C# - unicode

I want to use Unicode in my code. My Unicode value is 0100, and I am prepending the escape prefix \u to it. When I write string myVal = "\u0100"; it works, but when I build it as shown below, it does not work. The value ends up looking like "\\u0100". How do I resolve this?
I want to build it as shown below, because the Unicode value may vary.
string uStr = @"\u" + "0100";

There are a couple of problems here. One is that @"\u" is a verbatim string, so it is actually the literal two characters \ and u (which can also be written as "\\u").
The other issue is that you cannot construct a string in the way you describe, because "\u" is not a valid string by itself. The compiler expects four hex digits to follow "\u" (like "\u0100") to determine what the encoded value is supposed to be.
You need to keep in mind that strings in .NET are immutable, which means that when you look at what is going on behind the scenes with your concatenated example (@"\u" + "0100"), this is what is actually happening:
Create the string "\u"
Create the string "0100"
Create the string "\u0100"
So you have three strings in memory. In order for that to happen all of the strings must be valid.
The first option that comes to mind for handling those values is to actually parse them as integers, and then convert them to characters. Something like:
var unicodeValue = (char)int.Parse("0100", System.Globalization.NumberStyles.AllowHexSpecifier);
Which will give you the single Unicode character value. From there, you can add it to a string, or convert it to a string using ToString().
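For comparison, the same parse-then-convert idea can be sketched in Python. Python is used here only for illustration, and char_from_hex is a hypothetical helper name, not something from the question:

```python
def char_from_hex(hex_digits: str) -> str:
    """Return the character whose code point is written as hex digits."""
    # Parse the digits as a base-16 integer, then map it to a character.
    return chr(int(hex_digits, 16))


# The hex digits can come from a variable, which an escape in a literal cannot.
value = "0100"
print(char_from_hex(value) == "\u0100")  # True
```

The point is the same as in the C# line: the escape sequence is resolved at compile time, so a value known only at run time has to be parsed as a number and converted.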

Related

Proper handling of UTF8 string concatenation

I just learned that it's OK for a Unicode string to contain isolated combining characters.
This triggers another question, relative to concatenation of strings starting with such characters.
I'm developing a UTF8String object, to make UTF-8 string handling easier.
This object has a concat() method, that concatenates another string to the current one.
If the second string starts with a combining character, should I add a non-breaking space between the two strings, to avoid combining the previously isolated first character of the second string, to the last character of the first string?
Or would it be expected to have the combination occur?
I'm developing a UTF8String object, to make UTF-8 string handling easier. [...] should I add a non-breaking space between the two strings?
I would say definitely not. Handling byte encodings like UTF-8 is a separate, lower-level concern than handling grapheme boundaries. Mixing the two issues together would be an unexpected, unwelcome layering violation.
If you want to build a string class that treats grapheme clusters as its indivisible units that's fine, but that's a different animal (and quite a lot of work to do consistently).
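A minimal Python sketch of that separation, assuming Python's built-in str and bytes as stand-ins for the UTF8String class (starts_with_mark is a made-up helper name). Concatenation just appends the encoded bytes, and detecting a leading combining mark is a separate check that a caller can make if it cares:

```python
import unicodedata


def starts_with_mark(text: str) -> bool:
    """True if the first code point is a combining mark (category Mn, Mc or Me)."""
    return bool(text) and unicodedata.category(text[0]).startswith("M")


first = "e"
second = "\u0301abc"  # begins with U+0301 COMBINING ACUTE ACCENT

# Byte-level concatenation: no separator is inserted between the two parts.
joined = first.encode("utf-8") + second.encode("utf-8")

print(starts_with_mark(second))                    # True
print(joined == (first + second).encode("utf-8"))  # True
```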

Convert unknown symbols to cyrillic

I have this kind of symbols in db table (Наиме) , and I don't know who inserted this data to table.Is there any way to convert them to cyrillic ?
Yes, you can do the conversion. Since you haven't mentioned a language, here is the logic:
1. Assuming the string length is even, take two adjacent characters.
2. Combine the underlying byte values of the two characters to give a 16-bit value. This is the multi-byte encoding of one Cyrillic character, which you can decode with the proper encoding, such as UTF-8.
3. Repeat steps 1 and 2 for the next two characters until the end of the string.
If you want, you can implement it in any language of your choice.
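As a rough sketch in Python, the steps above collapse into one re-encode and one decode. This assumes the text was UTF-8 that got read as Latin-1 (ISO-8859-1), which the question does not confirm. repair_mojibake is a hypothetical name:

```python
def repair_mojibake(garbled: str) -> str:
    """Undo 'UTF-8 bytes read as Latin-1' (an assumption about how the data broke)."""
    # Each garbled character maps back to exactly one byte in Latin-1;
    # decoding those bytes as UTF-8 pairs them up into Cyrillic characters.
    return garbled.encode("latin-1").decode("utf-8")


# Reproduce the damage on a known word, then repair it.
original = "Наиме"
garbled = original.encode("utf-8").decode("latin-1")

print(len(garbled))                          # 10 (two garbled characters per letter)
print(repair_mojibake(garbled) == original)  # True
```

If the data was misread as Windows-1252 instead, some bytes have no mapping and the re-encode step would need a different codec.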

String that cannot be generated by SHA1

How do I generate or find a string that cannot possibly be produced by SHA1-hashing any input string?
The reason I ask this is because I need a global password placeholder in user table.
Thanks
It depends on the representation you use to store the SHA1 hash. A plain * as sometimes used in /etc/passwd should work. An empty string would work too, but I would use something more explicit, like '*invalid'.
If you are using the standard hex representation (for example '68ac906495480a3404beee4874ed853a037a7a8f'), you can use anything that is not a 40-digit hex number. Use some ASCII character not in [0-9a-f], or better yet not in [0-9a-zA-Z].
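A small Python sketch of that check, assuming hashes are stored as 40-character hex strings. PLACEHOLDER and could_be_sha1_hex are illustrative names, not part of any library:

```python
import hashlib
import re

# A stored SHA1 hex digest is exactly 40 hex digits.
SHA1_HEX = re.compile(r"[0-9a-fA-F]{40}")

# Contains '*', which never appears in a hex digest (assumed placeholder value).
PLACEHOLDER = "*invalid"


def could_be_sha1_hex(value: str) -> bool:
    """True if value has the shape of a hex-encoded SHA1 digest."""
    return SHA1_HEX.fullmatch(value) is not None


print(could_be_sha1_hex(hashlib.sha1(b"secret").hexdigest()))  # True
print(could_be_sha1_hex(PLACEHOLDER))                          # False
```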

Determine if non-numerical characters have been pasted into UITextField

For a specialized calculator I would like to allow copy / paste for a textfield which is meant for numerical values only. So, only numerical characters should be actually pasted or the pasted string should be rejected if it contains non-numerical characters.
I was thinking about using UITextFieldDelegate's textField:shouldChangeCharactersInRange:replacementString: method to check the pasted string for non-numerical characters. But NSString offers no method for checking whether it does NOT contain characters specified in a single set. So this way I would need to check occurrences of characters from several sets, which is clumsy, and these checks would run for every single number typed in, which seems like quite some overhead to me.
Another way would be to iterate and check for every character in the replacement string whether there's a match in a numerical set.
Either way would probably work, but I feel like I'm missing something.
Do you have any advice? Is there a convenience method to achieve this?
But NSString offers no method for checking whether it does NOT contain characters specified in a single set
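Sure it does:
if ([myString rangeOfCharacterFromSet:myCharacterSet].location == NSNotFound)
{
    // No character from myCharacterSet occurs in myString.
    // For a digits-only check, myCharacterSet could be
    // [[NSCharacterSet decimalDigitCharacterSet] invertedSet].
}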

How should I handle digits from different sets of UNICODE digits in the same string?

I am writing a function that transliterates UNICODE digits into ASCII digits, and I am a bit stumped on what to do if the string contains digits from different sets of UNICODE digits. For example, given the string "\x{2463}\x{24F6}" ("④⓶"), should my function
1. return 42?
2. croak that the string contains mixed sets?
3. carp that the string contains mixed sets and return 42?
4. give the user an additional argument to specify one of the three above behaviours?
5. do something else?
Your current function appears to do #1.
I suggest that you should also write another function to do #4, but only when the requirement appears, and not before.
I'm sure Joel wrote about "premature implementation" in a blog article sometime recently, but I can't find it.
I'm not sure I see a problem.
You support numeric conversion from a range of scripts, which is to say, you are aware of the Unicode codepoints for their numeric characters.
If you find an unknown codepoint in your input data, it is an error.
It is up to you what you do in the event of an error; you may insert a space or underscore, or you may abort conversion. What you would do will depend on the environment in which your function executes; it is not something we can tell you.
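To make the "unknown codepoint is an error" policy concrete, here is a hedged Python sketch (the question is about Perl, and to_ascii_digits is a hypothetical name). It relies on the digit values in Python's unicodedata tables and accepts mixed digit sets without complaint:

```python
import unicodedata


def to_ascii_digits(text: str) -> str:
    """Map every Unicode digit to its ASCII digit; reject anything else."""
    out = []
    for ch in text:
        value = unicodedata.digit(ch, None)  # None if ch has no digit value
        if value is None:
            # One possible error policy; inserting a placeholder is another.
            raise ValueError(f"not a digit: U+{ord(ch):04X}")
        out.append(str(value))
    return "".join(out)


print(to_ascii_digits("\u2463\u24f6"))  # 42
```

Swapping the raise for a warning or a placeholder character gives the other behaviours discussed here, so the policy stays in one place.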
My initial thought was #4; strictly based on the fact that I like options. However, I changed my mind, when I viewed your function.
The purpose of the function seems to be, simply, to get the resulting digits 0..9. Users may find it useful to send in mixed sets (a feature :) . I'll use it.
If you ever have to handle input in bases greater than 10, you may end up having to treat many variants on the first 6 letters of the Latin alphabet ('ABCDEF') as digits in all their forms.