String that contains English and Hebrew letters gets messed up after String.Join() - C# .NET - string-concatenation

I have a string containing both English and Hebrew characters:
"Hitachi - היטצ'י:Hitachi – cartel CRT"
1st Step: flip the two parts that are separated by :.
Expected result: "Hitachi – cartel CRT:Hitachi - היטצ'י"
Next: I would like to concatenate the following text: ":אגם:עץ תיוק"
Final expected result: "Hitachi - cartel CRT:Hitachi - אגם:עץ תיוק:היטצ'י"
Actual result: "Hitachi – cartel CRT:Hitachi - היטצ'י:אגם:עץ תיוק"
This is my current code:
string path = "Hitachi - היטצ'י:Hitachi – cartel CRT";
string[] splittedByColonPath = path.Split(':');
Array.Reverse(splittedByColonPath);
List<string> list = new List<string>(splittedByColonPath);
list.Add("אגם:עץ תיוק:");
string result = String.Join(":", list.ToArray());
Any ideas on how to rearrange it the proper way?

The String.Join is working just fine, and the string is exactly what you want it to be. (You can test this if you like by writing some code to print the string one character at a time, one character on each line.) The trouble is that, when displaying it, all the Hebrew text and colons are treated as one phrase, and since Hebrew is primarily right-to-left, the first word in the phrase appears on the right.
Depending on what you want to achieve, this may be fine (e.g. if you're passing it to another program that expects data separated by colons; in that case the string may look wrong, but the other program will interpret it just fine). But if you want it to look the way you expect, you have to force the display algorithm to treat the colons as left-to-right. You may be able to do this by changing the code to:
string result = String.Join("\u200e:", list.ToArray());
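The \u200e is a left-to-right mark (LRM), which causes adjacent punctuation to be treated as left-to-right. Note that it is only inserted at the separators that String.Join adds; colons that are already inside an element (such as the ones in "אגם:עץ תיוק:") do not get one.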
The downside of doing this is that any other program interpreting the data may not expect the LRM and may be confused by it.
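The idea can be checked outside of .NET. The following Python sketch is only an illustration (the helper name join_with_ltr_colons and the variable names are hypothetical): it flips the two halves, appends the extra Hebrew fields, joins with LRM + colon, and confirms that removing the marks gives back the plain colon-separated data. It splits the appended text on ':' first, assuming you want every colon marked, since colons inside an element would not receive an LRM.

```python
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

def join_with_ltr_colons(parts):
    """Join parts with ':' and put an LRM before every separator colon."""
    return (LRM + ":").join(parts)

path = "Hitachi - היטצ'י:Hitachi – cartel CRT"
extra = "אגם:עץ תיוק"

# Flip the two halves, then append the extra Hebrew fields one by one
parts = path.split(":")[::-1] + extra.split(":")

data = ":".join(parts)                 # plain data for other programs
display = join_with_ltr_colons(parts)  # same text, with LRMs for display

# ascii() shows the \u200e escapes explicitly, so you can see where they are
print(ascii(display))
```

Stripping the LRMs again (display.replace(LRM, "")) is a cheap way to hand the clean data to another program while keeping the marked version for display only.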

Related

crystal reports attempting to link two tables by matching string with no luck

As stated in the title, I have two tables I'm attempting to link. Both strings appear to match, however Crystal Reports is not picking it up. The only thing I can think of is that the length of the field is different, even though the strings are the same. Could that cause a discrepancy? If so, how can I correct for it? Thank you
A difference in length will prevent a match. If you are using the Trim(string) function, it only removes spaces found at the beginning or end of your string, so the two strings could still be of different lengths after using it. You will need another function to take a substring of the original string: the Left(string, length) function lets you make sure both strings are the same length.
If they still do not match, you may have non-printable characters in one or both of your strings. Carriage Return and Line Feed tend to be the most commonly found non-printable characters. A Carriage Return is represented as Chr(13), while a Line Feed is represented as Chr(10). These character codes correspond to the built-in constants vbCr and vbLf found in VBA and Visual Basic.
You can use a find and replace to remove them with the following formula. It's not a bad idea to also include the Trim and Left functions, applied after the replacements so that the removed characters don't count toward the length, to get the best match possible.
Left(Trim(Replace(Replace({YourStringField}, Chr(10), ""), Chr(13), "")), 10)
There are a few additional characters you may need to check for if this doesn't work; a Tab, for example, is Chr(9). It's rare for strings to contain the other control characters, though. In most cases Carriage Return and Line Feed are the only ones typically found in plain text; tabs and other control characters are much less common in plain string data.
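For illustration only, here is the same normalization written as a Python sketch (normalize_key is a hypothetical name, and Python's str.strip(" ") stands in for Crystal's Trim, which removes spaces only). It removes LF (code 10) and CR (code 13) before trimming and truncating, so a stray line break cannot use up part of the 10-character length.

```python
CR = chr(13)  # carriage return, Chr(13) in Crystal
LF = chr(10)  # line feed, Chr(10) in Crystal

def normalize_key(value, length=10):
    """Mimic Left(Trim(Replace(Replace(value, Chr(10), ""), Chr(13), "")), length)."""
    cleaned = value.replace(LF, "").replace(CR, "")
    cleaned = cleaned.strip(" ")  # like Trim: leading/trailing spaces only
    return cleaned[:length]       # like Left(..., length)

print(normalize_key("ABC-123 \r\n"))
print(normalize_key("ABC-123"))
```

Both calls print ABC-123: the first value only matches the second after the line break and the trailing space are gone, which is exactly the order the formula above uses.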

Swift Thai Localization Problems

There seems to be a problem with the String library that Apple uses.
Here's my Localizable.strings
"error_failed_to_retrieve_certificate" = "เกิิดผิดพลาดในการกู้คะแนน";
Here's how I set it to any view
anyView.text = NSLocalizedString("error_failed_to_retrieve_certificate", comment: "")
But somehow the string gets warped when it is displayed (the second character becomes different).
Here's what it looks like when I search for it using Project Search.
But in the Strings file it looks different (notice the third character).
Here's an image that shows them side by side.
Note that I don't know any Thai.
It seems that your string has an extra ิ (U+0E34 THAI CHARACTER SARA I) in it. The character before it, กิ, is already two code points combined, ก (U+0E01 THAI CHARACTER KO KAI) and ิ, so the extra ิ gets displayed on its own. I would say it's an Xcode bug.
I've removed the extra character here:
เกิดผิดพลาดในการกู้คะแนน
Copy and paste that and it should be fine.
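To spot such stray marks without reading Thai, you can list each code point with its Unicode name. Below is a Python sketch that uses only the standard unicodedata module; the helper names are hypothetical, and it assumes the first word of the original value is เ ก ิ ิ ด, as described above. It only collapses an immediately repeated identical nonspacing mark; it is not a general Thai spell checker.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' line per code point."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text]

def drop_repeated_marks(text):
    """Remove a nonspacing mark (category Mn) that repeats the previous character."""
    out = []
    for ch in text:
        if out and ch == out[-1] and unicodedata.category(ch) == "Mn":
            continue  # skip the duplicated mark, e.g. a second SARA I
        out.append(ch)
    return "".join(out)

word = "\u0e40\u0e01\u0e34\u0e34\u0e14"  # first word of the original string
for line in describe(word):
    print(line)
print(ascii(drop_repeated_marks(word)))  # escaped, to avoid console encoding issues
```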
You should also check that the key "error_failed_to_retrieve_certificate" is unique, i.e. that it is defined only once in your Localizable.strings file.

How to print non-BMP Unicode characters in Tkinter (e.g. 𝄫)

Note: Non-BMP characters can be displayed in IDLE as of Python 3.8 (so it's possible that Tkinter can display them now too, since both use Tcl), which was released some time after I posted this question. I plan to edit this after I try out Python 3.9 (after I install an updated version of Xubuntu). I also read that editing these characters in IDLE might not be as straightforward as editing other characters; see the last comment here.
So, today I was making shortcuts for entering certain Unicode characters. All was going well. Then, when I decided to do these characters (in my Tkinter program; they wouldn't even try to go in IDLE), 𝄫 and 𝄪, I got a strange unexpected error and my program started deleting just about everything I had written in the text box. That's not acceptable.
Here's the error:
_tkinter.TclError: character U+1d12b is above the range (U+0000-U+FFFF) allowed by Tcl
I realize most of the Unicode characters I had been using had only four hex digits in their code point. These two have five, i.e. they lie above U+FFFF, outside the range the error says Tcl allows.
So, is there any way to print these characters in a ScrolledText widget (let alone without messing everything else up)?
UTF-8 is my encoding. I'm using Python 3.4 (so UTF-8 is the default).
I can print these characters just fine with the print statement.
Entering the character without just using ScrolledText.insert (e.g. Ctrl-shift-u, or by doing this in the code: b'\xf0\x9d\x84\xab') does actually enter it, without that error, but it still starts deleting stuff crazily, or adding extra spaces (including itself, although it reappears randomly at times).
There is currently no way to display those characters as they are supposed to look in Tkinter in Python 3.4 (although someone mentioned how using surrogate pairs may work [in Python 2.x]). However, you can implement methods to convert the characters into displayable codes and back, and just call them whenever necessary. You have to call them when you print to Text widgets, copy/paste, in file dialogs*, in the tab bar, in the status bar, and other stuff.
*The default Tkinter file dialogs do not allow for much internal engineering of the dialogs. I made my own file dialogs, partly to help with this issue. Let me know if you're interested. Hopefully I'll post the code for them here in the future.
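The surrogate-pair idea mentioned above can be sketched as follows. A character above U+FFFF is split into a high surrogate 0xD800 + ((cp - 0x10000) >> 10) and a low surrogate 0xDC00 + ((cp - 0x10000) & 0x3FF), the same pair UTF-16 uses. Whether a particular Python/Tcl/Tk combination accepts and renders such a pair is an untested assumption here, so treat this as an experiment; to_surrogate_pairs is a hypothetical helper name.

```python
def to_surrogate_pairs(text):
    """Replace each character above U+FFFF with its UTF-16 surrogate pair."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            offset = cp - 0x10000
            out.append(chr(0xD800 + (offset >> 10)))    # high surrogate
            out.append(chr(0xDC00 + (offset & 0x3FF)))  # low surrogate
        else:
            out.append(ch)
    return "".join(out)

pair = to_surrogate_pairs("\U0001D12B")  # MUSICAL SYMBOL DOUBLE FLAT
print([hex(ord(c)) for c in pair])
```

Every character in the result is at most U+FFFF, which is the range the Tcl error message asks for; the open question is only how Tk draws the pair.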
These methods convert out-of-range characters into codes and vice versa. The codes are formatted with ordinal numbers, like this: {119083ū}. The brackets and the ū are just to distinguish this as a code. {119083ū} represents 𝄫. As you can see, I haven’t yet bothered with a way to escape codes, although I did purposefully try to make the codes very unlikely to occur. The same is true for the ᗍ119083ūᗍ used while converting. Anyway, I plan to add escape sequences eventually. These methods are taken from my class (hence the self). (And yes, I know you don’t have to use semicolons in Python. I just like them and consider that they make the code more readable in some situations.)
import re;

def convert65536(self, s):
    #Converts a string with out-of-range characters in it into a string with codes in it.
    l=list(s);
    i=0;
    while i<len(l):
        o=ord(l[i]);
        if o>65535:
            l[i]="{"+str(o)+"ū}";
        i+=1;
    return "".join(l);

def parse65536(self, match):
    #This is a regular expression method used for substitutions in convert65536back()
    text=int(match.group()[1:-2]);
    if text>65535:
        return chr(text);
    else:
        return "ᗍ"+str(text)+"ūᗍ";

def convert65536back(self, s):
    #Converts a string with codes in it into a string with out-of-range characters in it
    while re.search(r"\{\d\d\d\d\d+ū\}", s) is not None:
        s=re.sub(r"\{\d\d\d\d\d+ū\}", self.parse65536, s);
    s=re.sub(r"ᗍ(\d\d\d\d\d+)ūᗍ", r"{\1ū}", s);
    return s;
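A compact variant of the same round trip, sketched under the assumption that the {NNNNNū} code format above is kept (encode_astral and decode_astral are hypothetical names). The decoder makes a single re.sub pass that leaves codes for values at or below U+FFFF untouched, so the temporary ᗍ markers are not needed. Like the original, it has no escaping, so literal text that happens to look like a code would be decoded too.

```python
import re

# {NNNNNū}: five or more decimal digits wrapped in the marker characters
_CODE = re.compile(r"\{(\d{5,})ū\}")

def encode_astral(text):
    """Replace each character above U+FFFF with a {NNNNNū} code Tcl can display."""
    return "".join("{%dū}" % ord(ch) if ord(ch) > 0xFFFF else ch for ch in text)

def decode_astral(text):
    """Turn {NNNNNū} codes for U+10000..U+10FFFF back into real characters."""
    def repl(match):
        value = int(match.group(1))
        if 0xFFFF < value <= 0x10FFFF:
            return chr(value)
        return match.group(0)  # not an out-of-range character: leave as is
    return _CODE.sub(repl, text)

encoded = encode_astral("B\U0001D12B and F\U0001D12A")
print(ascii(encoded))
print(decode_astral(encoded) == "B\U0001D12B and F\U0001D12A")
```

You would call encode_astral before inserting text into a widget and decode_astral when reading it back (e.g. before saving or copying).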
My answer is based on @Shule's answer but provides more Pythonic, easier-to-read code. It also provides a real use case.
This is the method populating items into a tkinter.Listbox. There is no back conversion. This solution only takes care of displaying strings with characters that Tcl does not allow.
import logging
from tkinter import END, Listbox, TclError

_log = logging.getLogger(__name__)

class MyListbox(Listbox):
    # ...
    def populate(self):
        """Fill the listbox with the items of mydata_list (assumed to be defined elsewhere)."""
        def _convert65536(to_convert):
            """Converts a string with out-of-range characters in it into a
            string with codes in it.
            Based on <https://stackoverflow.com/a/28076205/4865723>.
            This is a workaround because Tkinter (Tcl) doesn't allow Unicode
            characters outside of a specific range. These could be emoticons,
            for example.
            """
            for character in to_convert[:]:
                if ord(character) > 65535:
                    convert_with = '{' + str(ord(character)) + 'ū}'
                    to_convert = to_convert.replace(character, convert_with)
            return to_convert

        # delete all listbox items
        self.delete(0, END)
        # add items to listbox
        for item in mydata_list:
            try:
                self.insert(END, item)
            except TclError as err:
                _log.warning('{} It will be converted.'.format(err))
                self.insert(END, _convert65536(item))

String to Unicode in C#

I want to use Unicode in my code. My Unicode value is 0100, and I am prefixing my value with \u. When I use string myVal="\u0100", it works, but when I build it as shown below, it doesn't: the value looks like "\\u0100". How do I resolve this?
I want to use it like the below one, because the Unicode value may vary sometimes.
string uStr = @"\u" + "0100";
There are a couple of problems here. One is that @"\u" is a verbatim string, so it is just the two characters \u (which can also be written as "\\u").
The other issue is that you cannot construct a string in the way you describe because "\u" is not a valid string by itself. The compiler is expecting a value to follow "\u" (like "\u0100") to determine what the encoded value is supposed to be.
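You also need to keep in mind that escape sequences such as \u0100 are interpreted only by the compiler, when it reads a single regular (non-verbatim) string literal. Concatenation never re-interprets the contents of strings, so with your example (@"\u" + "0100") this is what actually happens:
Take the two-character string \u
Take the four-character string 0100
Join them into the six-character string \u0100 (shown in the debugger as "\\u0100")
No step ever turns those six characters into the single character U+0100; that only happens when the whole escape sequence appears inside one literal.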
The first option that comes to mind for handling those values is to actually parse them as integers, and then convert them to characters. Something like:
var unicodeValue = (char)int.Parse("0100",System.Globalization.NumberStyles.AllowHexSpecifier);
Which will give you the single Unicode character value. From there, you can add it to a string, or convert it to a string using ToString().
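The same principle can be demonstrated in Python, used here only as an illustration (the C# line above is the actual answer). chr(int(..., 16)) mirrors the parse-then-convert approach, while Python's unicode_escape codec shows that turning escape text into a character at run time always needs an explicit decoding step.

```python
code = "0100"

# Concatenating just produces the six characters \u0100
escaped_text = "\\u" + code
print(len(escaped_text))

# Parse the hex digits and convert the number to a character
char_from_int = chr(int(code, 16))

# Alternatively, decode the escape text explicitly at run time
char_from_escape = escaped_text.encode("ascii").decode("unicode_escape")

print(hex(ord(char_from_int)), char_from_int == char_from_escape)
```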

NSURL doesn't work any time

I have the following problem: sometimes my openURL dialog works perfectly. When I look at the URL variable, it is:
www.brehm-gmbh.de
But other times there are some strange characters at the end of the variable, like this:
www.adamczyk-fenster.de%E2%80%8E
I get these pages from an .asc file, and in the file both look normal, without these extra characters.
What can I do to solve this problem?
Thanks in advance for your help.
From Wikipedia:
The left-to-right mark (LRM) is a control character or non-printing character, used in the computerized typesetting of bi-directional text, containing mixed left-to-right scripts (such as English and Russian) and right-to-left scripts (such as Arabic and Hebrew). It is used to change the way adjacent characters are grouped with respect to text direction.
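You're getting this because the string contains an invisible left-to-right mark: %E2%80%8E is the percent-encoded UTF-8 byte sequence of U+200E (LRM). Most likely either (1) you've got non-English URLs, are composing URLs from non-English strings, or have some other non-English elements nearby, and the mark was inserted to control text direction; or (2) it's garbage being interpreted in some encoding (unlikely if it is consistent). Because the LRM is invisible, it may well be present in the .asc file even though the file looks normal.
Note that +[NSString localizedNameOfStringEncoding:] only returns a human-readable name for a given encoding constant; it cannot tell you which encoding a string uses. To see which encoding was detected when reading the file, you can use +[NSString stringWithContentsOfFile:usedEncoding:error:]. Better still, explicitly specify the encoding when you read in the strings (e.g. with +stringWithContentsOfFile:encoding:error:) and strip any U+200E characters (e.g. with -stringByReplacingOccurrencesOfString:withString:) before you put them in the NSURL.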
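The percent escape can be decoded to confirm what it is. Below is a Python sketch (strip_bidi_marks and BIDI_CONTROLS are hypothetical names; the set holds the standard Unicode bidirectional formatting characters) that decodes %E2%80%8E and strips such marks before a URL is built. In an Objective-C app the equivalent cleanup would be done on the NSString before creating the NSURL; Python is used here only to illustrate.

```python
import unicodedata
from urllib.parse import quote, unquote

# Invisible bidi formatting characters: LRM, RLM, LRE..RLO, LRI..PDI
BIDI_CONTROLS = {"\u200e", "\u200f", *map(chr, range(0x202A, 0x202F)),
                 *map(chr, range(0x2066, 0x206A))}

def strip_bidi_marks(text):
    """Remove invisible bidirectional control characters from text."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)

suffix = unquote("%E2%80%8E")
print(unicodedata.name(suffix))  # LEFT-TO-RIGHT MARK
print(quote("\u200e"))           # %E2%80%8E
print(strip_bidi_marks("www.adamczyk-fenster.de" + suffix))
```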