How to manually convert between Latin-5 and Unicode code points? - unicode

I have socket connection and I can get data from my socket. The data encoded as Latin-5.
Example: // will get data from socket like:
_xCustomerName = “879255:_:NÝYAZÝ TOROS”;
_xCustomerName.replaceAll(new RegExp(r'Ý'), 'İ');
An seems it doesnt replace the 'Ý' to ‘İ’.
How to manually convert between Latin-5 and Unicode code points?

The Latin-5 characters are specified, e.g., here. That looks like Cyrillic to me, so what you need is probably codepage 1254 Windows Latin-5 or ISO-8859-9/Latin-9. They both have Turkish characters with capital dotted I at position 0xDD. The six-character change you mention in your follow-up post matches Latin-9.
From the character table, you can build a translation table with one character per byte in the 0..255 range:
import "dart:typed_data";
// The Unicode characters of ISO-8859-9.
const _latin9Table =
"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
"\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
" !\"#\$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"
"`abcdefghijklmnopqrstuvwxyz{|}~\x7f"
"\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
"\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
"\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
"\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
"\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
"\u011e\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\u0130\u015e\xdf"
"\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef"
"\u011f\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\u0131\u015f\xff";
String decodeLatin9(List<int> bytes) {
Uint16List charCodes = new Uint16List(bytes.length);
for (int i = 0; i < bytes.length; i++) {
charCodes[i] = _latin9Table.codeUnitAt(bytes[i]);
}
return String.fromCharCodes(charCodes);
}
That allows decoding of Latin-9. If you want to combine it into that dart:convert framework, you may need to create a Converter around that, or even an Encoding. Sadly, the package:convert doesn't have an easy way to create a code-page convereter. I think it should.

I know the differences Latin-5 (Turkish) character codes.
With simple switch statements I can turn “879255::NÝYAZÝ TOROS” to “879255::NİYAZİ TOROS”. I wish I had a better solution (some sort of Dart plugin) but this is best I came so far.
_xCustomerName.runes.forEach((int rune) {
var character = new String.fromCharCode(rune);
switch (character) {
case 'Ð':
character = 'Ğ';
break;
case 'Ý':
character = 'İ';
break;
case 'Þ':
character = 'Ş';
break;
case 'ð':
character = 'ğ';
break;
case 'ý':
character = 'ı';
break;
case 'þ':
character = 'ş';
break;
}
_xCustomerNameTurkish = _xCustomerNameTurkish + character;
});
print(_xCustomerNameTurkish);

Related

String transformation for subject course code for Dart/Flutter

For interaction with an API, I need to pass the course code in <string><space><number> format. For example, MCTE 2333, CCUB 3621, BTE 1021.
Yes, the text part can be 3 or 4 letters.
Most users enter the code without the space, eg: MCTE2333. But that causes error to the API. So how can I add a space between string and numbers so that it follows the correct format.
You can achieve the desired behaviour by using regular expressions:
void main() {
String a = "MCTE2333";
String aStr = a.replaceAll(RegExp(r'[^0-9]'), ''); //extract the number
String bStr = a.replaceAll(RegExp(r'[^A-Za-z]'), ''); //extract the character
print("$bStr $aStr"); //MCTE 2333
}
Note: This will produce the same result, regardless of how many whitespaces your user enters between the characters and numbers.
Try this.You have to give two texfields. One is for name i.e; MCTE and one is for numbers i.e; 1021. (for this textfield you have to change keyboard type only number).
After that you can join those string with space between them and send to your DB.
It's just like hack but it will work.
Scrolling down the course codes list, I noticed some unusual formatting.
Example: TQB 1001E, TQB 1001E etc. (With extra letter at the end)
So, this special format doesn't work with #Jahidul Islam's answer. However, inspired by his answer, I manage to come up with this logic:
var code = "TQB2001M";
var i = course.indexOf(RegExp(r'[^A-Za-z]')); // get the index
var j = course.substring(0, i); // extract the first half
var k = course.substring(i).trim(); // extract the others
var formatted = '$j $k'.toUpperCase(); // combine & capitalize
print(formatted); // TQB 1011M
Works with other formats too. Check out the DartPad here.
Here is the entire logic you need (also works for multiple whitespaces!):
void main() {
String courseCode= "MMM 111";
String parsedCourseCode = "";
if (courseCode.contains(" ")) {
final ensureSingleWhitespace = RegExp(r"(?! )\s+| \s+");
parsedCourseCode = courseCode.split(ensureSingleWhitespace).join(" ");
} else {
final r1 = RegExp(r'[0-9]', caseSensitive: false);
final r2 = RegExp(r'[a-z]', caseSensitive: false);
final letters = courseCode.split(r1);
final numbers = courseCode.split(r2);
parsedCourseCode = "${letters[0].trim()} ${numbers.last}";
}
print(parsedCourseCode);
}
Play around with the input value (courseCode) to test it - also use dart pad if you want. You just have to add this logic to your input value, before submitting / handling the input form of your user :)

How to display Unicode Smiley from json response dynamically in flutter

How to display Unicode Smiley from json response dynamically in flutter. It's display properly when i declare string as a static but from dynamic response it's not display smiley properly.
Static Declaration: (Working)
child: Text("\ud83d\ude0e\ud83d\ude0eThis is just test notification..\ud83d\ude0e\ud83d\ude0e\ud83d\udcaf\ud83d\ude4c")
Dynamic Response:
"message":"\\ud83d\\ude4c Be Safe at your home \\ud83c\\udfe0",
When i'm parse and pass this response to Text then it's consider Unicode as a String and display as a string instead of Smiley Code is below to display text with smiley:
child: Text(_listData[index].message.toString().replaceAll("\\\\", "\\"))
Already go through this: Question but it's only working when single unicode not working with multiple unicode.
Anyone worked with text along with unicode caracter display dynamically then please let me know.
Another alternate Good Solution I would give to unescape characters is this:
1st ->
String s = "\\ud83d\\ude0e Be Safe at your home \\ud83c\\ude0e";
String q = s.replaceAll("\\\\", "\\");
This would print and wont be able to escape characters:
\ud83d\ud83d Be Safe at your home \ud83c\ud83d
and above would be the output.
So what one can do is either unescape them while parsing or use:
String convertStringToUnicode(String content) {
String regex = "\\u";
int offset = content.indexOf(regex) + regex.length;
while(offset > 1){
int limit = offset + 4;
String str = content.substring(offset, limit);
// print(str);
if(str!=null && str.isNotEmpty){
String uni = String.fromCharCode(int.parse(str,radix:16));
content = content.replaceFirst(regex+str,uni);
// print(content);
}
offset = content.indexOf(regex) + regex.length;
// print(offset);
}
return content;
}
This will replace and convert all the literals into unicode characters and result and output of emoji:
String k = convertStringToUnicode(q);
print(k);
😎 Be Safe at your home 🈎
That is above would be the output.
Note: above answer given would just work as good but this is just when you want to have an unescape function and don't need to use third-party libraries.
You can extend this using switch cases with multiple unescape solutions.
Issue resolved by using below code snippet.
Client client = Client();
final response = await client.get(Uri.parse('YOUR_API_URL'));
if (response.statusCode == 200) {
// If the server did return a 200 OK response,
// then parse the JSON.
final extractedData = json.decode(response.body.replaceAll("\\\\", "\\"));
}
Here we need to replace double backslash to single backslash and then decode JSON respone before set into Text like this we can display multiple unicode like this:
final extractedData = json.decode(response.body.replaceAll("\\",
"\"));
Hope this answer help to other

Convert persian unicode to Ascii

I need to get the ASCII code of a Persian string to use it in a program. But the method below give the ? marks: "??? ????"
public string PerisanAscii()
{
//persian string
string unicodeString = "صبح بخیر";
// Create two different encodings.
Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;
// Convert the string into a byte array.
byte[] unicodeBytes = unicode.GetBytes(unicodeString);
// Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
// Convert the new byte[] into a char[] and then into a string.
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
return asciiString;
}
Can you help me?
Best regards,
Mohsen
You can convert Persian UTF8 data to Windows-1256 (Arabic Windows):
var enc1256 = Encoding.GetEncoding("windows-1256");
var data = enc1256.GetBytes(unicodeString);
System.IO.File.WriteAllBytes(path, data);
ASCII does not support Persian. You may need old school Iran System encoding standard. This is determined by your Autocad application. I don't know if there is a direct Encoding in windows for it or not. But you can convert characters manually too. It's a simple mapping.

Storing Special Characters in Windows Azure Blob Metadata

I have an app that is storing images in a Windows Azure Block Blob. I'm adding meta data to each blob that gets uploaded. The metadata may include some special characters. For instance, the registered trademark symbol (®). How do I add this value to meta data in Windows Azure?
Currently, when I try, I get a 400 (Bad Request) error anytime I try to upload a file that uses a special character like this.
Thank you!
You might use HttpUtility to encode/decode the string:
blob.Metadata["Description"] = HttpUtility.HtmlEncode(model.Description);
Description = HttpUtility.HtmlDecode(blob.Metadata["Description"]);
http://lvbernal.blogspot.com/2013/02/metadatos-de-azure-vs-caracteres.html
The supported characters in the blob metadata must be ASCII characters. To work around this you can either escape the string ( percent encode), base64 encode etc.
joe
HttpUtility.HtmlEncode may not work; if Unicode characters are in your string (i.e. &#8217), it will fail. So far, I have found Uri.EscapeDataString does handle this edge case and others. However, there are a number of characters that get encoded unnecessarily, such as space (' '=chr(32)=%20).
I mapped the illegal ascii characters metadata will not accept and built this to restore the characters:
static List<string> illegals = new List<string> { "%1", "%2", "%3", "%4", "%5", "%6", "%7", "%8", "%A", "%B", "%C", "%D", "%E", "%F", "%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17", "%18", "%19", "%1A", "%1B", "%1C", "%1D", "%1E", "%1F", "%7F", "%80", "%81", "%82", "%83", "%84", "%85", "%86", "%87", "%88", "%89", "%8A", "%8B", "%8C", "%8D", "%8E", "%8F", "%90", "%91", "%92", "%93", "%94", "%95", "%96", "%97", "%98", "%99", "%9A", "%9B", "%9C", "%9D", "%9E", "%9F", "%A0", "%A1", "%A2", "%A3", "%A4", "%A5", "%A6", "%A7", "%A8", "%A9", "%AA", "%AB", "%AC", "%AD", "%AE", "%AF", "%B0", "%B1", "%B2", "%B3", "%B4", "%B5", "%B6", "%B7", "%B8", "%B9", "%BA", "%BB", "%BC", "%BD", "%BE", "%BF", "%C0", "%C1", "%C2", "%C3", "%C4", "%C5", "%C6", "%C7", "%C8", "%C9", "%CA", "%CB", "%CC", "%CD", "%CE", "%CF", "%D0", "%D1", "%D2", "%D3", "%D4", "%D5", "%D6", "%D7", "%D8", "%D9", "%DA", "%DB", "%DC", "%DD", "%DE", "%DF", "%E0", "%E1", "%E2", "%E3", "%E4", "%E5", "%E6", "%E7", "%E8", "%E9", "%EA", "%EB", "%EC", "%ED", "%EE", "%EF", "%F0", "%F1", "%F2", "%F3", "%F4", "%F5", "%F6", "%F7", "%F8", "%F9", "%FA", "%FB", "%FC", "%FD", "%FE" };
private static string MetaDataEscape(string value)
{
//CDC%20Guideline%20for%20Prescribing%20Opioids%20Module%206%3A%20%0Ahttps%3A%2F%2Fwww.cdc.gov%2Fdrugoverdose%2Ftraining%2Fdosing%2F
var x = HttpUtility.HtmlEncode(value);
var sz = value.Trim();
sz = Uri.EscapeDataString(sz);
for (int i = 1; i < 255; i++)
{
var hex = "%" + i.ToString("X");
if (!illegals.Contains(hex))
{
sz = sz.Replace(hex, Uri.UnescapeDataString(hex));
}
}
return sz;
}
The result is:
Before ==> "1080x1080 Facebook Images"
Uri.EscapeDataString =>
"1080x1080%20Facebook%20Images"
After => "1080x1080 Facebook
Images"
I am sure there is a more efficient way, but the hit seems negligible for my needs.

What is the character encoding?

I have several characters that aren't recognized properly.
Characters like:
º
á
ó
(etc..)
This means that the characters encoding is not utf-8 right?
So, can you tell me what character encoding could it be please.
We don't have nearly enough information to really answer this, but the gist of it is: you shouldn't just guess. You need to work out where the data is coming from, and find out what the encoding is. You haven't told us anything about the data source, so we're completely in the dark. You might want to try Encoding.Default if these are files saved with something like Notepad.
If you know what the characters are meant to be and how they're represented in binary, that should suggest an encoding... but again, we'd need to know more information.
read this first http://www.joelonsoftware.com/articles/Unicode.html
There are two encodings: the one that was used to encode string and one that is used to decode string. They must be the same to get expected result. If they are different then some characters will be displayed incorrectly. we can try to guess if you post actual and expected results.
I wrote a couple of methods to narrow down the possibilities a while back for situations just like this.
static void Main(string[] args)
{
Encoding[] matches = FindEncodingTable('Ÿ');
Encoding[] enc2 = FindEncodingTable(159, 'Ÿ');
}
// Locates all Encodings with the specified Character and position
// "CharacterPosition": Decimal position of the character on the unknown encoding table. E.G. 159 on the extended ASCII table
//"character": The character to locate in the encoding table. E.G. 'Ÿ' on the extended ASCII table
static Encoding[] FindEncodingTable(int CharacterPosition, char character)
{
List matches = new List();
byte myByte = (byte)CharacterPosition;
byte[] bytes = { myByte };
foreach (EncodingInfo encInfo in Encoding.GetEncodings())
{
Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
char[] chars = thisEnc.GetChars(bytes);
if (chars[0] == character)
{
matches.Add(thisEnc);
break;
}
}
return matches.ToArray();
}
// Locates all Encodings that contain the specified character
static Encoding[] FindEncodingTable(char character)
{
List matches = new List();
foreach (EncodingInfo encInfo in Encoding.GetEncodings())
{
Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
char[] chars = { character };
byte[] temp = thisEnc.GetBytes(chars);
if (temp != null)
matches.Add(thisEnc);
}
return matches.ToArray();
}
Encoding is the form of modifying some existing content; thus allowing it to be parsed by the required destination protocols.
An example of encoding can be seen when browsing the internet:
The URL you visit: www.example.com, may have the search facility to run custom searches via the URL address:
www.example.com?search=...
The following variables on the URL require URL encoding. If you was to write:
www.example.com?search=cat food cheap
The browser wouldn't understand your request as you have used an invalid character of ' ' (a white space)
To correct this encoding error you should exchange the ' ' with '%20' to form this URL:
www.example.com?search=cat%20food%20cheap
Different systems use different forms of encoding, in this example I have used standard Hex encoding for a URL. In other applications and instances you may find the need to use other types of encoding.
Good Luck!