Since updating Flutter and all libraries, I've encountered a strange bug when decoding a list of bytes.
The app communicates with a Bluetooth device using the flutter_blue library, like this:
import 'dart:convert';
var result = await characteristic.read(); // [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
return utf8.decode(result, allowMalformed: true);
The decoded string is displayed in a widget.
Previously, I had no problem: the string seemed empty. But since everything was updated, the string looks empty in the console but not in the widget, where I see several empty squares as characters. And the length of the string, even after calling trim(), is 15, not 0.
I can't find any explanation for this change on the internet, nor a way to solve the problem.
Have you ever met this bug? Do you have a good solution?
Thanks
Edit:
The result is the same with allowMalformed: true, or with
new String.fromCharCodes(result)
I think there is a bug in Flutter when decoding a list containing only zeros.
The NUL character (code 0) is a valid character in UTF-8. It looks like previous versions didn't implement the decoding correctly and silently dropped the NUL characters. It should be expected that a NUL byte comes through in UTF-8 as a well-formed character.
The official docs also say:
If allowMalformed is true, the decoder replaces invalid (or unterminated) character sequences with the Unicode Replacement character U+FFFD (�).
So, a solution to this problem is to post-process the resulting string and check for/replace the NUL characters and/or the Unicode Replacement characters.
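For example, a minimal sketch of that string-level cleanup (result is the byte list from the question):
import 'dart:convert';
var decoded = utf8.decode(result, allowMalformed: true);
// Strip NUL characters and any U+FFFD replacement characters from the decoded string.
var cleaned = decoded.replaceAll('\u0000', '').replaceAll('\uFFFD', '');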
As noted in the edit to my question, the problem seems to be the trailing zeros.
Converting from decimal to a string, 32 is a space, and 0 normally gave an empty character, but it no longer does.
The solution I found is:
var result = await characteristic.read();
var tempResult = List<int>.from(result);
tempResult.removeWhere((item) => item == 0);
return utf8.decode(tempResult, allowMalformed: true);
I copy the original list into a mutable list and remove all zeros from it. That works perfectly.
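If you only want to drop the trailing padding zeros rather than every NUL byte, a minimal variant of the same idea is:
var result = await characteristic.read();
var tempResult = List<int>.from(result);
// Remove only the trailing zeros, keeping any NUL bytes that appear mid-string.
while (tempResult.isNotEmpty && tempResult.last == 0) {
  tempResult.removeLast();
}
return utf8.decode(tempResult, allowMalformed: true);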
Related
A common issue many developers might face while building an app with localization (especially when it involves Arabic or another RTL language) is that search does not behave as expected. An example of the issue:
print(listOfName.contains(inputName)); // prints false
To overcome this issue, I compared the encoded search string and the encoded result string, and only then realized that some invisible characters had been added. In my case, they were the right-to-left mark [226, 128, 143] and the left-to-right mark [226, 128, 142]. Before encoding, the search string and the result string looked identical. Once I knew about the extra characters, I did the following:
import 'dart:convert';

// U+200F (right-to-left mark) and U+200E (left-to-right mark)
var rightToLeftMark = utf8.decode([226, 128, 143]);
var leftToRightMark = utf8.decode([226, 128, 142]);
inputName = inputName.replaceAll(rightToLeftMark, "");
inputName = inputName.replaceAll(leftToRightMark, "");
print(listOfName.contains(inputName)); // prints true
As mentioned earlier, in my case the extra characters were RTL and LTR. In your case, you may find something entirely different.
You can see what each encoded byte sequence represents on this page: https://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192&number=128&utf8=dec
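To find out which hidden characters have crept in in the first place, you can dump the encoded bytes of both strings and compare them, e.g. (a minimal sketch using the names from above):
import 'dart:convert';
// Print the raw UTF-8 bytes to spot invisible marks such as RTL/LTR.
print(utf8.encode(inputName));        // bytes of the search string
print(utf8.encode(listOfName.first)); // bytes of a value you expect it to match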
Flutter Version: 1.22.6
I have the code snippet below (as suggested in this previous Stack Overflow answer ... Deleting all special characters from a string in progress 4GL), which attempts to remove all extended characters from a string so that I can transmit it to a customer's system, which will not accept any extended characters.
do v-int = 128 to 255:
assign v-string = replace(v-string,chr(v-int),"").
end.
It is working perfectly with one exception (which makes me fear there may be others I have not caught). When it gets to 255, it will replace all 'y's in the string.
If I do the following ...
display chr(255) = chr(121). /* 121 is asc code of y */
I get true as the result.
And therefore, if I do the following ...
display replace("This is really strange",chr(255),"").
I get the following result:
This is reall strange
I have verified that 'y' is the only character affected by running the following:
def var v-string as char init "abcdefghijklmnopqrstuvwxyz".
def var v-int as int.
do v-int = 128 to 255:
assign v-string = replace(v-string,chr(v-int),"").
end.
display v-string.
Which results in the following:
abcdefghijklmnopqrstuvwxz
I know I can fix this by removing 255 from the range but I would like to understand why this is happening.
Is this a character collation set issue or am I missing something simpler?
Thanks for any help!
This is a bug. Here's a Progress Knowledge Base article about it:
http://knowledgebase.progress.com/articles/Article/000046181
The workaround is to specify the codepage in the CHR() statement, like this:
CHR(255, "UTF-8", "1252")
Here it is in your example:
def var v-string as char init "abcdefghijklmnopqrstuvwxyz".
def var v-int as int.
do v-int = 128 to 255:
assign v-string = replace(v-string, chr(v-int, "UTF-8", "1252"), "").
end.
display v-string.
You should now see the 'y' in the output.
This seems to be a bug!
The REPLACE() function returns an unexpected result when replacing character CHR(255) (ÿ) in a String.
The REPLACE() function modifies the value of the target character, but additionally it changes any occurrence of characters 'Y' and 'y' present in the String.
This behavior seems to affect only the character ÿ. Other characters are correctly changed by REPLACE().
Using the default codepage ISO-8859-1.
Link to knowledgebase
I'm trying to convert from UTF-16 to, say, UTF-32 using the Apple Core Foundation API:
cfString = CFStringCreateWithBytes(nullptr, str, strLen, kCFStringEncodingUTF16, FALSE);
auto range = CFRangeMake(0, CFStringGetLength(cfString));
CFStringGetBytes(cfString, range, kCFStringEncodingUTF32, 0, false, buffer, bufferSize, &usedSize);
Most of the time that works, until the input buffer contains one half of a surrogate pair, say U+DF9F; Core Foundation then simply returns output with the ill-formed character dropped.
So, to be somewhat Unicode compliant, I have to detect that situation manually and follow the Unicode documentation, making the standard substitution for it in the form of U+FFFD: http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf
The same happens with other encodings: for example, with the byte 0x80 in the middle of UTF-8 input, CFStringCreateWithBytes always returns nullptr instead of pointing out the invalid character.
Is that the expected (or undefined) behaviour of Core Foundation, or is there perhaps a way to tune CF so that it reports malformed input somehow?
UPDATE:
I did exactly following:
UInt8 str[] = {0x41, 0x00, 0x9f, 0xdf}; // corresponding to Unicode 'A' plus an unpaired surrogate
CFStringRef mystr = CFStringCreateWithBytes(nullptr, str, 4, kCFStringEncodingUTF16, false);
After that, mystr has a length of 2 characters according to CFStringGetLength(), so it looks like the invalid character gets accepted.
std::vector<char> out(7);
CFStringGetCString(mystr, out.data(), out.size(), kCFStringEncodingUTF8);
That returns false, so no conversion to UTF-8 is possible, and the Xcode debug watches show nothing for the string mystr.
So the output is empty for UTF-8 and for the C string. After that, I checked conversion to UTF-32 with the get-bytes routine:
result = CFStringGetBytes(mystr, range, kCFStringEncodingUTF32BE, 0, false, buffer, bufferSize, &usedSize);
That gives me usedSize = 4, result = 1, and the output contains 0x0041, so only the 'A' was converted. That is why I think no substitution happened for the malformed surrogate pair.
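One way to handle this, sketched below under the assumption that the input is already available as 16-bit code units in native byte order, is to scan the buffer yourself and substitute U+FFFD for any unpaired surrogate before handing it to Core Foundation (the function name is just illustrative):

#include <cstdint>
#include <vector>

// Replace unpaired surrogates with U+FFFD so the buffer is well-formed UTF-16.
// Sketch only: assumes the input is already split into 16-bit code units.
std::vector<uint16_t> sanitizeUTF16(const std::vector<uint16_t>& in)
{
    std::vector<uint16_t> out;
    out.reserve(in.size());
    for (size_t i = 0; i < in.size(); ++i) {
        uint16_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {                // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                out.push_back(u);                        // valid pair, keep both units
                out.push_back(in[++i]);
            } else {
                out.push_back(0xFFFD);                   // unpaired high surrogate
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            out.push_back(0xFFFD);                       // unpaired low surrogate
        } else {
            out.push_back(u);
        }
    }
    return out;
}

The sanitized code units can then be passed to CFStringCreateWithBytes as before (cast to UInt8* with the byte count doubled).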
How can I display a character above U+FFFF? Say I want to show U+1F384. "\u1F384" is interpreted as "\u1F38" followed by the character "4", and gives the error No glyph found for the ? (\u1F38) character. "\uD83C\uDF84" is interpreted as the character "\uD83C" followed by the character "\uDF84", and gives the error
No glyph found for the ? (\uD83C) character
No glyph found for the ? (\uDF84) character
Here is some example code to demonstrate:
PFont font;

void setup()
{
  size(128, 128);
  background(0);
  font = loadFont("Symbola-8.vlw");
  textFont(font, 8);
  fill(255);
  text("\u1F384", 10, 10);
}

void draw()
{
}
As stated by the tag, this is in the language Processing.
\U0001F384 gives the error unexpected char: 'U', presumably because Processing doesn't support UTF-32 escapes in that format.
It doesn't really matter how it is displayed, the main problem is making a string contain a character whose decimal codepoint is greater than 65,535.
Unfortunately, there does not appear to be a way to do this.
You can now use '\u{1F384}' in an ECMAScript 6 compatible browser.
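For example, a minimal JavaScript sketch (this only applies if you are running in an ES6-capable environment, not in Java-mode Processing):
// ES6 code point escape: builds a string containing U+1F384
const tree = '\u{1F384}';
console.log(tree.codePointAt(0).toString(16)); // "1f384"
console.log(tree.length);                      // 2, because it is stored as a surrogate pair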
I'm new to PowerBuilder 12 and would like to know if there is any way to determine the encoding (e.g. Unicode, Big5) of an input file. Any comments and code samples are appreciated! Thanks!
From the PB 12.5 help file :
FileEncoding ( filename )
filename : The name of the file you want to test for encoding type
Return Values
EncodingANSI!
EncodingUTF8!
EncodingUTF16LE!
EncodingUTF16BE!
If filename does not exist, returns null.
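A minimal usage sketch (the file path is just a placeholder):
// Sketch: branch on the encoding reported by FileEncoding()
encoding le_enc
le_enc = FileEncoding("c:\temp\input.txt")   // hypothetical path

if IsNull(le_enc) then
    MessageBox("Encoding", "File not found")
else
    choose case le_enc
        case EncodingUTF8!
            MessageBox("Encoding", "UTF-8")
        case EncodingUTF16LE!, EncodingUTF16BE!
            MessageBox("Encoding", "UTF-16")
        case EncodingANSI!
            MessageBox("Encoding", "ANSI")
    end choose
end if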
Finding Unicode is pretty easy if you assume the Unicode file has a BOM prefix (although in reality not all Unicode files have one). Some code to do this is below. However, I have no idea about Big5; at first glance at the spec (I've never had occasion to use it), it doesn't appear to have a similar prefix.
Good luck,
Terry
function of_filetype (string as_filename) returns encoding
integer li_NullCount, li_NonNullCount, li_OffsetTest
long ll_File
encoding le_Return
blob lblb_UTF16BE, lblb_UTF16LE, lblb_UTF8, lblb_Test, lblb_BOMTest, lblb_Null
lblb_UTF16BE = Blob ("~hFE~hFF", EncodingANSI!)
lblb_UTF16LE = Blob ("~hFF~hFE", EncodingANSI!)
lblb_UTF8 = Blob ("~hEF~hBB~hBF", EncodingANSI!)
lblb_Null = blobmid (blob ("~h01", encodingutf16le!), 2, 1)
SetNull (le_Return)
// Get a set of bytes to test
ll_File = FileOpen (as_FileName, StreamMode!, Read!, Shared!)
FileRead (ll_File, lblb_Test)
FileClose (ll_File)
// test for BOMs: UTF-16BE (FE FF), UTF-16LE (FF FE), UTF-8 (EF BB BF)
lblb_BOMTest = BlobMid (lblb_Test, 1, Len (lblb_UTF16BE))
IF lblb_BOMTest = lblb_UTF16BE THEN RETURN EncodingUTF16BE!
lblb_BOMTest = BlobMid (lblb_Test, 1, Len (lblb_UTF16LE))
IF lblb_BOMTest = lblb_UTF16LE THEN RETURN EncodingUTF16LE!
lblb_BOMTest = BlobMid (lblb_Test, 1, Len (lblb_UTF8))
IF lblb_BOMTest = lblb_UTF8 THEN RETURN EncodingUTF8!
//I've removed a hack from here that I wouldn't encourage others to use, basically checking for
//0x00 in places I'd "expect" them to be if it was a Unicode file, but that makes assumptions
RETURN le_Return