Emoji and accent encoding in Dart/Flutter

I get the following string from my API:
"à é í ó ú ü ñ \uD83D\uDE00\uD83D\uDE03\uD83D\uDE04\uD83D\uDE01\uD83D\uDE06\uD83D\uDE05"
It comes from a JSON response:
{
'apiText': "à é í ó ú ü ñ \uD83D\uDE00\uD83D\uDE03\uD83D\uDE04\uD83D\uDE01\uD83D\uDE06\uD83D\uDE05",
'otherInfo': 'etc.',
.
.
.
}
it contains accents à é í ó ú ü ñ that are not correctly encoded and it contains emojis \uD83D\uDE00\uD83D\uDE03\uD83D\uDE04\uD83D\uDE01\uD83D\uDE06\uD83D\uDE05
So far I have tried:
var json = jsonDecode(response.body);
String apiText = json['apiText'];
List<int> bytes = apiText.codeUnits;
comentario = utf8.decode(bytes);
but this produces
[ERROR:flutter/lib/ui/ui_dart_state.cc(166)] Unhandled Exception: FormatException: Invalid UTF-8 byte (at offset 21)
How can I get the correct text with accents and emoji?

Since you access response.body, I assume you are using the http package, whose Response objects have a body property.
You should note the following detail in the documentation:
This is converted from bodyBytes using the charset parameter of the Content-Type header field, if available. If it's unavailable or if the encoding name is unknown, latin1 is used by default, as per RFC 2616.
It seems likely that the package cannot determine the charset and therefore falls back to latin1, which explains how your response got garbled.
A solution is to use response.bodyBytes instead, which contains the raw bytes of the response. You can then decode them manually, e.g. with utf8.decode(response.bodyBytes), if you are sure the response is UTF-8.
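As a minimal sketch of the difference, assuming the server sends UTF-8 without declaring a charset: the snippet below uses a hand-built byte list in place of response.bodyBytes (no http call is made), and decodeJsonBody is a hypothetical helper name, not part of the http package.

```dart
import 'dart:convert';

/// Hypothetical helper: decodes a JSON body from its raw bytes as UTF-8.
/// `rawBytes` stands in for `response.bodyBytes` from the http package.
Map<String, dynamic> decodeJsonBody(List<int> rawBytes) {
  return jsonDecode(utf8.decode(rawBytes)) as Map<String, dynamic>;
}

void main() {
  // Simulated payload: UTF-8 bytes, as a server might send them.
  final rawBytes = utf8.encode('{"apiText": "à é ñ 😀"}');

  // What a latin1 fallback does to the same bytes: every byte becomes
  // one character, so multi-byte sequences turn into mojibake.
  final wrong = jsonDecode(latin1.decode(rawBytes)) as Map<String, dynamic>;

  // Decoding the raw bytes as UTF-8 keeps accents and emoji intact.
  final right = decodeJsonBody(rawBytes);

  print(wrong['apiText']);
  print(right['apiText']);
}
```

In a real app the only change would be to pass response.bodyBytes to the helper instead of reading response.body.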

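Rewrite your function as:
import 'dart:convert';

String utf8convert(String text) {
  // Treat each UTF-16 code unit as a byte and decode the bytes as UTF-8.
  final bytes = text.codeUnits;
  final decoded = utf8.decode(bytes, allowMalformed: true);
  // U+FFFD marks input that was not valid UTF-8; keep the original text then.
  if (decoded.contains('\uFFFD')) {
    return text;
  }
  return decoded;
}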
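One caveat, shown as a sketch rather than a definitive fix: in the question's string the accents are mis-decoded, but the emoji arrive as \uXXXX JSON escapes and are therefore already correct. Their code units are larger than 0xFF, so they cannot be treated as bytes, which is why the original attempt fails at offset 21 (the first emoji code unit). The hypothetical helper repairLatin1Runs below re-decodes only the runs of byte-sized code units and copies everything else through unchanged.

```dart
import 'dart:convert';

/// Hypothetical helper: re-decodes runs of code units that fit in one byte
/// as UTF-8, and copies larger code units (for example emoji) through as is.
String repairLatin1Runs(String text) {
  final out = StringBuffer();
  final run = <int>[];

  void flush() {
    if (run.isEmpty) return;
    try {
      out.write(utf8.decode(run));
    } on FormatException {
      // The run is not valid UTF-8, so keep the original characters.
      out.write(String.fromCharCodes(run));
    }
    run.clear();
  }

  for (final unit in text.codeUnits) {
    if (unit <= 0xFF) {
      run.add(unit);
    } else {
      flush();
      out.writeCharCode(unit);
    }
  }
  flush();
  return out.toString();
}

void main() {
  // U+00C3 U+00A9 is the mojibake form of 'é'; U+1F600 is a real emoji.
  const apiText = '\u00C3\u00A9 \u{1F600}';
  print(repairLatin1Runs(apiText));
}
```

A run that mixes mojibake with genuinely correct Latin-1 characters is left untouched by this sketch, so decoding response.bodyBytes directly remains the more reliable route.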

Related

Decode UTF-8 characters from a string in Flutter

I'm unable to decode the UTF-8 characters in a string.
The line below comes from the API:
replace the cable if a fault is found â for example
Expected string:
replace the cable if a fault is found - for example
Can anyone help me fix this?
Thanks!!!
You can use the Flutter Charset Detector package to convert it to the correct encoding.
I have used it, and it works well. You just have to convert the body of the response to the desired encoding.
Pass the bodyBytes field to the package's autoDecode method:
Uint8List bodyBytes = response.bodyBytes;
DecodingResult decoded = (await CharsetDetector.autoDecode(bodyBytes));
String charset = decoded.charset;
String bodyInRightEncodage = decoded.string;
Here, response is an http Response object.
The line below works fine:
jsonDecode(utf8.decode(response.bodyBytes, allowMalformed: true));
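To illustrate why only "â" may be visible, here is a small sketch that assumes the original character was an en dash (U+2013); the question does not say which dash the API sent. Its three UTF-8 bytes, read as latin1, become "â" followed by two invisible control characters, while decoding the same bytes as UTF-8 restores the dash.

```dart
import 'dart:convert';

void main() {
  // Assumption: the API sent an en dash, encoded as UTF-8 (bytes E2 80 93).
  final bodyBytes = utf8.encode('found \u2013 for example');

  // latin1 maps each byte to one character: 'â' plus U+0080 and U+0093.
  final misread = latin1.decode(bodyBytes);

  // Decoding the raw bytes as UTF-8 restores the dash.
  final correct = utf8.decode(bodyBytes, allowMalformed: true);

  // Print code points in hex so the invisible characters can be seen.
  print(misread.runes.map((r) => r.toRadixString(16)).toList());
  print(correct);
}
```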

An error in the encoding of the Content-Disposition header (contains the Cyrillic alphabet) UWP

When I get the file name from the link, I see "Ð\u0097акаÑ\u0082.jpg" instead of "Закат.jpg".
I determined that the file name arrives in the "windows-1251" encoding. When I try to convert it to UTF-8 or UTF-16, the file name is still displayed incorrectly. Has anyone come across something like this?
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding wind1252 = Encoding.GetEncoding("windows-1252");
Encoding utf8 = Encoding.UTF8;
byte[] wind1252Bytes = wind1252.GetBytes(FileName);
byte[] utf8Bytes = Encoding.Convert(wind1252, utf8, wind1252Bytes);
string utf8String = Encoding.UTF8.GetString(utf8Bytes);

How can I convert a utf8 string to LATIN1 in Dart?

I have many strings where the accents are converted wrongly. I get those strings from an API, so I cannot get them in another encoding. As an example, the string é comes back as Ã© from the API. Is there any way I can convert these strings to show the accents correctly?
Well, you can try something like this:
import 'dart:convert';
void main() {
const input = 'Ã©';
final output = utf8.decode(latin1.encode(input), allowMalformed: true);
print(output); // é
}
Alternatively, you can get the response from your web call as bytes by using bodyBytes on the response object:
https://pub.dev/documentation/http/latest/http/Response/bodyBytes.html
Then decode it with latin1.decode, or with whatever charset the server is sending the data in.
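If the charset name is known (for example from the Content-Type header), a sketch like the following can pick the codec by name. decodeWithCharset is a hypothetical helper; Encoding.getByName comes from dart:convert and only knows the codecs built into that library, and the latin1 fallback here is an assumption that mirrors the http package's default.

```dart
import 'dart:convert';

/// Hypothetical helper: decodes bytes with the codec named by `charsetName`,
/// falling back to latin1 when the name is missing or unknown.
String decodeWithCharset(List<int> bodyBytes, String? charsetName) {
  final codec = Encoding.getByName(charsetName) ?? latin1;
  return codec.decode(bodyBytes);
}

void main() {
  final bodyBytes = utf8.encode('é');
  print(decodeWithCharset(bodyBytes, 'utf-8')); // decoded as UTF-8
  print(decodeWithCharset(bodyBytes, null)); // falls back to latin1
}
```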

Flutter - Character encoding is not behaving as expected

I am parsing a local JSON file that has words containing rarely used special characters in Icelandic.
When displaying the characters, some show up as mixed-up symbols instead of the intended characters, and others show up as a square.
I am using escapes of this form: "\u00c3"
Update: Example of the characters I am using: þ, æ, ý, ð
Q: What is the best way to display these kinds of characters and avoid any chance of display failures?
Update #2:
How I am parsing:
Future<Null> getAll() async{
var response = await
DefaultAssetBundle.of(context).loadString('assets/json/dictionary.json');
var decodedData = json.decode(response);
setState(() {
for(Map word in decodedData){
mWordsList.add(Words.fromJson(word));
}
});
}
The class:
class Words{
final int id;
final String wordEn, wordIsl;
Words({this.id, this.wordEn, this.wordIsl});
factory Words.fromJson(Map<String, dynamic> json){
return new Words(
id: json['wordId'],
wordEn: json['englishWord'],
wordIsl: json['icelandicWord']
);
}
}
JSON Model:
{
"wordId": 47,
"englishWord": "Age",
//Here's a String that has two special characters
"icelandicWord": "\u00c3\u00a6vi"
}
I had similar issues with accented characters. They were not displayed as expected.
This worked for me
final codeUnits = source.codeUnits;
return Utf8Decoder().convert(codeUnits);
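Applied to the JSON from the question, this looks like the sketch below. It assumes the file really contains the double-encoded escapes \u00c3\u00a6 (the two UTF-8 bytes of æ written as two separate characters); changing the file to contain \u00e6 or a literal æ would avoid the extra step.

```dart
import 'dart:convert';

void main() {
  // Raw string, so the \u escapes reach the JSON parser untouched.
  const source =
      r'{"wordId": 47, "englishWord": "Age", "icelandicWord": "\u00c3\u00a6vi"}';
  final word = jsonDecode(source) as Map<String, dynamic>;

  // After JSON decoding, the value holds the code units C3 A6 76 69.
  final garbled = word['icelandicWord'] as String;

  // Reinterpret those code units as UTF-8 bytes.
  final repaired = const Utf8Decoder().convert(garbled.codeUnits);
  print(repaired); // ævi
}
```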
The problem is that your JSON is stored locally.
Let's say you have
Map<String, String> jsonObject = {"info": "Æ æ æ Ö ö ö"};
So to show your text correctly, you have to encode your JSON and decode it back as UTF-8.
I understand that serialization and deserialization are costly operations, but it's a workaround for locally stored JSON objects that contain UTF-8 text.
import 'dart:convert';
jsonDecode(jsonEncode(jsonObject))["info"]
If you get that JSON from a server, it's much simpler: for example, in the dio package you can choose the contentType parameter, which is "application/json; charset=utf-8" by default.

What is the character encoding?

I have several characters that aren't recognized properly.
Characters like:
º
á
ó
(etc..)
This means that the character encoding is not UTF-8, right?
So, can you tell me what character encoding it could be, please?
We don't have nearly enough information to really answer this, but the gist of it is: you shouldn't just guess. You need to work out where the data is coming from, and find out what the encoding is. You haven't told us anything about the data source, so we're completely in the dark. You might want to try Encoding.Default if these are files saved with something like Notepad.
If you know what the characters are meant to be and how they're represented in binary, that should suggest an encoding... but again, we'd need to know more information.
Read this first: http://www.joelonsoftware.com/articles/Unicode.html
There are two encodings: the one that was used to encode the string and the one that is used to decode it. They must be the same to get the expected result. If they are different, some characters will be displayed incorrectly. We can try to guess if you post the actual and expected results.
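When both the raw bytes and the intended character are known, the candidates can be narrowed down mechanically. The following is a minimal Dart sketch limited to the three codecs built into dart:convert; matchingCodecs is a hypothetical helper, and the byte values are examples rather than data from the question.

```dart
import 'dart:convert';

/// Hypothetical helper: returns the names of the candidate codecs that
/// decode `bytes` to exactly `expected`.
List<String> matchingCodecs(List<int> bytes, String expected) {
  final candidates = <Encoding>[ascii, latin1, utf8];
  final matches = <String>[];
  for (final codec in candidates) {
    try {
      if (codec.decode(bytes) == expected) matches.add(codec.name);
    } on FormatException {
      // The bytes are not valid in this codec, so it cannot be the one.
    }
  }
  return matches;
}

void main() {
  print(matchingCodecs([0xC3, 0xA1], 'á')); // two bytes for one character
  print(matchingCodecs([0xE1], 'á')); // one byte for the same character
}
```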
I wrote a couple of methods to narrow down the possibilities a while back for situations just like this.
// Requires: using System.Collections.Generic; using System.Text;
static void Main(string[] args)
{
Encoding[] matches = FindEncodingTable('Ÿ');
Encoding[] enc2 = FindEncodingTable(159, 'Ÿ');
}
// Locates all encodings in which the given byte value maps to the given character.
// CharacterPosition: decimal position of the character in the unknown encoding table, e.g. 159 in the extended ASCII table.
// character: the character to locate in the encoding table, e.g. 'Ÿ' in the extended ASCII table.
static Encoding[] FindEncodingTable(int CharacterPosition, char character)
{
List<Encoding> matches = new List<Encoding>();
byte myByte = (byte)CharacterPosition;
byte[] bytes = { myByte };
foreach (EncodingInfo encInfo in Encoding.GetEncodings())
{
Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
char[] chars = thisEnc.GetChars(bytes);
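// Multi-byte encodings may yield no character for a single byte.
if (chars.Length > 0 && chars[0] == character)
{
// No break here, so that every matching encoding is collected.
matches.Add(thisEnc);
}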
}
return matches.ToArray();
}
// Locates all Encodings that contain the specified character
static Encoding[] FindEncodingTable(char character)
{
List<Encoding> matches = new List<Encoding>();
foreach (EncodingInfo encInfo in Encoding.GetEncodings())
{
Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
char[] chars = { character };
byte[] temp = thisEnc.GetBytes(chars);
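// GetBytes never returns null; unsupported characters become a fallback such as '?',
// so check that the bytes decode back to the same character.
if (thisEnc.GetString(temp) == character.ToString())
matches.Add(thisEnc);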
}
return matches.ToArray();
}
Encoding means transforming existing content into a form that the destination protocol can parse.
An example of encoding can be seen when browsing the internet:
The URL you visit: www.example.com, may have the search facility to run custom searches via the URL address:
www.example.com?search=...
Query values in the URL require URL encoding. If you were to write:
www.example.com?search=cat food cheap
the browser wouldn't understand your request, because it contains an invalid character: ' ' (a space).
To correct this, replace each ' ' with '%20' to form this URL:
www.example.com?search=cat%20food%20cheap
Different systems use different forms of encoding; in this example I have used standard URL percent-encoding, which writes a byte as '%' followed by its hexadecimal value. In other applications you may need other types of encoding.
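In code, the replacement is usually left to a library function rather than done by hand. A minimal Dart sketch, using Uri.encodeComponent from dart:core and the example search text from above:

```dart
void main() {
  const search = 'cat food cheap';

  // Percent-encode the value so it is safe inside a URL.
  final encoded = Uri.encodeComponent(search);
  print('www.example.com?search=$encoded');
  // www.example.com?search=cat%20food%20cheap

  // Decoding reverses the transformation.
  print(Uri.decodeComponent(encoded)); // cat food cheap
}
```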
Good Luck!