How to construct a Unicode glyph key in C#

I am using FontAwesome to display glyphs in my Xamarin Android application. If I hardcode the glyph like this, everything works fine:
string iconKey = "\uf0a3";
var drawable = new IconDrawable(this.Context, iconKey, "Font Awesome 5 Pro-Regular-400.otf").Color(Xamarin.Forms.Color.White.ToAndroid()).SizeDp(fontSize);
However, if what I have is the four-character hex code "f0a3" from FontAwesome's cheatsheet, stored in a string variable, I don't know how to set my iconKey variable to a value that works. Just concatenating a "\u" onto the beginning doesn't work, which makes sense since that's a Unicode escape indicator that the compiler processes, not part of a standard string, but I don't know what to do instead. I also tried converting to and from Unicode in various random ways, e.g.
iconKey = unicode.GetChars(unicode.GetBytes("/u" + myFourChar.ToString())).ToString();
but unsurprisingly that didn't work either.
The IconDrawable is from here. The value I send becomes an input there to the Paint.GetTextBounds method and the Canvas.DrawText method.
Thanks for any assistance!

Found the answer here. Here is the code I am using, based on that post but simplified since I have only one hexadecimal character to handle:
string myString = "f0a3";
var chars = new char[] { (char)Convert.ToInt32(myString, 16) };
string iconKey = new string(chars);
var drawable = new IconDrawable(this.Context, iconKey, "Font Awesome 5 Pro-Regular-400.otf").Color(Xamarin.Forms.Color.White.ToAndroid()).SizeDp(fontSize);
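For comparison, here is the same conversion sketched in Python rather than C# (glyph_from_hex is a hypothetical helper name, not something from the post): parse the cheatsheet code as a base-16 integer, then turn that integer into a character. Note that in C# the (char) cast only covers codes up to FFFF; for larger code points, char.ConvertFromUtf32 is the usual alternative.

```python
def glyph_from_hex(code):
    """Turn a hex code point such as 'f0a3' into the one-character string it names."""
    code_point = int(code, 16)  # parse the cheatsheet code as a base-16 number
    return chr(code_point)      # chr() accepts any code point up to 0x10FFFF


icon_key = glyph_from_hex("f0a3")
print(ascii(icon_key))  # shows the escaped form of the resulting character
```

The resulting one-character string is the same value that the hardcoded "\uf0a3" literal produces.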

Related

Java iText - special characters not being displayed

I've been trying for quite some time now to generate a PDF in Java using itextpdf (com.itextpdf kernel,layout,form,pdfa) with text containing special characters (äöüß). I tried several things in different variations, like loading a TTF file and setting the encoding:
FontProgram fontProgram = FontProgramFactory.createFont( "font/FreeSans.ttf") ;
PdfFont font = PdfFontFactory.createFont( fontProgram, "UTF-8" ) ;
document.setFont( font );
This way it just doesn't display special characters at all.
This doesn't work either:
var font = PdfFontFactory.createFont(StandardFonts.HELVETICA, PdfEncodings.UTF8);
document.setFont( font );
I haven't found any solution to this and the official tutorials don't seem to have a solution.
Other encodings just render placeholder characters.
This is how I add the text:
PdfWriter writer = new PdfWriter(filename);
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
Paragraph p = new Paragraph("äüöß");
document.add(p);
document.close();
edit: I just realized that it works when I load the text from elsewhere like an input field, instead of passing a normal string. How can I make this work with hardcoded strings?
I tried re-encoding the string as described here: https://www.baeldung.com/java-string-encode-utf-8
but none of these methods work either. It always shows wrong characters.
PdfFont freeUnicode = PdfFontFactory.createFont("font/FreeSans.ttf", PdfEncodings.IDENTITY_H);
String rawString = "äöüß1234'";
byte[] bytes = StringUtils.getBytesUtf8(rawString);
String utf8EncodedString = StringUtils.newStringUtf8(bytes);
document.add(new Paragraph().setFont(freeUnicode)
.add(utf8EncodedString));
edit: The encoding in the source code editor is UTF-8 and I passed UTF-8 to the createFont() method, but that didn't work. When I pass CP1252 and change the source code encoding to ISO-8859-1, it shows the correct characters. Really strange how I couldn't find much information about this problem.
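The symptoms in the last edit are consistent with the source file being saved in one encoding and read by the build in another, although the post does not confirm that this is the cause. A minimal Python sketch of that kind of mismatch (an illustration only, not iText code; the literal is written with \u escapes so the script does not depend on how it is saved):

```python
# The four characters from the question: a-umlaut, o-umlaut, u-umlaut, sharp s.
text = "\u00e4\u00f6\u00fc\u00df"

# An editor set to UTF-8 stores each of these characters as two bytes.
saved_bytes = text.encode("utf-8")

# A tool that assumes Cp1252 reads every byte as a separate character.
misread = saved_bytes.decode("cp1252")
print(len(text), len(misread))  # 4 8

# Reading the same bytes with the matching encoding restores the text.
restored = saved_bytes.decode("utf-8")
```

If this is what is happening, writing the literal with Unicode escapes ("\u00e4\u00f6\u00fc\u00df" is also valid in a Java string) sidesteps the question of how the source file is encoded.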

String transformation for subject course code for Dart/Flutter

For interaction with an API, I need to pass the course code in <string><space><number> format. For example, MCTE 2333, CCUB 3621, BTE 1021.
Yes, the text part can be 3 or 4 letters.
Most users enter the code without the space, e.g. MCTE2333, but that causes an error from the API. How can I add a space between the letters and the numbers so that it follows the correct format?
You can achieve the desired behaviour by using regular expressions:
void main() {
String a = "MCTE2333";
String aStr = a.replaceAll(RegExp(r'[^0-9]'), ''); //extract the number
String bStr = a.replaceAll(RegExp(r'[^A-Za-z]'), ''); //extract the character
print("$bStr $aStr"); //MCTE 2333
}
Note: This will produce the same result, regardless of how many whitespaces your user enters between the characters and numbers.
Try this: provide two text fields, one for the letters (e.g. MCTE) and one for the numbers (e.g. 1021). For the second text field, set the keyboard type to number only.
After that you can join the two strings with a space between them and send the result to your DB.
It's a bit of a hack, but it will work.
Scrolling down the course codes list, I noticed some unusual formatting.
Example: TQB 1001E, TQB 1001E etc. (With extra letter at the end)
So, this special format doesn't work with @Jahidul Islam's answer. However, inspired by his answer, I managed to come up with this logic:
var course = "TQB2001M";
var i = course.indexOf(RegExp(r'[^A-Za-z]')); // index of the first non-letter
var j = course.substring(0, i); // extract the letter prefix
var k = course.substring(i).trim(); // extract the rest
var formatted = '$j $k'.toUpperCase(); // combine & capitalize
print(formatted); // TQB 2001M
Works with other formats too. Check out the DartPad here.
Here is the entire logic you need (also works for multiple whitespaces!):
void main() {
String courseCode= "MMM 111";
String parsedCourseCode = "";
if (courseCode.contains(" ")) {
final ensureSingleWhitespace = RegExp(r"(?! )\s+| \s+");
parsedCourseCode = courseCode.split(ensureSingleWhitespace).join(" ");
} else {
final r1 = RegExp(r'[0-9]', caseSensitive: false);
final r2 = RegExp(r'[a-z]', caseSensitive: false);
final letters = courseCode.split(r1);
final numbers = courseCode.split(r2);
parsedCourseCode = "${letters[0].trim()} ${numbers.last}";
}
print(parsedCourseCode);
}
Play around with the input value (courseCode) to test it - also use dart pad if you want. You just have to add this logic to your input value, before submitting / handling the input form of your user :)
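The answers above all split the input into a letter prefix and a remainder that starts with a digit. As a compact illustration of that idea, here is a sketch in Python rather than Dart; format_course_code is a hypothetical helper, and the assumption that the second part always starts with a digit comes from the examples in this thread, not from the API's documentation.

```python
import re

# Letters, optional whitespace, then a part that starts with a digit
# (a trailing letter, as in "TQB 1001E", stays with the number).
_COURSE_CODE = re.compile(r"\s*([A-Za-z]+)\s*([0-9][0-9A-Za-z]*)\s*")


def format_course_code(raw):
    """Return the code as '<LETTERS> <REST>', or raise ValueError if it does not fit."""
    match = _COURSE_CODE.fullmatch(raw)
    if match is None:
        raise ValueError("not a course code: %r" % raw)
    letters, rest = match.groups()
    return ("%s %s" % (letters, rest)).upper()


for raw in ["MCTE2333", "bte   1021", "TQB2001M"]:
    print(format_course_code(raw))
```

Rejecting input that does not match, instead of guessing, lets the form show a validation message before anything is sent to the API.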

Flutter Unicode Apostrophe In String

I'm hoping this is an easy question, and that I'm just not seeing the forest due to all the trees.
I have a string in Flutter that came from a REST API and looks like this:
"What\u0027s this?"
The \u is causing a problem.
I can't do a string.replaceAll("\", "\") on it as the single slash means it's looking for a character after it, which is not what I need.
I tried doing a string.replaceAll(String.fromCharCode(0x92), "") to remove it - That didn't work.
I then tried using a regex to remove it like string.replaceAll("/(?:\)/", "") and the same single slash remains.
So, the question is how to remove that single slash, so I can add in a double slash, or replace it with a double slash?
Cheers
Jase
I found the issue. I was looking for hex 92 (0x92) and it should have been decimal 92.
I ended up solving the issue like this...
String removeUnicodeApostrophes(String strInput) {
// First remove the single slash.
String strModified = strInput.replaceAll(String.fromCharCode(92), "");
// Now, we can replace the rest of the unicode with a proper apostrophe.
return strModified.replaceAll("u0027", "\'");
}
When the string is read, I assume what's happening is that it's being interpreted as literal characters rather than as what it should be (a code point), i.e. each character of \u0027 is a separate character. You may actually be able to fix this depending on how you access the API - see the dart convert library. If you use utf8.decode on the raw data you may be able to avoid this entire problem.
However, if that's not an option there's an easy enough solution for you.
What's happening when you're writing out your regex or replace is that you're not escaping the backslash, so it's essentially becoming nothing. If you use a double backslash, that solves the problem, as it escapes the escape character. "\\" => "\".
The other option is to use a raw string like r"\" which ignores the escape character.
Paste this into https://dartpad.dartlang.org:
String withapostraphe = "What\u0027s this?";
String withapostraphe1 = withapostraphe.replaceAll('\u0027', '');
String withapostraphe2 = withapostraphe.replaceAll(String.fromCharCode(0x27), '');
print("Original encoded properly: $withapostraphe");
print("Replaced with nothing: $withapostraphe1");
print("Using char code for ': $withapostraphe2");
String unicodeNotDecoded = "What\\u0027s this?";
String unicodeWithApostraphe = unicodeNotDecoded.replaceAll('\\u0027', '\'');
String unicodeNoApostraphe = unicodeNotDecoded.replaceAll('\\u0027', '');
String unicodeRaw = unicodeNotDecoded.replaceAll(r"\u0027", "'");
print("Data as read with escaped unicode: $unicodeNotDecoded");
print("Data replaced with apostraphe: $unicodeWithApostraphe");
print("Data replaced with nothing: $unicodeNoApostraphe");
print("Data replaced using raw string: $unicodeRaw");
To see the result:
Original encoded properly: What's this?
Replaced with nothing: Whats this?
Using char code for ': Whats this?
Data as read with escaped unicode: What\u0027s this?
Data replaced with apostraphe: What's this?
Data replaced with nothing: Whats this?
Data replaced using raw string: What's this?
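The same distinction can be shown in Python (a sketch for comparison, not Dart code): an escaped backslash and a raw string spell the same six-character pattern, and if the text is really a JSON string, a JSON parser decodes the escape without any manual replacing. Whether the API in the question returns JSON is an assumption here.

```python
import json

# As received: backslash, 'u', '0', '0', '2', '7' are six separate characters.
undecoded = "What\\u0027s this?"

# A raw string and an escaped backslash describe the same pattern.
same_pattern = r"\u0027" == "\\u0027"
replaced = undecoded.replace(r"\u0027", "'")

# Treating the text as the body of a JSON string lets the parser decode it.
decoded = json.loads('"' + undecoded + '"')

print(replaced)  # What's this?
print(decoded)   # What's this?
```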

How to encode Chinese text in QR barcodes generated with iTextSharp?

I'm trying to draw QR barcodes in a PDF file using iTextSharp. If I'm using English text the barcodes are fine, they are decoded properly, but if I'm using Chinese text, the barcode is decoded as question marks. For example this character '测' (\u6D4B) is decoded as '?'. I tried all supported character sets, but none of them helped.
What combination of parameters should I use for the QR barcode in iTextSharp in order to encode Chinese text correctly?
iText and iTextSharp apparently don't natively support this but you can write some code to handle this on your own. The trick is to get the QR code parser to work with just an arbitrary byte array instead of a string. What's really nice is that the iTextSharp code is almost ready for this but doesn't expose the functionality. Unfortunately many of the required classes are sealed so you can't just subclass them, you'll have to recreate them. You can either download the entire source and add these changes or just create separate classes with the same names. (Please check over the license to make sure you are allowed to do this.) My changes below don't include any error checking, so make sure you add that, too.
The first class that you'll need to recreate is iTextSharp.text.pdf.qrcode.BlockPair and the only change you'll need to make is to make the constructor public instead of internal. (You only need to do this if you are creating your own code and not modifying the existing code.)
The second class is iTextSharp.text.pdf.qrcode.Encoder. This is where we'll make the most changes. Add an overload to Append8BitBytes that looks like this:
static void Append8BitBytes(byte[] bytes, BitVector bits) {
for (int i = 0; i < bytes.Length; ++i) {
bits.AppendBits(bytes[i], 8);
}
}
The string version of this method converts text to a byte array and then uses the above, so we're just cutting out the middleman. Next, add a new overload of the Encode method that takes in a byte array instead of a string. We'll then just cut out the string detection part and force the system into byte mode; otherwise the code below is pretty much the same as the original.
public static void Encode(byte[] bytes, ErrorCorrectionLevel ecLevel, IDictionary<EncodeHintType, Object> hints, QRCode qrCode) {
String encoding = DEFAULT_BYTE_MODE_ENCODING;
// Step 1: Choose the mode (encoding).
Mode mode = Mode.BYTE;
// Step 2: Append "bytes" into "dataBits" in appropriate encoding.
BitVector dataBits = new BitVector();
Append8BitBytes(bytes, dataBits);
// Step 3: Initialize QR code that can contain "dataBits".
int numInputBytes = dataBits.SizeInBytes();
InitQRCode(numInputBytes, ecLevel, mode, qrCode);
// Step 4: Build another bit vector that contains header and data.
BitVector headerAndDataBits = new BitVector();
// Step 4.5: Append ECI message if applicable
if (mode == Mode.BYTE && !DEFAULT_BYTE_MODE_ENCODING.Equals(encoding)) {
CharacterSetECI eci = CharacterSetECI.GetCharacterSetECIByName(encoding);
if (eci != null) {
AppendECI(eci, headerAndDataBits);
}
}
AppendModeInfo(mode, headerAndDataBits);
int numLetters = dataBits.SizeInBytes();
AppendLengthInfo(numLetters, qrCode.GetVersion(), mode, headerAndDataBits);
headerAndDataBits.AppendBitVector(dataBits);
// Step 5: Terminate the bits properly.
TerminateBits(qrCode.GetNumDataBytes(), headerAndDataBits);
// Step 6: Interleave data bits with error correction code.
BitVector finalBits = new BitVector();
InterleaveWithECBytes(headerAndDataBits, qrCode.GetNumTotalBytes(), qrCode.GetNumDataBytes(),
qrCode.GetNumRSBlocks(), finalBits);
// Step 7: Choose the mask pattern and set to "qrCode".
ByteMatrix matrix = new ByteMatrix(qrCode.GetMatrixWidth(), qrCode.GetMatrixWidth());
qrCode.SetMaskPattern(ChooseMaskPattern(finalBits, qrCode.GetECLevel(), qrCode.GetVersion(),
matrix));
// Step 8. Build the matrix and set it to "qrCode".
MatrixUtil.BuildMatrix(finalBits, qrCode.GetECLevel(), qrCode.GetVersion(),
qrCode.GetMaskPattern(), matrix);
qrCode.SetMatrix(matrix);
// Step 9. Make sure we have a valid QR Code.
if (!qrCode.IsValid()) {
throw new WriterException("Invalid QR code: " + qrCode.ToString());
}
}
The third class is iTextSharp.text.pdf.qrcode.QRCodeWriter, and once again we just need to add an overloaded Encode method that supports a byte array and calls the new Encoder.Encode overload created above:
public ByteMatrix Encode(byte[] bytes, int width, int height, IDictionary<EncodeHintType, Object> hints) {
ErrorCorrectionLevel errorCorrectionLevel = ErrorCorrectionLevel.L;
if (hints != null && hints.ContainsKey(EncodeHintType.ERROR_CORRECTION))
errorCorrectionLevel = (ErrorCorrectionLevel)hints[EncodeHintType.ERROR_CORRECTION];
QRCode code = new QRCode();
Encoder.Encode(bytes, errorCorrectionLevel, hints, code);
return RenderResult(code, width, height);
}
The last class is iTextSharp.text.pdf.BarcodeQRCode, to which we once again add a new constructor overload:
public BarcodeQRCode(byte[] bytes, int width, int height, IDictionary<EncodeHintType, Object> hints) {
newCode.QRCodeWriter qc = new newCode.QRCodeWriter();
bm = qc.Encode(bytes, width, height, hints);
}
The last trick is to make sure when calling this that you include the byte order mark (BOM) so that decoders know to decode this properly, in this case UTF-8.
//Create an encoder that supports outputting a BOM
System.Text.Encoding enc = new System.Text.UTF8Encoding(true, true);
//Get the BOM
byte[] bom = enc.GetPreamble();
//Get the raw bytes for the string
byte[] bytes = enc.GetBytes("测");
//Combine the byte arrays
byte[] final = new byte[bom.Length + bytes.Length];
System.Buffer.BlockCopy(bom, 0, final, 0, bom.Length);
System.Buffer.BlockCopy(bytes, 0, final, bom.Length, bytes.Length);
//Create our barcode using our new constructor
var q = new BarcodeQRCode(final, 100, 100, null);
//Add it to the document
doc.Add(q.GetImage());
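To make the byte layout concrete, here is a small Python sketch (for illustration only, not part of the iTextSharp changes) that builds the same kind of payload: the UTF-8 byte order mark followed by the UTF-8 bytes of the text.

```python
import codecs

text = "\u6d4b"  # the character from the question

# BOM followed by the UTF-8 bytes of the text, mirroring the C# snippet above.
payload = codecs.BOM_UTF8 + text.encode("utf-8")

# Python's "utf-8-sig" codec produces the same bytes in one step.
same_as_sig = payload == text.encode("utf-8-sig")

print(payload.hex())  # efbbbfe6b58b
```

The first three bytes (ef bb bf) are the BOM; the remaining three are the character itself.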
Looks like you may be out of luck. I tried too and got the same results as you did. Then looked at the Java API:
"CHARACTER_SET the values are strings and can be Cp437, Shift_JIS and ISO-8859-1 to ISO-8859-16. The default value is ISO-8859-1."
Lastly, I looked at the iTextSharp BarcodeQRCode class source code to confirm that only those character sets are supported. I'm by no means an authority on Unicode or encoding, but according to ISO/IEC 8859, the character sets above won't work for Chinese.
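The question marks are what one would expect when a character has no slot in the target character set and the encoder substitutes a placeholder. A short Python sketch of that behaviour follows; that iTextSharp performs the same kind of substitution internally is an assumption on my part.

```python
text = "\u6d4b"  # the character from the question

# ISO-8859-1 has no slot for this character; a lenient encoder substitutes '?'.
lenient = text.encode("iso-8859-1", errors="replace")
print(lenient)  # b'?'

# A strict encoder refuses instead of substituting.
try:
    text.encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```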
Essentially the same trick that Chris has done in his answer could be implemented by specifying UTF-8 charset in barcode hints.
var hints = new Dictionary<EncodeHintType, Object>() {{EncodeHintType.CHARACTER_SET, "UTF-8"}};
var q = new BarcodeQRCode("\u6D4B", 100, 100, hints);
If you want to be safer, you can start your string with the BOM character '\uFEFF', as Chris suggested, so it would be "\uFEFF\u6D4B".
UTF-8 is unfortunately not supported by the QR code specification, and there is a lot of discussion on this subject, but the fact is that most QR code readers will correctly read a code created by this method.

What is the character encoding?

I have several characters that aren't recognized properly.
Characters like:
º
á
ó
(etc..)
This means that the character encoding is not UTF-8, right?
So, can you tell me what character encoding it could be, please?
We don't have nearly enough information to really answer this, but the gist of it is: you shouldn't just guess. You need to work out where the data is coming from, and find out what the encoding is. You haven't told us anything about the data source, so we're completely in the dark. You might want to try Encoding.Default if these are files saved with something like Notepad.
If you know what the characters are meant to be and how they're represented in binary, that should suggest an encoding... but again, we'd need to know more information.
Read this first: http://www.joelonsoftware.com/articles/Unicode.html
There are two encodings involved: the one that was used to encode the string and the one that is used to decode it. They must be the same to get the expected result. If they are different, some characters will be displayed incorrectly. We can try to guess if you post the actual and expected results.
I wrote a couple of methods to narrow down the possibilities a while back for situations just like this.
static void Main(string[] args)
{
Encoding[] matches = FindEncodingTable('Ÿ');
Encoding[] enc2 = FindEncodingTable(159, 'Ÿ');
}
// Locates all encodings in which the given byte value decodes to the specified character
// "CharacterPosition": decimal position of the character in the unknown encoding table, e.g. 159 in the extended ASCII table
// "character": the character expected at that position, e.g. 'Ÿ'
static Encoding[] FindEncodingTable(int CharacterPosition, char character)
{
    List<Encoding> matches = new List<Encoding>();
    byte myByte = (byte)CharacterPosition;
    byte[] bytes = { myByte };
    foreach (EncodingInfo encInfo in Encoding.GetEncodings())
    {
        Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
        char[] chars = thisEnc.GetChars(bytes);
        // No break here: keep scanning so that every matching encoding is collected.
        if (chars.Length > 0 && chars[0] == character)
        {
            matches.Add(thisEnc);
        }
    }
    return matches.ToArray();
}
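// Locates all encodings that can represent the specified character
static Encoding[] FindEncodingTable(char character)
{
    List<Encoding> matches = new List<Encoding>();
    foreach (EncodingInfo encInfo in Encoding.GetEncodings())
    {
        Encoding thisEnc = Encoding.GetEncoding(encInfo.CodePage);
        char[] chars = { character };
        byte[] temp = thisEnc.GetBytes(chars);
        // GetBytes never returns null; with the default fallback, unmappable characters
        // are replaced (typically with '?'), so decode the bytes again and keep the
        // encoding only if the character survives the round trip.
        char[] roundTrip = thisEnc.GetChars(temp);
        if (roundTrip.Length == 1 && roundTrip[0] == character)
            matches.Add(thisEnc);
    }
    return matches.ToArray();
}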
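The idea behind the methods above (try each encoding and keep the ones where a known byte decodes to the character you expect) can also be sketched in a few lines of Python. This is a comparison sketch, not a port of the C# code; find_codecs and the CANDIDATES list are hypothetical names, and the list is deliberately short.

```python
def find_codecs(byte_value, expected, candidates):
    """Return the candidate codecs in which byte_value decodes to expected."""
    matches = []
    for name in candidates:
        try:
            decoded = bytes([byte_value]).decode(name)
        except UnicodeDecodeError:
            continue  # this byte is not valid on its own in that codec
        if decoded == expected:
            matches.append(name)
    return matches


# Hypothetical short list; extend it with whatever encodings are plausible for your data.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "cp437", "mac_roman"]

# Byte 159 is where the post above expects the character U+0178 (Y with diaeresis).
result = find_codecs(159, "\u0178", CANDIDATES)
print(result)
```

As the other answers say, this only narrows the options down; knowing where the data came from is still the reliable way to identify the encoding.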
Encoding is a way of transforming existing content so that it can be parsed by the destination protocol.
An example of encoding can be seen when browsing the internet:
The URL you visit: www.example.com, may have the search facility to run custom searches via the URL address:
www.example.com?search=...
The variables in such a URL require URL encoding. If you were to write:
www.example.com?search=cat food cheap
The browser wouldn't understand your request, as you have used an invalid character: ' ' (a space).
To correct this encoding error you should exchange the ' ' with '%20' to form this URL:
www.example.com?search=cat%20food%20cheap
Different systems use different forms of encoding; in this example I have used the standard hex (percent) encoding for a URL. In other applications you may need to use other types of encoding.
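As a small illustration of the URL example, Python's standard library can do the percent-encoding (a sketch; the example.com URL is the placeholder from above):

```python
from urllib.parse import quote, quote_plus

query = "cat food cheap"

encoded = quote(query)         # spaces become %20
form_style = quote_plus(query)  # form-style encoding uses '+' for spaces instead

url = "www.example.com?search=" + encoded
print(url)  # www.example.com?search=cat%20food%20cheap
```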
Good Luck!