ZXing: How to embed a carriage return in a Code 128 barcode

I found in the thread below how to embed extended characters in a Code 128 barcode using the ZXing library:
Encode extended ASCII characters in a Code 128 barcode
But how can I embed the carriage return character (hex 0x0D) in a Code 128 barcode?

Related

how to calculate URL encoding for characters outside the ASCII character set?

I know that for ASCII characters the URL encoding is just a percent sign followed by the hex number that corresponds to the character. But for characters outside that range, the encoding consists of two or more %hex-number sequences.
For example, for the character that corresponds to hex value 56CE, the URL encoding according to the standard .NET/Java APIs is not %56CE but "%e5%9b%8e".
So if we know the hex value for a character outside the ASCII range, how is the URL encoding calculated? In other words, how do e5, 9b, 8e come out of 56CE? I tried converting to binary and did see a pattern for the last two bytes (%9b, %8e), but I have no idea where the %e5 comes from.
You have to encode the Unicode codepoints into charset bytes first, and then you can URL-encode those bytes. In your example, E5 9B 8E are the UTF-8 encoded bytes of Unicode codepoint U+56CE, and %E5%9B%8E is the URL-encoded form of those UTF-8 bytes.
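To make the two steps concrete, here is a minimal Python sketch (Python is used only for illustration; the question was about .NET/Java). It builds the three UTF-8 bytes of U+56CE by hand using the standard three-byte layout 1110xxxx 10xxxxxx 10xxxxxx, then percent-encodes them. The helper name percent_encode is made up for this example.

```python
from urllib.parse import quote

def percent_encode(text: str) -> str:
    """Encode text as UTF-8, then write every byte as %XX.

    Hypothetical helper for illustration: unlike real URL encoders it
    also escapes ASCII letters and digits.
    """
    return "".join(f"%{b:02X}" for b in text.encode("utf-8"))

ch = "\u56ce"
cp = ord(ch)  # 0x56CE = 0101 0110 1100 1110 in binary

# Three-byte UTF-8 layout for U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
manual = bytes([
    0xE0 | (cp >> 12),          # top 4 bits    -> 0xE5 (this is the "%e5")
    0x80 | ((cp >> 6) & 0x3F),  # middle 6 bits -> 0x9B
    0x80 | (cp & 0x3F),         # low 6 bits    -> 0x8E
])

utf8_bytes = ch.encode("utf-8")
print(manual.hex(), utf8_bytes.hex())  # e59b8e e59b8e
print(quote(ch))                       # %E5%9B%8E
```

The leading 1110 marker bits combined with the top four bits of the codepoint (0101) give 11100101, which is where E5 comes from.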

Display Unicode character in MFC Static Text

I am trying to get an MFC Static Text control to display an ASCII Unicode character, specifically Omega (&#937). When I use just that, the & doesn't display and the rest of the text does. But if I set the 'No Prefix' property of the control to True, then it removes the & and everything after it.
Is this possible to do through a project setting, or am I just inputting the string wrong?
Here is what I am using for a string: VDC Resistance (k&#937) → where I want &#937 to be the omega symbol.
First of all, &#937 isn't an ASCII character, it is a Unicode character: GREEK CAPITAL LETTER OMEGA.
&#937 is the HTML escape sequence for an omega, and a static text control doesn't translate HTML escape sequences. If you are entering the text in C/C++ source, use the C escape sequence L"\u03A9" (3A9 in hex equals 937 decimal). This assumes that you are building a Unicode application; in an ANSI build it won't work, and I'm not sure how you would do it in that case.
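As a quick check of the arithmetic above, this small Python sketch (Python is used only for illustration here, not MFC) confirms that decimal 937 and hex 3A9 name the same codepoint, and that an HTML parser, unlike a static text control, turns the numeric reference into the character.

```python
import html
import unicodedata

omega = chr(937)  # codepoint given in decimal

print(hex(937))                  # 0x3a9
print(unicodedata.name(omega))   # GREEK CAPITAL LETTER OMEGA
print(omega == "\u03A9")         # True: same character as the \u03A9 escape
print(html.unescape("&#937;") == omega)  # True: HTML decoding is a separate step
```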

How to convert from unicode to ASCII

Is there any way to convert unicode values to ASCII?
To simply strip the accents from unicode characters you can use something like:
string.Concat(input.Normalize(NormalizationForm.FormD).Where(
c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
You CAN'T convert from Unicode to ASCII. Almost every character in Unicode cannot be expressed in ASCII, and those that can be expressed have exactly the same codepoints in ASCII as in UTF-8, which is probably what you have. Almost the only thing you can do that is even close to the right thing is to discard all characters above codepoint 127, and even that is very likely nowhere near what your requirements say. (The other possibility is to simplify accented or umlauted letters to make more than 128 characters 'nearly' expressible, but that still doesn't even begin to actually cover Unicode.)
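The two lossy options mentioned above can be sketched in a few lines of Python (used only for illustration; the thread itself is about C#). The function names are made up for this example, and both approaches throw information away.

```python
import unicodedata

def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

def drop_non_ascii(text: str) -> str:
    """Discard every character that has no ASCII encoding."""
    return text.encode("ascii", errors="ignore").decode("ascii")

print(strip_accents("e\u00e9\u00ea\u00eb\u00e8 \u00e7\u00e5"))  # eeeee ca
print(drop_non_ascii("e\u00e9\u00ea\u00eb\u00e8 test"))         # e test
```

Note that strip_accents only helps for characters that decompose into an ASCII base letter plus combining marks; Greek or CJK text passes through unchanged and would still need the second step or a transliteration library.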
Technically, yes you can by using Encoding.ASCII.
Example (from byte[] to ASCII):
// Convert the string to UTF-16 ("Unicode") bytes
byte[] uni = Encoding.Unicode.GetBytes("Whatever unicode string you have");
// Re-encode the UTF-16 bytes as ASCII; characters outside ASCII become '?'
byte[] asciiBytes = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, uni);
// Decode the ASCII bytes back into a string
string ascii = Encoding.ASCII.GetString(asciiBytes);
Just remember that Unicode is a much larger standard than ASCII, and there will be characters that simply cannot be correctly encoded. Have a look here for tables and a little more information on the two encodings.
This workaround might better suit your needs. It strips the non-ASCII characters from a string and keeps only the ASCII ones.
// Encoding to ASCII replaces every non-ASCII character with '?'
byte[] bytes = Encoding.ASCII.GetBytes("eéêëèiïaâäàåcç  test");
char[] chars = Encoding.ASCII.GetChars(bytes);
string line = new String(chars);
// Remove the '?' placeholders (this also removes genuine question marks)
line = line.Replace("?", "");
// Results in "eiac test"
Please note that the second "space" in the input string is not a real space but the character with code 255, which lies outside the ASCII range.
It depends what you mean by "convert".
You can transliterate using the AnyAscii package.
// C#
using AnyAscii;
string s = "άνθρωποι".Transliterate();
// anthropoi
Well, seeing as how there's some 100,000+ unicode characters and only 128 ASCII characters, a 1-1 mapping is obviously impossible.
You can use the Encoding.ASCII object to get the ASCII byte values from a Unicode string, though.
If your metadata fields only accept ASCII input, Unicode characters can be converted to their TeX equivalents and rendered through MathJax. What is MathJax?
MathJax is a JavaScript display engine for rendering TEX or MathML-coded mathematics in browsers without requiring font installation or browser plug-ins. Any modern browser with JavaScript enabled will be MathJax-ready. For general information about MathJax, visit mathjax.org.

What is this encoding that Firefox and Chrome use?

I was looking at the source code of a particular page of my project and noticed that Firefox transforms special characters such as "á" to &#225;.
Which encoding is that?
Thanks!!
I suspect it is the other way round; Firefox and Chrome take &#225; in the HTML source code and render it as the character á ("Latin small a with acute").
The reason for allowing these in HTML is that the HTML might be supplied in an encoding which doesn't support the character. Any Unicode character is allowed, but it may not get rendered correctly if your browser doesn't have that character in any of its fonts.
As it says in the W3C HTML spec, there are two ways of encoding arbitrary Unicode characters:
&#D;: where D is the decimal value of the Unicode character (e.g. &#225;)
&#xH;: where H is the (case-insensitive) hexadecimal value of the Unicode character (e.g. &#xE1; in your case)
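Both forms can be tried out with Python's standard library (a small illustrative sketch, not something from the thread): html.unescape decodes numeric character references, and the xmlcharrefreplace error handler produces the decimal form for characters the target encoding cannot hold.

```python
import html

# Decimal and hexadecimal references decode to the same character.
print(html.unescape("&#225;"))   # á
print(html.unescape("&#xE1;"))   # á
print(html.unescape("&#xe1;"))   # á (hex digits are case-insensitive)

# Going the other way: emit a reference for anything ASCII cannot represent.
encoded = "caf\u00e9 \u00e1".encode("ascii", errors="xmlcharrefreplace")
print(encoded)                   # b'caf&#233; &#225;'
```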
These are numeric character references as defined in the HTML 4.01 Specification.
HTML ASCII Character Encoding. Here's a table of many of them:
HTML Codes

What is the difference between EM Dash &#151; and &#8212;?

I have an ASCII file that contains an EM Dash (&#151; or &#8212; in HTML). The hex value is 0x97. When we pass this file through one application it arrives as UTF-8, and it converts the character to 0xC297, which is &#151; in HTML. However, when we pass this file through a different application it converts the character to 0xE28094 or &#8212;.
What would cause these applications to convert these characters differently? Is it perhaps a code page setting?
&#151; is wrong. When you use numeric character references, the number refers to the Unicode codepoint. For numbers below 256 that is the same as the codepoint in ISO-8859-1. In 8859-1, character 151 is amongst the “C1 control codes”, and not a dash or any other visible character.
The confusion arises because character 151 is a dash in Windows code page 1252 (Western European). Many people think cp1252 is the same thing as ISO-8859-1, but in reality it's not: the characters in the C1 range (128 to 159) are different.
The first application is reading your “ASCII” file* as ISO-8859-1, but actually it's probably cp1252 and you'll need a way to clue the app in about what encoding it has to expect.
(*: “ASCII” is a misnomer if there are top-bit-set characters in the file. You probably mean “ANSI”, which is really also a misnomer, but one which has stuck in the Windows world to mean “text encoded in the current system-default code page”.)
&#151; is not an em dash; your text was mis-translated from em dash to that value.
&#8212; is the HTML decimal entity for em dash. Specifically, it references the Unicode code point 8212, which represents an em dash.
Your file is not ASCII if it contains an em dash. ASCII chars only encode to decimal range 0 - 127, and em dash is not a character that can be represented by ASCII encoding. If you have em dash stored as 0x97 (151 in decimal) you probably have an ANSI text file (aka Windows Codepage 1252 (w-1252)).
Your first app...
The data started as an em dash encoded in w-1252. In w-1252 the em dash maps to the decimal value 151 (0x97 in hex, or 10010111 in binary).
At some point the em dash was handled by code that thought the bytes in your file were iso-8859-1 encoded text. When that code interpreted 0x97 as a string/char it mapped 0x97 to a character according to the iso-8859-1 encoding. In iso-8859-1 0x97 maps to the char "End of guarded area".
Next, the string, which the code thinks is the "End of guarded area" control char, was encoded as utf-8. "End of guarded area" encoded in utf-8 is the two-byte sequence: 0xC2 0x97.
Your second app...
The text file was correctly interpreted as w-1252, thus the 0x97 is recognized as em dash, which was correctly encoded as the em dash in utf-8: 0xE2 0x80 0x94.
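The two conversions described above can be reproduced with a short Python sketch (illustrative only; the actual applications and their languages are unknown). It decodes the same byte 0x97 once as ISO-8859-1 and once as windows-1252, then re-encodes each result as UTF-8.

```python
raw = b"\x97"  # the single byte stored in the original file

# First app: treats the byte as ISO-8859-1, so it becomes the C1 control
# character U+0097, which UTF-8 encodes as two bytes.
first_app = raw.decode("latin-1").encode("utf-8")

# Second app: treats the byte as windows-1252, so it becomes U+2014
# (EM DASH), which UTF-8 encodes as three bytes.
second_app = raw.decode("cp1252").encode("utf-8")

print(first_app.hex())   # c297
print(second_app.hex())  # e28094

# Undoing the first app's mistake: back to the original byte, then decode
# with the encoding the file was really written in.
repaired = first_app.decode("utf-8").encode("latin-1").decode("cp1252")
print(repaired == "\u2014")  # True
```

The repair step only works because ISO-8859-1 maps every byte to a codepoint one-to-one, so the original byte survives the wrong decoding.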
What influences this behavior
Not sure if you're dealing with web apps or what, but the concept should be the same whatever it is. We had the same 0x97->0xC297 scenario in a web app where people input data into a form. I found that the charset of the web page was declared as iso-8859-1, and the browser's best way to handle the w-1252 chars was to just send them along as the iso bytes without alerting the user or the server. The server receives the data, thinks it's iso-8859-1, and converts it to utf-8, resulting in 0xC297.
Basically any time an app touches text it needs to be told how the text is encoded, or else it might fall back to a system default. If that happens you risk data corruption.
According to the HTML4 specification's character entity reference, the em dash is &#8212; (U+2014).
An ASCII file can not contain the character 0x97, as the ASCII character set only ranges from 0x00 to 0x7F. Therefore your file is not ASCII, but some other single byte encoding. The windows-1250 encoding for example has the em-dash at 0x97.
If the applications decode the text file using some other encoding than the one that was used to create the file, any character above 0x7F will be wrong.
In Unicode the em-dash has the character code 0x2014, or 8212 in decimal.
Unicode Character 'EM DASH' (U+2014)
In a web page that, for example, uses windows-1250 as its encoding, the code &#8212; will render as an em-dash:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>em-dash</title>
<meta http-equiv="content-type" content="text/html; charset=windows-1250"/>
</head>
<body>
<div>&#8212;</div>
</body>
</html>