Is it just me, or are characters being rendered incorrectly more often lately? - unicode

I'm not sure whether it's my system (I haven't done anything unusual with it), but I've started noticing incorrectly rendered characters popping up in web pages and text files, like this:
http://www.kbssource.com/strange-characters.gif
I have a hunch it's related to the fairly recent trend of using Unicode for everything (a good thing, I think), combined with fonts that don't support every possible character.
So, does anyone know what's causing these blips (am I right?), and how do I stop them from showing up in my own content?

It appears that this particular author's text was edited in an editor that assumed it wasn't UTF-8, and was then written back out as UTF-8. I'm basing this on the fact that when I tell my browser to interpret the page in the various common encodings, none of them make it display correctly. That tells me an improper conversion happened at some point.
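A minimal Python sketch of that kind of botched round trip, assuming the editor misread the UTF-8 bytes as Windows-1252 (cp1252 is an assumption here; other single-byte encodings leave similar debris). It shows how "é" and a curly apostrophe turn into the familiar "Ã©" and "â€™", and how one such round trip can be reversed if you know which wrong encoding was used. The helper names are hypothetical.

```python
def mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Simulate an editor that reads UTF-8 bytes as `wrong_encoding`."""
    return text.encode("utf-8").decode(wrong_encoding)

def unmojibake(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo exactly one bad decode by re-encoding and decoding as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")

original = "café it’s"
garbled = mojibake(original)
print(garbled)              # cafÃ© itâ€™s
print(unmojibake(garbled))  # café it’s
```

Saving `garbled` back out as UTF-8 is what produces a doubly encoded file ("é" ends up stored as the four bytes C3 83 C2 A9), which is why no single browser encoding setting displays it correctly.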
The only problem with UTF-8 is that there isn't a standardized way to recognize that a file is UTF-8, and until all editors standardize on UTF-8, there will still be conversion errors. For other Unicode encodings (UTF-16, UTF-32), a Byte Order Mark (BOM) is fairly standard and helps identify the file, but BOMs in UTF-8 files are pretty rare.
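Because UTF-8 files rarely carry a BOM, detection in practice means checking for a BOM first and otherwise falling back to a heuristic. Below is a minimal sketch of one such heuristic, assuming that "decodes as strict UTF-8" is a good enough signal. It often is for real text, but it remains a guess. `guess_encoding` is a hypothetical helper; the codec names it returns are Python's.

```python
import codecs
from typing import Optional

# Longest BOMs first: the UTF-32-LE BOM (FF FE 00 00) starts with the UTF-16-LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding(data: bytes) -> Optional[str]:
    """Return a Python codec name for `data`, or None if it is not clearly Unicode."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec  # these codecs also strip the BOM when decoding
    try:
        data.decode("utf-8")  # strict decoding: any invalid byte sequence raises
        return "utf-8"
    except UnicodeDecodeError:
        return None  # could be cp1252, Latin-1, ...: genuinely ambiguous
```

Plain ASCII is reported as "utf-8", which is harmless because ASCII is a subset of UTF-8.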
To keep it from showing up in your content, make sure you always use Unicode-aware editors, and always open your files with the proper encoding. It's a pain, unfortunately, and errors will occasionally crop up. The key is catching them early so that you can undo the damage or fix it with a few edits.
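One way to catch these errors early is to decode strictly and report where the first bad byte sits, instead of letting a lenient reader silently substitute characters. A minimal sketch, assuming your content is supposed to be UTF-8 (the helper name `first_invalid_utf8_offset` is hypothetical):

```python
from typing import Optional

def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as err:
        return err.start
    return None

# A cp1252 "é" (byte 0xE9) is not valid UTF-8 on its own.
print(first_invalid_utf8_offset(b"caf\xe9"))       # 3
print(first_invalid_utf8_offset("café".encode()))  # None
```

The same idea applies when reading files: passing `encoding="utf-8"` explicitly to Python's `open()` avoids depending on the platform's default encoding.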

I'm fairly positive there's nothing you can do about it. I've seen this on the front page of Digg a lot recently. It most likely comes from a character being encoded improperly: not necessarily a problem with the font, just a mistake made somewhere in conversion.

It looked for a while like the underscore and angle bracket problem had gone away, but it seems it might not be fixed.
Here's a small sample, which should look like this:
#include
____
#include <stdio.h>
____
#include
Update: it looks like it's fixed in display mode and only broken in edit mode.

Related

Classic look of Windows tab control in Unicode MFC program?

I am working on an MFC dialog based program with CTabCtrl (VS2017, W10). Everything works as expected, apart from the way tabs look (convoluted story, don't ask).
I need them to look like the ones on the right, but when I created a new project with a CDialogEx-based class and added tabs to the dialog (just the standard VS/MFC stuff, nothing fancy yet), they looked like the ones on the left. After some testing and comparing with older projects, I found that if I switch the project's Character Set setting from Unicode to Multi-Byte Character Set, I get the look I want (yes, it sounds completely unrelated, but I checked and rechecked several times). But that's ridiculously inconvenient: the program needs to work with different languages and uses Unicode libraries for managing the data.
No idea if the problem is really MFC-related; it could be some deeper Windows thing.
Any idea what can be done to get the right look (pun intended), other than implementing my own OwnerDraw() or adding an additional layer of code to translate between data in Unicode and MBCS? Both approaches sound pretty off.

How to make a GitHub README.md render

I have a README.md here but it is not showing up as rendered Markdown, it just shows the raw text. Does anyone know what I'm doing wrong here?
https://github.com/slothdude/soundcloud-groupme-bot/blob/master/README.md
There's no way to reliably detect a file's encoding. At the end of the day, it's a guessing game.
That particular file is stored in some strange encoding. Some editors (e.g. Emacs) seem to open it mostly successfully (though there are a few strange characters that might be whitespace), but don't know what encoding it is. When I ask Emacs which encoding it's using, I get no-conversion, which isn't very helpful.
Others, like Gedit, show what looks like a mixture of kanji and rectangular symbols suggesting unknown values.
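For what it's worth, one common way to get kanji-like output is reading plain ASCII text as UTF-16. Each pair of bytes becomes one 16-bit code unit, and pairs of lowercase letters land in the CJK Unified Ideographs block (U+4E00 to U+9FFF). A small Python illustration follows; whether this is what happened to this particular README is not established.

```python
text = "readme"
# 6 ASCII bytes reinterpreted as 3 little-endian UTF-16 code units.
as_utf16 = text.encode("ascii").decode("utf-16-le")
print([hex(ord(ch)) for ch in as_utf16])  # ['0x6572', '0x6461', '0x656d']
# Every resulting character falls in the CJK Unified Ideographs block.
print(all(0x4E00 <= ord(ch) <= 0x9FFF for ch in as_utf16))  # True
```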
Tools like file and enca seem to have no idea what it is:
$ file README.md
README.md: data
$ enca README.md
enca: Cannot determine (or understand) your language preferences.
Please use `-L language', or `-L none' if your language is not supported
(only a few multibyte encodings can be recognized then).
Run `enca --list languages' to get a list of supported languages.
Open it in a decent text editor (ideally the one you've used to author it) and save it as UTF-8, then commit that change. I suspect that this will fix its rendering on GitHub.
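If you'd rather script the re-save, here is a minimal Python sketch. It assumes you already know (or have guessed) the file's current encoding. `convert_to_utf8` is a hypothetical helper, and "utf-16" in the example call is only a placeholder, not a diagnosis of this README. Working on raw bytes keeps the original line endings intact.

```python
from pathlib import Path

def convert_to_utf8(path: str, source_encoding: str) -> None:
    """Re-save a text file as UTF-8, assuming it is currently in `source_encoding`."""
    p = Path(path)
    # Strict decode: impossible byte sequences raise instead of being silently
    # replaced (single-byte codecs such as Latin-1 accept any input, though).
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))

# Example with a placeholder encoding: convert_to_utf8("README.md", "utf-16")
```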

Any method to restore a garbled/distorted text file with Matlab?

I have a very weird situation and really need your help. Thanks in advance for your time and effort.
I have a machine that produces a text file recording information about its working status, such as the coordinates of the drill head and the rotation speed used at each position. When we examine the text file, it is unreadable because most of the contents are garbled. Please see the attached figure: http://ppt.cc/sA1I
If I open it with UltraEdit, I see: http://ppt.cc/TrnV
As you can see, part of the file is readable; however, there are many unrecognizable characters, which should be the numeric values we want.
I believe this problem should be solvable with Matlab, for two reasons. First, I am sure this machine has a lot of built-in Matlab code for analysis purposes. Second, we have an .exe file, compiled with Matlab, that can restore the garbled text file into an organized, readable format (the coordinate values are restored).
We really want to be able to read this file ourselves. Any solution, idea, or pointer in the right direction would be greatly appreciated.
Sincerely,
Old question without answer: For the record, a suggestion.
Sounds like a case of mojibake, i.e. text decoded with the wrong character encoding. Here's how I solved a similar problem.
Background: I had text files created on a Mac, others on Windows, and others still on Linux, each in a different text encoding. So I got a text editor that would let me view the encoding and change it. In my case, I used TextMate on macOS, opened the files, and picked the correct encoding when opening: sometimes a Windows encoding, sometimes a Mac encoding, sometimes a Latin one. I had to use trial and error to figure it out, based on a preview this particular piece of software gave me. Once I had a file open in the correct encoding, I would save it as UTF-8, which is not platform-specific and allows me to move my text files across various computers.
There may be more scalable methods, but I only had a hundred or so files to deal with, so I opted for the manual method, both to check the rendering on screen myself and because my files came in different encodings to begin with.
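The trial-and-error step can be partly automated: decode the same bytes with each candidate encoding and eyeball the previews, much like the editor's preview. A minimal sketch; the candidate list (UTF-8, Windows cp1252, Mac Roman, Latin-1) is an assumption matching the Windows/Mac/Latin cases described above, and you still pick the winner by eye. Latin-1 decodes any byte sequence, so its "success" proves nothing on its own.

```python
from typing import Dict, List, Optional

CANDIDATES: List[str] = ["utf-8", "cp1252", "mac_roman", "latin-1"]

def encoding_previews(data: bytes, limit: int = 60) -> Dict[str, Optional[str]]:
    """Decode `data` with each candidate; None marks an encoding that cannot decode it."""
    previews: Dict[str, Optional[str]] = {}
    for enc in CANDIDATES:
        try:
            previews[enc] = data.decode(enc)[:limit]
        except UnicodeDecodeError:
            previews[enc] = None
    return previews

sample = "café".encode("cp1252")  # pretend this came from an old Windows file
for enc, text in encoding_previews(sample).items():
    print(f"{enc:>10}: {text!r}")
```

Once the right encoding is identified, `data.decode(chosen).encode("utf-8")` gives the UTF-8 bytes to save.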

Writing Unicode text in Xcode

When I try to write Arabic words in the Xcode editor, they don't display correctly: they appear jumbled and reversed (the output on the iPhone is OK). This makes it hard to review the strings I've entered in the editor. Is there any way to overcome this issue?
I think those are bugs in Xcode (you can try changing the font, but I don't think the direction can be changed).
However, it is generally preferable to write your strings in English and then use internationalization (i18n) techniques to convert and display them in Arabic at runtime. A quick Google search turned up this blog post. This solves two issues:
You can support any number of languages.
You can store your Arabic text in a separate file and edit it with an external editor, making it easier to work with.
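The underlying idea is language-agnostic: code refers to strings by stable keys, and the translated text lives in a separate UTF-8 file that an editor with proper right-to-left support can handle. On iOS this role is played by `NSLocalizedString` and `Localizable.strings` files. The Python sketch below only illustrates the lookup pattern, using a hypothetical JSON strings table, and falls back to the key itself when a translation is missing.

```python
import json
from typing import Dict

def load_strings(json_text: str) -> Dict[str, str]:
    """Parse a hypothetical strings table: {"key": "translated text", ...}."""
    return json.loads(json_text)

def localized(table: Dict[str, str], key: str) -> str:
    """Look up `key`, falling back to the key itself if no translation exists."""
    return table.get(key, key)

arabic = load_strings('{"greeting": "مرحبا", "farewell": "مع السلامة"}')
print(localized(arabic, "greeting"))          # مرحبا
print(localized(arabic, "untranslated_key"))  # untranslated_key
```

In a real project the table would be read from its own file with `open(path, encoding="utf-8")`.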

Why does my system not display Unicode correctly?

I wrote this question, and it turns out the code is correct, but the text doesn't display properly on my system. I don't understand why this happens. My system is set to United States English, and I don't know what the problem could be.
This makes it difficult to develop Unicode apps when text doesn't display properly on my system :(
Edit: To be clearer, I made a WinForms app using .NET, and the text appears incorrect on my machine but works on others. I can copy and paste text into my app, but I can't tell whether it worked correctly, since I see nonsense instead of text. Most Unicode works, however; special characters (those beyond 16 bits, i.e. code points above U+FFFF) do not.
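.NET strings are sequences of UTF-16 code units, so characters above U+FFFF (the "beyond 16 bits" ones) are stored as two units, a surrogate pair. That is why text can render fine for most characters and still break on these. The Python sketch below shows the standard UTF-16 surrogate-pair arithmetic; it illustrates why these characters are different, not what is wrong on this particular machine. `to_surrogate_pair` is a hypothetical helper.

```python
from typing import Tuple

def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a surrogate pair")
    v = code_point - 0x10000     # 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F600)  # U+1F600 GRINNING FACE
print(hex(high), hex(low))               # 0xd83d 0xde00
```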
I assume from the question you linked to that you are on a Windows machine. The problem could be that Windows does not have a global encoding option at all. United States English is a language setting, which as far as I know does not mean what you expect it to mean: it does not set all of your programs to show text in a Unicode format.
The quick answer is that, especially on Windows, each program that displays text to the user is responsible for its own character encoding. You have to make sure that the program and the environment where the problem appears are set to display text using a Unicode encoding, such as UTF-8.
Read up on Unicode and UTF-8
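To see why a language setting alone doesn't help: a program that encodes text with a legacy ANSI code page, rather than a Unicode encoding, simply cannot represent characters outside that code page. Windows-1252 is the usual ANSI code page on a US English system; the Python sketch below uses it as an assumed example and shows those characters being replaced by "?", whereas UTF-8 round-trips them.

```python
def roundtrip(text: str, encoding: str) -> str:
    """Encode and decode `text`, replacing anything the encoding cannot represent."""
    return text.encode(encoding, errors="replace").decode(encoding)

sample = "Привет 😀"
print(roundtrip(sample, "cp1252"))  # ?????? ?
print(roundtrip(sample, "utf-8"))   # Привет 😀
```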