Found some square boxes in an XLIFF file and not sure what they are?

I'm looking at an XLIFF file and found some weird boxes that I can't identify (please see the screenshot).
Do you have any idea what these weird boxes are?
Thank you very much and I'm looking forward to your reply!

I have never seen that character, but here is how I would go about finding out what it is:
The first thing to do is to check the source and target language of the XLIFF file, which should be defined in the XLIFF header. Perhaps this character is a valid character in either the source or the target language script.
The next step depends on whether you can contact the person who created the XLIFF file. If yes, you can show them what the file looks like for you and ask them if the file has perhaps been garbled during transmission.
If not, you could check the encoding of the XLIFF file. If it is UTF-16, just open the file in a hex editor, find the code point for this character, and look it up on unicode.org. If the file is encoded as UTF-8 open it in Notepad++ (or any other text editor that allows you to change the encoding), convert it to UTF-16, then proceed as described above.
If you don't know the encoding of the file, it becomes a matter of guessing. You can look at some other <trans-unit> elements (assuming there are more than this one in your XLIFF file): if they contain other extended characters and those are displayed correctly, your editor has probably guessed the right encoding, and you can convert to Unicode and look up the character code. Different text editors have different ways of guessing encodings: try a few.
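If you'd rather script the lookup than use a hex editor, here is a minimal Python sketch; the file name and the UTF-16 encoding are assumptions, so adjust both to match your file. It prints the U+XXXX code point of every non-ASCII character, which you can then look up on unicode.org:

path = "translation.xlf"   # assumed file name; use your actual file
text = open(path, encoding="utf-16").read()   # or "utf-8", depending on the file
for i, ch in enumerate(text):
    if ord(ch) > 127:   # report every character outside plain ASCII
        print("offset %d: U+%04X %r" % (i, ord(ch), ch))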

It's possible that those characters are the result of an encoding conversion error, a kind of corruption commonly called mojibake.
It's also possible that this is some sort of emoji or unusual glyph that isn't rendering correctly in your editor. That would be unusual, but given that it appears to be a UI string, it's plausible.
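You can reproduce mojibake in a couple of lines of Python, which also shows where the boxes come from: bytes written in one encoding get decoded as another.

data = "Prøv".encode("utf-8")    # bytes as written by one program
print(data.decode("latin-1"))    # read back with the wrong encoding -> PrÃ¸v
# the reverse mistake (or decoding UTF-16 bytes as an 8-bit codepage) often
# produces control characters, which many editors draw as square boxes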

Related

Can Notepad++ recognize the encoding?

I created a file with UTF-8 encoded content (using PHP's fputcsv).
When I open this file in Notepad++, the characters are wrong (Notepad++ opens it as ANSI).
When I select Format -> "Encode in UTF-8" from the menu, everything is fine.
I'm wondering whether Notepad++ can somehow recognize the encoding, and whether maybe something is wrong with my file created with fputcsv. A first byte or something?
Automatically detecting an encoding is not something that can be done accurately. It's pretty much essential that the encoding be specified explicitly. It can be guessed in some cases, but even then not with 100% certainty.
This documentation (Encoding) explains the situation in relation to Notepad++.
They also point out that the difficulty arises especially if the file has not been saved with a Byte Order Mark (BOM).
Given that your file displays correctly once you manually set the encoding, I would say there's nothing wrong with how you are generating and saving the file. The only thing you can check for is whether a BOM is being saved, which might improve the chances of Notepad++ being able to automatically detect the encoding.
It's worth noting that although a BOM may help editors like Notepad++ identify the encoding more reliably, the Unicode Standard does not recommend using one for UTF-8.
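If you do decide to write a BOM, the idea is simply to prepend the three bytes EF BB BF before the content. As an illustration (in Python rather than PHP, to keep the example self-contained; the file name is a placeholder), the "utf-8-sig" codec writes the BOM for you:

# "utf-8-sig" writes the EF BB BF byte-order mark at the start of the file,
# which helps editors like Notepad++ auto-detect UTF-8
with open("out.csv", "w", encoding="utf-8-sig") as f:
    f.write("name;city\nBjørn;Tromsø\n")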
You have to check the lower right corner of the Notepad++ GUI to see the actual encoding that is being used. The problem is not specific to Notepad++: guessing the right encoding is a hard problem with no complete solution, so it's better to let the user decide on the most appropriate encoding in each individual case.
When you want to reflect the encoding of the text file in a Java program, you have to consider two things: the encoding and the character set. When you open a text file in Notepad++, you can see the encoding under the Encoding menu; additionally, look at the character-set submenu. Under "Eastern European" you will find "ISO 8859-2", and under "Central European" "Windows-1250". You can then set the corresponding encoding in the Java program by looking it up in this table:
https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
For example, for the Central European character set "Windows-1250" the table suggests the Java encoding "Cp1250". Set that encoding and the program will display the characters properly.
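If you want to sanity-check that the file really is Windows-1250 before hard-coding "Cp1250" into the Java program, you can try decoding it with that encoding outside Java. A minimal Python sketch, where "cp1250" is Python's name for the same codepage and "data.txt" is a placeholder:

# try to decode the whole file as Windows-1250; a failure means
# the encoding guess was wrong
try:
    text = open("data.txt", encoding="cp1250").read()
    print("decodes cleanly as Windows-1250")
except UnicodeDecodeError as e:
    print("not Windows-1250:", e)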

How to use unicode characters in Eclipse File Search?

We have some XML file that contains an invalid character, and the program reports neither which file it is, nor the line number or character offset. It would be a few seconds' work to fix the problem if I could just search for exactly that character, but I cannot find how to express a Unicode character in the file search (or at least I assume so, since the search returns nothing).
Neither 0x1e nor \u001e seems to match anything.
[EDIT] I mean, I can still change the code, eventually find which file it is by catching the exception, and use some kind of script/tool to find where exactly the character is, but I do believe it should be possible to search by Unicode code point in Eclipse, and that is what I am asking in this question.
It may be a problem with the character encoding.
As you're going to need to perform a global, workspace-wide search to find the character, you'll probably need to set the global text file encoding:
Preferences -> Workspace -> Text file encoding
This option may be under the 'General' section in Eclipse, depending on your setup and installed plugins etc.
Ensure that the encoding is set to UTF-8.
You will also need to tick "Regular expression" in the search dialog and escape the Unicode character, like so:
\u2665
(which I see you have tried)
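If Eclipse's search still won't cooperate, a small script can do the hunting, as the edit in the question suggests. This Python sketch walks a directory tree and reports the file, line, and column of every occurrence; the "." root and the .xml filter are assumptions, and U+001E is the character from the question:

import os

TARGET = "\u001e"   # the invalid character from the question
for root, dirs, files in os.walk("."):   # "." is a placeholder for the project root
    for name in files:
        if name.endswith(".xml"):
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    col = line.find(TARGET)
                    if col != -1:
                        print("%s: line %d, column %d" % (path, lineno, col + 1))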

Strange character rendered correctly in Notepad, but as a control character elsewhere

I have a .csv list of businesses. The file has some strange characters in it. For example, in this field: Stocktonon-Tees, the first hyphen, between Stockton and on, seems to be a character with the value 6 rather than a hyphen with the value 45. Stack Overflow will probably sanitize this so you can't see it, so here is a pastebin:
http://pastebin.com/NuyyaQy9
Can anyone explain why this could be? Is it some encoding issue that I have missed? Or a corruption in the dataset?
Yes, it's almost certainly an encoding issue. A file just consists of binary data - it's how you interpret that binary data that matters. It sounds like Notepad is guessing at the originally-intended encoding, but whatever else you're using isn't.
Unfortunately you haven't said anything about what software is trying to read the file or what wrote it in the first place - but you should look at what encoding Notepad thinks it is, and work from there.
If it's your code that wrote the file out, and you get to decide the encoding, I'd recommend UTF-8 as a good general purpose, platform-portable encoding.
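To see what is actually in the file, you can bypass text decoding entirely and look at the raw bytes. A short Python sketch (the file name is a placeholder) that prints any line containing a control character, which would make the stray byte 6 visible:

with open("businesses.csv", "rb") as f:   # read as raw bytes, no decoding
    for lineno, line in enumerate(f, start=1):
        # flag control characters other than tab, LF and CR
        if any(b < 32 and b not in (9, 10, 13) for b in line):
            print(lineno, line)   # repr shows e.g. b'Stockton\x06on-Tees,...'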

Handling special characters æ, ø, å in Objective-C - iPhone

I have a UILabel which I change through code. However, when I create an NSString with the characters æ, ø, å (Danish), I get an input conversion warning. The code looks like this:
NSString *label = [[NSString alloc] initWithFormat:@"Prøv igen"];
And the warning I get is this: warning: input conversion stopped due to an input byte that does not belong to the input codeset UTF-8. I understand that ø is probably not encoded as UTF-8 in my file, but what should I do? Can anyone give me a hint about how to solve this?
Regards
Bjarke
Your source code is not saved as UTF-8, but most likely as something like ISO-8859-1.
Just open the file and re-save it as UTF-8 - and while you're at it, you should probably also make that the default. Exactly how to do that depends on what editor you're using.
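If you want to confirm that diagnosis before re-saving, you can check whether the source file really is valid UTF-8. A minimal Python sketch, where the .m file name is a placeholder:

path = "MyViewController.m"   # placeholder; use your actual source file
try:
    open(path, encoding="utf-8").read()
    print("valid UTF-8")
except UnicodeDecodeError as e:
    # e.start is the byte offset of the first offending byte
    print("not UTF-8: byte 0x%02X at offset %d" % (e.object[e.start], e.start))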
Make sure your file text encoding is set to UTF-8, not Western (ISO) or something else. You can use the Xcode file info inspector to do this.
http://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/XcodeWorkspace/050-File_Management/file_management.html%23//apple_ref/doc/uid/TP40002677-BABICEHI
Make sure it says Unicode (UTF-8) for the File Encoding. If it asks you, tell it to reinterpret your file with the new encoding. Also, you may want to delete the problematic text and reinput it to get it to work.
I had the same problem, but my source code files were already UTF-8 encoded, so I fixed it in a different way.
In your case, it would have been something like:
NSString *label = [NSString stringWithUTF8String:"Prøv igen"];
I hope this will be helpful for others who stumble on this question.

Is there a way to get the encoding of a text file in UltraEdit?

Is there a setting in UltraEdit that allows me to see the encoding of the file?
In UltraEdit, the encoding that is being used to display the file is shown in the status bar on the right, together with the line-ending type in use, for example "U8-UNIX". You can also manually set the encoding in which the file is displayed; in version 10 this is under View -> Set Code Page. You can also convert the actual code page of the file under File -> Conversions.
If the file does not have a BOM (a couple of bytes at the start of the file indicating the encoding), the actual encoding of the file can only be guessed. And even if the file has a BOM, there can still be encoding issues.
All text editors do this, and some are better at it than others. I haven't done a comparison to see which is best. At the moment (2012), I know UltraEdit fails to detect UTF-8 and other variants in text files of 1000 lines or more if the first multi-byte UTF-8 character only appears later in the document. It also fails to show the encoding properly when you set it manually.
Notepad++ is also not great at detecting it, but when you know the encoding, you can set it manually.
Sublime Text is, as far as I know, best at detecting the encoding, also in large files.
I think there are also some very good command-line tools out there, ported from GNU/Linux to Windows, that detect encodings. My bet would be that one of those is the best option.
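On the scripting side, one widely used option is the third-party chardet package (pip install chardet), which implements Mozilla's detection heuristics. Like any detector it returns a guess with a confidence value, not a guarantee; the file name below is a placeholder:

import chardet   # third-party package: pip install chardet

raw = open("mystery.txt", "rb").read()   # read the raw bytes, no decoding
guess = chardet.detect(raw)   # e.g. {'encoding': 'windows-1250', 'confidence': 0.87, ...}
print(guess["encoding"], guess["confidence"])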