topoJSON is not correctly converting special chars from geojson like ë or ó

I really need these special characters (ë or ó) in my topoJSON file. I use QGIS and export the data as GeoJSON, and there the special characters are still correct in the output. But when I use topoJSON to combine all my GeoJSON files, all the special characters are gone.
Is there a way to enable special characters in topoJSON, e.g. with a flag?
Or maybe there is another workaround, so that I end up with a topoJSON file that keeps the special characters. Thanks.
Some pictures (screenshots not preserved): the text in QGIS, then the exported GeoJSON file,
and then the TopoJSON generated with topojson -p -o "world.json" "world1\line_text.geojson"*
As you can see, the special character is lost in the last step. I think the file is no longer UTF-8 after the topojson step. Can that be?
By the way:
I checked the encoding of both the GeoJSON and the TopoJSON file; both are UTF-8 without BOM. So it does not seem to be a file issue but really a problem with converting the special characters. Can someone confirm this?
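An editor's encoding indicator is often a guess, so one way to double-check that claim is to test whether the raw bytes actually decode as strict UTF-8. A minimal Python sketch (the function name is hypothetical; note that a file containing only ASCII passes the check either way, so only files that really contain ë or ó are informative):

```python
import codecs


def check_utf8(path: str) -> tuple[bool, bool]:
    """Return (is_valid_utf8, has_utf8_bom) for the file at `path`."""
    with open(path, "rb") as f:
        data = f.read()
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8,
        # e.g. a lone 0xEB, which is how Windows-1252 ("ANSI") stores "ë".
        data.decode("utf-8")
        return True, has_bom
    except UnicodeDecodeError:
        return False, has_bom
```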
Test-project
I have uploaded a small test project where the issue still exists: http://www.filedropper.com/test_22

The solution is really simple. Right-click the layer and choose Save As. Select the GeoJSON format, then check the encoding setting for the output: here it is currently ANSI, so change it to UTF-8.
Now save, and it works.
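If a GeoJSON file has already been written as "ANSI", it can also be converted after the fact. A minimal Python sketch, assuming that "ANSI" here means the Windows-1252 code page (the usual default on Western-European Windows; other locales use other code pages), with a hypothetical function name:

```python
def reencode_to_utf8(src_path: str, dst_path: str, src_encoding: str = "cp1252") -> None:
    """Rewrite a text file from a legacy code page (assumed cp1252) to UTF-8."""
    # Decode with the encoding the file was actually written in;
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # Write it back out as UTF-8 (Python's "utf-8" codec adds no BOM,
    # unlike "utf-8-sig").
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

The converted file can then be passed to topojson in place of the original.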

Related

What are the characters shown on a file after forcefully changing the extension?

Recently I changed the extension of an .apk file to .txt, and I was then able to open it in Notepad. It showed random characters, many of which aren't even on the keyboard, for example: org/antlr/runtime/ANTLRFileStream.class…TmOÓP=w[×QËÀ)ê|A…ÑETÔ¢NP¢™ãË—º•Q3ZÓcüþ¿j",£ß4ñGÏmÇñ˽Ïs{žçœçeûùëóW …
This is not limited to .apk: renaming many other file types, such as .jar and .xapk, shows the same kind of characters. Can anyone explain what determines these characters, and how exactly the OS decides which characters to show for an unsupported file?
Is there a way to get the original content through this data?
Let's say you created a text editor that can write, save and open text files. You also defined the encoding used to save text as binary (all files are binary when saved). Your encoding looks something like this:
Your encoding          Emacs encoding
TEXT   BINARY          TEXT   BINARY
A      01000001        ă      01000001
B      01000010        Ћ      01000010
...    ...             ...    ...
Z      01011010        Ϡ      01011010
Let's say you create a file with 'ABZ' as its contents. When saved, this file contains the value 010000010100001001011010. When you open the file with your text editor, the editor finds 010000010100001001011010 as the file contents, and using the encoding above it knows that this is 'ABZ', so it prints 'ABZ' on the screen.
Now let's say you open the same file with Emacs. Since Emacs (in this example) uses its own encoding, it displays "ăЋϠ". There is nothing wrong with Emacs; it just doesn't know that the data was written using your custom encoding.
So the point is that every file is written in a specific format; for example, the APK format is meant to be read by the Android system, not by a text editor. When you open an APK file in a text editor, it just tries to make sense of the binary data in the same way Emacs does in the example above.
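The bit pattern in this example is plain ASCII, so it can be reproduced directly. A short Python check (the helper name is illustrative):

```python
def to_bits(text: str, encoding: str = "ascii") -> str:
    """Encode `text` and return its bytes as one string of 8-bit groups."""
    return "".join(f"{byte:08b}" for byte in text.encode(encoding))


print(to_bits("ABZ"))  # 010000010100001001011010
```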
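The "Emacs encoding" in the table is made up, but the same effect happens with real encodings: identical bytes turn into different text depending on which encoding the reader assumes. A small Python illustration:

```python
# Bytes written by an editor that uses UTF-8 ("ë" is stored as two bytes: C3 AB).
data = "ABZ ë".encode("utf-8")

# The same bytes, interpreted under different assumed encodings.
for codec in ("utf-8", "cp1252", "cp437"):
    print(f"{codec:>7}: {data.decode(codec)}")
```

Decoding as UTF-8 gives back "ABZ ë", while Windows-1252 shows "ABZ Ã«": the bytes never changed, only the interpretation did.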
Is there a way to get the original content through this data?
If you know the original encoding in which the data was written, you can read the file's contents using the same encoding.
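For the files mentioned in the question, the "original encoding" is a container format rather than a text encoding: APK and JAR files are ZIP archives, and the org/antlr/runtime/ANTLRFileStream.class visible in the pasted text is one of the archive's entry names. A minimal Python sketch that lists the entries with the standard zipfile module (the function name is hypothetical; the entries themselves are still compiled or binary data, so they won't all be readable text):

```python
import zipfile


def list_archive(path: str) -> list[str]:
    """Return the entry names of a ZIP-based file such as an .apk or .jar."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


# Usage (placeholder path): list_archive("app.apk")
# archive.read(name) would return the raw bytes of a single entry.
```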

How to programmatically get the fragments called in an RTF template?

I need to programmatically find the fragments that are called by each RTF template.
So, for example, in the figure I would need to get the "GlossaryTermsAcronyms" fragment for the H2_terms_acronyms template.
I can't seem to find any query or script solution to do this. But this should be possible, right?
Unfortunately that is (almost) impossible.
The information is stored in the t_documents.bincontent column. It is binary-encoded RTF.
Somewhere in that RTF there should be a reference to the template fragments that are used.
If you can figure out how to decode the bincontent to get to the actual RTF code of your template, you might have a chance.
Binary fields in EA are usually stored as a zipped text file.
If the field is included in an XML file (or an XML string in the database), it is base64-encoded.
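Following that description, a decoding attempt could look like the Python sketch below. It assumes the blob has already been fetched from t_documents.bincontent as bytes (or as a base64 string when it came from XML), that it is a ZIP archive as described above, and that the RTF inside can be searched as plain text. The function names are hypothetical, and how EA actually references fragments inside the RTF is not known here, so the sketch only searches for a fragment name you already expect:

```python
import base64
import io
import zipfile
from typing import Dict, Union


def decode_bincontent(blob: Union[bytes, str], is_base64: bool = False) -> Dict[str, str]:
    """Unzip an EA binary field and return {entry name: text} for each entry.

    Assumes the field is a ZIP archive; base64 is only expected when the
    value came from an XML export. Without base64, `blob` must be bytes.
    """
    raw = base64.b64decode(blob) if is_base64 else blob
    texts = {}
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        for name in archive.namelist():
            # RTF is mostly 7-bit ASCII; latin-1 maps every byte, so it never fails.
            texts[name] = archive.read(name).decode("latin-1")
    return texts


def mentions_fragment(texts: Dict[str, str], fragment_name: str) -> bool:
    """True if any decoded entry contains `fragment_name` literally."""
    return any(fragment_name in text for text in texts.values())
```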

Found some square boxes in a xliff file and not sure what they are?

I'm looking at an XLIFF file and found some weird boxes, and I don't know what they are (please see screenshot).
Does anyone have an idea what these weird boxes are?
Thank you very much, and I'm looking forward to your reply!
I have never seen that character, but here is how I would go about finding out what it is:
The first thing to do is to check the source and target language of the XLIFF file, which should be defined in the XLIFF header. Perhaps this character is a valid character in either the source or the target language script.
The next step depends on whether you can contact the person who created the XLIFF file. If yes, you can show them what the file looks like for you and ask them if the file has perhaps been garbled during transmission.
If not, you could check the encoding of the XLIFF file. If it is UTF-16, just open the file in a hex editor, find the code point for this character, and look it up on unicode.org. If the file is encoded as UTF-8, open it in Notepad++ (or any other text editor that allows you to change the encoding), convert it to UTF-16, then proceed as described above.
If you don't know the encoding of the file it becomes a matter of guessing. You can look at some other <trans-units> (assuming that there are more than this one in your XLIFF file): if they contain other extended characters and they are displayed correctly your editor has probably guessed the right encoding, and you can convert to Unicode and look up the character code. Different text editors have different ways of guessing encodings: try a few.
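Once the file is decoded with the right (or best-guess) encoding, the character lookup itself can be automated instead of reading hex. A minimal Python sketch that lists every distinct non-ASCII character with its code point and Unicode name; the function name is hypothetical and "utf-8" is only a default you may need to change:

```python
import unicodedata


def non_ascii_chars(path: str, encoding: str = "utf-8") -> list[tuple[str, str, str]]:
    """Return (char, code point, Unicode name) for each distinct non-ASCII character."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    seen = []
    for ch in text:
        if ord(ch) > 127 and ch not in seen:
            seen.append(ch)
    # Private-use and unassigned code points have no name, hence the default.
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in seen]
```

A result such as U+FFFD REPLACEMENT CHARACTER would point to a decoding problem rather than a real character in the source or target text.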
It's possible that those characters are the result of an encoding conversion error, commonly called mojibake.
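A typical mojibake case is UTF-8 bytes decoded as Windows-1252. When that is what happened, the damage can often be undone by reversing the two steps. A Python sketch, assuming that particular pair of encodings (other pairs need other codecs, and the round trip raises an error if the garbled text cannot be mapped back to bytes):

```python
def undo_mojibake(garbled: str, wrong: str = "cp1252", right: str = "utf-8") -> str:
    """Re-encode with the encoding that was wrongly used, then decode correctly."""
    return garbled.encode(wrong).decode(right)


original = "é"
garbled = original.encode("utf-8").decode("cp1252")  # what a misconfigured tool shows
print(garbled)                 # Ã©
print(undo_mojibake(garbled))  # é
```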
It's also possible this is some sort of emoji or unusual glyph that's not rendering correctly in your editor. This would be unusual, but given that it appears to be a UI string, it might be possible.

How exactly are TMX map files base_64-encoded?

I am writing a game for iOS that uses .tmx map files. I am creating the maps in the application 'Tiled' and then at some point before they get to iOS, I'm parsing them with Perl.
When I save the files as straight XML, it's a cinch for Perl to parse them. However, cocos2d insists that the files be base64-encoded. The 'Tiled' map editor has no problem saving files with this encoding scheme, and iOS reads them just fine, but it's presenting problems for my Perl code.
For some reason, the standard MIME::Base64 decode_base64() function in Perl is not cutting the mustard here: when I decode the strings, I get one or two binary characters (question marks in diamond boxes and such).
The vague documentation for the TMX file format makes it unclear whether some other encoding happens before or after the base64 encoding that might be causing these problems. I looked at the C++ source for the encoder and saw lots of references to Latin1, but I couldn't work out what's going on in detail.
I noticed that when I ran my own tests with MIME::Base64, encoding and then decoding a test string, the encoded text looked dramatically different from what I see coming out of the TMX files. For instance, my base64-encoded text for a short string looks like this:
aGVyZSBpcyBhIHNlbnRlbmNl
But the base64-encoded text coming from the TMX files looks like this:
9QAAAAABAAANAQAAGAEAAA==
Any suggestions on what else I might try in attempts to decode a string that looks like that?
I think this page might be what you're looking for. It suggests that first you decode_base64, then (if the compression="gzip" attribute is present) use gunzip to uncompress it, and finally use unpack('V*', $data) to extract the list of 4-byte little-endian integers.
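The same steps in Python, as a sketch that mirrors the Perl procedure: base64-decode, decompress only if the layer's compression attribute asks for it, then read the bytes as little-endian unsigned 32-bit integers, the equivalent of Perl's unpack('V*', ...). The decoded data is a list of binary integers rather than text, which would explain why printing it showed unreadable characters. The TMX format also allows compression="zlib", so that case is handled too; the function name is hypothetical:

```python
import base64
import gzip
import struct
import zlib
from typing import List, Optional


def decode_tmx_layer(data: str, compression: Optional[str] = None) -> List[int]:
    """Decode the text of a TMX <data encoding="base64"> element into tile GIDs."""
    # b64decode skips the newlines and indentation that surround the XML text.
    raw = base64.b64decode(data.strip())
    if compression == "gzip":
        raw = gzip.decompress(raw)
    elif compression == "zlib":
        raw = zlib.decompress(raw)
    elif compression is not None:
        raise ValueError(f"unsupported compression: {compression}")
    if len(raw) % 4:
        raise ValueError("layer data is not a whole number of 32-bit values")
    # '<' = little-endian, 'I' = unsigned 32-bit: same as Perl's unpack('V*', $data).
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))


print(decode_tmx_layer("9QAAAAABAAANAQAAGAEAAA=="))  # [245, 256, 269, 280]
```

Applied to the string from the question, this yields four tile GIDs, which fits uncompressed layer data.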

How to figure out what an encoded string contains

I have a string that looks like this
H4sIALYnhUsCA9VXW5aDIAz9zypcgiU8dDnTWtfQ5Q8kEgSR
ap05c+YnhxLyumBu2r/s2PUvO3nh+rCaw0oFob1Q+Z51HfjNZ1jexCSsLAYx
BGG6eATZGJYALIIzG9QOy4NeaPYAyyarKfQY7TgypTjGI3ogkxDahSTw7kX/
FQUHeIgxsoClQD1JGRKF7Jy4oXNeQFou5TvJzlkJoAUIMuGAOlePMTEGWQry
2liLCfHNJPEwuiU7jmzEhM6gnGawSO3ORMnqLQRsNgki7AV4jEI9xKRU65V6
q7UUZVetqsZQC13z3UzMXkkM24nlvs+B/EktqmsnC0dxelvLycTaN+QugYw/
DTJeeTD4iy/ZXQHZ/KuXjH/2kvFKYtfaBfXtaUtlVZCZiIxw5WPLLxkFQZ2D
mMBmUaQJYCKyyBlShVqMuHUFSzu5/UTY1sVMVpwzSnimpEFOz5G7nKSoheIt
yqjg+pxU54zE64jd3zzdrYmW6Ybic2mVvcjAUKfg0s0QMfAXDadyotuGxOdH
hwZIU4NPR2fqbApbVnirTRdFGc/cjr7KwhmV+m6GGbMnf+RetoNNGwiohW4D
AREJ1R0FAhqo7gDx4b18iBh/uWPeGkwc07mMmdtKbBe0WQy9PMpr6TpLZwhR
whmj8/8FjTEWsv8ckhimqgj9+2q0hfWH1WpFCXPYfX27mEMGupKe1QA+gkwd
PDVv/xO+AbHzd9RzDQAA
My initial guess was that this is a Base64-encoded file of some sort. Any ideas on how I can figure out whether it is a file, and which type? It should contain MIME info, I guess, but how would I save it to a file without fragmenting it?
It's base64. When you decode it, you get a gzipped file, which consists of a boatload of hex characters (literally, as ASCII 0xNN hex characters). They're mostly in the A-Z,a-z range.
I'd paste it here, but from this, I suspect this is part of some exercise you're doing, so I think I'll leave it to you to figure out.
P.S. For edification: I determined that the binary output was a gzipped file by running the Unix file command on it, which identifies the "magic" bytes and reported gzip. Use your decode_base64 function (or whatever it is), then dump the return value into a file and gunzip it.
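The same investigation can be scripted. A Python sketch that strips the line breaks, base64-decodes, checks a few well-known magic numbers (a small stand-in for what the Unix file command does), and gunzips when the gzip signature 1F 8B is present. The function names and the short signature table are illustrative, not exhaustive. As a visual clue, "H4sI" at the start of the string is exactly what base64 turns the gzip header bytes 1F 8B 08 into.

```python
import base64
import gzip

# A few well-known file signatures ("magic numbers"); far from complete.
SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF": "pdf",
}


def sniff(data: bytes) -> str:
    """Return a format name if `data` starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def decode_blob(text: str) -> bytes:
    """Base64-decode a multi-line string and gunzip it if it is gzip data."""
    raw = base64.b64decode("".join(text.split()))
    return gzip.decompress(raw) if sniff(raw) == "gzip" else raw
```

Writing the result with open("decoded.bin", "wb") keeps the bytes exactly as decoded, so nothing gets fragmented along the way.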