Can anyone tell me what encoding this is? - encoding

AAAAAAFuAAIAAAZNYWMgT1MAAAAAAAAAAAAAAAAAAAAAAAAAAADMrsHTSCsAAAALuG8NYWxleHN1Y2tzLmRpYwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPJXS83SjIoAAAAAAAAAAP////8AAAkgAAAAAAAAAAAAAAAAAAAAB0Rlc2t0b3AAABAACAAAzK6zwwAAABEACAAAzdJ+egAAAAEADAALuG8AC7hIAADK3wACADFNYWMgT1M6VXNlcnM6AGFuZHJld3ByeWRlOgBEZXNrdG9wOgBhbGV4c3Vja3MuZGljAAAOABwADQBhAGwAZQB4AHMAdQBjAGsAcwAuAGQAaQBjAA8ADgAGAE0AYQBjACAATwBTABIAJ1VzZXJzL2FuZHJld3ByeWRlL0Rlc2t0b3AvYWxleHN1Y2tzLmRpYwAAEwABLwAAFQACABL//wAA
It's a data field from the ~/Library/Preferences/com.microsoft.office.plist file for Microsoft Office 2011 Mac.
It partially decodes using base64 but doesn't appear to be completely base64.
Edit:
Here is another example.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<data>AAAAAAFWAAIAAAZNYWMgT1MAAAAAAAAAAAAAAAAAAAAAAAAAAADMrsHTSCsAAAALuG8HMm5kLmRpYwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAPgev83SjIoAAAAAAAAAAP////8AAAkgAAAAAAAAAAAAAAAAAAAAB0Rlc2t0b3AAABAACAAAzK6zwwAAABEACAAAzdJ+egAAAAEADAALuG8AC7hIAADK3wACACtNYWMgT1M6VXNlcnM6AGFuZHJld3ByeWRlOgBEZXNrdG9wOgAybmQuZGljAAAOABAABwAyAG4AZAAuAGQAaQBjAA8ADgAGAE0AYQBjACAATwBTABIAIVVzZXJzL2FuZHJld3ByeWRlL0Rlc2t0b3AvMm5kLmRpYwAAEwABLwAAFQACABL//wAA</data>
</plist>

Base64 is usually used to encode binary data, such as images, as text. As you will have seen when decoding the data above, it contains a few recognizable ASCII strings, but most of it is binary.
A property list (plist) is a format for storing serialized objects; it is also used for storing settings in Office 2011 for Mac. If you are interested in the details of this particular file, you can check it here. Scroll to ~/Library/Preferences/com.microsoft.office.plist for the specific format details.
That will help you understand what the ASCII strings mean. To extract and view the plist completely (including the binary part) you can use Property List Editor and plutil (see the source). There are several programs which can do the same.
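If you prefer to inspect the data from a script, here is a minimal Perl sketch that decodes the base64 payload and prints the recognizable ASCII strings, much like the Unix strings tool would (the $b64 placeholder stands for the text of the <data> element):

use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

my $b64   = '...';                 # paste the <data> payload from the plist here
my $bytes = decode_base64($b64);

# Print every run of four or more printable ASCII characters,
# which is roughly what the Unix `strings` tool does.
while ($bytes =~ /([\x20-\x7e]{4,})/g) {
    print "$1\n";
}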
If you need to learn how to read and write plist (property list) files, you can check these links:
http://en.wikipedia.org/wiki/Property_list
http://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/PropertyLists/Introduction/Introduction.html

It appears to be Base64. The decoded string is
n Mac OS Ì®ÁÓH+ ¸o
alexsucks.dic òWKÍÒŒŠ ÿÿÿÿ Desktop  Ì®³Ã ÍÒ~z  ¸o ¸H Êß 1Mac OS:Users: andrewpryde: Desktop: alexsucks.dic 
a l e x s u c k s . d i c   M a c O S 'Users/andrewpryde/Desktop/alexsucks.dic  / ÿÿ

The encoding is base64; it decodes correctly to a binary file.
Something you'll often see in binary files containing strings is that the byte immediately before a string holds the length of that string. This one is no different: if you look at the decoded data in a hex editor, the byte immediately before the word "Desktop" has a value of 7.
If you need something other than the text out of it, you're probably stuck with reverse-engineering the file format, but it mostly just appears to be a reference to some sort of "cleverly" named dictionary file.
FWIW, I used this tool to decode the file.
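As a quick sanity check, a short Perl sketch along these lines will confirm the length byte ($b64 is a placeholder for the <data> payload):

use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

my $b64   = '...';                 # the <data> payload from the plist
my $bytes = decode_base64($b64);

# The byte immediately before "Desktop" should hold that string's length.
my $pos = index($bytes, 'Desktop');
if ($pos > 0) {
    my $len = unpack('C', substr($bytes, $pos - 1, 1));
    print "length byte before 'Desktop': $len\n";   # expected: 7
}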

It looks like a binary file defining a dictionary with 32-bit keys (ints?); strings are prefixed with a byte containing the length of the string. Some of the values seem to be padded with zeros. Which values do you need to write to?
Edit: this tool might help: PackageMaker

Related

having problems opening DITA files in OxygenXML which contain special characters

I am having problems opening files which contain special characters like é, è, ë, ê, à, á, ö, etc. The error message I get from OxygenXML is:
File encoding (UTF8) does not support all characters from the current file.
To ignore these errors or to replace invalid characters follow the link below to change the "Encoding errors handling" option value from REPORT to IGNORE or REPLACE.
The strange thing is: when I alter the file (by swapping the 'ó' for an 'o', for instance), I can import the file in both OxygenXML and FontoXML. Afterwards I can correct it again and save the file. But I don't see a difference between the original file and the altered file.
This is the original file
<p id="id-9f3a1788-a751-4f48-ed9c-9e19447ad3b0">Ze is zó zenuwachtig, dat ze bijna aan de ... moet .</p>
And this is the saved corrected file (from FontoXML, in this case - just to show the added instructions):
<p id="id-9f3a1788-a751-4f48-ed9c-9e19447ad3b0">Ze is
z<?fontoxml-change-addition-start author-id="erik.verhaar" change-id="6f6bb382-3d43-4c5b-b35f-f857d729cf22" timestamp="1627473671530"?>ó<?fontoxml-change-addition-end change-id="6f6bb382-3d43-4c5b-b35f-f857d729cf22"?><?fontoxml-change-deletion author-id="erik.verhaar" change-id="0296c77c-863b-421f-bf5c-c0901c7a2751" text="ó" timestamp="1627473669483"?>
zenuwachtig, dat ze bijna aan de ... moet .</p>
What is the difference between the original ó and the corrected one? And how can I change my original files so they can be imported in OxygenXML?
Thanks!!
Text files (XML, for example) are saved on disk as bytes, but they are edited and presented as characters. An encoding takes care of converting bytes to characters when the document is opened (sometimes multiple bytes map to a single character), and the same encoding converts characters back to bytes when the document is saved.
There are many encodings, but in the most popular ones (like UTF-8), characters in the 0-127 ASCII range, such as a-z and A-Z, are saved as a single byte. Characters outside that range, for example e-acute (é), are usually saved as multiple bytes, depending on the encoding used for saving.
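As a small illustration (just a sketch, nothing Oxygen-specific), this Perl one-off shows that e-acute takes two bytes when encoded as UTF-8:

use strict;
use warnings;
use Encode qw(encode);

my $bytes = encode('UTF-8', "\x{E9}");   # U+00E9, e-acute (é)
printf "%d byte(s): %s\n",
    length($bytes),
    join ' ', map { sprintf '%02X', ord } split //, $bytes;
# prints: 2 byte(s): C3 A9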
When an XML document is opened, Oxygen attempts to determine which encoding to use for reading it. If the XML document has a declaration like this:
<?xml version="1.0" encoding="UTF-8"?>
Oxygen uses the encoding specified in the declaration. If the XML document lacks one, Oxygen falls back to UTF-8. Essentially, Oxygen implements the XML specification's rules for detecting the encoding of an XML file:
https://www.w3.org/TR/xml/#sec-guessing
In your case Oxygen detected the encoding as UTF-8 and started to use UTF-8 to convert bytes to characters. It then encountered a sequence of bytes that is not valid UTF-8. Oxygen does not continue loading the file, because in such cases you may end up with corrupt content when saving it back.
In my opinion, the other editing tool you used to create the XML files was not XML-aware: it did not actually save the XML as UTF-8 even though the declaration in the XML document said so.
We do not know which encoding that other editing tool actually used to save the XML. One thing you could try is to reopen the XML document in that tool and change its encoding declaration from:
<?xml version='1.0' encoding='UTF-8'?>
to:
<?xml version='1.0' encoding='CP1250'?>
because I suspect the other editing tool actually saved the XML document using the default platform encoding, which on Windows should usually be CP1250.
Then save the XML document in the other editing tool and try to re-open it in Oxygen. If that works, change its encoding declaration back to UTF-8 and save the document in Oxygen so that it is properly written using the UTF-8 encoding.
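If you would rather convert the file once from a script instead of round-tripping through the other editor, a Perl sketch like the following would do it; it assumes the bytes really are CP1250, and the file names are placeholders:

use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the file was actually saved as CP1250 (adjust if not).
open my $in, '<:raw', 'original.xml' or die $!;
my $raw = do { local $/; <$in> };
close $in;

my $text = decode('cp1250', $raw);

open my $out, '>:raw', 'converted-utf8.xml' or die $!;
print {$out} encode('UTF-8', $text);
close $out;

After that, the encoding declaration in the file and the actual bytes agree, and Oxygen should open it without complaints.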
This older set of slides I made about XML encoding might also be useful to you:
https://www.oxygenxml.com/events/2018/large_xml_documents.pdf

writing subscripts in .plist files

I am trying to represent the number 2 as a subscript in a property list file. I tried using <sub>2</sub>, but it doesn't seem to work. Can anyone help me with this? And will it be stored correctly in a string after I store it there?
A plist file is XML that follows a certain schema, and <sub> is not a valid tag in that schema. If you want to put that kind of markup into the plist, you have to wrap it in CDATA:
<![CDATA[<sub>2</sub>]]>
A plist file is typically a UTF-8 encoded XML file, so you should be able to use the Unicode subscript characters as-is. To type a non-ASCII character, you can use the Keyboard & Character Viewer (on Lion: System Preferences > Language & Text > Input Sources > select Keyboard & Character Viewer as an input source).
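For instance, subscript two is U+2082. Here is a tiny Perl sketch (the "H₂O" label is only an illustration) that prints the character so you can paste it into a <string> value:

use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';
my $label = "H\x{2082}O";    # U+2082 SUBSCRIPT TWO, i.e. "H₂O"
print "$label\n";            # paste the output into a <string> value in the plist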

How exactly are TMX map files base_64-encoded?

I am writing a game for iOS that uses .tmx map files. I am creating the maps in the application 'Tiled' and then at some point before they get to iOS, I'm parsing them with Perl.
When I save the files as straight XML, it's a cinch for Perl to parse them. However, cocos2d insists that the files be base64-encoded. The 'Tiled' map editor has no problem saving files with this encoding scheme, and iOS reads them just fine, but it's presenting problems for my Perl code.
For some reason, the standard MIME::Base64 decode_base64() function in Perl is not cutting the mustard here: when I decode the strings, I get one or two binary characters, question marks in diamond boxes and such.
And the vague documentation for the TMX file format makes it unclear whether there is some other encoding going on before or after the base64 encoding which might be causing these problems. I looked at the C++ source for the encoder and saw lots of references to Latin1, but I couldn't decipher the details.
I noticed that when I tried my own tests with MIME::Base64, encoding and then decoding a test string, the encoded text looks dramatically different from what comes out of the TMX files. For instance, my base64-encoded text for a short string looks like this:
aGVyZSBpcyBhIHNlbnRlbmNl
But the base64-encoded text coming from the TMX files looks like this:
9QAAAAABAAANAQAAGAEAAA==
Any suggestions on what else I might try in attempts to decode a string that looks like that?
I think this page might be what you're looking for. It suggests that first you decode_base64, then (if the compression="gzip" attribute is present) use gunzip to uncompress it, and finally use unpack('V*', $data) to extract the list of 4-byte little-endian integers.
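In Perl, that pipeline might look roughly like the sketch below; it assumes the layer's <data> element has encoding="base64" and compression="gzip", and the $b64 placeholder stands for that element's text:

use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $b64        = '...';                  # text content of the layer's <data> element
my $compressed = decode_base64($b64);

# Only needed when compression="gzip" is set on the <data> element.
gunzip(\$compressed => \my $data)
    or die "gunzip failed: $GunzipError\n";

# Each tile is a 4-byte little-endian unsigned integer.
my @tiles = unpack('V*', $data);
print "@tiles\n";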

How to figure out what an encoded string contains

I have a string that looks like this
H4sIALYnhUsCA9VXW5aDIAz9zypcgiU8dDnTWtfQ5Q8kEgSR
ap05c+YnhxLyumBu2r/s2PUvO3nh+rCaw0oFob1Q+Z51HfjNZ1jexCSsLAYx
BGG6eATZGJYALIIzG9QOy4NeaPYAyyarKfQY7TgypTjGI3ogkxDahSTw7kX/
FQUHeIgxsoClQD1JGRKF7Jy4oXNeQFou5TvJzlkJoAUIMuGAOlePMTEGWQry
2liLCfHNJPEwuiU7jmzEhM6gnGawSO3ORMnqLQRsNgki7AV4jEI9xKRU65V6
q7UUZVetqsZQC13z3UzMXkkM24nlvs+B/EktqmsnC0dxelvLycTaN+QugYw/
DTJeeTD4iy/ZXQHZ/KuXjH/2kvFKYtfaBfXtaUtlVZCZiIxw5WPLLxkFQZ2D
mMBmUaQJYCKyyBlShVqMuHUFSzu5/UTY1sVMVpwzSnimpEFOz5G7nKSoheIt
yqjg+pxU54zE64jd3zzdrYmW6Ybic2mVvcjAUKfg0s0QMfAXDadyotuGxOdH
hwZIU4NPR2fqbApbVnirTRdFGc/cjr7KwhmV+m6GGbMnf+RetoNNGwiohW4D
AREJ1R0FAhqo7gDx4b18iBh/uWPeGkwc07mMmdtKbBe0WQy9PMpr6TpLZwhR
whmj8/8FjTEWsv8ckhimqgj9+2q0hfWH1WpFCXPYfX27mEMGupKe1QA+gkwd
PDVv/xO+AbHzd9RzDQAA
My initial guess was that this was a Base64-encoded file of some sort. Any ideas on how I can figure out if it is, and which type of file it is? It should contain MIME info, I guess, but how would I save it to a file without fragmenting it?
It's base64. When you decode it, you get a gzipped file, which consists of a boatload of hex characters (literally, as ASCII 0xNN hex characters). They're mostly in the A-Z, a-z range.
I'd paste it here, but from this I suspect it is part of some exercise you're doing, so I'll leave it to you to figure out.
P.S. For edification: I determined that the binary output was a gzipped file by using the Unix file command, which identifies a file by its "magic" bytes. Use your decode_base64 function (or whatever the equivalent is), dump the return value into a file, and gunzip it.
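If you would rather do the whole check in Perl than shell out to file, a sketch like this looks for the gzip magic bytes and decompresses in memory ($b64 is a placeholder for the string in the question):

use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $b64   = '...';                       # the mystery string from the question
my $bytes = decode_base64($b64);

# gzip streams always start with the magic bytes 0x1F 0x8B.
if (substr($bytes, 0, 2) eq "\x1f\x8b") {
    gunzip(\$bytes => \my $plain) or die "gunzip failed: $GunzipError\n";
    print $plain;
} else {
    printf "not gzip; first bytes: %s\n", unpack('H8', $bytes);
}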

how to parse XML which contains data in Norwegian language?

How do I parse XML which contains data in the Norwegian language?
Do I need any particular encoding with NSXMLParser?
Thanks.
I guess you are worried about non-ASCII characters in the XML file. Well, you don't need to be. The first line of an XML file should look something like:
<?xml version="1.0" encoding="UTF-8"?>
where the encoding attribute tells you which character set was used to encode the characters in the file. NSXMLParser will use that line to determine which character set to use. By the time the data reaches your delegate methods, all the text will be in NSString objects, which cope with your Norwegian characters automatically.
All you need to be concerned about is that the file really is encoded in the character set that the first line says it is.
XML itself does not care which natural language your data is in. In XML, each element should have a start tag and a matching end tag; then you can parse it with any XML parser.
Here is a tutorial for understanding XML, and
here is a link to a tutorial on parsing an XML file.
Hope this helps with your problem.