Are there any special variations of uuencoding / uudecoding? - encoding

I have written a small program which can encode/decode text with uuencode/uudecode. The code is based on the algorithm described on Wikipedia. It works fine when I encode/decode a string, but I have found a uuencoded file which I can't decode. This website can decode the file, but when I encode it again I don't get the same file. In addition, when I decode only one line of the file I don't get readable text (neither with my program nor with the decoder I linked before). But in uuencoding all lines are independent of each other, so this must be possible.
Does someone know whether there are special variations of uuencoding which are not described on Wikipedia? I can decode some strings, so my decoder can't be totally wrong. Perhaps someone has written their own decoder, so I post the whole file:
begin 666 Restricted.zip
M4$L#!!0````(`%T[="_]<LYX`P(``'0#```.````4F5S=')I8W1E9"YT>'1M
M4\MNVT`0NQOP/TSNM#PT0!/X4N16`RE0%.GC.I9&TE;2CKH/J_K[<E;IX]"+
M'UJ20W)6^]U3)SX=]KO][D*]SD(7XHD2CX/S'26EU`L%U_6)9#E1?46NQ4,7
MR?E6P\3)J:=%#ABZY7'$P2MO"0J1GGT3Z;B1YJ#?I4ZT:!X;N#KI34)%3Y%6
MS8#>A#I-&[;E`-H%'(EY#G[/(-I',=GI;XN"H49?''YXT#LE]BNU.<!&,*(W
M0&4Y7V#,F_&11NV<-TNU-!D!>HZP5"MF91^YE0-D&H2C5CAL\T&P:#/'A*<+
M#F6(!IEXW?Q?13Q=#P[XLBHJ>L[UX,;U8+`"X3I)0S^RJX=Q+3-28)##+IK:
MEAD#AQRM7DY)ICG%BK[:(,\=L$C>20*EUCR/8BP'&'H+.OT5:+`V>,*NK$%9
MZ<;>Q1X"1WJOBZ#_8HQ+`3?K%(U<1U-:7.HI6A]_+/V[\RU,J]DW!SMV#<37
M89W+>5QCL6/"MDHTQPV&UT5-<R!=?%D)MG^AR&Y3^>]::JP0H2MZ4>3UR?F,
M[>18,L'"..I2K'.,BP8TF<K)YT_/IG1S#<#VZ^,KX$QO'[\\WC_<W;V[?_-P
MW>^`/%.?TGP^G99EJ29MCC^K6JL\G%H78CJQC[CGU=S/V_M2KEN<A0?;A5U`
M[AC.U2*6OUOE0<KD#Q#\MM_]`E!+`0(4`!0````(`%T[="_]<LYX`P(``'0#
M```.``````````$`(`"V#0````!297-T<FEC=&5D+G1X=%!+!08``````0`!
+`#P````O`#``````
`
end

I found the solution! The problem was that I did not pay attention to the first line. This line holds information about the encoded data: a file named Restricted.zip. So the decoded data is a ZIP file, which I just had to unpack.
I got a text file named Restricted.txt which contains the readable data.
The problem was simple, but it took me days to see the solution.
That's a good segue into compression algorithms - perhaps the next thing I'll do is write my own program which can pack/unpack ZIP files.
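For reference, here is a minimal decoding sketch in Python (the function and file names are mine, not from the original program): it reads the output filename from the begin line, decodes each payload line independently with binascii.a2b_uu, and then unpacks the resulting ZIP archive.

import binascii, zipfile

def uudecode_file(path):
    with open(path) as f:
        # Header: "begin <mode> <name>", e.g. "begin 666 Restricted.zip"
        parts = f.readline().split()
        assert parts[0] == "begin"
        out_name = parts[2]
        with open(out_name, "wb") as out:
            for line in f:
                if line.strip() == "end":
                    break
                # Each line is self-contained: a length character plus
                # encoded bytes, so single lines decode independently.
                out.write(binascii.a2b_uu(line))
    return out_name

# The decoded file is itself a ZIP archive, so unpack it:
zipfile.ZipFile(uudecode_file("restricted.uu")).extractall()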

Related

Why is importdata not working here?

I have two data text files with the same text content, but they have different sizes. The following snapshot compares them (using Beyond Compare).
It seems that the hex content of the files is different.
The MATLAB function importdata reads the file on the left fine but gives the following error with the file on the right (the bigger file):
Unable to load file. Use TEXTSCAN or FREAD for more complex formats.
What exactly is the difference between the two files?
How can I make importdata work with the file on the right?
The problem and solution are already mentioned in the comments:
it seems like one text file is in ASCII encoding (that is, 8 bits per char) while the second one is Unicode (16 bits per char). Try converting the big file into simple ASCII and re-reading it.
You can already see the difference in BeyondCompare: on the left you have an ANSI file, on the right a Unicode file with a BOM (the hex code looks like UTF-16LE; I'm not sure which version of BeyondCompare you used to get the screenshot - mine would show 'UTF-16LE' rather than 'Unicode' ...).
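As a sketch, the conversion could be done in Python, assuming the problem file really is UTF-16 with a BOM (the file names here are placeholders):

# Read the UTF-16 file (the "utf-16" codec consumes the BOM) and rewrite it
# as plain ASCII so importdata can parse it.
with open("data_utf16.txt", encoding="utf-16") as src:
    text = src.read()
with open("data_ascii.txt", "w", encoding="ascii") as dst:
    dst.write(text)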

CSV in bad Encoding

We have uploaded a file with bad encoding; now, when downloading it again, all the "strange" French characters are mixed up.
Example of the bad text:
R�union
When opening the CSV with OpenOffice we tried all of the encodings in the dropdown; none of them seems to work.
Does anyone have a way to fix the encoding so that we can view the characters correctly?
Link to the file: https://drive.google.com/file/d/0BwgeuQK3LAFRWkJuNHd2TlF2WjQ/view?usp=sharing
Kr.
Sadly there is no way to automatically fix the linked file. Consider the two words afectación and sécurité. In the file they have been converted incorrectly to afectaci?n and s?curit?. There is no way to convert the question marks back because sometimes they're ó and other times é.
(Actually, instead of question marks the file uses the Unicode replacement character U+FFFD, but that doesn't change the problem.)
Hopefully you have an earlier version of the file that has not been converted incorrectly.
Next time try to use a consistent encoding. This question gives some suggestions for how to do this.
If the original data cannot be obtained, there is one thing that could be done short of retyping the whole thing: it is possible to use dictionary lookups to guess the missing words. However, this would be a difficult project, and there would be mistakes where incorrect guesses were made. It's probably not worth it.
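For illustration, here is a small Python sketch of that dictionary-lookup idea (the word list is a stand-in for a real dictionary). Each replacement character is treated as a wildcard for exactly one character, which also shows why recovery is ambiguous whenever several words match:

import re

WORDS = ["afectación", "sécurité", "réunion"]  # stand-in for a real dictionary

def guess(damaged):
    # Treat each U+FFFD as a wildcard for a single unknown character.
    pattern = "".join("." if ch == "\ufffd" else re.escape(ch)
                      for ch in damaged)
    return [word for word in WORDS if re.fullmatch(pattern, word)]

print(guess("s\ufffdcurit\ufffd"))  # ['sécurité']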

Strange character rendered correctly in notepad, but as a control character elsewhere

I have a .csv list of businesses. The file has some strange characters in it. For example, in this field: Stocktonon-Tees, the first hyphen, between Stockton and on, seems to be a character with the value 6 rather than a hyphen with the value 45. Stack Overflow will probably sanitize this so you can't see it, so here is a pastebin:
http://pastebin.com/NuyyaQy9
Can anyone explain why this could be? Is it some encoding issue that I have missed? Or a corruption in the dataset?
Yes, it's almost certainly an encoding issue. A file just consists of binary data - it's how you interpret that binary data that matters. It sounds like Notepad is guessing at the originally-intended encoding, but whatever else you're using isn't.
Unfortunately you haven't said anything about what software is trying to read the file or what wrote it in the first place - but you should look at what encoding Notepad thinks it is, and work from there.
If it's your code that wrote the file out, and you get to decide the encoding, I'd recommend UTF-8 as a good general purpose, platform-portable encoding.
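If you want to check the raw bytes yourself instead of trusting any editor's guess, a quick Python sketch (the file name is a placeholder; the question reports byte value 6 where a hyphen should be):

# Dump the bytes around the suspicious field to see what is actually stored.
data = open("businesses.csv", "rb").read()
idx = data.find(b"Stockton")
print(data[idx:idx + 20])  # a control byte prints as e.g. \x06, a real hyphen as "-"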

How exactly are TMX map files base64-encoded?

I am writing a game for iOS that uses .tmx map files. I am creating the maps in the application 'Tiled' and then at some point before they get to iOS, I'm parsing them with Perl.
When I save the files as straight XML, it's a cinch for Perl to parse them. However, cocos2d insists that the files be base64-encoded. The 'Tiled' map editor has no problem saving files with this encoding scheme, and iOS reads them just fine, but they present problems for my Perl code.
For some reason, the standard MIME::Base64 decode_base64() method in Perl is not cutting the mustard here - when I decode the strings, I get one or two binary characters - question marks in diamond boxes and such.
And the vague documentation for the TMX file format makes it unclear whether there is some other encoding going on before or after the base64 encoding which might be causing these problems. I looked at the C++ source for the encoder and saw lots of references to Latin1, but I couldn't decipher what's going on in detail.
I noticed that when I tried doing my own tests with MIME::Base64, encoding and then decoding a test string, the encoded text looks dramatically different from what I see coming out of the TMX files - for instance, my base64-encoded text for a short string looks like this:
aGVyZSBpcyBhIHNlbnRlbmNl
But the base64-encoded text coming from the TMX files looks like this:
9QAAAAABAAANAQAAGAEAAA==
Any suggestions on what else I might try to decode a string that looks like that?
I think this page might be what you're looking for. It suggests that first you decode_base64, then (if the compression="gzip" attribute is present) use gunzip to uncompress it, and finally use unpack('V*', $data) to extract the list of 4-byte little-endian integers.
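For illustration, the same pipeline in Python (the function name is mine; Perl's unpack('V*', ...) corresponds to little-endian unsigned 32-bit integers):

import base64, gzip, struct

def decode_tmx_layer(b64_text, compression=None):
    raw = base64.b64decode(b64_text)
    if compression == "gzip":  # only when the layer has compression="gzip"
        raw = gzip.decompress(raw)
    # Each tile GID is a 4-byte little-endian unsigned integer.
    return struct.unpack("<%dI" % (len(raw) // 4), raw)

print(decode_tmx_layer("9QAAAAABAAANAQAAGAEAAA=="))  # (245, 256, 269, 280)

Decoding the question's sample string this way yields the tile GIDs 245, 256, 269 and 280 - plausible map data rather than text, which is why decode_base64 alone appeared to produce garbage.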

How to figure out what an encoded string contains

I have a string that looks like this
H4sIALYnhUsCA9VXW5aDIAz9zypcgiU8dDnTWtfQ5Q8kEgSR
ap05c+YnhxLyumBu2r/s2PUvO3nh+rCaw0oFob1Q+Z51HfjNZ1jexCSsLAYx
BGG6eATZGJYALIIzG9QOy4NeaPYAyyarKfQY7TgypTjGI3ogkxDahSTw7kX/
FQUHeIgxsoClQD1JGRKF7Jy4oXNeQFou5TvJzlkJoAUIMuGAOlePMTEGWQry
2liLCfHNJPEwuiU7jmzEhM6gnGawSO3ORMnqLQRsNgki7AV4jEI9xKRU65V6
q7UUZVetqsZQC13z3UzMXkkM24nlvs+B/EktqmsnC0dxelvLycTaN+QugYw/
DTJeeTD4iy/ZXQHZ/KuXjH/2kvFKYtfaBfXtaUtlVZCZiIxw5WPLLxkFQZ2D
mMBmUaQJYCKyyBlShVqMuHUFSzu5/UTY1sVMVpwzSnimpEFOz5G7nKSoheIt
yqjg+pxU54zE64jd3zzdrYmW6Ybic2mVvcjAUKfg0s0QMfAXDadyotuGxOdH
hwZIU4NPR2fqbApbVnirTRdFGc/cjr7KwhmV+m6GGbMnf+RetoNNGwiohW4D
AREJ1R0FAhqo7gDx4b18iBh/uWPeGkwc07mMmdtKbBe0WQy9PMpr6TpLZwhR
whmj8/8FjTEWsv8ckhimqgj9+2q0hfWH1WpFCXPYfX27mEMGupKe1QA+gkwd
PDVv/xO+AbHzd9RzDQAA
My initial guess was that this was a base64-encoded file of some sort. Any ideas on how I can figure out if/which type of file it is? It should contain MIME info, I guess, but how would I save it to a file without fragmenting it?
It's base64. When you decode it, you get a gzipped file, which consists of a boatload of hex characters (literally, as ASCII 0xNN hex characters). They're mostly in the A-Z, a-z range.
I'd paste it here, but from this, I suspect this is part of some exercise you're doing, so I think I'll leave it to you to figure out.
P.S. For edification: I determined that the binary output was a gzipped file by using the Unix file command, which identifies files by their "magic" bytes. Use your decode_base64 function or whatever it is, then dump the return value into a file and gunzip it.
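That last step might look like this in Python (the blob is assumed to be saved in blob.txt; joining the lines first avoids the fragmenting problem the question mentions):

import base64, gzip

blob = open("blob.txt").read().replace("\n", "")  # the base64 text, lines rejoined
raw = base64.b64decode(blob)
print(raw[:2] == b"\x1f\x8b")  # True: the gzip magic bytes `file` keys on
open("payload", "wb").write(gzip.decompress(raw))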