When I decode a base64 string (using one of the online decoders), the decoded data contains several special characters, such as square blocks and characters like ` and ".
Base64 encodes binary data as printable characters. If you decode it, the string is turned back into the original binary data, and some of those bytes won't have a printable ASCII/Unicode representation, so they show up as squares. This is normal behaviour. You should decode the data in the program that is actually going to use it.
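A minimal sketch of that behaviour, assuming Python's standard `base64` module. The sample bytes are made up for illustration; they are not taken from any real file.

```python
import base64

# Hypothetical binary payload: several bytes are not printable text.
raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

encoded = base64.b64encode(raw).decode("ascii")  # printable ASCII text
decoded = base64.b64decode(encoded)              # the original bytes again

# Forcing the bytes to display as text is what produces the "squares":
# undecodable bytes are replaced with U+FFFD.
shown_as_text = decoded.decode("utf-8", errors="replace")

print(encoded)
print(decoded == raw)
print(repr(shown_as_text))
```

The decoded bytes are intact; only the attempt to render them as text is lossy.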
I have the following base64 encoded strings with some bizarre pattern.
0069a1/jE0MgjRAQaqv3N9cXpzf01WmZ9OUUJZrqx0Yndme2JXRZSVSEJPSLuCdX5ncXF9TE2PnEhPMHq6sWJ0ZWBrcUdMlI8wLDR6urFidHJmd2NLR5WUX31CSa27fHR5e2zSsKsxabDhvpweRqrRjJfSirjoaUbhubTmRmfRibDUipHosGAxuIjmvnkeiZTUjIISOSgxaZfhvrAeRrYRBhTSi4joaXDhuIHmRmo8O9SMmdKxgDFps+G+gR5GsNGMhdKKkehpfOG5pOZHdtGJhtSKteiwSDG4iOa+Ux6JkxQCEtKwgjFpsOG/jh5GltGNttKKqehpViHhv4YeRpDRjIPSi4DoaXzhuafmRkvRiLDUirDosVgxuarmvl0eiYbUjJ/SsJMxaabhvpLz9B4bGhoc0rCCMWm24b+OHkac0Yye0oq66Glw4bm35kZzEdGMqtKKuuhpcuG5tOZHftGJv9SKp+iwYzG5oOa+Ux6JlBQBAgIo6Glw4bmy5kZb0YmR1IqG6LBQMbmi5r5PHomQFBzSiqroaWPhua3mRlvRiYXUirPosVAxuarmv3YeibbUjLjSsLoxaZThvrLX8zvRjJ/Si4HoaWPhuafmR33Rib7Ui7vosVExuYDmvkoeiagUDBLSsYgxaaDhv4EeRqXRjKHSi4DoaXzhuaHmRk/Ria0U0ouI6GlP4bml5kZM0YmE1Iqf6LBjMbmJ5r9+Homq1I210rCRMWmf4b6yHkaY0Yyh0ouA6Gl84bmT5kZP0YmuSHZ9W02En0hVRUm6u28AAA==t=
0069a1M83GVi8zMwVnco+Bg4gXG2pxe318Y0FaY2GInoWUHwZwYnZ3enBMS3ZPiYKVgxUZa2ptfnp9M3l3fJ6Il5IPFWBrdm0CHjd5d3yeiICUEwdsYHd2bU9BSmB2gIiLiQi2l4zTi4LTvZ/Ti1YtfmW27p/Pi6TTi7fli6otdUIm7vXPl4LTirrlvbTTdWgmfuZ2Hg/Ti6XTvbPTi0rt9Oa276/Pi5LTioLli6fAxyZ+/baWp9OLgdO9gtOLTC1+d7buts+LntOLp+WKuy11dCbu0c+XqtOKuuW9ntN1b+bwdraXpdOLgtO8jdOLai1/RLbujs+LtBPTvIXTi2wtfnG276fPi57Ti6Tli4YtdEIm7tTPlrrTi5jlvZDTdXomfvu2l7TTi5TTvZE+OeLn6Oh4tpel04uE07yN04tgLX5stu6dz4uS04u05Yu+7S1+WLbunc+LkNOLt+WKsy11TSbuw8+XgdOLkuW9ntN1aObzZmYPz4uS04ux5YuWLXVjJu7iz5ey04uQ5b2C03Vs5u627o3Pi4HTi67li5YtdXcm7tfPlrLTi5jlvLvTdUomfty2l53Ti6bTvbEaPsctfm2276bPi4HTi6TlirAtdUwm79/PlrPTi7LlvYfTdVTm/na2lq/Ti5LTvILTi1ktflO276fPi57Ti6Lli4ItdV/mtu+vz4ut04um5YuBLXV2Ju77z5eB04u75byz03VWJn/Rtpe204ut072x04tkLX5Ttu+nz4ue04uQ5YuCLXVcuhIZfGpmfXpnRkp3dpP88g==t=
0069a1kA0HZTAANvPE0U9BQkkkKHVuSE55ZreswMJIXkRVLDVvfUVEf3W6vdXsSUJUQiYqdHVeTX94xY/U315IVlM8Jn90RV4HG8GP1N9eSEFVIDRzf0RFaEq3vMPVQEhKSDuFiJPguIfWS2lwKJbtv6SF3YDQuJfWjkETKAnttYPn3cbQiLHgj78TSxdwtajnv9VFARDguKDWS0VwKIotNSeF3LDQuKHWj3QTKAQAB+e/zoWJuOC4hNZLdHAojO2/toXdqdC4rdaOURMpGO21tefd4tCImeCPvxNLPXC1rycxRYWIuuC4h9ZKe3Aoqu2+hYXdkdC4hxbWSnNwKKztv7CF3LjQuK3WjlITKCXttIPn3efQiYngjp0TSzNwtbrnv8iFiKvguJHWS2edmiInKSlLhYi64LiB1kp7cCig7b+thd2C0Lih1o5CEygdLe2/mYXdgtC4o9aOQRMpEO21jOfd8NCIsuCOlxNLPXC1qCcyVVUQ0Lih1o5HEyg17bWi593R0IiB4I6VE0shcLWsJy+F3ZLQuLLWjlgTKDXttbbn3eTQiYHgjp0TShhwtYrnv++FiILguKPWS0e5nQftv6yF3LnQuLLWjlITKRPttY3n3OzQiYDgjrcTSyRwtZQnP0WFibDguJfWSnRwKJntv5KF3LjQuK3WjlQTKCHttZ4nhdyw0Lie1o5QEygi7bW3593I0Iiy4I6+E0oQcLWW577ihYip4Lio1ktHcCik7b+Shdy40Lit1o5mEygh7bWdeyEqY3VVTn9isLzU1VM8Mw==t=
0094dfBQcJXQfMLwJRREVLTEccEEJZhIJgf0ZdVVdCVEpbFA1YSomIZmxLTEB5Q0haTB4SQ0KSgWZhNH5BSlRCWF0EHkhDiZIeAjB+QUpUQk9bGAxESIiJcVNGTVZASkJERgO9v6QsdJ7PupjlvZznsaq95bfndFvPl7DivZznv43p5f7nv30slqbiuoLlv6Lpse19NicsdLnPurTlvYAnOym95IfndG3PloXivZEKDemx9r2+jyx0nc+6heW9huexuL3lnud0Yc+XoOK8jee/u+nl2ue/VSyWpuK6qOW/pSk/fb2/jSx0ns+7iuW9oOewi73lpud0Sw/Pu4Llvabnsb695I/ndGHPl6PivbDnvo3p5d/nvkUsl4Tiuqblv7DpsfC9v5wsdIjPupYIDygtJydzvb+NLHSYz7uK5b2q57GjveW153Rtz5ez4r2IJ+exl73lted0b8+XsOK8hee/gunlyOe/fiyXjuK6qOW/oik8bW0n53Rtz5e24r2g57+s6eXp579NLJeM4rq05b+mKSG95aXndH7Pl6nivaDnv7jp5dznvk0sl4Tiu43lv4Dpsde9v7UsdLrPurYsCA3nsaK95I7ndH7Pl6PivIbnv4Pp5NTnvkwsl67iurHlv54pMX29vocsdI7Pu4XlvZPnsZy95I/ndGHPl6XivbTnv5ApveSH53RSz5eh4r2357+56eXw579+LJen4ruF5b+c6bDavb+eLHSxz7q25b2u57GcveSP53Rhz5eX4r2057+TdRkSVEKZgmZ7QU1BQFk2PQ==J=
0094df/M8GDV7OBDaovY2DQ0hMQBsAhoBLVHJprK6KnEVURF0BE4uKTUd/eLmAi4BVQ05CGhuQg01KAEq4s5yKV1JUThEai5A2eHJ5r7mJnUNXTkIaG5CyQEtlc7G5goBY5rWuvuZ/Lrye1o5nHHds5r697ebJLna25I6vHERLL76l7bXvvndH5Lyx1kRZL3e2Jj8tvudOLryl1o97HHdbCwzttfW+d0bkvITWRHsvd4fmtby+5lcuvKnWjl4cdkfmvr/t5tkudp3kj78cRGIvvqQtO36+dkTkvIfWRXQvd6HmtI++5m8uvIMW1kV8L3en5rW6vudGLryp1o5dHHd65r+J7ebcLneN5I6dHERsL76x7bXzvnZV5LyR1kRowsUpLCMjcL52ROS8gdZFdC93q+a1p77mfC68pdaOTRx3QibmtZO+5nwuvKfWjk4cdk/mvobt5ssudrbkjpccRGIvvqMtOG5u7i68pdaOSBx3aua+qO3m6i52heSOlRxEfi++py0lvuZsLry21o5XHHdq5r687ebfLneF5I6dHEVHL76B7bXUvnZ85Lyj1kRI5sIM5rWmvudHLry21o5dHHZM5r6H7efXLneE5I63HER7L76fLTV+vndO5LyX1kV7L3eS5rWYvudGLryp1o5bHHd+5r6ULb7nTi68mtaOXxx3fea+ve3m8y52tuSOvhxFTy++ne202b52V+S8qNZESC93r+a1mL7nRi68qdaOaRx3fua+l3EaEZ2LUUp/Yr+zi4pYNzk=J=
I tried cutting out the parts that are not valid base64 and then decoding the rest. I tried decoding them with different character sets, but the result is still unreadable text. I'm not sure how the previous developer encoded this data. All I know is that he encoded them with 2-layer base64 encoding and that the result should be readable text in Thai.
Can anyone see a pattern that could help me identify and decode these strings?
I am working on a project where I am getting parts of base64 encoded data, but not the whole thing. Is it possible to figure out what that part of the base64 encoded data was?
For example, say I base64 encode hello world
It becomes aGVsbG8gd29ybGQ=
But say I am only able to capture sbG8gd29y
Which base64 decodes to ݽ
I am familiar with how the base64 encoding process works, and the only approach I can think of for working out which part of the message a fragment comes from is to add data to the front and back of the chunk and compare the results with dictionary words. The problem is that I am not even 100% sure that the data I am working with includes dictionary words.
Thanks
I just spent a little time using an online converter (http://www.convertstring.com/EncodeDecode/Base64Decode).
If you take your captured section and run it through the converter, you can see that it has an invalid length for a base64 encoded string.
For a captured section to have a valid length you will need to add some extra characters (0-3 depending on the length of the section). A valid base64 string has a length that is exactly divisible by 4.
Pick a character ('a' for example) and then run through the possibilities of adding the required number of characters to the section, at the front and at the back. With the added characters the string will be decodable, and one of the decoded values will be more readable than the others; that is the one that contains the partially decoded data.
E.g.:
sbG8gd29yaaa
and
aaasbG8gd29y
decodes to:
����ݽɦ�
and
i��lo wor
You can make a rudimentary programmatic test for readability by counting the number of 'normal' characters within the string (a-z for example). You will need to decide for yourself what counts as 'normal'; it depends on the expected language of the data and the context (for example, whether it is known to be numeric only).
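The padding-and-scoring idea above could be sketched as follows. This is only an illustration: the function names `candidate_decodings` and `readability` are made up here, the filler character is 'A' rather than 'a', and "letters and spaces" is an assumed definition of 'normal' that will not suit every kind of data.

```python
import base64

def candidate_decodings(fragment, filler="A"):
    """Decode the fragment at each of the 4 possible alignments."""
    results = []
    for lead in range(4):
        padded = filler * lead + fragment
        # Pad the tail so the total length is a multiple of 4.
        padded += filler * (-len(padded) % 4)
        results.append((lead, base64.b64decode(padded)))
    return results

def readability(data):
    """Fraction of bytes that are ASCII letters or spaces (an assumed metric)."""
    if not data:
        return 0.0
    good = sum(
        1 for b in data
        if b == 0x20 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A
    )
    return good / len(data)

candidates = candidate_decodings("sbG8gd29y")
best_lead, best_bytes = max(candidates, key=lambda c: readability(c[1]))
print(best_lead, best_bytes)
```

For the captured fragment from the question, the alignment with three filler characters in front scores highest and contains the bytes for "lo wor". The bytes at the edges of the fragment stay corrupted because part of their bits came from the filler.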
All base64 strings have a header (according to Wikipedia it's 814 bytes). I was wondering whether these headers are the same for data of the same type, since I've noticed that when I encode JPEG images in base64, the first 500+ characters are the same.
Not true. When base64 encoding a byte stream, the first three bytes of the byte stream get converted to the first four characters of the base64 encoded string. There are no headers.
See the example section of the Wikipedia article on Base64, where the three ASCII encoded bytes for the string Man get encoded to four base64 characters TWFu.
So if two base64 encoded byte streams start with the same characters, the original byte streams must also have started with the same bytes. All JPEG files start with the magic number bytes FF D8, possibly followed by a format string and image metadata before the actual image data. See the Wikipedia article on magic numbers (programming).
The headers mentioned in the Wikipedia article on Base64 are MIME headers for email attachments.
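A short sketch of both points, assuming Python's standard `base64` module. The two byte streams below are invented stand-ins that merely begin with the same four bytes (FF D8 FF E0, a common JPEG opening); they are not real image data.

```python
import base64

# Three input bytes map to exactly four output characters.
man = base64.b64encode(b"Man").decode("ascii")
print(man)

# Two made-up streams that share only their first four bytes.
shared_start = bytes([0xFF, 0xD8, 0xFF, 0xE0])
stream_a = shared_start + b"first image payload"
stream_b = shared_start + b"another payload entirely"

encoded_a = base64.b64encode(stream_a).decode("ascii")
encoded_b = base64.b64encode(stream_b).decode("ascii")

# The encoded forms agree for as long as the underlying bytes agree.
print(encoded_a[:8])
print(encoded_b[:8])
```

The shared prefix in the output comes purely from the shared leading bytes of the input, not from anything base64 adds.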
I am encoding a string to base64 encoded data.
Edit: removed irrelevant base64 conversion code
Would there be any problem when I try to encode mixed English and Arabic data, given that we are using
base64Data = [string dataUsingEncoding:NSASCIIStringEncoding];
I heard that NSASCIIStringEncoding should not be used with Unicode encoded string.
Base64 encodes data (raw bytes) and produces ASCII encoded strings. So your problem is in converting your string into an encoded byte array.
You could use any encoding that contains both Arabic and English characters. But you have to make sure the recipient of the base64 encoded message knows which encoding was used and can decode it.
UTF-8 is a good place to start.
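To illustrate the point in Python rather than Objective-C (an assumption made for brevity; the sample text is invented), the text is first turned into bytes with UTF-8 and only then base64 encoded. Python's ASCII codec refuses the same text outright.

```python
import base64

text = "Hello مرحبا"  # mixed English and Arabic sample text

# ASCII cannot represent the Arabic characters.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can, so encode to bytes first, then apply base64.
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")

# The recipient reverses both steps, using the same text encoding.
round_trip = base64.b64decode(encoded).decode("utf-8")

print(ascii_ok)
print(encoded)
print(round_trip == text)
```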
I'm trying to understand what the input requirements are for base64 encoding. Nicholas Zakas, for whom I have tremendous respect, has an article (Zakas Article on base64) in which he quotes a specification saying that an error should be thrown if the input contains any character with a code higher than 255:
Before even attempting to base64 encode a string, you should check to see if the string contains only ASCII characters. Since base64 encoding requires eight bits per input character, any character with a code higher than 255 cannot be accurately represented. The specification indicates that an error should be thrown in this case:
if (/([^\u0000-\u00ff])/.test(text)){
    throw new Error("Can't base64 encode non-ASCII characters.");
}
In a separate part of the article he links to RFC 3548, but I don't see any input requirements there other than:
Implementations MUST reject the encoding if it contains characters
outside the base alphabet when interpreting base encoded data, unless
the specification referring to this document explicitly states
otherwise.
Not sure what "base alphabet" means but perhaps this is what Zakas is referring to. But by saying they must reject the encoding it seems to imply that this is something that has already been encoded as opposed to the input (of course if the input is invalid it will also show up in the encoding so perhaps the point is moot).
A bit confused on what the standard is.
Fundamentally, it's a mistake to talk about "base64 encoding a string" where "string" is meant in terms of text.
Base64 encoding is applied to binary data (a sequence of bytes, or octets if you want to be even more picky), and the result is text. Every character in the output is printable ASCII text. The whole point of base64 is to provide a safe way of converting arbitrary binary data into a text format which can be reliably embedded in other text, transported etc. ASCII is compatible with almost all character sets, so you're very unlikely to be unable to encode ASCII text as part of something else.
When someone talks about "base64 encoding a string" they're really talking about encoding text as binary using some existing encoding (e.g. UTF-8), then applying a base64 encoding to the result. When decoding, you'd need to decode the base64 back to binary, and then decode that binary data with the original encoding, to get the original text.
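A small sketch of that two-step process, assuming Python's standard `base64` module and an invented Thai sample string. It also shows why the decoder needs to know the original text encoding: the same text gives different base64 under UTF-8 and UTF-16.

```python
import base64

text = "สวัสดี"  # sample Thai text, chosen only for illustration

# Step 1: text -> bytes (choice of encoding). Step 2: bytes -> base64 text.
b64_utf8 = base64.b64encode(text.encode("utf-8")).decode("ascii")
b64_utf16 = base64.b64encode(text.encode("utf-16-le")).decode("ascii")

# Decoding reverses the steps, and must use the same text encoding.
right = base64.b64decode(b64_utf16).decode("utf-16-le")

# Using the wrong encoding here happens not to raise an error;
# it silently yields different, meaningless text.
wrong = base64.b64decode(b64_utf16).decode("utf-8")

print(b64_utf8 != b64_utf16)
print(right == text)
print(wrong == text)
```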
For me the (first) linked article has a fundamental problem:
Before even attempting to base64 encode a string, you should check to see if the string contains only ASCII characters
You don't base64 encode strings. You base64 encode byte sequences. And when you're dealing with any kind of encoding work, it's extremely important to keep in mind this difference.
Also, his check for 'ASCII' actually lets through everything from 0x80 to 0xff, which isn't ASCII; ASCII is only 0x00 to 0x7f.
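The gap between the two ranges can be seen with a quick check. The helper `passes_article_check` is a hypothetical Python rendering of the article's "no code above 255" test, not the article's own code, and the sample strings are invented.

```python
def passes_article_check(text):
    """True if every character has a code of 255 or lower."""
    return all(ord(ch) <= 0xFF for ch in text)

samples = ["plain", "café", "สวัสดี"]

for sample in samples:
    # str.isascii() is the stricter test: codes 0x00 to 0x7f only.
    print(sample, passes_article_check(sample), sample.isascii())
```

Here "café" passes the 255 check even though "é" (code 0xE9) is not an ASCII character.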
Now, if you have a string which you have checked is pure ASCII, you can then safely treat it as a byte sequence of the ASCII values of the characters in it - but this is a separate earlier step, nothing strictly to do with the act of base64 encoding.
(I should say that I do like his repeated urging for the reader to note that base64 encoding is not in any shape or form encryption.)