Should I use utf-8 or base64 to encode email in url parameter?
As far as I am aware base64 is for binary data encoding while utf-8 is to encode text. So I assume utf-8 is the right answer? However I see lots of cases where emails are required to be encoded in base64.
I was just wondering what was best practice?

Neither. URL parameters are encoded using percent-encoding.
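A minimal sketch of what that looks like in practice, using Python's standard library. The email address and the api.example.com endpoint are made up for illustration.

```python
from urllib.parse import quote, urlencode, urlsplit, parse_qs

email = "first.last+tag@example.com"  # hypothetical address

# quote() percent-encodes reserved characters; safe="" means "/" is escaped too.
encoded = quote(email, safe="")

# urlencode() builds a whole query string and escapes each value.
url = "https://api.example.com/users?" + urlencode({"email": email})

# The receiving side reverses the escaping when it parses the query string.
decoded = parse_qs(urlsplit(url).query)["email"][0]

print(encoded)
print(url)
print(decoded)
```

Note that "+" has to be escaped as %2B: a literal "+" in a query string is commonly decoded as a space, which would silently corrupt addresses that contain it.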

Related

How to decode 2-layer unknown base64 encoding

I have the following base64 encoded strings with some bizarre pattern.
0069a1/jE0MgjRAQaqv3N9cXpzf01WmZ9OUUJZrqx0Yndme2JXRZSVSEJPSLuCdX5ncXF9TE2PnEhPMHq6sWJ0ZWBrcUdMlI8wLDR6urFidHJmd2NLR5WUX31CSa27fHR5e2zSsKsxabDhvpweRqrRjJfSirjoaUbhubTmRmfRibDUipHosGAxuIjmvnkeiZTUjIISOSgxaZfhvrAeRrYRBhTSi4joaXDhuIHmRmo8O9SMmdKxgDFps+G+gR5GsNGMhdKKkehpfOG5pOZHdtGJhtSKteiwSDG4iOa+Ux6JkxQCEtKwgjFpsOG/jh5GltGNttKKqehpViHhv4YeRpDRjIPSi4DoaXzhuafmRkvRiLDUirDosVgxuarmvl0eiYbUjJ/SsJMxaabhvpLz9B4bGhoc0rCCMWm24b+OHkac0Yye0oq66Glw4bm35kZzEdGMqtKKuuhpcuG5tOZHftGJv9SKp+iwYzG5oOa+Ux6JlBQBAgIo6Glw4bmy5kZb0YmR1IqG6LBQMbmi5r5PHomQFBzSiqroaWPhua3mRlvRiYXUirPosVAxuarmv3YeibbUjLjSsLoxaZThvrLX8zvRjJ/Si4HoaWPhuafmR33Rib7Ui7vosVExuYDmvkoeiagUDBLSsYgxaaDhv4EeRqXRjKHSi4DoaXzhuaHmRk/Ria0U0ouI6GlP4bml5kZM0YmE1Iqf6LBjMbmJ5r9+Homq1I210rCRMWmf4b6yHkaY0Yyh0ouA6Gl84bmT5kZP0YmuSHZ9W02En0hVRUm6u28AAA==t=
0069a1M83GVi8zMwVnco+Bg4gXG2pxe318Y0FaY2GInoWUHwZwYnZ3enBMS3ZPiYKVgxUZa2ptfnp9M3l3fJ6Il5IPFWBrdm0CHjd5d3yeiICUEwdsYHd2bU9BSmB2gIiLiQi2l4zTi4LTvZ/Ti1YtfmW27p/Pi6TTi7fli6otdUIm7vXPl4LTirrlvbTTdWgmfuZ2Hg/Ti6XTvbPTi0rt9Oa276/Pi5LTioLli6fAxyZ+/baWp9OLgdO9gtOLTC1+d7buts+LntOLp+WKuy11dCbu0c+XqtOKuuW9ntN1b+bwdraXpdOLgtO8jdOLai1/RLbujs+LtBPTvIXTi2wtfnG276fPi57Ti6Tli4YtdEIm7tTPlrrTi5jlvZDTdXomfvu2l7TTi5TTvZE+OeLn6Oh4tpel04uE07yN04tgLX5stu6dz4uS04u05Yu+7S1+WLbunc+LkNOLt+WKsy11TSbuw8+XgdOLkuW9ntN1aObzZmYPz4uS04ux5YuWLXVjJu7iz5ey04uQ5b2C03Vs5u627o3Pi4HTi67li5YtdXcm7tfPlrLTi5jlvLvTdUomfty2l53Ti6bTvbEaPsctfm2276bPi4HTi6TlirAtdUwm79/PlrPTi7LlvYfTdVTm/na2lq/Ti5LTvILTi1ktflO276fPi57Ti6Lli4ItdV/mtu+vz4ut04um5YuBLXV2Ju77z5eB04u75byz03VWJn/Rtpe204ut072x04tkLX5Ttu+nz4ue04uQ5YuCLXVcuhIZfGpmfXpnRkp3dpP88g==t=
0069a1kA0HZTAANvPE0U9BQkkkKHVuSE55ZreswMJIXkRVLDVvfUVEf3W6vdXsSUJUQiYqdHVeTX94xY/U315IVlM8Jn90RV4HG8GP1N9eSEFVIDRzf0RFaEq3vMPVQEhKSDuFiJPguIfWS2lwKJbtv6SF3YDQuJfWjkETKAnttYPn3cbQiLHgj78TSxdwtajnv9VFARDguKDWS0VwKIotNSeF3LDQuKHWj3QTKAQAB+e/zoWJuOC4hNZLdHAojO2/toXdqdC4rdaOURMpGO21tefd4tCImeCPvxNLPXC1rycxRYWIuuC4h9ZKe3Aoqu2+hYXdkdC4hxbWSnNwKKztv7CF3LjQuK3WjlITKCXttIPn3efQiYngjp0TSzNwtbrnv8iFiKvguJHWS2edmiInKSlLhYi64LiB1kp7cCig7b+thd2C0Lih1o5CEygdLe2/mYXdgtC4o9aOQRMpEO21jOfd8NCIsuCOlxNLPXC1qCcyVVUQ0Lih1o5HEyg17bWi593R0IiB4I6VE0shcLWsJy+F3ZLQuLLWjlgTKDXttbbn3eTQiYHgjp0TShhwtYrnv++FiILguKPWS0e5nQftv6yF3LnQuLLWjlITKRPttY3n3OzQiYDgjrcTSyRwtZQnP0WFibDguJfWSnRwKJntv5KF3LjQuK3WjlQTKCHttZ4nhdyw0Lie1o5QEygi7bW3593I0Iiy4I6+E0oQcLWW577ihYip4Lio1ktHcCik7b+Shdy40Lit1o5mEygh7bWdeyEqY3VVTn9isLzU1VM8Mw==t=
0094dfBQcJXQfMLwJRREVLTEccEEJZhIJgf0ZdVVdCVEpbFA1YSomIZmxLTEB5Q0haTB4SQ0KSgWZhNH5BSlRCWF0EHkhDiZIeAjB+QUpUQk9bGAxESIiJcVNGTVZASkJERgO9v6QsdJ7PupjlvZznsaq95bfndFvPl7DivZznv43p5f7nv30slqbiuoLlv6Lpse19NicsdLnPurTlvYAnOym95IfndG3PloXivZEKDemx9r2+jyx0nc+6heW9huexuL3lnud0Yc+XoOK8jee/u+nl2ue/VSyWpuK6qOW/pSk/fb2/jSx0ns+7iuW9oOewi73lpud0Sw/Pu4Llvabnsb695I/ndGHPl6PivbDnvo3p5d/nvkUsl4Tiuqblv7DpsfC9v5wsdIjPupYIDygtJydzvb+NLHSYz7uK5b2q57GjveW153Rtz5ez4r2IJ+exl73lted0b8+XsOK8hee/gunlyOe/fiyXjuK6qOW/oik8bW0n53Rtz5e24r2g57+s6eXp579NLJeM4rq05b+mKSG95aXndH7Pl6nivaDnv7jp5dznvk0sl4Tiu43lv4Dpsde9v7UsdLrPurYsCA3nsaK95I7ndH7Pl6PivIbnv4Pp5NTnvkwsl67iurHlv54pMX29vocsdI7Pu4XlvZPnsZy95I/ndGHPl6XivbTnv5ApveSH53RSz5eh4r2357+56eXw579+LJen4ruF5b+c6bDavb+eLHSxz7q25b2u57GcveSP53Rhz5eX4r2057+TdRkSVEKZgmZ7QU1BQFk2PQ==J=
0094df/M8GDV7OBDaovY2DQ0hMQBsAhoBLVHJprK6KnEVURF0BE4uKTUd/eLmAi4BVQ05CGhuQg01KAEq4s5yKV1JUThEai5A2eHJ5r7mJnUNXTkIaG5CyQEtlc7G5goBY5rWuvuZ/Lrye1o5nHHds5r697ebJLna25I6vHERLL76l7bXvvndH5Lyx1kRZL3e2Jj8tvudOLryl1o97HHdbCwzttfW+d0bkvITWRHsvd4fmtby+5lcuvKnWjl4cdkfmvr/t5tkudp3kj78cRGIvvqQtO36+dkTkvIfWRXQvd6HmtI++5m8uvIMW1kV8L3en5rW6vudGLryp1o5dHHd65r+J7ebcLneN5I6dHERsL76x7bXzvnZV5LyR1kRowsUpLCMjcL52ROS8gdZFdC93q+a1p77mfC68pdaOTRx3QibmtZO+5nwuvKfWjk4cdk/mvobt5ssudrbkjpccRGIvvqMtOG5u7i68pdaOSBx3aua+qO3m6i52heSOlRxEfi++py0lvuZsLry21o5XHHdq5r687ebfLneF5I6dHEVHL76B7bXUvnZ85Lyj1kRI5sIM5rWmvudHLry21o5dHHZM5r6H7efXLneE5I63HER7L76fLTV+vndO5LyX1kV7L3eS5rWYvudGLryp1o5bHHd+5r6ULb7nTi68mtaOXxx3fea+ve3m8y52tuSOvhxFTy++ne202b52V+S8qNZESC93r+a1mL7nRi68qdaOaRx3fua+l3EaEZ2LUUp/Yr+zi4pYNzk=J=
I tried cutting the incorrect base64 pattern out and then decoding the rest. I also tried decoding with different character sets, but the result is still unreadable text. I'm not sure how the previous developer encoded this data. All I know is that he used a 2-layer base64 encoding and that the result should be readable text in Thai.
Can anyone see a pattern that could help me identify and decode these strings?

Arabic character base64 encoding issue

I am encoding a string to base64 encoded data.
Edit: removed irrelevant base64 conversion code
Would there be any problem when encoding mixed English and Arabic data, given that we are using
base64Data = [string dataUsingEncoding:NSASCIIStringEncoding];
I heard that NSASCIIStringEncoding should not be used with Unicode strings.
Base64 encodes data (raw bytes) and produces ASCII encoded strings. So your problem is in converting your string into an encoded byte array.
You could use any encoding that covers both Arabic and English characters, but you have to make sure the recipient of the base64-encoded message knows which encoding was used.
UTF-8 is a good place to start.
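To illustrate the point, here is a rough sketch in Python rather than Objective-C. The sample string is made up. It shows that ASCII cannot hold the Arabic letters, that the choice of text encoding changes the base64 output, and that the recipient needs to decode with the same encoding.

```python
import base64

# "Hello " followed by five Arabic letters (sample text, chosen for illustration).
text = "Hello \u0645\u0631\u062d\u0628\u0627"

# A strict ASCII encode fails because the Arabic letters are outside ASCII.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Step 1: text -> bytes (pick an encoding). Step 2: bytes -> base64 text.
utf8_b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
utf16_b64 = base64.b64encode(text.encode("utf-16-le")).decode("ascii")

# Same text, different bytes, therefore different base64 output.
encodings_differ = utf8_b64 != utf16_b64

# The recipient reverses both steps, using the agreed encoding.
round_trip = base64.b64decode(utf8_b64).decode("utf-8")

print(ascii_ok, encodings_differ, round_trip == text)
```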

Perl encodings question

I need to get a string from <STDIN>, written in latin and russian mixed encodings, and convert it to some url:
$search_url = "http://searchengine.com/search?text=" . uri_escape($query);
But this process goes wrong and produces Mojibake (a mixture of weird letters). What can I do with Perl to solve it?
Before you can get started, there are a few things you need to know.
You'll need to know the encoding of your input. "Latin" and "russian" aren't (character) encodings.
If you're dealing with multiple encodings, you'll need to know what is encoded using which encoding. "It's a mix" isn't good enough.
You'll need to know the encoding the site expects the query to use. This should be the same encoding as the page that contains the search form.
Then, it's just a matter of decoding the input using the correct encoding, and encoding the query using the correct encoding. That's the easy part. Encode provides functions decode and encode to do just that.
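The decode-then-encode flow can be sketched as follows. This is in Python rather than Perl, and both encodings are assumptions made purely for the example: KOI8-R for the input and UTF-8 for what the site expects. You would substitute whatever you find out in the steps above.

```python
from urllib.parse import quote

INPUT_ENCODING = "koi8-r"  # assumption: how the bytes on STDIN are encoded
SITE_ENCODING = "utf-8"    # assumption: what the search site expects


def build_search_url(raw_query: bytes) -> str:
    # Decode the raw input bytes into text using the input encoding.
    text = raw_query.decode(INPUT_ENCODING)
    # Encode the text into the bytes the site expects.
    site_bytes = text.encode(SITE_ENCODING)
    # Percent-encode those bytes for use in the URL.
    return "http://searchengine.com/search?text=" + quote(site_bytes, safe="")


# Stand-in for a line read from STDIN in binary mode.
raw = "perl книга".encode("koi8-r")
url = build_search_url(raw)
print(url)
```

Mojibake typically appears when one of these two steps is skipped or done with the wrong encoding, so the bytes end up interpreted under a different character set than the one that produced them.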

Using Rome to aggregate RSS feeds and Description not getting encoded. Why?

When I specify the encoding for my new feed as UTF-8 (matches my sources), my output feed has no encoding for the emdash and curly quotes. If I specify ISO-8859-1 for encoding (but I don't want to), the characters are encoded. How do I force it to encode for UTF-8?
I had the same problem when generating an RSS feed in a Java servlet. Just make sure that the response's character encoding matches the encoding of the feed output.
feed.setEncoding("UTF-8");
response.setCharacterEncoding("UTF-8");

base64 encoding: input character

I'm trying to understand what the input requirements are for base64 encoding. Nicholas Zakas, for whom I have tremendous respect, has an article (Zakas Article on base64) in which he quotes a specification saying that an error should be thrown if the input contains any character with a code higher than 255:
Before even attempting to base64 encode a string, you should check to see if the string contains only ASCII characters. Since base64 encoding requires eight bits per input character, any character with a code higher than 255 cannot be accurately represented. The specification indicates that an error should be thrown in this case:
if (/([^\u0000-\u00ff])/.test(text)){
throw new Error("Can't base64 encode non-ASCII characters.");
}
In a separate part of the article he links to RFC 3548, but I don't see any input requirements there other than:
Implementations MUST reject the encoding if it contains characters
outside the base alphabet when interpreting base encoded data, unless
the specification referring to this document explicitly states
otherwise.
Not sure what "base alphabet" means but perhaps this is what Zakas is referring to. But by saying they must reject the encoding it seems to imply that this is something that has already been encoded as opposed to the input (of course if the input is invalid it will also show up in the encoding so perhaps the point is moot).
A bit confused on what the standard is.
Fundamentally, it's a mistake to talk about "base64 encoding a string" where "string" is meant in terms of text.
Base64 encoding is applied to binary data (a sequence of bytes, or octets if you want to be even more picky), and the result is text. Every character in the output is printable ASCII text. The whole point of base64 is to provide a safe way of converting arbitrary binary data into a text format which can be reliably embedded in other text, transported etc. ASCII is compatible with almost all character sets, so you're very unlikely to be unable to encode ASCII text as part of something else.
When someone talks about "base64 encoding a string" they're really talking about encoding text as binary using some existing encoding (e.g. UTF-8), then applying a base64 encoding to the result. When decoding, you'd need to decode the base64 back to binary, and then decode that binary data with the original encoding, to get the original text.
For me the (first) linked article has a fundamental problem:
Before even attempting to base64 encode a string, you should check to see if the string contains only ASCII characters
You don't base64 encode strings. You base64 encode byte sequences. And when you're dealing with any kind of encoding work, it's extremely important to keep in mind this difference.
Also, his check for 'ASCII' actually lets through everything from 80 to ff, which aren't ASCII - ASCII is only 00 to 7f.
Now, if you have a string which you have checked is pure ASCII, you can then safely treat it as a byte sequence of the ASCII values of the characters in it - but this is a separate earlier step, nothing strictly to do with the act of base64 encoding.
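A small sketch of the difference between the two checks, in Python rather than JavaScript. The function names are made up for this example. The first mirrors the range used in the quoted regex, the second is a real ASCII check.

```python
import base64


def fits_in_one_byte(text: str) -> bool:
    """Mirror of the quoted regex: every code point is at most U+00FF."""
    return all(ord(ch) <= 0xFF for ch in text)


def is_ascii(text: str) -> bool:
    """True ASCII check: every code point is at most U+007F."""
    return all(ord(ch) <= 0x7F for ch in text)


sample = "caf\u00e9"  # ends with U+00E9, which is in the 80..ff range

passes_article_check = fits_in_one_byte(sample)
passes_ascii_check = is_ascii(sample)

# Separate earlier step: turn verified ASCII text into bytes...
ascii_bytes = "plain text".encode("ascii")
# ...and only then base64 encode the byte sequence.
encoded = base64.b64encode(ascii_bytes)

print(passes_article_check, passes_ascii_check, encoded)
```

The sample string passes the article's check but fails the ASCII check, which is exactly the gap described above.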
(I should say that I do like his repeated urging for the reader to note that base64 encoding is not in any shape or form encryption)