What charset SHOULD be used for a Location: header in a 301 response? - encoding

Trying to consume the URI cot.ag/o1LnfW from .NET with HttpWebRequest, I get a 301 Moved response whose Location header has an (incorrect) value of:
http://www.joycemeyer.org/BroadcastHome.aspx?video=Living_Beyond_Your_Feelings_â_Pt_1&utm_source=Twitter&utm_campaign=EEL&utm_medium=post&utm_term=September29&utm_content=post
From Fiddler, I get the (correct) Location header value:
http://www.joycemeyer.org/BroadcastHome.aspx?video=Living_Beyond_Your_Feelings_–_Pt_1&utm_source=Twitter&utm_campaign=EEL&utm_medium=post&utm_term=September29&utm_content=post
Note the difference where the – occurs in the Fiddler URL. In the case of Fiddler, the bytes are E2 80 93. In the case of .NET, the bytes are E2 3F 3F. This results in an incorrect header interpretation, with subsequent failure to follow the redirection.
I think this is a .NET Framework bug, but I have no idea what the RFCs say it SHOULD be sent as. Should I report this as a bug to Microsoft, or is this a failure by bit.ly in serving the headers in the wrong code page?

RFC 2616 specifies that the Location header should contain a URI as defined by RFC 1630, which requires that a URI be 7-bit clean ASCII, with any special characters URL-encoded (percent-encoded).
In other words, the server is delivering the URI incorrectly and should be escaping it.
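As a rough illustration of what the escaped form would look like, here is a minimal Python sketch. It assumes the dash is the en dash U+2013 (which matches the bytes E2 80 93 seen in Fiddler) and that the server would percent-encode the UTF-8 bytes; the variable names are only for illustration.

```python
from urllib.parse import quote

# The en dash (U+2013) from the redirect target in the question.
video = "Living_Beyond_Your_Feelings_\u2013_Pt_1"

# quote() encodes the text as UTF-8 by default and percent-encodes
# every byte that is not an unreserved ASCII character.
escaped = quote(video)
print(escaped)  # Living_Beyond_Your_Feelings_%E2%80%93_Pt_1
```

The result contains only ASCII octets, so it survives any header decoder regardless of the code page it assumes.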

I've reported this as a bug over at bit.ly's support forum. They should be responding with a legal RFC 1630 URI in the ASCII character set (no octets with the high bit set).

Related

Headers for REST API with optional Base64 encoding

We have a media file repository, with which other services communicate over a REST API. For various reasons we want the users of the repository to be able to upload and download files over HTTP both directly (plaintext for text files and byte array for binary files) and using Base64 encoding. We want the fact that the file is uploaded (PUT, POST) and requested for download (GET) in the Base64 encoding be reflected in the header of the HTTP request.
How do we reflect the fact that the content of the request or requested response is Base64 encoded in the HTTP header?
So far I'm tending towards appending ;base64 after the mime type in the Content-Type header, for example Content-Type: image/png;base64. Other options (X- header, Content-Encoding) are discussed in this related question but do not offer satisfactory resolution to our question.
You have to use the Content-Transfer-Encoding header.
It is defined in RFC 2045: https://www.rfc-editor.org/rfc/rfc2045#page-14.
It supports the base64 value among others: "7bit" / "8bit" / "binary" / "quoted-printable" / "base64" / ietf-token / x-token.
This header is designed specifically for your case, as a complement to the MIME type.
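A minimal sketch of how a client might prepare such an upload, assuming the repository agrees to interpret Content-Transfer-Encoding on HTTP requests (that agreement is an assumption about your own API, not something HTTP libraries do for you). The function name `base64_upload` is hypothetical.

```python
import base64

def base64_upload(payload: bytes, media_type: str):
    """Return (headers, body) for a Base64-encoded upload (illustrative helper)."""
    body = base64.b64encode(payload)  # ASCII-only bytes
    headers = {
        "Content-Type": media_type,                # the type of the decoded file
        "Content-Transfer-Encoding": "base64",     # how the body is encoded in transit
        "Content-Length": str(len(body)),          # length of the encoded body
    }
    return headers, body

headers, body = base64_upload(b"\x89PNG\r\n\x1a\n", "image/png")
```

The receiving side would check the header and call `base64.b64decode(body)` before storing the file.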

REST API Design: Respond with 406 or 404 if a resource is not available in a requested representation

We have a REST API to fetch binary files from the server.
The requests look like
GET /documents/e62dd3f6-18b0-4661-92c6-51c7258f9550 HTTP/1.1
Accept: application/octet-stream
For every response indicating an error, we'd like to give a reason in JSON.
The problem is that the error response then does not have the content type the client requested.
But what kind of response should the server produce?
Currently, it responds with a
HTTP/1.1 406 Not Acceptable
Content-Type: application/json
{
reason: "blabla"
...
}
This seems wrong to me, because the underlying issue is that the resource does not exist, not that the client requested the wrong content type.
But what would be the right way to deal with such situations?
Is it OK to respond with 404 + application/json although application/octet-stream was requested?
Is it OK to respond with 406 + application/json, as the client did not specify application/json as an acceptable type?
Should the spec be extended so that the client uses the q parameter, for example application/octet-stream, application/json;q=0.1?
Other options?
If no representation can be found for the requested resource (because it doesn't exist or because the server wishes to "hide" its existence), the server should return 404.
If the client requests a particular representation in the Accept header and the server is not able to provide such a representation, the server could either:
Return 406 along with a list of the available representations. (see note** below)
Simply ignore the Accept header and return a default representation of the resource.
See the following quote from RFC 7231, the document that defines the semantics and content of the HTTP/1.1 protocol:
A request without any Accept header field implies that the user agent will accept any media type in response. If the header field is present in a request and none of the available representations for the response have a media type that is listed as acceptable, the origin server can either honor the header field by sending a 406 (Not Acceptable) response or disregard the header field by treating the response as if it is not subject to content negotiation.
Mozilla also recommends the following regarding 406:
In practice, this error is very rarely used. Instead of responding using this error code, which would be cryptic for the end user and difficult to fix, servers ignore the relevant header and serve an actual page to the user. It is assumed that even if the user won't be completely happy, they will prefer this to an error code.
** Regarding the list of available representations, see this answer.
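The decision described above can be sketched roughly as follows. This is a simplified illustration, not a full content-negotiation implementation: it ignores q-values and partial wildcards such as `image/*`, and the `strict` flag is a hypothetical server policy switch choosing between the two options allowed by RFC 7231.

```python
def choose_status(resource_exists, accepted, available, strict=False):
    """Return (status, media_type) for a GET request.

    accepted:  media types listed in the Accept header (empty = no header)
    available: media types the server can produce, default first
    strict:    hypothetical policy flag; True answers 406 on a mismatch
    """
    if not resource_exists:
        # Missing resource: 404 regardless of what the client accepts.
        return 404, None
    if not accepted or "*/*" in accepted:
        # No Accept header (or a full wildcard) means any type is fine.
        return 200, available[0]
    for media_type in accepted:
        if media_type in available:
            return 200, media_type
    if strict:
        # Honor the header: nothing acceptable can be produced.
        return 406, None
    # Disregard the header and serve the default representation.
    return 200, available[0]
```

For the request in the question, `choose_status(False, ["application/octet-stream"], ["application/octet-stream"])` yields 404, which matches the point that a missing resource is not a negotiation failure.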

HTTP multipart/form-data. What happens when binary data has no string representation?

I want to write an HTTP implementation.
I've been looking around for a few days at how files are sent over HTTP with Content-Type: multipart/form-data, and I'm really interested in how browsers (or any HTTP client) create that kind of request.
I already took a look at a lot of questions about it here on Stack Overflow, like:
How does HTTP file upload work?
What does enctype='multipart/form-data' mean?
I dug into RFCs 2616 (and newer versions), 2046, etc., but I didn't find a clear answer (obviously I did not get the idea behind it). In most articles and answers I found this kind of request string, which is simple for me to interpret; all these things are documented in the RFCs...
POST /upload?upload_progress_id=12344 HTTP/1.1
Host: localhost:3000
Content-Length: 1325
Origin: http://localhost:3000
... other headers ...
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryePkpFF7tjBAqx29L
------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="MAX_FILE_SIZE"
100000
------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="uploadedfile"; filename="hello.o"
Content-Type: application/x-object
... contents of file goes here ...
------WebKitFormBoundaryePkpFF7tjBAqx29L--
...and it would be simple to implement an HTTP client that constructs a string that way in any language. The problem comes at ... contents of file goes here ...: there's little information about what "contents of file" is. I know it's binary data with a certain type and encoding, but it's difficult to think outside of string data: how would I add a piece of binary data that has no string representation inside a string?
I would like to see examples of low-level implementations of the HTTP protocol in any language, and maybe in-depth explanations of binary data transfer over HTTP: how the client creates requests and how the server reads and parses them. P.S. I know this question may look like a duplicate, but most of the answers are not focused on explaining binary data transfer (like media).
You should not try to handle strings in this part of the body. You should send binary data: think of it as reading bytes from the resource and sending these bytes unaltered.
So, in particular, no encoding is applied: no UTF-8, no Base64. HTTP is not a protocol with a 7-bit ASCII restriction like SMTP, where Base64 encoding is applied to ensure only 7-bit ASCII characters are used.
There is, by definition, no string version of this data, and looking at a raw HTTP transfer (with Wireshark, for example) you should see binary data, bytes, stuff.
This is why most HTTP servers use C to manage HTTP: they parse the HTTP communication byte by byte (as the protocol headers are 7-bit ASCII only, certainly not multibyte characters) and they can also read and write arbitrary binary data for the body quite easily (or even use system calls like readfile to let the kernel manage the binary part).
Now, about examples.
When you use Content-Length and no multipart stuff, the body is exactly (Content-Length) bytes long, so the receiver parsing your data will just read this number of bytes and treat this whole raw data as the body content (which may have a MIME type and encoding information, but that is just information for layers built on top of the HTTP protocol).
When you use Transfer-Encoding: chunked, the raw binary body is split into pieces. Each piece is prefixed by a hexadecimal number (the size of the chunk) and an end-of-line marker, and followed by another end-of-line marker, with a final zero-size chunk at the end.
If we take the wikipedia example:
4\r\n
Wiki\r\n
5\r\n
pedia\r\n
E\r\n
in\r\n
\r\n
chunks.\r\n
0\r\n
\r\n
We could replace each ASCII letter by any byte, even a byte that has no 7-bit ASCII representation. I'll use a * character for each real body byte:
4\r\n
****\r\n
5\r\n
*****\r\n
E\r\n
**************\r\n
0\r\n
\r\n
All the other characters are part of the HTTP protocol (here a chunked body transmission). I could also use a \0 representation of binary data and send only the null byte for each byte of the body; that would be:
4\r\n
\0\0\0\0\r\n
5\r\n
\0\0\0\0\0\r\n
E\r\n
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\n
0\r\n
\r\n
That's just a representation; we could also use \xNN or \NN representations. In reality these are bytes, 8 bits each (too lazy to write the 0/1 representation of this body :-) ).
The text of the example is:
Wikipedia in\r\n
\r\n
chunks.
It could have been a more complex one, with multibyte characters (here an é in UTF-8):
Wikipédia in\r\n
\r\n
chunks.
This é is in fact 11000011 10101001 in UTF-8, two bytes (\xc3\xa9 in \xNN representation), instead of the single byte 01100101 / \x65 of the e character. The HTTP body is now (note that the second chunk size is 6 and not 5):
4\r\n
Wiki\r\n
6\r\n
p\xc3\xa9dia\r\n
E\r\n
in\r\n
\r\n
chunks.\r\n
0\r\n
\r\n
But this is only valid if the source data was actually in UTF-8; it could have been in another encoding. Unless your web server has specific configuration settings that enforce conversion of the source document to a given encoding, converting the source document is not really the job of the web server: you take what you have, and you maybe add a header to tell the client what encoding was defined on the source document.
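The chunked framing shown above can be reproduced with a few lines of Python working purely on bytes. This is only a sketch: the function name `encode_chunked` is made up for illustration, and it leaves out chunk extensions and trailers, which the protocol also allows.

```python
def encode_chunked(chunks):
    """Frame each bytes chunk as: hex size, CRLF, data, CRLF; end with a zero-size chunk."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would terminate the body early
        out += f"{len(chunk):X}".encode("ascii") + b"\r\n"  # size counts bytes, not characters
        out += chunk + b"\r\n"                              # data is copied unaltered
    out += b"0\r\n\r\n"  # final zero-size chunk
    return bytes(out)

body = encode_chunked([b"Wiki", "pédia".encode("utf-8"), b" in\r\n\r\nchunks."])
print(body)
# b'4\r\nWiki\r\n6\r\np\xc3\xa9dia\r\nE\r\n in\r\n\r\nchunks.\r\n0\r\n\r\n'
```

The third chunk is 14 bytes (hexadecimal E) because it includes the leading space and the two line endings embedded in the data. Any other byte values, including ones with no text representation, would be framed in exactly the same way.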
Finally we have the multipart way of transmitting the body, as in your question. It is a lot like the chunked version, except that boundaries and intermediary headers are used. For the binary data between these boundaries, headers, and line-ending control characters, the same rule applies: everything inside is just bytes...
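To make the multipart case concrete, here is a small Python sketch that assembles such a body as bytes, never as a string. The function name `build_multipart` and the boundary value are hypothetical, and the sketch assumes the boundary does not occur inside the file data (a real client has to pick one that does not).

```python
def build_multipart(boundary: str, fields: dict, files: dict) -> bytes:
    """Assemble a multipart/form-data body as bytes.

    fields: name -> text value
    files:  name -> (filename, content_type, raw_bytes)
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",  # empty line separates part headers from part content
            value.encode("utf-8"),
        ]
    for name, (filename, content_type, data) in files.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            data,  # raw file bytes, appended unaltered
        ]
    lines += [delimiter + b"--", b""]  # closing delimiter plus final CRLF
    return b"\r\n".join(lines)

raw = bytes(range(256))  # every possible byte value, most with no text form
body = build_multipart(
    "----ExampleBoundary",
    {"MAX_FILE_SIZE": "100000"},
    {"uploadedfile": ("hello.o", "application/x-object", raw)},
)
```

Only the headers and delimiters are produced from text; the file content goes into the list as the bytes read from disk, and Content-Length would be `len(body)`.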

Perl Change Content Type Of Response

I am calling a SOAP web service as client.
Following is content-type value of response
Content-Type: text/xml
I asked the customer to add UTF-8 to the response as follows:
Content-Type: text/xml;charset=utf-8
But the customer says that it can be done from the client side. Is this possible? Can I, as a client, determine the content type the server sends?
PS: I noticed that the cited RFC 2376 was obsoleted by RFC 3023 (conservative enough) and then by RFC 7303, whose current use and content I have not evaluated, so the following may not be definitive and I am inclined to delete it.
You have everything formal in RFC2376 XML Media Types: Section 3.1 text/xml Registration
See also Section 6 Examples of that RFC, particularly Section 6.4 text/xml with Omitted Charset
The server side (your customer) is STRONGLY RECOMMENDED to use the charset parameter, which they are not currently using.
And if the charset is omitted, XML processors MUST use the default charset value of "us-ascii".
You are right to ask the customer to specify the charset; the "MUST" in the RFC is a strong requirement that also limits how far you can adapt on the client side when they are not sending us-ascii.
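A client cannot change the header the server sends, but it can at least detect when the charset is missing. The following Python sketch (the thread itself concerns a Perl client, so this is only an illustration) applies the RFC 2376 default described above; the function name `xml_charset` is hypothetical.

```python
from email.message import Message

def xml_charset(content_type: str) -> str:
    """Return the charset declared in a Content-Type value, or the
    RFC 2376 default for text/xml ("us-ascii") when none is given."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter, or None.
    return msg.get_content_charset() or "us-ascii"

print(xml_charset("text/xml"))                # us-ascii
print(xml_charset("text/xml;charset=utf-8"))  # utf-8
```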

How to C - windows socket reading textfile content

I am having problems reading a text file's content via Winsock in C. Does anyone have any idea how it should work? When I try to GET the HTTP headers from Google I am able to, but when I try on my XAMPP machine, it just gives me 400 Bad Request.
HTTP/1.1 400 Bad Request
char *message = "GET / HTTP/1.1\r\n\r\n";
OK, the problem with the 400 Bad Request on my localhost via Winsock was my HTTP request: I just changed the 1.1 to 1.0 and it worked! What I want now is to print nothing but the content of the text file, not the whole banner. :)
Read RFC 2616, in particular sections 5.2 and 14.23. An HTTP 1.1 request is required to include a Host header, and an HTTP 1.1 server is required to send a 400 reply if the header is missing and no host is specified in the request line.
char *message = "GET / HTTP/1.1\r\nHost: hostnamehere\r\n\r\n";
As for the text content, you need to read from the socket until you encounter a \r\n\r\n sequence (which terminates the response headers), then process the headers, then read the text content accordingly. The response headers tell you how to read the raw bytes of the text content and when to stop reading (refer to RFC 2616 section 4.4 for details). Once you have the raw bytes, the Content-Type header tells you how to interpret the raw bytes (data type, charset, etc).
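The header and body split described above can be sketched in Python on an already-received buffer (the question uses C and Winsock, so this only illustrates the logic, not the socket calls). The function name `split_response` is made up, and the sketch assumes the whole response is in memory and handles only Content-Length, not chunked transfer encoding.

```python
def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body)."""
    head, sep, rest = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    if not sep:
        raise ValueError("incomplete response: header terminator not found")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many body bytes to take; fall back to
    # everything received when the header is absent.
    length = int(headers.get("content-length", len(rest)))
    return status_line, headers, rest[:length]

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/plain; charset=utf-8\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
status_line, headers, body = split_response(raw)
print(body.decode("utf-8"))  # hello
```

Printing only `body` gives the file content without the status line and headers.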