HTTP response content is truncated when request is sent over WiFi

We're getting some HTTP responses with status 200 but with incomplete content (cut off at a different position in each request).
I wonder whether there's something wrong with that network or whether this kind of response error must be expected and dealt with by our application, i.e. whether HTTP clients must check that the response content is complete. I haven't seen this behavior before.
More information:
The issue happens at a site that has WiFi. If the client (the app) uses the mobile network, the response is always complete. We suspect that there's something wrong with that WiFi because we have not detected this issue in other places (on other WiFi networks). The problem is not caused by the client, since I could also reproduce it using Postman.
I ran several tests at that site (using both the app and Postman) and found that on mobile data the response always arrives intact, but when connected to that WiFi the response sometimes gets truncated. I also saw that other API requests receive their responses correctly, even when the content is 10 times bigger (I thought the response might be too big, but we're talking about only 10 KB).
The failing request is a regular HTTP POST request using JSON and sent to our REST API made with Dropwizard and hosted on Heroku. The request gets processed correctly on the server, which returns status 200 and the content. The client gets the status 200 but the content is truncated, so the whole operation can't be finished successfully.
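Whatever the network problem turns out to be, a client can at least detect this kind of truncation instead of failing later in the operation. A minimal sketch, assuming the response carries a Content-Length header (not chunked transfer coding) and that the body is checked before any gzip decoding, since Content-Length counts the bytes actually transferred. `check_response_body` and `TruncatedResponseError` are hypothetical names, not part of any library:

```python
import json


class TruncatedResponseError(Exception):
    """Raised when a 200 response body is shorter than advertised or is not valid JSON."""


def check_response_body(headers: dict[str, str], body: bytes) -> object:
    """Return the parsed JSON body, or raise TruncatedResponseError.

    Assumption: the server sends Content-Length. With chunked transfer
    coding there is no declared length, so only the JSON check applies.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    declared = lowered.get("content-length")
    if declared is not None and len(body) != int(declared):
        raise TruncatedResponseError(
            f"expected {declared} bytes, received {len(body)}"
        )
    try:
        # A JSON document cut off at an arbitrary position almost never parses.
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise TruncatedResponseError(f"body is not valid JSON: {exc}") from exc
```

A 200 response that fails this check can then be reported, or retried if the operation is idempotent, rather than being processed as if it were complete.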

Related

HTTP Post under the hood

We have two Windows services on the same machine that communicate over HTTP.
On one specific machine, the HTTP POST sent by the client (a Windows service) arrives at the server (a Windows service listening for REST calls) twice: the service receives two identical HTTP POST requests, but the client shows that it executed the request only once.
Before turning to Wireshark to analyze the HTTP traffic, I would like to understand what could explain this behavior.
Looking at https://www.rfc-editor.org/rfc/rfc7231#section-4.3.3:
"the origin server SHOULD send a 201 (Created) response containing a Location header field that provides an identifier for the primary resource created"
I guess we should look in Wireshark for a 201 response? And what if there is no response? Is the HTTP or networking framework of my C# application retrying the POST, so that it reaches the server twice? We don't see two requests being sent from the client code.
POST reply behavior
While that is true, more often than not the server replies with a 200 OK status code and some extra information.
Whether this is by mistake or to avoid chatty apis or some other architecture/design consideration, only the developer can tell.
So in theory you get a 201 with an identifier and then make a GET request with said identifier to retrieve details.
In practice this often does not happen, so it is not safe to assume this behavior.
Your problem
I highly doubt that there is a built-in mechanism that retries POST requests. There are plenty of reasons for that:
Duplicate entries: imagine creating a PayPal payment. If a network error means you simply did not receive the answer, a built-in retry would charge you twice.
There are libraries that retry only when they can be sure the request is idempotent, that is, the POST contains some sort of identifier so that a second request with the same identifier will fail.
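A minimal sketch of what such an identifier buys on the server side, assuming the client generates a unique `request_id` per logical operation and reuses it on every retry. The class and method names are hypothetical. In this variant the duplicate does not fail outright; it returns the stored result of the first attempt, so the payment is charged only once either way:

```python
import threading
import uuid


class PaymentService:
    """Hypothetical service that charges at most once per request id."""

    def __init__(self) -> None:
        self.charges: list[int] = []          # side effects actually performed
        self._results: dict[str, str] = {}    # request_id -> payment_id
        self._lock = threading.Lock()

    def create_payment(self, request_id: str, amount: int) -> tuple[str, bool]:
        """Return (payment_id, replayed).

        If request_id was already processed, the stored payment id is
        returned and no new charge is made, so a client or library can
        resend the POST after a network error without charging twice.
        """
        with self._lock:
            if request_id in self._results:
                return self._results[request_id], True
            payment_id = str(uuid.uuid4())
            self.charges.append(amount)  # the side effect that must not repeat
            self._results[request_id] = payment_id
            return payment_id, False
```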
First, the calls are HTTP GET (not POST).
We had defined the URL with the hostname/FQDN; the workaround that avoided the duplicated calls was to use the IP address instead of the hostname when calling the REST API.
Below is the long explanation of the problem; there is no root cause yet.
We used both Wireshark and Process Monitor to diagnose it, but we are still not sure of the root cause.
Process Monitor: Filtering to display network calls
Wireshark: Filter to show only HTTP
The client sends a single HTTP GET request to:
/DLEManagement/API/Engine/RunLearningPeriod
The call was executed at 11:08:16.931906
We can see a second call at 11:08:54.511909, which we did not trigger.
The HTTP GET is executed from *Server.exe (in red), and the server is *Management.Webservice.exe (in red).
We see that *Client.exe (an antivirus process, in blue) is sending TCPCopy packets in the window between our send and our receive.
We can also see that the first request was made over APIPA IPv6 and the second call over IPv4. We checked that network interface, and it was disabled.
Wireshark screenshot:
Process Monitor screenshot:
Network configuration:

Differentiating REST status codes

Lately, I have started adding status codes to my responses instead of returning them directly.
Let's assume /person/1 returns a person with id 1 from the DB. If the person does not exist, should I return 404 status? How am I supposed to differentiate if the endpoint does not exist on the server or the resource does not exist?
Now, let's assume I have a POST endpoint for inserting users. What if that endpoint checks if the email is formed correctly and I return 400? How should I know if the request was not formed correctly and did not route to any servlets or if it indeed reached the servlet which decided that email is badly formed?
Is it a good practice to always return a 200 OK response from all of my servlets indicating that the application has done its job regardless of the outcome and write the status in a json field status or is this an overkill and an anti-pattern?
I do not have much experience with or knowledge of HTTP servers, so I am not sure I am explaining this (or using it) correctly; I apologize for the broad descriptions.
Let's assume /person/1 returns a person with id 1 from the DB. If the person does not exist, should I return 404 status? How am I supposed to differentiate if the endpoint does not exist on the server or the resource does not exist?
To a client it doesn't matter whether the resource or the endpoint did not exist. All it is told by the server is that for the given URI there is no representation available.
As inf3rno already mentioned, the server usually hands the client all the URIs it will need directly in its responses. Links stored in bookmarks or included in some external resource might become invalid over time, and a 404 Not Found response simply informs the client that no representation is available for the given URI.
A client is typically not interested in the internals of an API either; it just wants to send or receive data it can work with.
Unfortunately, a further misconception many users have is to assume that certain resources return certain types. Such assumptions may lead to failures on the client side if the representation format ever changes. In addition, the URI structure itself, including any path, matrix and query parameters, should not be used to deduce any logical structure of the API, its exposed endpoints, or the relationship of its resources to other resources of that API. A URI as a whole is a pointer to a resource, and a resource may have dozens of links pointing to it. You might think of a URI as the cache key for the representations returned, which on subsequent invocations may be served by a cache instead of the actual server. This is actually one of the constraints REST imposes, and it is widely used on the Web.
Now, let's assume I have a POST endpoint for inserting users. What if that endpoint checks if the email is formed correctly and I return 400? How should I know if the request was not formed correctly and did not route to any servlets or if it indeed reached the servlet which decided that email is badly formed?
RFC 7231 defines POST as an all-purpose method that should be used when other methods aren't suitable for the task at hand. It explicitly states that the payload of a POST is processed according to the target resource's own specific semantics. So if you need to validate a user's email address before persisting it, or before starting a calculation, background process or whatever, fine, do that :) Even PUT, which is often said to merely replace the current representation with the one given in the request, is not only allowed but encouraged to verify any constraints the server has for the target resource, and it should therefore refuse payloads that do not fit its expectations.
The quintessence here is that a server should always provide the client with as much information as possible so the client can determine what to do next. Think of a Web-based application that you access through your browser. If you receive a 400 Bad Request, the browser will usually show what the server didn't like about your request, e.g. incomplete syntax or a missing value for a required field. The same holds true for REST APIs, as they are basically a generalization of the interaction model used on the Web. So the same concepts that apply to the Web also apply to REST :)
Each HTTP status code therefore has its own semantics and should help the client determine what to do next. A 400 Bad Request, for example, states that the server either cannot or will not process the request due to something it considers a client error, and it's up to the client to correct that error and resend the request.
A 405 Method Not Allowed, on the other hand, indicates that the client used an HTTP method the target resource does not support. Such a response not only tells the client this but also lists, in an Allow response header, the methods the target resource does support.
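A framework-free sketch of the 404 versus 405 distinction, using a hypothetical routing table: the dispatcher answers 404 when nothing is known at the URI, and 405 with an Allow header when the URI exists but the method does not. Handlers return `(status, headers, body)` tuples; this illustrates the status semantics only, not how any particular framework is structured:

```python
from typing import Callable

Response = tuple[int, dict[str, str], str]
Handler = Callable[[], Response]

# Hypothetical routing table: path -> {method: handler}
ROUTES: dict[str, dict[str, Handler]] = {
    "/person/1": {
        "GET": lambda: (200, {"Content-Type": "application/json"}, '{"id": 1}'),
        "DELETE": lambda: (204, {}, ""),
    },
}


def dispatch(method: str, path: str) -> Response:
    methods = ROUTES.get(path)
    if methods is None:
        # No representation is available for this URI.
        return 404, {"Content-Type": "text/plain"}, f"No resource at {path}"
    handler = methods.get(method)
    if handler is None:
        # The resource exists but does not support this method;
        # a 405 response must list the supported methods in Allow.
        allow = ", ".join(sorted(methods))
        return 405, {"Allow": allow, "Content-Type": "text/plain"}, f"{method} not allowed on {path}"
    return handler()
```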
Each of the HTTP status codes specified in RFC 7231 has its own semantics, and it's probably advisable to at least skim over them. You can also look up all registered status codes at IANA, which links to the specifications describing them.
Is it a good practice to always return a 200 OK response from all of my servlets indicating that the application has done its job regardless of the outcome and write the status in a json field status or is this an overkill and an anti-pattern?
Like the error codes, the success codes (in the 2xx range) have their own semantics. If a new resource is created as the outcome of processing a request (via PUT or POST), the client should be notified with a 201 Created response that also contains a Location header with a URI pointing at the newly created resource.
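A minimal sketch of that contract, using an in-memory store and a hypothetical `/person/{id}` URI layout; handlers return `(status, headers, body)` tuples rather than relying on any particular framework:

```python
import itertools
import json


class PersonStore:
    """Hypothetical in-memory backing store for POST /person."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._people: dict[int, dict] = {}

    def create(self, payload: dict) -> tuple[int, dict[str, str], str]:
        person_id = next(self._ids)
        self._people[person_id] = payload
        # 201 Created plus a Location header pointing at the new resource;
        # echoing the stored representation in the body is optional.
        headers = {
            "Location": f"/person/{person_id}",
            "Content-Type": "application/json",
        }
        return 201, headers, json.dumps({"id": person_id, **payload})
```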
If the server may take some time to calculate a response, it is probably advisable to return a 202 Accepted response to inform the client that the request is pending. The client can later poll for the result, either after some threshold period or after being notified by the server through a callback mechanism such as an email notification. For example, due to German legal requirements, German companies have to maintain archives of the messages they exchange via EDI. We, as an EDI provider, let our clients archive their exchanged messages by triggering one of our HTTP endpoints. Depending on the number of messages exchanged by that company and the time period selected for the archive, this process may take some time (a couple of hours, to be more concrete), so instead of letting the client wait for that period we simply return 202 Accepted and start the archiving process in the background. Depending on the configuration, clients either poll for the finished archive, get notified about the final result, or receive the archive directly via email if the file isn't too large.
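The 202 flow can be sketched with a hypothetical in-memory job registry: the POST returns 202 together with the URI of a status monitor, and the client polls that URI until the job is done. The paths, class names, and the background worker that calls `finish` are all assumptions for illustration:

```python
import uuid
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    DONE = "done"


class ArchiveJobs:
    """Hypothetical registry for long-running archive jobs."""

    def __init__(self) -> None:
        self._jobs: dict[str, JobState] = {}

    def submit(self) -> tuple[int, dict[str, str]]:
        """POST /archives: accept the work and point at a status monitor."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = JobState.PENDING
        # 202: accepted for processing, but processing has not completed.
        return 202, {"status": "pending", "monitor": f"/archives/jobs/{job_id}"}

    def finish(self, job_id: str) -> None:
        """Called by the (hypothetical) background worker when the archive is ready."""
        self._jobs[job_id] = JobState.DONE

    def status(self, job_id: str) -> tuple[int, str]:
        """GET /archives/jobs/{job_id}: the status monitor the client polls."""
        state = self._jobs.get(job_id)
        if state is None:
            return 404, "unknown job"
        return 200, state.value
```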
204 No Content is also quite useful when a client updates a resource. As PUT is generally defined as replacing the current representation with the one provided in the payload, upon receiving a 204 No Content response the client knows that the server applied the update and that the current representation looks like the one the client sent. The server therefore does not need to tell the client what the current representation looks like, as the client already knows. However, if the server had to convert the payload into a different representation, possibly leading to a different outcome, it is probably beneficial to inform the client about the new state of the resource in a 200 OK response that includes a representation of the result of the update.
Returning 200 OK for a failure, with a JSON payload whose fields describe the error, is definitely a bad way to proceed. Not only does it give clients a wrong hint, but the response might also be cached by intermediaries and returned to other clients requesting the same resource, even when the failure is only temporary (a DB crash or the like). In addition, such a JSON payload probably uses a non-standardized format and thus requires out-of-band knowledge to actually process the message. While we humans are quite capable of figuring out what's going on, computers aren't yet that smart on their own.
I hope you can see that HTTP defines rich semantics for when to use which method or response code. They exist for a reason and should therefore be used when the circumstances are right.
For a GET request, the 404 status is just a response code. You have to provide an error message in the response body when no record is found for the given id.
For a POST request, you can return a 400 status code and specify in the body which fields are missing or failing validation.
For a URL that is not found, the user will always get a 404 status code.
For a successful GET or POST request, you can return the response with a 200 status.
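A minimal sketch of the 400 approach, assuming a hypothetical POST /users endpoint that requires `name` and `email`. The email check is a deliberately loose regular expression, not a full address validator; success returns 201 Created because a new user is created:

```python
import json
import re

# Deliberately loose: one "@", no whitespace, a dot in the domain part.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ("name", "email")  # hypothetical schema for POST /users


def validate_user(payload: dict) -> tuple[int, str]:
    """Return (status, JSON body) for a POST /users request."""
    errors: dict[str, str] = {}
    for name in REQUIRED_FIELDS:
        if not payload.get(name):
            errors[name] = "missing"
    email = payload.get("email")
    if email and (not isinstance(email, str) or not EMAIL_PATTERN.match(email)):
        errors["email"] = "not a valid email address"
    if errors:
        # 400: the request reached the handler, but its content is invalid;
        # the body tells the client exactly which fields to fix.
        return 400, json.dumps({"errors": errors})
    return 201, json.dumps({"created": payload["name"]})
```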
How am I supposed to differentiate if the endpoint does not exist on the server or the resource does not exist?
The endpoint is the IRI (URI) of the web resource in this case. If the endpoint does not exist, there is a good chance that the web resource does not exist either. It is an unlikely scenario, since you got your URIs from the server (HATEOAS), but it can happen if something changes between two requests, e.g. the URI template changes or somebody deletes the resource. In all of these cases 404 is a fine HTTP status code. You can elaborate in the error message or use an additional error code, but to me that does not make much sense, because a URI template change is a rare event. It would make the client more flexible, though, since it could clear its cache and retry with a new link.

HTTP error code when server cannot find a user-given external resource

Our image board allows users to upload images by copy-pasting URLs. A client app sends a POST request to our API with an image URL given in the request body. Our web service receives the POST request and handles it by downloading the image from the given URL by using a server-side HTTP client (request in our case).
In the successful case, the service finds the image, downloads it, and stores it on the server. The service returns HTTP 200 to the client.
Now, what if the image cannot be found? What if the download attempt results in HTTP 404? What HTTP error code should we use to respond to the client?
HTTP 400 Bad Request is not applicable because the request was well-formed and all parameters were valid.
HTTP 404 Not Found is not applicable because the request URL was found and served although the image URL was not.
HTTP 502 Bad Gateway does not feel right either because there is nothing wrong with our server or the upstream server (the server of the source image). The user just happened to type in an image URL that does not exist.
Any experience on the matter? Which error code is the most correct?
First of all you should decide if this is a client error (4xx) or server error (5xx). From what you describe, it feels more like a client error. The client has requested the creation of a resource from another resource (the image URL) which does not exist.
There is no perfect match for this scenario, although one could make a case for each of the 2 following response codes:
HTTP 409 Conflict: From the RFC:
The request could not be completed due to a conflict with the current state of the target resource. This code is used in situations where the user might be able to resolve the conflict and resubmit the request...
This applies to your case if you consider the target resource to be in a bad state (image not found). If someone provides an image at the specified URL, that effectively transitions your resource to a valid state.
This is also a good match because, as the RFC states, this code implies the user might be able to resolve the conflict (in your case the user would correct this by posting the image to the specified URL).
HTTP 424 Failed Dependency: From the RFC:
The 424 (Failed Dependency) status code means that the method could not be performed on the resource because the requested action depended on another action and that action failed...
This applies to your case in that "the requested action depended on another action and that action failed". The dependent action is the posting of an image to the other URL. What you have described is a case where that dependent action either failed or did not happen (which could also be called a failure).
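The answer does not prescribe an implementation; the following sketch shows one way a handler could translate the outcome of the server-side download into a response, using 409 for an upstream 404 as argued above. Mapping every other upstream failure to 502 is an assumption of this sketch, as are the error field names. `fetch` stands in for whatever server-side HTTP client is used (the question mentions `request`) and is injected so the mapping can be exercised without network access:

```python
import json
from typing import Callable

# fetch(url) -> (upstream_status, body_bytes); a hypothetical injected client.
Fetcher = Callable[[str], tuple[int, bytes]]


def import_image(image_url: str, fetch: Fetcher) -> tuple[int, str]:
    """Handle POST /images with {"url": image_url}; return (status, JSON body)."""
    upstream_status, content = fetch(image_url)
    if upstream_status == 200:
        # Persisting `content` is omitted in this sketch.
        return 200, json.dumps({"stored_bytes": len(content)})
    if upstream_status == 404:
        # The user can resolve this by pointing at (or publishing) a real image.
        return 409, json.dumps({
            "error": "image_not_found",
            "detail": f"{image_url} returned 404",
        })
    # Assumption: any other upstream failure is the upstream server's problem.
    return 502, json.dumps({
        "error": "upstream_error",
        "detail": f"{image_url} returned {upstream_status}",
    })
```

Whichever status code is chosen, the JSON body carries the explanation, which is what the client will actually show to the user.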
Since the API depends on something that is not available, its service is unavailable as well.
The status code 503: Service Unavailable is the best fit for your situation.
According to the RFC description:
The server is currently unable to handle the request due to a temporary overloading or maintenance of the server. The implication is that this is a temporary condition which will be alleviated after some delay. If known, the length of the delay MAY be indicated in a Retry-After header. If no Retry-After is given, the client SHOULD handle the response as it would for a 500 response.
Alternatively, if your API supports a way of communicating errors (e.g. to tell the user that the information he submitted is incorrect) you may be able to use this method to tell the user that the external resource is unavailable. This might be a little friendlier and might avoid some error raises on the user's side.
Since the client app sends POST requests to your API server, the response codes should be chosen from the point of view of the server that received the request, which in your case is your API server.
If the server has received correct information from the client app and determines that the request is valid, it should return an appropriate code with proper JSON or header-based error messages.
HTTP error codes were conceived on the assumption that every page that could possibly be served was stored locally, one way or another.
Your scenario does not match that assumption and it should therefore not come as a surprise that you don't find codes that fit your bill properly.
Your "not found" scenario is in fact an application error, and you should notify your user of the situation by providing an error message on the form where they entered the URL (or return a dedicated error page or some such). Or choose an HTTP error code nonetheless and accept that it will be a poor fit no matter what.
Now, what if the image cannot be found? What if the download attempt results in HTTP 404? What HTTP error code should we use to respond to the client?
The main thing to keep in mind: you are trying to fool the client into thinking that you are a web site - just a dumb document store which might respond to some content editing messages.
For the client, the primary means of communication is the body of the response. See RFC 7231
Except when responding to a HEAD request, the server SHOULD send a representation containing an explanation of the error situation, and whether it is a temporary or permanent condition.
The status code is meta-data: aimed at giving the generic components participating in the exchange a chance to know what is going on (examples: the web browser doesn't need to know what page you are asking for to recognize a redirection response returned by the server, the web browser asking for credentials when it receives a 401 unauthorized response, web caches invalidating entries, or not, depending on the status code returned by the response).
HTTP 400 Bad Request is not applicable because the request was well-formed and all parameters were valid.
Yes, that's exactly right.
I would probably use 500 Internal Server Error, on the grounds that there's nothing wrong with the document that the server received; the problems all lie in the side effects of the server's implementation.
A different approach you might consider: 202 Accepted. Roughly translated "I got your message, I understood your message, and I'll get around to it later." If you don't need the side effects to be synchronous, you can defer judgment. That allows you to do things like applying a retry strategy.
The representation sent with this response ought to describe the request's current status and point to (or embed) a status monitor that can provide the user with an estimate of when the request will be fulfilled.
"I'll get to it later; if you want to know how it is going, go ask him -->"
Because 202 is a non-error status code, its effect on caches is different from that of a 4xx or 5xx. If you are already thinking ahead about caching, you'll want to keep the implications of that in mind.
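Deferring the download behind a 202 leaves room for a retry strategy. A minimal sketch of capped exponential backoff, assuming a hypothetical `attempt` callable that returns True on success; the default delays, cap, and attempt count are arbitrary illustration values, and `sleep` is injectable so the schedule can be checked without actually waiting:

```python
import time
from typing import Callable


def retry_with_backoff(
    attempt: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `attempt` until it succeeds or max_attempts is reached.

    The wait doubles after each failure (1s, 2s, 4s, ...) up to max_delay,
    and there is no wait after the final failed attempt.
    Returns True if some attempt succeeded.
    """
    delay = base_delay
    for n in range(1, max_attempts + 1):
        if attempt():
            return True
        if n < max_attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

The status monitor returned with the 202 can then report "pending" while retries continue and a final success or failure once they stop.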

Http Request not seen in Fiddler ends up with NoHttpResponseException

I am trying to send HTTP requests from an Android phone using Apache HttpClient to a server, routing my requests through Fiddler. For certain requests, DefaultHttpClient.execute throws a NoHttpResponseException, but that particular request is not seen in Fiddler at all.
The same thing happens if I send my traffic directly over WiFi without Fiddler. The execute code works fine in general.
I fixed this by handling the NoHttpResponseException and re-sending the request. When the same request is sent again, it goes through fine.
I would be interested to know the root cause of this issue - but for the time being this works for me.
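A minimal sketch of that workaround, written in Python rather than Java: `http.client.RemoteDisconnected` (raised when the server closes the connection without sending a status line) stands in for Apache HttpClient's `NoHttpResponseException`, and `send` is a hypothetical callable that performs one request. A frequently cited cause of this exception is a pooled keep-alive connection that the server has already closed, but that is not confirmed for this case. Because a request may have been processed even though no response arrived, the sketch only resends idempotent methods:

```python
import http.client
from typing import Callable, TypeVar

T = TypeVar("T")

# Safe methods plus PUT and DELETE are idempotent under RFC 7231.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}


def send_with_retry(method: str, send: Callable[[], T], retries: int = 1) -> T:
    """Run `send`, resending up to `retries` times if no response arrived.

    Non-idempotent methods (e.g. POST) are attempted exactly once, since a
    request that was processed but whose response was lost would run twice.
    """
    max_attempts = retries + 1 if method.upper() in IDEMPOTENT_METHODS else 1
    attempt = 0
    while True:
        try:
            return send()
        except http.client.RemoteDisconnected:
            # The connection closed before any status line was received.
            attempt += 1
            if attempt >= max_attempts:
                raise
```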

How to handle correctly HTTP Digest Authentication on iPhone

I'm trying to upload a file onto my personal server.
I've written a small PHP page that works flawlessly so far.
The slightly odd thing is that I generate the entire body of the HTTP message I'm going to send (let's say it amounts to ~4 MB) and then send the request to my server.
The server then asks for HTTP authentication, and my delegate's connection:didReceiveAuthenticationChallenge:challenge replies to the server with the proper credentials and the data.
But, what's happened? The data has been sent twice!
In fact, I noticed this when I added the progress bar: the app sends the data (4 MB), the server asks for authentication, and the app re-sends the data with the authentication (another 4 MB). So in the end I've sent 8 MB. That's wrong.
I started googling and searching for a solution but I can't figure out how to fix this.
I see two possible approaches (my guess):
Share the realm for the whole session (a minimal HTTP request, then challenge, then data)
Use the synchronous way to perform an HTTP connection (which I would rather not do, since it seems an ugly way to handle this kind of thing)
Thank you
You've run into a flaw in the HTTP protocol: you have to send all the data before getting the response with the auth challenge (when you send a request with no credentials). You can try doing a small round trip as the first request in the same session (as you've mentioned), like a HEAD request; subsequent requests will then share the same nonce.
Too late to answer the original asker, but just in time if somebody else reads this.
TL;DR: Section 8.2.3 of RFC 2616 describes the 100 Continue status, which is all you need (or needed) in such a situation.
Also have a look at sections 10.1.1 and 14.20.
The client sends a request with an "Expect: 100-continue" header and pauses before sending the body. The server uses the headers it has already received to decide whether the request may be accepted (whether the entity, i.e. the body, to be received is not too large, whether the user's credentials are correct, ...). If the request is acceptable, the server replies with a "100 Continue" status code, the client sends the body, and the server replies with the final status code for that request. Conversely, if the request is not acceptable, the server replies with a 4xx status code ("413 Request Entity Too Large" if the announced body size is too large, or "401 Unauthorized" plus the WWW-Authenticate: header) and the client does not send the body. Having received a 401 status code and the corresponding WWW-Authenticate: information, the client can now repeat the request, this time providing its credentials.
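The server side of that exchange can be sketched as a function that decides, from the headers alone, whether to answer 100 Continue or a final 401/413 before any body bytes are read. It assumes the request carried "Expect: 100-continue"; the size limit and the `authorized` callable are hypothetical, and a real 401 would also carry a WWW-Authenticate challenge:

```python
from typing import Callable, Optional

MAX_BODY_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB upload limit


def interim_status(
    headers: dict[str, str],
    authorized: Callable[[Optional[str]], bool],
) -> int:
    """Return the status to send before reading the body.

    100 tells the client to go ahead and send the body; 401 and 413 are
    final responses, after which a client that sent Expect: 100-continue
    does not transmit the body at all.
    """
    h = {name.lower(): value for name, value in headers.items()}
    if not authorized(h.get("authorization")):
        return 401  # a real server adds a WWW-Authenticate challenge here
    if int(h.get("content-length", "0")) > MAX_BODY_BYTES:
        return 413
    return 100
```

With this ordering, the 4 MB upload from the question costs only one round trip of headers when credentials are missing, instead of a wasted body.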