Wikipedia url stopped after 10 redirects error GoLang

Wikipedia url stopped after 10 redirects error GoLang - redirect

Upon executing an HTTP Get request, I receive the following error:
2015/08/30 16:42:09 Get https://en.wikipedia.org/wiki/List_of_S%26P_500_companies:
stopped after 10 redirects
In the following code:
package main
import (
"net/http"
"log"
)
func main() {
response, err := http.Get("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
if err != nil {
log.Fatal(err)
}
}
I know that according to the documentation,
// Get issues a GET to the specified URL. If the response is one of
// the following redirect codes, Get follows the redirect, up to a
// maximum of 10 redirects:
//
// 301 (Moved Permanently)
// 302 (Found)
// 303 (See Other)
// 307 (Temporary Redirect)
//
// An error is returned if there were too many redirects or if there
// was an HTTP protocol error. A non-2xx response doesn't cause an
// error.
I was hoping that somebody knows what the solution would be in this case. It seems rather odd that this simple url results in more than ten redirects. Makes me think that there may be more going on behind the scenes.
Thank you.

As others have pointed out, you should first give thought to why you are encountering so many HTTP redirects. Go's default policy of stopping at 10 redirects is reasonable. More than 10 redirects could mean you are in a redirect loop. That could be caused outside your code. It could be induced by something about your network configuration, proxy servers between you and the website, etc.
That said, if you really do need to change the default policy, you do not need to resort to editing the net/http source as someone suggested.
To change the default handling of redirects you will need to create a Client and set CheckRedirect.
For your reference:
http://golang.org/pkg/net/http/#Client
// If CheckRedirect is nil, the Client uses its default policy,
// which is to stop after 10 consecutive requests.
CheckRedirect func(req *Request, via []*Request) error

I had this issue with Wikipedia URLs containing %26 because they redirect to a version of the URL with & which Go then encodes to %26 which Wikipedia redirects to & and ...
Oddly, removing gcc-go (v1.4) from my Arch box and replacing it with go (v1.5) has fixed the problem.
I'm guessing this can be put down to the changes in net/http between v1.4 and v1.5 then.

Related

I need simple proxy between 2 rest APIs

My code is working ok for GET/POST/PUT to/from restApi1 and restApi2.
However, my problem I need to implement HEAD/OPTIONS (no body!) and GET uri1
HEAD/OPTIONS could return 204 or 200 depends on a process status. I am getting error "Stream closed". Sounds like Camel want body bytes, but I don't indend to have it. Even I set ExchangePattern.InOnly or optional etc error occur...
What is correct way to see responses and handle requests WITHOUT body, just statuses exchange?
How to see response from restApi2 on Camel rest("/restApi1").head().route().routeId("id1")
.to("direct:restApi2").routeId("/id1").setHeader(Exchange.HTTP_METHOD,constant("HEAD"))
setExchanggePattern(ExchangePattern.OutOptionalIn).recepientList(simple(restApi2));

I figured it out. Need to set '.convertBodyTo(String.class)' even I don't have a body.

Route SockJS connection at variable URL?

Let's say I have a bunch of clients who all have their own numeric IDs. Each of them connect to my server through SockJS, with something like:
var sock = new SockJS("localhost:8080/sock/100");
In this case, 100 is that client's numeric ID, but it could be any number with any number of digits. How can I set up a SockJS router in my server-side code that allows for the client to set up a SockJS connection through a URL that varies based on what the user's ID is? Here's a simplified version of what I have on the server-side right now:
public void start() {
HttpServer server = vertx.createHttpServer();
SockJSHandler sockHandler = SockJSHandler.create(vertx);
router.route("/sock/*").handler(sockHandler);
server.requestHandler(router::accept).listen(8080);
}
This works fine if the client connects through localhost:8080/sock, but it doesn't seem to work if I add "/100" to the end of the URL. Instead of getting the default "Welcome to SockJS!" message, I just get "Not Found." I tried setting a path regex and I got an error saying that sub-routers can't use pattern URLs. So is there some way to allow for the client to connect through a variable URL, whether it's /sock/100, /sock/15, or /sock/1123123?
Ideally, I'd be able to capture the numeric ID that the client uses (like with routing REST API calls, when you could add "/:ID" to the routing path and then capture the value that the client uses), but I can't find anything that works for SockJS connections.
Since it seems that SockJS connections are considered to be the same as sub-routers, and sub-routers can't have pattern URLs, is there some work-around for this? Or is it not possible?
Edit
Just to add to what I said above, I've tried a couple different things which haven't seemed to work yet.
I tried setting up an initial, generic main router, which then re-directs to the SockJS handler. Here's the idea I had:
router.routeWithRegex("/sock/\\d+").handler(context -> {
context.reroute("/final");
});
router.route("/final").handler(SockJSHandler.create(vertx));
With this, if I access localhost:8080/sock/100 directly through the browser, it takes me to the "Welcome to SockJS!" page, and the Chrome network tab shows that a websocket connection has been created when I test it through my client.
However, I still get an error because the websocket shows a 200 status code rather than 101, and I'm not 100% sure as to why that is happening, but I would guess that it has to do with the response that the initial handler produces. If I try to set the initial handler's status code to 101, I still get an error, because then the initial handler fails.
If there's some way to work around these status codes (it seems like the websocket is expecting 101 but the initial handler is expecting 200, and I think I can only pick one), then that could potentially solve this. Any ideas?

Play 2.3 WS withFollowRedirects(true) does not work

The following code should post a form to an endpoint (which returns 302) and, after following the redirect, parse the url of the page and return some information from there.
val start = System.currentTimeMillis()
val requestHolder = WS.url(conf("login.url"))
.withRequestTimeout(loginRequestTimeOut)
.withFollowRedirects(true) //This appears to have no effect...
requestHolder.post(getMap(username, password))
.map(resp =>{
Logger.debug(resp.status.toString)
val loginResponse = getResponse(resp)
val end = System.currentTimeMillis()
Logger.debug("Login for the user: "+username+", request took: " + (end - start) + " milliseconds.")
loginResponse
})
The problem is that .withFollowRedirects(true) appears to have no effect on the query. The status of the response is 302 and the request does not follow the redirect.
I've gone through the process manually using httpie and following the redirects does lead to the correct page.
Any help or insight would be much appreciated.

POST redirection isn't as well supported as GET redirection. W3 specification says:
If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.
Some browsers don't do that, and just ignore. Have a look also at the 307 status:
307 Temporary Redirect (since HTTP/1.1)
In this case, the request should be repeated with another URI; however, future requests should still use the original URI. In contrast to how 302 was historically implemented, the request method is not allowed to be changed when reissuing the original request. For instance, a POST request should be repeated using another POST request.
There is also a discussion about this on Programmer Stack Exchange.

I've had a lot of trouble with withFollowRedirects and POST.
At some point, while fighting to make things work, I had .withFollowRedirects(false) in my code, then removed it during cleanups & things broke. My current guess is that if this option is not explicitly made false, the default behavior is to follow redirects (302 in my case) with some faulty mechanism. Perhaps the default mechanism uses POST again with same arguments. But in my case, interacting with Google App Script (GAS), one needs to use GET to retrieve JSON output of a POST.
Whatever the mechanism was doing, I was getting 400 with no further diagnostics.
After wasting hours, I realized that .withFollowRedirects(false) was in fact truly needed: it disabled Play's messing with redirects, I was able to see the 302 response & handle the following GET manually with success.

Is it correct to return 404 when a REST resource is not found?

Let's say I have a simple (Jersey) REST resource as follows:
#Path("/foos")
public class MyRestlet extends BaseRestlet
{
#GET
#Path("/{fooId}")
#Produces(MediaType.APPLICATION_XML)
public Response getFoo(#PathParam("fooId") final String fooId)
throws IOException, ParseException
{
final Foo foo = fooService.getFoo(fooId);
if (foo != null)
{
return response.status(Response.Status.OK).entity(foo).build();
}
else
{
return Response.status(Response.Status.NOT_FOUND).build();
}
}
}
Based on the code above, is it correct to return a NOT_FOUND status (404), or should I be returning 204, or some other more appropriate code?

A 404 response in this case is pretty typical and easy for API users to consume.
One problem is that it is difficult for a client to tell if they got a 404 due to the particular entity not being found, or due to a structural problem in the URI. In your example, /foos/5 might return 404 because the foo with id=5 does not exist. However, /food/1 would return 404 even if foo with id=1 exists (because foos is misspelled). In other words, 404 means either a badly constructed URI or a reference to a non-existent resource.
Another problem arises when you have a URI that references multiple resources. With a simple 404 response, the client has no idea which of the referenced resources was not found.
Both of these problems can be partially mitigated by returning additional information in the response body to let the caller know exactly what was not found.

Yes, it is pretty common to return 404 for a resource not being found. Just like a web page, when it's not found, you get a 404. It's not just REST, but an HTTP standard.
Every resource should have a URL location. URLs don't need to be static, they can be templated. So it's possible for the actual requested URL to not have a resource. It is the server's duty to break down the URL from the template to look for the resource. If they resource doesn't exist, then it's "Not Found"
Here's from the HTTP 1.1 spec
404 Not Found
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.
Here's for 204
204 No Content
The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.
If the client is a user agent, it SHOULD NOT change its document view from that which caused the request to be sent. This response is primarily intended to allow input for actions to take place without causing a change to the user agent's active document view, although any new or updated metainformation SHOULD be applied to the document currently in the user agent's active view.
The 204 response MUST NOT include a message-body, and thus is always terminated by the first empty line after the header fields.
Normally 204 would be used when a representation has been updated or created and there's no need to send an response body back. In the case of a POST, you could send back just the Location of the newly created resource. Something like
#POST
#Path("/something")
#Consumes(...)
public Response createBuzz(Domain domain, #Context UriInfo uriInfo) {
int domainId = // create domain and get created id
UriBuilder builder = uriInfo.getAbsolutePathBuilder();
builder.path(Integer.toString(domainId)); // concatenate the id.
return Response.created(builder.build()).build();
}
The created(URI) will send back the response with the newly created URI in the Location header.
Adding to the first part. You just need to keep in mind that every request from a client is a request to access a resource, whether it's just to GET it, or update with PUT. And a resource can be anything on the server. If the resource doesn't exist, then a general response would be to tell the client we can't find that resource.
To expand on your example. Let's say FooService accsses the DB. Each row in the database can be considered a resource. And each of those rows (resources) has a unique URL, like foo/db/1 might locate a row with a primary key 1. If the id can't be found, then that resource is "Not Found"

Though this question already have an accepted answer, I believe it's really an opinionated thing. Adding my two cents to help you make a more informed decision about the response code.
404 - Not Found. (Reference)
The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.
The resource may exist and you may not have permission to see the resource, will also be equivalent of Not Found. So 404 for a call where data doesn't exist is a very apt thing to do.
Now as for a non-existing URL; though 404 is a widely adapted response code 400 is a more appropriate code.
400 - Bad Request (Reference)
The server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).
If you put an invalid parameter in the request, what would be the response code?
If query param has a typo, what should be response code?
Answer to both is 400.
Most of the file-servers, return 404 for invalid URL because for an invalid URL they try to look for a file, which they can't find on the storage ~= Resource Not Found
Apart from the HTTP Status Code, the response will have some info about the error details, where one can be more descriptive about the error and can clear the ambiguity.
If client is calling with an invalid URL, it's an integration issue and should be caught at least during the sanity. No-way they will push the code to production without testing and catching this. Even if they do, God bless them!
tl;dr - 404 for not-found resource; 400 for not-found URL.

A 4XX error code means error from the client side.
As you request a static resource as an image or a html page, returning a 404 response makes sense as :
The HTTP 404 Not Found client error response code indicates that the
server can't find the requested resource. Links which lead to a 404
page are often called broken or dead links, and can be subject to link
rot.
As you provide to clients some REST methods, you rely on the HTTP methods but you should not consider REST services as simple resources.
For clients, an error response in the REST method is often handled close to errors of other processings.
For example, to catch errors during REST invocations or somewhere else, clients could use catchError() of RxJS.
We could write a code (in TypeScript/Angular 2 for the sample code) in this way to delegate the error processing to a function :
return this.http
.get<Foo>("/api/foos")
.pipe(
catchError(this.handleError)
)
.map(foo => {...})
The problem is that any HTTP error (5XX or 4XXX) will terminate in the catchError() callback.
It may really make the REST API responses misleading for clients.
If we do a parallel with programming language, we could consider 5XX/4XX as exception flow.
Generally, we don't throw an exception only because a data is not found, we throw it as a data is not found and that that data would have been found.
For the REST API, we should follow the same logic.
If the entity may not be found, returning OK in the two cases is perfectly fine :
#GET
#Path("/{fooId}")
#Produces(MediaType.APPLICATION_XML)
public Response getFoo(#PathParam("fooId") final String fooId)
throws IOException, ParseException {
final Foo foo = fooService.getFoo(fooId);
if (foo != null){
return Response.status(Response.Status.OK).entity(foo).build();
}
return Response.status(Response.Status.OK).build();
}
The client could so handle the result according to the result is present or missing.
I don't think that returning 204 brings any useful value.
The HTTP 204 documentation states that :
The client doesn't need to go away from its current page.
But requesting a REST resource and more particularly by a GET method doesn't mean that the client is about terminating a workflow (that makes more sense with POST/PUT methods).
The document adds also :
The common use case is to return 204 as a result of a PUT request,
updating a resource, without changing the current content of the page
displayed to the user.
We are really not in this case.
Some specific HTTP codes for classical browsing matche finely with return codes of REST API (201, 202, 401, and so for...) but this is not always the case.
So for these cases, rather than twisting original codes, I would favor to keep them simple by using more general codes : 200, 400.

Picking HTTP status codes for errors from REST-ful services

When a client invokes my REST-ful service, it needs to know if the response came back was 'from me' or rather a diagnosis from the containing web server that something awful happened.
One theory is that, if my code is called, it should always return an HTTP OK(=200), and any errors I've got to return should be just represented in the data I return. After all, it's my code that gets the response, not the naked browser.
Somewhat self-evidently, if I'm using REST to generate HTML read directly by a browser, I absolutely must return an error code if there's an error. In the case I care about, it's always Javascript or Java that is interpreting the entrails of the response.
Another possibility is that there is some family of HTTP status codes that I could return with a high confidence that it/they would never be generated by a problem in the surrounding container. Is this the case?

I use the following:
GET
200 OK
400 Bad Request (when input criteria not correct)
POST
202 Accepted (returned by authorization method)
401 Unauthorized (also returned by authorization)
201 Created (when creating a new resource; I also set the location header)
400 Bad Request (when data for creating new entity is invalid or transaction rollback)
PUT
Same as POST
201 Ok
400 Bad Request
DELETE
200 OK
404 Not Found (same as GET)

I would not know how to avoid that some container returns codes like 404.
4xx codes are meant to handle client errors along with possibly some entity that describes the problem in detail (and thus would mean a combination of both of your mentioned approaches). Since REST relies on HTTP and the according semantics of status as well as methods, always returning 200 in any possible case is a violation of this principle in my opinion.
If you for instance have a request such as http://foo.com/bar/123 which represents a bar ressource with id=123 and you return 200 with some content, the client has no chance to figure out if this was the intended response or some sort of error that occured. Therefore one should try to map error conditions to status codes as discussed in REST: Mapping application errors to HTTP Status codes for example.