Google OAuth callback appending parameters multiple times

We have been using Google OAuth successfully for years, but it suddenly stopped working a few days ago. Looking into this, it appears that after the user clicks "Allow" to grant access to the requested scope, Google redirects to our callback page (as it always has), but now the code and scope parameters are appended to the URL multiple times (example below). Because of the query string length limit on our web server, this now throws a 404.15 error.
Since we have made no recent code changes and have not made any updates in the Google API Console, I don't believe we have done anything to cause the parameters to be appended multiple times to the callback URL. Is this an issue with Google? Or am I missing something that may have caused this issue?
Example callback URL:
http://example.com/oauth/oauthcallback?code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-Ld
BBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://www.googleapis.com/auth/gmail.readonly

I have resolved this. In case this helps someone else: sometime between 9/12/2018 and 9/14/2018, Google started returning an additional parameter, "scope", in its OAuth callback, alongside "code", which had previously been the only parameter. The scope value includes "https://www.googleapis.com", which collided with an existing URL rewrite rule on our end that strips "www" from our URL. Because the rule's syntax was very generic and simply looked for "www." anywhere, it also matched the scope value, which caused a redirect loop that kept growing the query string until a 404.15 was thrown. By making the rewrite rule specific to our URL, the scope parameter is ignored by the rule and the redirect loop is avoided.
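To illustrate the difference, here is a minimal Python sketch comparing a check that looks for "www." anywhere in the URL with one that only inspects the host name. It is not the poster's IIS rule (which isn't shown); both function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Hypothetical "generic" rule: finds "www." anywhere in the full URL,
# including inside query string values such as the new scope parameter.
GENERIC_WWW = re.compile(r"www\.", re.IGNORECASE)


def generic_rule_matches(url: str) -> bool:
    return GENERIC_WWW.search(url) is not None


def host_rule_matches(url: str) -> bool:
    # Only the host name is inspected, so "www." in a query value is ignored.
    host = urlsplit(url).hostname or ""
    return host.startswith("www.")


callback = (
    "http://example.com/oauth/oauthcallback?code=4/abc"
    "&scope=https://www.googleapis.com/auth/gmail.readonly"
)
print(generic_rule_matches(callback))  # True: matched inside the scope value
print(host_rule_matches(callback))     # False: the host is example.com
```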

Posting because this may help others. @fzebra's answer applied in my case, but my auth library ALSO forwards every query parameter that the OAuth provider sends to my redirect_uri onto the request it makes to retrieve the access_token. Because of this, and because I think Google has a parsing bug, the new scope parameter breaks that request: Google responds with 400 Bad Request, and the JSON response body reports redirect_uri_mismatch. My guess is that Google treats its own scope URL parameter as the redirect URI and rejects the request.
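The underlying goal is for the token request to carry only the parameters the token endpoint expects, not everything that arrived on the callback. A minimal Python sketch, assuming a standard authorization-code exchange (RFC 6749, section 4.1.3) with the client credentials sent in the form body; `build_token_request_body` is a hypothetical helper, not part of the poster's library, and it only builds the body without sending anything:

```python
from urllib.parse import parse_qs, urlencode


def build_token_request_body(callback_query: str, client_id: str,
                             client_secret: str, redirect_uri: str) -> str:
    """Build the form body for an authorization-code token request.

    Only `code` is taken from the callback; scope and any other
    callback parameters are deliberately not forwarded.
    """
    code = parse_qs(callback_query)["code"][0]
    return urlencode({
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        # Must equal the redirect_uri used in the original authorization request.
        "redirect_uri": redirect_uri,
        "grant_type": "authorization_code",
    })
```

Passing `redirect_uri` in explicitly, rather than rebuilding it from the incoming request URL, also keeps stray callback parameters out of it.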
To solve this, I needed to keep the scope query parameter off the outgoing request to Google, so I stripped it from the incoming callback with a URL rewrite rule.
<!-- See https://stackoverflow.com/questions/52372359/google-oauth-callback-appending-parameters-multiple-times -->
<rule name="Google Login - Remove scope parameter" stopProcessing="true">
  <match url="google/redirect/url(.*)?$" />
  <conditions trackAllCaptures="true">
    <add input="{QUERY_STRING}" pattern="(.*)(&amp;?scope=.+&amp;?)(.*)" />
  </conditions>
  <action type="Rewrite" url="google/redirect/url?{C:1}{C:3}" appendQueryString="false" />
</rule>
This cuts the scope parameter and its value out of the incoming query string and joins the remaining two parts back together. Note that &amp; is used because this is XML; in plain regex the expression is just (.*)(&?scope=.+&?)(.*). It will leave a trailing & in some cases.
You should replace google/redirect/url with the path to your auth URL (the one Google redirects to).
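For a pattern this simple, Python's `re` module should capture the same groups as the .NET engine IIS uses, so a quick sketch shows what the condition produces. `strip_scope_like_rule` is a hypothetical stand-in for the rewrite that rebuilds the query as {C:1}{C:3}; `re.IGNORECASE` mirrors the case-insensitive default of IIS conditions.

```python
import re

# The rule's condition pattern, with the XML &amp; escaping removed.
SCOPE_PATTERN = re.compile(r"(.*)(&?scope=.+&?)(.*)", re.IGNORECASE)


def strip_scope_like_rule(query: str) -> str:
    """Mimic the rewrite: keep {C:1}{C:3}, or leave the query alone if the condition fails."""
    m = SCOPE_PATTERN.search(query)
    if m is None:
        return query
    return m.group(1) + m.group(3)


print(strip_scope_like_rule("code=4/abc&scope=https://www.googleapis.com/auth/gmail.readonly"))
# code=4/abc&   (the trailing & mentioned above)
print(strip_scope_like_rule("code=4/abc&scope=x&authuser=0"))
# code=4/abc&   (the greedy .+ also swallows authuser=0)
```

Because `.+` is greedy, any parameters Google sends after scope are dropped along with it; a narrower pattern that stops at the next & (for example scope=[^&]*) would keep them.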
You could do this in application-layer code, but a URL rewrite does not add an extra server request 👍
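For comparison, an application-layer version might parse the query string instead of matching it with a regex, which avoids the trailing & and keeps any other parameters. A minimal Python sketch; `strip_scope_in_app` is a hypothetical helper, and where it is called depends on your auth library:

```python
from urllib.parse import parse_qsl, urlencode


def strip_scope_in_app(query: str) -> str:
    """Drop every scope parameter and re-serialise the rest, with no stray '&'."""
    pairs = parse_qsl(query, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if key != "scope"]
    # Leave '/' and ':' unescaped; other characters are percent-encoded as needed.
    return urlencode(kept, safe="/:")


print(strip_scope_in_app("code=4/abc&scope=x&authuser=0"))  # code=4/abc&authuser=0
```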
This fixed it finally. Jeez!

Related

Where do I place a 301 redirect when using ColdFusion?

I found this code for 301 redirects in ColdFusion:
<cfheader statuscode="301" statustext="Moved Permanently">
<cfheader name="Location" value="[the URL to be redirected to]">
<cfabort>
What file do I place this code in? Is it the "missing page" that is now supposed to be giving a 301 error when someone lands on it? Or is there a file that's similar to .htaccess that I should put it in?
First of all: 3xx status codes are not errors but redirects.
Your code snippet isn't wrong, but ColdFusion has a more convenient way to do these three lines with a single statement:
<cflocation url="[the URL to be redirected to]" statusCode="301">
You can put this tag anywhere in your .cfm template. ColdFusion executes everything up to that point, then stops execution, sets the response headers accordingly, discards the output buffer (a redirect response does not need page content) and transmits the response (headers with the Location reference).
Note: your code snippet would include content in the response body (e.g. everything you put in <cfoutput> tags), which is usually not desired. So I strongly recommend using the cflocation tag for common redirects. It also protects you from forgetting to place <cfabort> after it.
For a common scenario like "redirect visitor from a no longer existing page to a new page", you can simply do this:
no_longer_existing_page.cfm
<cflocation url="the_new_page.cfm" statusCode="301" addToken="false">
the_new_page.cfm
<cfoutput>Hello World !!</cfoutput>
Requests to both pages will now point to the_new_page.cfm and return Hello World !!. (This is a redirect, not a rewrite, so the address in the browser will change to the_new_page.cfm in both cases.)

cflocation vs cfheader for 301 redirects

I am "renaming" an existing file for a project I am working on. To maintain backwards compatibility, I am leaving a cfm file in place to redirect the users to the new one.
buy.cfm: old
shop.cfm: new
In order to keep everything as clean as possible, I want to send the 301 statuscode response if a user tries to go to buy.cfm.
I know that I can use either cflocation with the statuscode attribute
<cflocation url="shop.cfm" statuscode="301" addtoken="false">
or I can use the cfheader tags.
<cfheader statuscode="301" statustext="Moved permanently">
<cfheader name="Location" value="http://www.mysite.com/shop.cfm">
Are there any reasons to use one method over the other?
I think they do the same thing, with <cflocation> being more readable.
I tested this on ColdFusion 9.
There is one major difference, and it is that cflocation stops execution of the page and then redirects to the specified resource.
From the Adobe ColdFusion documentation:
Stops execution of the current page and opens a ColdFusion page or HTML file.
So you would need to do this:
<cfheader statuscode="301" statustext="Moved permanently">
<cfheader name="Location" value="http://www.example.com/shop.cfm">
<cfabort>
to get the equivalent of this:
<cflocation url="shop.cfm" statuscode="301" addtoken="false">
Otherwise, you risk running into issues if other code runs after the cfheader tag. I came across this when fixing some code where redirects had been added to an application.cfm file using cfheader, without aborting the rest of the page processing.
I also noticed in the response headers that cflocation sets the following:
Cache-Control: no-cache
Pragma: no-cache
You might want to add these headers yourself when using cfheader with Location:
<cfheader name="Cache-Control" value="no-cache">
<cfheader name="Pragma" value="no-cache">
To elaborate on the answer by Andy Tyrone: while they MAY do the same thing in certain circumstances, the CFHEADER method gives you more control over the headers sent in the response. This becomes useful, for example, if you want to send cache control headers to a browser or content delivery network so that they do not keep hitting your server with the same old redirect request. There is no way (to my knowledge) to tell CFLOCATION to cache the redirect.
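ColdFusion aside, the difference comes down to which headers the 301 response carries and whether processing stops once they are set. Here is a framework-neutral Python WSGI sketch of such a response; it is not ColdFusion code, and `redirect_app` with its `cache_control` parameter is hypothetical. The caller chooses the Cache-Control value, and returning from the handler plays the role of <cfabort>:

```python
from typing import Callable, Iterable, List, Optional, Tuple


def redirect_app(location: str, cache_control: Optional[str] = "no-cache") -> Callable:
    """Return a WSGI app that answers every request with a 301 to `location`."""

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        headers: List[Tuple[str, str]] = [("Location", location)]
        if cache_control is not None:
            # The caller chooses the caching policy, e.g. a max-age for a CDN.
            headers.append(("Cache-Control", cache_control))
        start_response("301 Moved Permanently", headers)
        # Returning here ends the handler, much as <cfabort> ends the page.
        return [b""]

    return app


shop_redirect = redirect_app("http://www.example.com/shop.cfm",
                             cache_control="public, max-age=86400")
```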

Handling HTTP 302 error and redirecting in Backbone.JS "sync" method

I've got a secured Backbone.js app (that uses Spring Security atm.), so a logged-in user must have a valid session cookie (JSESSIONID). Now, if this session is invalidated (deleted, expired, whatever) and the user attempts to make a request, Spring Security returns a 302 in an attempt to redirect the user to a login form.
As is explained in this answer, this 302 response gets handled by the browser (it doesn't reach my app) so what is returned to my app is a 200 OK response with contenttype="text/html" (containing the login form).
That's an issue, because when my Backbone model attempts to sync to a URL, it expects JSON. If this sync happens without a valid session, the 200 "text/html" response is returned when "application/json" is expected, giving me a JSON parse error in jQuery.extend.parseJSON.
With great help from this question/answer, I've overridden the Backbone.sync method in order to use my own error handling. However, since the 302 never reaches my error handler I cannot override the redirect myself.
My situation is very similar to this question, however a final solution to the problem was never posted. Could someone please help me figure out the ideal way to ensure a redirect to the login page happens?
Instead of returning the login page with HTTP 200 OK, you should configure Spring Security to return HTTP 401 Unauthorized for unauthenticated AJAX requests. You can detect an AJAX request (as opposed to a normal page request) by checking for the X-Requested-With: XMLHttpRequest request header.
You can use the global $.ajaxError handler to check for 401 errors and redirect to the login page there.
This is how we've implemented it and it works nicely. I'm not a Spring guy, though, so I can't really help with the Spring Security configuration.
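The server-side decision described above can be sketched framework-neutrally in Python. This is not Spring Security configuration; `unauthenticated_response` is a hypothetical helper and `/login.html` is an assumed login URL:

```python
from typing import Dict, Mapping, Tuple


def unauthenticated_response(headers: Mapping[str, str],
                             login_url: str = "/login.html") -> Tuple[int, Dict[str, str]]:
    """Pick the status and headers for a request that has no valid session."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    if normalised.get("x-requested-with", "").lower() == "xmlhttprequest":
        # AJAX request: answer 401 so the client-side error handler sees it,
        # instead of a 302 that the browser would follow transparently.
        return 401, {}
    # Ordinary page request: redirect to the login form as before.
    return 302, {"Location": login_url}
```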
EDIT: Instead of a custom cookie, it would be better to use the solution provided by @fencliff.
I think you can use some other field of the XHR to detect this situation. A special cookie may do the trick.
You can define your own authentication failure handler on the Spring Security side. When the redirect to the login page occurs, you can add a cookie to the HttpServletResponse. Your custom Backbone.sync method checks for this cookie; if it is present, it launches your custom handler for this case (do not forget to remove the cookie at the same time).
<sec:http ... >
<sec:form-login login-page='/login.html' authentication-failure-handler-ref="customAuthenticationFailureHandler" />
</sec:http>
<bean id="customAuthenticationFailureHandler" class="com.domain.CustomAuthenticationFailureHandler" />
CustomAuthenticationFailureHandler must implement the org.springframework.security.web.authentication.AuthenticationFailureHandler interface. You can add your cookie and then delegate to the default SimpleUrlAuthenticationFailureHandler.onAuthenticationFailure(...) implementation.

URLRewriteFile and "#" char in URL string

I'm using the Google means of making my GWT app searchable (https://developers.google.com/webmasters/ajax-crawling/docs/getting-started), which works fine. Unfortunately, it seems Bing does not follow the same pattern/rule.
I thought I'd add a URL filter, based on user agent, to map all URLs of the form
http://www.example.com/#!blah=something
to
http://www.example.com/?_escaped_fragment_=blah=something
only for Bingbot, so that my CrawlerServlet returns the same content it returns for Googlebot requests. I have a URLRewrite rule like:
<rule>
<condition name="user-agent">Firefox/8.0</condition>
<from use-query-string="true">^(.*)#!(.*)$</from>
<to type="redirect">?_escaped_fragment_=$2</to>
</rule>
(I'm using a user-agent of Firefox to test)
This never matches. If I change the rule to ^(.*)!(.*)$ and try to match on
http://www.example.com/!blah=something
it will work, but using the same rule
http://www.example.com/#!blah=something
will not work, because it seems the URL string the filter is using is truncated at the "#".
Can anyone tell me whether it's possible to make this work?
The browser doesn't send the hash to the server, as you've discovered. Watching a given request, you'll see that it only sends along the url before the # symbol.
GET / HTTP/1.1
Host: example.com
...
From the link you mentioned:
Hash fragments are never (by specification) sent to the server as part of an HTTP request. In other words, the crawler needs some way to let your server know that it wants the content for the URL www.example.com/ajax.html#!key=value (as opposed to simply www.example.com/ajax.html).
From the description in the text, it is the server's job to translate from the 'ugly' URL to the pretty one (with a hash), and to send back a snapshot of what that page would look like if loaded with the hash on the client. That page may have other links using hashes to load other documents; the crawler will automatically translate those back to ugly URLs and request more data from the server.
So in short, this is not a change you should need to make: Googlebot makes it automatically, provided you have opted in to using hash fragments. As for other bots, apparently Bing now supports this scheme as well, but that appears to be outside the scope of your question.
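Both points can be sketched in Python: `request_target` shows that only the path and query reach the server, and `to_escaped_fragment` shows the #! to _escaped_fragment_ mapping that the crawler, not the server, performs. Both names are hypothetical, and the escaping is a simplified approximation of the rules in the linked Google document.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def request_target(url: str) -> str:
    """What the browser sends in the request line: path and query, never the fragment."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def to_escaped_fragment(url: str) -> str:
    """Rewrite a #! URL into the _escaped_fragment_ form a crawler would request."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hash-bang URL, nothing to map
    value = quote(parts.fragment[1:], safe="=")  # simplified escaping
    prefix = parts.query + "&" if parts.query else ""
    query = prefix + "_escaped_fragment_=" + value
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(request_target("http://www.example.com/#!blah=something"))       # /
print(to_escaped_fragment("http://www.example.com/#!blah=something"))
# http://www.example.com/?_escaped_fragment_=blah=something
```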

How to disallow access to an url called without parameters with robots.txt

I would like to deny web robots access to a URL like this:
http://www.example.com/export
while allowing this kind of URL:
http://www.example.com/export?foo=value1
A spider bot is calling /export without a query string, causing a lot of errors in my log.
Is there a way to manage this filter in robots.txt?
I am assuming you have problems with bots hitting the first URL in your example.
As said in the comment, this is probably not possible, because http://www.example.com/export is the resource's base URL. Even if it were possible as per the standard, I wouldn't trust bots to understand this properly.
I would also not send a 401 Access denied or similar header if the URL is called without a query string for the same reason: A bot could think that the resource is out of bounds entirely.
What I would do in your situation is, if somebody arrives at
http://www.example.com/export
send a 301 Moved Permanently redirect to the same URL with a query string containing some default values, like
http://www.example.com/export?foo=0
this should keep the search engine index clean. (It won't fix the logging problem you state in your comment, though.)
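The redirect decision itself is small enough to sketch in Python. How it is wired into your server is left open; `export_redirect_location` and the `foo=0` default are illustrative assumptions taken from the example URLs above:

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical default parameters, mirroring the ?foo=0 example above.
DEFAULT_EXPORT_PARAMS = {"foo": "0"}


def export_redirect_location(path: str, query_string: str) -> Optional[str]:
    """Return the Location for a 301 when /export is requested without a query string."""
    if path == "/export" and not query_string:
        return "/export?" + urlencode(DEFAULT_EXPORT_PARAMS)
    return None  # serve the request normally
```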