Need to preserve the HTTP status code when using an httpErrors custom 404 error with responseMode ExecuteURL on Azure App Service IIS - Perl

I want to have all missing content/"bad" URLs redirect to our custom 404.html error page.
This is important for accurately recording 404 errors in Google Analytics.
The issue is that when responseMode="ExecuteURL" is set, the custom error page does not preserve the 404 status code but always returns a 200. I can change this to responseMode="Redirect", but that shows a 302 status code before redirecting to the custom 404.html page.
All of this DOES work with a "File" flag set on the httpError… just not with the "ExecuteURL" flag which is required for our server-side Perl includes used to present Header/Footer page elements.
Ideally, we should be able to use the Azure App Service IIS web.config to set a custom error to:
always preserve/show the requested (missing) URL request in address bar (and dev tools)
always preserve/show the "real" HTTP status code (404) in the dev tools
allow the use of server-side includes to update header/footer elements using our current Perl setup
The below code works to preserve the requested URL in address bar, correctly shows the custom 404.html page with server-side header/footer content, BUT loses the 404 status code in dev tools (and Google Analytics)...
<httpErrors>
<remove statusCode="404" subStatusCode="-1" />
<error statusCode="404"
responseMode="ExecuteURL"
path="/404.html" />
</httpErrors>
Changing to responseMode="Redirect" only changes the status code to 302 before redirecting to custom 404.html...
If I change to use responseMode="File" this all works fine, but I then lose the custom server-side header and footers which are handled with Perl server-side includes...
EDIT:
To be clear, the custom 404 page is all HTML and JavaScript, but it also leverages some very old Perl server-side includes to add custom header and footer elements to the page. We are not using any .NET Framework or .NET Core pages...
This arrangement should be possible, but perhaps only with a different web server, not IIS? nginx, perhaps?
FINAL UPDATE:
Not a full answer, but our near-term resolution was to use nginx proxy configuration (which was already present and could be altered in nginx.conf) to preserve 404 error codes and present the proper custom 404.html static file.
I was also able to do this with Docker and nginx, so I know it is possible for a web server to deal with this situation...
I've determined that AFAICT there is no way for IIS web.config to handle this without using server-side code as Jason Pan suggested. So while he may be correct, that answer was not helpful for our needs.
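For anyone hitting the same wall, here is a minimal sketch of the kind of nginx configuration that can do this; the host name, paths and backend address below are placeholders, not our actual nginx.conf:
# Sketch only: serve a static custom 404 page while preserving the 404 status,
# in front of a proxied backend (placeholder names and addresses).
server {
    listen 80;
    server_name example.com;
    # Let nginx substitute its own error page for backend error responses.
    proxy_intercept_errors on;
    # Map 404s to the custom page; the 404 status code is kept as-is.
    error_page 404 /404.html;
    location = /404.html {
        root /usr/share/nginx/html;   # wherever the static 404.html lives
        internal;                     # not directly requestable by clients
    }
    location / {
        proxy_pass http://127.0.0.1:8080;   # placeholder backend
    }
}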

UPDATE
When you use httpErrors in web.config, your project must contain server-side code to handle the 4xx and 5xx responses.
Since your project is just a static web app with no server-side code, I suggest hard-coding the status code into the httpErrors entry.
Like,
<error statusCode="404" responseMode="ExecuteURL" path="/404.html?httpcode=404" />.
The httpErrors element is handled by servers such as IIS. The 404 and 500 errors you want cannot be surfaced directly in the browser: because httpErrors is in use, the server processes the error itself and returns the substitute page to the browser, so the HTTP status you get is always 200.
PREVIOUS
I think I understand your question. Whether your program is .NET Framework or .NET Core is not clear for the time being, but I can see the configuration tags in the web.config file.
In principle, the bad request arrives at IIS (or another server), the code level has already processed the 404 or other error, and so the returned HTTP status value ends up as 200. There is no way to change that directly, but when a 404 or 500 error occurs we can process and record it in Application_Error, which can also achieve the analysis you want.
I tested this against the .NET Framework; you can download my sample code on GitHub. My suggestions are:
1. Add the Application_Error method to the Global.asax.cs file.
2. According to the Application_Error method, add an Error Controller.
3. Perform the test and the result is shown in the screenshot.
After decoding the message, the content is:
The controller for path '/a/b' was not found or does not implement IController.
We can customize the error message in the Application_Error function and append the HTTP code and other info.
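The linked GitHub sample is not reproduced in the answer; purely as an illustration of the steps above, an Application_Error handler along these lines could record and preserve the original status code (the page path and logging call are my assumptions, not the answerer's code):
// Global.asax.cs -- rough sketch of the suggested Application_Error approach.
// Requires: using System; using System.Web;
protected void Application_Error(object sender, EventArgs e)
{
    Exception ex = Server.GetLastError();
    int httpCode = (ex as HttpException)?.GetHttpCode() ?? 500;

    // Record the real status code (for logging/analytics) before clearing the error.
    System.Diagnostics.Trace.TraceError("HTTP {0}: {1}", httpCode, ex.Message);

    Server.ClearError();
    Response.Clear();
    Response.StatusCode = httpCode;            // keep the original status code
    Response.TrySkipIisCustomErrors = true;    // stop IIS swapping in its own page

    // Render a custom error page in the same response (no redirect, so the
    // browser still sees the original status code). The path is an assumption.
    Server.Execute("/ErrorDefault.aspx?httpcode=" + httpCode);
}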

Hi, I found this workaround (ASP.NET WebForms).
Web.config:
<httpErrors errorMode="Custom" existingResponse="Replace">
<remove statusCode="404" subStatusCode="-1" />
<error statusCode="404" path="/ErrorDefault.aspx?httpcode=404" responseMode="ExecuteURL" />
<remove statusCode="500" subStatusCode="-1" />
<error statusCode="500" path="/ErrorDefault.aspx?httpcode=500" responseMode="ExecuteURL" />
</httpErrors>
ErrorDefault.cs > Page_Load:
var httpCode = Request.QueryString["httpcode"];
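For the 404/500 to actually show up in dev tools with that workaround, the error page itself presumably also has to set the status code back onto the response; a minimal sketch of what ErrorDefault's Page_Load might look like (my reconstruction, not the answerer's code):
// ErrorDefault code-behind -- sketch only, reconstructing what the answer implies.
protected void Page_Load(object sender, EventArgs e)
{
    var httpCode = Request.QueryString["httpcode"];

    int statusCode;
    if (int.TryParse(httpCode, out statusCode))
    {
        Response.StatusCode = statusCode;          // e.g. 404 or 500
        Response.TrySkipIisCustomErrors = true;    // keep this page as the response body
    }

    // ...render the custom error content (header/footer, message, etc.)...
}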

Related

google oauth callback appending parameters multiple times

We have been successfully using Google OAuth for years now, but it suddenly stopped working a few days ago. In looking into this, it appears that after the user clicks "Allow" to grant access to the requested scope, Google is redirecting to our callback page (as it always has), but now the code and scope parameters are being appended to the URL multiple times (example below). Given query string length limits on our web server, this is now throwing a 404.15 error.
Since we have made no recent code changes and have not made any updates in the Google API Console, I don't believe we have done anything to cause the parameters to be appended multiple times to the callback URL. Is this an issue with Google? Or am I missing something that may have caused this issue?
Example callback URL:
http://example.com/oauth/oauthcallback?code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://googleapis.com/auth/gmail.readonly&code=4/XADj4OhPIwWZRA5TsZMgOkMIfmuBVdQidarK_MhSmkpxWubmprbySMBnY4huJaYATwzf8B798OcHLfD-LdBBtfQ&scope=https://www.googleapis.com/auth/gmail.readonly
I have resolved this. In case this helps someone else, sometime between 9/12/2018 and 9/14/2018, Google started returning an additional parameter ("scope") in their OAuth callback (in addition to the only other parameter - "code" - that was previously being returned in the callback). The scope value included "https://www.googleapis.com" which was causing an issue with an existing URL rewrite rule on our end to strip "www" from our URL. The very generic syntax in our rewrite rule that simply looked for "www." was causing a redirect loop until a 404.15 was thrown. By making the rewrite rule specific to our URL, the scope parameter is ignored by the rewrite rule and the redirect loop is avoided.
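For illustration, the difference is roughly between a condition that scans the whole URL/query string for "www." and one that only inspects the host header; this is a generic sketch, not our actual rule:
<!-- Sketch only: strip "www." by matching the HTTP_HOST header, so a
     "www.googleapis.com" inside the scope query parameter is ignored. -->
<rule name="Remove www" stopProcessing="true">
  <match url="(.*)" />
  <conditions>
    <add input="{HTTP_HOST}" pattern="^www\.example\.com$" />
  </conditions>
  <action type="Redirect" url="http://example.com/{R:1}" redirectType="Permanent" />
</rule>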
Posting because this may help others. #fzebra's answer applied in my case but ALSO my auth library forwards all query parameters that the OAuth provider sends to my redirect_uri onto the requests it makes to retrieve the access_token. Because of this and because I think Google has a parsing bug, the new scope parameter blows up the request. Google responds with a 400 Bad Request and inspecting the JSON response, you get a redirect_uri_mismatch. My guess is they see their own scope URL parameter as the redirect URI and invalidate the request.
To solve this, I needed to chop the scope query parameter off the outgoing request to Google, so I did it via a URL rewrite rule.
<!-- See https://stackoverflow.com/questions/52372359/google-oauth-callback-appending-parameters-multiple-times -->
<rule name="Google Login - Remove scope parameter" stopProcessing="true">
<match url="google/redirect/url(.*)?$" />
<conditions trackAllCaptures="true">
<add input="{QUERY_STRING}" pattern="(.*)(&amp;?scope=.+&amp;?)(.*)" />
</conditions>
<action type="Rewrite" url="google/redirect/url?{C:1}{C:3}" appendQueryString="false" />
</rule>
This cuts the scope parameter and value out of the incoming query string and joins the two parts back together without it. Note the &amp; is because this is XML; in plain regex the expression is just (.*)(&?scope=.+&?)(.*). It will leave a trailing & in some cases.
You should replace google/redirect/url with the path to your auth URL (that Google redirects to).
You could do this in application layer code but URL rewrite does not add an extra server request 👍
This fixed it finally. Jeez!

disable auto-generated error pages iisexpress

I'm creating a JSON web api. Certain parts of the api are limited to authenticated users only.
If you're not authenticated, the server is programmed to set the http status code to 403 and close the response.
http_response_code(403);
exit();
If that's the output my code produces, I don't expect iisexpress to jump in and serve an auto-generated html page. How can this behaviour be disabled?
I have tried adding following configuration to web.config
<httpErrors>
<remove statusCode="403" subStatusCode="-1" />
</httpErrors>
That doesn't work. Even if it did, only 4xx and 5xx status codes can be removed, and I need all auto-generated pages to be removed.
As suggested in a comment, I tried customErrors
<customErrors mode="On" defaultRedirect="index.php">
<error statusCode="403" redirect="index.php"/>
</customErrors>
I still get the default 403 page.
Also note that I'm not actually being redirected (no 'Location' headers are being set) to the default 403 page. It's just rendering a default html page in my response.
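One IIS setting worth checking here (not mentioned in the thread, so treat it as a suggestion rather than a confirmed fix) is the existingResponse attribute of httpErrors; PassThrough tells IIS to leave a response that already has a body untouched instead of substituting its own error page:
<!-- Suggestion only: pass the application's own error responses through
     instead of letting IIS/IIS Express replace them. -->
<system.webServer>
  <httpErrors existingResponse="PassThrough" />
</system.webServer>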

cflocation vs cfheader for 301 redirects

I am "renaming" an existing file for a project I am working on. To maintain backwards compatibility, I am leaving a cfm file in place to redirect the users to the new one.
buy.cfm: old
shop.cfm: new
In order to keep everything as clean as possible, I want to send the 301 statuscode response if a user tries to go to buy.cfm.
I know that I can use either cflocation with the statuscode attribute
<cflocation url="shop.cfm" statuscode="301" addtoken="false">
or I can use the cfheader tags.
<cfheader statuscode="301" statustext="Moved permanently">
<cfheader name="Location" value="http://www.mysite.com/shop.cfm">
Are there any reasons to use one method over the other?
I think they do the same thing, with <cflocation> being more readable
I tested this on ColdFusion 9.
There is one major difference, and it is that cflocation stops execution of the page and then redirects to the specified resource.
From the Adobe ColdFusion documentation:
Stops execution of the current page and opens a ColdFusion page or
HTML file.
So you would need to do this:
<cfheader statuscode="301" statustext="Moved permanently">
<cfheader name="Location" value="http://www.example.com/shop.cfm">
<cfabort>
to get the equivalent of this:
<cflocation url="shop.cfm" statuscode="301" addtoken="false">
Otherwise, you risk running into issues if other code runs after the cfheader tag. I came across this when fixing some code where redirects were inserted into an application.cfm file -- using cfheader -- without aborting the rest of the page processing.
I also noticed, in the response headers, that cflocation sets the following headers:
Cache-Control: no-cache
Pragma: no-cache
One might want to add these headers when using the cfheader tag with Location:
<cfheader name="Cache-Control" value="no-cache">
<cfheader name="Pragma" value="no-cache">
To elaborate on the answer by Andy Tyrone: while they MAY do the same thing in certain circumstances, the CFHEADER method gives you more control over the headers passed in the response. This becomes useful, for example, if you want to send cache-control headers to a browser or content delivery network so that they do not keep hitting your server with the same old redirect request. There is no way (to my knowledge) to tell CFLOCATION to cache the redirect.
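For example (a sketch only, with an arbitrary max-age), a cacheable permanent redirect built with cfheader might look like this:
<!--- Sketch: a 301 that browsers/CDNs may cache; the max-age value is arbitrary --->
<cfheader statuscode="301" statustext="Moved Permanently">
<cfheader name="Location" value="http://www.example.com/shop.cfm">
<cfheader name="Cache-Control" value="public, max-age=86400">
<cfabort>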

URLRewriteFile and "#" char in URL string

I'm using the Google means of making my GWT app searchable (https://developers.google.com/webmasters/ajax-crawling/docs/getting-started), which works fine. Unfortunately, it seems Bing does not follow the same pattern/rule.
I thought I'd add a URL filter, based on user-agent to map all URL's of the form
http://www.example.com/#!blah=something
to
http://www.example.com/?_escaped_fragment_=blah=something
only for BingBot so that my CrawlerServet returned the same as the GoogleBot requests. I have a URLRewrite rule like:
<rule>
<condition name="user-agent">Firefox/8.0</condition>
<from use-query-string="true">^(.*)#!(.*)$</from>
<to type="redirect">?_escaped_fragment_=$2</to>
</rule>
(I'm using a user-agent of Firefox to test)
This never matches. If I change the rule to ^(.*)!(.*)$ and try to match on
http://www.example.com/!blah=something
it will work, but using the same rule
http://www.example.com/#!blah=something
will not work, because it seems the URL string the filter is using is truncated at the "#".
Can anyone tell me if it's possible to make this work.
The browser doesn't send the hash to the server, as you've discovered. Watching a given request, you'll see that it only sends along the url before the # symbol.
GET / HTTP/1.1
Host: example.com
...
From the link you mentioned:
Hash fragments are never (by specification) sent to the server as part of an HTTP request. In other words, the crawler needs some way to let your server know that it wants the content for the URL www.example.com/ajax.html#!key=value (as opposed to simply www.example.com/ajax.html).
From the descriptions in the text, it is the server's job to translate from the 'ugly' url to a pretty one (with a hash), and to send back a snapshot of what that page might look like if loaded with a hash on the client. That page may have other links using hashes to load other documents - the crawler will automatically translate those back to ugly urls, and request more data from the server.
So in short, this is not a change you should need to make; the GoogleBot will make it automatically, provided you have opted into using hash fragments. As for other bots, apparently Bing now supports this idea as well, but that appears to be outside the scope of your question.

Force the browser to send some HTTP request header

I need to include an application that is secured with BASIC authentication.
When I open the application URL in the browser, the browser asks me to enter my credentials...
What I know is that:
1. The browser asks the server for some URL (the URL of the app).
2. The server checks the request for the Authorization header and doesn't find it.
3. The server sends a 401 back to the browser.
4. The browser interprets this response code and shows me a dialog asking for the username/password, which it sends back to the server in the Authorization request header.
So far, so good. I can write a page (in JSP) that sends this required HTTP request header along with the request that calls the application, so I'll call the application through my page.
The problem is that this application (in fact a GWT application) contains references to some JavaScript and CSS files coming from the server that hosts it. The application page that I import looks like:
<html>
<link href="http://application_host/cssfile.css" />
<link href="http://application_host/javascriptfile.js" />
.....
</html>
So, again, I found the application asks me for the authentication credentials for the CSS and JS files!
I am thinking of several solutions but don't know the applicability of each.
One solution is to ask the browser (via JavaScript) to send the Authorization request header when it asks the server for the JS and CSS files.
Please give me your opinions on that... any other suggestions will be very welcome.
Thanks.
I think you're running into some weirdness with how your server is configured. Authentication happens in the context of an authentication realm. Your assets should either be in the same authentication realm as your page, or (more likely) should not require authentication at all. The browser should be caching credentials for the given realm and not prompt for them again.
See the protocol example on http://en.wikipedia.org/wiki/Basic_access_authentication
Judging from your story, something tells me your problem is with the authentication method itself, not how to implement it. Why do you want to bother with the request header so much?
As far as I know, you can configure your container (e.g. Tomcat) to force HTTP authentication for certain URLs. Your container will make sure that authentication has taken place; there is no need to set HTTP headers yourself whatsoever.
Perhaps you can explain a bit better what you are trying to achieve, instead of telling implementation details?
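For reference, the container-managed BASIC authentication mentioned above is configured declaratively rather than by setting headers in code; a minimal web.xml sketch (the URL pattern, realm and role names are placeholders):
<!-- Sketch: web.xml fragment enabling container-managed BASIC auth
     for a placeholder URL pattern and role. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Protected app</web-resource-name>
    <url-pattern>/secure/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>user</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>MyRealm</realm-name>
</login-config>
<security-role>
  <role-name>user</role-name>
</security-role>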
Why are the CSS and JS files kept in a protected area of the server? You need to place those files in a public area of your server. If you don't have a public area, you need to provide one; how to do that depends on your server-side software architecture and configuration.