How do you make OWASP ZAP crawl subdomains?

In the spider window it says they are out of scope.

Include them in scope by making them part of the Context that you'll specify as the starting point for the Spider or Ajax Spider.
Note: Contexts are defined with regular expressions, so including subdomains is easy.
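For example, an "Include in Context" regex covering a site and all of its subdomains might look like the following (a sketch, assuming the target is example.com):

https?://(.*\.)?example\.com/.*

The optional (.*\.)? group matches any subdomain as well as the apex domain itself.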
Alternatively, you could proxy traffic through ZAP, make manual requests, or spider those domains as starting points.

AEM URL Rewriting

I see broadly two approaches for URL rewriting in AEM:
Sling mapping (sling:Mapping) under /etc/map/http(s)
URL rewriting using a link rewriter/TransformerFactory.
I want to know which of the two is better in terms of ease of implementation, scalability, maintenance, and automation.
Regards.
There are always multiple options for a problem in Sling. If you look at the topic of "URL rewriting", it has two dimensions:
outbound - e.g. shorten links /content/path/en/about.html to /en/about/
inbound - e.g. map an inbound request for /en/about/ to a resource request for /content/path/en/about.html
Outbound:
URL rewriting is usually done outbound by a LinkRewriter/TransformerFactory.
In theory, you could also change your component to render differently or change your content, but that's not recommended.
To apply a Transformer you can use
/etc/map mapping (recommended), referred to as Mapping Map Entries [1] (see the sketch after this list)
enhanced mapping allowing for complex rules, including regex-based rules
allows for different mapping per domain or protocol
can ensure complete externalization of links
ResourceResolver Map Entries [1]
traditional mapping, very simple rules only
does not take domain or protocol into account
requires resolver restart on change (can be expensive for large production environments)
Custom TransformerFactory
full power to change all links on the way out, based on SAX plus custom rules
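As an illustration of the /etc/map option, a minimal sling:Mapping node (a sketch; the host example.com, port 80 and the content path are assumptions) could be created at /etc/map/http/example.com.80 with this repository XML:

<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0"
    xmlns:sling="http://sling.apache.org/jcr/sling/1.0"
    jcr:primaryType="sling:Mapping"
    sling:internalRedirect="/content/path"/>

With this entry, the resource resolver maps http://example.com/en/about.html to /content/path/en/about.html inbound, and its map() method produces the shortened form for outbound links.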
Inbound:
Your inbound requests can be rewritten or mapped in Sling, or at the infrastructure level in front of it (Apache HTTPD mod_rewrite or a CDN such as Akamai)
Apache HTTPD mod_rewrite (recommended for production) - modifies the request before it gets forwarded to the Dispatcher module. Recommended because it allows for enhanced security as well as proper and simple caching and cache-invalidation rules (a minimal sketch follows this list)
Sling - usually not for production, as caching might become difficult
/etc/map
ResourceResolver
RequestFilter [2]
NonExistingResource servlet
CDN: same as mod_rewrite. Inbound manipulation before the request reaches the Dispatcher
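For the mod_rewrite option, a minimal sketch (the host and content paths are assumptions):

<VirtualHost *:80>
    ServerName example.com
    RewriteEngine On
    # Expand the short public URL to the full content path before
    # the request is handed to the Dispatcher module
    RewriteRule ^/en/(.*)$ /content/path/en/$1 [PT]
</VirtualHost>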
HTH
[1] https://docs.adobe.com/docs/en/aem/6-2/deploy/configuring/resource-mapping.html
[2] https://sling.apache.org/documentation/the-sling-engine/filters.html
[3] https://sling.apache.org/documentation/the-sling-engine/mappings-for-resource-resolution.html
This depends on which rewriting you are referring to: inbound or outbound.
When it comes to inbound rewriting, I'd advise using mod_rewrite and properly rewriting your content there with a single rule - this is quite efficient.
When it comes to outbound rewriting (handling links in your HTML) you should definitely go with Sling Mappings, as they are more efficient and clearer - and they are designed for exactly this purpose.
Take a look at this blog which explains the whole rewriting journey: https://www.cognifide.com/our-blogs/cq/multidomain-cq-mappings-and-apache-configuration/

Does the OWASP ASVS standard forbid the use of non-standard HTTP methods?

In OWASP ASVS 2014 (https://www.owasp.org/images/5/58/OWASP_ASVS_Version_2.pdf) we have:
V 11.2 (page 31): Verify that the application accepts only a defined
set of HTTP request methods, such as GET and POST and unused methods
are explicitly blocked.
Does it mean we cannot use non-standard HTTP methods? If yes, can we say that WebDAV doesn't conform to OWASP ASVS standard? If the answer is no, is there any formal document, blog post or a FAQ for this?
The way I read this is that, as long as you define which request methods you accept and block everything else, you can use any method you want.
only a defined set
is not the same as "you cannot use non-standard methods"; it says, for instance, that if you are not using POST you should explicitly block POST.
such as GET and POST
Here GET and POST are examples of methods, not a complete list of available methods.
So use the methods that fit your needs, but verify that the application does not accept any request that is not in the list of acceptable methods.
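One way to implement such an allowlist is a deny-by-default rule at the web-server level; a minimal Apache sketch, assuming only GET, HEAD and POST are needed:

<Location "/">
    # Deny every method except the positive set we allow
    <LimitExcept GET HEAD POST>
        Require all denied
    </LimitExcept>
</Location>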
The quick answer is NO! I asked Andrew van der Stock, the OWASP ASVS project leader. This was my question:
Dear OWASP ASVS project leaders (Daniel & Vanderaj),
I want to know whether OWASP ASVS 2014 Level 1 forces us to use just
the standardized HTTP methods (GET, HEAD, POST, PUT, DELETE, CONNECT,
OPTIONS, TRACE), or whether we can use non-standardized HTTP methods
too (by listing them in a document, as
WebDAV (https://en.wikipedia.org/wiki/WebDAV) did).
With respect
And he replied:
I think the primary driver is not to worry about which methods are
available, but if they are necessary and safely configured.
Essentially, we are asking for: All methods are denied by default,
except for: A positive set of allowed methods, AND all these methods
are correctly and securely configured
For example, OPTIONS and HEAD are required by Chrome doing pre-flight
CORS checks on AngularJS and other apps, and many apps require PUT and
DELETE. Therefore these methods are necessary. If you use a new
method, such as "EXAMPLE", the idea is that you don't also accept any
other words, such as "RIDICULOUS", and "EXAMPLE" is correctly
configured to be safe.
So if WebDAV is also enabled for whatever reason, it's important to
make sure that it is properly secured. There may be a solid reason for
it to exist (e.g. SharePoint), but to allow anonymous users to
overwrite your site or change things is not okay.
thanks, Andrew

Scraping WebObjects website & REST

I need to programmatically interact with a WebObjects website and extract data from the responses. The particular WebObjects site I am scraping uses component actions and stores sessions in cookies (not URLs). This means that all URLs look something like this:
http://example.com/WOApp/WebObjects/WOApp.woa/wo/7.0.0.0.29.1.1.1
My first questions are:
Don't URLs like this completely destroy local and shared caching opportunities (the cacheability constraint in REST)? I imagine the only effective caching with such URLs happens in the WebObjects server itself.
Isn't addressability broken as well? Each resource does have a unique endpoint, but it changes constantly. Furthermore, I think WebObjects also invalidates URLs that are too old, since they "time out" after a period of time. I'm not sure whether this applies only to URLs with sessions, though.
Regarding the scraping, I am not sure whether it's possible to extract any meaningful endpoints from the website. For example, with a normal website I would look through the HTML and extract the POST URLs, then use them in my scraper by posting directly to them instead of going through the normal request-response cycle.
In this case I obviously cannot use any URLs extracted from the HTML since they are dynamically generated on each request, but I read something about being able to access WebObjects components directly if the security settings have not been set to disallow this (see https://developer.apple.com/legacy/library/documentation/LegacyTechnologies/WebObjects/WebObjects_3.5/PDF/WebObjectsDevGuide.pdf, p. 53 "Limitations on Direct requests"). I don't understand exactly how to do this though or if it's even possible.
If it's not possible, what would be a good approach then? The only options I can think of are:
Using a full-blown browser client to interact with the website (e.g. Watir or Selenium) and extracting & processing the HTML from its responses
Manually extracting the dynamic endpoints by first requesting the page they are on and then finding the place in the HTML where they're located, then using them afterwards as if they were "static".
I am interested in opinions on how to approach this scenario since I don't believe any of the solutions above are particularly good.
You've asked a number of questions, and I'll see if I can cover each in turn.
Don't URLs like this completely destroy local and shared caching
opportunities (the cacheability constraint in REST)? I imagine the only
effective caching with such URLs happens in the WebObjects server
itself.
There is, indeed, a page cache within the WebObjects application server, and you're right to observe that these component action URLs probably thwart any other kind of caching. Additionally, even though the session ID is not present in the URL, you'd need the session ID in the cookie to re-create the same page, so having just that URL would get you a session restoration error from the application server.
Isn't addressability broken as well? Each resource does have a unique
endpoint, but it changes constantly.
Well, yes, on the face of it this is true. You've given a component action URL as an example, and they're tied to the session.
Furthermore, I think WebObjects also invalidates URLs that are too
old, since they "time out" after a period of time. I'm not sure whether
this applies only to URLs with sessions, though.
Again, all true. Component action URLs generate sessions, and sessions time out.
At this point, let me take a quick diversion. I'm assuming you're not the owner of the WebObjects application—you're talking about having to scrape a WebObjects app, and you've identified some ways in which this particular app doesn't conform to REST principles. You're completely right—a fully component-action-based WebObjects application won't be RESTful. WebObjects pre-dates REST by a few years. Having said that, there are ways in which a WebObjects application can be completely RESTful:
Using session-less direct actions gives a degree of REST-like behaviour, and would certainly solve the problems you identify with caching, addressability and expiry.
Using the ERRest framework to create a 100% RESTful application.
Of course, none of this will help you if you're just trying to scrape a legacy application.
Regarding the scraping, I am not sure whether it's possible to extract
any meaningful endpoints from the website. For example, with a normal
website I would look through the HTML and extract the POST URLs, then
use them in my scraper by posting directly to them instead of going
through the normal request-response cycle.
Again, if it's a fully component action-based application, you're right—all those URLs will be dynamically generated and useless to you.
In this case I obviously cannot use any URLs extracted from the HTML
since they are dynamically generated on each request, but I read
something about being able to access WebObjects components directly if
the security settings have not been set to disallow this…
That's talking about getting a component to render directly from its template with some restrictions:
As you note, the application can easily prevent it from happening at all.
As mentioned on p.53, the user input and action-invocation phases of rendering the component are skipped, which probably means this approach would be limited to rendering a component that didn't have any dynamic content anyway. This might be of some very limited use to you, though you'd need to know the component names you were interested in, and they wouldn't normally be exposed anywhere.
I'm not sure you're going to find anything better than the types of high-level functional approaches you've already suggested above, such as automating at the browser level with Selenium. If what you need is REST-style direct addressability of resources within the application, you're not going to get that unless you can re-write the application to use direct actions or ERRest where you need them.
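As a concrete sketch of the browser-automation option, using Selenium WebDriver in Java (the URL and the "Orders" link are hypothetical placeholders):

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class WOScraper {
    public static void main(String[] args) {
        // A real browser handles the session cookie and the dynamically
        // generated component-action URLs for us.
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://example.com/WOApp/WebObjects/WOApp.woa");
            // Navigate exactly as a user would, by clicking rendered links.
            driver.findElement(By.linkText("Orders")).click();
            // Extract data from the rendered page.
            String html = driver.getPageSource();
            System.out.println(html);
        } finally {
            driver.quit();
        }
    }
}

This trades speed for robustness: every request goes through the full request-response cycle, but the session and URL-generation problems disappear.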
A little late, but this could help.
I use Apache's mod_ext_filter (slightly modified) to pre-/post-filter the requests/responses from our WebObjects application. The filter calls PHP scripts that can read the dynamically generated hyperlinks and other things from the HTML pages. The scripts can also modify the HTTP requests, so we can programmatically add/remove parameters from the request to implement new workflows in front of the legacy app and clean up the requests before they reach WebObjects. It is also possible to maintain an additional database within the scripts and store some state across multiple requests.
So you can read the dynamically created links (maybe a button's name or an HTML form destination) and recognize these names within the request.
It is also possible to "remote control" such applications with little scripts like "click on the third button on the page". The only thing you need is a DOM parser to get the structure of the HTML pages; you can then rebuild the actions the browser would perform (i.e. create the HTTP request manually and send it as a POST to the extracted form destination href). The only problem is the JavaScript code, which we analyze and re-implement within PHP (e.g. enabling/disabling input elements so they will not be transmitted within the requests).
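A minimal sketch of that extract-and-replay idea in Java, using jsoup as the DOM parser (the URL, form selector and field name are assumptions):

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FormReplay {
    public static void main(String[] args) throws Exception {
        // Fetch the page that contains the dynamically generated form.
        Connection.Response page = Jsoup
                .connect("http://example.com/WOApp/WebObjects/WOApp.woa")
                .method(Connection.Method.GET)
                .execute();
        Document doc = page.parse();
        // Read the freshly generated component-action URL from the form.
        String action = doc.selectFirst("form").absUrl("action");
        // Replay it as a POST, carrying the session cookie along.
        Document result = Jsoup.connect(action)
                .cookies(page.cookies())
                .data("someField", "someValue") // hypothetical form field
                .post();
        System.out.println(result.title());
    }
}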
There were some problems with the WebObjects Adapter Module for Apache. It still sends a Content-Length HTTP header, which you cannot change in mod_ext_filter; if you change the HTML or the parameters within the request, the declared length will no longer match the content. But it is possible to change that.
Theoretically it would also be possible to control such a closed-source legacy application from a new UI on a tablet or smartphone which delegates the user interaction to the backend WebObjects app.
The scripts depend on the page structure, so if your WebObjects app changes, you have to correct some things in the scripts (i.e. the third button could now be the fourth button).
It should also be possible to put a RESTful interface in front of the application and have the filter scripts query the data from the legacy app.

Can REST be used instead of URL Rewrite in CF10?

Can the REST support in CF10 be used to replace URL Rewrite / mod_rewrite for SEO-friendly URLs? That is, write a thin layer that defines the GET and POST methods and <cfinclude>s the correct page?
Or would it tax the server too much, and is it better to leave this to the web server?
Once in CFML, it'd be much easier to version-control and maintain.
Thanks
If I understand what you are saying (and perhaps I do not), you would create a handler that would intercept a request, parse out the variables, then request the appropriate page via REST? If that's what you have in mind then I'm not sure I follow what you would gain by this. REST (in general) is more of a generic HTTP API for getting at methods - not so much a page / content paradigm (though I suppose it could be).
If what you are looking for is to use CF as a rewriting SEO-URL handler, you can do this now. To use an IIS example, you can create a "custom 404" handler - a CFM page - that gets all the requests that are not tied to a specific document. The handler teases out the variables by parsing the URL, then "includes" the correct CFM code or page. That sounds a bit like what you want - but it's not really REST.
Perhaps you are thinking of doing some sort of CFHTTP call where you grab the content you need by constructing the query string from the URL. So if someone loads a URL like:
blah.com/productid/550
You could write code like so -
<!--- Turn a path like /productid/550 into a query-string request
      for the real page, then echo the fetched content back --->
<cfhttp url="http://blah.com/index.cfm?#listFirst(cgi.script_name, '/')#=#listLast(cgi.script_name, '/')#" />
<cfoutput>#cfhttp.fileContent#</cfoutput>
While this would do the trick, you would be better off using cfinclude rather than this approach. The approach above actually generates an additional thread per request - one thread for the browser's request and another for the cfhttp request.
Finally, I would politely suggest that URL rewriting (in Apache or IIS) is more efficient and more "conventional", and therefore probably a better choice in general.
@Henry,
REST is not a replacement for URL rewriting.
First of all, REST URLs have a fixed format:
http://localhost:8500/rest/App_Name/Rest_Path
"rest" part is mandatory. If you want to change "rest" you can change it in the web.xml (Change the URL Mapping).
App_Name is not mandatory. A server can have a default rest application. For default applications you do not need to specify the AppName. For accessing other (non-default) rest applications, you should specify, the AppName. You can make an application default in the Rest Service registration page in the admin.
Rest_Path identifies the CFC and the function in the CFC that needs to be invoked on the HTTP call.
If this URL format is acceptable, then URLs of this format can be mapped to a specific function in a CFC. Whenever an HTTP call is made to the URL, the corresponding CFFunction will be invoked. By using REST you are accessing a function in the CFC; it is not possible to access a CFC or a CFM directly this way. But in the function you can implement whatever you want (like invoking a CFC, invoking another CFM, etc.).
Does this reply answer your question?
Thanks,
Paul
Even if one could do this, I'd say it's co-opting the wrong tool to do the wrong job. URL rewriting is the web server's job, not the CF server's, and the web server will be a hell of a lot better at it than CF will be. CF's REST interface is for building APIs, not for doing URL rewriting.
If one were to handle URL rewriting with CF, then using the 404 handler or the onMissingTemplate() handler would be a better fit here, would it not? At least you're using a tool intended for the job (if not the best one).
As for version control... an .htaccess file is just a text file, like a CFML file is. I've not looked too closely at IIS's rewrite module, but can it not use a text file to configure / maintain its rewrites? Obviously Apache can, and we use Helicon's ISAPI Rewrite module which uses an mod_rewrite-compatible .htaccess file.
It seems to me like you're trying to make the developer's job easier by using an approach that would penalise the production performance. "Making the developer's life easier" should never be grounds for compromising the production environment (IMO, obviously).

GWT: add-linker (cross-site) doesn't work with server code!

I am trying to do some cross-site in GWT.
According to GWT: Same Origin Policy, I've added the cross-site linker (<add-linker name="xs"/>) to the module XML file.
It works okay as long as I am not calling any GWT remote service (using GWT-RPC), but when I try to call any remote service, I get no response!
Anyone know how to fix the cross-site issue in GWT with GWT remote services?
Thanks in advance!
Steve's answer is correct; however, there is one other option you can consider, which is the best approach if you want to require authentication for server interaction without using OAuth. The main point is that the cross-site linker doesn't bypass the SOP; it allows you to host the index.html on a different site than the JS code, so that you can have the JS code and servlets on one server and load them from another. To get around the SOP you can use a method called JSON with padding, or JSONP. Essentially, it uses a script tag to inject a foreign request into the environment by wrapping the requested data in a callback. To do this you can use one of many server-side implementations, such as Jersey. GWT 2 includes a JsonpRequestBuilder object which does all the client-side work for you and can be used in the same way as RequestBuilder. See this article for a tutorial.
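A minimal client-side sketch with JsonpRequestBuilder (the endpoint URL and the Feed overlay type are hypothetical; the server must wrap its JSON response in the callback parameter it receives):

import com.google.gwt.core.client.JavaScriptObject;
import com.google.gwt.jsonp.client.JsonpRequestBuilder;
import com.google.gwt.user.client.Window;
import com.google.gwt.user.client.rpc.AsyncCallback;

// Overlay type describing the JSON payload (hypothetical shape).
class Feed extends JavaScriptObject {
    protected Feed() {}
    public final native String getTitle() /*-{ return this.title; }-*/;
}

// Somewhere in your entry point:
JsonpRequestBuilder jsonp = new JsonpRequestBuilder();
jsonp.requestObject("http://other-server.example.com/feed?format=jsonp",
    new AsyncCallback<Feed>() {
        public void onFailure(Throwable caught) {
            Window.alert("JSONP call failed: " + caught.getMessage());
        }
        public void onSuccess(Feed feed) {
            Window.alert("Got: " + feed.getTitle());
        }
    });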
If you want to access some other server (example.com) from your GWT app, then you'll have to do an RPC to your server, and in your server-side code, you'll have to make another HTTP call to the example.com page you're looking for.
The page you linked to regarding cross-site linking outlines that adding <add-linker name="xs"/> to the module file (see the snippet after this list) allows you to split your hosting between 2 servers:
One server for static files (all GWT produced html and js files, and all images)
One server for dynamic calls (all your RPCs go here, and your index.html home page must be here)
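For reference, the linker is enabled in the module XML like this (a sketch; the module and entry-point names are hypothetical):

<module rename-to="myapp">
    <inherits name="com.google.gwt.user.User"/>
    <entry-point class="com.example.client.MyApp"/>
    <!-- Emit cross-site output so the compiled JS can be loaded
         from a host other than the one serving index.html -->
    <add-linker name="xs"/>
</module>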