Duplicate form detection and handling in Spring MVC

I'm finding it hard to determine best practice for detecting duplicate form submissions. I'm using the latest Spring Boot, Thymeleaf and Spring Security, and the out-of-the-box CSRF functionality all appears to be working.
The design of the application is such that submit buttons get disabled via JavaScript onclick, successful POSTs result in a redirect (the POST->Redirect->GET pattern), and I had (seemingly wrongly) thought that the CSRF protection would provide the server-side protection for anything that slipped through the JavaScript.
For some reason my dodgy Logitech G500 mouse (which has started double-clicking everything) has managed to highlight a problem with the application. Somehow it has defeated the JavaScript and it has revealed that there is no protection on the server for duplicate form submissions - i.e. the form got processed twice. I'll have a look into the JavaScript later, but I don't want to rely upon this to protect the server so I want to be able to detect it at the server.
Given how much Spring does (including the CSRF protection) I was somewhat surprised and have done a lot of Googling. From what I can tell, there used to be something in the old Spring framework (references to AbstractFormController.handleInvalidSubmit) but that no longer exists now. I've also seen references to RequestMappingHandlerAdapter and settings such as synchronizeSession and sessionForm, but I don't really understand them yet. There are also a load of custom solutions that people have produced, including a HandlerInterceptorAdapter with associated tag library and a cache that performs some custom processing.
So my questions are:
Why doesn't the CSRF protection prevent this?
What sort of support is built in to detect and handle duplicate form submission?
If a custom solution is necessary, do you have any advice for best practice? In particular, the second click will get rejected, and if I display an error page the user might never see the handling of the first click and thus not realise it was actually processed directly.
I have read "Duplicate form submission in Spring", including the Synchronizer piece from 2009, but of course it's quite old and some of those things are no longer valid.
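For reference, the synchronizer-token idea from that older material boils down to something like the following. This is only a minimal sketch in plain Java, not a Spring API: the class name SubmissionTokenStore and its issue/consume methods are made up for illustration, and a real application would probably scope the tokens to the user's session and expire them.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Spring: issues one-time form tokens and
// accepts each token at most once.
class SubmissionTokenStore {
    private final Set<String> outstanding = ConcurrentHashMap.newKeySet();
    private final SecureRandom random = new SecureRandom();

    // Call when rendering the form; embed the result in a hidden field.
    String issue() {
        byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        outstanding.add(token);
        return token;
    }

    // Call when handling the POST. Returns true only for the first submission
    // carrying this token; Set.remove is atomic here, so two racing requests
    // cannot both succeed.
    boolean consume(String token) {
        return token != null && outstanding.remove(token);
    }
}
```

The form would carry the issued token in a hidden field. A POST whose token has already been consumed could then, for example, be redirected to the same result page as the first submission rather than to an error page, so the user still sees the outcome of the first click.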
Thanks
Marcus

Related

Typo3 Forms framework and frontend overlay

I browsed the TYPO3 core Form framework documentation but found no answer relevant to my requirements, which are:
The form has to be displayed in a frontend overlay.
The filling process involves multiple steps where the user would be able to go back and forth.
The form fields must still be editable by an editor.
I'm not sure how the form framework behaves. As far as I remember, multiple steps are configurable from the backend module, but I don't know whether it sends a request to the controller after each step or sends everything only on submit.
I do have an idea about how to implement it, though. It's based on the question "how to get a typo3 form framework html via ajax", which would let me provide the whole HTML content to the frontend developer and let them split the form into steps. The separation would be based on special tags, added via the editor, that surround the fields you want in each step.
What do you think about that approach?
The form framework processes each form step separately. So without developing your own form runtime, you have to keep processing every step separately.
I see two possibilities:
1. Send each form step from frontend to the form controller and replace the response (html form) in the frontend.
That is the fast and easy way, as you use the existing form runtime.
Prepare a page which returns the rendered form as html
Fetch this page by JavaScript
Send the form data back to the given form action
The form controller processes the form with all its validators, rules and finishers and returns the next step, the previous step, the current step with existing errors, or the finisher's response on success
Replace your form in the frontend with the already rendered html response of the form framework
The advantage of this way: Less effort and you can rely on the already existing validators, as you get an already validated response.
The disadvantage of this way: it is more difficult to implement frontend validation, as you have a mix of frontend and server-side validation.
2. Make the form framework more or less headless and JSON-based
In my opinion this is the better approach, but it takes a lot more effort.
You have to extend / overwrite the controller and the form runtime. This gives you more flexibility in handling the form with JavaScript and lets you, for example, return the errors in a JSON object. It makes life easier when you want to render and handle the form with a JS framework like React or Vue.
To your question:
What do you think about that approach?
If I got it right, you want to keep ONE form step in the backend, but let the editor divide this form step into multiple steps by adding tags? You can try, but I don't see any real advantage compared with keeping the original form steps and processing each one by sending it to the controller and handling the response (as described in 1.)
Summary:
In the past, I was thinking a lot about handling forms by JavaScript and came to the conclusion:
Keep the form framework's behaviour completely untouched with server-side processing, or make it frontend-driven with its own runtime. All mixtures of client-side and server-side rendering will sooner or later run into bigger problems or at least require a lot of effort. The form framework is pretty complex, with a lot of possibilities, hook-driven behaviour, etc. From my experience, you have to know it pretty well to develop without losing control. In smaller projects with just one or two basic forms, I would try to avoid special cases with lots of JS. In bigger projects (with more budget), I would definitely go with my second approach (currently, I'm developing vue.js-based rendering and handling of the form frontend). But these are just my five cents...

How to check if there is any script injected in the json request?

We have cross-site scripting issues in our AEM application. We decided to check for any scripts before submitting a request. How do we check whether there is any script in the SOAP request on the server side (Java)? Is this the correct solution for avoiding cross-site scripting issues?
This is a pretty broad question, and we can't provide any implementation details since we don't know any of your architecture or implementation details. However, there are some general XSS things to keep in mind:
If you are "checking for scripts" only in the browser, using JS, before submitting a form, that will not solve anything. People can easily bypass this by simply issuing the HTTP request that the form would have made from any other tool (e.g. curl, Postman, etc.). You need to check for bad data on the server side while processing the request that the form is submitting.
As for how to do this sort of thing on the CQ server side, Adobe has some recommendations that you should read through:
AEM 6.1
AEM 5.6
The PDF "cheat sheet" link on those pages will probably be most helpful.
There are different ways to mitigate the XSS risk: white-listing the data to let only known good data through, black-listing the data to block out any known bad data, and encoding the data to prevent scripts from being treated as HTML. For an excellent read on what to do, pay attention to the OWASP recommendations.
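To make the "encoding" option concrete, here is a minimal sketch of HTML output encoding in plain Java. The HtmlEscaper class and its escapeHtml method are hypothetical names used only for illustration; the sketch covers the HTML text and attribute-value context only, and a vetted library or the platform's own XSS API is normally a better choice than hand-rolled code.

```java
// Hypothetical helper for illustration; prefer a vetted encoder in production.
class HtmlEscaper {
    // Encodes the five characters that are significant in HTML text and
    // quoted attribute values, so user input is rendered as text, not markup.
    static String escapeHtml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this, HtmlEscaper.escapeHtml("<b>") returns "&lt;b&gt;", so a browser displays the tag as literal text instead of interpreting it. Other output contexts (JavaScript, URLs, CSS) need their own encoders.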
Check out XSSAPI; you can use the methods in this API to prevent XSS security risks.
On the other hand, you could probably start using Sightly, which provides automatic contextual XSS protection.

Deal with huge forms in Spring

I hope you can help me. I've tried to look for a solution to this problem or for a similar question here in StackOverflow but couldn't find any, so here it is.
We must develop a feature in which we will have a multi-page form. After filling all the pages of the form, the user will submit it. The problem is that the final submit will send many parameters (around 500), and we're afraid we may encounter problems with request size in many cases.
An initial approach would be having an object in session, which would be partially filled as the user navigates through the pages. I.e. when the user fills in the fields on page 1, the object in session is partially filled with that data, and so on. That way, we wouldn't have to pass all the request parameters in every step and the final submit wouldn't have to send so much data. But we don't want to use this approach because we don't want to use the session to store data that is specific to a single functionality or bunch of pages.
Another approach would be saving data to a database after the user fills each page of the form, and retrieving it after the final submit so we can deal with the whole thing. Maybe we could do this, but it would delay the development of the project since it's not a trivial task.
I wonder if there's a better approach to handle this. Maybe using @Cacheable in some intelligent way, maybe using Spring WebFlow (which I've never worked with), maybe other alternatives I can't think of. Is there any strategy or technology I could use for this? Currently we are working with Spring 3.2. We are using jQuery as well, just in case it's relevant.
Thank you.
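Writing this as an answer as it would not fit into a comment:
The HTTP specification itself sets no limit on the request body size of a POST; only GET requests are limited in practice (i.e. when parameters are sent via the query string). Servers can impose their own configurable limits (Tomcat's maxPostSize, for example), but roughly 500 form fields should stay well below the usual defaults. No need to worry here.
I don't understand why you don't want to use the session (@SessionAttributes). Multi-step forms are one of the use cases it was designed for, I would say.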
Storing incomplete model objects in database is also a good approach as it is very close to REST principles. We have used this multiple times in our company.
Spring WebFlow is also a good approach if you don't want to handle all the transitional logic yourself. However, SWF is not a simple technology to learn, and you should include that fact in your effort estimations.
There is another approach, which I would say is becoming more and more popular: doing all the logic dynamically on a single webpage (e.g. via AngularJS or some jQuery plugin) and submit the result as a JSON object.
There is no definitive answer to your question without being very specific about your use case and your application. And even with an exhaustive description, it is a question of personal preference.
The single dynamic page approach (e.g. AngularJS) would be good if your overall application architecture is going to be designed that way.
Spring WebFlow would be nice if you are familiar with that technology or if you are planning on having more multi-step forms throughout the application (i.e. I would not go for SWF if I need to solve just one use-case with it).
I would probably go for @SessionAttributes if I needed to quickly solve a single multi-step form. There are some complexities connected to that (namely partial validation and partial binding)... so again this might not be the simplest approach in the end.
Spring Webflow would handle your use case nicely through its flowScope.
Anyway, if you don't want to go through the pain of integrating its infrastructure only for that, the session attribute you mentioned will work perfectly and it's a correct approach. Just make sure you remove it when it's not necessary anymore to prevent memory leaks.
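The "fill it page by page, then remove it" pattern can be sketched roughly as follows. This is an illustration in plain Java, not Spring code: a Map stands in for the HTTP session, and the WizardFormState class and the "wizardForm" attribute name are made-up examples. In Spring MVC, @SessionAttributes together with SessionStatus.setComplete() plays the corresponding role.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a plain Map stands in for the HTTP session, and the
// attribute name "wizardForm" is a made-up example.
class WizardFormState {
    static final String ATTR = "wizardForm";

    // Merge the fields posted by one page into the partially filled form.
    @SuppressWarnings("unchecked")
    static void savePage(Map<String, Object> session, Map<String, String> pageFields) {
        Map<String, String> form = (Map<String, String>)
                session.computeIfAbsent(ATTR, k -> new LinkedHashMap<String, String>());
        form.putAll(pageFields);
    }

    // On the final submit: hand back everything collected so far and remove
    // the attribute so the data does not linger in the session.
    @SuppressWarnings("unchecked")
    static Map<String, String> complete(Map<String, Object> session) {
        Map<String, String> form = (Map<String, String>) session.remove(ATTR);
        return form == null ? new LinkedHashMap<String, String>() : form;
    }
}
```

Each page's handler would call savePage with only that page's fields, so no single request has to carry all ~500 parameters, and complete both returns the full form and performs the cleanup.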

Scraping WebObjects website & REST

I need to programmatically interact with a WebObjects website and extract data from the responses. The particular WebObjects site I am scraping uses component actions and stores sessions in cookies (not urls). This means that all urls look something like this:
http://example.com/WOApp/WebObjects/WOApp.woa/wo/7.0.0.0.29.1.1.1
My first questions are:
Do URLs like this not completely destroy local and shared caching opportunities (the cacheable constraint in REST)? I imagine the only effective caching with such URLs is in the WebObjects server itself.
Isn't addressability broken as well? Each resource does have a unique endpoint, but it changes constantly. Furthermore (I think) that WebObjects also makes too old URLs invalid since they "time-out" after a period of time. I'm not sure whether this applies only to urls with sessions though.
Regarding the scraping I am not sure whether it's possible to extract any meaningful endpoints from the website. For example, with a normal website I would look through the HTML and extract the POST urls, then use them in my scraper by posting directly to them instead of going through the normal request-response cycle.
In this case I obviously cannot use any URLs extracted from the HTML since they are dynamically generated on each request, but I read something about being able to access WebObjects components directly if the security settings have not been set to disallow this (see https://developer.apple.com/legacy/library/documentation/LegacyTechnologies/WebObjects/WebObjects_3.5/PDF/WebObjectsDevGuide.pdf, p. 53 "Limitations on Direct requests"). I don't understand exactly how to do this though or if it's even possible.
If it's not possible, what would be a good approach then? The only options I can think of are:
Using a full-blown browser client to interact with the website (e.g. Watir or Selenium) and extracting and processing the HTML from its responses
Manually extracting the dynamic endpoints by first requesting the page they are on and then finding the place in the HTML where they're located, then using them afterwards as if they were "static".
I am interested in opinions on how to approach this scenario since I don't believe any of the solutions above are particularly good.
You've asked a number of questions, and I'll see if I can cover each in turn.
Do URLs like this not completely destroy local and shared caching
opportunities (the cacheable constraint in REST)? I imagine the only
effective caching with such URLs is in the WebObjects server itself.
There is, indeed, a page cache within the WebObjects application server, and you're right to observe that these component action URLs probably thwart any other kind of caching. Additionally, even though the session ID is not present in the URL, you'd need the session ID in the cookie to re-create the same page, so having just that URL would get you a session restoration error from the application server.
Isn't addressability broken as well? Each resource does have a unique
endpoint, but it changes constantly.
Well, yes, on the face of it this is true. You've given a component action URL as an example, and they're tied to the session.
Furthermore (I think) that
WebObjects also makes too old URLs invalid since they "time-out" after
a period of time. I'm not sure whether this applies only to urls with
sessions though.
Again, all true. Component action URLs generate sessions, and sessions time out.
At this point, let me take a quick diversion. I'm assuming you're not the owner of the WebObjects application—you're talking about having to scrape a WebObjects app, and you've identified some ways in which this particular app doesn't conform to REST principles. You're completely right—a fully component-action-based WebObjects application won't be RESTful. WebObjects pre-dates REST by a few years. Having said that, there are ways in which a WebObjects application can be completely RESTful:
Using session-less direct actions gives a degree of REST-like behaviour, and would certainly solve the problems you identify with caching, addressability and expiry.
Using the ERRest framework to create a 100% RESTful application.
Of course, none of this will help you if you're just trying to scrape a legacy application.
Regarding the scraping I am not sure whether it's possible to extract
any meaningful endpoints from the website. For example, with a normal
website I would look through the HTML and extract the POST urls, then
use them in my scraper by posting directly to them instead of going
through the normal request-response cycle.
Again, if it's a fully component action-based application, you're right—all those URLs will be dynamically generated and useless to you.
In this case I obviously cannot use any URLs extracted from the HTML
since they are dynamically generated on each request, but I read
something about being able to access WebObjects components directly if
the security settings have not been set to disallow this…
That's talking about getting a component to render directly from its template with some restrictions:
As you note, the application can easily prevent it from happening at all.
As mentioned on p.53, the user input and action-invocation phases of rendering the component are skipped, which probably means this approach would be limited to rendering a component that didn't have any dynamic content anyway. This might be of some very limited use to you, though you'd need to know the component names you were interested in, and they wouldn't normally be exposed anywhere.
I'm not sure you're going to find anything better than the types of high-level functional approaches you've already suggested above, such as automating at the browser level with Selenium. If what you need is REST-style direct addressability of resources within the application, you're not going to get that unless you can re-write the application to use direct actions or ERRest where you need them.
A little late, but could help.
I use Apache's mod_ext_filter (slightly modified) to pre/post-filter the requests/responses of our WebObjects application. The filter calls PHP scripts and can read the dynamic hyperlinks and other things from the HTML pages. The scripts can also modify the HTTP requests, so we can programmatically add/remove parameters from the request to implement new workflows in front of the legacy app and clean up the requests before they reach WebObjects. It is also possible to use an additional database within the scripts and store some things across multiple requests.
So you can get the dynamically created links (maybe a button's name or HTML form destination) and can recognize these names within the request.
It is also possible to "remote control" such applications with little scripts like "click on the third button on the page". The only thing you need is a DOM parser to get the structure of the HTML pages and then rebuild the actions which the browser would perform (i.e. create the HTTP request manually and send it as a POST to the extracted form destination href). The only problem is the JavaScript code, which we analyze and reimplement in PHP (e.g. enabling/disabling input elements, so they will not be transmitted within the requests).
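As a rough illustration of the "extract the form destination" step, here is a sketch in Java rather than in the PHP used by the filter scripts described above. It is a simplification under stated assumptions: a regular expression stands in for a real DOM parser and only handles simple, quoted action attributes, and the FormActionExtractor class name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: a regular expression stands in for a real DOM parser
// and handles just simple, quoted action attributes.
class FormActionExtractor {
    private static final Pattern FORM_ACTION = Pattern.compile(
            "<form\\b[^>]*?\\baction\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns the action URL of every <form> tag found, in document order.
    static List<String> extractFormActions(String html) {
        List<String> actions = new ArrayList<>();
        Matcher m = FORM_ACTION.matcher(html);
        while (m.find()) {
            actions.add(m.group(2));
        }
        return actions;
    }
}
```

A script could fetch a page, pass its HTML to extractFormActions, and then POST to the returned URL along with the session cookie, since the component action URLs are only valid within that session.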
There were some problems with the WebObjects Adapter Module for Apache. It still uses Content-Length in the HTTP header, which you cannot change in mod_ext_filter. If you change the HTML or the parameters within the request, the length of the content will no longer match. But it is possible to change that.
Theoretically it could also be possible to control such a closed-source legacy application from a new UI on a tablet or smartphone, which delegates the user interaction to the backend WebObjects app.
The scripts depend on the page structure, so if your WebObjects app is changed, you have to correct some things in the scripts (e.g. the third button could now be the fourth button).
It should also be possible to add a RESTful interface in front of the application and query the data from the legacy app through the filter scripts.

Graceful Degradation with REST in CakePHP

Alright, so a better title here may have been "Progressive Enhancement with REST in CakePHP", but at least now I'll know you didn't read the question if your answer just refers to the difference between the two ;)
I'm pretty familiar with REST and how to integrate it with CakePHP, but I'm not 100% on board with how to still maintain a conventionally functioning website. Using Router::mapResources sounds like a great idea, but this creates a problem with maintaining the "graceful degradation" version of the site, because both POST requests to /resource/ AND GET requests for /resource/add will route to the same action (add). Clearly I'll want this action to return a JSON object if they're using the REST API, but if they're using the degraded version of the site (no JS perhaps), it should be an add form, right?
What's the best way to deal with this? Do you route your REST requests to other action names using Router::resourceMap()? Do you do that crazy hack I saw to have the /api/ prefix part of the resourceMap so you can use api_action functions? Do you have the actions handle both REST and conventional requests via checking isAjax()? If so, how do you ensure that you can rely on the browser to properly support the other two request types?
I've searched around quite a bit but haven't found anything about how to keep conventional requests available in Cake alongside REST, so if anyone has any advice or experience, I'd love to hear it!
CakePHP uses extension routing as well, via Router::parseExtensions(), so:
/test/action will render views/test/action.ctp
/test/action.html also
/test/action.json will render views/test/json/action.ctp
/test/action.xml will render views/test/xml/action.ctp
If all views are designed to handle the same data as set by your controller, you'll be able to show a regular HTML form and handle the posted data the same way as you'd handle the AJAX request.
You'll probably have to add checks inside the /add, /edit and /delete actions that data was actually posted/submitted, to prevent items being deleted without a form being posted (I haven't tested that though; it might be that Cake blocks these URLs if mapResources is set for the controller).
REST in CakePHP:
http://book.cakephp.org/2.0/en/development/rest.html
(Extension) Routing
http://book.cakephp.org/2.0/en/development/routing.html#file-extensions