GWT RPC server side safe deserialization to check field size

Suppose I send objects of the following type from GWT client to server through RPC. The objects get stored to a database.
public class MyData2Server implements Serializable
{
    private String myDataStr;
    public String getMyDataStr() { return myDataStr; }
    public void setMyDataStr(String newVal) { myDataStr = newVal; }
}
On the client side, I constrain the field myDataStr to be, say, 20 characters max.
I have been reading about web-application security. If I learned one thing, it is that client data should not be trusted; the server must check it. So I feel I ought to check on the server that my field is indeed not longer than 20 characters, and otherwise abort the request, since I would know it must be an attack attempt (assuming no bug on the client side, of course).
So my questions are:
How important is it to actually check on the server side that my field is not longer than 20 characters? I mean, what are the chances/risks of an attack, and how bad could the consequences be? From what I have read, it looks like it could go as far as bringing the server down through overflow and denial of service, but not being a security expert, I could be misinterpreting.
Assuming I would not be wasting my time doing the field-size check on the server, how should one accomplish it? I seem to recall reading (sorry, I no longer have the reference) that a naive check like
if (myData2ServerObject.getMyDataStr().length() > 20) throw new MyException();
is not the right way. Instead, one would need to define (or override?) the readObject() method, something like in here. If so, again, how should one do it within the context of an RPC call?
Thank you in advance.

How important is it to actually check on the server side my field is not longer than 20 characters?
It's 100% important, except maybe if you can trust the end user 100% (e.g. some internal apps).
I mean what are the chances
Generally: increasing. The exact probability can only be answered for your concrete scenario individually (i.e. no one here will be able to tell you, though I would also be interested in general statistics). What I can say is that tampering is trivially easy. It can be done in the JavaScript code (e.g. using Chrome's built-in dev tools debugger) or by editing the clearly visible HTTP request data.
/risks of an attack and how bad could the consequences be?
The risks can vary. The most direct risk can be evaluated by asking: "What could someone store and do if they could set any field of any GWT-serializable object to any value?" This is not only about exceeding the size limit, but maybe also about tampering with the user ID, etc.
From what I have read, it looks like it could go as far as bringing the server down through overflow and denial of service, but not being a security expert, I could be mis-interpreting.
This is yet another level to deal with, and it cannot be addressed with server-side validation within the GWT RPC method implementation.
Instead one would need to define (or override?) the method readObject(), something like in here.
I don't think that's a good approach. It tries to accomplish two things, but can do neither of them very well. There are two kinds of checks that must be done on the server side:
On a low level, when the bytes come in (before they are converted by RemoteServiceServlet to a Java Object). This needs to be dealt with on every server, not only with GWT, and would need to be answered in a separate question (the answer could simply be a server setting for the maximum request size).
On a logical level, after you have the data in a Java object. For this, I would recommend a validation/authorization layer. One of the awesome features of GWT is that you can now use JSR 303 validation on both the server and the client side. It doesn't cover every aspect (you would still have to test for user permissions), but it does cover your "@Size(max = 20)" use case.
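For instance, the bean from the question could carry the constraint as an annotation and be validated at the start of the RPC method, before any work is done. Here is a minimal sketch using the standard javax.validation API (the service method name and the exception used to abort are assumptions for illustration, not prescribed by GWT):

import java.io.Serializable;
import javax.validation.constraints.Size;

public class MyData2Server implements Serializable
{
    @Size(max = 20) // the same limit the client enforces
    private String myDataStr;
    public String getMyDataStr() { return myDataStr; }
    public void setMyDataStr(String newVal) { myDataStr = newVal; }
}

import java.util.Set;
import javax.validation.ConstraintViolation;
import javax.validation.Validation;
import javax.validation.Validator;

// Inside the RemoteServiceServlet implementation (method name is made up):
public void storeMyData(MyData2Server data)
{
    Validator validator = Validation.buildDefaultValidatorFactory().getValidator();
    Set<ConstraintViolation<MyData2Server>> violations = validator.validate(data);
    if (!violations.isEmpty())
    {
        // A well-behaved client can never send this, so abort the request.
        throw new IllegalArgumentException("Invalid payload: " + violations);
    }
    // ... store the data ...
}

Since GWT can run the same annotated bean through validation on the client as well, the 20-character rule then lives in exactly one place.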

Related

Making "parse" function RESTful

I have a RESTful service for getting, let's say, devices. It provides the usual functionality:
GET /devices
GET /devices/:id
POST /devices
PUT /devices/:id
DELETE /devices/:id
The device object might be defined as follows:
{
    id: 123,
    name: "Smoke detector",
    firmware: "21.0.103",
    battery: "ok",
    last_maintenance: "2017-07-07",
    last_alarm: "2014-02-01 12:11:10",
    // ...
}
There is an application that can read device state via some device-specific reader. The application itself has no idea how to interpret the read data, but it can ask the server to do it. In our case, let's assume that the data contains the following: battery status, firmware version, last alarm.
If I were implementing a regular RPC service, I would create a function with "parse" semantics: it accepts the raw data and returns an updated device object (or, alternatively, only the part of the device object containing the parsed state). But I doubt that I could find a good REST solution for such a function. Right now I am doing it via PATCH, but I personally do not like this solution, so I will not present it here. I believe there should be a good solution for this class of problems.
So the question: how should I fit my "parse" logic in REST paradigm?
POST it to a /parsed-device-state URL, which will return a 201 Created, a Location header pointing to the place where you can get the parsed data from, and, if you like, the parsed data in the 201 response as well (along with an additional Content-Location header with the same value as the Location header). Or, if parsing takes a long time, use 202 Accepted and the same Location header. The caller can then poll the provided location until the results are ready.
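A rough sketch of the first variant in JAX-RS (the resource path matches the answer; DeviceState and DeviceStateParser are hypothetical domain types, not part of the original API):

import java.net.URI;
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.UriInfo;

@Path("/parsed-device-state")
public class ParsedDeviceStateResource
{
    @POST
    @Consumes(MediaType.APPLICATION_OCTET_STREAM)
    @Produces(MediaType.APPLICATION_JSON)
    public Response parse(byte[] rawReaderData, @Context UriInfo uriInfo)
    {
        // Interpret the raw reader bytes (hypothetical parser).
        DeviceState state = DeviceStateParser.parse(rawReaderData);

        // Where the parsed result can be fetched from later.
        URI location = uriInfo.getAbsolutePathBuilder().path(state.getId()).build();

        // 201 Created + Location, with the parsed representation in the body;
        // Content-Location marks the body as a representation of the resource
        // named in Location.
        return Response.created(location)
                       .header("Content-Location", location.toString())
                       .entity(state)
                       .build();
    }
}

For the slow-parsing variant, you would return a 202 (Response.accepted() in JAX-RS 2.x) with the same Location header instead, and let the caller poll it.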
So the question: how should I fit my "parse" logic in REST paradigm?
How would you fit your parse logic into a web site?
You'd probably start with a bookmark. GET $BOOKMARK would return a representation of a form. The form might include an input control, like a textarea element, that would allow the consumer to type in a representation, or an input control that allows the consumer to link to a file. The consumer would submit the form, and the agent would create a request from the information in the form. That would probably be a POST (you aren't likely to include an arbitrary file's representation in the query string) to whatever resource was specified as the action of the form. The server's response would provide a representation of the result.
If parsing were a particularly slow process, then the response instead might be a representation including links to resources that could be used to track the progress of the parsing. The whole protocol in this case looks a lot like putting work on a queue, and then polling for updates.
It's the right answer to a problem that is not a great fit for HTTP:
The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction.
To some degree, what you are trying to do with your function is transfer compute, which may be why it feels like you are trimming corners off of the peg to fit it in the hole.
An alternative approach, which is a better fit for HTTP, is to think about transferring a representation of the behavior. The API client gets a function that understands how to parse apples into oranges, and then runs that code on the information it keeps locally. Think JavaScript: we get a representation of the behavior from the server (which can embed into that representation any information the server has that the client will need), and then execute the result locally. Metadata in the headers describes the lifetime of the representation, in a way that is understood by any standards-compliant cache.

Proper way to communicate with socket

Is there any design pattern (or anything else) for network communication using sockets?
I mean, what I always do is:
I receive a message from my client or my server
I extract the type of this message (e.g. LOGIN, LOGOUT, CHECK_TICKET, etc.)
I test this type in a switch-case statement
Then I execute the suitable method for this type
This way gets a little bit boring when you have a lot of message types.
Each time I add a type, I have to add it to the switch-case.
Plus, it takes more machine operations when you have hundreds or thousands of message types in your protocol (due to the switch-case).
Thanks.
You could use a loop over a set of handler classes (i.e. one for each supported message type). This is essentially the Composite pattern. The Component and each Composite then become independently testable. Once written, the Component need never change again, and support for a new message becomes isolated to a single new class (or perhaps a lambda or function pointer, depending on the language). You can also add/remove/reorder Composites on the Component at runtime, if that is something you want from your design (alternatively, if you want to prevent this, depending on your language you could use variadic templates). You could also look at Chain of Responsibility. A rough sketch of the registration idea follows below.
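A minimal sketch of that idea in Java (the message types and handler names are made up for illustration): each message type registers one handler object, and dispatch becomes a map lookup instead of a growing switch-case.

import java.util.HashMap;
import java.util.Map;

interface MessageHandler
{
    void handle(String payload);
}

class LoginHandler implements MessageHandler
{
    public void handle(String payload) { /* authenticate the client */ }
}

class LogoutHandler implements MessageHandler
{
    public void handle(String payload) { /* tear down the session */ }
}

class MessageDispatcher
{
    private final Map<String, MessageHandler> handlers = new HashMap<>();

    // Handlers can be registered (or removed) at runtime.
    void register(String type, MessageHandler handler)
    {
        handlers.put(type, handler);
    }

    void dispatch(String type, String payload)
    {
        MessageHandler handler = handlers.get(type);
        if (handler == null)
        {
            throw new IllegalArgumentException("Unknown message type: " + type);
        }
        handler.handle(payload);
    }
}

// Usage:
// dispatcher.register("LOGIN", new LoginHandler());
// dispatcher.dispatch("LOGIN", rawPayload); // replaces the switch-case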
However, if you thought that adding a case to a switch is a bit laborious, I suspect that writing a new class would be too.
P.S. I don't see a good way of avoiding steps 1 and 2.

Should a RESTful API return data which is reshaped for the UI

Let's assume ONE team uses the API to return data to a SPA in the browser.
Is it RESTful to return data which is specially prepared for the UI of the SPA?
Alternatively, the client could prepare the data itself with JS, which many want to avoid.
What does "reshaped" mean? =>
public async Task<IEnumerable<SchoolclassCodeDTO>> GetSchoolclassCodesAsync(int schoolyearId)
{
    var schoolclassCodes = await schoolclassCodeRepository.GetSchoolclassCodesAsync(schoolyearId);
    var allPupils = schoolclassCodes.SelectMany(s => s.Pupils).Distinct<Pupil>(new PupilDistinctComparer());

    // Materialize the projections: Select alone is lazy, so without ToList()
    // the IsSelected flags set below would be lost on re-enumeration.
    var allPupilsDTOs = allPupils.Select(p => p.ToPupilDTO()).ToList();
    var schoolclassCodeDTOs = schoolclassCodes.Select(s => s.ToSchoolclassDTO()).ToList();

    // Prepare data specially for UI DataGrid with checkboxes
    foreach (var s in schoolclassCodeDTOs)
    {
        foreach (var p in allPupilsDTOs)
        {
            var targetPupil = s.Pupils.SingleOrDefault(pupil => pupil.Id == p.Id);
            if (targetPupil == null)
            {
                p.IsSelected = false;
                s.Pupils.Add(p);
            }
            else
            {
                targetPupil.IsSelected = true;
            }
        }
    }
    return schoolclassCodeDTOs;
}
This is a good question. Problem is, it's likely that only you have the answer.
tl;dr Good-old "it's application-specific".
You probably need to look at this as a continuum, not a binary decision. On one extreme you have the server generate HTML views, and it is thus responsible for a lot of UI concerns; on the other, the server exposes a data model with very generic CRUD functionality, and is thus not responsible for any UI concerns. Somewhere in the middle, you can also find a design where a server interface is specific to an application but does not necessarily expose HTML (although with the magic of content negotiation everything falls into place), for possibly very obvious reasons (when it comes to so-called SPAs).
So, what should you choose?
I expect some people to advise you to decouple client and server as much as possible, but I personally believe there is no such ultimate "good practice". In fact, it might be premature. Design your logical components first, and then decide on where and how they should run.
Remember:
A good architecture allows major decisions to be deferred and maximizes the number of decisions not made.
(Uncle Bob)
Why? Well, because these (domain logic and execution environment) are truly separate concerns: they might evolve independently. For example, you might decide to create a thinner client for mobile and a thicker client for desktop if the application is compute-intensive (e.g. to save battery power). Or, you might do the exact opposite if the application is network-intensive (e.g. to save roundtrips when connectivity is bad, also consider "offline-first"). Maybe you'll provide all variants and let the user choose, or maybe choose automatically based on available resources -- whatever the requirements warrant and however they might change.
I think these are more appropriate questions and architectural decisions to make (but again: only after you've already designed the boundaries of your logical components). These clearer requirements will help you decide which components of your application should run where. They will drive the way you represent your boundaries (whether they be internal or remote APIs, private or public) but not how you shape them (that's already done). Your RESTful API (if you decide you need one and that a REST-style architecture is appropriate) is just a representation for an arbitrary boundary.
And that's how you will eventually answer your own question in the context of your scenario -- which should hopefully have become very intuitive by then.
End note: While having the domain logic strictly shape your boundaries is nice and pure, it's inevitable that some concerns pertaining to the execution environment (like who controls certain network hosts, where the data should reside, etc.) will feed back into the domain design. I don't see that as a contradiction; your application does influence whatever activity you're modelling, so its own concerns must be modelled too. Tools also influence the way you think, so if HTTP is a tool and you're very good at using it, you might start using it everywhere. This is not necessarily bad (e.g. the jury is still out on "micro-services"), though one should be aware that knowing too few tools often (not always) pushes developers into awkward corners. How could I not finish with "use the right tool for th--"; ah, it's getting old, isn't it ;).

Non-RESTful backend with backbone.js

I'm evaluating backbone.js as a potential JavaScript library for use in an application which will have a few different backends: WebSocket, REST, and a 3rd-party library producing JSON. I've read some opinions that backbone.js works beautifully with RESTful backends so long as the API is 'by the book' and follows the appropriate HTTP verbiage. Can someone elaborate on what this means?
Also, how much trouble is it to get backbone.js to connect to WebSockets? Lastly, are there any issues with integrating a backbone.js model with a function which returns JSON - in other words, does the data model always need to be served via REST?
Backbone's power is that it has an incredibly flexible and modular structure. This means that you can use, extend, take out, or modify any part of Backbone. This includes the AJAX functionality.
Backbone doesn't "care" where you get the data for your collections or models. It will help you out by providing an out-of-the-box RESTful "ajax" solution, but it won't be mad if you want to use something else!
This allows you to find (or write) any plugin you want to handle the server interaction. Just look on backplug.io, Google, and Github.
Specifically for Sockets there is backbone.iobind.
Can't find a plugin? No worries. I can tell you exactly how to write one (it's 100x easier than it sounds).
The first thing you need to understand is that overriding this behavior is SUPER easy. There are 2 main ways:
Globally:
Backbone.Collection.prototype.sync = function() {
    // screw you Backbone!!! You're completely useless, I am doing my own thing
};
Per instance:
var MySpecialCollection = Backbone.Collection.extend({
    sync: function() {
        // I like what you're doing with the ajax thing... Clever clever ;)
        // But for a few collections I wanna do it my way. That cool?
    }
});
And the only other thing you need to know is what happens when you call "fetch" on a collection. This is the "by the book"/"out of the box" behavior:
collection#fetch is triggered by the user (YOU). fetch will delegate the ACTUAL fetching (ajax, sockets, local storage, or even a function that instantly returns JSON) to some other function (collection#sync). Whatever function is in collection.sync has to take 3 arguments:
action: "create" (for creating), "read" (for fetching), "update" (for updating), or "delete" (for deleting) = CRUD.
context (the this variable) - if you don't know what this does, don't worry about it; it's not important for now
options - where da magic is. We only care about 1 option though:
success: a callback that gets called when the data is "ready". THIS is the callback that collection#fetch is interested in, because that's when it takes over and does its thing. The only requirement is that sync passes it the following as the 1st argument:
response: the actual data it got back
Whenever collection#sync is done doing its thing, it executes that success callback from its options; collection#fetch then takes back over and does the following nifty steps:
Calls set or reset (for these purposes they're roughly the same).
When set finishes, it triggers a sync event on the collection, broadcasting to the world: "yo, I'm ready!!"
So what happens in set? Well, a bunch of stuff (deduping, parsing, sorting, removing, creating models, propagating changes, and general maintenance). Don't worry about it. It works ;) What you need to worry about is how you can hook into different parts of this process. The only two you should worry about (if your server wraps data in weird ways) are:
collection#parse, for parsing a collection. It should accept the raw JSON (or whatever format) that comes from the server/ajax/websocket/function/worker/who-knows-what and turn it into an ARRAY of objects. It takes resp (the JSON) as its 1st argument and should return the mutated response. Easy peasy.
model#parse. Same as collection#parse, but it takes in the raw objects (i.e. imagine you iterate over the output of collection#parse) and spits out an "unwrapped" object.
Get off your computer and go to the beach because you finished your work in 1/100th the time you thought it would take.
That's all you need to know in order to implement whatever server system you want in place of the vanilla "ajax requests".

Form-related problems

I am new to Lift and I am wondering whether I should investigate it more closely and start using it as my main platform for web development. However, I have a few "fears" which I would be happy to have dispelled first.
Security
Assume that I have the following snippet that generates a form. There are several fields and the user is allowed to edit just some of them.
def form(in: NodeSeq): NodeSeq = {
  val data = Data.get(...)
  <lift:children>
    Element 1: { textIf(data.el1, data.el1(_), isEditable("el1")) }<br />
    Element 2: { textIf(data.el2, data.el2(_), isEditable("el2")) }<br />
    Element 3: { textIf(data.el3, data.el3(_), isEditable("el3")) }<br />
    { button("Save", () => data.save) }
  </lift:children>
}

def textIf(label: String, handler: String => Any, editable: Boolean): NodeSeq =
  if (editable) text(label, handler) else Text(label)
Am I right that there is no vulnerability that would allow a user to change the value of some field even though the isEditable method assigned to that field evaluates to false?
Performance
What is the best approach to form processing in Lift? I really like the way of defining anonymous functions as handlers for every field - but how does it scale? I guess that for every handler, a function is added to the session together with its closure, and it stays there until the form is posted back. Doesn't that introduce a potential performance issue for a service under high load (let's say 200 requests per second)? And when do these handlers get freed (if the form isn't resubmitted and the user either closes the browser or navigates to another page)?
Thank you!
With regards to security, you are correct. When an input is created, a handler function is generated and stored server-side under a GUID identifier. The function is session-specific and closed over by your code - so it is not accessible to other users and would be hard to replay. In the case of your example, since no input is ever displayed, no function is ever generated, and therefore it would not be possible to change the value if isEditable is false.
As for performance, on a single machine, Lift performs incredibly well. It does however require session-aware load balancing to scale horizontally, since the handler functions do not easily serialize across machines. One thing to remember is that Lift is incredibly flexible, and you can also create stateless form processing if you need to (albeit, it will not be as secure). I have never seen too much of a memory hit with the applications we have created and deployed. I don't have too many hard stats available, but in this thread, David Pollak mentioned that demo.liftweb.net at the time had 214 open sessions consuming about 100MB of ram (500K/session).
Also, here is a link to the Lift book's chapter on Scalability, which also has some more info on security.
The closures and all the associated stuff are surely cleaned up at session shutdown. Earlier -- I don't know. Anyway, it's not really a theoretical question -- it highly depends on how users use web forms in practice. So, for a broader answer, I'd ask the question on the main liftweb channel -- https://groups.google.com/forum/#!forum/liftweb
Also, you can use a "static" form if you want to. But AFAIK there are no problems with memory, and everybody is using the main approach to forms.
If you don't create the handler xml/html -- the user won't be able to change the data, that's for sure. In your code, if I understood it correctly (I'm not sure), you don't create "text(label, handler)" when it's not needed, so everything's secure.