REST: how to tell the server to run a background process

I am building a client-side product with a REST interface. All user interaction will be done in a browser (the configuration backend is a server running on localhost). I want everything to be REST compliant, even though the application will run on the client's machine on localhost and will never be reachable from the outside.
The commands are pretty simple:
update
restart
sync
Here's what I've come up with:
POST to / with 'action' parameter (JSON) detailing specifics
PUT a new resource
subsequent GET requests will return the status
when the command is complete, the resource is deleted
What would be the most RESTful way to implement this?
Note:
I'm not asking for scrutiny of my software architecture. I have reasons for choosing a REST interface instead of a unix domain socket, CLI interface, or even a regular GUI interface. The justification would overcomplicate the question and make it too localized.
I have had the same need on a couple of different projects (both client only and server) and I am looking for community input on best practices.

I would POST to a /process resource with the appropriate parameters necessary to start the process, then I would have it return a Location header to that resource that actually represents the process status (/process/123). You can then use GET on that process to get the latest information about it.
I would not automatically delete the process, because if you do that, the client will not know whether the process finished properly or not, only that it finished (well, stopped running).
That said, the client can certainly DELETE the resource when it is done with it, or you can clean it up later, after some reasonable time when whoever was interested in it most likely no longer is.
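For illustration, a minimal sketch of that pattern (Flask is used purely as an example; the /process URL, job ids, and in-memory job store are assumptions, not part of the question):

    # Sketch of the POST-then-poll pattern described above. Flask, the /process
    # URL, and the in-memory job store are illustrative assumptions.
    import threading
    import uuid

    from flask import Flask, jsonify, request, url_for

    app = Flask(__name__)
    jobs = {}  # job id -> status dict (a real app would persist this)

    def run_command(job_id, action):
        jobs[job_id]["status"] = "running"
        # ... perform the update / restart / sync here ...
        jobs[job_id]["status"] = "done"

    @app.route("/process", methods=["POST"])
    def start_process():
        action = request.get_json()["action"]      # e.g. "update", "restart", "sync"
        job_id = str(uuid.uuid4())
        jobs[job_id] = {"action": action, "status": "queued"}
        threading.Thread(target=run_command, args=(job_id, action)).start()
        response = jsonify(jobs[job_id])
        response.status_code = 202                  # accepted; work continues in background
        response.headers["Location"] = url_for("get_process", job_id=job_id)
        return response

    @app.route("/process/<job_id>", methods=["GET"])
    def get_process(job_id):
        return jsonify(jobs[job_id])                # client polls this for status

    @app.route("/process/<job_id>", methods=["DELETE"])
    def delete_process(job_id):
        jobs.pop(job_id, None)                      # client cleans up once it has read the outcome
        return "", 204

A 202 Accepted plus a Location header signals that the work continues in the background while giving the client somewhere to poll.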

Related

Invoking a script as part of a web API method: how bad an idea is it?

I have a PowerShell script (though I think these considerations extend to any script that requires a runtime to interpret and execute it) that does something I also need to expose to a web application front end as a REST API. I've been asked to call the script itself directly from the web method. Although that is technically feasible, having a web API method start a shell/process to execute the script and redirect stdin/stdout/stderr looks like very bad practice to me. Is there any specific security risk in doing something like this?
Reading this question brings to mind how many of the OWASP Top Ten Security Vulnerabilities it would expose your site to.
Injection Flaws - This is definitely a high risk. There are ways to remediate it, of course. Parameterizing all input with strongly-typed dates and numbers instead of strings is one method that can be used, but it may not fit with your business case. You should never allow user-provided code to be executed, but if you are accepting strings as input and running a script against that input, it becomes very difficult to prevent arbitrary code execution.
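To make that concrete, here is a hedged sketch (Python stands in for whatever hosts the web method; the report.ps1 script, its parameters, and the helper names are invented for illustration) of the difference between building the command line from a raw string and passing validated, typed arguments:

    # Hypothetical sketch of the injection risk; script name and parameters are invented.
    import subprocess
    from datetime import date

    def run_report_unsafe(user_supplied_name):
        # DANGEROUS: user-controlled text becomes part of the command line.
        # Input like "foo; Remove-Item -Recurse C:\\" would be run by the shell.
        subprocess.run(
            f"powershell.exe -Command ./report.ps1 -Name {user_supplied_name}",
            shell=True,
        )

    def run_report_safer(user_supplied_name, start: date):
        # Better: validate / strongly type the input and pass it as discrete
        # arguments, never through a shell. This narrows, but does not remove, the risk.
        if not user_supplied_name.isalnum():
            raise ValueError("name must be alphanumeric")
        subprocess.run(
            ["powershell.exe", "-File", "report.ps1",
             "-Name", user_supplied_name,
             "-Start", start.isoformat()],
            shell=False,
            check=True,
        )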
Broken Authentication - possibly vulnerable. If you force a user to authenticate before reaching your script (you probably should), there is a chance that the user reuses their credentials elsewhere and exposes those credentials to a brute force attack. Do you lock out accounts after too many tries? Do you have two-factor authentication? Do you allow weak passwords? These are all considerations when you introduce a new authentication mechanism.
Sensitive data exposure - likely vulnerable, depending on your script. Does the script allow reading files and returning their contents? If not now, will it do so in the future? Even if it's never designed to do so, combined with other exploits the script might be able to read a file from a path that's outside the web directory. It's very difficult to prevent directory traversal exploits that would allow a malicious user access to your server, or even the entire network. Compiled code and the web server prevent this in many cases.
XML External Entities - possibly vulnerable, depending on your requirements. If you allow user-provided XML, the bad guy can inject other files and create havoc. This is easier to trap when you're using standard web tools.
Broken Access Control - definitely vulnerable. A Web API application can enforce user controls and set permission levels in a C# controller; exceptions are handled with HTTP status codes that indicate the request was not allowed. In contrast, PowerShell executes within the security context of the logged-in user and allows system-level changes even when not running elevated. If an injection flaw is exploited, the code would execute in the web server's security context, not the user's. You may be surprised how much the IIS_USER (or other application pool service account) can do. For one, if the bad guy is executing in the context of a service account, they might be able to bring down your whole site with a single request by locking out that account or changing its password - a task that's much easier with a PowerShell script than with compiled C# code.
Security Misconfiguration - likely vulnerable. A running script would require its own security configuration outside whatever framework you are using for the Web API. Are you ready to re-implement something like OAuth claims or ACLs?
Cross-Site Scripting - likely vulnerable. Are you echoing the script output? If you're not sanitizing input and output, the script could echo some Javascript that sends a user's cookie content to a malicious server, giving them access to all the user's resources. Cross site request forgery is also a risk if input is not validated.
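As a small, hypothetical illustration of sanitizing echoed output (the helper name is made up; only the standard library is used):

    # Sketch: if the script's output is echoed back into an HTML page, escape it
    # first so injected <script> tags are rendered inert (hypothetical helper).
    from html import escape

    def render_script_output(raw_output: str) -> str:
        return "<pre>" + escape(raw_output) + "</pre>"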
Insecure Deserialization - Probably not vulnerable.
Using Components with Known Vulnerabilities - greatly increased vulnerability compared to compiled code. PowerShell grants access to a whole set of libraries that would otherwise need explicit references in a compiled application.
Insufficient Logging & Monitoring - likely vulnerable. IIS logs requests by default, but PowerShell doesn't log anything unless you explicitly write to a file or start a transcript. Neither method is designed for concurrency, and both may introduce performance or functional problems when files are shared.
In short, 9 out of the top 10 vulnerabilities may affect this implementation. I would hope that is enough to prevent you from making your script public, at the very least. Basically, the problem is that you're using the tool (PowerShell) for a purpose it wasn't intended to fulfill.

How to manage HATEOAS links when the server is the client?

I'm learning about HATEOAS. The backend server I'm working on will use a third-party REST API that uses HATEOAS. That API has an endpoint that returns the URL for each resource, and it also returns related resource links with regular requests.
But I'm wondering what a good way is to manage these links on the server and avoid hardcoding them. For example, if the third party changes a resource's URL, how will the server detect that change? Are there any standard practices for managing HATEOAS resource links?
Possible ways I can think of:
1) When the server starts, fetch all the resource URLs and cache them. Whenever the third-party API needs to be called, reuse these cached URLs. Whenever there is a 404 or related error, update the resource URL, or update the URLs periodically at intervals.
2) Get the resource URL each time before calling the endpoint. Simplest, but it essentially doubles the number of requests.
Neither sounds like a robust approach.
While discovery is generally a good thing and should allow a HATEOAS system to introduce changes in ways that hardcoded URLs don't, if URLs start breaking arbitrarily I would still consider that a major issue.
You should be able to store URLs/links on your side and have some expectation that they keep working.
There are some mechanisms that deal with changes though:
The server should return 301/308 redirects if a resource has moved. If that happens, you should update your stored references.
The server can emit Sunset or Deprecation headers. See: https://www.rfc-editor.org/rfc/rfc8594
Those are more general answers, but ultimately the existence of best practices does not mean that vendors will abide by them. With that in mind I think your best bet is to try and find out what the deprecation policy is of your vendor and see what they recommend.
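As a hedged sketch of reacting to those two mechanisms (using the requests library; the stored_links cache and the URLs are assumptions):

    # Sketch: refresh a stored link when the vendor permanently redirects it,
    # and surface a Sunset header (RFC 8594). stored_links is an assumed local cache.
    import requests

    stored_links = {"orders": "https://api.example.com/v1/orders"}

    def fetch(rel):
        response = requests.get(stored_links[rel], allow_redirects=True)
        # If we were permanently redirected, remember the new location.
        if response.history and response.history[0].status_code in (301, 308):
            stored_links[rel] = response.url
        sunset = response.headers.get("Sunset")
        if sunset:
            print(f"Warning: {rel} link is scheduled for sunset at {sunset}")
        return response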
Use a cached resource if it is still valid; request a refresh when you don't have a valid local copy.
RFC 7234 defines the caching semantics of HTTP.
Ideally, you don't implement the caching rules yourself, but instead you use a general purpose cache.
In its ideal form, your bespoke implementation is talking to a headless browser, and the headless browser worries about the caching rules for you.
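For example, a minimal sketch assuming the requests library plus CacheControl (requests-cache would work similarly); the URL is a placeholder:

    # Sketch: let an off-the-shelf HTTP cache honor the server's caching headers
    # instead of re-implementing RFC 7234 yourself.
    import requests
    from cachecontrol import CacheControl

    session = CacheControl(requests.Session())

    # Repeated calls reuse the cached representation for as long as the server's
    # Cache-Control headers allow; otherwise a fresh request is made.
    links = session.get("https://api.example.com/").json()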
In theory, you need the initial URL to start the process, and everything else comes from that.
Each resource you get from the server should include links to other edges on the graph of service for that resource.
So, once you get the initial resource, all of the rest come automatically.
That said, it's not untoward to have "well known" entry points that are, ideally, unchanging URLs. But in the end, those are just "bookmarks", and not necessarily guaranteed end points.
Consider a shopping site such as Amazon. Outside of amazon.com, you don't know any of their URLs. They're all provided on the various forms and pages, and the human simply navigates the site. Those URLs can be changing all the time, and no one would know. With HATEOAS, it's up to the machine to follow the links, rather than a human. But the process of navigation is the same.
As others have mentioned, the idea of caching the root resource has merit. Then you rely on the caching headers to tell you how often you have to refresh the links.
But that said, operationally, there's no difference between following a normal link, and following a cached link. Underneath, the cached resource loads faster, but you still need to "follow the link". Because that's where the caching behavior kicks in. This is different from assuming the link is good, assuming you know the result of a resource lookup. Your application follows the link. Always. The underlying infrastructure is responsible for making it efficient.
So, your code should not, say, load up a root resource, stuff a map with its links, and then assume they're good. Rather, the code should request the root resource, perhaps as a Map of links (datatypes for the win), and let the next layer handle the details. Because it all depends on the type of caching involved. Some have fixed durations during which no follow-up is necessary. With others, you make the request anyway and the server tier responds "nothing changed", so you can use your local copy, but you're still required to ask in the first place.
Those are implementation details that the SERVER mandates (not the client). It's a server contract. If they want you pinging them each and every time, so be it. That's the contract they're presenting to you, and if you want to be a Good Citizen, then you should honor that contract.
Ideally, the server makes good decisions on these kinds of issues for the sake of efficiency, but in the end it's really up to them.
The client has to go along. The client in a HATEOAS system cedes a lot to the server. They're simply not decisions for the client to make.
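A minimal sketch of that shape, assuming a JSON "links" map and made-up relation names; the plain session here could be swapped for a caching session so that re-following links stays cheap:

    # Sketch: always resolve resources by following link relations from the root,
    # and let the HTTP layer decide whether a network round trip is actually needed.
    # The root URL, relation names, and "links" JSON shape are assumptions.
    import requests

    ROOT = "https://api.example.com/"
    session = requests.Session()   # swap in a caching session (e.g. CacheControl) here

    def follow(*rels):
        resource = session.get(ROOT).json()
        for rel in rels:
            next_url = resource["links"][rel]   # the server tells us where to go next
            resource = session.get(next_url).json()
        return resource

    latest_order = follow("orders", "latest")   # root -> orders -> latest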

What are the limitations of the Flask built-in web server?

I'm a newbie in web server administration. I've read multiple times that the Flask built-in web server is not designed for "production" and must be used only for testing and debugging...
But what if my app touches only a thousand users who occasionally send data to the server?
If it works, when will I have to bother with configuring a more sophisticated web server? (I am looking for approximate metrics.)
In a nutshell, I would love to know what the built-in web server can do (with approximate thresholds) and what it cannot.
Thanks a lot!
There isn't one right answer to this question, but here are some things to keep in mind:
With the right amount of horizontal scaling, it is quite possible you could keep scaling out use of the debug server forever. When exactly you would need to start scaling (or switch to using a "real" web server) would also depend on the environment you are hosting in, the expectations of the users, etc.
The main issue you would probably run into is that the server is single-threaded. This means that it will handle each request one at a time, serially. So if you are trying to serve more than one request (including favicons, static items like images, CSS and Javascript files, etc.), the requests will take longer. If any given request happens to take a long time (say, 20 seconds) then your entire application is unresponsive for that time (20 seconds). This is only the default, of course: you could bump the thread count (or have requests be handled in other processes), which might alleviate some issues. But once again, it can still be slow under a "high" load. What is considered a "high" load will depend on your application and the expectations of a maximum acceptable response time.
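For example, a hedged sketch of bumping the development server's concurrency (newer Flask/Werkzeug versions may already default to threaded handling; this eases, but does not remove, the limitations described above):

    # Sketch: ask the dev server to handle each request in its own thread,
    # or in a small pool of forked processes (not both at once).
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def index():
        return "hello"

    if __name__ == "__main__":
        app.run(threaded=True)        # one thread per request
        # or: app.run(processes=4)    # a small pool of worker processes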
Another issue is security: if you are concerned at ALL about security (and not just the security of the data in the application itself, but the security of the box that will be running it as well) then you should not use the development server. It is not ready to withstand any sort of attack.
Finally, the development server could just fail outright. It is not designed to be used as a long-running process (days, weeks, months), and so it has not been well tested to work in this capacity.
So, yes, it has limitations. Yes, you could still conceivably use it in production. And yes, I would still recommend using a "real" web server. If you don't like the idea of needing to install something like Apache or Nginx, you can still go with a solution that is as easy as "run a Python script" by using one of the WSGI standalone servers, which can run a production-ready server with something as simple as python run_app.py on the command line. You typically just need to create a 4-5 line Python script to import and create the server object, point it to your Flask app, and run it.
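For instance, a sketch of that "few lines" approach using waitress, one such standalone WSGI server (the myproject module and app names are assumptions):

    # run_app.py - serve a Flask app with a production-grade WSGI server
    # from a plain Python script (module/app names are assumptions).
    from waitress import serve

    from myproject import app   # your Flask application object

    if __name__ == "__main__":
        serve(app, host="127.0.0.1", port=8080)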
gunicorn could be run with only the following on the command line, no extra script needed:
gunicorn myproject:app
...where "myproject" is the Python package that contains the app Flask object. Keep in mind that one of developers of gunicorn would probably recommend against this approach. See https://serverfault.com/questions/331256/why-do-i-need-nginx-and-something-like-gunicorn.
The OP has long since moved on, but for those who encounter this question in the future I would just add that setting up an Apache server, even on a laptop, is free and pretty easy. It can be readily configured for as few or as many features as you want just by uncommenting or commenting out lines in the config file. There might be an even easier GUI method for doing that nowadays, but just editing the configs is simple.

Client-Server Applications for iPhone

I have a question regarding this topic, about client-server applications:
1) Is it necessary to load the database directly into the application?
Suppose I have a DB in the back end and my application has to connect to that DB and display the results in the view. Do I need to add the DB into the application directly?
2) Can we access any DB or file on the remote server and show the required results (without adding that particular DB or file into the application directly)? How can we do this?
I saw a similar question on Stack Overflow where one answer was to use a PList; I am new to this. I am browsing the net but not able to get clear results. I lost many of my interviews because of this question.
Thanks,
1) Is it necessary to load the database directly into the application? Suppose I have a DB in the back end and my application has to connect to that DB and display the results in the view. Do I need to add the DB into the application directly?
I'm not sure I understand this question. No, you don't need to load a database directly into a client in a client-server architecture. Normally, when I think of a design where a server has a database, I imagine there's some kind of way for the client to query the server for information. Perhaps it's making HTTP requests, which the server parses into a query, runs the query, and then returns the results (perhaps in XML form?).
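To make that concrete, here is a hedged server-side sketch (Python/Flask, SQLite, and the schema are purely illustrative; nothing here is specific to the iPhone client) of turning an HTTP request into a query and returning the results for the client to display:

    # Sketch of the server side of that arrangement: the client never ships with
    # the database; it only issues an HTTP request and renders the response.
    import sqlite3

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/products")
    def products():
        q = request.args.get("q", "")
        with sqlite3.connect("catalog.db") as conn:
            rows = conn.execute(
                "SELECT id, name, price FROM products WHERE name LIKE ?",
                ("%" + q + "%",),
            ).fetchall()
        return jsonify([{"id": r[0], "name": r[1], "price": r[2]} for r in rows])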
2) Can we access any DB or file on the remote server and show the required results (without adding that particular DB or file into the application directly)? How can we do this?
Are you asking if it's possible, in general, to access a server database from a client? Yes, of course. (See above, re: HTTP Requests).
Any arbitrary file? That depends on how the server is set up. Again, HTTP is one protocol that works that way; if you send an HTTP query like "GET someimage.png HTTP/1.0", the server could just be grabbing the whole file someimage.png and sending it back in the response. (Technically, it's not necessarily snarfing a whole file -- it could be creating that PNG dynamically, since there's nothing in the HTTP protocol that says it must send an existing file -- but that's outside the scope of your question.)
I lost many of my interviews because of this question.
Not to sound too snarky, but interviews are often won and lost not because you don't know the answer, but because you can't communicate effectively. You haven't phrased your question(s) here particularly well.

Connectedness & HATEOAS

It is said that in a well-defined RESTful system, clients only need to know the root URI or a few well-known URIs, and they should discover all other links through these initial URIs. I do understand the benefits of this approach (decoupled clients), but the downside for me is that the client needs to discover the links each time it tries to access something. For example, given the following hierarchy of resources:
/collection1
  |-sub1
  |  |-sub1sub1
  |  |  |-sub1sub1sub1
  |  |  |  |-sub1sub1sub1sub1
  |  |-sub1sub2
  |-sub2
  |  |-sub2sub1
  |  |-sub2sub2
  |-sub3
  |  |-sub3sub1
  |  |-sub3sub2
If we follow the "clients only need to know the root URI" approach, then a client is only aware of the root URI, /collection1 above, and the rest of the URIs must be discovered through hypermedia links. I find this cumbersome: each time a client needs to do a GET, say on sub1sub1sub1sub1, must it first do a GET on /collection1, follow the link in the returned representation, and then do several more GETs on sub-resources to reach the desired resource? Or is my understanding of connectedness completely wrong?
Best regards,
Suresh
You will run into this mismatch when you try to build a REST API that does not match the flow of the user agent consuming it.
Consider that when you run a client application, the user is always presented with some initial screen. If you match the content and options on this screen with the root representation, then the available links and desired transitions will match nicely. As the user selects options on the screen, you can transition to other representations, and the client UI should be updated to reflect the new representation.
If you try and model your REST API as some kind of linked data repository and your client UI as an independent set of transitions then you will find HATEOAS quite painful.
Yes, it's right that the client application should traverse the links, but once it has discovered a resource, there's nothing wrong with keeping a reference to it and using it for longer than one request. If your client is able to remember things permanently, it can do so.
Consider how a web browser keeps its bookmarks. You probably have ten or a hundred bookmarks in your browser, and you probably found some of them deep in a hierarchy of pages, but the browser dutifully remembers them without requiring you to remember the path you took to find them.
A richer client application could remember the URI of sub1sub1sub1sub1 and reuse it if it still works. It's likely that it still represents the same thing (it ought to). If it no longer exists or fails for any other client-side reason (4xx), you could retrace your steps to see if you can find a suitable replacement.
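A hedged sketch of that "remember the URI, retrace if it breaks" behaviour (the root URL, relation names, and "links"/"self" response shape are assumptions):

    # Sketch: reuse a remembered (bookmarked) URI while it keeps working; if it
    # starts failing with a 4xx, retrace the links from the root to find it again.
    import requests

    ROOT = "https://api.example.com/collection1"
    bookmark = None   # previously discovered URI of sub1sub1sub1sub1, if any

    def rediscover():
        resource = requests.get(ROOT).json()
        for rel in ("sub1", "sub1sub1", "sub1sub1sub1", "sub1sub1sub1sub1"):
            resource = requests.get(resource["links"][rel]).json()
        return resource

    def get_deep_resource():
        global bookmark
        if bookmark is not None:
            response = requests.get(bookmark)
            if response.ok:
                return response.json()   # the bookmark still works, no traversal needed
        resource = rediscover()          # retrace our steps from the root
        bookmark = resource["links"]["self"]
        return resource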
And of course what Darrel Miller said :-)
I don't think that's a strict requirement. As I understand it, it is legal for a client to access resources directly and start from there. The important thing is that you do not do this for state transitions, i.e. do not automatically proceed with /foo2 after operating on /foo1 and so forth. Retrieving /products/1234 initially to edit it seems perfectly fine. The server could always return, say, a redirect to /shop/products/1234 to remain backwards compatible (which is desirable for search engines, bookmarks and external links as well).