It's possible that a backend system is under heavy load,accumulates a backlog in terms of unprocessed data. This causes an outdated data in a database.
What is the recommended practice to indicate it in the API returning this outdated data?
Thank you,
Vlad
There is no way for the backend to know which updates are pending and inform the client about it.
If there is a backlog of requests being processed is because the backend can't serve them. So the backend is unaware of which request are pending.
Another supposition is that the backend processes update requests but postpones service of data reads which are delegated to a cache. Then the cache could inform the client that the data it's reading MAY be outdated, but can't give precise information without consulting the backend which seems what you can't do or don't want to do.
Related
Is it proper programming practice/ software design to have a REST API call another REST API? If not what would be the recommended way of handling this scenario?
If I understand your question correctly, then YES, it is extremely common.
You are describing the following, I presume:
Client makes API call to Server-1, which in the process of servicing
this request, makes another request to API Server-2, takes the
response from Server-2, does some reformatting or data extraction, and
packages that up to respond back the the Client?
This sort of thing happens all the time. The downside to it, is that unless the connection between Server-1 and Server-2 is very low latency (e.g. they are on the same network), and the bandwidth used is small, then the Client will have to wait quite a while for the response. Obviously there can be caching between the two back-end servers to help mitigate this.
It is pretty much the same as Server-1 making a SQL query to a database in order to answer the request.
An alternative interpretation of your question might be that the Client is asking Server-1 to queue an operation that Server-2 would pick up and execute asynchronously. This also is very common (it's how Google crawls your website, for instance). This scenario would have Server-1 respond to Client immediately without needing to wait for the results of the operation undertaken by Server-2. A message queue or database table is usually used as an intermediary between servers in this case.
Another approach to that is make your REST API(1) store the request details to a queue table. Make a backend that will check that queue table every let's say 100milliseconds. That backend will be the one who will call the other REST API(2).
In your REST API(1) just create a loop that will check if the transaction on queue has been processed. If yes, get the process details and return it to client, if no, just keep on looping until process is done
So far all the guides I've found for creating rest API's are for displaying stuff from your own site, but can you display stuff from another site?
Typically you'd do this by:
Proxying calls: When a request comes into your server, make a request to the remote server and pass it back to the user. You'll want to make sure you can make the requests quickly and cache results aggressively. You'll probably want to use a short timeout for the remote call and rate-limit API requests so your server can't be blocked making all these remote calls.
Pre-fetching: Downloading with a data dump periodically or pre-fetching the data you need so you can store it locally.
Keep in mind:
Are you allowed to use the API this way, according to its terms of use? If it's a website you're scraping, it may be okay for small hobby use, but not for a large commercial operation.
The remote source probably has its own rate limits in place. Can you realistically provide your service under those limits?
As mentioned, cache aggressively to avoid re-requesting the same data. Get to know HTTP caching standards (cache-control, etag, etc headers) to minimise network activity.
If you are proxying, consider choosing a data center near the API's data center to reduce latency.
In a distributed systems environment, we have a RESTful service that needs to provide high read throughput at low-latency. Due to limitations in the database technology and given its a read-heavy system, we decided to use MemCached. Now, in a SOA, there are atleast 2 choices for the location of the cache, basically client looks up in Cache before calling server vs client always calls server which looks up in cache. In both cases, caching itself is done in a distributed MemCached server.
Option 1: Client -> RESTful Service -> MemCached -> Database
OR
Option 2: Client -> MemCached -> RESTful Service -> Database
I have an opinion but i'd love to hear arguments for and against either option from SOA experts in the community. Please assume either option is feasible, its a architecture question. Appreciate sharing your experience.
I have seen the
Option 1: Client -> RESTful Service -> Cache Server -> Database
working very well. Pros IMHO are that you are able to operate wtih and use this layer in a way allowing you to "free" part of the load on the DB. Assuming that your end-users can have a lot of similar requests and after all the Client can decide what storage to spare for caching. Also how often to clear it.
I prefer Option 1 and I am currently using it. In this way it is easier to control the load on the DB (just as #ekostatinov mentioned). I have lots of data that are required for every user in the system, but the data is never changed (such as some system rules, types of items, etc). It really reduces the DB load. In this way you can also control the behavior of the cache (such as when to clear the items).
Option 1 is the prefered option as it makes memcache an implementation detail of the service. the other option means that if the business changes and things can't be kept in the cache (or other can etc.) the clients would have to change. Option 1 hides all that behind the service interface.
Additionally option 1 lets you evolve the service as you wish. e.g. maybe later you think you need a new technology, maybe you'd solve the performance problem with the DB etc. Again, option 1 lets you make all these changes without dragging the clients into the mess
Is the REST ful API exposed to external consumers. In that case it is up to the consumer to decide if they want to use a cache and how much stale data can they use.
As for as the REST ful service goes, the service is the container of business logic and it is the authority of data, so it decides how much to cache, cache expiry, when to flush etc. A client consuming the REST service always assumes that the service is providing it with the latest data. And hence option 1 is preferred.
Who is the client in this case?
Is it a wrapper for your REST API. Are you providing both client and the service.
I can share my experience with Enduro/X middleware implementation. For local XATMI service calls any client process connects to shared memory (LMDB) and checks the result there. If there is response saved it returns data directly from shm. If data is not there, client process goes the longest path and performs the IPC. In case of REST access, network clients still performs the HTTP invocation, but HTTP server as XATMI client returns the data from shared mem. From real life, this technique was greatly boosting the web frontend web application which used middleware via REST calls.
Assume the following scenario A web application serves up resources through a RESTful API. A number of clients consume this API. The goal is to keep the data on the clients synchronized with the web application (in both directions).
The easiest way to do this is to ask the API if any of the resources have changed since the client last synchronized with the API. This means that the client needs to ask the API for the appropriate resources accompanied by timestamp (to see if the data needs to be updated). This seems to me like the approach with the least overhead in terms of needless consumption of bandwidth.
However, I have the feeling that this approach has a few downsides in terms of design and responsibilities. For example, the API shouldn't have to deal with checking whether the resources are out of date. It seems that the only responsibility of the API should be to serve up the resources when asked without having to deal with the updating aspect. By following this second approach, the client would ask for a lot of data every time it wants to update its data to keep it synchronized with the web application. In other words, the client would check whether the data it got back is newer than the locally stored data. If this process takes place every few minutes, this might become a significant burden for the system.
Am I seeing this correctly or is there a middle road that I am overlooking?
This is a pretty common problem, and a RESTful approach can help you solve it. HTTP (the application protocol typically used to build RESTful services) supports a variety of techniques that can be used to keep API clients in sync with the data on the server side.
If the client receives a Last-Modified or E-Tag header in a HTTP response, it may use that information to make conditional GET calls in the future. This allows the server to quickly indicate with a 304 – Not Modified response that the client’s previously stored representation of the resource is still valid and accurate. This will allow the server (or even better, an intermediate proxy or cache server) to be as efficient as possible in how it responds to the client’s requests, potentially reducing costly round-trips to a back-end data store.
If a response contains a Last-Modified header and the client wishes to take advantage of the performance optimization available with it, they must include an If-Modified-Since directive in a subsequent GET call to the same URI, passing in the same timestamp value it received. This instructs the server to only GET the information from the authoritative back-end source if it knows it has changed since that time. Your server will have to be built to support this technique, of course.
A similar principle applies to E-Tag headers. An E-Tag is a simple hash code representing a specific state of the resource at a particular point in time. If the resource changes in any way, so does its E-Tag value. If the client sees an E-Tag in a response it should pass it in subsequent GET requests to the same URI, thereby allowing the server to quickly determine if the client has an up-to-date representation of the resource.
Finally, you should probably look at the long polling technique to reduce the number of repeated GET requests issued by your clients to the server. In essence, the trick is to issue very long GET requests to the server to watch for server data changes. The GET will not return a response until either the data has changed or the very long timeout fires. If the latter, the client just re-issues the same long-lived request to watch for changes again. See also topics like Comet and Web Sockets which are similar in approach.
I have a RESTful web-service application that I developed using the Netbeans IDE. The application uses MySQL server as its back end server. What I am wondering now is how often a client application that uses my RESTful application would refresh to reflect the data change in the server.
Are there any default pull intervals that clients get from the RESTful application? Does the framework(JAX-RS) do something about it Or is that my business to take care of.
Thanks in advance
#Abraham
There are no such rules. Only thing you can use for properly implementing this is HTTP's caching capabilities. Service must include control information how long representation of a particular resource can be cached, revalidated, never cached etc...
On client application side of things each client may decide it's own path how it will keep itself in sync with a service. It can be done by locally storing data and serve end user from local cache etc... Service can not(and shouldn't know) how clients are implemented, only thing service can do is to include caching information in response messages as i already mentioned above.
It is your responsibility to schedule the service to execute again and again. We can set time out interval but there is no pull interval.