When I write data from the initiator, why are there read requests? - iscsi

The initiator uses open-iscsi on Linux.
When I issue a write request, the target receives a series of read requests followed by a write request. Why do these read requests exist?
These read requests seem to be consistent with the requests issued when the initiator logged into the target.

Related

Progress callback for multi-part form POST request

We have a legacy application that uses embedded Jetty and provides functionality through clients making HTTP calls. Most of the information/parameters needed by the server are sent by the client through HTTP headers.
We do a multi-part form POST using curl_mime, curl_mimepart and CURLOPT_MIMEPOST to send a transfer request for multiple files to the server; the server accepts the request and transfers the files from volume1 to volume2.
This transfer operation involves multiple files that are GBs in size, so it takes a long time to complete. Currently the client does a multi-part form POST request and waits for the result, which takes a long time.
We want some mechanism by which the client can get progress status for this transfer operation.
I went through the curl and Jetty documentation, and it seems the progress callback (CURLOPT_PROGRESSFUNCTION) is not supported when using a multi-part form POST.
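For reference, this is roughly what our client code looks like (a simplified sketch: the URL, part names and file paths are placeholders, and error handling is trimmed). Even when a transfer callback does fire, it can only reflect the HTTP upload itself, not the server-side volume1-to-volume2 copy:

    #include <curl/curl.h>
    #include <stdio.h>

    /* Transfer-info callback: reports HTTP upload progress only; it says
     * nothing about the server-side processing that happens afterwards. */
    static int xferinfo(void *clientp, curl_off_t dltotal, curl_off_t dlnow,
                        curl_off_t ultotal, curl_off_t ulnow)
    {
        (void)clientp; (void)dltotal; (void)dlnow;
        if (ultotal > 0)
            fprintf(stderr, "uploaded %" CURL_FORMAT_CURL_OFF_T
                    "/%" CURL_FORMAT_CURL_OFF_T " bytes\n", ulnow, ultotal);
        return 0; /* non-zero would abort the transfer */
    }

    int main(void)
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        curl_mime *mime = curl_mime_init(curl);

        /* One part per file; part names and paths are placeholders. */
        curl_mimepart *part = curl_mime_addpart(mime);
        curl_mime_name(part, "file1");
        curl_mime_filedata(part, "/volume1/file1.bin");

        part = curl_mime_addpart(mime);
        curl_mime_name(part, "file2");
        curl_mime_filedata(part, "/volume1/file2.bin");

        curl_easy_setopt(curl, CURLOPT_URL, "http://server/transfer"); /* placeholder */
        curl_easy_setopt(curl, CURLOPT_MIMEPOST, mime);
        curl_easy_setopt(curl, CURLOPT_XFERINFOFUNCTION, xferinfo);
        curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L); /* enable the callback */

        CURLcode res = curl_easy_perform(curl);
        if (res != CURLE_OK)
            fprintf(stderr, "curl: %s\n", curl_easy_strerror(res));

        curl_mime_free(mime);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return 0;
    }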
One solution I found that would fulfil this requirement is the 102 Processing status code:
102 Processing
But as the documentation mentions, "a later update of RFC 2518, RFC 4918, removed the 102 Processing status code for lack of implementation."
If 102 Processing has been removed, is there an alternative that can report transfer progress to the client?
What is the best way to report this transfer progress to the client?

Why are REST APIs considered stateless if PUT commands can update?

I am a bit confused by the terminology of REST APIs being stateless. For example, if we had a To-Do list API, and one of its endpoints was used to update or delete entries, then the requests would not happen in isolation.
If I create an entry before someone else queries the total number of entries, then their response will depend on my request.
But PUT is seen as one of the verbs of REST APIs. Can someone help me clear up my confusion?
Stateless means that you store the client state on the client and send it with each request instead of storing it on the server. The latter is the classical server-side session, where you have a session cookie carrying the session id and the server stores the session data in a database or on the file system. This does not scale well for Facebook-sized applications, which is why they send the session data with each request instead. You can ensure that the session data is not modified by the client by signing it with a private key stored on the server. There is then a signature verification on each request, but it is still less expensive than maintaining session data for more than 1M users in a database and syncing it around the globe across multiple servers (which you would also need to avoid a single point of failure). If the session data sent with a request passes verification, the request can be handled by whichever node the load balancer chooses, without touching a database to fetch session state.
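As a rough illustration of the signing idea (not tied to any particular framework; the key and session payload below are placeholders), the server can attach an HMAC tag to the session blob it hands out and verify it on each request, for example with OpenSSL:

    #include <openssl/hmac.h>
    #include <openssl/evp.h>
    #include <openssl/crypto.h>
    #include <string.h>
    #include <stdio.h>

    /* Server-side secret; a placeholder value for illustration only. */
    static const unsigned char key[] = "server-private-key";

    /* Compute an HMAC-SHA256 tag over the session payload. */
    static unsigned int sign_session(const char *session, unsigned char *tag)
    {
        unsigned int tag_len = 0;
        HMAC(EVP_sha256(), key, (int)(sizeof(key) - 1),
             (const unsigned char *)session, strlen(session), tag, &tag_len);
        return tag_len;
    }

    /* On each request: recompute the tag and compare it with the one the
     * client sent back. A mismatch means the session data was tampered with. */
    static int verify_session(const char *session,
                              const unsigned char *tag, unsigned int tag_len)
    {
        unsigned char expected[EVP_MAX_MD_SIZE];
        unsigned int expected_len = sign_session(session, expected);
        return expected_len == tag_len &&
               CRYPTO_memcmp(expected, tag, tag_len) == 0;
    }

    int main(void)
    {
        unsigned char tag[EVP_MAX_MD_SIZE];
        const char *session = "{\"user\":42,\"role\":\"editor\"}"; /* placeholder */
        unsigned int len = sign_session(session, tag);
        printf("verified: %d\n", verify_session(session, tag, len));
        return 0;
    }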
As for the part of the question related to concurrent calls, it can be solved with resource versioning. You can fetch the current ETag of the resource and use the If-Match header with your PUT request, so the server can figure out which version your request is based on. If there is a newer version, the ETag won't match and the server will reject the request (with 412 Precondition Failed). There are other ways to solve concurrency; it always depends on your application how you handle it.
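A minimal libcurl sketch of such a conditional PUT (the URL, ETag value and payload are placeholders):

    #include <curl/curl.h>
    #include <stdio.h>

    int main(void)
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        /* The ETag previously returned by a GET of the resource (placeholder). */
        struct curl_slist *headers = NULL;
        headers = curl_slist_append(headers, "If-Match: \"v42\"");
        headers = curl_slist_append(headers, "Content-Type: application/json");

        const char *body = "{\"title\":\"buy milk\",\"done\":true}"; /* placeholder */

        curl_easy_setopt(curl, CURLOPT_URL, "http://server/todos/7"); /* placeholder */
        curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "PUT");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

        if (curl_easy_perform(curl) == CURLE_OK) {
            long code = 0;
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
            if (code == 412)
                fprintf(stderr, "ETag mismatch: someone updated the resource first\n");
            else
                printf("HTTP %ld\n", code);
        }

        curl_slist_free_all(headers);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return 0;
    }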

Handle REST API timeout in time consuming operations

How is it possible to handle timeouts for time-consuming operations in a REST API? Let's take the following scenario as an example:
A client service sends a request to insert a resource through a REST API.
The timeout elapses. The client thinks the insertion failed.
The REST API keeps working and finishes the insertion.
The client is never notified of the successful insertion, so on its side the operation's status is "Failed".
I can think of a solution with a message broker: send the orders to a queue and wait until they are resolved.
Any other workaround?
EDIT 1:
The POST-PUT pattern, as suggested in this thread.
A message broker (adds more complexity to the system).
A callback or webhook: pass a return URL in the request that the server API can call to let the client know that the work is completed.
HTTP offers a set of properties for its methods, primarily safety, idempotency and cacheability. While the first guarantees a client that no data is modified, the second promises that a request can be reissued in the face of connection issues, when the client does not know whether the initial request succeeded or only the response got lost mid-way. PUT, for example, provides such a property: it is idempotent.
A simple POST request to "insert" some data does not have any of these properties. Furthermore, a server receiving a POST request processes the payload according to its own semantics; the client does not know beforehand whether a resource will be created or whether the server will simply ignore the request. If the server did create a resource, it informs the client via the Location HTTP response header, which points to the actual location the client can retrieve the information from.
PUT is usually used only to "update" a resource, though according to the spec it can also be used to create a new resource if it does not yet exist. As with POST, on a successful resource creation the PUT response should include such a Location HTTP response header to inform the client that a resource was created.
The POST-PUT creation pattern separates the creation of the URI from the actual persistence of the representation. The client first fires POST requests at the server until it receives a response containing a Location HTTP response header, and then uses that URI in a PUT request to actually send the payload. As PUT is idempotent, the client can simply reissue the PUT until it receives a valid response from the server.
On sending the initial POST request, a client can't be sure whether the request reached the server and only the response got lost, or whether the request never made it to the server at all. As that request is only used to create a new URI (without any content yet), the client may simply reissue it and, in the worst case, just create a new URI that points to nothing. The server may have a cleanup routine that frees unused URIs after a certain amount of time.
Once the client has received the URI, it can use PUT to reliably send the data to the server. As long as it doesn't receive a valid response, it can just reissue the request over and over until it does.
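A minimal client-side sketch of the pattern with libcurl (the endpoint is a placeholder, and a real client would cap the retries and back off between attempts):

    #include <curl/curl.h>
    #include <stdio.h>
    #include <string.h>
    #include <strings.h>
    #include <unistd.h>

    static char location[1024]; /* URI returned by the POST step */

    /* Header callback: capture the Location response header. */
    static size_t on_header(char *buf, size_t size, size_t nitems, void *userdata)
    {
        size_t len = size * nitems;
        (void)userdata;
        if (len > 9 && strncasecmp(buf, "Location:", 9) == 0) {
            const char *p = buf + 9;
            size_t n = len - 9;
            while (n && (*p == ' ' || *p == '\t')) { p++; n--; }
            while (n && (p[n - 1] == '\r' || p[n - 1] == '\n')) n--;
            if (n < sizeof(location)) {
                memcpy(location, p, n);
                location[n] = '\0';
            }
        }
        return len;
    }

    int main(void)
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);

        /* Step 1: POST until we obtain a URI for the new resource. Worst
         * case, a lost response means an orphan URI the server can
         * garbage-collect later. */
        while (location[0] == '\0') {
            CURL *curl = curl_easy_init();
            curl_easy_setopt(curl, CURLOPT_URL, "http://server/orders"); /* placeholder */
            curl_easy_setopt(curl, CURLOPT_POSTFIELDS, ""); /* no content yet */
            curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, on_header);
            curl_easy_perform(curl);
            curl_easy_cleanup(curl);
            if (location[0] == '\0')
                sleep(1); /* a real client would back off and cap retries */
        }

        /* Step 2: PUT the payload to that URI; PUT is idempotent, so
         * reissuing it after a lost response is safe. */
        for (;;) {
            CURL *curl = curl_easy_init();
            long code = 0;
            curl_easy_setopt(curl, CURLOPT_URL, location);
            curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "PUT");
            curl_easy_setopt(curl, CURLOPT_POSTFIELDS, "{\"item\":\"todo\"}"); /* placeholder */
            CURLcode res = curl_easy_perform(curl);
            if (res == CURLE_OK)
                curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
            curl_easy_cleanup(curl);
            if (res == CURLE_OK && code >= 200 && code < 300)
                break;
            sleep(1);
        }
        printf("stored at %s\n", location);
        curl_global_cleanup();
        return 0;
    }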
I therefore do not see the need to use a message-oriented middleware (MOM) using brokers and queues in order to guarantee reliable messaging.
You could also cache the data after a successful insertion, keyed by a previously exchanged request_id or something of that sort. But I believe a message broker with some asynchronous task runner is a much better way to deal with the problem, especially if your request threads are a scarce resource: if you are receiving a good amount of requests all the time, it is a good idea to return responses as quickly as possible so that workers stay available for incoming requests.

Long running REST API with queues

We are implementing a REST API which will kick off multiple long running backend tasks. I have been reading the RESTful Web Services Cookbook, and the recommendation is to return HTTP 202 Accepted with a Content-Location header pointing to the task being processed (e.g. http://www.example.org/orders/tasks/1234), and have the client poll this URI for an update on the long running task.
The idea is to have the REST API immediately post a message to a queue, with a background worker picking up the message from the queue and spinning up multiple backend tasks, also using queues. The problem I see with this approach is how to assign a unique ID to the task and subsequently let the client request the status of the task by issuing a GET to the Content-Location URI.
If the REST API immediately posts to a queue, it could generate a GUID and attach that as an attribute on the message being added to the queue, but fetching the status of the request becomes awkward.
Another option would be to have the REST API immediately add an entry to the database (say, an order with a new order id) with an initial status, and then put a message on the queue to kick off the background tasks, which would subsequently update that database record. The API would return this new order id in the Content-Location header URI for the client to use when checking the status of the task. A sketch of this flow is below.
Somehow adding the database entry first and then adding the message to the queue seems backwards, but only adding the request to the queue makes it hard to track progress.
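Schematically, the second option would look something like this (a toy sketch: the database and queue are stubbed in memory, and the handler just prints the response it would send):

    #include <stdio.h>

    /* Stub "database": creating an order row returns a new order id
     * with an initial status. */
    static int next_order_id = 1234;
    static int create_order_row(const char *status)
    {
        printf("db: insert order %d with status '%s'\n", next_order_id, status);
        return next_order_id++;
    }

    /* Stub "queue": in a real system this would be a broker message. */
    static void enqueue_task(int order_id)
    {
        printf("queue: enqueue background work for order %d\n", order_id);
    }

    /* Stub worker-side update of the same row. */
    static void worker_update(int order_id, const char *status)
    {
        printf("db: update order %d -> '%s'\n", order_id, status);
    }

    int main(void)
    {
        /* 1. Persist first, so a status URI exists immediately... */
        int id = create_order_row("pending");
        /* 2. ...then enqueue; the worker updates the same row later. */
        enqueue_task(id);
        /* 3. Respond right away with the pollable task URI. */
        printf("HTTP/1.1 202 Accepted\n");
        printf("Content-Location: http://www.example.org/orders/tasks/%d\n", id);

        /* Later, the worker: */
        worker_update(id, "in-progress");
        worker_update(id, "complete");
        return 0;
    }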
What would be the recommended approach?
Thanks a lot for your insights.
I assume your system looks like the following: you have a REST service which receives requests from the client and converts them into commands the business logic can understand. You put these commands into a queue. You have one or more workers which process and remove the commands from the queue and send the results to the REST service, which responds to the client.
Your problem is that with long running tasks the client connection times out, so you cannot send a response. What you can do is send a 202 Accepted after you put the commands into the queue, and include a polling link so the client can poll for changes. Since your tasks have multiple subtasks, there is real progress to report, not just pending and complete status changes.
If you want to stick with polling, you should create a new REST resource which contains the actual state and progress of the long running task. This means that you have to store this info in a database so the REST service can answer requests like GET /tasks/23461/status, and that your worker has to update the database whenever it completes a subtask or the whole task.
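A sketch of the client side of that polling loop with libcurl (the status URI comes from the 202 response; the JSON shape and the "complete" marker below are assumptions):

    #include <curl/curl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static char body[4096];
    static size_t body_len;

    /* Accumulate the response body into a buffer. */
    static size_t on_body(char *data, size_t size, size_t nmemb, void *userdata)
    {
        size_t len = size * nmemb;
        (void)userdata;
        if (body_len + len < sizeof(body)) {
            memcpy(body + body_len, data, len);
            body_len += len;
            body[body_len] = '\0';
        }
        return len;
    }

    int main(void)
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);

        /* Status URI as returned with the 202 Accepted (placeholder). */
        const char *status_uri = "http://server/tasks/23461/status";

        for (;;) {
            CURL *curl = curl_easy_init();
            body_len = 0;
            curl_easy_setopt(curl, CURLOPT_URL, status_uri);
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_body);
            if (curl_easy_perform(curl) == CURLE_OK)
                printf("status: %s\n", body);
            curl_easy_cleanup(curl);

            /* Assumed representation: the task resource reports
             * "complete" once all subtasks are done. */
            if (strstr(body, "\"state\":\"complete\""))
                break;
            sleep(2); /* a real client would honor Retry-After or back off */
        }
        curl_global_cleanup();
        return 0;
    }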
If your REST service runs as a daemon, the worker can notify it of progress, so storing the task status in the database won't be the worker's responsibility. This kind of REST service can keep the info in memory as well.
If you decide to use websockets to notify the client, you can create a notification service. Over REST you respond with a task id; the client then sends that task id over the websocket connection, so the notification service knows which websocket connection subscribed to the events of a certain task. From then on you won't need the REST service: you can push progress through the websocket connection as long as the client keeps it open.
You can combine these solutions as follows. You let your REST service create a task resource, so progress is accessible through a polling link. You then send back an identifier with the 202, which the client passes over the websocket connection, so a notification service can notify the client. On progress, your worker notifies the REST service, which builds a link like GET /tasks/23461/status and sends it to the client through the notification service. The client can then use that link to update its status.
I think the last one is the best solution if your REST service runs as a daemon, because you can move the notification responsibility to a dedicated notification service which can use websockets, polling, SSE, whatever you want. It can collapse without killing the REST service, so the REST service stays stable and fast. If you also send back a manual update link with the 202, the client can update manually (assuming a human-controlled client), so you get something like graceful degradation when the notification service is unavailable. You barely have to maintain the notification service, because it knows nothing about the tasks; it just pushes data to the clients. Your worker won't have to know anything about how to send notifications or how to create hyperlinks, and the client code will be easier to maintain too, since it will be almost a pure REST client. The only extra feature is the subscription to the notification links, which does not change frequently.

What to do if network fails before POST response can be read?

When accessing a REST service from a client that has an unreliable network connection (e.g., a flaky cell network), what are some best practices for handling the case where the connection drops before the response to a POST can be read? Since POST is not idempotent, it's unsafe to naively retry. Are there best practices for this? Assume I'm also designing the service end, so there are no constraints on that end of the wire either.
Write a protocol which does not allow a second resource to be created while the client has not consumed the first one. For example, after GETting the resource, the client should POST back an acknowledgement that it consumed it, so the service can safely create another one when the next GET arrives. If no verification POST arrives, the server should answer every subsequent GET by sending the same resource that was created for the first GET (this may be client-specific). This way you can safely repeat the GET after a predefined timeout interval elapses. (If the number of repeats exceeds a given value, you have a permanent network or service error, about which you will have to notify the user.)
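A rough client-side sketch of that exchange with libcurl (the endpoint names, the acknowledgement format and the retry limit are all assumptions):

    #include <curl/curl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Perform one request; returns 1 on any 2xx response. */
    static int do_request(const char *url, const char *post_body)
    {
        CURL *curl = curl_easy_init();
        long code = 0;
        if (!curl) return 0;
        curl_easy_setopt(curl, CURLOPT_URL, url);
        if (post_body)
            curl_easy_setopt(curl, CURLOPT_POSTFIELDS, post_body);
        curl_easy_setopt(curl, CURLOPT_TIMEOUT, 10L); /* assumed timeout */
        CURLcode res = curl_easy_perform(curl);
        if (res == CURLE_OK)
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
        curl_easy_cleanup(curl);
        return res == CURLE_OK && code >= 200 && code < 300;
    }

    int main(void)
    {
        int attempts;
        curl_global_init(CURL_GLOBAL_DEFAULT);

        /* The GET is safe to repeat: until the server sees our
         * acknowledgement, it keeps returning the same resource it
         * created for us on the first GET. */
        for (attempts = 0; attempts < 5; attempts++) {             /* assumed limit */
            if (do_request("http://server/next-resource", NULL))   /* placeholder */
                break;
            sleep(2);
        }
        if (attempts == 5) {
            fprintf(stderr, "permanent network or service error\n");
            return 1;
        }

        /* Acknowledge consumption so the server may create the next
         * resource. Retrying this POST is assumed harmless under this
         * protocol, since it re-acknowledges the same resource. */
        for (attempts = 0; attempts < 5; attempts++) {
            if (do_request("http://server/next-resource/ack", "consumed=1")) /* placeholder */
                break;
            sleep(2);
        }
        curl_global_cleanup();
        return 0;
    }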