Can I get only the first entry of an Atom feed?

This Atom feed returns 500 entries per page.
But I just need the first entry, to minimize data transfer.
Is there a way to get only the first entry?
Or is it impossible unless a server-side engineer implements a parameter for it?
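If the server offers no such parameter, one client-side workaround is to stream the response and stop reading as soon as the first entry has arrived, so only the first few kilobytes are transferred. A minimal Python sketch, with a placeholder feed URL and the requests library assumed:

import requests

FEED_URL = "https://example.com/feed.atom"  # placeholder

with requests.get(FEED_URL, stream=True) as resp:
    resp.raise_for_status()
    buf = b""
    first_entry_xml = None
    for chunk in resp.iter_content(chunk_size=4096):
        buf += chunk
        end = buf.find(b"</entry>")
        if end != -1:  # first entry fully received; stop downloading
            first_entry_xml = buf[:end + len(b"</entry>")]
            break
# leaving the with-block closes the connection, abandoning the rest of the feed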

Related

Keep all fields option creating duplicate records in http client origin in StreamSets

I have an HTTP Client origin which gives a JSON response. The pipeline uses pagination (by page number). When I enable the 'Keep all fields' option in the HTTP Client, it creates duplicates of the first record on every page. Say I have 10 records in the file: it writes the first record of the first page 10 times, the first record of the second page 10 times, and so on; basically it repeats the first record for the entire page. Is there any way to fix this issue? We need 'Keep all fields' enabled to get the page properties for processing within the job.

SharePoint REST API returns incomplete content of file during downloading

I work on an application for fetching and downloading SharePoint data. For every folder in SharePoint I can get the list of all files inside a given folder by using the following SharePoint REST API endpoint:
/_api/web/GetFolderById('<folder_guid>')/Files
The expected size and GUID are provided for every file, so I can use them when I want to download the file. Then I use the following SharePoint REST API endpoint to actually get the file content:
/_api/web/GetFileById('<file_guid>')/$value
From time to time, when I download a file, I get less data than expected: the size of the downloaded data simply differs from the value I obtained when getting the properties of the files. However, when I try to get its content again, it may download successfully (the size of the downloaded data equals the expected value), or I may get different incomplete data.
I verified that the first endpoint (the one used to get the properties of all files in the folder) returns the correct file size. The problem is in the call to the second one.
I see that there is a "Transfer-Encoding" header with the value "chunked" in the response. So my HTTP client performs a chunked download, and if a zero-length chunk is received at some point, then by definition we have reached the end of the body. So it looks like in some cases SharePoint either returns incomplete data or sends a zero-length chunk when it should not.
What can be the reason for such strange behavior? Is it a known issue?
We actually see this too. Strange behaviour: many files are just small .aspx files, about 3-4 KB, and they are consistently smaller, by 15% or more, than what appears in the file properties. We're also using the REST API, and this is really frustrating. All these strange bugs in SharePoint Online are very annoying.
This is an interesting topic... are those files large? Like over 1 GB? It would seem that chunked file download is not a supported approach in SharePoint Online. A better option is to use RPC. Please see these links for examples:
https://sharepoint.stackexchange.com/questions/184789/download-large-files-from-sharepoint-online
https://social.msdn.microsoft.com/Forums/office/en-US/03e55d41-1daf-46a5-b61d-2d80139123f4/download-large-files-using-rest?forum=sharepointdevelopment
https://piyushksingh.com/2016/08/15/download-large-files-from-sharepoint-online/
You could also check whether the MS Graph API might work better for this case:
https://learn.microsoft.com/en-us/graph/api/driveitem-get-content?view=graph-rest-1.0&tabs=http
... I hope this will be of some help.
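Since retrying often yields the full content (as noted above), one client-side mitigation is to validate each download against the size reported by the properties endpoint and retry on mismatch. A minimal Python sketch, assuming an already-authenticated requests session; the endpoints are the ones from the question:

import requests

def download_file(session, site_url, file_guid, expected_size, attempts=3):
    url = f"{site_url}/_api/web/GetFileById('{file_guid}')/$value"
    data = b""
    for _ in range(attempts):
        data = session.get(url).content
        if len(data) == expected_size:  # size from GetFolderById(...)/Files
            return data
    raise IOError(f"got {len(data)} bytes, expected {expected_size}")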

What is the REST way to update a record without a primary key?

I've created a REST API. According to my design, we have to store a user's blood sugar level on a daily basis.
The problem is:
I want to use a single endpoint for both the insert and the update operations.
I don't want to use the primary key of the blood-sugar resource in the URI, because I want to store only the last value for a single day.
For example if I make this call
POST https://{host}/users/1/blood-sugar/
{
"measureDate": "2019-05-04",
"bloodSugarLevel": 86
}
It will create a blood-sugar resource and the database will assign an ID (let's say ID=333).
It's OK up to here.
Then I want to be able to make a second request with the same date but a different blood sugar level. As a result, the backend should find the previous blood-sugar resource (the one with ID=333) and update its bloodSugarLevel field, because we already have a record for this day (2019-05-04). I don't want to send ID=333 in the request body or the URI.
POST https://{host}/users/1/blood-sugar/
{
"measureDate": "2019-05-04",
"bloodSugarLevel": 105 # only this value is different
}
My question is:
Is there any way to achieve this (or a similar) result with REST? You can suggest changing the verb, the URI, or the request body.
Note:
If I were doing this with WCF or something similar, a single method would satisfy all my requirements. For example: CreateOrUpdateBloodSugarLevel(int userId, DateTime measureDate, int bloodSugarLevel)
Thanks.
Is there any way to achieve this (or similar) result with REST?
Just POSTing the updated value to the same endpoint is fine.
Think about how you would do this on the world wide web. You would visit a website, and would load some form, containing a text field for date, a text field for bloodSugarLevel, and a submit button. That would POST the message to the web server, and your browser would get back some response.
Note that, as a client, we really don't care whether the server appends the new message into a list, or upserts the message into a map, or does some clever thing with an RDBMS or a graph database. Those are implementation details; part of the point of having a uniform interface is that the interface means that the clients (and generic components) don't really need to know what is happening.
Another application protocol that could work would be to treat the bloodSugarLevel as a document that users can edit locally. That way, a client could just use any HTTP-aware editor to do the right thing.
GET /users/1/blood-sugar/
200 OK
{
"measureDate": "2019-05-03",
"bloodSugarLevel": 90
}
PUT /users/1/blood-sugar/
{
"measureDate": "2019-05-04",
"bloodSugarLevel": 86
}
204 No Content
PUT /users/1/blood-sugar/
{
"measureDate": "2019-05-04",
"bloodSugarLevel": 105
}
There are some semantic advantages to using PUT when the network is unreliable; because the server agrees that the message handling will be done idempotently, clients can respond to a timeout while waiting for an acknowledgment by repeating the send.
Semantically, PUT means "upsert", but the underlying implementation doesn't have to be an upsert. We're only making promises about the semantics that the client can expect.
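To make the "PUT means upsert" point concrete, here is a minimal server-side sketch keyed by user and date, as the question requires. Flask and the in-memory dict are illustrative assumptions, not part of the answer:

from flask import Flask, request

app = Flask(__name__)
readings = {}  # (user_id, measureDate) -> bloodSugarLevel

@app.route("/users/<int:user_id>/blood-sugar/", methods=["PUT"])
def put_blood_sugar(user_id):
    body = request.get_json()
    # insert or overwrite: repeating the same PUT leaves the same state
    readings[(user_id, body["measureDate"])] = body["bloodSugarLevel"]
    return "", 204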

How to design a REST API to fetch a large (ephemeral) data stream?

Imagine a request that starts a long running process whose output is a large set of records.
We could start the process with a POST request:
POST /api/v1/long-computation
The output consists of a large sequence of numbered records that must be sent to the client. Since the output is large, the server does not store all of it; instead it maintains a window of records with an upper limit on the window's size. Let's say it stores up to 1000 records (and pauses the computation whenever that many records are available). When the client fetches records, the server may subsequently delete those records and so continue generating more (as slots in the 1000-record window become free).
Let's say we fetch records with:
GET /api/v1/long-computation?ack=213
We can take this to mean that the server should return records starting from index 214. When the server receives this request, it can assume that the (well-behaved) client is acknowledging that records up to number 213 have been received, so it deletes them and then returns records from number 214 up to whatever is available at that time.
Next if the client requests:
GET /api/v1/long-computation?ack=214
the server would delete record 214 and return records starting from 215.
This seems like a reasonable design until one notices that GET requests need to be safe and idempotent (see section 9.1 of the HTTP RFC).
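For illustration, here is a minimal Python sketch of the ack-and-delete window semantics just described (the class and method names are only illustrative):

from collections import OrderedDict

WINDOW = 1000  # upper limit on stored records, per the design above

class Computation:
    def __init__(self):
        self.records = OrderedDict()  # index -> record
        self.next_index = 1

    def produce(self, record):
        if len(self.records) >= WINDOW:
            return False  # window full: the computation pauses
        self.records[self.next_index] = record
        self.next_index += 1
        return True

    def fetch(self, ack):
        # the client acknowledges records up to `ack`, so delete them ...
        for i in [k for k in self.records if k <= ack]:
            del self.records[i]
        # ... and return whatever is available from ack+1 onward
        return list(self.records.items())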
Questions:
Is there a better way to design this API?
Is it OK to keep it as GET even though it appears to violate the standard?
Would it be reasonable to make it a POST request such as:
POST /api/v1/long-computation/truncate-and-fetch?ack=213
One question that I always feel needs to be asked is: are you sure that REST is the right approach for this problem? I'm a big fan and proponent of REST, but I try to apply it only to situations where it's applicable.
That being said, I don't think there's anything necessarily wrong with expiring resources after they have been used, but I think it's bad design to reuse the same URL over and over again.
Instead, when I request the first set of results (maybe with):
GET /api/v1/long-computation
I'd expect that resource to give me a next link with the next set of results.
Although that particular URL design does sort of tell me that there's only one long-computation going on in the entire system at a time. If this is not the case, I would also expect a bit more uniqueness in the URL design.
The best solution here is to buy a bigger hard drive. I'm assuming you've pushed back and that's not in the cards.
I would consider your operation to be "unsafe" as defined by RFC 7231, so I would suggest not using GET. I would also strongly advise you to not delete records from the server without the client explicitly requesting it. One of the principles REST is built around is that the web is unreliable. Under your design, what happens if a response doesn't make it to the client for whatever reason? If they make another request, any records from the lost response will be destroyed.
I'm going to second Evert's suggestion that, unless you absolutely must keep this design, you instead pick a technology that's built around reliable delivery of information, such as a message queue. If you're going to stick with REST, you need to allow clients to tell you when it's safe to delete records.
For instance, is it possible to page records? You could do something like:
POST /long-running-operations?recordsPerPage=10
202 Accepted
Location: "/long-running-operations/12"
{
"status": "building next page",
"retry-after-seconds": 120
}
GET /long-running-operations/12
200 OK
{
"status": "next page available",
"current-page": "/pages/123"
}
-- or --
GET /long-running-operations/12
200 OK
{
"status": "building next page",
"retry-after-seconds": 120
}
-- or --
GET /long-running-operations/12
200 OK
{
"status": "complete"
}
GET /pages/123
{
// a page of records
}
DELETE /pages/123
// remove this page so new records can be made
You'll need to cap the page size at the number of records you can support. If the client requests fewer than that limit, you can generate more records in the background while it processes the first page.
That's just spitballing, but maybe you can start there. No promises on quality; this is totally off the top of my head. This approach is a little chatty, but it saves you from returning a 404 when the next page isn't ready yet.
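A sketch of the client side of that exchange, with the host as a placeholder and the endpoints and field names taken from the example above (the requests library is assumed):

import time
import requests

BASE = "https://api.example.com"  # placeholder host

def process(records):
    pass  # application-specific handling of a page of records

resp = requests.post(f"{BASE}/long-running-operations",
                     params={"recordsPerPage": 10})
op_url = BASE + resp.headers["Location"]

while True:
    status = requests.get(op_url).json()
    if status["status"] == "complete":
        break
    if status["status"] == "next page available":
        page_url = BASE + status["current-page"]
        process(requests.get(page_url).json())
        requests.delete(page_url)  # tell the server it may free the page
    else:  # "building next page"
        time.sleep(status.get("retry-after-seconds", 30))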

How to implement robust pagination with a RESTful API when the resultset can change?

I'm implementing a RESTful API which exposes Orders as a resource and supports pagination through the resultset:
GET /orders?start=1&end=30
where the orders to paginate are sorted by ordered_at timestamp, descending. This is basically approach #1 from the SO question Pagination in a REST web application.
If the user requests the second page of orders (GET /orders?start=31&end=60), the server simply re-queries the orders table, sorts by ordered_at DESC again and returns the records in positions 31 to 60.
The problem I have is: what happens if the resultset changes (e.g. a new order is added) while the user is viewing the records? In the case of a new order being added, the user would see the old order #30 in first position on the second page of results (because the same order is now #31). Worse, in the case of a deletion, the user sees the old order #32 in first position on the second page (#31) and wouldn't see the old order #31 (now #30) at all.
I can't see a solution to this without somehow making the RESTful server stateful (ugh) or building some pagination intelligence into each client... What are some established techniques for dealing with this?
For completeness: my back-end is implemented in Scala/Spray/Squeryl/Postgres; I'm building two front-end clients, one in backbone.js and the other in Python Django.
The way I'd do it is to assign the indices from oldest to newest, so that they never change. Then, when querying without any start parameter, return the newest page. The response should also contain an index indicating which elements it contains, so you can calculate the indices you need to request for the next, older page. While this is not exactly what you want, it seems like the easiest and cleanest solution to me.
Initial request: GET /orders?count=30 returns:
{
"start": 1039,
"count": 30,
... // data
}
From this, the consumer calculates that the next request should be:
Next request: GET /orders?start=1009&count=30, which then returns:
{
"start": 1009,
"count": 30,
... // data
}
Instead of raw indices you could also return a link to the next page:
{
"next": "/orders?start=1009&count=30"
}
This approach breaks if items get inserted or deleted in the middle. In that case you should use some auto-incrementing persistent value instead of an index.
The sad truth is that all the sites I've seen have pagination "broken" in that sense, so there must not be an easy way to achieve it.
A quick workaround could be reversing the ordering, so the position of the items is absolute and unchanged by new additions. From your front page you can provide the latest indices to ensure consistent navigation from there.
Pros: the same URL gives the same results
Cons: there's no obvious way to get the latest elements... Maybe you could use negative indices and redirect the result page to the absolute indices.
With a RESTful API, application state should live in the client. Here, the application state should include some sort of timestamp or version number recording when the client started looking at the data. On the server side, you will need some form of audit trail, which is properly server data, as it does not depend on whether there have been clients and what they have done. At the very least, the server should know when the data last changed. There is no contradiction with REST here.
You could add a version parameter to your GET. When the client first requests a page, it normally does not send a version; the server's reply contains one. For instance, if there are links in the reply to next/other pages, those links contain &version=... The client should send that version when requesting another page.
When the server receives a request with a version, it should at least know whether the data have changed since the client started looking and, depending on what sort of audit trail you have, how they have changed. If they have not, it answers normally, transmitting the same version number. If they have, it may at least tell the client. And depending on how much it knows about how the data have changed, it may tailor the reply accordingly.
Just as an example, suppose you get a request with start, end, and version, and you know that since that version was current, 3 rows coming before start have been deleted. You might send a redirect with start-3, end-3, and the new version.
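A sketch of that redirect in code, assuming Flask and a stubbed audit-trail lookup (both the framework and the helper are my assumptions, not part of the answer):

from flask import Flask, redirect, request

app = Flask(__name__)
CURRENT_VERSION = 7  # bumped whenever the underlying data change

def rows_deleted_before(start, since_version):
    # consult the audit trail here; stubbed for the sketch
    return 3

@app.route("/orders")
def orders():
    start = int(request.args["start"])
    end = int(request.args["end"])
    version = request.args.get("version", type=int)
    if version is not None and version != CURRENT_VERSION:
        shift = rows_deleted_before(start, version)
        return redirect(f"/orders?start={start - shift}&end={end - shift}"
                        f"&version={CURRENT_VERSION}", code=303)
    # otherwise serve the page normally, embedding version=CURRENT_VERSION
    # in any next/previous links
    return {"start": start, "end": end, "version": CURRENT_VERSION}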
WebSockets can do this. You can use something like pusher.com to catch realtime changes to your database and pass the changes to the client. You can then bind different pusher events to work with models and collections.
I'm just going to throw this out there. Please feel free to tell me if it's completely wrong, and why.
This approach tries to use a left_off value to page through results without using offsets.
Consider that you need your results ordered by the order_at timestamp, descending.
So when I ask for the first result set, it's:
SELECT * FROM Orders ORDER BY order_at DESC LIMIT 25;
right?
This is the case when you ask for the first page (in terms of the URL, probably the request that doesn't have a left_off parameter). The general URL shape is:
yoursomething.com/orders?limit=25&left_off=$timestamp
Then, when you receive your data set, just grab the timestamp of the last viewed item, e.g. 2015-12-21 13:00:49.
Now, to request the next 25 items, go to: yoursomething.com/orders?limit=25&left_off=2015-12-21 13:00:49 (the last viewed timestamp).
In SQL you would make the same query, adding a WHERE clause saying the timestamp is less than $left_off:
SELECT * FROM Orders WHERE order_at < '2015-12-21 13:00:49'
ORDER BY order_at DESC LIMIT 25;
You should get the next 25 items after the last one seen.
For those who see this answer: please comment on whether this approach is relevant, or even possible in the first place. Thank you.
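For what it's worth, here is a self-contained demonstration of this left_off approach, using sqlite3 so it runs as-is (the schema and data are made up for the sketch):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (id INTEGER PRIMARY KEY, order_at TEXT)")
conn.executemany("INSERT INTO Orders (order_at) VALUES (?)",
                 [(f"2015-12-21 13:00:{s:02d}",) for s in range(60)])

def fetch_page(left_off=None, limit=25):
    if left_off is None:  # first page: the newest records
        return conn.execute(
            "SELECT id, order_at FROM Orders "
            "ORDER BY order_at DESC LIMIT ?", (limit,)).fetchall()
    return conn.execute(
        "SELECT id, order_at FROM Orders WHERE order_at < ? "
        "ORDER BY order_at DESC LIMIT ?", (left_off, limit)).fetchall()

page1 = fetch_page()
page2 = fetch_page(left_off=page1[-1][1])  # timestamp of the last item seen
# note: ties on order_at would need a secondary sort key in real use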