How to get more than 100 query results with Azure DocumentDB REST API

I am following a sample for Azure DocumentDB below. In the sample, C# code queries for documents in the DocumentDB.
https://github.com/Azure/azure-documentdb-dotnet/blob/master/samples/rest-from-.net/Program.cs
Line 182:
var qry = new SqlQuerySpec { query = "SELECT * FROM root" };
var r = client.PostWithNoCharSetAsync(new Uri(baseUri, resourceLink), qry).Result;
The problem is the result 'r' only contains the first 100 documents. If I use the Client SDK, I can get more than 100. I tried using a stream, but have had no luck so far. Any help would be appreciated!

For a SQL query, the results are returned in segments if the result set is too large. By default the results come back in chunks of 100 items or 1 MB, whichever limit is hit first.
You can either use continuation tokens to fetch the segments one after another, or set the x-ms-max-item-count custom header on the request to raise the limit to an appropriate value.
You can have a look at the REST API documentation for further details.
For the sample program you have to add the line
client.DefaultRequestHeaders.Add("x-ms-max-item-count", "1000");
in order to get 1000 documents instead of 100.

I'm just guessing here, but it might be worth a shot. Here's the documentation from MSDN that describes the List action:
https://learn.microsoft.com/en-us/rest/api/documentdb/list-documents
In the "Headers" section under "Response" it is mentioned that you might get an optional token in the header "x-ms-continuation". Based on the description you have to issue another GET request with this token specified to get the other elements of the result set.
Can you check whether you get a header like this in the response? If so, you can issue another get request with this token specified (see the same documentation page under "Request").
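Combining both answers: raise x-ms-max-item-count, then follow the x-ms-continuation response header until it disappears. Below is a minimal sketch in Python with the requests library, assuming the headers dict you pass in already carries the Authorization, x-ms-date, and x-ms-version values the DocumentDB REST API requires (the HMAC request signing is left out here).
import requests

def query_all_documents(query_uri, sql, headers):
    # 'headers' must already contain Authorization, x-ms-date and
    # x-ms-version; only the query-specific headers are added here.
    headers = dict(headers)
    headers["Content-Type"] = "application/query+json"
    headers["x-ms-documentdb-isquery"] = "True"
    headers["x-ms-max-item-count"] = "1000"  # ask for up to 1000 docs per page

    documents = []
    continuation = None
    while True:
        if continuation:
            headers["x-ms-continuation"] = continuation
        else:
            headers.pop("x-ms-continuation", None)
        r = requests.post(query_uri,
                          json={"query": sql, "parameters": []},
                          headers=headers)
        r.raise_for_status()
        documents.extend(r.json()["Documents"])
        # The server returns x-ms-continuation only while more pages remain.
        continuation = r.headers.get("x-ms-continuation")
        if not continuation:
            return documents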

Related

API Request Assistance

I'm new to playing around with calling third-party REST APIs.
I have an API which requires an ID (/sites/{id}/). As I don't know the IDs off the top of my head and would like to query multiple of them, is there any way to wildcard this ID so it runs through and checks, for instance, IDs 1 through 10? Or is this more of a Python integration?
As mentioned above, if an API happens to have a parameter "id", whether or not you can scan for all available IDs (or any ID between 1 and 10) depends entirely on the API.
In your case, the API (for help.rapid7.com) is well documented. It appears to have an endpoint to "list sites", which should give you what you're looking for:
https://help.rapid7.com/insightvm/en-us/api/index.html#operation/getSites
Sites
GET /api/3/sites
Server URL
https://help.rapid7.com/api/3/sites
Retrieves a paged resource of accessible sites.
PARAMETERS
Query Parameters
* page (integer <int32>, default: 0): The index of the page (zero-based) to retrieve.
* size (integer <int32>, default: 10): The number of records per page to retrieve.
* sort (multiple query params of string): The criteria to sort the records by, in the format property[,ASC|DESC]. The default sort order is ascending. Multiple sort criteria can be specified using multiple sort query parameters.
You would probably want to do the following (see the sketch after these steps):
1. Call /api/3/sites (with a filter) to get a list of sites you're interested in, then
2. Make successive calls to /sites/{id}/ for each site in the list you want detailed information about.
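A rough sketch of that two-step flow in Python with requests. The console URL and credentials are placeholders, and the response fields (resources, page.totalPages) follow the InsightVM v3 response shape shown in the docs; double-check them against your API version.
import requests

BASE = "https://YOUR_CONSOLE:3780"  # hypothetical InsightVM console address
AUTH = ("username", "password")     # placeholder credentials

def list_all_sites():
    sites, page = [], 0
    while True:
        # verify=False only because consoles often use self-signed certs;
        # enable verification in production.
        r = requests.get(f"{BASE}/api/3/sites",
                         params={"page": page, "size": 100},
                         auth=AUTH, verify=False)
        r.raise_for_status()
        body = r.json()
        sites.extend(body["resources"])
        # The 'page' metadata tells us when we've walked every page.
        if page >= body["page"]["totalPages"] - 1:
            return sites
        page += 1

# Step 2: fetch details for each site in the list.
for site in list_all_sites():
    detail = requests.get(f"{BASE}/api/3/sites/{site['id']}",
                          auth=AUTH, verify=False).json()
    print(detail["id"], detail["name"])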

How to filter YouTube Analytics API request for embedded video stats only

I'm trying to get data from the YouTube Analytics API for embedded videos only.
When I use the "insightPlaybackLocationType==EMBEDDED" filter, I get a response that the query is not supported. Without this filter, the query returns a response without any errors.
response = self.executeAPIRequest(
    yt_instance.reports().query,
    ids="channel==" + c_id,
    startDate=startdate,
    endDate=enddate,
    metrics="views,likes,dislikes,comments,shares,estimatedMinutesWatched,averageViewDuration,averageViewPercentage",
    sort='-views',
    filters="video==VIDEO_ID_HERE;insightPlaybackLocationType==EMBEDDED",
    maxResults=200,
)
Here's the error I get:
googleapiclient.errors.HttpError: https://youtubeanalytics.googleapis.com/v2/reports?ids=channel%3D%CHANNEL_ID_HERE&startDate=2017-02-28&endDate=2019-08-11&metrics=views%2Clikes%2Cdislikes%2Ccomments%2Cshares%2CestimatedMinutesWatched%2CaverageViewDuration%2CaverageViewPercentage&sort=-views&filters=video%3D%VIDEO_ID_HERE%3BinsightPlaybackLocationType%3D%3DEMBEDDED&maxResults=200&alt=json returned "The query is not supported. Check the documentation at https://developers.google.com/youtube/analytics/v2/available_reports for a list of supported queries.">
That filter can only be used with the insightPlaybackLocationDetail dimension.
Bear in mind that this dimension only supports the views and estimatedMinutesWatched metrics.
Documentation (Playback location detail):
https://developers.google.com/youtube/analytics/channel_reports#playback-location-reports
Be sure to set the sort and maxResults parameters to values this report supports; the documentation lists -views for sort and a maximum of 25 results.
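Putting that together, a sketch of the adjusted call, reusing the asker's executeAPIRequest helper and keeping the remaining parameters the same:
response = self.executeAPIRequest(
    yt_instance.reports().query,
    ids="channel==" + c_id,
    startDate=startdate,
    endDate=enddate,
    # Only these two metrics are supported by this dimension.
    metrics="views,estimatedMinutesWatched",
    dimensions="insightPlaybackLocationDetail",
    filters="video==VIDEO_ID_HERE;insightPlaybackLocationType==EMBEDDED",
    sort="-views",
    maxResults=25,  # detail reports cap maxResults at 25
)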

Emberjs not returning all records

I am using the SANE stack, which consists of Sails and Emberjs. I am also using MongoDB as a datastore.
When I do something like the following on the Sailsjs side of things;
Parent.find(req.query.id).populate('children').exec(function(err, parent){
    console.log('children.length = ' + parent[0].children.length);
});
I get 212
But when I do something like the following on the Emberjs side of things;
parent.get('children').then(function(children){
    console.log('children.length = ' + children.length);
});
I get 30.
As a matter of fact, once the number of records goes over 30, it does not matter: Ember will only return 30 records.
Is there some way to get the rest of the records? I actually need the records so I can sort and calculate some things; I am not just displaying them.
Any help would be greatly appreciated.
Explanation
That's because the default response limit in Sails is 30 records. It's there obviously so that if you have 10k or a million rows you don't make some normal request and accidentally dump out your entire database to the client and crash everything.
This is documented here:
http://sailsjs.org/documentation/reference/configuration/sails-config-blueprints
defaultLimit (default: 30)
The default number of records to show in the response from a "find" action. Doubles as the default size of populated arrays if populate is true.
It's also documented in the config/blueprints.js file in your sails app:
https://github.com/balderdashy/sails-generate-backend/blob/master/templates/config/blueprints.js#L152-L160.
Here is the relevant code: https://github.com/balderdashy/sails/blob/master/lib/hooks/blueprints/actionUtil.js#L291
Solution
Edit the defaultLimit in your config/blueprints.js file
OR
Add &limit=<number> to your HTTP request URL
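For example, a quick check with Python's requests against a hypothetical blueprint route (/parent/1/children follows the standard Sails populate route shape; adjust the model name, record id, and port to your app):
import requests

# Ask the Sails blueprint for up to 1000 children instead of the default 30.
resp = requests.get("http://localhost:1337/parent/1/children",
                    params={"limit": 1000})
children = resp.json()
print(len(children))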

How to implement cursors for pagination in an api

This is similar to this question, which doesn't have any answers. I've read all about how to use cursors with the Twitter, Facebook, and Disqus APIs, and also this article about how Disqus generally built their cursors, but I still cannot seem to grok how they work or how to implement a similar solution in my own projects. Can someone explain the different techniques and the concepts behind them?
Let's first understand why offset pagination fails for large data sets, with an example.
Clients provide two parameters: limit for the number of results and offset for the page offset.
For example, with offset = 40 and limit = 20, we can tell the database to return the next 20 items, skipping the first 40.
Drawbacks:
* Using LIMIT OFFSET doesn't scale well for large datasets. As the offset increases the farther you go within the dataset, the database still has to read up to offset + count rows from disk before discarding the offset and only returning count rows.
* If items are being written to the dataset at a high frequency, the page window becomes unreliable, potentially skipping or returning duplicate results.
How do cursors solve this?
Cursor-based pagination works by returning a pointer to a specific item in the dataset. On subsequent requests, the server returns results after the given pointer.
In this case the client provides two parameters: next_cursor along with limit.
Let's assume we want to paginate from the most recent user to the oldest. When the client requests the first page, we select it with the query:
SELECT * FROM users
WHERE team_id = %team_id
ORDER BY id DESC
LIMIT %limit
Here %limit is the client's limit plus one, so we fetch one more result than the count the client specified. The extra row isn't returned in the result set, but we use its ID as the next_cursor.
The response from the server would be:
{
  "users": [...],
  "next_cursor": "1234"  # the user id of the extra result
}
The client would then provide next_cursor as cursor in the second request.
SELECT * FROM users
WHERE team_id = %team_id
AND id <= %cursor
ORDER BY id DESC
LIMIT %limit
With this, we've addressed the drawbacks of offset-based pagination:
Instead of the window being calculated from scratch on each request based on the total number of items, we’re always fetching the next count rows after a specific reference point. If items are being written to the dataset at a high frequency, the overall position of the cursor in the set might change, but the pagination window adjusts accordingly.
This will scale well for large datasets. We’re using a WHERE clause to fetch rows with id values less than the last id from the previous page. This lets us leverage the index on the column and the database doesn’t have to read any rows that we’ve already seen.
For a detailed explanation, you can read the wonderful engineering article from Slack on this topic!
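As a concrete illustration of the scheme above, here is a minimal self-contained sketch in Python with SQLite; the table and column names are invented for the example, and the team_id filter is dropped for brevity.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 101)])

def get_page(limit, cursor=None):
    # Fetch limit + 1 rows; the extra row only supplies the next cursor.
    if cursor is None:
        rows = conn.execute(
            "SELECT id, name FROM users ORDER BY id DESC LIMIT ?",
            (limit + 1,)).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE id <= ? "
            "ORDER BY id DESC LIMIT ?",
            (cursor, limit + 1)).fetchall()
    next_cursor = rows[limit][0] if len(rows) > limit else None
    return {"users": rows[:limit], "next_cursor": next_cursor}

# Walk the whole table, 20 users per page, newest first.
page = get_page(20)
while page["next_cursor"] is not None:
    page = get_page(20, cursor=page["next_cursor"])
print(page["users"][-1])  # (1, 'user1'): the oldest user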
Here is an article about pagination: paginating-real-time-data-cursor-based-pagination
Cursors – we need to have at least one column with unique sequential values to implement cursor based pagination. This can be similar to Twitter’s max_id parameter or Facebook’s after parameter.
In general you should pass the current item or page number in the request as a param. Another usual param is the page's batch size. Then on the server-side backend you select and return the proper dataset, with a SQL query for example.
Here's what I ended up with. The cursor works as a pointer: it points at an index, and limit picks that many rows starting from that pointer. Say we are given id 10 and limit 5: it will go to id 10 and pick 5 elements from there.
Some Graph API connections use cursors by default. You can use the 'limit' and 'before'/'after' parameters in your call. If it's still not clear, you can post your code here and I can explain with it.

facebook graph api comment list sort, like 'orderby=desc'?

I use the Graph API to get a picture's comments, but I want to sort the results by creation time and get the latest data first, similar to the SQL clause 'order by create_time desc'. I do not know if there is such a parameter.
Currently I use offset and limit to reach the latest data, which also requires knowing the total number of comments:
pagesize = 25;
offset = comments.count - pagesize;
limit = 25;
url = "https://graph.facebook.com/" + object_id + "/comments?access_token=" + access_token + "&limit=" + limit + "&offset=" + offset;
next page:
offset -= 25
But comments.count is sometimes not accurate, and the results returned by the request URL sometimes don't match.
Is there a good solution, or am I using the wrong approach (the 'limit' and 'offset' parameters)?
Thank you for your answer.
"Graphics API" the existence of the cache?
i post a message and 46 comments.requests url, set the parameters:
offset=0&limit=1
Then it should return to the last comment (latest one), the actual return to the middle of a comment, and I tested a few times, set the
offset and limit. According to the returned results, the middle one is
the latest comment
If I set the limit value is greater than the 'comment.count', the returned data is all, the official website and facebook consistent
Because the cache reason?
Thanks again~
#dbau - You are still better off using FQL. In my experience, unless you are making a very simple call, you have very little control over what you get via a Graph API call.
Why don't you want to use FQL? FQL is an endpoint of the Graph API. There is still some data that can only be returned via FQL.
This will get you the result you're looking for. The query needs to be URL encoded. I left it in plain text for clarity.
https://graph.facebook.com/fql?access_token=[TOKEN]&q=
SELECT id, fromid, text, time, likes, user_likes FROM comment
WHERE object_id = [OBJECT_ID] ORDER BY time DESC LIMIT 0,[N]
You may find you don't get [N] comments returned each time, because Facebook filters out items that are not visible to the access_token owner after the query is run. You could either up the LIMIT and filter out any excess results returned or if you are using a user access_token, you could add AND can_like = TRUE to the WHERE clause to be guaranteed that, if they exist, [N] posts visible to the current user are returned.
Graph API returns the latest objects first.
Facebook provides 2 keywords to page through the fetched data:
Limit: returns the 'limit' number of latest records
Offset: returns the 'limit' number of records starting from the offset position
So to retrieve the latest "X" comments posted for an object:
https://graph.facebook.com/[OBJECTID]/comments?limit=[X]&offset=0
To retrieve the next "X" comments (page-wise):
https://graph.facebook.com/[OBJECTID]/comments?limit=[X]&offset=[X*PAGENo]
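If you would rather sidestep the offset arithmetic altogether, Graph API responses include a paging object with a ready-made next URL. A sketch in Python that simply follows it until it runs out (the object id and token are placeholders, as above):
import requests

url = ("https://graph.facebook.com/[OBJECTID]/comments"
       "?access_token=[TOKEN]&limit=25")
comments = []
while url:
    body = requests.get(url).json()
    comments.extend(body.get("data", []))
    # Graph API includes paging.next only while more results remain.
    url = body.get("paging", {}).get("next")
print(len(comments))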
Hope the answer is clear enough for you.