List all users via GitHub GraphQL v4 API

I am trying to find documentation for, or figure out how to replicate, GitHub's v3 API "Get all users" endpoint via their v4 GraphQL API.
It's easy enough to query just about anything for a specific user, but how can I retrieve a payload listing all users, similar to the v3 API payload?
Can someone point me to the correct documentation or, even better, provide an example that returns a list of users?

As far as I can tell there is no equivalent to "Get all users" in the v4 API; however, there is a hacky way to get close to it using a nodes query.
First you need to be able to generate node IDs to iterate over. Looking at mojombo (the first user by database ID), you can see how node IDs are derived.
$ curl https://api.github.com/users/mojombo
{
  "login": "mojombo",
  "id": 1,
  "node_id": "MDQ6VXNlcjE=",
  ...
}
$ echo -n "MDQ6VXNlcjE=" | base64 -d
04:User1
It is the string 04:User followed by the user's database ID.
Knowing this, we can now generate a nodes query for 100 users at a time. NOTE: not all database IDs are users; some are organisations or deleted users, so you will get a lot of NOT_FOUND errors.
query($ids: [ID!]!) {
  nodes(ids: $ids) {
    ... on User {
      login
    }
  }
}

variables:

{
  "ids": ["MDQ6VXNlcjE=", "MDQ6VXNlcjI=", ...]
}
If you aren't limited to only using the v4 API but still want to take advantage of its improved performance, there is a slightly less hacky alternative: use the v3 API to get users 100 at a time, then use the returned node_id field to perform a bulk v4 query across all 100 users using the technique above. This method gives far fewer NOT_FOUND errors and hence makes better use of your rate limit.
To give an idea of the performance improvement you can get from the v4 API and this technique: the task I was performing went from an estimated ~474 days using the v3 API to less than 5 days using this method.
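Here is a rough Python sketch of that hybrid approach (a minimal sketch, assuming the requests library and a hypothetical personal access token; the GraphQL query is the nodes query shown earlier):

```python
import requests

TOKEN = "YOUR_TOKEN"  # hypothetical personal access token

QUERY = """
query($ids: [ID!]!) {
  nodes(ids: $ids) {
    ... on User { login createdAt }
  }
}
"""

def user_batch(since_id):
    # v3: list up to 100 existing users after the given database ID.
    # Deleted IDs are simply absent, so NOT_FOUND errors mostly disappear.
    v3 = requests.get(
        "https://api.github.com/users",
        params={"since": since_id, "per_page": 100},
        headers={"Authorization": f"token {TOKEN}"},
    )
    v3.raise_for_status()
    node_ids = [u["node_id"] for u in v3.json()]

    # v4: one bulk GraphQL query for the details of all 100 users.
    v4 = requests.post(
        "https://api.github.com/graphql",
        json={"query": QUERY, "variables": {"ids": node_ids}},
        headers={"Authorization": f"bearer {TOKEN}"},
    )
    v4.raise_for_status()
    return v4.json()["data"]["nodes"]
```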

The search query works well for this. You can query based on specific fields, and it will return a list of users. This is usually good if you are looking for one specific user; if you are looking for a large list of users, the search API isn't what you want, since it is optimized for finding one value based on inputs. See the example below:
{
  search(query: "location:toronto language:Go", type: USER, first: 10) {
    userCount
    edges {
      node {
        ... on User {
          login
          name
          location
          email
          company
        }
      }
    }
  }
}

Related

How to make REST calls to Tableau

I need to make REST requests to Tableau to upload and download data sources, among other things.
The documentation mentioned here says that to make a REST request you need:
Server Name
SiteID
Workspace/Group ID
Where can I get these three things? I am new and thus not familiar with the Tableau platform.
[Screenshot of my Tableau dashboard]
I see you've figured this out based on some of your other questions, but here is the answer for anyone else searching.
Server name = your server's IP address or, if using Tableau Online, the first portion of your URL when you log in, e.g. 10ay.online.tableau.com in the GET call
https://10ay.online.tableau.com/api/3.12/sites/site-id/projects/project-id
Site ID can be retrieved with a POST to the API authentication call. Using the server name above, the POST call would be https://10ay.online.tableau.com/api/3.4/auth/signin with a POST body that looks like this:
{
  "credentials": {
    "personalAccessTokenName": "YOURTOKENNAME",
    "personalAccessTokenSecret": "YOURTOKENSECRET",
    "site": {
      "contentUrl": "YOURSITE"
    }
  }
}
You don't necessarily need the group ID unless you are returning group-specific info like user/group relationships. Use the following in a GET call to return your group IDs by name: https://10ay.online.tableau.com/api/3.12/sites/site-id/groups
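Putting it together, here is a minimal Python sketch of the sign-in flow (assuming the requests library; the response field names follow Tableau's JSON shape, which requires an Accept: application/json header):

```python
import requests

SERVER = "https://10ay.online.tableau.com"  # your server name from above
VERSION = "3.12"

# Sign in; the response contains the site ID and an auth token.
body = {
    "credentials": {
        "personalAccessTokenName": "YOURTOKENNAME",
        "personalAccessTokenSecret": "YOURTOKENSECRET",
        "site": {"contentUrl": "YOURSITE"},
    }
}
resp = requests.post(
    f"{SERVER}/api/{VERSION}/auth/signin",
    json=body,
    headers={"Accept": "application/json"},
)
resp.raise_for_status()
creds = resp.json()["credentials"]
site_id, token = creds["site"]["id"], creds["token"]

# Use the token and site ID to list the site's groups (and their IDs).
groups = requests.get(
    f"{SERVER}/api/{VERSION}/sites/{site_id}/groups",
    headers={"X-Tableau-Auth": token, "Accept": "application/json"},
)
print(groups.json())
```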

Is there a way to GET all items in a global secondary index with REST API using aws api-gateway? I can only GET some

I created a REST API using AWS API Gateway and DynamoDB without AWS Lambda (I wrote mapping templates for both the integration request and the integration response instead), with a GET API method, a POST HTTP method, and the Scan action. I'm fetching from a global secondary index in DynamoDB to make my scan smaller than the original table.
It's working well, except I am only able to scan roughly 1,000 of the 7,500 items that I need. I looked into paginating the JSON via an S3 bucket, but I really want to keep it simple with just API Gateway and DynamoDB, if possible.
Is there a way to get all 7,500 items in my payload with some modification to my integration request and/or response mappings? If not, what do you suggest?
Below is the mapping code I'm using, which works for a 1,000-item JSON payload instead of the 7,500 that I would like:
Integration Request:
{
  "TableName": "TrailData",
  "IndexName": "trail-index"
}
Integration Response:
#set($inputRoot = $input.path('$'))
[
#foreach($elem in $inputRoot.Items)
  {
    "id": $elem.id.N,
    "trail_name": "$elem.trail_name.S",
    "challenge_rank": $elem.challenge_rank.N,
    "challenge_description": "$elem.challenge_description.S",
    "reliability_description": "$elem.reliability_description.S"
  }
  #if($foreach.hasNext),#end
#end
]
[Screenshot of the GET method settings for my API]
I have already checked out this related Stack Overflow question, but I can't figure out how to apply it to my situation, and I have put a lot of time into this.
I am aware of the 1 MB query limit for DynamoDB, but the limited data I am returning is only 142 KB.
I appreciate any help or suggestions. I am new to this. Thank you!
This limitation is not related to the DynamoDB Scan; rather, #foreach within a VTL response template is restricted to 1,000 iterations (that is the issue here).
You can confirm this by simply removing the #foreach (or the entire response template); you should then see all the records (up to 1 MB) come back, though not well formatted.
The easiest solution is to pass request parameters that restrict the result to only the necessary attributes from the DynamoDB table:
{
  "TableName": "ana-qa-linkshare",
  "Limit": 2000,
  "ProjectionExpression": "challenge_rank,reliability_description,trail_name"
}
However, we can avoid a single loop that goes over 1,000 iterations by using multiple #foreach loops. This gets a little complex within the template (a Lambda function would be simpler), but here is how it might look:
#set($inputRoot = $input.path('$'))
#set($maxRec = 500)
#set($totalLoops = $inputRoot.Count / $maxRec)
#set($innerMax = $maxRec - 1)
#set($lastIndex = $inputRoot.Count - 1)
#set($outerArray = [0..$totalLoops])
#set($innerArray = [0..$innerMax])
[
#foreach($outer in $outerArray)
#foreach($inner in $innerArray)
## Each #foreach stays under 1,000 iterations; the real index is computed.
#set($idx = $outer * $maxRec + $inner)
#if($idx <= $lastIndex)
#set($elem = $inputRoot.Items.get($idx))
  {
    "trail_name": "$elem.trail_name.S",
    "challenge_rank": $elem.challenge_rank.N,
    "reliability_description": "$elem.reliability_description.S"
  }#if($idx < $lastIndex),#end
#end
#end
#end
]

GitHub API pagination limit [duplicate]

So, I have created a query to retrieve some data from GitHub using its GraphQL API (tested with the REST API; same problem). The query is partially shown below (showing just what matters):
query listRepos($queryString: String!, $numberOfRepos: Int!, $afterCursor: String!) {
  rateLimit {
    cost
    remaining
    resetAt
  }
  search(query: $queryString, type: REPOSITORY, first: $numberOfRepos, after: $afterCursor) {
    repositoryCount
    pageInfo {
      endCursor
      startCursor
    }
    <data related to the repository here>
  }
}
I wrote a small piece of code, running in a browser, to retrieve the data. It works without problems up to its 50th request. At that point, the response comes back without any pagination data; that is, without any cursors in the pageInfo object.
Note that at this point I am still far, far away from the request limits: I start with 5,000 points and each of these calls costs 1, so at the moment I stop receiving pagination cursors I still have 4,950 points left.
Also, by this point I have collected data on almost 1,000 repositories, while repositoryCount shows more than 450,000 results for my search query. Hence, this is not a result of reaching the end of the pages/list/data.
So, at this point I can't move further and retrieve the rest of the data. I even tried using Insomnia and Postman with the last valid cursor I have, but I only get a response without any cursor.
Why is that happening?
UPDATE
The same thing happens when using the REST API. Using the search endpoint and analyzing the response headers, I see that I am given only 34 pages of results (around 1,000 repositories), although the search count reports more than 400,000 results.
Both the REST API and GraphQL API cap the number of items returned from a search query at 1,000 results: https://developer.github.com/v3/search/#about-the-search-api.
Just like searching on Google, you sometimes want to see a few pages of search results so that you can find the item that best meets your needs. To satisfy that need, the GitHub Search API provides up to 1,000 results for each search.
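To make the cap concrete, here is a minimal Python sketch of cursor pagination against the GraphQL search endpoint (a sketch only: it assumes the requests library and a hypothetical personal access token, and adds hasNextPage to the pageInfo selection). The loop simply stops once no further page is offered, which for a large search happens at roughly 1,000 results:

```python
import requests

TOKEN = "YOUR_TOKEN"  # hypothetical personal access token

QUERY = """
query($queryString: String!, $afterCursor: String) {
  search(query: $queryString, type: REPOSITORY, first: 100, after: $afterCursor) {
    repositoryCount
    pageInfo { endCursor hasNextPage }
    nodes { ... on Repository { nameWithOwner } }
  }
}
"""

def search_repositories(query_string):
    repos, cursor = [], None
    while True:
        resp = requests.post(
            "https://api.github.com/graphql",
            json={"query": QUERY,
                  "variables": {"queryString": query_string, "afterCursor": cursor}},
            headers={"Authorization": f"bearer {TOKEN}"},
        )
        resp.raise_for_status()
        search = resp.json()["data"]["search"]
        repos.extend(n["nameWithOwner"] for n in search["nodes"])
        if not search["pageInfo"]["hasNextPage"]:
            break  # end of results, or the ~1,000-result search cap was hit
        cursor = search["pageInfo"]["endCursor"]
    return repos
```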

Is it good practice to provide an aggregation API in a REST or normal Web API design?

I provide REST APIs like the following for getting base user info and a user's space info.
/v1/users/{user_id}/profile, which returns JSON like this:
```json
{
  "id": 123,
  "name": "foo",
  "sex": 0,
  "email": "foo@gmail.com"
}
```
/v1/users/{user_id}/space, which also returns JSON:
```json
{
  "sum_space": 100,
  "used_space": 20
}
```
Now if a client (e.g. a web page or a 3rd-party application) has a view which needs to show part of the user info (e.g. "name", "sex") and part of the user's space info (e.g. "sum_space") at the same time, do I need to provide a new aggregation API such as /v1/users/{user_id}?
And if I provide such an aggregation API, should it return all attributes of User and Space? If so, the return value will include some unused values, which wastes network bandwidth. But if this API returns just what one client needs, what should I do when a new client requirement comes along (e.g. just getting a user's name and used_space)?
If I don't provide any aggregation API, every client must make N calls to get N kinds of resources, and if there is a requirement for a filtered search (e.g. getting the list of users whose sum_space > 100), the client can only do this serially.
I am very confused about all this. Is there any guideline to follow?
Provide base data at /users/{id}. Allow clients to include the optional query parameter ?expand=profile,space. If they pick both, they get a detailed response with data from all three endpoints. If they only pick, say, profile, then they get the base data and the profile data.
If you need them to be able to restrict the response to exactly the properties they need, you can also support an ?include query parameter. A request might look like GET /users/{id}?include=id,lastModifiedDate,profile.name&expand=profile and might get back something like:
{
  "id": 25,
  "lastModifiedDate": 0,
  "profile": {
    "name": "Bob Smith"
  }
}
You can mess around with the URI structure above as you like. Maybe you prefer GET /users/{id}?include=id,lastModifiedDate&expand=profile[name]. The point is that you can use query parameters to define the type of information you get back.
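As an illustration of that design, a server-side handler might shape the response roughly like this (a sketch only; all names and the dotted-path convention are hypothetical, not from any particular framework):

```python
def shape_response(user, profile, space, include, expand):
    # Start from the base resource, then attach any expanded sub-resources.
    full = dict(user)
    if "profile" in expand:
        full["profile"] = profile
    if "space" in expand:
        full["space"] = space
    if not include:
        return full  # no ?include given: return everything selected so far
    # Keep only the requested properties, supporting dotted paths
    # such as "profile.name".
    result = {}
    for path in include:
        parts = path.split(".")
        src, dst = full, result
        for key in parts[:-1]:
            src = src[key]
            dst = dst.setdefault(key, {})
        dst[parts[-1]] = src[parts[-1]]
    return result

# e.g. shape_response(user, profile, space,
#                     include=["id", "lastModifiedDate", "profile.name"],
#                     expand=["profile"])
```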
Another choice, if you're using vendor-specific MIME types, would be to use a different mime type for different shapes of response. One type might return the profile and space, while another would return just one or the other, when a request is made to GET /users/{id}. That's a particularly blunt instrument though, and it doesn't sound appropriate for your needs.

Way to specify resource's fields list in RESTful API request

I have a RESTful API within a web service with resources such as users, posts and so on. When I make a request for a list of posts (GET /posts), I want to retrieve an array of posts with only limited data for each post (e.g. subject, author name). When I make a request for a specific post (GET /posts/42), I want to retrieve the full list of post fields, including the big post body and additional info such as likes count and comments count.
I suppose there exist many ways to solve this problem.
In my mind, the three most obvious are:
1. Explicitly specify a field list on every request (/posts?fields=subject,author_name and /posts/42?fields=subject,body,createAt,author_name,comments_count,likes_count, etc.).
2. Explicitly specify a field list only if it differs from the default field list.
3. Specify a field list that should be excluded from (or included in) the default field set if the desired field set differs from the default.
I want to build a clear and useful API for my customers. Which way should I choose?
I'd go for option 2 IMHO.
So if the consumer just requests the resource URL (/posts/42), they receive the default fields.
Consumers can then alter the default response by defining values in the query string, like:
/posts/42/fields?subject,author_name
This has worked well for me in the past and is how some other well-known APIs work, e.g. Facebook's.
Edit: Looking back at this I’d change the request to be:
/posts/42?fields=subject,author_name
/posts/42 is the resource, not fields.
I have been doing research into this as well and was pointed towards Facebook's GraphQL as an alternative to requesting a RESTful API with a list of wanted fields. It is still in the very early stages but seems very promising.
https://facebook.github.io/react/blog/2015/05/01/graphql-introduction.html
EDIT:
Reproduced from the URL:
A GraphQL query is a string interpreted by a server that returns data in a specified format. Here is an example query:
{
  user(id: 3500401) {
    id,
    name,
    isViewerFriend,
    profilePicture(size: 50) {
      uri,
      width,
      height
    }
  }
}
(Note: this syntax is slightly different from previous GraphQL examples. We've recently been making improvements to the language.)
And here is the response to that query.
{
  "user": {
    "id": 3500401,
    "name": "Jing Chen",
    "isViewerFriend": true,
    "profilePicture": {
      "uri": "http://someurl.cdn/pic.jpg",
      "width": 50,
      "height": 50
    }
  }
}