Is it possible to get all the file names from repository using the GitHub API?
I'm currently trying to tinker with this using PyGithub, but I'm totally OK with making the request manually as long as it works.
My algorithm so far is:
Get the user repo names
Get the user repo that matches a certain description
??? get repo file names?
This will have to be relative to a particular commit, as some files may be present in some commits and absent in others, so before you can look at files you'll need to use something like List commits on a repository:
GET /repos/:owner/:repo/commits
If you're just interested in the latest commit on a branch you can set the sha parameter to the branch name:
sha string SHA or branch to start listing commits from.
Once you have a commit hash, you can inspect that commit
GET /repos/:owner/:repo/git/commits/:sha
which should return something like this (truncated from GitHub's documentation):
{
"sha": "...",
"...": "...",
"tree": {
"url": "https://api.github.com/repos/octocat/Hello-World/git/trees/691272480426f78a0138979dd3ce63b77f706feb",
"sha": "691272480426f78a0138979dd3ce63b77f706feb"
},
"...": "..."
}
Look at the hash of its tree, which is essentially its directory contents. In this case, 691272480426f78a0138979dd3ce63b77f706feb. Now we can finally request the contents of that tree:
GET /repos/:owner/:repo/git/trees/:sha
The output from GitHub's example is
{
"sha": "9fb037999f264ba9a7fc6274d15fa3ae2ab98312",
"url": "https://api.github.com/repos/octocat/Hello-World/trees/9fb037999f264ba9a7fc6274d15fa3ae2ab98312",
"tree": [
{
"path": "file.rb",
"mode": "100644",
"type": "blob",
"size": 30,
"sha": "44b4fc6d56897b048c772eb4087f854f46256132",
"url": "https://api.github.com/repos/octocat/Hello-World/git/blobs/44b4fc6d56897b048c772eb4087f854f46256132"
},
{
"path": "subdir",
"mode": "040000",
"type": "tree",
"sha": "f484d249c660418515fb01c2b9662073663c242e",
"url": "https://api.github.com/repos/octocat/Hello-World/git/blobs/f484d249c660418515fb01c2b9662073663c242e"
},
{
"path": "exec_file",
"mode": "100755",
"type": "blob",
"size": 75,
"sha": "45b983be36b73c0788dc9cbcb76cbb80fc7bb057",
"url": "https://api.github.com/repos/octocat/Hello-World/git/blobs/45b983be36b73c0788dc9cbcb76cbb80fc7bb057"
}
]
}
As you can see, we have some blobs, which correspond to files, and some additional trees, which correspond to subdirectories. You may want to do this recursively.
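The recursive walk could be sketched roughly like this. It is only an illustration: fetch_tree is a hypothetical callable (not part of any library) that takes a tree SHA and returns the parsed JSON of GET /repos/:owner/:repo/git/trees/:sha, so the traversal logic stays separate from how you make the HTTP request.

```python
from typing import Callable, Dict, List


def list_files(fetch_tree: Callable[[str], Dict], sha: str, prefix: str = "") -> List[str]:
    """Return the paths of all blobs reachable from the tree `sha`.

    `fetch_tree` is a hypothetical helper supplied by the caller; it must
    return a dict shaped like the Git Trees API response shown above.
    """
    paths = []
    for entry in fetch_tree(sha)["tree"]:
        path = prefix + entry["path"]
        if entry["type"] == "blob":
            # A blob is a file: record its full path.
            paths.append(path)
        elif entry["type"] == "tree":
            # A tree is a subdirectory: fetch it and descend.
            paths.extend(list_files(fetch_tree, entry["sha"], path + "/"))
    return paths
```

With requests, fetch_tree might be something like lambda sha: requests.get(base_url + "/git/trees/" + sha).json(), where base_url is your own https://api.github.com/repos/:owner/:repo prefix. Note this makes one request per directory.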
You can use the GitHub Git Trees API:
https://api.github.com/repos/[USER]/[REPO]/git/trees/[BRANCH]?recursive=1
Repo
https://github.com/deeja/bing-maps-loader
Api Call
https://api.github.com/repos/deeja/bing-maps-loader/git/trees/master?recursive=1
which returns
{
sha: "55382e87889ccb4c173bc99a42cc738358fc253a",
url: "https://api.github.com/repos/deeja/bing-maps-loader/git/trees/55382e87889ccb4c173bc99a42cc738358fc253a",
tree: [
{
path: "README.md",
mode: "100644",
type: "blob",
sha: "41ceefc1262bb80a25529342ee3ec2ec7add7063",
size: 3196,
url: "https://api.github.com/repos/deeja/bing-maps-loader/git/blobs/41ceefc1262bb80a25529342ee3ec2ec7add7063"
},
{
path: "index.js",
mode: "100644",
type: "blob",
sha: "a81c94f70d1ca2a0df02bae36eb2aa920c7fb20e",
size: 1581,
url: "https://api.github.com/repos/deeja/bing-maps-loader/git/blobs/a81c94f70d1ca2a0df02bae36eb2aa920c7fb20e"
},
{
path: "package.json",
mode: "100644",
type: "blob",
sha: "45f24dcb7a457b14fede4cb907e957600882b340",
size: 595,
url: "https://api.github.com/repos/deeja/bing-maps-loader/git/blobs/45f24dcb7a457b14fede4cb907e957600882b340"
}
],
truncated: false
}
Much easier now with the GraphQL API: you can get it all in a single query.
First you get your repo:
query {
repository(name: "MyRepo" owner: "mylogin"){
}
}
then you get its defaultBranchRef to make life easy
defaultBranchRef{
}
A branch ref is really just a pointer to a commit, and since GraphQL is strongly typed (and refs can point to different kinds of objects), we need to tell it that the target is a commit:
target{
...on Commit {
}
}
so target is what our ref is pointing to, and we say "if its a commit, do this"
and what should it do?
it should get the most recent commit (since that will have the latest files in the repo)
so to do that we query history
history(first: 1 until: "2019-10-08T00:00:00"){
nodes{
}
}
Now inside of nodes we are inside of our commit, and we can see the files.
The files in a commit are really just a pointer to a tree, and a tree just has entries, which can be objects of either type Tree or type Blob.
Entries that represent files are known as blobs, but since we don't do anything with them except list their names, you don't even need to know that.
It is important to know that trees are also entries, so if you find a tree you need to dig in deeper, but you can only go a predefined number of levels deep.
tree{
entries {
name
object {
...on Tree{
entries{
name
object {
...on Tree{
entries{
name
}
}
}
}
}
}
}
}
now to put it all together:
query{
  repository(owner: "MyLogin", name: "MyRepo") {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: 1 until: "2019-10-08T00:00:00") {
            nodes {
              tree {
                entries {
                  name
                  object {
                    ... on Tree {
                      entries {
                        name
                        object{
                          ...on Tree{
                            entries{
                              name
                              object{
                                ...on Tree{
                                  entries{
                                    name
                                  }
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
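Once the query comes back, the nested entries still have to be turned into file paths. A rough Python sketch, assuming the response has been parsed into dicts and you pass in the entries list of the top-level tree (flatten_entries is a name made up for this example):

```python
from typing import Dict, List


def flatten_entries(entries: List[Dict], prefix: str = "") -> List[str]:
    """Turn nested `entries { name object { ... on Tree { entries } } }` into paths.

    Assumption: an entry whose `object` has no `entries` key is treated as a
    file. At the deepest level the query selects only `name`, so a directory
    at that depth is indistinguishable from a file and is listed as-is.
    """
    paths = []
    for entry in entries:
        path = prefix + entry["name"]
        children = (entry.get("object") or {}).get("entries")
        if children is None:
            paths.append(path)
        else:
            # A Tree: recurse into its entries with an extended prefix.
            paths.extend(flatten_entries(children, path + "/"))
    return paths
```

Anything nested deeper than the levels written out in the query simply never shows up, which is the depth limitation mentioned above.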
As Dan mentioned: github trees
See working example below
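import requests

user = "grumbach"
repo = "ft_ping"

# Ask the Git Trees API for the whole tree of the master branch in one call
url = "https://api.github.com/repos/{}/{}/git/trees/master?recursive=1".format(user, repo)
r = requests.get(url)
res = r.json()

# Each entry in "tree" is a file (blob) or a directory (tree); print its path
for file in res["tree"]:
    print(file["path"])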
For the sake of simplicity I omitted error management, velociraptors are extinct anyway…
Use gh api to make authenticated HTTP requests to the GitHub API.
In one line:
gh api -X GET /repos/octocat/Hello-World/commits | grep -E -o ".{0,0}\[{\"sha\":\".{0,40}" | sed 's/\[{\"sha\":\"//' | xargs -I {} gh api -X GET /repos/octocat/Hello-World/commits/{} | grep -E -o "\"filename\":\".*?\""
Or in two steps
Get commits sha
gh api -X GET /repos/octocat/Hello-World/commits | grep -E -o ".{0,0}\[{\"sha\":\".{0,40}" | sed 's/\[{\"sha\":\"//' >> ~/commits
List file names
xargs < ~/commits -I {} gh api -X GET /repos/octocat/Hello-World/commits/{} | grep -E -o "\"filename\":\".*?\""
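If grep on raw JSON feels fragile, the same extraction can be done by parsing the response. A small sketch, assuming you have the JSON text of one GET /repos/:owner/:repo/commits/:sha response (for example captured from gh api); filenames_from_commit is a name made up here. Keep in mind that the files array of a commit lists the files changed by that commit, not every file in the repository.

```python
import json
from typing import List


def filenames_from_commit(commit_json: str) -> List[str]:
    """Return the `filename` of every entry in the commit's `files` array."""
    commit = json.loads(commit_json)
    # `files` may be absent, so fall back to an empty list.
    return [changed["filename"] for changed in commit.get("files", [])]
```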
There are two kinds of releases that can be obtained from a GitHub repo: Binary Releases and Package Releases.
I want to use Ansible to retrieve Package Releases from my GitHub Repo
I did some searching on Ansible docs and found a collection community.general.github_release but this gives the latest Release binaries of the repo and not Package Releases.
Can anyone help if they know a collection that can fetch Package Releases from GitHub ?
Appreciate any help. Thanks
You can use the GitHub GraphQL API (as shown in this question and this one), for example:
#
# Tasks that may be included in an Ansible playbook or role depending on your needs
#
# Some variables to define to identify your repository
# They may be set as playbook or role variables as well
# You'll need a Bearer token (see https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token)
- set_fact:
    bearer_token: YOUR_BEARER_TOKEN
    repository_name: repository-name
    repository_owner: repository-owner
- name: Retrieve packages for repository
  uri:
    url: https://api.github.com/graphql
    method: POST
    body: '{"query":
      "query { repository(name: \"{{ repository_name }}\", owner: \"{{ repository_owner }}\") {
        packages(first:10) { nodes { name, packageType, latestVersion {
          version, files(first:100) { nodes { url } }
        } } }
      }
      }"'
    headers:
      Content-Type: application/json
      Accept: "application/vnd.github.packages-preview+json"
      Authorization: "bearer {{ bearer_token }}"
  register: github_packages_json
This will provide an output like:
{
"json": {
"data": {
"repository": {
"packages": {
"nodes": [
{
"latestVersion": {
"files": {
"nodes": [
{
"url": "https://pkg.githubusercontent.com/xxx/some-url"
},
{
"url": "https://pkg.githubusercontent.com/xxx/another-url"
}
]
},
"version": "my-package-1.2.3"
},
"name": "my-package",
"packageType": "DOCKER"
}
]
}
}
}
}
}
Depending on packageType you may need to perform a different action. For example, a DOCKER packageType would require you to pull the image, such as:
- name: pull docker
  shell: docker pull docker.pkg.github.com/{{ repository_owner | lower }}/{{ repository_name }}/{{ docker_image_name }}:{{ docker_image_version }}
  vars:
    docker_image_name: "{{ github_packages_json.json.data.repository.packages.nodes[0].name }}"
    docker_image_version: "{{ github_packages_json.json.data.repository.packages.nodes[0].latestVersion.version }}"
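If you post-process the registered result outside Ansible, picking the same fields out of the response might look like the sketch below. It assumes a dict shaped exactly like the sample output above; package_versions is a name invented for this example.

```python
from typing import Dict, List, Tuple


def package_versions(response: Dict) -> List[Tuple[str, str, str]]:
    """Return (name, packageType, latest version) for each package node."""
    nodes = response["json"]["data"]["repository"]["packages"]["nodes"]
    return [
        (node["name"], node["packageType"], node["latestVersion"]["version"])
        for node in nodes
        # Skip packages that report no latestVersion.
        if node.get("latestVersion")
    ]
```

Iterating over every node this way avoids hard-coding nodes[0] when a repository has more than one package.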
The community.general.github_release Ansible module is part of the ansible-collections/community.general code.
Its source code, source_control/github/github_release.py, shows that it is using github3.py, a library for using GitHub's REST API.
Specifically, the latest_release endpoint (code here) uses the GET /repos/{owner}/{repo}/releases/latest REST API.
However, a "Package Release" is, for github3.py (used by the Ansible module), an asset, with an ID you can find in a github3.repos.release.Release: original_assets will give you all the asset IDs.
You would therefore need to write a module similar to latest_release, using the version returned by latest_release in order to call github3.repos.release.Release, get the asset IDs, and download the one you need using asset(asset_id).
Let's say that we have the Github package registry repository https://maven.pkg.github.com/someOrganization . How can I cat the list of all packages in this repo into a txt file ?
This can be done using the Preview API for GitHub Packages. You can query it in GraphQL using:
query($login: String!) {
organization(login:$login) {
registryPackages(first:10, packageType:MAVEN) {
nodes {
name
}
}
}
}
This will output something like:
{
"data": {
"organization": {
"registryPackages": {
"nodes": [
{
"name": "package1"
},
{
"name": "package2"
}
]
}
}
}
}
At the time of writing this requires both:
A valid token with the read:org and read:packages scopes
The Accept header for the preview API: application/vnd.github.packages-preview+json
Now, because you want to do this from the command line, you could use curl. There's already a good answer on how to use curl to access GitHub's GraphQL API: https://stackoverflow.com/a/42021388/1174076
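If a small script is acceptable instead of curl, the request could be put together roughly like this with only the standard library. This is a sketch: build_request and package_names are names invented here, the query is copied from above, and whether the preview schema is still served is not something this code checks.

```python
import json
import urllib.request

QUERY = """
query($login: String!) {
  organization(login: $login) {
    registryPackages(first: 10, packageType: MAVEN) {
      nodes { name }
    }
  }
}
"""


def build_request(token: str, login: str) -> urllib.request.Request:
    """Build (but do not send) the GraphQL POST request."""
    body = json.dumps({"query": QUERY, "variables": {"login": login}}).encode("utf-8")
    return urllib.request.Request(
        "https://api.github.com/graphql",
        data=body,
        headers={
            "Authorization": "bearer " + token,
            "Accept": "application/vnd.github.packages-preview+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def package_names(response: dict) -> list:
    """Pull the package names out of a parsed response like the one above."""
    nodes = response["data"]["organization"]["registryPackages"]["nodes"]
    return [node["name"] for node in nodes]
```

To actually run it you would pass the request to urllib.request.urlopen, parse the body with json.load, and write "\n".join(package_names(...)) to your txt file.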
Hope this helps.
Can anyone help me with an example for std.lines(arr) function of Jsonnet?
I am trying to create a bash script to clone multiple git repositories using values from an array. My array structure is given below.
gitRepo : [
{
github_repo: "github.com/abcd.git",
github_id: "tom",
github_access_token: "1aae0a6dc19aef327565"
},
{
github_repo: "github.com/qwerty.git",
github_id: "alice",
github_access_token: "2e2eef327565"
},
],
}
Thanks in advance...
Found a solution for this from jsonnet google groups.
local config = [
{
github_repo: 'github.com/abcd.git',
github_id: 'tom',
github_access_token: '1aae0a6dc19aef327565',
},
{
github_repo: 'github.com/qwerty.git',
github_id: 'alice',
github_access_token: '2e2eef327565',
},
];
std.lines([
'git clone %(github_repo)s --user=%(github_id)s --token=%(github_access_token)s' % item
for item in config
])
Test it with jsonnet -S test.jsonnet (note the capital -S flag).
https://groups.google.com/forum/#!searchin/jsonnet/array%7Csort:date/jsonnet/SGADdQQ-vBs/Tig8DnsRBQAJ
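For anyone unsure what std.lines produces: it joins the strings with a newline after each one. A rough Python imitation of the snippet above (the lines helper is written here just for illustration, and the template string is copied verbatim from the Jsonnet code):

```python
def lines(items):
    """Mimic Jsonnet's std.lines: every string is followed by a newline."""
    return "".join(item + "\n" for item in items)


config = [
    {
        "github_repo": "github.com/abcd.git",
        "github_id": "tom",
        "github_access_token": "1aae0a6dc19aef327565",
    },
    {
        "github_repo": "github.com/qwerty.git",
        "github_id": "alice",
        "github_access_token": "2e2eef327565",
    },
]

TEMPLATE = "git clone %(github_repo)s --user=%(github_id)s --token=%(github_access_token)s"

# Python's % operator accepts a dict for %(name)s placeholders, like Jsonnet's.
script = lines(TEMPLATE % item for item in config)
print(script, end="")
```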
I'm able to successfully create an annotated tag (git tag) with the request below, but I'm not able to programmatically delete it.
POST https://dev.azure.com/{organization}/{project}/_apis/git/repositories/{repositoryId}/annotatedtags?api-version=4.1-preview.1
Request:
{
"name": "wagner-test-3",
"message": "wagner-test-3",
"taggedObject": {
"objectId": "aaaaab6cad84a07b7bd65cf3519142a12f856baa"
}
}
According to the documentation there is no delete endpoint, so I've tried the delete ref endpoint but no luck so far. It only returns 400 (Invalid Request).
DELETE https://dev.azure.com/{organization}/{project}/_apis/git/favorites/refs/{favoriteId}?api-version=4.1-preview.1
Response:
{
"count": 1,
"value": {
"Message": "The request is invalid."
}
}
Thanks.
I was able to answer my own question. The way to delete an annotated tag is to update its ref with the Refs API, setting newObjectId to all zeros. This is not well documented though.
POST https://dev.azure.com/{organization}/{project}/_apis/git/repositories/{repositoryId}/refs?api-version=4.1
Request:
[
{
"name": "refs/tags/wagner-test-3",
"newObjectId": "0000000000000000000000000000000000000000",
"oldObjectId": "aaaaab6cad84a07b7bd65cf3519142a12f856baa"
}
]
Azure DevOps documentation:
Refs - Update Refs
Creating, updating, or deleting refs(branches).
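To avoid hand-typing the forty zeros, the request body can be generated. A minimal sketch (delete_tag_payload is a name made up for this example; the field names are the ones from the request above):

```python
import json

# An all-zero object ID as newObjectId is what marks the ref for deletion.
ZERO_ID = "0" * 40


def delete_tag_payload(tag_name, old_object_id):
    """Build the Update Refs body that deletes refs/tags/<tag_name>."""
    return [
        {
            "name": "refs/tags/" + tag_name,
            "oldObjectId": old_object_id,
            "newObjectId": ZERO_ID,
        }
    ]


body = json.dumps(delete_tag_payload("wagner-test-3", "aaaaab6cad84a07b7bd65cf3519142a12f856baa"))
```

The resulting body string is what gets POSTed to the refs URL shown above.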
The Firebase docs say this is how it is supposed to be done:
curl -X PATCH -d '{"last":"Jones"}' \
'https://[PROJECT_ID].firebaseio.com/users/jack/name/.json'
But I don't know how to convert this to a REST request.
To be clear, I need to send a web request from JavaScript/Java, so I want to know what the body, headers, and HTTP method for this request should be.
Can someone please help?
If you use the documentation for curl, you can figure out what that command line you showed is trying to tell you.
The HTTP method is: PATCH
The request body is: {"last":"Jones"}
The url is: https://[PROJECT_ID].firebaseio.com/users/jack/name/.json
Where PROJECT_ID is the name of your project. That's all there is to it.
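The same three pieces map directly onto any HTTP client. As an illustration in Python (standard library only; build_patch is a name invented here and "my-project" in the usage line is a placeholder), the request could be assembled like this, and the JavaScript or Java version follows the same shape:

```python
import json
import urllib.request


def build_patch(project_id, path, fields):
    """Build (but do not send) the PATCH request described above."""
    url = "https://{}.firebaseio.com/{}/.json".format(project_id, path.strip("/"))
    return urllib.request.Request(
        url,
        data=json.dumps(fields).encode("utf-8"),  # request body
        headers={"Content-Type": "application/json"},
        method="PATCH",  # HTTP method
    )


request = build_patch("my-project", "users/jack/name", {"last": "Jones"})
```

Passing request to urllib.request.urlopen would send it; any authentication your database rules require is left out of this sketch.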
You need the following structure:
HTTP Request:
https://firestore.googleapis.com/v1/projects/YOUR_PROJECT_ID/databases/(default)/documents/users_admin/DOCUMENT_ID?updateMask.fieldPaths=user_name&updateMask.fieldPaths=permisos.Administrador&updateMask.fieldPaths=user_email
JSON Body (must have exactly the same structure and types as your database):
{
"fields": {
"user_name": { "stringValue": "Test Actualización 2" },
"permisos": {
"mapValue": {
"fields": {
"Administrador": {
"booleanValue": true
}
}
}
},
"user_email": { "stringValue": "veviboj548#eyeremind.com" }
}
}
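The repeated updateMask.fieldPaths parameters are easy to get wrong by hand, so here is a rough sketch of generating the URL (firestore_patch_url is a name made up for this example; as far as I know the request is then sent as a PATCH with the JSON body above):

```python
from urllib.parse import urlencode


def firestore_patch_url(project_id, collection, document_id, field_paths):
    """Build the document URL with one updateMask.fieldPaths per field."""
    base = (
        "https://firestore.googleapis.com/v1/projects/{}"
        "/databases/(default)/documents/{}/{}"
    ).format(project_id, collection, document_id)
    # A list of pairs lets the same query key repeat once per field path.
    query = urlencode([("updateMask.fieldPaths", path) for path in field_paths])
    return base + "?" + query
```

Only the fields named in the mask are updated, so every field in the body should have a matching fieldPaths entry.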