How do I batch archive my repositories? I'd preferably want to be able to sort through them and figure out a way to not archive my active repositories.
I have hundreds of old GitHub repositories in my account from before the GitHub notifications feature existed, and now I get vulnerability notifications for all of them. Here's what my notifications look like, for projects that were last used maybe 6 years ago:
You can use the GitHub API along with two tools to achieve this. I'll be using:
Hub, but you can make direct API calls
jq, but you can use any JSON parser
Here's how:
Fetch a list of all the GitHub repositories in your account and save them to a file:
hub api --paginate users/amingilani/repos | jq -r '.[]."full_name"' > repos_names.txt
Go through that file manually and remove any repositories you don't want to archive.
Archive all the repositories in the file:
cat repos_names.txt | xargs -I {} -n 1 hub api -X PATCH -F archived=true /repos/{}
Note: since 2020:
gh repo list has been released (with gh 1.7.0 in commit 00cb921, Q1 2021): it does take pagination into account, since it works like an alias similar to:
set -e

repos() {
  local owner="${1?}"
  shift 1
  gh api graphql --paginate -f owner="$owner" "$@" -f query='
    query($owner: String!, $per_page: Int = 100, $endCursor: String) {
      repositoryOwner(login: $owner) {
        repositories(first: $per_page, after: $endCursor, ownerAffiliations: OWNER) {
          nodes {
            nameWithOwner
            description
            primaryLanguage { name }
            isFork
            pushedAt
          }
          pageInfo {
            hasNextPage
            endCursor
          }
        }
      }
    }
  ' | jq -r '.data.repositoryOwner.repositories.nodes[] | [.nameWithOwner,.pushedAt,.description,.primaryLanguage.name,.isFork] | @tsv' | sort
}

repos "$@"
gh repo list --no-archived can limit the list to your not-yet-archived repositories
gh repo archive can then, for each element of that list, archive the GitHub repository (see the combined sketch below).
wolfram77 also proposes in the comments:
gh repo list <org> | awk '{NF=1}1' | \
while read in; do gh repo archive -y "$in"; done
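Putting those two commands together, a minimal sketch (assuming gh is authenticated; <owner> is a placeholder for your user or organization):
gh repo list <owner> --no-archived --json nameWithOwner --jq '.[].nameWithOwner' \
  | while read -r repo; do gh repo archive -y "$repo"; done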
Using only gh.
gh repo list --no-archived --limit 144 --visibility public --source --json nameWithOwner --jq ".[].nameWithOwner" > repos_names.txt
Set --limit to the number of repositories you have.
Use vim to delete the lines you don't want to archive by pressing dd on each line:
vim repos_names.txt
Run the following command to archive them:
cat repos_names.txt | while read in; do gh repo archive -y "$in"; done
Clean up afterwards:
rm repos_names.txt
Related
How do I use the GitHub CLI to automatically pull all newly created or updated repos to my local PC?
I think I need to listen for new repo creation/updates and pull the repos. How can I do this with the CLI?
If I can't listen, I need to pull the latest 100 repos to my local machine. How can I do that?
I tried https://api.github.com/users/xxxx/repos?per_page=100. It returns them in alphabetical order.
I use the following code:
#!/bin/sh
cat repolist.txt | while read line
do
    REPOSRC=$line
    LOCALREPO=$line
    # We do it this way so that we can abstract it from just git later on
    LOCALREPO_VC_DIR=$LOCALREPO/.git
    if [ ! -d "$LOCALREPO_VC_DIR" ]
    then
        cd ~/xxxx
        gh repo clone "$REPOSRC"
        cd ~
    else
        cd ~
        gh repo sync "$REPOSRC" -s "$REPOSRC"
    fi
done
# End
The sort key you're looking for seems to be sort=pushed.
Try curl -s 'https://api.github.com/users/xxxx/repos?sort=pushed&per_page=100' | jq '.[].name' to verify.
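To actually pull the 100 most recently pushed repos to your machine, here is a rough sketch building on that sort key (assuming gh and jq are installed, and reusing ~/xxxx from your script as the clone directory):
curl -s 'https://api.github.com/users/xxxx/repos?sort=pushed&per_page=100' \
  | jq -r '.[].full_name' \
  | while read -r repo; do
      name=${repo##*/}
      if [ -d ~/xxxx/"$name"/.git ]; then
        # already cloned: just pull the latest changes
        (cd ~/xxxx/"$name" && git pull --ff-only)
      else
        # not cloned yet: clone it into ~/xxxx
        gh repo clone "$repo" ~/xxxx/"$name"
      fi
    done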
From the answer to a related question I know it's possible to batch clone repositories based on a GitHub search result:
# cheating knowing we currently have 9 pages
for i in {1..9}
do
curl "https://api.github.com/search/repositories?q=blazor+language:C%23&per_page=100&page=$i" \
| jq -r '.items[].ssh_url' >> urls.txt
done
cat urls.txt | xargs -P8 -L1 git clone
I also know that the Hub client allows me to make API calls.
hub api [-it] [-X METHOD] [-H HEADER] [--cache TTL] ENDPOINT [-F FIELD|--input FILE]
I guess the last step is, how do I archive a repository with Hub?
You can update a repository using the Update a Repository API call.
I put all my repositories in a TMP variable and ran the following:
echo $TMP | xargs -P8 -L1 hub api -X PATCH -F archived=true
Here is a sample of what the $TMP variable looked like:
echo $TMP
/repos/amingilani/9bot
/repos/amingilani/advent-of-code-2019
/repos/amingilani/alan
/repos/amingilani/annotate_models
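For reference, one way the $TMP list could have been built, reusing the hub + jq approach from earlier (a guess, not necessarily how it was done originally):
TMP=$(hub api --paginate users/amingilani/repos | jq -r '.[].full_name | "/repos/" + .')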
I'm able to get the file contents (and if it's a folder, I'm able to get the list of files) by using the GitHub v3 API.
Example:
https://api.github.com/repos/[Owner]/[Repository]/contents/[Folder]
But how can I know when the file was last updated? Is there an API for that?
If you know the exact file path, you can use the list commits on a repository API, specifying a path parameter that restricts results to commits touching that specific file path, and then extract the most recent commit (the most recent one is first):
Using the REST API v3
https://api.github.com/repos/bertrandmartel/speed-test-lib/commits?path=jspeedtest%2Fbuild.gradle&page=1&per_page=1
Using curl & jq:
curl -s "https://api.github.com/repos/bertrandmartel/speed-test-lib/commits?path=jspeedtest%2Fbuild.gradle&page=1&per_page=1" | \
jq -r '.[0].commit.committer.date'
Using the GraphQL API v4
{
  repository(owner: "bertrandmartel", name: "speed-test-lib") {
    ref(qualifiedName: "refs/heads/master") {
      target {
        ... on Commit {
          history(first: 1, path: "jspeedtest/build.gradle") {
            edges {
              node {
                committedDate
              }
            }
          }
        }
      }
    }
  }
}
Try it in the explorer
Using curl & jq:
curl -s -H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type:application/json" \
-d '{
"query": "{ repository(owner: \"bertrandmartel\", name: \"speed-test-lib\") { ref(qualifiedName: \"refs/heads/master\") { target { ... on Commit { history(first: 1, path: \"jspeedtest/build.gradle\") { edges { node { committedDate } } } } } } } }"
}' https://api.github.com/graphql | \
jq -r '.data.repository.ref.target.history.edges[0].node.committedDate'
Using Python
pip install PyGithub
from github import Github
g = Github()
repo = g.get_repo("datasets/population")
print(repo.name)
commits = repo.get_commits(path='data/population.csv')
print(commits.totalCount)
if commits.totalCount:
    print(commits[0].commit.committer.date)
Output:
population
5
2020-04-14 15:09:26
https://github.com/PyGithub/PyGithub
That would be surprising, considering git does not store file timestamps (and other metadata like permissions and ownership), for reasons I detailed here.
So that information is not present on the remote repository side (here GitHub) either.
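The closest approximation is the date of the last commit that touched the path, which you can get locally from a clone, for example:
git log -1 --format=%cI -- jspeedtest/build.gradle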
You can actually determine what you want using the request you specifically mentioned.
Note that all dates/times below are under GMT (of course).
Copy & paste the following command to find the last modified date and time of Folder/File in repository ForStackExchange by user YenForYang:
\curl -sIA. --ignore-content-length \
-H"If-Modified-Since: Sun May 01 00:00:00 9999" \
"https://api.github.com/repos/YenForYang/ForStackExchange/contents/Folder/File?ref=branch" \
| \grep -m1 -oP "(?<=Last-Modified: )[ADFJMNOSTWa-eghilnoprtuvy0-9:, ]{25}"
(If Perl regex isn't available, you can ... | grep -F -m1 "Last-Modified:")
The above command should return (GMT): Thu, 27 Dec 2018 11:01:26
(or later, if I update the file for some reason)
Note that if the ref parameter is unspecified, ref=master.
And if you can't copy and paste, and don't care about the API rate limits, you might opt for the shorter:
\curl -sIL "api.github.com/repos/yenforyang/forstackexchange/contents/Folder/File?ref=branch" | \grep "^Las"
And if you don't have grep on Windows, just use find "Last-Modified: " instead (double quotes are necessary).
And if you don't have curl on Windows (download it... or), use PowerShell:
(iwr -me HEAD -usebasic "https://api.github.com/repos/yenforyang/forstackexchange/contents/Folder/File?ref=branch").Headers."Last-Modified"
When you run a public repository with GitHub Pages and want to let your visitors know the history of a page, you can simply put a hyperlink to https://github.com/<user>/<user>.github.io/commits/main/<path-to-file> on that page.
Looking for a particular command or Python script to download all repositories or sub-branches of a particular organization from GitHub at once.
This gist (or this one) lets you list and clone all repos from an organization:
curl -s https://api.github.com/orgs/twitter/repos?per_page=200 | ruby -rubygems -e 'require "json"; JSON.load(STDIN.read).each { |repo| %x[git clone #{repo["ssh_url"]} ]}'
The same exists in Python with the project muhasturk/gitim.
It isn't hard to curl the zip archive of a repo instead of cloning it:
curl -u '<git username>' -L -o master.zip https://github.com/<organization>/<reponame>/zipball/master
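If you prefer the official gh CLI (see the answers below), a rough equivalent for cloning every repository of an organization might be (<organization> is a placeholder; adjust --limit to the number of repos):
gh repo list <organization> --limit 1000 --json nameWithOwner --jq '.[].nameWithOwner' \
  | while read -r repo; do gh repo clone "$repo"; done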
Here's my query to the GitHub API
curl -i -u {user} https://api.github.com/orgs/{org}/repos?type=all
But this does not list all repos for this organization that I have access to. Specifically, it does not list repos in the organization that are part of a team that I am a member of.
If I were to query
curl -i -u {user} https://api.github.com/teams/{teamid}/repos
I would see the missing repos. If I were to navigate to github.com, I would see both private organization repos and my team repos next to each other on the same page. Is there a way to get all of these repos in the same API query?
You can use the command below:
gh repo list {organization-name}
Before that, log in with:
gh auth login
github.com/cli/cli
I apologize. It was listing all my repos...on subsequent pages. Learning about "page=" and "per_page=" was all I needed, and now I can see all of the repos I need.
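For example, a paginated query might look like this (the API caps per_page at 100):
curl -i -u {user} "https://api.github.com/orgs/{org}/repos?type=all&per_page=100&page=2"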
To add to the original answer, the command below can be used if there are many repositories and you want to fetch a specific number of repos. Without the -L flag it returns 30 items by default.
gh repo list <org> -L <#>
If you have gh, the GitHub CLI (https://cli.github.com/), and jq (https://stedolan.github.io/jq/), you can do this:
gh repo list $ORG -L $COUNT --json name | jq '.[].name' | tr -d '"'
where $ORG is your organization name and $COUNT is the max number of repos returned
Download gh, the official GitHub CLI (https://cli.github.com/):
gh repo list $ORG -L $COUNT --json name --jq '.[].name'
Set $ORG to your organization name, and $COUNT to the number of repos you want to list. (Set $COUNT to the number of repos in the organization if you want to list them all.)
curl -i -u "username":"password" https://your_git_url.com/organization_name | grep "codeRepository" | awk -F '"' '{print $6}'
Have you tried setting the "per_page" attribute to "0"? I have seen some APIs use a default value of, for example, 20, but if you actively set it to 0, like ?per_page=0, you get all pages.