Is there a way to bulk/batch download all repos from GitHub based on a search result?

I ran this search on GitHub and got 881 repos (Blazor and C#).
https://github.com/search?l=C%23&q=blazor&type=Repositories
Is there a way to download all these repos easily instead of one by one?

Yes, your query can be run via the GitHub search API:
https://api.github.com/search/repositories?q=blazor+language:C%23&per_page=100&page=1
That gives you one page of 100 repositories. You can loop over all pages, extract the ssh_url (or clone_url if you prefer HTTPS), and write the result to a file:
# cheating: we know there are currently 9 pages
for i in {1..9}
do
curl -s "https://api.github.com/search/repositories?q=blazor+language:C%23&per_page=100&page=$i" \
| jq -r '.items[].ssh_url' >> urls.txt
done
# clone 8 repositories in parallel, one URL per git clone call
xargs -P8 -L1 git clone < urls.txt
You can optimize this by extracting the number of pages from the Link response header.
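As a rough sketch of that optimization: GitHub's paginated responses carry a Link header whose rel="last" entry contains the final page number. The helper name last_page_from_headers is made up for illustration, and the parsing assumes the header layout GitHub currently sends.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# page number of the rel="last" link, or 1 when no such link is present.
last_page_from_headers() {
  local last
  last=$(grep -i '^link:' \
    | tr ',' '\n' \
    | grep 'rel="last"' \
    | sed -E 's/.*[?&]page=([0-9]+).*/\1/') || true
  echo "${last:-1}"
}

# Usage (needs network access, so it is left commented out):
# url='https://api.github.com/search/repositories?q=blazor+language:C%23&per_page=100'
# pages=$(curl -s -D - -o /dev/null "$url" | last_page_from_headers)
# for i in $(seq 1 "$pages"); do
#   curl -s "$url&page=$i" | jq -r '.items[].ssh_url' >> urls.txt
# done
```

The header is split on commas before matching, so the "next" and "last" entries are examined separately.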
References:
https://developer.github.com/v3/search/
Parsing JSON with Unix tools
How to apply shell command to each line of a command output?
Running programs in parallel using xargs
Similar question:
GitHub API - Different number of results for jq filtered response

Related

Delete a workflow from GitHub Actions

I created a couple of workflows in the .github/workflows folder of my repository to experiment with GitHub Actions. I have since learned quite a bit and deleted said "experimental" workflows from my repo. After deleting the "experimental" workflow yaml files and committing the deletions, when I go to the Actions tab of my repository I STILL see the workflows that I have since deleted.
I see no option to delete and start from scratch?! Is this not possible? Is it maybe possible through GitHub API? Hmm.
As of July 7, 2020, you can now delete the results of individual workflow runs. To do this, navigate to your workflow, find the workflow run that you want to delete, and select the "..." menu. In this menu, select "Delete workflow run".
The workflow run and its logs will be removed.
Currently, you must do this for each workflow run individually.
Edit: As of February 2021, it seems that after all workflow runs are deleted, the workflow itself disappears. One comment below also seems to confirm this.
It doesn't seem that there is currently a way to delete those workflows - this makes no sense - but it appears that once one makes the mistake of creating one they are stuck with it forever. The only solution so far I found is to disable these workflows.
So if I go to the Actions tab (edit the url to match your repo), I can then click on a workflow and disable it via [...] in the top-right corner of that tab.
To delete all workflow results at once
To delete the records, here is the solution I found, with slight modifications from the original:
user=GH_USERNAME repo=REPO_NAME; gh api repos/$user/$repo/actions/runs \
--paginate -q '.workflow_runs[] | select(.head_branch != "master") | "\(.id)"' | \
xargs -I % gh api repos/$user/$repo/actions/runs/% -X DELETE
Replace GH_USERNAME and REPO_NAME with the desired github username and repo name correspondingly.
This will delete all the old workflow runs that aren't on the master branch. You can further tweak this to do what you need.
Prerequisites:
You will find the latest gh version here.
Notes:
You may have to gh auth login if this is your first time using it
You may further change the command to gh api --silent if you prefer not to see the verbose output.
For the final xargs part of the command chain: the original used -J instead of -I, which is not supported by GNU xargs. -J results in a single command, whereas -I executes the command once per record, so it's a bit slower.
Thank you to the OP on the community forum for sharing this in first place.
Here are a few commands to quickly clean up your workflows.
You'll need the xargs, gh and jq CLI tools.
Depending on how many runs you have you'll have to execute the delete step multiple times because the GH API endpoints are paginated.
OWNER=<your user/org name>
REPO=<repo name>
# list workflows
gh api -X GET /repos/$OWNER/$REPO/actions/workflows | jq '.workflows[] | .name,.id'
# copy the ID of the workflow you want to clear and set it
WORKFLOW_ID=<workflow id>
# list runs
gh api -X GET /repos/$OWNER/$REPO/actions/workflows/$WORKFLOW_ID/runs | jq '.workflow_runs[] | .id'
# delete runs (you'll have to run this multiple times if there's many because of pagination)
gh api -X GET /repos/$OWNER/$REPO/actions/workflows/$WORKFLOW_ID/runs | jq '.workflow_runs[] | .id' | xargs -I{} gh api -X DELETE /repos/$OWNER/$REPO/actions/runs/{}
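Instead of re-running the delete step by hand, the repetition can be wrapped in a loop that stops once the listing comes back empty. This is only a sketch: the function name delete_all_runs is invented here, and it assumes an authenticated gh whose api subcommand supports the --jq and --silent flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keeps listing and deleting the runs of one workflow
# until the listing is empty. Arguments: owner, repo, workflow id.
delete_all_runs() {
  local owner=$1 repo=$2 workflow_id=$3 ids id
  while :; do
    # Fetch up to 100 run IDs (the API's maximum page size).
    ids=$(gh api "repos/$owner/$repo/actions/workflows/$workflow_id/runs?per_page=100" \
      --jq '.workflow_runs[].id') || return 1
    if [ -z "$ids" ]; then
      break
    fi
    for id in $ids; do
      # Abort on the first failed delete so the loop cannot spin forever.
      gh api -X DELETE "repos/$owner/$repo/actions/runs/$id" --silent || return 1
    done
  done
}

# Usage: delete_all_runs "$OWNER" "$REPO" "$WORKFLOW_ID"
```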
Based on @Giampaolo Rodolà's answer (which worked for me), I created this simple shell script that does the job.
Disable the workflow you want to delete (via Github console) before executing this script.
org=<your org>
repo=<your repo>
# Get workflow IDs with status "disabled_manually"
workflow_ids=($(gh api repos/$org/$repo/actions/workflows | jq '.workflows[] | select(.["state"] | contains("disabled_manually")) | .id'))
for workflow_id in "${workflow_ids[@]}"
do
echo "Listing runs for the workflow ID $workflow_id"
run_ids=( $(gh api repos/$org/$repo/actions/workflows/$workflow_id/runs --paginate | jq '.workflow_runs[].id') )
for run_id in "${run_ids[@]}"
do
echo "Deleting Run ID $run_id"
gh api repos/$org/$repo/actions/runs/$run_id -X DELETE >/dev/null
done
done
Outcome:
Listing runs for the workflow ID 5261185
Deleting Run ID 507553125
Deleting Run ID 507548002
Listing runs for the workflow ID 5261568
Deleting Run ID 525274011
Deleting Run ID 525264327
Deleting Run ID 525247443
Make sure you have the GitHub CLI installed and the required token permissions in GitHub.
I managed to fix this (currently not possible via UI) by using "gh" CLI tool and reading REST API docs.
First, get all your workflows (these are the ones shown in the web UI -> Actions -> left column):
$ gh api repos/$YOUR_USER/$YOUR_REPO/actions/workflows
{
"total_count": 2,
"workflows": [
{
"id": 3611607,
"name": "FreeBSD",
...
},
{
"id": 4336318,
"name": "MacOS",
...
}
]
}
Use the ID of the workflow you want to delete (say 3611607) to get all of its individual runs:
$ gh api repos/$YOUR_USER/$YOUR_REPO/actions/workflows/3611607/runs
{
"total_count": 17,
"workflow_runs": [
{
"id": 363876785,
"name": "FreeBSD",
...
},
{
"id": 363876786,
"name": "FreeBSD",
...
},
{
"id": 363876787,
"name": "FreeBSD",
...
}
]
}
For each run id (let's say 363876785), delete it with:
$ gh api repos/$YOUR_USER/$YOUR_REPO/actions/runs/363876785 -X DELETE
After this, the undeletable Action in the left column of the web UI should disappear.
Delete all runs from a certain workflow
An improved version of @Sheece Gardazi's answer that supports selecting a certain workflow:
export OWNER="my-user"
export REPOSITORY="my-repo"
export WORKFLOW="My Workflow"
gh api -X GET /repos/$OWNER/$REPOSITORY/actions/runs --paginate \
| jq --arg name "$WORKFLOW" '.workflow_runs[] | select(.name == $name) | .id' \
| xargs -t -I{} gh api -X DELETE /repos/$OWNER/$REPOSITORY/actions/runs/{}
(The -t option to xargs prints each workflow run deletion request to the terminal.)
It requires GitHub CLI:
brew install gh
gh auth login
and jq:
brew install jq
Delete all runs belonging to your workflow and the workflow will be gone.
P.S.: if you have thousands of runs to delete, using the API is a good way to go: https://docs.github.com/en/rest/reference/actions#workflow-runs
Until GitHub implements a "Delete all workflow runs", you have to rely on the API. With the CLI tools gh and jq installed on your workstation, you can run the following commands to delete all runs of that workflow. Once all runs are removed, it won't show up anymore in the UI.
cd /path/to/your/repo
gh workflow list # Pick-up the workflow ID for which you want to delete all runs
WORKFLOW_ID=<the workflow id> # Change this line!
# List last 10 runs of the workflow you picked to double check the id
gh run list -L 10 -w $WORKFLOW_ID
# Some set up
REPO_INFO=$(gh repo view --json name,owner)
REPO_FULL_NAME="$(echo $REPO_INFO | jq '.owner.login' -r)/$(echo $REPO_INFO | jq '.name' -r)"
# Ready? Let's delete some runs!
gh api -X GET "/repos/$REPO_FULL_NAME/actions/workflows/$WORKFLOW_ID/runs?per_page=100" | jq '.workflow_runs[] | .id' -r | xargs -t -I{} gh api --silent -X DELETE /repos/$REPO_FULL_NAME/actions/runs/{}
The last command will delete the last 100 runs (limit from GitHub API). If you have more, run it multiple times to delete all.
According to the GitHub Actions documentation: https://docs.github.com/en/actions/managing-workflow-runs/deleting-a-workflow-run
It should be easy to delete a workflow that you no longer need.
If you don't see the delete option but only the disable workflow option, it's because that workflow still has some workflow runs.
You need to delete those workflow runs, and then the delete option will appear :)
And a PowerShell implementation (thanks to the other respondents),
which also requires the gh cli.
$user = "your user"
$repo = "repo"
(gh api repos/$user/$repo/actions/runs | ConvertFrom-Json).workflow_runs |
%{ $_.id } |
%{ gh api repos/$user/$repo/actions/runs/$_ -X DELETE }
Re-run the "one-liner" until you have no more; it currently pages to 30 results.
I had 600+ workflow runs that I wanted deleted, so there were multiple pages. I had to run the command in a for loop:
# install following packages
sudo snap install jq gh
# To authenticate github cli
gh auth login
# for reference path to your code repository: https://github.com/$OWNER/$REPOSITORY
export OWNER=<OWNER_NAME/ORGANIZATIONS_NAME>
export REPOSITORY=<REPOSITORY_NAME>
# repeat command 30 times, if there are 30 pages of workflow history
for i in {1..30}; do gh api -X GET /repos/$OWNER/$REPOSITORY/actions/runs | jq '.workflow_runs[] | .id' | xargs -I{} gh api -X DELETE /repos/$OWNER/$REPOSITORY/actions/runs/{}; done
I wasn't able to delete the workflow in spite of all the answers in this post. What worked for me was to first authenticate using "gh auth login" and then use the command below to get the details of the workflow I wanted to delete. If you do not know the workflow ID, just run "gh api repos/$org/$repo/actions/workflows/" to see all the workflows. The output tells you the branch from which the workflow file needs to be deleted; you can see it in the "html_url". In our case, the workflow existed in the "develop" branch. Once I deleted the workflow file from the develop branch, the workflow vanished from everywhere. Also, when you run "gh api repos/$org/$repo/actions/workflows/$workflow_id" again afterwards, you will notice that the state has changed to "deleted".
$> gh api repos/$org/$repo/actions/workflows/$workflow_id
{
"id": 6,
"node_id": "blah",
"name": "Run Unit Tests",
"path": ".github/workflows/unittests.yml",
"state": "active",
"created_at": "2021-05-15T00:25:19.000-07:00",
"updated_at": "2022-03-10T13:02:43.000-08:00",
"url": "blah",
"html_url": "https://company-name/workspace/project/blob/develop/.github/workflows/unittests.yml",
"badge_url": "blah"
}
For anyone wondering, deleting the workflow.yml files within .github/workflows works BUT you need to make sure it is deleted in all branches. If master/main still has the workflow files then GitHub will hold onto them.
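To check which branches still carry a workflow file, something along these lines could help. It is a sketch run inside a local clone after a git fetch; the function name branches_with_file and the default ref prefix refs/remotes/origin are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every ref under the given prefix whose tree
# still contains the given path.
# Arguments: path, optional ref prefix (default: refs/remotes/origin).
branches_with_file() {
  local path=$1 prefix=${2:-refs/remotes/origin} ref
  git for-each-ref --format='%(refname)' "$prefix" | while read -r ref; do
    case $ref in
      */HEAD) continue ;;  # skip the symbolic origin/HEAD entry
    esac
    # cat-file -e exits 0 only if the object <ref>:<path> exists.
    if git cat-file -e "$ref:$path" 2>/dev/null; then
      echo "$ref"
    fi
  done
}

# Usage: branches_with_file .github/workflows/test.yml
```

An empty result suggests the file is gone from every fetched branch.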
It should be automatically be removed once you remove all related workflow runs.
Deleting the workflows runs via the CLI was only part of the solution in my case.
GitHub still refused to show any workflows I tried to add again afterwards.
I solved it by using the "New workflow" button in GH and to create a workflow from template.
I pasted the content of my original YML file and renamed the file so that everything looked like before.
Lastly, I committed via web - and GitHub showed again my workflow.
Here is another option to delete all logs from a Github actions workflow automatically, using Ritchie CLI.
All you need to do is:
run rit github delete workflow-logs
inform your github username and token
inform the repo owner and name
select the repo workflow to clean
An example can be seen here. Note that you need to import this repository using the Ritchie CLI tool for the command to work.
Install the Ritchie CLI: https://docs.ritchiecli.io/
Run rit add repo
? Select your provider: Github
? Repository name: formulas-github
? Repository URL: https://github.com/GuillaumeFalourd/formulas-github
? Is a private repository? no
? Select a tag version: {latest}
? Set the priority: 1
Run rit set formula-runner
? Select a default formula run type: docker
The default formula runner has been successfully configured!
Start Docker Desktop
Go to https://github.com/settings/tokens and generate a GitHub Personal Access Token (use the "workflow" scope)
Run rit github delete workflow-logs
Enter your GitHub username, GitHub token, GitHub owner, GitHub repository name and select the workflow for which you want the runs to be deleted 🙌
Revoke your PAT
I created this command line tool to conveniently select multiple entries and delete them together from a navigable list:
https://github.com/jv-k/delete-workflow-runs
I made some minor adjustments to @david-miguel's answer which offloads the filtering to GitHub's API instead of doing it locally.
#!/bin/sh
OWNER="user-name|org-name"
REPOSITORY="repository-name"
WORKFLOW="workflow-human-friendly-name"
gh run list --repo "$OWNER/$REPOSITORY" -w "$WORKFLOW" --json databaseId \
| jq '.[].databaseId' \
| xargs -I{} gh api -X DELETE /repos/$OWNER/$REPOSITORY/actions/runs/{}
I did find a way of doing this. You can go to .github/workflows (or wherever your workflow file is located) and commit the deletion of the workflow file, which will finally delete it.
If you want to delete multiple workflow runs, use the GitHub Actions API to get the IDs of the runs you want to delete, then send a DELETE request with a header containing a personal access token for each run.
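A minimal sketch of such a DELETE request with curl follows. The function name delete_run and the GITHUB_TOKEN variable are assumptions for illustration; the endpoint path is the same one the gh examples elsewhere in this thread use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: deletes one workflow run through the REST API.
# Arguments: owner, repo, run id. Reads the token from GITHUB_TOKEN.
delete_run() {
  local owner=$1 repo=$2 run_id=$3
  # Stop early with a message if no token has been provided.
  : "${GITHUB_TOKEN:?set GITHUB_TOKEN to a personal access token}"
  curl -fsS -X DELETE \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    "https://api.github.com/repos/$owner/$repo/actions/runs/$run_id"
}

# Usage: GITHUB_TOKEN=... delete_run my-user my-repo 363876785
```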
Try this python script to delete all workflow runs.
https://github.com/wh201906/garage/blob/master/Python/DeleteGithubAction/delete.py
It uses grequests to send multiple requests at once, and it doesn't require gh or jq.
For people who deleted the workflow file and can't delete the workflow runs (404)
After finding this thread, I still spent quite some time making my workflow runs go away...
The answer provided by @ribeiro is correct; however, people who have DELETED their workflows won't be able to delete the runs, because the request returns a 404!
How to delete workflow runs whose workflows no longer exist
I've tried many things, in the end, only this worked for me:
For each workflow you deleted, create a yml file with the same name as the previous one (both the filename and the name property in the workflow must match the previous one). This way GitHub will see them as 'existing'.
Create a .sh file with the following content (I use the answer provided by Ribeiro, but without the state filter because I want to delete them all):
OWNER="aOwner"
REPO="aRepo"
workflow_ids=($(gh api repos/$OWNER/$REPO/actions/workflows | jq '.workflows[] | .id'))
for workflow_id in "${workflow_ids[@]}"
do
echo "Listing runs for the workflow ID $workflow_id"
run_ids=( $(gh api repos/$OWNER/$REPO/actions/workflows/$workflow_id/runs --paginate | jq '.workflow_runs[].id') )
for run_id in "${run_ids[@]}"
do
echo "Deleting Run ID $run_id"
gh api repos/$OWNER/$REPO/actions/runs/$run_id -X DELETE >/dev/null
done
done
Run the .sh file. You should see the workflow runs being deleted (it does take some time, though).
Update 1/3/2023:
I found something new, if I add test.yml back, the history will be back.
To "truly" delete the workflow history, please refer to other answers.
As of Dec 31, 2022, assume you have a workflow .github/workflows/test.yml.
Once you delete this workflow file and merge into main branch, the workflow on the side menu plus all workflow run history will be gone.
You can verify by manually go to the old workflow link https://github.com/[username]/[repo]/actions/workflows/test.yml
This is the current result.
Here is an answer that just uses the GitHub CLI (you need to install and authenticate re: https://github.com/cli/cli#installation) and awk:
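OWNER="repoOwner"
REPO="repoName"
WORKFLOW="workflowName"
# list the runs of one workflow, keep the IDs of the failed ones, delete each
gh run list --repo "$OWNER/$REPO" --workflow "$WORKFLOW" | \
awk -F '\t' '{ if ($2 == "failure") { print $7 } }' | \
while read -r id; do gh api "repos/$OWNER/$REPO/actions/runs/$id" -X DELETE; done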
Remove the workflow file from the default branch.
Delete the workflow runs:
owner=hello-seer
repo=stylelab
workflow=lint.yml
gh api repos/"$owner"/"$repo"/actions/workflows/"$workflow"/runs --paginate -q '.workflow_runs[] | .id' | while read -r id; do
echo "$id"
gh api -X DELETE repos/"$owner"/"$repo"/actions/runs/"$id"
done
Update your local branch to sync with master, then delete the .github/workflows folder.
Commit and push your changes.
The workflow should then be deleted in master.
Your workflows are *.yml files saved in your repo in the folder /.github/workflows/
Just delete them!
I tried deleting yml file from this location .github/workflows/ and it worked like a charm.
you can delete the file from the Code tab in github as it was added as a normal commit
click on the file and then click delete icon:

Using the Github API is it possible to determine if a branch is ahead of the default branch?

Using the Github API (no local git commands), is it possible to compare a branch to see if it has any changes ahead of the default branch?
I'm building an auditing tool, and would like to identify branches that are candidates to be closed, because all of their changes exist in the default branch.
I want the same information that drives the charts on the branches page:
(See https://github.com/octokit/octokit.rb/branches)
Is it possible to do get this information purely with the Github API?
You can:
get the default branch using https://api.github.com/repos/octokit/octokit.rb
compare the specified branch to the default branch using compare two commits API and extract ahead_by & behind_by fields.
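In that case it would be the URL below. Note that the endpoint takes base...head, so with the branch as base, behind_by is the number of commits the branch has that the default branch lacks: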
https://api.github.com/repos/octokit/octokit.rb/compare/kytrinyx/generator/spike...master
Example using bash, curl & jq :
branch=kytrinyx/generator/spike
default_branch=$(curl -s "https://api.github.com/repos/octokit/octokit.rb" | jq -r '.default_branch')
curl -s "https://api.github.com/repos/octokit/octokit.rb/compare/$branch...$default_branch" | \
jq -r '.ahead_by, .behind_by'
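For the auditing use case, the two counts can be turned into a simple verdict. The sketch below assumes the comparison is requested as default...branch (the reverse of the URL above), so that ahead_by counts commits that exist only on the branch. The function name branch_status and its labels are invented for illustration, and squash-merged branches would still be reported as ahead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classifies a branch from the ahead_by and behind_by
# values of GET /repos/{owner}/{repo}/compare/{default}...{branch}.
branch_status() {
  local ahead=$1 behind=$2
  if [ "$ahead" -eq 0 ]; then
    echo "merged"    # no commits on the branch that the default branch lacks
  elif [ "$behind" -eq 0 ]; then
    echo "ahead"     # only new commits on top of the default branch
  else
    echo "diverged"  # both sides have commits the other lacks
  fi
}

# Usage (needs network access and jq, so it is left commented out):
# counts=$(curl -s "https://api.github.com/repos/octokit/octokit.rb/compare/$default_branch...$branch" \
#   | jq -r '"\(.ahead_by) \(.behind_by)"')
# branch_status $counts
```

Branches reported as "merged" are the candidates for closing.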

GitHub URL for latest release of the _download file_?

Although this question is similar to GitHub latest release, it's actually different -- it's about a link that means "the latest version of the download file itself".
GitHub provides a "Latest" URL that redirects to the information page for the latest release. For example: https://github.com/reactiveui/ReactiveUI/releases/latest will redirect to https://github.com/reactiveui/ReactiveUI/releases/tag/5.99.6 (as I type this; or to the page for a newer version, someday).
That's great but I need a URL to the download file itself. In this example, the .zip file associated with the green download button, https://github.com/reactiveui/ReactiveUI/releases/download/5.99.6/ReactiveUI-5.99.6.zip (as I type this; or to a newer zip file, someday).
Why? I want to give the URL to curl, as part of a Travis CI script, to download the latest version.
I guessed at a few URLs like /releases/download/latest/file.zip (substituting "latest" for the version part) and /releases/download/file.zip but those 404.
Is there any way to do this -- in the context of a shell script and curl (note: not in a browser page with JS)?
For releases that do not contain the version number or other variable content in their assets' names, you can use a URL of the format:
https://github.com/owner/repository/releases/latest/download/ASSET.ext
As per the docs:
If you'd like to link directly to a download of your latest release asset you can link to /owner/name/releases/latest/download/asset-name.zip.
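In a shell script the URL format above could be used roughly like this. The function name latest_asset_url is made up, and the asset name must be identical across releases for the link to resolve.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the "latest release asset" URL.
# Arguments: owner, repository, asset file name.
latest_asset_url() {
  printf 'https://github.com/%s/%s/releases/latest/download/%s\n' "$1" "$2" "$3"
}

# Usage (-L follows the redirect to the versioned asset, -O keeps the name):
# curl -fLO "$(latest_asset_url owner repository ASSET.ext)"
```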
Here is a way to do it w/o Github if you have a single download in the release:
wget $(curl -s https://api.github.com/repos/USERNAME/REPONAME/releases/latest | grep 'browser_' | cut -d\" -f4)
It is pretty easy (though not pretty), and of course you can swap out wget for another curl call if you want to pipe it to something.
Basically, the curl call nets you a JSON structure, and I'm just using basic shell utilities to extract the URL to the download.
Very interesting, I hadn't noticed a "latest" tag in GitHub releases yet. As I have now figured out, it is assigned when you use the "pre-release" capabilities of GitHub's release system. But I don't know of any way to access binaries via a latest path.
I would like to suggest you using git (which is available in your travis-vm) to download the latest tag.
Like Julien Renault describes in his blog post, you will be able to checkout the latest tag in the repository like this:
# this step should be optional
git fetch --tags
latestTag=$(git describe --tags `git rev-list --tags --max-count=1`)
git checkout $latestTag
This solution is based on the assumption that the latest tag is also the latest version.
I use this to get the download URLs in PowerShell 5+ (replace ACCOUNT & REPO)
Invoke-RestMethod -uri https://api.github.com/repos/ACCOUNT/REPO/releases/latest | select -ExpandProperty assets | select -expand browser_download_url
Note if they have more than one package this will be a list. If you want to pick a certain one find a unique part of the name i.e. win for Windows and use:
(replace ACCOUNT, REPO & SELECTOR)
Invoke-RestMethod -uri https://api.github.com/repos/ACCOUNT/REPO/releases/latest | select -ExpandProperty assets | ? { $_.name.Contains("SELECTOR")} | select -expand browser_download_url
As a bonus if you assign the above to a variable you can then grab the file and extract it with the following (assuming you assign to $uri):
Invoke-WebRequest $uri -OutFile "release.zip"
Expand-Archive .\release.zip
In PowerShell 6+ this should work on other platforms than Windows.
On windows, only using powershell, this works for me. It can probably be written a lot shorter.
#Downloads latest paket.bootstrapper.exe from github
$urlbase = "https://github.com"
$latestPage="$urlbase/fsprojects/Paket/releases/latest"
Write-Host "Parsing latest release page: $latestPage"
$page=Invoke-Webrequest -uri $latestPage
$latestBootStrapper=($page.Links | Where-Object { $_.href -match "bootstrapper" }).href
$dlurl="$urlbase$latestBootStrapper"
Write-Host "Downloading paket.bootstrapper.exe from $dlurl"
$wc=new-object net.webclient
$wc.UseDefaultCredentials=$true
$wc.Proxy.Credentials=$wc.Credentials
$wc.DownloadFile($dlurl, (join-path (resolve-path ".\") "paket.bootstrapper.exe"))
$repoName = "PowerShell/PowerShell"
$assetPattern = "*-win-x64.msi"
$extractDirectory = "C:\Users\Public\Downloads"
$releasesUri = "https://api.github.com/repos/$repoName/releases/latest"
$asset = (Invoke-WebRequest $releasesUri | ConvertFrom-Json).assets | Where-Object name -like $assetPattern
$downloadUri = $asset.browser_download_url
$extractPath = [System.IO.Path]::Combine($extractDirectory, $asset.name)
Invoke-WebRequest -Uri $downloadUri -Out $extractPath
You can use curl with https://api.github.com. It gives JSON output from which you can easily extract what you need with jq or your favorite json tool.
For example, using the repository in the question:
gituser=reactiveui; repo=ReactiveUI
tag_name=$(curl -sL https://api.github.com/repos/$gituser/$repo/releases/latest | jq -r '.tag_name'); echo $tag_name
# output: 16.3.10
tarurl=$(curl -sL https://api.github.com/repos/$gituser/$repo/releases/latest | jq -r '.tarball_url'); echo $tarurl
# output: https://api.github.com/repos/reactiveui/ReactiveUI/tarball/16.3.10
zipurl=$(curl -sL https://api.github.com/repos/$gituser/$repo/releases/latest | jq -r '.zipball_url'); echo $zipurl
# output: https://api.github.com/repos/reactiveui/ReactiveUI/zipball/16.3.10
So you could get the download with a nested curl in a one-liner:
curl -OL $(curl -sL https://api.github.com/repos/filesender/filesender/releases/latest | jq -r '.tarball_url')
This will download the file, and save it with the name of its tag_name, but without extension. So you may want to rename it by appending ".tgz" or ".zip", depending on which you downloaded.
Note for Windows users: curl is now installed by default on Windows too, but beware that it must be called as curl.exe. That's because Powershell has an alias stupidly called "curl" which is not the same!
Centos/RHEL
There are 2 options to download using the URL directly.
Via Github API (using CURL and jq package)
Via Github direct (using CURL and sed)
I am listing a demonstration script for each option.
Option 1
#!/bin/bash
# author: fullarray
# Contribution shared on: stackoverflow
# Contribution shared on: github
# date: 06112022
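# the download URL is built from the release tag, so read tag_name
compose_version=$(curl -s https://api.github.com/repos/docker/compose/releases/latest | jq -r .tag_name)
get_local_os_build=$(uname -s)-$(uname -m)
# -o saves the binary to a file instead of printing it to the terminal
curl -L -o docker-compose "https://github.com/docker/compose/releases/download/$compose_version/docker-compose-$get_local_os_build"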
Option 2
#!/bin/bash
# author: fullarray
# Contribution shared on: stackoverflow
# Contribution shared on: github
# date: 06112022
get_local_os_build=$(uname -s)-$(uname -m)
# follow the /releases/latest redirect and keep the tag from the final URL
compose_latest_version=$(curl -fsSLI -o /dev/null -w '%{url_effective}' https://github.com/docker/compose/releases/latest | sed 's#.*tag/##')
curl -L -o docker-compose "https://github.com/docker/compose/releases/download/$compose_latest_version/docker-compose-$get_local_os_build"
If you are fine with cloning the repository first, you can use git tag, which also allows you to sort the tags by version in various ways.
git clone https://github.com/reactiveui/ReactiveUI.git .
LATEST="$(git tag --sort=v:refname | tail -n1)"
git checkout "$LATEST"
This allows for more flexibility, as you can filter the tags you're not interested in with grep, e.g.:
git tag --sort=v:refname | grep -vE -e '-RC[0-9]+$' | tail -n1
Here's an excerpt from the documentation on git-tag:
Sort based on the key given. Prefix - to sort in descending order of the value. You may use the --sort=<key> option multiple times, in which case the last key becomes the primary key. Also supports version:refname or v:refname (tag names are treated as versions). The version:refname sort order can also be affected by the versionsort.suffix configuration variable. The keys supported are the same as those in git for-each-ref. Sort order defaults to the value configured for the tag.sort variable if it exists, or lexicographic order otherwise. See git-config(1).
If you really don't want to clone the repository, the --sort option also works with git ls-remote. It'll just take a bit more work to get the part you're interested in:
git ls-remote --tags --sort=v:refname https://github.com/reactiveui/ReactiveUI.git | awk -F'/' '{ print $NF }'
This approach doesn't seem to work all too well for the ReactiveUI repository in particular, because their tags are a bit messy, but it's an option.
Please note, that the sorting isn't quite the same as with semantic versioning, but git does allow you to work around most of these cases. As an example mqtt2prometheus has release candidates using the suffix RC1, RC2 etc., but git sorts 0.1.6-RC1 as being newer than 0.1.6. You can tell git that "RC" is a pre-release suffix to make it sort them correctly.
git -c 'versionsort.suffix=-RC' tag --sort=v:refname | tail -n1
Here's an excerpt from the documentation on git-config:
By specifying a single suffix in this variable, any tagname containing that suffix will appear before the corresponding main release. E.g. if the variable is set to "-rc", then all "1.0-rcX" tags will appear before "1.0". If specified multiple times, once per suffix, then the order of suffixes in the configuration will determine the sorting order of tagnames with those suffixes. E.g. if "-pre" appears before "-rc" in the configuration, then all "1.0-preX" tags will be listed before any "1.0-rcX" tags.
You can also sort by the date of the tag using --sort=taggerdate, that might work better in some situations.
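When the tag names come from somewhere other than git tag (for example from git ls-remote), a similar pick can be sketched with standard tools. The function name latest_stable_tag is invented, the -RC suffix is just the example convention from above, and sort -V is a GNU coreutils extension that may be missing on some systems.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads tag names on stdin (one per line), drops
# release candidates ending in -RC<number>, and prints the highest
# remaining version.
latest_stable_tag() {
  # -e keeps grep from treating the leading dash of the pattern as an option.
  grep -vE -e '-RC[0-9]+$' | sort -V | tail -n1
}

# Usage (--refs hides the peeled ^{} entries of annotated tags):
# git ls-remote --tags --refs https://github.com/reactiveui/ReactiveUI.git \
#   | awk -F'/' '{ print $NF }' | latest_stable_tag
```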
As @florianb pointed out, I should use git.
Originally my .travis.yml was something like:
before_install:
- curl -L https://raw.githubusercontent.com/greghendershott/travis-racket/master/install-racket.sh | bash
This would automatically get whatever the latest version is, from the repo.
But someone pointed out to me that GitHub doesn't want people to use raw.github.com for downloads. Instead people should use "releases". So I was a good doob and manually made a release each time. Then my .travis.yml was something like:
before_install:
- curl -L https://github.com/greghendershott/travis-racket/releases/download/v0.6/install-racket.sh | bash
But it's a PITA to make a release each time. Worse, all .travis.yml files need to be updated to point to the newer version of the file.
Instead -- just use git to clone the repo, and use the file within it:
before_install:
- git clone https://github.com/greghendershott/travis-racket.git
- cat travis-racket/install-racket.sh | bash # pipe to bash not sh!

wget appends query string to resulting file

I'm trying to retrieve working webpages with wget and this goes well for most sites with the following command:
wget -p -k http://www.example.com
In these cases I will end up with index.html and the needed CSS/JS etc.
HOWEVER, in certain situations the url will have a query string and in those cases I get an index.html with the query string appended.
Example
www.onlinetechvision.com/?p=566
Combined with the above wget command will result in:
index.html?page=566
I have tried using the --restrict-file-names=windows option, but that only gets me to
index.html@page=566
Can anyone explain why this is needed and how I can end up with a regular index.html file?
UPDATE: I'm sort of on the fence on taking a different approach. I found out I can take the first filename that wget saves by parsing the output. So the name that appears after Saving to: is the one I need.
However, this is wrapped by this strange character â - rather than just removing that hardcoded - where does this come from?
If you try with parameter "--adjust-extension"
wget -p -k --adjust-extension www.onlinetechvision.com/?p=566
you come closer. In the www.onlinetechvision.com folder there will be a file with a corrected extension: index.html@p=566.html (with --restrict-file-names=windows) or index.html?p=566.html on *NIX systems. It is now simple to rename that file to index.html, even with a script.
If you are on a Microsoft OS, make sure you have a later version of wget; it is also available here: https://eternallybored.org/misc/wget/
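The renaming step could be scripted roughly as follows. This is a sketch under the assumption that everything from the first "?" or "@" in a file name is the query part; the function name rename_query_files is made up, and files whose target name already exists are left alone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: renames files such as "index.html?p=566.html" or
# "index.html@p=566.html" in a directory to the part before the query.
# Argument: directory (default: current directory).
rename_query_files() {
  local dir=${1:-.} f base target
  for f in "$dir"/*[?@]*; do
    [ -e "$f" ] || continue            # the glob matched nothing
    base=${f##*/}
    target="$dir/${base%%[?@]*}"       # cut at the first ? or @
    if [ -e "$target" ]; then
      echo "skipping $f: $target already exists" >&2
      continue
    fi
    mv -- "$f" "$target"
  done
}

# Usage: rename_query_files www.onlinetechvision.com
```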
To answer your question about why this is needed, remember that the web server is likely to return different results based on the parameters in the query string. If a query for index.html?page=52 returns different results from index.html?page=53, you probably wouldn't want both pages to be saved in the same file.
Each HTTP request that uses a different set of query parameters is quite literally a request for a distinct resource. wget can't predict which of these changes is and isn't going to be significant, so it's doing the conservative thing and preserving the query parameter URLs in the filename of the local document.
My solution is to do recursive crawling outside wget:
get directory structure with wget (no file)
loop to get main entry file (index.html) from each dir
This works well with wordpress sites. Could miss some pages tho.
#!/bin/bash
#
# get directory structure
#
wget --spider -r --no-parent http://<site>/
#
# loop through each dir
#
find . -mindepth 1 -maxdepth 10 -type d | cut -c 3- > ./dir_list.txt
while read line;do
wget --wait=5 --tries=20 --page-requisites --html-extension --convert-links --execute=robots=off --domain=<domain> --strict-comments http://${line}/
done < ./dir_list.txt
The query string is required because of the website's design: the site uses the same standard index.html for all content and then uses the query string to pull in the content from another page, for example with a script on the server side (it may be client side; look in the JavaScript).
Have you tried using --no-cookies? The site could be storing this information in a cookie and pulling it when you hit the page. This could also be caused by URL rewrite logic, which you have little control over from the client side.
Use the -O or --output-document option. See http://www.electrictoolbox.com/wget-save-different-filename/

How do I get the raw version of a gist from github?

I need to load a shell script from a raw gist but I can't find a way to get raw URL.
curl -L address-to-raw-gist.sh | bash
And yet there is, look for the raw button (on the top-right of the source code).
The raw URL should look like this:
https://gist.githubusercontent.com/{user}/{gist_hash}/raw/{commit_hash}/{file}
Note: it is possible to get the latest version by omitting the {commit_hash} part, as shown below:
https://gist.githubusercontent.com/{user}/{gist_hash}/raw/{file}
February 2014: the raw url just changed.
See "Gist raw file URI change":
The raw host for all Gist files is changing immediately.
This change was made to further isolate user content from trusted GitHub applications.
The new host is
https://gist.githubusercontent.com.
Existing URIs will redirect to the new host.
Before it was https://gist.github.com/<username>/<gist-id>/raw/...
Now it is https://gist.githubusercontent.com/<username>/<gist-id>/raw/...
For instance:
https://gist.githubusercontent.com/VonC/9184693/raw/30d74d258442c7c65512eafab474568dd706c430/testNewGist
KrisWebDev adds in the comments:
If you want the last version of a Gist document, just remove the <commit>/ from URL
https://gist.githubusercontent.com/VonC/9184693/raw/testNewGist
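The two URL shapes can be captured in a small helper. The function name gist_raw_url is invented for illustration; it only assembles the URL and leaves the download to curl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the raw URL of a gist file.
# Arguments: user, gist id, file name, optional commit hash.
# Without a commit hash the URL points at the latest version.
gist_raw_url() {
  local user=$1 gist_id=$2 file=$3 commit=${4:-}
  if [ -n "$commit" ]; then
    printf 'https://gist.githubusercontent.com/%s/%s/raw/%s/%s\n' \
      "$user" "$gist_id" "$commit" "$file"
  else
    printf 'https://gist.githubusercontent.com/%s/%s/raw/%s\n' \
      "$user" "$gist_id" "$file"
  fi
}

# Usage: curl -L "$(gist_raw_url VonC 9184693 testNewGist)"
```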
One can simply use the github api.
https://api.github.com/gists/$GIST_ID
Reference: https://miguelpiedrafita.com/github-gists
GitLab snippets provide short, concise URLs, are easy to create, and go well with the command line.
Sample example: Enable bash completion by patching /etc/bash.bashrc
sudo su -
(curl -s https://gitlab.com/snippets/21846/raw && echo) | patch -s /etc/bash.bashrc