Set Google Storage Bucket's default cache control - google-cloud-storage

Is there any way to set a bucket's default cache control (trying to override the public, max-age=3600 applied at the bucket level every time a new object is created)?
Something similar to defacl, but for setting the cache control.

If someone is still looking for an answer, one needs to set the metadata while adding the blob.
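For example, with the Python client library the header can be set on the blob before the upload (a minimal sketch; the bucket and object names are placeholders):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("bucket_name")
blob = bucket.blob("path/to/object.txt")
blob.cache_control = "public, max-age=12345"  # must be set before the upload
blob.upload_from_filename("local_file.txt")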
For those who want to update the metadata for all existing objects in the bucket, you can use setmeta from gsutil - https://cloud.google.com/storage/docs/gsutil/commands/setmeta
You just need to run the following:
gsutil setmeta -r -h "Cache-control:public, max-age=12345" gs://bucket_name

Using gsutil
-h: Allows you to specify certain HTTP headers
-r: Recursive
-m: Performs the operations in parallel, which may run significantly faster.
gsutil -m setmeta -r -h "Cache-control:public, max-age=259200" gs://bucket-name

It is possible to write a Google Cloud Storage Trigger.
This function sets the Cache-Control metadata field for every new object in a bucket:
from google.cloud import storage

CACHE_CONTROL = "private"

def set_cache_control_private(data, context):
    """Background Cloud Function to be triggered by Cloud Storage.
    This function changes the Cache-Control metadata.

    Args:
        data (dict): The Cloud Functions event payload.
        context (google.cloud.functions.Context): Metadata of triggering event.
    Returns:
        None; the output is written to Stackdriver Logging
    """
    print('Setting Cache-Control to {} for: gs://{}/{}'.format(
        CACHE_CONTROL, data['bucket'], data['name']))
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(data['bucket'])
    blob = bucket.get_blob(data['name'])
    blob.cache_control = CACHE_CONTROL
    blob.patch()
You also need a requirements.txt file in the same directory for the storage import. It only needs the google-cloud-storage package:
google-cloud-storage==1.10.0
You have to deploy the function to a specific bucket:
gcloud beta functions deploy set_cache_control_private \
--runtime python37 \
--trigger-resource gs://<your_bucket_name> \
--trigger-event google.storage.object.finalize
For debugging purposes you can retrieve logs with the gcloud command as well:
gcloud functions logs read --limit 50

I know that this is quite an old question and you're after a default action (which I'm not sure exists), but the below worked for me on a recent PHP project after much frustration:
$object = $bucket->upload($tempFile, [
    'predefinedAcl' => "PUBLICREAD",
    'name' => $destination,
    'metadata' => [
        'cacheControl' => 'private, max-age=0, no-transform',
    ]
]);
The same can be applied in Node:
const storage = new Storage();
const bucket = storage.bucket(BUCKET_NAME);
const blob = bucket.file(FILE_NAME);
const uploadProgress = new Promise((resolve, reject) => {
  const blobStream = blob.createWriteStream();
  blobStream.on('error', err => {
    reject(err);
    throw new Error(err);
  });
  blobStream.on('finish', () => {
    resolve();
  });
  blobStream.end(file.buffer);
});
await uploadProgress;
if (isPublic) {
  await blob.makePublic();
}
await blob.setMetadata({ cacheControl: 'public, max-age=31536000' });

There is no way to specify a default cache control. It must be set when creating the object.
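For example, with gsutil the header can be supplied at upload time (a sketch; the file and bucket names are placeholders):
gsutil -h "Cache-Control:private, max-age=0" cp local_file gs://bucket_name/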

If you're using a Python App Engine app, you can use the default_expiration option in your app.yaml to set a global default value for the Cache-Control header: https://cloud.google.com/appengine/docs/standard/python/config/appref
For example:
runtime: python27
api_version: 1
threadsafe: yes
default_expiration: "30s"

Related

How to set charset to UTF8 for text files with gsutil when uploading to Google Cloud Storage bucket?

We have a (public) Google Cloud Storage bucket that hosts a simple website, meaning both HTML and images.
Our build process uses Google Cloud Build; however, the question is not tied to Cloud Build, but is specifically about how to use gsutil properly.
This is our current gsutil task:
# Upload it to the bucket
- name: gcr.io/cloud-builders/gsutil
  dir: "public/"
  args: [
    "-m",                                      # run the copy in parallel
    "-h", "Cache-Control: public, max-age=0",  # custom Cache-Control header
    "cp",                                      # copy command
    "-r",                                      # recursively
    ".",                                       # source folder
    "gs://mybucket/"                           # the target bucket and folder
  ]
As you can see, this copies everything in the local public/ folder to the bucket and applies the Cache-Control header on all objects.
According to this:
https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata
You can specify the content type with
-h "Content-Type:text/html; charset=utf-8"
However, this makes all objects (not only .html files, but also images, etc.) get the content type text/html; charset=utf-8.
(I have even tried -h "Content-Type:; charset=utf-8", but then gsutil fails, saying it's an invalid content-type value.)
Is there a way to tell gsutil to apply charset=utf-8 on all objects, without actually overwriting the main content type?
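One possible workaround (a hedged suggestion, not from the original thread): keep the plain cp step, then run a second setmeta pass restricted to a wildcard so only the HTML objects get the charset, for example:
gsutil -m setmeta -h "Content-Type:text/html; charset=utf-8" gs://mybucket/**.html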

How to change the metadata of all existing objects of a specific type in Google Cloud Storage?

I have uploaded thousands of files to Google Storage, and I found out that all the files are missing their content-type, so my website cannot serve them correctly.
I wonder if I can set some kind of policy, like changing the content-type of all the files at once. For example, I have a bunch of .html files inside the bucket:
a/b/index.html
a/c/a.html
a/c/a/b.html
a/a.html
.
.
.
Is it possible to set the content-type of all the .html files, in their different locations, with one command?
You could do:
gsutil -m setmeta -h Content-Type:text/html gs://your-bucket/**.html
There's no single command to achieve the behavior you are looking for (one command to edit all the objects' metadata); however, you can use the gsutil setmeta command in a bash script to loop through all the objects inside the bucket.
1.- Option (1) is to use the gsutil command "setmeta" in a bash script:
# kinda pseudo code here.
# get the list with all your object's names and iterate over the metadata edit command.
for OBJECT in $(gsutil ls gs://[BUCKET_NAME]/**)
do
  gsutil setmeta -h "[METADATA_KEY]:[METADATA_VALUE]" "$OBJECT"
  # each "$OBJECT" is a full gs://[BUCKET_NAME]/[OBJECT_NAME] URL.
done
2.- Option (2) is to write a small C++ program with the client library to achieve the same thing:
namespace gcs = google::cloud::storage;
using ::google::cloud::StatusOr;
[](gcs::Client client, std::string bucket_name, std::string key,
   std::string value) {
  // List all the objects in the bucket; while on the loop, edit the metadata of each object.
  for (auto&& object_metadata : client.ListObjects(bucket_name)) {
    if (!object_metadata) break;
    std::string object_name = object_metadata->name();
    gcs::ObjectMetadata desired = *object_metadata;
    desired.mutable_metadata().emplace(key, value);
    StatusOr<gcs::ObjectMetadata> updated =
        client.UpdateObject(bucket_name, object_name, desired,
                            gcs::Generation(object_metadata->generation()));
  }
}
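The same kind of loop can also be written with the Python client library (a sketch, not from the original answers; the bucket name is a placeholder):
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("your-bucket")
for blob in bucket.list_blobs():
    if blob.name.endswith(".html"):
        blob.content_type = "text/html"
        blob.patch()  # persists the metadata change on the object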

Deleting all blobs inside a path prefix using the Google Cloud Storage API

I am using the Google Cloud Storage Python API. I came across a situation where I need to delete a folder that might have hundreds of files, using the API. Is there an efficient way to do it without making recursive and multiple delete calls?
One solution that I have is to list all blob objects in the bucket with the given path prefix and delete them one by one.
The other solution is to use gsutil:
$ gsutil rm -R gs://bucket/path
Try something like this:
bucket = storage.Client().bucket(bucket_name)
for blob in bucket.list_blobs():
    # object names do not start with a slash, so match on 'path/'
    if blob.name.startswith('path/'):
        blob.delete()
And if you want to delete the contents of a bucket instead of a folder within a bucket you can do it in a single method call as such:
bucket = storage.Client().bucket(bucket_name)
bucket.delete_blobs(bucket.list_blobs())
from google.cloud import storage

def deleteStorageFolder(bucketName, folder):
    """
    This function deletes from GCP Storage
    :param bucketName: The bucket name in which the file is to be placed
    :param folder: Folder name to be deleted
    :return: returns nothing
    """
    cloudStorageClient = storage.Client()
    bucket = cloudStorageClient.bucket(bucketName)
    try:
        bucket.delete_blobs(blobs=bucket.list_blobs(prefix=folder))
    except Exception as e:
        print(str(e))

In this case folder = "path"
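A usage sketch with placeholder names:
deleteStorageFolder("my-bucket", "path/")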

How to extract the list of all repositories in Stash or Bitbucket?

I need to extract the list of all repos under all projects in Bitbucket. Is there a REST API for the same? I couldn't find one.
I have both on-premise and cloud Bitbucket.
Clone ALL Projects & Repositories for a given stash url
#!/usr/bin/python
#
# @author Jason LeMonier
#
# Clone ALL Projects & Repositories for a given stash url
#
# Loop through all projects: [P1, P2, ...]
#   P1 > for each project make a directory with the key "P1"
#   Then clone every repository inside of directory P1
#   Back up a directory, create P2, ...
#
# Added ACTION_FLAG bit so the same logic can run fetch --all on every repository and/or clone.

import sys
import os
import stashy

ACTION_FLAG = 1  # Bit: +1=Clone, +2=fetch --all

url  = os.environ["STASH_URL"]   # "https://mystash.com/stash"
user = os.environ["STASH_USER"]  # "joedoe"
pwd  = os.environ["STASH_PWD"]   # "Yay123"

stash = stashy.connect(url, user, pwd)

def mkdir(xdir):
    if not os.path.exists(xdir):
        os.makedirs(xdir)

def run_cmd(cmd):
    print ("Directory cwd: %s " % (os.getcwd()))
    print ("Running Command: \n %s " % (cmd))
    os.system(cmd)

start_dir = os.getcwd()

for project in stash.projects:
    pk = project_key = project["key"]
    mkdir(pk)
    os.chdir(pk)

    for repo in stash.projects[project_key].repos.list():
        for url in repo["links"]["clone"]:
            href = url["href"]
            repo_dir = href.split("/")[-1].split(".")[0]

            if (url["name"] == "http"):
                print (" url.href: %s" % href)  # https://joedoe@mystash.com/stash/scm/app/ae.git
                print ("Directory cwd: %s  Project: %s" % (os.getcwd(), pk))

                if ACTION_FLAG & 1 > 0:
                    if not os.path.exists(repo_dir):
                        run_cmd("git clone %s" % url["href"])
                    else:
                        print ("Directory: %s/%s exists already.  Skipping clone. " % (os.getcwd(), repo_dir))

                if ACTION_FLAG & 2 > 0:
                    # chdir into directory "ae" based on url of this repo, fetch, chdir back
                    cur_dir = os.getcwd()
                    os.chdir(repo_dir)
                    run_cmd("git fetch --all ")
                    os.chdir(cur_dir)
                break

    os.chdir(start_dir)  # avoiding ".." in case of incorrect git directories
Once logged in: on the top right, click on your profile pic and then 'View profile'
Take note of your user (in the example below 'YourEmail@domain.com', but keep in mind it's case sensitive)
Click on profile pic > Manage account > Personal access token > Create a token (choosing 'Read' access type is enough for this functionality)
For all repos in all projects:
Open a CLI and use the command below (remember to fill in your server domain!):
curl -u "YourEmail#domain.com" -X GET https://<my_server_domain>/rest/api/1.0/projects/?limit=1000
It will ask you for your personal access token, you comply and you get a JSON file with all repos requested
For all repos in a given project:
Pick the project you want to get repos from. In my case, the project URL is: <your_server_domain>/projects/TECH/ and therefore my {projectKey} is 'TECH', which you'll need for the command below.
Open a CLI and use this command (remember to fill in your server domain and projectKey!):
curl -u "YourEmail#domain.com" -X GET https://<my_server_domain>/rest/api/1.0/projects/{projectKey}/repos?limit=50
Final touches
(optional) If you want just the titles of the repos requested and you have jq installed (for Windows, downloading the exe and adding it to PATH should be enough, but you need to restart your CLI for that new addition to be detected), you can use the command below:
curl -u $BBUSER -X GET <my_server_domain>/rest/api/1.0/projects/TECH/repos?limit=50 | jq '.values|.[]|.name'
(tested with Data Center/Atlassian Bitbucket v7.9.0 and powershell CLI)
For Bitbucket Cloud
You can use their REST API to access and perform queries on your server.
Specifically, you can use this documentation page, provided by Atlassian, to learn how to list your repositories.
For Bitbucket Server
Edit: As of receiving this tweet from Dan Bennett, I've learnt there is an API/plugin system for Bitbucket Server that could possibly cater for your needs. For docs: See here.
Edit2: Found this reference to listing personal repositories that may serve as a solution.
AFAIK there isn't a solution for you unless you built a little API for yourself that interacted with your Bitbucket Server instance.
Atlassian Documentation does indicate that to list all currently configured repositories you can do git remote -v. However I'm dubious of this as this isn't normally how git remote -v is used; I think it's more likely that Atlassian's documentation is being unclear rather than Atlassian building in this functionality to Bitbucket Server.
I ended up having to do this myself with an on-prem install of Bitbucket which didn't seem to have the REST APIs discussed above accessible, so I came up with a short script to scrape it out of the web page. This workaround has the advantage that there's nothing you need to install, and you don't need to worry about dependencies, certs or logins other than just logging into your Bitbucket server. You can also set this up as a bookmark if you urlencode the script and prefix it with javascript:.
To use this:
Open your bitbucket server project page, where you should see a list of repos.
Open your browser's devtools console. This is usually F12 or ctrl-shift-i.
Paste the following into the command prompt there.
JSON.stringify(Array.from(document.querySelectorAll('[data-repository-id]')).map(aTag => {
  const href = aTag.getAttribute('href');
  let projName = href.match(/\/projects\/(.+)\/repos/)[1].toLowerCase();
  let repoName = href.match(/\/repos\/(.+)\/browse/)[1];
  repoName = repoName.replace(' ', '-');
  const templ = `https://${location.host}/scm/${projName}/${repoName}.git`;
  return {
    href,
    name: aTag.innerText,
    clone: templ
  }
}));
The result is a JSON string containing an array with the repo's URL, name, and clone URL.
[{
  "href": "/projects/FOO/repos/some-repo-here/browse",
  "name": "some-repo-here",
  "clone": "https://mybitbucket.company.com/scm/foo/some-repo-here.git"
}]
This ruby script isn't the greatest code, which makes sense, because I'm not the greatest coder. But it is clear, tested, and it works.
The script filters the output of a Bitbucket API call to create a complete report of all repos on a Bitbucket server. Report is arranged by project, and includes totals and subtotals, a link to each repo, and whether the repos are public or personal. I could have simplified it for general use, but it's pretty useful as it is.
There are no command line arguments. Just run it.
#!/usr/bin/ruby
#
# @author Bill Cernansky
#
# List and count all repos on a Bitbucket server, arranged by project, to STDOUT.
#
require 'json'

bbserver   = 'http(s)://server.domain.com'
bbuser     = 'username'
bbpassword = 'password'
bbmaxrepos = 2000   # Increase if you have more than 2000 repos

reposRaw = JSON.parse(`curl -s -u '#{bbuser}':'#{bbpassword}' -X GET #{bbserver}/rest/api/1.0/repos?limit=#{bbmaxrepos}`)

projects = {}
repoCount = reposRaw['values'].count

reposRaw['values'].each do |r|
  projID = r['project']['key']
  if projects[projID].nil?
    projects[projID] = {}
    projects[projID]['name'] = r['project']['name']
    projects[projID]['repos'] = {}
  end
  repoName = r['name']
  projects[projID]['repos'][repoName] = r['links']['clone'][0]['href']
end

privateProjCount = projects.keys.grep(/^\~/).count
publicProjCount = projects.keys.count - privateProjCount
reportText = ''
privateRepoCount = 0

projects.keys.sort.each do |p|
  # Personal project slugs always start with tilde
  isPrivate = p[0] == '~'
  projRepoCount = projects[p]['repos'].keys.count
  privateRepoCount += projRepoCount if isPrivate
  reportText += "\nProject: #{p} : #{projects[p]['name']}\n  #{projRepoCount} #{isPrivate ? 'PERSONAL' : 'Public'} repositories\n"
  projects[p]['repos'].keys.each do |r|
    reportText += sprintf("    %-30s : %s\n", r, projects[p]['repos'][r])
  end
end

puts "BITBUCKET REPO REPORT\n\n"
puts sprintf("  Total Projects: %5d   Public: %5d   Personal: %5d", projects.keys.count, publicProjCount, privateProjCount)
puts sprintf("  Total Repos:    %5d   Public: %5d   Personal: %5d", repoCount, repoCount - privateRepoCount, privateRepoCount)
puts reportText
The way I solved this issue was to get the HTML page and give it a ridiculously high limit, like this (it's in Python):
cmd = "curl -s -k --user " + username + " https://URL/projects/<KEY_PROJECT_NAME>/?limit\=10000"
then I parsed it with BeautifulSoup
make_list = str((subprocess.check_output(cmd, shell=True)).rstrip().decode("utf-8"))
html = make_list
parsed_html = BeautifulSoup(html,'html.parser')
list1 = []
for a in parsed_html.find_all("a", href=re.compile("/<projects>/<KEY_PROJECT_NAME>/repos/")):
list1.append(a.string)
print(list1)
To use this, make sure you change URL and <KEY_PROJECT_NAME>; the latter should be the Bitbucket project you are targeting. All I am doing is parsing an HTML file.
Here's how I pulled the list of repos from Bitbucket Cloud.
Set up an OAuth consumer
Go to your workspace settings and set up an OAuth consumer; you should be able to go there directly using this link: https://bitbucket.org/{your_workspace}/workspace/settings/api
The only setting that matters is the callback URL which can be anything but I chose http://localhost
Once setup, this will display a key and secret pair for your OAuth consumer, I will refer to these as {oauth_key} and {oauth_secret} below
Authenticate with the API
Go to https://bitbucket.org/site/oauth2/authorize?client_id={oauth_key}&response_type=code ensuring you replace {oauth_key}
This will redirect you to something like http://localhost/?code=xxxxxxxxxxxxxxxxxx, make a note of that code, I'll refer to that as {oauth_code} below
In your terminal, run curl -X POST -u "{oauth_key}:{oauth_secret}" https://bitbucket.org/site/oauth2/access_token -d grant_type=authorization_code -d code={oauth_code}, replacing the placeholders.
This should return JSON including the access_token; I'll refer to that access token as {oauth_token}.
Get the list of repos
You can now run the following to get the list of repos. Bear in mind that your {oauth_token} lasts 2hrs by default.
curl --request GET \
--url 'https://api.bitbucket.org/2.0/repositories/pageant?page=1' \
--header 'Authorization: Bearer {oauth_token}' \
--header 'Accept: application/json'
This response is paginated so you'll need to page through the responses, 10 repositories at a time.
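A minimal paging sketch in Python (not from the original answer; the workspace name and token are placeholders), following the next link that Bitbucket Cloud returns with each page:
import requests

token = "{oauth_token}"
url = "https://api.bitbucket.org/2.0/repositories/{workspace}?page=1"
repos = []
while url:
    resp = requests.get(url, headers={"Authorization": "Bearer " + token,
                                      "Accept": "application/json"})
    data = resp.json()
    repos.extend(item["slug"] for item in data["values"])
    url = data.get("next")  # absent on the last page
print(repos)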

How to release a build artifact asset on GitHub with a script?

I am trying to figure out a one-command process for generating a build on GitHub.
What I anticipate doing is running some sort of command (make release, say), and the make release script builds the release artifact and then uploads it to GitHub in some fashion.
However, I'm fairly confused about how to actually get a release artifact on GitHub. Source code is awesome, but not everyone wants to do their own builds. :-)
Update 2022: The official GitHub CLI comes with gh release upload
Upload asset files to a GitHub Release.
You can create the release first with gh release create
Upload all tarballs in a directory as release assets
$ gh release create v1.2.3 ./dist/*.tgz
Upload a release asset with a display label
$ gh release create v1.2.3 '/path/to/asset.zip#My display label'
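For a release that already exists, the upload subcommand takes the tag and the files (a short sketch; the tag and path are placeholders):
$ gh release upload v1.2.3 ./dist/app.tgz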
Update September 2013, you can automate a release (API in preview mode)
Update January 2014, there's an unofficial command-line app, called github-release by Nicolas Hillegeer (aktau), for creating releases and uploading (binary) artifacts.
It uses the new github releases API mentioned above. Look at the Makefile of the project to see how to automate it more still.
Example:
# create a formal release
$ github-release release \
--user aktau \
--repo gofinance \
--tag v0.1.0 \
--name "the wolf of source street" \
--description "Not a movie, contrary to popular opinion. Still, my first release!" \
--pre-release
This API is a little different due to the binary assets. We use the Accept header for content negotiation when requesting a release asset.
Pass a standard API media type to get the API representation:
$ curl -i -H "Authorization: token TOKEN" \
-H "Accept: application/vnd.github.manifold-preview" \
"https://uploads.github.com/repos/hubot/singularity/releases/assets/123"
HTTP/1.1 200 OK
{
  "id": 123,
  ...
}
Pass “application/octet-stream” to download the binary content.
$ curl -i -H "Authorization: token TOKEN" \
-H "Accept: application/octet-stream" \
"https://uploads.github.com/repos/hubot/singularity/releases/assets/123"
HTTP/1.1 302 Found
Uploads are handled by a single request to a companion “uploads.github.com” service.
$ curl -H "Authorization: token TOKEN" \
-H "Accept: application/vnd.github.manifold-preview" \
-H "Content-Type: application/zip" \
--data-binary @build/mac/package.zip \
"https://uploads.github.com/repos/hubot/singularity/releases/123/assets?name=1.0.0-mac.zip"
Update 2d July 2013, you now can define a release.
Releases are accompanied by release notes and links to download the software or source code.
Following the conventions of many Git projects, releases are tied to Git tags. You can use an existing tag, or let releases create the tag when it's published.
You can also attach binary assets (such as compiled executables, minified scripts, documentation) to a release. Once published, the release details and assets are available to anyone that can view the repository.
This is what replaces the old binary upload service, which was removed in December 2012!
the make release script builds up the release artifact and then uploads it to github in some fashion.
That would mean adding it ("it" being the delivery made of one or several files, generally including binaries) to a regular local repo, and then pushing that repo to its matching GitHub repo.
That being said, the reason GitHub isn't mentioned in any "release" task is because Git is a source control management system, and is ill-suited for binaries.
It can have those files (binaries) of course, but isn't made to have them regularly, because of the bloated size of the repo after a while: each cloning would take longer and longer.
See What are the Git limits, and also "git - should source files and repository be on the same machine ?".
Preparation:
1) Download github-release and put its executable in your PATH.
2) Create a token at https://github.com/settings/applications#personal-access-tokens, let's say abc123.
Uploading an artifact:
1) Let's say you have just compiled what you decide to call version 3.1, and want to upload it.
2) Make sure you have committed everything.
3) Run these five commands:
git tag v3.1
git push
git push --tags
github-release release --security-token abc123 --user <you> --repo <yourrepo> \
--tag v3.1
github-release upload --security-token abc123 --user <you> --repo <yourrepo> \
--tag v3.1 --name <thefile> --file <thefile>
You can upload several files, for instance for different operating systems.
(Based on VonC's answer, which unfortunately does not detail how to upload an artifact)
hub official Go-based GitHub CLI tool
https://github.com/github/hub
First install Go. On Ubuntu: https://askubuntu.com/questions/959932/installation-instructions-for-golang-1-9-into-ubuntu-16-04/1075726#1075726
Then install hub:
go get github.com/github/hub
There is no Ubuntu package: https://github.com/github/hub/issues/718
Then from inside your repo:
hub release create -a prebuilt.zip -m 'release title' tag-name
This:
prompts for your password the first time, and then automatically creates and stores an API token locally
creates a non-annotated tag on the remote called tag-name
creates a release associated to that tag
uploads prebuilt.zip as an attachment
You can also provide your existing API token with the GITHUB_TOKEN environment variable.
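For example (a sketch reusing the command above):
GITHUB_TOKEN=<token> hub release create -a prebuilt.zip -m 'release title' tag-name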
For other release operations, see:
hub release --help
Tested on hub de684cb613c47572cc9ec90d4fd73eef80aef09c.
Python APIv3 upload example without any external dependencies
Usage:
GITHUB_TOKEN=<token> ./create-release username/reponame <tag-name> <path-to-upload>
Script:
#!/usr/bin/env python3
import json
import os
import sys
from urllib.parse import urlencode
from urllib.request import Request, urlopen
repo = sys.argv[1]
tag = sys.argv[2]
upload_file = sys.argv[3]
token = os.environ['GITHUB_TOKEN']
url_template = 'https://{}.github.com/repos/' + repo + '/releases'
# Create.
_json = json.loads(urlopen(Request(
    url_template.format('api'),
    json.dumps({
        'tag_name': tag,
        'name': tag,
        'prerelease': True,
    }).encode(),
    headers={
        'Accept': 'application/vnd.github.v3+json',
        'Authorization': 'token ' + token,
    },
)).read().decode())

# This is not the tag, but rather some database integer identifier.
release_id = _json['id']

# Upload.
with open(upload_file, 'br') as myfile:
    content = myfile.read()
_json = json.loads(urlopen(Request(
    url_template.format('uploads') + '/' + str(release_id) + '/assets?' \
        + urlencode({'name': os.path.split(upload_file)[1]}),
    content,
    headers={
        'Accept': 'application/vnd.github.v3+json',
        'Authorization': 'token ' + token,
        'Content-Type': 'application/zip',
    },
)).read().decode())
Both release and asset creation will fail with 422 if they already exist. Work around that by first deleting the release or asset. Here is an example.
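A hedged sketch of that cleanup (not the linked example), using the v3 API: look up the release by its tag to get the numeric id, then delete it by id before re-creating it.
$ curl -H "Authorization: token TOKEN" \
  "https://api.github.com/repos/<owner>/<repo>/releases/tags/<tag-name>"
$ curl -X DELETE -H "Authorization: token TOKEN" \
  "https://api.github.com/repos/<owner>/<repo>/releases/<release-id>"
An existing asset can likewise be removed with DELETE on /repos/<owner>/<repo>/releases/assets/<asset-id>.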
If you use Maven, you can add GitHub's Downloads Maven Plugin ( https://github.com/github/maven-plugins/#downloads-plugin ) and simply do:
$ mvn clean install ghDownloads:upload
Github has an API to access their own file download system.
Repo downloads allow you to provide binaries for users - although there may be a limit to the size and number. The API allows access from automated agents.
Take a look at:
http://developer.github.com/v3/repos/downloads/ for usage info.
The feature isn't in use much, but definitely works. You can go to any github repo, click the "Downloads" tab to see them.
For an example of downloadable files:
http://github.com/dannystaple/emacs_cheat_sheets/downloads - the HTML file offered there is actually a built artefact, and not in the source. I am trying to rustle up a better (binary) example - but there is no reason that executables, zips/tarballs and other filetypes couldn't be offered.
These downloads are NOT the same as source tarballs for a repo or its tags. Any arbitrary file can be uploaded this way.
I had the same problem and hacked up a little Python to do it for me. I must say it was a pain; S3 is a total freakshow.
https://raw.github.com/reklis/utilityscripts/master/github-upload
#!/opt/local/bin/python2.7

import json
import requests
import sys
import argparse
import os
import mimetypes
import pycurl
import cStringIO
from xml.dom import minidom

github_api_root = "https://api.github.com/"

def parse_args():
    parser = argparse.ArgumentParser(description='post a file to github as a download')
    parser.add_argument('--user', dest='user', help='github username', required=True)
    parser.add_argument('--pass', dest='password', help='github password', required=True)
    parser.add_argument('--repo', dest='repo', help='the name of the github repo', required=True)
    parser.add_argument('--file', dest='filepath', help='path of the local file to upload', required=True)
    parser.add_argument('--desc', dest='description', help='descriptive text about this file', required=True)
    parser.add_argument('--owner', dest='owner', help='owner of the github repository', required=True)
    args = parser.parse_args()
    # print args
    return args

def make_dl_post_url(owner, repo):
    url = "%srepos/%s/%s/downloads" % (str(github_api_root), str(owner), str(repo))
    # print url
    return url

def make_dl_delete_url(owner, repo, dlid):
    url = "%srepos/%s/%s/downloads/%s" % (str(github_api_root), str(owner), str(repo), str(dlid))
    # print url
    return url

def add_github_reference(args):
    dl_post_url = make_dl_post_url(args.owner, args.repo)

    fp = args.filepath
    filename = os.path.basename(fp)
    filesize = os.path.getsize(fp)

    mtype, mdetails = mimetypes.guess_type(fp)

    file_description = {
        'name': filename,
        'size': filesize,
        'description': args.description,
        'content_type': mtype
    }
    # print json.dumps(file_description, indent=2)

    github = requests.post(dl_post_url, auth=(args.user, args.password), data=json.dumps(file_description))
    resp = github.json
    # print json.dumps(resp, indent=2)
    return resp

def remove_github_reference(args, dlid):
    dl_delete_url = make_dl_delete_url(args.owner, args.repo, dlid)
    github = requests.delete(dl_delete_url, auth=(args.user, args.password))
    delete_ok = (204 == github.status_code)
    return delete_ok

def post_file_to_s3(file_path, gh):
    # s3 is very particular with field ordering
    # curl \
    #   -F "key=downloads/octocat/Hello-World/new_file.jpg" \
    #   -F "acl=public-read" \
    #   -F "success_action_status=201" \
    #   -F "Filename=new_file.jpg" \
    #   -F "AWSAccessKeyId=1ABCDEF..." \
    #   -F "Policy=ewogIC..." \
    #   -F "Signature=mwnF..." \
    #   -F "Content-Type=image/jpeg" \
    #   -F "file=@new_file.jpg" \
    #   https://github.s3.amazonaws.com/

    s3_ok = 201
    xml_buffer = cStringIO.StringIO()

    try:
        post_fields = [
            ('key', str(gh['path'])),
            ('acl', str(gh['acl'])),
            ('success_action_status', str(s3_ok)),
            ('Filename', str(gh['name'])),
            ('AWSAccessKeyId', str(gh['accesskeyid'])),
            ('Policy', str(gh['policy'])),
            ('Signature', str(gh['signature'])),
            ('Content-Type', str(gh['mime_type'])),
            ('file', (pycurl.FORM_FILE, file_path))
        ]
        # print post_fields

        s3 = pycurl.Curl()
        s3.setopt(pycurl.SSL_VERIFYPEER, 0)
        s3.setopt(pycurl.SSL_VERIFYHOST, 0)
        s3.setopt(pycurl.POST, 1)
        s3.setopt(pycurl.URL, str(gh['s3_url']))
        s3.setopt(pycurl.HTTPPOST, post_fields)
        # s3.setopt(pycurl.VERBOSE, 1)

        # accumulate string response
        s3.setopt(pycurl.WRITEFUNCTION, xml_buffer.write)
        s3.perform()

        file_upload_success = (s3_ok == s3.getinfo(pycurl.HTTP_CODE))
        xml_payload = minidom.parseString(xml_buffer.getvalue())
        if (file_upload_success):
            location_element = xml_payload.getElementsByTagName('Location')
            print location_element[0].firstChild.nodeValue
        else:
            print xml_payload.toprettyxml()
    except Exception, e:
        print e
        file_upload_success = False
    finally:
        s3.close()

    return file_upload_success

def main():
    mimetypes.init()
    args = parse_args()

    # step 1: tell github about the file
    gh = add_github_reference(args)

    # step 2: upload file to s3
    if ('errors' in gh):
        print json.dumps(gh, indent=2)
    else:
        file_upload_success = post_file_to_s3(args.filepath, gh)

        # cleanup if upload failed
        if (False == file_upload_success):
            removed_ok = remove_github_reference(args, gh['id'])
            if (removed_ok):
                print "removed github reference"
            else:
                print "failed to remove github reference"

if __name__ == '__main__':
    main()
Update 2021: You can create a GitHub Actions automation to create a release from a tag, then use the runners to create release assets and upload them to the release. See here for an example.
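A minimal workflow sketch along those lines (not the linked example; the make release step and the dist/*.tgz paths are assumptions), using the gh CLI that is preinstalled on GitHub-hosted runners:
name: release
on:
  push:
    tags: ['v*']
jobs:
  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write        # required to create the release
    steps:
      - uses: actions/checkout@v4
      - run: make release    # hypothetical build step producing dist/*.tgz
      - run: gh release create "$GITHUB_REF_NAME" dist/*.tgz
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}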
For those using Gradle, the gradle-github-plugin also allows you to create releases and attach files to them.
Add the plugin to the build.gradle:
plugins {
    id "co.riiid.gradle" version "X.Y.Z"
}
Configure the upload. Example:
github {
    owner = 'riiid'
    repo = 'gradle-github-plugin'
    token = 'XXXXXXXXXXXXXXXXXXXXX'
    tagName = '0.1.0'
    targetCommitish = 'master'
    name = 'v0.1.0'
    body = """# Project Name
Write `release note` here.
"""
    assets = [
        'app/build/outputs/apk/app-release.apk',
        'app/build/outputs/mapping/release/mapping.txt',
        'app/build/outputs',
        ...
    ]
}