Pub/Sub notification for GCS - event filter doesn't work - google-cloud-storage

What I want is for my Pub/Sub subscription to receive a notification whenever (and only when) a file is created on GCS.
So I did this:
gsutil notification create -t projects/[my-project-id]/topics/new-raw-file -f none -m eventType:OBJECT_FINALIZE gs://[the-target-bucket]
I think this config was set up successfully, because when I ran gsutil notification list, it showed:
projects/_/buckets/[the-target-bucket]/notificationConfigs/7
Cloud Pub/Sub topic: projects/[my-project-id]/topics/new-raw-file
Custom attributes:
eventType: OBJECT_FINALIZE
This is the only config.
However, other than file creation, I also receive file deletion notification:
Received 1 messages.
* 118758642722910: message - , attributes - {u'resource': u'projects/_/buckets/[the-target-bucket]/objects/2466870.3.txt#1493038968423735', u'objectId': u'2466870.3.txt', u'bucketId': u'[the-target-bucket]', u'notificationConfig': u'projects/_/buckets/[the-target-bucket]/notificationConfigs/7', u'payloadFormat': u'NONE', u'eventType': u'OBJECT_DELETE', u'objectGeneration': u'1493038968423735'}
I didn't understand what was going wrong.

It turns out I misused a command option. According to this page, -m just appends a key:value attribute to the notification messages; it has nothing to do with the event filter I wanted. The right option to use is -e. So the following config command works properly:
gsutil notification create -t [TOPIC_NAME] -f json -e OBJECT_FINALIZE gs://[BUCKET_NAME]
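For completeness, the same filtered notification can also be created from code. Below is a minimal sketch, assuming the google-cloud-storage Python client; the project, topic and bucket names are the placeholders from the commands above.
from google.cloud import storage
from google.cloud.storage import notification

client = storage.Client(project="[my-project-id]")
bucket = client.bucket("[the-target-bucket]")

config = bucket.notification(
    topic_name="new-raw-file",
    payload_format=notification.JSON_API_V1_PAYLOAD_FORMAT,
    event_types=[notification.OBJECT_FINALIZE_EVENT_TYPE],  # fire only on object creation
)
config.create()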


How to export IBM Watson conversation history?

Before running the code, install the ibm-watson and ibm-cloud-sdk-core packages, and also pip install PyJWT==1.7.1.
I found in an IBM document that "For a Python script you can run to export logs and convert them to CSV format, download the export_logs_py.py file from the Watson Assistant GitHub repository."
But I don't really know where and how I should modify it in order to connect to my IBM skill.
There is no demo or instruction about where I can find those arguments.
I can only find this information in the skill API details, but it seems the script needs more.
Does anyone have an example of how to use the .py file they provided?
(I'm a coding beginner and don't really understand every line in the .py file.)
The .py file shows an error after I run it without modification:
runfile('C:/export_logs.py', wdir='C:/Users/admin/Downloads')
usage: export_logs.py [-h] [--logtype {ASSISTANT,WORKSPACE,DEPLOYMENT}]
                      [--language LANGUAGE] [--filetype {CSV,TSV,XLSX,JSON}]
                      [--url URL] [--version VERSION]
                      [--totalpages TOTALPAGES] [--pagelimit PAGELIMIT]
                      [--filter FILTER] [--strip STRIP]
                      apikey id filename
export_logs.py: error: the following arguments are required: apikey, id, filename
An exception has occurred, use %tb to see the full traceback.
SystemExit: 2
The conversation I want to download:
First of all, Workspaces in IBM Watson Assistant are now called Skills.
To understand what arguments (positional and optional) you need to pass to the Python script, run the command below:
python export_logs_py.py -h
Wherever you see workspace, you can replace it with skill.
To export the logs in the .csv file format, run the command below:
python export_logs_py.py --filetype CSV --url <URL> <API_KEY> <SKILL_ID> output.csv
Replace the placeholders <URL>, <API_KEY> and <SKILL_ID> with the appropriate values mentioned below.
<URL> & <API_KEY> - You can find them on the Manage page of your Watson Assistant service.
<SKILL_ID> - The same as the one in the image you uploaded. Check this StackOverflow answer for more info.
For Assistant logs, add --logtype ASSISTANT. The default is WORKSPACE.
You can also find the logs in the UI under the Analytics section of your Skill.
As you can see, the script reported an error and said that you have to provide the apikey, the id and the (presumably output) filename as parameters. It also showed that additional parameters can be specified.
usage: export_logs.py [-h] [--logtype {ASSISTANT,WORKSPACE,DEPLOYMENT}]
                      [--language LANGUAGE] [--filetype {CSV,TSV,XLSX,JSON}]
                      [--url URL] [--version VERSION]
                      [--totalpages TOTALPAGES] [--pagelimit PAGELIMIT]
                      [--filter FILTER] [--strip STRIP]
                      apikey id filename
Your next step would be to invoke the script again, but provide an API key for Watson Assistant, the skill ID and a filename as additional parameters. Next, I would try something like specifying the output type:
export_logs.py --filetype CSV myapikey skillID output.csv
I am not the author of that script, but that is how I would approach it if I wanted to use it.
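If the script feels opaque, a rough alternative is to call the logs API yourself with the ibm-watson SDK. This is only a sketch, assuming a v1 (workspace/skill) Assistant instance; the version date, <API_KEY>, <URL> and <SKILL_ID> are placeholders you would replace with your own values from the Manage page and the skill details.
import json
from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Authenticate with the service credentials from the Manage page.
authenticator = IAMAuthenticator("<API_KEY>")
assistant = AssistantV1(version="2020-04-01", authenticator=authenticator)
assistant.set_service_url("<URL>")

# list_logs returns one page of conversation logs for a skill (workspace).
logs = assistant.list_logs(workspace_id="<SKILL_ID>", page_limit=100).get_result()
print(json.dumps(logs, indent=2))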

How to change the metadata of all files of a specific type among existing objects in Google Cloud Storage?

I have uploaded thousands of files to Google Cloud Storage, and I found out that all the files are missing a Content-Type, so my website cannot serve them correctly.
I wonder if I can set some kind of policy to change all the files' Content-Type at the same time. For example, I have a bunch of .html files inside the bucket:
a/b/index.html
a/c/a.html
a/c/a/b.html
a/a.html
.
.
.
Is it possible to set the Content-Type of all the .html files in different places with one command?
You could do:
gsutil -m setmeta -h Content-Type:text/html gs://your-bucket/**.html
There's no single command to achieve exactly the behavior you are looking for (one command to edit all the objects' metadata); however, there is a gsutil command to edit metadata which you could use in a bash script to loop through all the objects inside the bucket.
1. Option 1 is to use the gsutil command setmeta in a bash script:
# Pseudo-code: list all of your objects' names and run setmeta on each one.
for OBJECT in $(gsutil ls gs://[BUCKET_NAME]/**); do
  gsutil setmeta -h "[METADATA_KEY]:[METADATA_VALUE]" "$OBJECT"
done
2. You could also write a C++ program with the google-cloud-cpp client to achieve the same thing:
namespace gcs = google::cloud::storage;
using ::google::cloud::StatusOr;
[](gcs::Client client, std::string bucket_name, std::string key,
   std::string value) {
  // List all the objects in the bucket and, inside the loop, edit each
  // object's metadata.
  for (auto&& object_metadata : client.ListObjects(bucket_name)) {
    if (!object_metadata) continue;  // skip entries that failed to list
    std::string object_name = object_metadata->name();
    gcs::ObjectMetadata desired = *object_metadata;
    desired.mutable_metadata().emplace(key, value);
    StatusOr<gcs::ObjectMetadata> updated = client.UpdateObject(
        bucket_name, object_name, desired,
        gcs::Generation(object_metadata->generation()));
  }
}
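For reference, a similar loop is possible in Python. This is only a sketch, assuming the google-cloud-storage package and a placeholder bucket name, and it patches the Content-Type the question asks about rather than a custom key:value pair.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("[BUCKET_NAME]")

# Iterate over every object and patch the Content-Type of the .html files.
for blob in client.list_blobs(bucket):
    if blob.name.endswith(".html"):
        blob.content_type = "text/html"
        blob.patch()  # sends only the changed metadata fields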

unable to trigger job in concourse

I am new to Concourse, and set up the environment on my CentOS 7.6 box like below.
$ wget https://concourse-ci.org/docker-compose.yml
$ docker-compose up -d
Then I logged in with `fly --target example login --team-name main --concourse-url http://192.168.77.140:8080/ -u test -p test`.
I can see the following:
[root@centostest ~]# fly targets
name     url                          team  expiry
example  http://192.168.77.140:8080   main  Sun, 16 Jun 2019 02:23:48 UTC
I used the YAML below, saved as 2.yaml:
---
resources:
- name: my-git-repo
  type: git
  source:
    uri: https://github.com/ruanbekker/concourse-test
    branch: basic-helloworld
jobs:
- name: hello-world-job
  public: true
  plan:
  - get: my-git-repo
  - task: task_print-hello-world
    file: my-git-repo/ci/task-hello-world.yml
Then I ran the below commands step by step.
fly -t example sp -c 2.yaml -p pipeline-01
fly -t example up -p pipeline-01
fly -t example tj -j pipeline-01/hello-world-job --watch
But it just hangs there, with no useful response, like below:
[root@centostest ~]# fly -t example tj -j pipeline-01/hello-world-job --watch
started pipeline-01/hello-world-job #3
Theoretically, it should print something like this:
Cloning into '/tmp/build/get'...
Fetching HEAD
292c84b change task name
initializing
running echo hello world
hello world
succeeded
Where did I go wrong? Thanks.
Welcome to Concourse!
One thing that can be confusing when starting with Concourse is understanding when Concourse detects that the pipeline has changed and what happens if the pipeline is one file or multiple files.
Your pipeline (like the majority of real-world pipelines) is "nested": the main pipeline file 2.yaml refers to a task file named my-git-repo/ci/task-hello-world.yml.
What sets Concourse apart from other CI systems is that:
1. the main pipeline file (2.yaml) can reside anywhere, including in a different repository.
2. Due to 1, Concourse is unable to detect a change to the main pipeline file; you have to tell Concourse that the file has changed, either with fly set-pipeline or with automatic means such as the concourse-pipeline-resource.
So the following errors happen often:
Changing the main pipeline file, committing and pushing, and expecting Concourse to pick up the change. Missing: you have to do fly set-pipeline.
Once doing fly set-pipeline becomes second nature, you can stumble upon the opposite error: changing both the main pipeline file and the nested task file, doing set-pipeline, but not pushing. In this case, the only changes picked up by Concourse will be the ones to the main pipeline file, not to the task file. Missing: commit and push.
From the description of your problem, I have the feeling that it is a mixture of the gotchas I mentioned.

How to pass variables to Buildbot?

I'm using Buildbot V.0.9.0rc3
My Buildbot triggers when I send a change via the command line or when I receive an HTTP POST request at the correct address.
Currently I'm sending changes to Buildbot in two different ways:
$ buildbot sendchange -m localhost:9999 -a example-user:pass -W me -C default
or
curl -X POST -d author=aalvz -d comments=mycomment -d project=my_project -d category=default -d repository=some http://192.168.33.20:8020/change_hook/base
My schedulers are defined like this:
c['schedulers'].append(schedulers.SingleBranchScheduler(
    name="waiter",
    builderNames=["runtests"],
    change_filter=util.ChangeFilter(category='default')))
c['www'] = dict(port=8020,
                plugins=dict(waterfall_view={}, console_view={}),
                change_hook_dialects={
                    'base': True,
                    'somehook': {'option1': True,
                                 'option2': False}})
And my step in the factory that clones a repo looks like this:
factory.addStep(steps.Git(repourl='git@github.com:AAlvz/my_repo.git', mode='full', workdir='newFolder', branch='my_branch', submodules=True, clobberOnFailure=True))
I would like to receive a POST with some data and use that data to trigger different commands, something like this (using $ to make the variables noticeable):
factory.addStep(steps.Git(repourl=$myjson.name, mode='full', workdir=$myjson.path, branch=$myjson.branch, submodules=True, clobberOnFailure=True))
That way I could send a JSON like:
{myjson: {name: github/myrepo.git, path: /tmp/my/path, branch: my_branch}}
and be able to clone the repository provided by the JSON.
Thanks in advance! I hope the question is clear enough. I can provide logs or any needed configuration.
This is solved using Buildbot Properties.
You can send them via the command line (with PBChangeSource) using the flag:
buildbot sendchange ... --properties=my_property:myvalue
The flag can be used multiple times if multiple properties are needed.
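To consume those properties inside the factory, one option is util.Interpolate. The sketch below is only an illustration against the Git step from the question; the property names repourl and branch are hypothetical and must match whatever you send with --properties or in the change hook payload.
from buildbot.plugins import steps, util

# %(prop:NAME)s is rendered at build time with the value of that property.
factory.addStep(steps.Git(
    repourl=util.Interpolate('%(prop:repourl)s'),
    branch=util.Interpolate('%(prop:branch)s'),
    mode='full',
    workdir='newFolder',
    submodules=True,
    clobberOnFailure=True))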

Download latest GitHub release

I'd like to have "Download Latest Version" button on my website which would represent the link to the latest release (stored at GitHub Releases). I tried to create release tag named "latest", but it became complicated when I tried to load new release (confusion with tag creation date, tag interchanging, etc.). Updating download links on my website manually is also a time-consuming and scrupulous task. I see the only way - redirect all download buttons to some html, which in turn will redirect to the actual latest release.
Note that my website is hosted at GitHub Pages (static hosting), so I simply can't use server-side scripting to generate links. Any ideas?
You don't need any scripting to generate a download link for the latest release. Simply use this format:
https://github.com/:owner/:repo/zipball/:branch
Examples:
https://github.com/webix-hub/tracker/zipball/master
https://github.com/iDoRecall/selection-menu/zipball/gh-pages
If for some reason you want to obtain a link to the latest release download, including its version number, you can obtain that from the get latest release API:
GET /repos/:owner/:repo/releases/latest
Example:
$.get('https://api.github.com/repos/idorecall/selection-menu/releases/latest', function (data) {
    $('#result').attr('href', data.zipball_url);
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<a id="result">Download latest release (.ZIP)</a>
GitHub now provides a "Latest release" button on the release page of a project, after you have created your first release.
In the example you gave, this button links to https://github.com/reactiveui/ReactiveUI/releases/latest
You can use the following, where:
${Organization} is the GitHub user or organization
${Repository} is the repository name
curl -L https://api.github.com/repos/${Organization}/${Repository}/tarball > ${Repository}.tar.gz
The top-level directory in the .tar.gz file has the SHA hash of the commit in the directory name, which can be a problem if you need an automated way to change into the resulting directory and do something.
The method below will strip this out, and leave the files in a folder with a predictable name.
mkdir ${Repository}
curl -L https://api.github.com/repos/${Organization}/${Repository}/tarball | tar -zxv -C ${Repository} --strip-components=1
Since February 18th, 2015, the GitHub V3 Releases API has a get latest release endpoint.
GET /repos/:owner/:repo/releases/latest
See also "Linking to releases".
Still, the name of the asset can be tricky.
Git-for-Windows, for instance, requires a command like:
curl -IkLs -o NUL -w %{url_effective} \
https://github.com/git-for-windows/git/releases/latest|\
grep -o "[^/]*$"| sed "s/v//g"|\
xargs -I T echo \
https://github.com/git-for-windows/git/releases/download/vT/PortableGit-T-64-bit.7z.exe \
-o PortableGit-T-64-bit.7z.exe| \
sed "s/.windows.1-64/-64/g"|sed "s/.windows.\(.\)-64/.\1-64/g"|\
xargs curl -kL
The first 3 lines extract the latest version, 2.35.1.windows.2.
The rest builds the right URL:
https://github.com/git-for-windows/git/releases/download/
v2.35.1.windows.2/PortableGit-2.35.1.2-64-bit.7z.exe
^^^^^^^^^^^^^^^^^ ^^^^^^^^^
Maybe you could use some client-side scripting and dynamically generate the target of the link by invoking the GitHub API, through some jQuery magic?
The Releases API exposes a way to retrieve the list of all the releases of a repository. For instance, this link returns a JSON-formatted list of all the releases of the ReactiveUI project.
Extracting the first one would return the latest release.
Within this payload:
The html_url attribute will hold the first part of the URL to build (i.e. https://github.com/{owner}/{repository}/releases/{version}).
The assets array will list the downloadable archives. Each asset bears a name attribute.
Building the target download URL is only a few string operations away:
Insert the download/ keyword between the releases/ segment of the html_url and the version number
Append the name of the asset to download
The resulting URL will be of the following format: https://github.com/{owner}/{repository}/releases/download/{version}/name_of_asset
For instance, regarding the JSON payload from the ReactiveUI link above, we've got html_url: "https://github.com/reactiveui/ReactiveUI/releases/5.99.0" and one asset with name: "ReactiveUI.6.0.Preview.1.zip".
As such, the download URL is https://github.com/reactiveui/ReactiveUI/releases/download/5.99.0/ReactiveUI.6.0.Preview.1.zip
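A rough Python sketch of those string operations, assuming only the standard library and the ReactiveUI repository used as the example above:
import json
import urllib.request

api = "https://api.github.com/repos/reactiveui/ReactiveUI/releases"
with urllib.request.urlopen(api) as resp:
    latest = json.load(resp)[0]  # the first entry is the most recent release

# html_url ends with the version segment, e.g. .../releases/5.99.0
version = latest["html_url"].rsplit("/", 1)[-1]
for asset in latest["assets"]:
    print("https://github.com/reactiveui/ReactiveUI/releases/download/"
          + version + "/" + asset["name"])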
If you are using PHP, try the following code:
function getLatestTagUrl($repository, $default = 'master') {
    $file = @json_decode(@file_get_contents("https://api.github.com/repos/$repository/tags", false,
        stream_context_create(['http' => ['header' => "User-Agent: Vestibulum\r\n"]])
    ));
    return sprintf("https://github.com/$repository/archive/%s.zip", $file ? reset($file)->name : $default);
}
Function usage example
echo '<a href="' . getLatestTagUrl(':owner/:repo') . '">Download</a>';
As I didn't see this answer here, and it was quite helpful for me while running continuous integration tests: this one-liner, which only requires you to have curl, lets you search a GitHub repo's releases and download the latest version
https://gist.github.com/steinwaywhw/a4cd19cda655b8249d908261a62687f8
I use it to run PHPStan on our repository using the following script:
https://gist.github.com/rvanlaak/7491f2c4f0c456a93f90e31774300b62
If you are trying to download from any Linux system, even old or tiny versions, or are trying to download from a bash script, then the failproof way is to use this command:
wget https://api.github.com/repos/$OWNER/$REPO/releases/latest -O - | awk -F \" -v RS="," '/browser_download_url/ {print $(NF-1)}' | xargs wget
Do not forget to replace $OWNER and $REPO with the right owner and repository names. The command downloads a JSON page with the data of the latest release, then awk gets the value of the browser_download_url key.
If you are on a really old Linux or a tiny embedded system with a small wget, the download name can be a problem. In such a case you can always use the ultra-reliable:
URL=$(wget https://api.github.com/repos/$OWNER/$REPO/releases/latest -O - | awk -F \" -v RS="," '/browser_download_url/ {print $(NF-1)}'); wget $URL -O $(basename "$URL")
As noted by @Dan Dascalescu in a comment on the accepted answer, there are some projects (roughly 30%) which do not bother to file formal releases, so neither the "Latest release" button nor the /releases/latest API call would return useful data.
To reliably fetch the latest release for a GitHub project, you can use lastversion.