Run a MapReduce job via the REST API

I use the Hadoop 2.7.1 REST APIs to run a MapReduce job from outside the cluster. This example, "http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api", really helped me. But when I submit a POST request, something strange happens:
When I look at "http://master:8088/cluster/apps", a single POST request has produced two jobs, as in the following picture:
[Picture: one request produces two jobs]
After a long wait, the job that I defined in the HTTP request body fails with a FileAlreadyExistsException. The reason is that the other job has already created the output directory, so I get "Output directory hdfs://master:9000/output/output16 already exists".
This is my request body:
{
"application-id": "application_1445825741228_0011",
"application-name": "wordcount-demo",
"am-container-spec": {
"commands": {
"command": "{{HADOOP_HOME}}/bin/hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /data/ /output/output16"
},
"environment": {
"entry": [{
"key": "CLASSPATH",
"value": "{{CLASSPATH}}<CPS>./*<CPS>{{HADOOP_CONF_DIR}}<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/*<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/lib/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/lib/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/lib/*<CPS>./log4j.properties"
}]
}
},
"unmanaged-AM": false,
"max-app-attempts": 2,
"resource": {
"memory": 1024,
"vCores": 1
},
"application-type": "MAPREDUCE",
"keep-containers-across-application-attempts": false
}
and this is my command:
curl -i -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' 'http://master:8088/ws/v1/cluster/apps?user.name=hadoop' -d @post-json.txt
Can anybody help me? Thanks a lot.

Before you run a MapReduce job, make sure the output folder does not exist, because the job will not run if it is present. You can either write a program that deletes the folder if it exists, or delete it manually before calling the REST API. Hadoop enforces this check to prevent data loss, so that one job does not overwrite the output of another.
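Deleting the output folder before the submit call could be sketched in Python as follows. This is only an illustration under assumptions that are not in the question: WebHDFS is enabled on the cluster, the NameNode HTTP endpoint is http://master:50070 (the Hadoop 2.x default port), and the helper names are made up for this example.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def webhdfs_delete_url(namenode, hdfs_path, user, recursive=True):
    """Build the WebHDFS URL that deletes hdfs_path (op=DELETE)."""
    query = urlencode({
        "op": "DELETE",
        "recursive": "true" if recursive else "false",
        "user.name": user,
    })
    return f"{namenode}/webhdfs/v1{quote(hdfs_path)}?{query}"


def delete_output_dir(namenode, hdfs_path, user):
    """Send the DELETE request and return the raw JSON body of the reply."""
    request = Request(webhdfs_delete_url(namenode, hdfs_path, user), method="DELETE")
    with urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Assumed NameNode HTTP address; adjust host and port to your cluster.
    print(webhdfs_delete_url("http://master:50070", "/output/output16", "hadoop"))
```

Calling delete_output_dir right before the POST to /ws/v1/cluster/apps removes a leftover directory from an earlier run. It does not explain why one request shows up as two jobs.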

GitHub Actions: How to access the log of the current build from the terminal

I'm trying to get familiar with GitHub Actions. I have configured my workflow so that every time I push my code to GitHub, it is automatically built and pushed to Heroku.
How can I access the build log information in the terminal without going to github.com?
With the latest cli/cli tool named gh (1.9.0+), you can simply do
(from your terminal, without going to github.com):
gh run view <run-id> --log
# or
gh run view <run-id> --log-failed
See "Work with GitHub Actions in your terminal with GitHub CLI"
With the new gh run list, you receive an overview of all types of workflow runs whether they were triggered via a push, pull request, webhook, or manual event.
To drill down into the details of a single run, you can use gh run view, optionally going into as much detail as the individual steps of a job.
For more mysterious failures, you can combine a tool like grep with gh run view --log to search across a run’s entire log output.
If --log is too much information, gh run view --log-failed will output only the log lines for individual steps that failed.
This is great for getting right to the logs for a failed step instead of having to run grep yourself.
And with GitHub CLI 2.4.0 (Dec. 2021), gh run list comes with a --json flag for JSON export.
Alternatively, use the REST API:
curl \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/repos/<github-user>/<repository>/actions/workflows/<workflow.yaml>/runs
See https://docs.github.com/en/free-pro-team@latest/rest/reference/actions#list-workflow-runs
This returns JSON with the following structure:
{
"total_count": 1,
"workflow_runs": [
{
"id": 30433642,
"node_id": "MDEyOldvcmtmbG93IFJ1bjI2OTI4OQ==",
"head_branch": "master",
"head_sha": "acb5820ced9479c074f688cc328bf03f341a511d",
"run_number": 562,
"event": "push",
"status": "queued",
"conclusion": null,
"workflow_id": 159038,
"url": "https://api.github.com/repos/octo-org/octo-repo/actions/runs/30433642",
"html_url": "https://github.com/octo-org/octo-repo/actions/runs/30433642",
"pull_requests": [],
"created_at": "2020-01-22T19:33:08Z",
"updated_at": "2020-01-22T19:33:08Z",
"jobs_url": "https://api.github.com/repos/octo-org/octo-repo/actions/runs/30433642/jobs",
"logs_url": "https://api.github.com/repos/octo-org/octo-repo/actions/runs/30433642/logs",
"check_suite_url": "https://api.github.com/repos/octo-org/octo-repo/check-suites/414944374",
"artifacts_url": "https://api.github.com/repos/octo-org/octo-repo/actions/runs/30433642/artifacts",
"cancel_url": "https://api.github.com/repos/octo-org/octo-repo/actions/runs/30433642/cancel",
"rerun_url": "https://api.github.com/repos/octo-org/octo-repo/actions/runs/30433642/rerun",
"workflow_url": "https://api.github.com/repos/octo-org/octo-repo/actions/workflows/159038",
"head_commit": {...},
"repository": {...},
"head_repository": {...}
}
]
}
Access the jobs_url with a PAT that has repository admin rights.
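To pick the right jobs_url out of that response, something like the following Python sketch may help. It only parses the JSON shown above; the function name and the trimmed sample data are illustrative, and fetching the URL with your token is left out.

```python
import json


def latest_run_urls(response_text):
    """Return (run id, jobs_url, logs_url) of the newest run, or None if there are no runs."""
    runs = json.loads(response_text).get("workflow_runs", [])
    if not runs:
        return None
    # created_at is an ISO 8601 UTC timestamp, so string comparison orders it correctly.
    latest = max(runs, key=lambda run: run["created_at"])
    return latest["id"], latest["jobs_url"], latest["logs_url"]


if __name__ == "__main__":
    sample = json.dumps({
        "total_count": 1,
        "workflow_runs": [{
            "id": 30433642,
            "created_at": "2020-01-22T19:33:08Z",
            "jobs_url": "https://api.github.com/repos/octo-org/octo-repo/actions/runs/30433642/jobs",
            "logs_url": "https://api.github.com/repos/octo-org/octo-repo/actions/runs/30433642/logs",
        }],
    })
    print(latest_run_urls(sample))
```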

Cloud Functions REST API: Creating a new action from a zip file

I'm trying to create a Node.js function from a zip file through the REST API, using the following curl command:
curl --request PUT --url 'https://my:credentials@openwhisk.eu-gb.bluemix.net/api/v1/namespaces/mynamespace/actions/my_action_name?overwrite=true' --header 'accept: application/json' --header 'content-type: application/json' --data '{"annotations":[{"key":"exec","value":"nodejs:10"},{"key":"web-export","value":true}],"exec":{"kind":"nodejs:10","init":"./action.zip"},"parameters":[{"key":"message","value":"Hello World"}]}'
As a result I get an error:
"error":"The request content was malformed:\n'code' must be a string or attachment object defined in 'exec' for 'nodejs:10' actions"
Is it possible to get an example of how to create a new action from a zip file and through the REST API? Thank you.
You have to base64-encode your .zip file and then pass it as the code parameter. I have written a shell script that encodes the file and creates an action called 'action'. Save the script as create.sh and run it with ./create.sh
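#!/bin/sh
ACTION=action
ZIP=$ACTION.zip
# Base64-encode the zip (tr strips the line wraps some base64 tools add), quote it as a JSON string,
# let jq build the request body around it, and PUT the body read from stdin (-d @-).
base64 "$ZIP" | tr -d '\n' | echo "\"$(cat)\"" | jq "{namespace:\"_\", name:\"$ACTION\", exec:{kind:\"nodejs:10\", code:., binary:true, main:\"main\"}}" | curl -X PUT -H "Content-Type:application/json" -d @- "https://USERNAME:PASSWORD@openwhisk.ng.bluemix.net/api/v1/namespaces/_/actions/$ACTION?overwrite=true"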
Complete code
app.js or index.js code
function myAction(args) {
const leftPad = require("left-pad")
const lines = args.lines || [];
return { padded: lines.map(l => leftPad(l, 30, ".")) }
}
exports.main = myAction;
package.json
{
"name": "node",
"version": "1.0.0",
"description": "",
"main": "app.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "",
"license": "ISC",
"dependencies": {
"left-pad" : "1.1.3"
}
}
Run npm install, then zip the files with zip -r action.zip *.
To test the action
ibmcloud fn action invoke --result action --param lines "[\"and now\", \"for something completely\", \"different\" ]"
The REST API for creating or updating a Cloud Functions action is documented in the IBM Cloud Functions API docs. A good way to find out the exact curl / request syntax is to use the IBM Cloud Functions CLI in verbose mode (-v). The CLI is just a wrapper around the REST API, and in verbose mode all the REST details are printed.
Here is the relevant part of what might be printed:
Req Body
Body exceeds 1000 bytes and will be truncated
{"namespace":"_","name":"mytest/myaction","exec":{"kind":"nodejs:8","code":"UEsDBBQAAAAIAHJPhEzjlkxc8wYAAH8VAAALABwAX19tYWluX18ucHlVVAkAA+iFxFrohcRadXgLAAEE9AEAAAT0AQAAxVhtb9s2EP6uX8HRCCLBipb002DA69YkbYo17dZ0GwbDMGSKlrXJokfSToNh/313R+rNL2labJiK1iJ578/x7tTBgJ7A/QzYq8IuN3NmdbpYFIIZm9rC2EKYmiIYsB+1ynW6Ykqz1y9u2WWpNhl7uamELVTFrGJClaUUtha2LeQ9S6uMiVJVspYNgnDPWKVhb5lalqU2ZUXFUqZlmaKwtKTNeWpkzKp0JcsHdj
You need to set the binary field to true and include the zip content as code. The curl docs suggest using @filename to reference your zip file:
If you want the contents to be read from a file, use <@filename> as
contents.
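The request body that both answers describe (base64-encoded zip as code, binary set to true) could also be built in Python. This is a minimal sketch: the function name is hypothetical, the field names are taken from the script and the verbose CLI output above, and sending the body with PUT is left to the HTTP client of your choice.

```python
import base64
import json


def build_action_body(zip_bytes, name, kind="nodejs:10", main="main"):
    """Return the JSON-serialisable body for creating an action from zip content."""
    code = base64.b64encode(zip_bytes).decode("ascii")  # one line, no wrapping
    return {
        "namespace": "_",
        "name": name,
        "exec": {"kind": kind, "code": code, "binary": True, "main": main},
    }


if __name__ == "__main__":
    # In practice: zip_bytes = open("action.zip", "rb").read()
    body = build_action_body(b"PK\x03\x04", "action")
    print(json.dumps(body))
```

Because base64.b64encode never inserts line breaks, the resulting string is always a valid JSON string value.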

Why does the Google Drive REST API files.list not return all the files?

I am using the following curl command to retrieve all my Google Drive files; however, it lists only a small part of them. Why?
curl -H "Authorization: Bearer ya29.hereshouldbethemaskedaccesstokenvalue" https://www.googleapis.com/drive/v3/files
Result:
{
"kind": "drive#fileList",
"incompleteSearch": false,
"files": [
{
"kind": "drive#file",
"id": "2fileidxxxxxxxx",
"name": "testnum",
"mimeType": "application/vnd.google-apps.folder"
},
{
"kind": "drive#file",
"id": "1fileidxxxxxxx",
"name": "test2.txt",
...
}
The token scope includes:
https://www.googleapis.com/auth/drive.file
https://www.googleapis.com/auth/drive.appdata
I face the same issue when using the Android SDK.
Any help would be appreciated.
Results from files.list are paginated. Your response should include a "nextPageToken" field, and you have to make another call, passing that token as pageToken, to get the next page of results. See the documentation for the files.list call. You may want to use one of the client libraries to make this call (see the examples at the bottom of that page).
I had the same problem when trying to list the files in a Google Drive folder. The folder has more than 5000 files, but the API returned only two of them. The problem is that when the files in a folder are shared as "anyone with the link", they are not actually shared with you until you open them. The owner of the folder must add you explicitly as a viewer.
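The paging loop could look roughly like this in Python. The sketch takes the page fetcher as a parameter so the loop can be tried without network access; make_drive_fetcher and list_all_files are names made up for this example, and the fetcher assumes a valid access token.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

FILES_URL = "https://www.googleapis.com/drive/v3/files"


def make_drive_fetcher(access_token):
    """Return a function that fetches one page of files.list results."""
    def fetch_page(page_token):
        params = {"pageToken": page_token} if page_token else {}
        url = FILES_URL + ("?" + urlencode(params) if params else "")
        request = Request(url, headers={"Authorization": f"Bearer {access_token}"})
        with urlopen(request) as response:
            return json.load(response)
    return fetch_page


def list_all_files(fetch_page):
    """Follow nextPageToken until the API stops returning one."""
    files = []
    token = None
    while True:
        page = fetch_page(token)
        files.extend(page.get("files", []))
        token = page.get("nextPageToken")
        if not token:
            return files


if __name__ == "__main__":
    # Offline demo with two fake pages instead of a real token.
    fake_pages = {
        None: {"files": [{"name": "testnum"}], "nextPageToken": "page2"},
        "page2": {"files": [{"name": "test2.txt"}]},
    }
    print([f["name"] for f in list_all_files(fake_pages.get)])
```

With a real token, list_all_files(make_drive_fetcher(token)) would walk every page that the token's scopes allow it to see.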

REST API testing from the command line

I am preparing an SDK which, as of now, does not have a separate CI system.
I want to test some REST endpoints that should be available when a user uses the SDK to create software and runs it with our framework.
I have written all the manual steps in a shell script and plan to run the script from crontab every few hours.
For REST endpoint testing, I was thinking of just using curl and checking whether we get data back, but this can turn into a lot of work as we expand the functionality. I looked into the frisby framework, which more or less suits my needs.
Is there any recommendation for testing REST services once the framework software is started?
swat is probably exactly what you need. Reasons:
it is a DSL for test automation of web and REST services
it uses the curl command line API to create HTTP requests
it is both a DSL and a command line tool to run test scenarios written in the DSL
it is configurable both from bash-style scripts and general configs
it is very easy to start with
in your case, curl-based test cases could probably be converted into the swat DSL format easily
(*) disclosure: I am the author of swat.
I have created a very small bash script to test JSON APIs, which might be useful. It depends on jq and curl: curl for making requests and jq for JSON processing. It is designed to test JSON APIs only.
Link: api-test
Every API call you want to run is stored in a JSON file in the format below:
{
"name": "My API test",
"testCases": {
"test_case_1": {
"path": "/path_1",
"method": "POST",
"description": "Best POST api",
"body": {
"value": 1
},
"header": {
"X-per": "1"
}
}
},
"url": "http://myapi.com"
}
To run a test case:
api-test -f test.json run test_case_1
api-test -f test.json run all # run all API calls at once
It will produce output in an organized way:
Running Case: test_case_1
Response:
200 OK
{
"name": "Ram",
"full_name": "Ram Shah"
}
META:
{
"ResponseTime": "0.078919s",
"Size": "235 Bytes"
}
It also supports automated testing of API with jq JSON comparison and normal equality/subset comparisons.
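To give an idea of what a subset comparison on a JSON response means, here is a small Python sketch. It is not how api-test implements it; is_subset is a hypothetical helper written only for illustration.

```python
def is_subset(expected, actual):
    """True if every key/value in expected also appears in actual (recursively)."""
    if isinstance(expected, dict):
        return isinstance(actual, dict) and all(
            key in actual and is_subset(value, actual[key])
            for key, value in expected.items()
        )
    if isinstance(expected, list):
        # Each expected item must match at least one item of the actual list.
        return isinstance(actual, list) and all(
            any(is_subset(item, candidate) for candidate in actual)
            for item in expected
        )
    return expected == actual


if __name__ == "__main__":
    response = {"name": "Ram", "full_name": "Ram Shah"}
    print(is_subset({"name": "Ram"}, response))
    print(is_subset({"name": "Shyam"}, response))
```

A check like this keeps tests stable when the API adds new fields, since only the fields you list are compared.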

Track the number of downloads of a release (binaries) on GitHub

Now you can manage and publish your binaries directly on GitHub; the feature is back as of early this month (source).
I've been looking around the GitHub interface and I haven't seen a download tracker. This is a feature Google Code offers, and I was wondering if GitHub has the same.
Please note that I am not interested in the number of downloads of a repo; that is a different topic.
Based on Petros' answer, I used the following two curl commands.
To get the list of all releases, including their ids and number of downloads:
curl -i https://api.github.com/repos/:owner/:repo/releases -H "Accept: application/vnd.github.manifold-preview+json"
For example, to list all the releases of the OpenRefine project:
curl -i https://api.github.com/repos/openrefine/openrefine/releases -H "Accept: application/vnd.github.manifold-preview+json"
Then, to get details on each release asset (you will need to run the first query to get the asset id):
curl -i https://api.github.com/repos/:owner/:repo/releases/assets/:asset_id -H "Accept: application/vnd.github.manifold-preview+json"
With the same example, to list the details, including the download count, for google-refine-2.5-r2407.zip:
curl -i https://api.github.com/repos/openrefine/openrefine/releases/assets/6513 -H "Accept: application/vnd.github.manifold-preview+json"
You can use the GitHub API to get the download_count among other things for a single release asset:
http://developer.github.com/v3/repos/releases/#get-a-single-release-asset
This is how it looks currently, but please check the link above just in case anything changed since this answer was written.
GET /repos/:owner/:repo/releases/assets/:id
{
"url": "https://api.github.com/repos/octocat/Hello-World/releases/assets/1",
"id": 1,
"name": "example.zip",
"label": "short description",
"state": "uploaded",
"content_type": "application/zip",
"size": 1024,
"download_count": 42,
"created_at": "2013-02-27T19:35:32Z",
"updated_at": "2013-02-27T19:35:32Z"
}
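If you want a total per release rather than per asset, the release list response can be summed up as in the Python sketch below. It assumes the response of GET /repos/:owner/:repo/releases, where each release carries a tag_name and an assets array whose entries have a download_count; the function name and sample data are illustrative.

```python
def downloads_per_release(releases):
    """Map each release tag to the summed download_count of its assets."""
    return {
        release["tag_name"]: sum(asset["download_count"] for asset in release.get("assets", []))
        for release in releases
    }


if __name__ == "__main__":
    # Trimmed, made-up sample in the shape of the release list response.
    sample = [
        {"tag_name": "v1.0", "assets": [
            {"name": "example.zip", "download_count": 42},
            {"name": "example.tar.gz", "download_count": 8},
        ]},
        {"tag_name": "v0.9", "assets": []},
    ]
    print(downloads_per_release(sample))
```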
You can add a badge to your GitHub repo. See this answer for more details.
Also, there is a nifty project that shows all of this data on a nice website: https://www.somsubhra.com/github-release-stats/