Spark History UI is available but tracking UI errors for in-progress applications - knox-gateway

I've inherited a cluster that uses Knox, and I'm trying to figure out why the Spark History Server is available for completed Spark jobs while the Spark UI is not available for in-progress Spark applications.
In the YARN UI (which is exposed via Knox) there are 5 completed YARN applications and 1 in-progress YARN application, all of them Spark applications. In the Tracking UI column the available links are:
https://my-knox-endpoint/gateway/my-cluster/yarn/proxy/application_1580137635209_0001
https://my-knox-endpoint/gateway/my-cluster/yarn/proxy/application_1580137635209_0002
https://my-knox-endpoint/gateway/my-cluster/yarn/proxy/application_1580137635209_0003
https://my-knox-endpoint/gateway/my-cluster/yarn/proxy/application_1580137635209_0004
https://my-knox-endpoint/gateway/my-cluster/yarn/proxy/application_1580137635209_0005
https://my-knox-endpoint/gateway/my-cluster/yarn/proxy/application_1580137635209_0006
The five links pertaining to the completed jobs all successfully bring up the Spark History Server UI for those jobs. If I run cat ${GATEWAY_HOME}/logs/gateway-audit.log I can see the following appear whenever I hit any of those five links:
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0001|unavailable|Request method: GET
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0001|unavailable|Request method: GET
20/01/27 15:50:55 ||55bef3f3-a52f-4790-97d0-bd6e5076a293|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0001|success|Response status: 302
20/01/27 15:50:55 |||audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.229|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0001|success|Response status: 302
20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||access|uri|/gateway/my-cluster-name/sparkhistory/history/application_1580137635209_0001/1|unavailable|Request method: GET
20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||dispatch|uri|http://my-cluster-name-m:18080/history/application_1580137635209_0001/1/|unavailable|Request method: GET
20/01/27 15:50:55 ||f7617e15-3bf4-4a8c-9701-9785894d7884|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.0.234|SPARKHISTORYUI||||dispatch|uri|http://my-cluster-name-m:18080/history/application_1580137635209_0001/1/|success|Response status: 302
and lots and lots of other log records for Spark History UI resources. All good. Notice the 302 (redirect) records.
However, if I hit the link for the in-progress application I get sent to http://my-cluster-name-m:18080/history/application_1580137635209_0006/1 (the cluster master node) and an error page is displayed.
In the logs I see:
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0006|unavailable|Request method: GET
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0006|unavailable|Request method: GET
20/01/27 15:58:38 ||aec261d3-7ecc-43a7-8815-d7185ee13833|audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||dispatch|uri|http://my-cluster-name-m:8088/proxy/application_1580137635209_0006|success|Response status: 200
20/01/27 15:58:38 |||audit|109.231.200.210, 165.225.80.109, 34.102.220.138, 130.211.1.130|YARNUI||||access|uri|/gateway/my-cluster-name/yarn/proxy/application_1580137635209_0006|success|Response status: 200
Notice there are no 302 records there.
Edit: Since originally posting this I have noticed that if I click on the Tracking UI link immediately after the application starts, I am taken to the details of the YARN application. A few seconds later, clicking the same link takes me to the error described above.
I'm a bit lost at this point. Can anyone help explain why I can't view the Spark UI for in-progress applications? Any pointers as to how I can diagnose would be welcomed.

OK, the answer is rather embarrassing. The cause was simply that the Spark UI was not enabled. Setting spark.ui.enabled to true solved this particular problem.
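For anyone hitting the same symptom, the property can be set cluster-wide or per application (the property name is standard Spark; the file location varies by distribution):

```
# spark-defaults.conf (cluster-wide)
spark.ui.enabled    true
```

Alternatively, pass it for a single job at submit time with spark-submit --conf spark.ui.enabled=true.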


Significance of the generic pull_request event and other more specific pull_request events like pull_request.opened

I am developing a GitHub App using Node.js and the Probot framework. I can see that the Application class (https://probot.github.io/api/latest/classes/application.html) of the Probot framework handles events like:
> event: "pull_request" | "pull_request.assigned" |
> "pull_request.closed" | "pull_request.edited" | "pull_request.labeled"
> | "pull_request.opened" | "pull_request.reopened" |
> "pull_request.review_request_removed" |
> "pull_request.review_requested" | "pull_request.unassigned" |
> "pull_request.unlabeled" | "pull_request.synchronize"
I have noticed that when the "Create pull request" button is clicked, both the pull_request and pull_request.opened events are fired.
In order to understand this behavior of firing multiple seemingly similar events from the same click, I reopened a closed pull request and printed the Context object for both the pull_request event and the pull_request.reopened event. A diff comparison showed that the contexts returned by the two events are identical, except that the context of the pull_request event contained these additional properties:
merged: false,
mergeable: null,
rebaseable: null,
mergeable_state: 'unknown',
merged_by: null,
comments: 6,
review_comments: 0,
maintainer_can_modify: false,
commits: 1,
additions: 1,
deletions: 0,
changed_files: 1 },
repository:
{ id: 123456789,
node_id: '',
name: '',
full_name: '',
private: true,
owner: [Object],
html_url: 'some-url-here'
.
.
///////////////////--------many more urls-------//////////////////////
created_at: '2020-04-0',
updated_at: '2020-04-0',
We know that the general format of the returned context object is as follows:
Context {
name: 'pull_request',
id: '187128937812-8219-89891892133-16752-1234576765545',
payload:
{ action: 'reopened',
number: 1,
pull_request:
{ url:
.
.
.and so on.......
This information is present in both contexts. We can also see which specific action was performed, via context.payload.action. So, if someone needs to get hold of pull_request.opened, they could do it just by using the pull_request event as follows:
app.on('pull_request', async context => {
  console.log('---------------------- on pull_request event')
  console.log('Context returned :----', context)
})
And they wouldn't need to care about the more specific events (here pull_request.opened); i.e., beyond what is achieved by the code above, the code below would provide no real additional help:
app.on('pull_request.opened', async context => {
  console.log('---------------------- on pull_request.opened')
  console.log('Context returned :----', context)
})
So here is the question that's troubling me :
What is the purpose of the pull_request event, if its more specific forms (like pull_request.reopened) carry no different information (more precisely, if their contexts contain no different information)?
I am quite sure there is some wisdom behind it, but I can't find anything on the internet, and nothing in the docs, that explains this.
Please help me understand the hidden wisdom.
EDIT 1:
Forgot to mention one observation: reopening the pull request also triggers the issue_comment.created event. So three events are triggered by a single action.
> What is the purpose of the pull_request event, if its more specific forms (like pull_request.reopened) carry no different information?

This is just a feature of Probot to simplify processing webhook events from GitHub. I'll try to explain why it's helpful.
If you were to consume webhook events without Probot, you would have to parse every pull_request event, check its action field, and decide whether to handle it.
There are several events that have a top-level action field in the payload, including:
check_run
issue
project
pull_request
and many more in the docs...
Rather than make application developers perform this parsing and inspection of the JSON themselves, Probot simplifies the callbacks: you subscribe to webhooks using the specific [event].[action] pattern, and the framework takes care of invoking your callback when a matching event and action is received.
So you have two options for handling pull_request events:
if you don't know ahead of time which events you need, or need to process events dynamically, subscribing to pull_request is how you receive all pull request events
if you know which actions you should handle and can ignore the rest, subscribing to the explicit pull_request.[action] form should simplify your application code
You could also subscribe to *, which represents all events the Probot app receives, rather than explicitly listing every supported event in your app.
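The dispatch behavior described above can be sketched generically. This is an illustration of the [event].[action] idea, not Probot's actual implementation (Probot is JavaScript; the class and method names below are invented for the sketch):

```python
# Sketch of "[event].[action]" dispatch: one incoming webhook invokes
# the wildcard handlers, the bare-event handlers, and the
# event.action handlers, in that order.
from collections import defaultdict

class Dispatcher:
    def __init__(self):
        self.handlers = defaultdict(list)

    def on(self, key, handler):
        """Subscribe to '*', 'pull_request', or 'pull_request.opened'."""
        self.handlers[key].append(handler)

    def receive(self, event, payload):
        """Deliver one webhook; return the subscription keys that fired."""
        fired = []
        keys = ["*", event]
        if "action" in payload:
            keys.append(f"{event}.{payload['action']}")
        for key in keys:
            for handler in self.handlers[key]:
                handler(payload)
                fired.append(key)
        return fired

d = Dispatcher()
d.on("pull_request", lambda p: None)
d.on("pull_request.opened", lambda p: None)
print(d.receive("pull_request", {"action": "opened"}))
# → ['pull_request', 'pull_request.opened']
```

This also shows why both handlers fire for one click of "Create pull request": a single delivery matches both the generic and the specific subscription.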

Smartsheet API response codes with python SDK?

I am using the Smartsheet Python SDK to do bulk operations (deletes, updates, etc.) on a Smartsheet. As my process becomes more complex, I've realized that I need some internal checks to make sure I am not hitting errors when sending multiple calls per minute, as Smartsheet suggests in their API Best Practices.
My question is this: how do I access and parse the API responses while using SDK functions such as Sheets.delete_rows()? For instance, some of my requests using this function can trigger status: 500 Internal Server Error, which means the request was properly formatted but the operation failed on the Smartsheet end.
I can view these responses in my log file (or in terminal while running interactively) but how do I access them from within my script so I can, for example, sleep() my process for xx seconds if encountering such a response?
If you are looking to know the result of a request, you can store the response in a variable and inspect it to determine the next steps your process should take. In the case of the DELETE Rows request, a Result object is returned:
delete_rows_response = smartsheet_client.Sheets.delete_rows(sheet_id, [row_id1, row_id2, row_id3])
print(delete_rows_response)
If the request was successful the response would look like this:
{"data": [7411278123689860], "message": "SUCCESS", "result": [7411278123689860], "resultCode": 0}
If there was an issue, rows weren't found for example, the response would look like this:
{"result": {"code": 1006, "errorCode": 1006, "message": "Not Found", "name": "ApiError", "recommendation": "Do not retry without fixing the problem. ", "refId": "jv6o8uyrya2s", "shouldRetry": false, "statusCode": 404}}
All of the Smartsheet SDKs will back off and retry by default. Other errors will be thrown as exceptions.
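As an illustration of that back-off-and-retry idea, here is a generic sketch. It is not the Smartsheet SDK's actual implementation; the set of retryable status codes and the function names are assumptions for the example:

```python
import time

RETRYABLE = {429, 500, 502, 503}  # status codes typically worth retrying

def call_with_backoff(request, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Retry `request` with exponential backoff on retryable status codes.

    `request` is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = request()
        if status < 400:
            return status, body
        if status not in RETRYABLE or attempt == max_retries:
            raise RuntimeError(f"request failed with status {status}")
        sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, 8s, ...

# Simulate a server that fails twice with 500, then succeeds.
responses = iter([(500, None), (500, None), (200, "ok")])
delays = []
status, body = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(status, body, delays)  # → 200 ok [1.0, 2.0]
```

Non-retryable errors (like the 404 Not Found in the example payload above, whose shouldRetry is false) are raised immediately rather than retried.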
There is a way to increase the default timeout (to allow more retries) when creating the client. However, the Python-specific way to do it doesn't seem to be documented yet; I'll add it to the queue. In the meantime, I think the Ruby example below will be the closest to how Python probably does it, but you might want to read through the various ways to do this:
C#: https://github.com/smartsheet-platform/smartsheet-csharp-sdk/blob/master/ADVANCED.md#sample-retryhttpclient
Java: https://github.com/smartsheet-platform/smartsheet-java-sdk/blob/master/ADVANCED.md#sample-retryhttpclient
Node.js: https://github.com/smartsheet-platform/smartsheet-javascript-sdk#retry-configuration
Ruby: https://github.com/smartsheet-platform/smartsheet-ruby-sdk#retry-configuration

Using sails.io pubsub works with shortcuts, but not with RESTful record creation

I'm having an issue where I get events from the Sails resourceful pubsub when I create a record with the shortcut routes, but not with the RESTful routes.
In my client code, I have a sails socket that I listen to for the model:
io.socket.on('users', function(event){console.log(event);})
If I use a shortcut route to create a record (http://localhost:1337/users/create?name=test), I get the callback in the console as expected:
>{verb: "created", data: {…}, id: 44}
However, if I use the socket to post from the client, the callback never fires.
io.socket.post('/users', { name: 'test' });
The record is created in the DB, and even more confusing, the Sails server log says it's publishing the message:
silly: Published message to sails_model_create_users : { verb: 'created',
data:
{ name: 'test',
createdAt: '2017-10-09T02:58:18.218Z',
updatedAt: '2017-10-09T02:58:18.218Z',
id: 44 },
id: 44 }
I'm using the generic blueprints; Sails is v0.12.14. Any ideas what I'm doing wrong?
I figured out why I wasn't getting the events back on the pubsub: Sails excludes the socket that sent the request from the event. You can get the result from the post callback, but that breaks my nice dataflow, so I ended up using io.sails.connect() to create two sockets, one for the CRUD calls and one for the pubsub.

RESTful API and real life example

We have a web application (AngularJS and Web API) which has quite a simple functionality - displays a list of jobs and allows users to select and cancel selected jobs.
We are trying to follow RESTful approach with our API, but that's where it gets confusing.
Getting jobs is easy - simple GET: /jobs
How shall we cancel the selected jobs? Bear in mind that this is the only operation on jobs we need to implement. The easiest and most logical approach (to me) is to send the list of selected job IDs to the API (server) and do the necessary procedures there. But that's not the RESTful way.
If we are to do it following the RESTful approach, it seems we need to send a PATCH request to jobs, with JSON similar to this:
PATCH: /jobs
[
  {
    "op": "replace",
    "path": "/jobs/123/status",
    "value": "cancelled"
  },
  {
    "op": "replace",
    "path": "/jobs/321/status",
    "value": "cancelled"
  }
]
That would require generating this JSON on the client, mapping it to some model on the server, parsing the "path" property to get the job ID, and then doing the actual cancellation. This seems very convoluted and artificial to me.
What is the general advice on this kind of operation? I'm curious what people do in real life when a lot of operations can't be simply mapped to RESTful resource paradigm.
Thanks!
If by cancelling a job you mean deleting it then you could use the DELETE verb:
DELETE /jobs?ids=123,321,...
If by cancelling a job you mean setting some status field to cancelled then you could use the PATCH verb:
PATCH /jobs
Content-Type: application/json
[ { "id": 123, "status": "cancelled" }, { "id": 321, "status": "cancelled" } ]
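As a sketch of how little machinery the flat PATCH body above needs on either side (the in-memory job store and field names here are illustrative, not an actual server implementation):

```python
import json

# Illustrative in-memory job store standing in for the server's database.
jobs = {123: {"status": "running"}, 321: {"status": "queued"}}

def build_cancel_patch(ids):
    """Client side: build the PATCH body for the selected job IDs."""
    return json.dumps([{"id": i, "status": "cancelled"} for i in ids])

def apply_patch(body):
    """Server side: apply each partial update to the job store."""
    updated = []
    for change in json.loads(body):
        jobs[change["id"]]["status"] = change["status"]
        updated.append(change["id"])
    return updated

body = build_cancel_patch([123, 321])
print(body)
print(apply_patch(body), jobs[123]["status"])  # → [123, 321] cancelled
```

Compared with the JSON Patch variant, there is no "path" string to generate on one side and re-parse on the other.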
POST for Business Process
POST is often an overlooked solution in this situation. Treating resources as nouns is a useful and common practice in REST, and as such, POST is often mapped to the "CREATE" operation from CRUD semantics - however the HTTP Spec for POST mandates no such thing:
The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics. For example, POST is used for the following functions (among others):
Providing a block of data, such as the fields entered into an HTML form, to a data-handling process;
Posting a message to a bulletin board, newsgroup, mailing list, blog, or similar group of articles;
Creating a new resource that has yet to be identified by the origin server; and
Appending data to a resource's existing representation(s).
In your case, you could use:
POST /jobs/123/cancel
and consider it an example of the first option - providing a block of data to a data-handling process - analogous to HTML forms using POST to submit the form.
With this technique, you could return the job representation in the body, and/or return a 303 See Other status code with the Location header set to /jobs/123.
Some people complain that this looks 'too RPC' - but there is nothing that is not RESTful about it if you read the spec - and personally I find it much clearer than trying to find an arbitrary mapping from CRUD operations to real business processes.
Ideally, if you are concerned with following the REST spec, the URI for the cancel operation should be provided to the client via a hypermedia link in your job representation. e.g. if you were using HAL, you'd have:
GET /jobs/123
{
  "id": 123,
  "name": "some job name",
  "_links": {
    "cancel": {
      "href": "/jobs/123/cancel"
    },
    "self": {
      "href": "/jobs/123"
    }
  }
}
The client could then obtain the href of the "cancel" rel link, and POST to it to effect the cancellation.
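Client-side, following the hypermedia link amounts to reading the rel from _links rather than hard-coding the URI. A minimal sketch (the document shape mirrors the HAL example above; link_href is an invented helper):

```python
import json

# Hypothetical HAL representation of a job, as in the example above.
job_doc = json.loads("""
{
  "id": 123,
  "name": "some job name",
  "_links": {
    "cancel": {"href": "/jobs/123/cancel"},
    "self":   {"href": "/jobs/123"}
  }
}
""")

def link_href(doc, rel):
    """Return the href for a link relation, or None if the server
    did not advertise it (e.g. the job is no longer cancellable)."""
    link = doc.get("_links", {}).get(rel)
    return link["href"] if link else None

print(link_href(job_doc, "cancel"))  # → /jobs/123/cancel
print(link_href(job_doc, "undo"))    # → None
```

A nice side effect: the server can simply omit the "cancel" link for jobs that cannot be cancelled, and a well-behaved client needs no extra logic to respect that.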
Treat Processes as Resources
Another option, depending on whether it makes sense in your domain, is to make a 'cancellation' a noun and associate data with it, such as who cancelled it and when it was cancelled. This is especially useful if a job may be cancelled, reopened, and cancelled again, since the history of changes could be useful business data, or if the act of cancelling is an asynchronous process whose state must be tracked over time. With this approach, you could use:
POST /jobs/123/cancellations
which would "create" a job cancellation - you could then have operations like:
GET /jobs/123/cancellations/1
to return the data associated with the cancellation, e.g.
{
  "cancelledBy": "Joe Smith",
  "requestedAt": "2016-09-01T12:43:22Z",
  "status": "in process",
  "completedAt": null
}
and:
GET /jobs/123/cancellations
to return a collection of cancellations that have been applied to the job and their current status.
Example 1: Let’s compare it with a real-world example. You go to a restaurant, sit at your table, and decide you want ABC. The waiter comes up, takes note of what you want, gets it done in the kitchen, and serves you the food. In this case the waiter is the interface between you and the kitchen: it’s his responsibility to carry the request from you to the kitchen, make sure it gets done, and bring it back to you as a response once it is ready.
Example 2: Another relatable example is travel booking systems. For instance, take Kayak, one of the biggest online sites for booking tickets. You enter your destination, select dates, and click search, and what you get back are results from different airlines. How does Kayak communicate with all these airlines? These airlines must be exposing some level of information to Kayak, and all of that communication happens through APIs.
Example 3: Now open Uber. Once the site is loaded, it gives you the ability to log in or continue with Facebook or Google. In this case Google and Facebook are also exposing some level of user information, under an agreement between Uber and Google/Facebook that has already been made. That’s why it lets you sign up with Google/Facebook.
PUT /jobs{/ids}/status "cancelled"
so for example
PUT /jobs/123,321/status "cancelled"
if you want to cancel multiple jobs. Be aware that a job ID must not contain the comma character.
https://www.rfc-editor.org/rfc/rfc6570#page-25
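A small sketch of expanding that template and enforcing the no-comma constraint (the path shape follows the example above; RFC 6570 non-exploded list expansion joins values with commas, which is why the delimiter must not appear in an ID):

```python
def expand_ids(ids):
    """Expand the /jobs{/ids}/status template with a comma-joined ID list.

    Rejects IDs containing a comma, since the comma is the list
    delimiter in the expanded path (RFC 6570 list expansion).
    """
    ids = [str(i) for i in ids]
    for i in ids:
        if "," in i:
            raise ValueError(f"job id {i!r} contains the delimiter ','")
    return "/jobs/" + ",".join(ids) + "/status"

print(expand_ids([123, 321]))  # → /jobs/123,321/status
```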

How to call DELETE request method in Yesod?

In the book's chapter on routing (http://www.yesodweb.com/book/routing-and-handlers) there's a paragraph:
A separate handler for each request method will be the same, plus a
list of request methods. The request methods must be ALL CAPITAL
LETTERS. For example, /person/#String PersonR GET POST DELETE. In this
case, you would need to define the three handler functions getPersonR,
postPersonR and deletePersonR.
Performing something like
curl -X DELETE localhost:3000/person/1
works, so the server is capable of handling these requests.
Several examples (like https://github.com/snoyberg/haskellers/blob/master/routes and http://pbrisbin.com/posts/posts_database) use GET or POST requests (instead of DELETE) to handle this.
Is there a straight-forward way to call the DELETE request from Yesod-code? So that the route handler deletePersonR gets called?
Unlike the GET and POST methods, which can be invoked from plain links or forms on a page, DELETE requests cannot be issued from plain HTML and require JavaScript, which is not supported in every context. This is why POST is often used instead. To invoke a DELETE method from JavaScript, the easiest way is to use a JavaScript framework such as jQuery:
$.ajax({
  url: "/person/1",
  type: "DELETE",
  success: function(html){
    alert("Ok, deleted");
  }
});
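Outside the browser, any HTTP client can issue the DELETE directly (as the curl example in the question shows). For instance, a minimal sketch with Python's standard library, using the question's localhost URL (nothing is sent until urlopen is called):

```python
import urllib.request

# Build a DELETE request against the route from the question; dispatching
# it with urllib.request.urlopen(req) would invoke deletePersonR.
req = urllib.request.Request("http://localhost:3000/person/1", method="DELETE")
print(req.get_method())  # → DELETE
```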