Spark REST API difficulties in understanding; goal is sending RESTful messages from a webpage - rest

For a project I would like to run Spark via a webpage. The goal is to dynamically submit submission requests and request status updates. As inspiration I used the following weblink: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
After submitting the Spark request below, I send a REST request to check the submission status.
The request for a Spark job submission is the following:
curl -X POST http://sparkmasterIP:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "/home/opc/TestApp.jar" ],
  "appResource" : "file:/home/opc/TestApp.jar",
  "clientSparkVersion" : "1.6.0",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "com.Test",
  "sparkProperties" : {
    "spark.driver.supervise" : "false",
    "spark.app.name" : "TestJob",
    "spark.eventLog.enabled" : "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://sparkmasterIP:6066"
  }
}'
Response:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20170302152313-0044",
  "serverSparkVersion" : "1.6.0",
  "submissionId" : "driver-20170302152313-0044",
  "success" : true
}
When asking for the submission status, I ran into some difficulties. To request the submission status I used the submissionId shown in the response above, so the following command was used:
curl http://masterIP:6066/v1/submissions/status/driver-20170302152313-0044
The response for the submission status contained the following error:
"message" : "Exception from the cluster:\njava.io.FileNotFoundException: /home/opc/TestApp.jar denied)\n\tjava.io.FileInputStream.open0(Native Method)\n\tjava.io.FileInputStream.open(FileInputStream.java:195)\n\tjava.io.FileInputStream.<init>(FileInputStream.java:138)\n\torg.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:124)\n\torg.spark-project.guava.io.Files$FileByteSource.openStream(Files.java:114)\n\torg.spark-project.guava.io.ByteSource.copyTo(ByteSource.java:202)\n\torg.spark-project.guava.io.Files.copy(Files.java:436)\n\torg.apache.spark.util.Utils$.org$apache$spark$util$Utils$$copyRecursive(Utils.scala:540)\n\torg.apache.spark.util.Utils$.copyFile(Utils.scala:511)\n\torg.apache.spark.util.Utils$.doFetchFile(Utils.scala:596)\n\torg.apache.spark.util.Utils$.fetchFile(Utils.scala:395)\n\torg.apache.spark.deploy.worker.DriverRunner.org$apache$spark$deploy$worker$DriverRunner$$downloadUserJar(DriverRunner.scala:150)\n\torg.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:79)",
My question is how to use this API so that the submission status can be obtained. If there is another API where the correct status can be obtained, I would appreciate a short description of how that API works in a RESTful way.
Thanks

As noted in the comments of the blog http://arturmkrtchyan.com/apache-spark-hidden-rest-api, several other commenters are experiencing this problem as well. Below I will try to explain some of the possible reasons.
It looks like your file:/home/opc/TestApp.jar cannot be found, or access to it is denied. Since the job is submitted in cluster mode, the driver may be launched on a worker node where the jar is simply not present at that path (or not readable there).
As noted in the Spark documentation for application-jar: "Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes."
To solve this, one recommendation I can give is to check the submission status using spark-submit. More information about spark-submit can be found in the Spark documentation (Submitting Applications) and in the book by Jacek Laskowski:
spark-submit --status [submission ID] --master [spark://...]
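Since the root cause is usually that the jar is not reachable from every node, here is a minimal sketch of one way to fix it: put the jar on HDFS and resubmit with an hdfs:// appResource, then poll the status endpoint again. The HDFS namenode address and the /apps target directory are assumptions, and the hostnames are the placeholders from the question:
# Hedged sketch: make the application jar globally visible via HDFS
hdfs dfs -mkdir -p /apps
hdfs dfs -put /home/opc/TestApp.jar /apps/TestApp.jar

# Resubmit, pointing appResource (and appArgs) at the HDFS path instead of a local file
curl -X POST http://sparkmasterIP:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
    "action" : "CreateSubmissionRequest",
    "appArgs" : [ "hdfs://namenode:8020/apps/TestApp.jar" ],
    "appResource" : "hdfs://namenode:8020/apps/TestApp.jar",
    "clientSparkVersion" : "1.6.0",
    "environmentVariables" : { "SPARK_ENV_LOADED" : "1" },
    "mainClass" : "com.Test",
    "sparkProperties" : {
      "spark.app.name" : "TestJob",
      "spark.submit.deployMode" : "cluster",
      "spark.master" : "spark://sparkmasterIP:6066"
    }
  }'

# Then poll the status endpoint with the new submissionId from the response
curl http://sparkmasterIP:6066/v1/submissions/status/<new-submission-id>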


Print editions using metaboss on Solana

I'm trying to create prints from a master edition (aka original edition) using metaboss from the console. The number of prints should be limited to a fixed number.
I followed this procedure:
1. Upload the image to Arweave: arloader upload image.jpg --with-sol --sol-keypair-path ~/.config/solana/id.json --ar-default-keypair --no-bundle
2. Create the JSON file with the NFT metadata:
{
  "name": "name_of__the_collection",
  "symbol": "token_of_the_collection",
  "uri": "https://arweave.net/[arweave_img_tx_id]",
  "seller_fee_basis_points": 0,
  "creators": [
    {
      "address": "address_of_the_creator_of_the_collection",
      "verified": false,
      "share": 100
    }
  ]
}
3. Mint the NFT:
metaboss mint one --keypair ~/.config/solana/id.json --nft-data-file ./metadata.json --max-editions='10'
4. Create all the prints:
metaboss mint missing-editions --account address_of_the_creator_of_the_collection
I have two issues:
On Solana Explorer, I get an error: error loading image
The 4th command returns an error: Error: failed to get account data
What's wrong?
[edit] Error 1: I used the uri key instead of the image key in the metadata. That's why Solana Explorer couldn't find the image.
Generally the process is good. There are some details that have to be aligned though:
Regarding the missing image:
You have to upload the metadata JSON file, too. This is what you reference in the mint command.
Your metadata is not 100% valid. E.g. you are missing the properties field. Have a look into the Token Metadata docs for more details.
Regarding metaboss mint missing-editions:
The Account you specify with --account should not be the address of the creator of the collection but instead the Master Edition Address. (Master Edition is the NFT you minted in step 3)
Since the command runs a GPA (getProgramAccounts) call, you should add --timeout 120 and not use the default RPC endpoint. Otherwise you will not get results.
If it still does not work you can also run
metaboss mint editions --next-editions 9
Please let me know in case of any uncertainties.
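Putting those points together, here is a rough sketch of the corrected flow. It assumes metaboss's global -r/--rpc and -T/--timeout options; the custom RPC URL, the off-chain metadata file name, and MASTER_EDITION_MINT_ADDRESS are placeholders, and the arloader flags simply mirror the ones used for the image in the question:
# Hedged sketch: upload the off-chain metadata JSON (name, image, properties, ...) to Arweave;
# the returned transaction id goes into the "uri" field of the metadata.json passed to --nft-data-file
arloader upload nft_offchain_metadata.json --with-sol --sol-keypair-path ~/.config/solana/id.json --ar-default-keypair --no-bundle

# Print the remaining editions from the Master Edition NFT (not from the creator address),
# with a longer timeout and a non-default RPC endpoint
metaboss -r https://your-custom-rpc.example.com -T 120 mint missing-editions --account MASTER_EDITION_MINT_ADDRESS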

Setup Apereo Cas Management integrated with CAS server

I want to install Apereo CAS Management (version 6.0) and integrate it with CAS Server (version 6.0).
I installed it following these steps:
Step 1: I installed the CAS server.
I checked it with the REST API and it worked.
My server runs at http://203.162.141.7:8080
This is the configuration of my CAS server. I put this config at /etc/cas/config. Here is my cas.properties file:
cas.server.name=http://203.162.141.7:8080
cas.server.prefix=${cas.server.name}/cas
logging.config: file:/etc/cas/config/log4j2.xml
server.port=8080
server.ssl.enabled=false
cas.serviceRegistry.initFromJson=false
cas.serviceRegistry.json.location=file:/etc/cas/services-repo
cas.authn.oauth.grants.resourceOwner.requireServiceHeader=true
cas.authn.oauth.userProfileViewType=NESTED
cas.authn.policy.requiredHandlerAuthenticationPolicyEnabled=false
cas.authn.attributeRepository.stub.attributes.email=casuser@example.org
#REST API JSON
cas.rest.attributeName=email
cas.rest.attributeValue=.+example.*
Step 2: I installed cas-management-overlay.
I put my cas-management-overlay config file at /etc/cas/config too. Here is my management.properties file:
cas.server.name=http://203.162.141.7:8080
cas.server.prefix=${cas.server.name}/cas
mgmt.serverName=http://203.162.141.7:8088
mgmt.adminRoles[0]=ROLE_ADMIN
mgmt.userPropertiesFile=file:/etc/cas/config/users.json
server.port=8088
server.ssl.enabled=false
logging.config=file:/etc/cas/config/log4j2-management.xml
And here is my users.json file:
{
  "casuser" : {
    "@class" : "org.apereo.cas.mgmt.authz.json.UserAuthorizationDefinition",
    "roles" : [ "ROLE_ADMIN" ]
  }
}
Then I run ./build.sh and the management webapp starts.
Finally, I access this link to open cas-management: http://203.162.141.7:8088/cas-management, but it redirects to http://203.162.141.7:8080/cas/login?service=http%3A%2F%2F203.162.141.7%3A8088%2Fcas-management%2F and shows an error page.
I don't know where I have gone wrong.
I think since you haven't told the management webapp about the location of the service registry, it can't add itself as a registered service.
Manually add a registered service for http://203.162.141.7:8088/cas-management and you should be able to log in to the management app at that point.
Here is my answer: the cas-management service registration file, /etc/cas/services-repo/casManagement-1.json:
{
  "@class" : "org.apereo.cas.services.RegexRegisteredService",
  "serviceId" : "^https://domain:8088/cas-management.+",
  "name" : "casManagement",
  "id" : 1,
  "evaluationOrder" : 1,
  "allowedAttributes" : [ "cn", "mail" ]
}
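For a service file like this to be picked up, the CAS server also needs to load JSON service definitions from that directory. A minimal sketch, reusing the property names already present in the cas.properties above (treat this as an assumption to verify against the CAS 6.0 documentation), would be:
# Hedged sketch: let the CAS server bootstrap its service registry from the JSON files
# in /etc/cas/services-repo (initFromJson is currently false in the question's config)
cat >> /etc/cas/config/cas.properties <<'EOF'
cas.serviceRegistry.initFromJson=true
cas.serviceRegistry.json.location=file:/etc/cas/services-repo
EOF
# Restart the CAS server afterwards so it re-reads the service registry.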

Can anyone help me with this error code in Data Fusion

I'm having a go at creating my first data fusion pipeline.
The data is going from a CSV file in Google Cloud Storage to BigQuery.
I created the pipeline and carried out a preview run, which was successful, but after deployment, trying to run it resulted in an error.
I pretty much accepted all the default settings apart from obviously configuring my source and destination.
Error from the log:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "Required 'compute.firewalls.list' permission for 'projects/xxxxxxxxxxx'",
    "reason" : "forbidden"
  } ],
  "message" : "Required 'compute.firewalls.list' permission for 'projects/xxxxxxxxxx'"
}
After deployment, the run fails.
Do note that as a part of creating an instance, you must set up permissions [0]. The role "Cloud Data Fusion API Service Agent" must be granted to the exact service account, as specified in that document, which has an email address that begins with "cloud-datafusion-management-sa@...".
Doing so should resolve your issue.
[0] : https://cloud.google.com/data-fusion/docs/how-to/create-instance#setting_up_permissions
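As a rough illustration, granting that role can be scripted with gcloud. This is only a sketch: the project id and the exact service-account email are placeholders (copy the real email from the error message or the instance details page), and roles/datafusion.serviceAgent is assumed to be the role id behind "Cloud Data Fusion API Service Agent":
# Hedged sketch: grant the Data Fusion service agent role to the management service account
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:cloud-datafusion-management-sa@YOUR_TENANT_PROJECT.iam.gserviceaccount.com" \
  --role="roles/datafusion.serviceAgent"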

XS Project Share SAP HANA cannot see in browser

I have an XS project that I have already shared to a HANA package, but it fails when I open it in the browser. The error shows:
404 - Not found
We could not find the resource you're trying to access.
It might be misspelled or currently unavailable.
My .xsaccess:
{
  "exposed" : true,
  "authentication" : [ { "method" : "Basic" } ],
  "cache_control" : "no-cache, no-store",
  "cors" : {
    "enabled" : false
  }
}
.xsapp:
{}
.xsprivileges:
{
  "privileges" : [
    { "name" : "ProfileOwner", "description" : "Profile Ownership" }
  ]
}
And one question: could the problem be caused by the user's roles or privileges, i.e. authorization? How do I fix this issue? Thanks.
The .xsapp should be an empty file with no content in it. The exposed parameter in the .xsaccess should be enough to expose your project. Make sure that all files are activated in the HANA repository.
If the error were authorization specific you would get a 503 error. If the 404 error is an XSEngine page, either your code isn't activated or the package path is incorrect.
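One quick way to check whether the package is exposed and the path is correct is to request a file from the package directly with basic authentication. This is only a sketch: the host, the port (80<instance>, e.g. 8000 for instance 00), the package path, and the file name are placeholders for your own values:
# Hedged sketch: fetch a resource from the XS package directly and inspect the status code
curl -i -u YOUR_USER:YOUR_PASSWORD "http://your-hana-host:8000/your/package/path/index.html"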

How to get all jobs status through spark REST API?

I am using Spark 1.5.1 and I'd like to retrieve the status of all jobs through the REST API.
I am getting the correct result using /api/v1/applications/{appId}, but while accessing /api/v1/applications/{appId}/jobs I get a "no such app: {appId}" response.
How should I pass the app ID here to retrieve the job statuses of an application using the Spark REST API?
Spark provides 4 hidden RESTful APIs:
1) Submit the job - curl -X POST http://SPARK_MASTER_IP:6066/v1/submissions/create
2) To kill the job - curl -X POST http://SPARK_MASTER_IP:6066/v1/submissions/kill/driver-id
3) To check the status of the job - curl http://SPARK_MASTER_IP:6066/v1/submissions/status/driver-id
4) Status of the Spark Cluster - http://SPARK_MASTER_IP:8080/json/
If you want to use other APIs, you can try Livy or Lucidworks:
url - https://doc.lucidworks.com/fusion/3.0/Spark_ML/Spark-Getting-Started.html
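For completeness, here is a rough sketch of what a Livy-based submission and status check could look like. The Livy host, port 8998, the jar path, the class name, and the batch id 0 are all assumptions/placeholders:
# Hedged sketch: submit a batch job through Apache Livy, then poll its state
curl -X POST http://LIVY_HOST:8998/batches \
  -H "Content-Type: application/json" \
  -d '{ "file": "hdfs:///apps/TestApp.jar", "className": "com.Test" }'

# Poll the state of the batch using the id returned by the POST above
curl http://LIVY_HOST:8998/batches/0/state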
This is supposed to work when accessing a live driver's API endpoints, but since you're using Spark 1.5.x I think you're running into SPARK-10531, a bug where the Spark Driver UI incorrectly mixes up application names and application ids. As a result, you have to use the application name in the REST API url, e.g.
http://localhost:4040/api/v1/applications/Spark%20shell/jobs
According to the JIRA ticket, this only affects the Spark Driver UI; application IDs should work as expected with the Spark History Server's API endpoints.
This is fixed in Spark 1.6.0, which should be released soon. If you want a workaround which should work on all Spark versions, though, then the following approach should work:
The api/v1/applications endpoint misreports application names as application ids, so you should be able to hit that endpoint, extract the id field (which is actually an application name), then use that to construct the URL for the current application's job list. Note that the /applications endpoint in the Spark Driver UI will only ever return a single application, which is why this approach should be safe; because of this property, we don't have to worry about the non-uniqueness of application names. For example, in Spark 1.5.2 the /applications endpoint can return a response which contains a record like
{
  "id" : "Spark shell",
  "name" : "Spark shell",
  "attempts" : [ {
    "startTime" : "2015-09-10T06:38:21.528GMT",
    "endTime" : "1969-12-31T23:59:59.999GMT",
    "sparkUser" : "",
    "completed" : false
  } ]
}
If you use the contents of this id field to construct the applications/<id>/jobs URL then your code should be future-proofed against upgrades to Spark 1.6.0, since the id field will begin reporting the proper IDs in Spark 1.6.0+.
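A minimal shell sketch of that workaround, assuming a driver UI at localhost:4040 and that jq is available for JSON parsing (the sed call percent-encodes spaces in names like "Spark shell"):
# Hedged sketch: read the id field from /applications (actually the app name on Spark 1.5.x)
# and use it to build the jobs URL for the current application
APP_ID=$(curl -s http://localhost:4040/api/v1/applications | jq -r '.[0].id' | sed 's/ /%20/g')
curl -s "http://localhost:4040/api/v1/applications/${APP_ID}/jobs"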
For those who have this problem and are running on YARN:
According to the docs,
when running in YARN cluster mode, [app-id] will actually be [base-app-id]/[attempt-id], where [base-app-id] is the YARN application ID
So if your call to https://HOST:PORT/api/v1/applications/application_12345678_0123 returns something like
{
  "id" : "application_12345678_0123",
  "name" : "some_name",
  "attempts" : [ {
    "attemptId" : "1",
    <...snip...>
  } ]
}
you can get, e.g., the jobs by calling
https://HOST:PORT/api/v1/applications/application_12345678_0123/1/jobs
(note the "1" before "/jobs").
If you want to use the REST API to control Spark, you're probably best off adding the Spark Jobserver to your installation, which then gives you a much more comprehensive REST API than the private REST APIs you're currently querying.
Poking around, I've managed to get the job status for a single application by running
curl http://127.0.0.1:4040/api/v1/applications/Spark%20shell/jobs/
which returned
[ {
  "jobId" : 0,
  "name" : "parquet at <console>:19",
  "submissionTime" : "2015-12-21T10:46:02.682GMT",
  "stageIds" : [ 0 ],
  "status" : "RUNNING",
  "numTasks" : 2,
  "numActiveTasks" : 2,
  "numCompletedTasks" : 0,
  "numSkippedTasks" : 0,
  "numFailedTasks" : 0,
  "numActiveStages" : 1,
  "numCompletedStages" : 0,
  "numSkippedStages" : 0,
  "numFailedStages" : 0
} ]
Spark has some hidden RESTful APIs that you can try.
Note that I have not tried them yet, but I will.
For example, to get the status of a submitted application you can do:
curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000
Note: "driver-20151008145126-0000" is submitsionId.
You can take a deep look in this link with this post from arturmkrtchyan on GitHub