How to convert a Livy curl call to a Livy REST API call - Scala

I am getting started with Livy. In my setup the Livy server is running on a Unix machine, and I am able to curl to it and execute the job. I have created a fat jar, uploaded it to HDFS, and I am simply calling its main method from Livy. My JSON payload for Livy looks like this:
{
  "file" : "hdfs:///user/data/restcheck/spark_job_2.11-3.0.0-RC1-SNAPSHOT.jar",
  "proxyUser" : "test_user",
  "className" : "com.local.test.spark.pipeline.path.LivyTest",
  "files" : ["hdfs:///user/data/restcheck/hivesite.xml", "hdfs:///user/data/restcheck/log4j.properties"],
  "driverMemory" : "5G",
  "executorMemory" : "10G",
  "executorCores" : 5,
  "numExecutors" : 10,
  "queue" : "user.queue",
  "name" : "LivySampleTest2",
  "conf" : {
    "spark.master" : "yarn",
    "spark.executor.extraClassPath" : "/etc/hbase/conf/",
    "spark.executor.extraJavaOptions" : "-Dlog4j.configuration=file:log4j.properties",
    "spark.driver.extraJavaOptions" : "-Dlog4j.configuration=file:log4j.properties",
    "spark.ui.port" : 4100,
    "spark.port.maxRetries" : 100,
    "JAVA_HOME" : "/usr/java/jdk1.8.0_60",
    "HADOOP_CONF_DIR" : "/etc/hadoop/conf:/etc/hive/conf:/etc/hbase/conf",
    "HIVE_CONF_DIR" : "/etc/hive/conf"
  }
}
and below is my curl call to it:
curl -X POST --negotiate -u:"test_user" --data @/user/data/Livy/SampleFile.json -H "Content-Type: application/json" https://livyhost:8998/batches
I am trying to convert this to a REST API call. I am following the WordCount example provided by Cloudera, but I am not able to convert my curl call to the REST API. All the jars are already added in HDFS, so I don't think I need to do the upload-jar call.

It should work with curl as well. Please try the JSON below.
curl -H "Content-Type: application/json" https://livyhost:8998/batches \
  -X POST --data '{
    "name" : "LivyREST",
    "className" : "com.local.test.spark.pipeline.path.LivyTest",
    "file" : "/user/data/restcheck/spark_job_2.11-3.0.0-RC1-SNAPSHOT.jar"
  }'
Also, here are some more references:
http://gethue.com/how-to-use-the-livy-spark-rest-job-server-api-for-submitting-batch-jar-python-and-streaming-spark-jobs/

Related

Spark Job SUBMITTED but not RUNNING after submit via REST API

Following the instructions on this website, I'm trying to submit a job to Spark via the REST API /v1/submissions.
I tried to submit the SparkPi example:
$ ./create.sh
{
"action" : "CreateSubmissionResponse",
"message" : "Driver successfully submitted as driver-20211212044718-0003",
"serverSparkVersion" : "3.1.2",
"submissionId" : "driver-20211212044718-0003",
"success" : true
}
$ ./status.sh driver-20211212044718-0003
{
"action" : "SubmissionStatusResponse",
"driverState" : "SUBMITTED",
"serverSparkVersion" : "3.1.2",
"submissionId" : "driver-20211212044718-0003",
"success" : true
}
create.sh:
curl -X POST http://172.17.197.143:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "appResource": "/home/ruc/spark-3.1.2/examples/jars/spark-examples_2.12-3.1.2.jar",
  "sparkProperties": {
    "spark.master": "spark://172.17.197.143:7077",
    "spark.driver.memory": "1g",
    "spark.driver.cores": "1",
    "spark.app.name": "REST API - PI",
    "spark.jars": "/home/ruc/spark-3.1.2/examples/jars/spark-examples_2.12-3.1.2.jar",
    "spark.driver.supervise": "true"
  },
  "clientSparkVersion": "3.1.2",
  "mainClass": "org.apache.spark.examples.SparkPi",
  "action": "CreateSubmissionRequest",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "appArgs": [
    "400"
  ]
}'
status.sh:
export DRIVER_ID=$1
curl http://172.17.197.143:6066/v1/submissions/status/$DRIVER_ID
But when I try to get the status of the job (even after a few minutes), I get "SUBMITTED" rather than "RUNNING" or "FINISHED".
Then I looked at the log and found:
21/12/12 04:47:18 INFO master.Master: Driver submitted org.apache.spark.deploy.worker.DriverWrapper
21/12/12 04:47:18 WARN master.Master: Driver driver-20211212044718-0003 requires more resource than any of Workers could have.
# ...
21/12/12 04:49:02 WARN master.Master: Driver driver-20211212044718-0003 requires more resource than any of Workers could have.
However, in my spark-env.sh, I have
export SPARK_WORKER_MEMORY=10g
export SPARK_WORKER_CORES=2
I have no idea what happened. How can I make it run normally?
Since you've checked the resources and you have enough, it might be a network issue: the executors may not be able to connect back to the driver program. Allow traffic on both the master and the workers.

Kafka Connect Transformations - RegexRouter replacement topic names in lowercase

We are trying to set up a connector (Debezium) in Kafka Connect and transform all the topic names generated by this connector via regular expressions. The regex below is working and detects the patterns we want, but we also need to create all the topic names in lowercase.
We have tried to put \L$1 in the replacement expression, but it just prints an L in front of our topic names, for example LOutbound.Policy instead of outbound.policy.
Does anybody know how to do this? Thanks in advance!
This is the connector curl command
curl -i -X PUT http://kafka-alpha-cp-kafka-connect:8083/connectors/kafka-bi-datacontract/config -H "Content-Type: application/json" -d '{
"name": "kafka-bi-datacontract",
"connector.class" : "io.debezium.connector.sqlserver.SqlServerConnector",
"database.hostname" : "ukdb3232123",
"database.server.name" : "ukdb3232123\\perf",
"database.port" : "12442",
"database.user" : "KafkaConnect-BI",
"database.password" : "*******",
"database.dbname" : "BeazleyIntelligenceDataContract",
"snapshot.lock.timeout.ms" : "10000000",
"table.whitelist" : "Outbound.Policy,Outbound.Section",
"database.history.kafka.bootstrap.servers" : "kafka-alpha-cp-kafka-headless:9092",
"database.history.kafka.topic": "schema-changes.bidatacontract",
"transforms": "dropTopicPrefix",
"transforms.dropTopicPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropTopicPrefix.regex":"^[^.]+.(.*)",
"transforms.dropTopicPrefix.replacement":"\\L$1"
}'
Java regex replacement strings do not support the \L (lowercase) escape, so \L$1 or \\L$1 ends up being the same as L$1.
You would need to create/find your own transform for lowercasing.
Once you do, you can do something like this
"transforms": "dropTopicPrefix,lowertopic",
"transforms.dropTopicPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",
"transforms.dropTopicPrefix.regex":"^[^.]+.(.*)",
"transforms.dropTopicPrefix.replacement":"$1",
"transforms.lowerTopic.type":"com.example.foo.LowerCase$Topic",

Jfrog REST API Jenkins Groovy status code: 404, reason phrase: Not Found

I have a Groovy script which is run in the Jenkins script console. The script uses the JFrog REST API to run some queries, one of which returns: status code: 404, reason phrase: Not Found
CURL:
$ curl -X GET -H "X-JFrog-Art-Api:APIKey" https://OU.jfrog.io/OU/api/storage/test-repository/docker-log-gen/1.12/manifest.json?properties
{
"properties" : { ... },
"uri" : "https://OU.jfrog.io/artifactory/api/storage/test-repository/docker-log-gen/1.12/manifest.json"
}
WGET
$ wget --header="X-JFrog-Art-Api:APIKey" https://OU.jfrog.io/OU/api/storage/test-repository/docker-log-gen/1.12/manifest.json?properties
--2020-01-14 13:12:16-- https://OU.jfrog.io/OU/api/storage/test-repository/docker-log-gen/1.12/manifest.json?properties
HTTP request sent, awaiting response... 200 OK
Jenkins Groovy
def restClient = new RESTClient('https://OU.jfrog.io')
restClient.headers['X-JFrog-Art-Api'] = 'APIKey'
println(restClient.get(path: '/OU/api/storage/test-repository/docker-log-gen/1.12/manifest.json?properties', requestContentType: 'text/plain') )
groovyx.net.http.HttpResponseException: status code: 404, reason phrase: Not Found
Other REST calls (api/docker) are made prior to this one in the script and return successfully. I am unable to identify a cause for this response; as shown above, the command-line calls return the expected JSON.
Please help.
The part after the first question mark is not part of the URI path; pass it as the query instead:
println(restClient.get(path: '/OU/api/storage/test-repository/docker-log-gen/1.12/manifest.json', query: ['properties': ''] , requestContentType: 'text/plain').data.text )
{
"properties" : { ... },
"uri" : "https://OU.jfrog.io/artifactory/api/storage/test-repository/docker-log-gen/1.12/manifest.json"
}
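As a cross-check outside of HTTPBuilder, here is a minimal Scala sketch of the same request with the JDK 11+ HTTP client. It makes the same point: ?properties is a query string rather than part of the path, and it can simply stay in the URI you build (the host, path, and APIKey placeholder are taken from the question):
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object ManifestProperties {
  def main(args: Array[String]): Unit = {
    // '?properties' is the query string; it is not part of the URI path
    val uri = URI.create(
      "https://OU.jfrog.io/OU/api/storage/test-repository/docker-log-gen/1.12/manifest.json?properties")

    val request = HttpRequest.newBuilder()
      .uri(uri)
      .header("X-JFrog-Art-Api", "APIKey") // placeholder API key, as in the question
      .GET()
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode()} ${response.body()}")
  }
}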

How to enable Javascript in Druid

I have been using Druid for the past week and wanted to enable JavaScript for some postAggregations.
I think I followed the outlined steps and updated the common.runtime.properties file in ../conf/druid/_common/ to include druid.javascript.enabled=true. I then stopped the current processes and re-ran the Quickstart procedures, but it still says that JavaScript is disabled:
{
"error" : "Unknown exception",
"errorMessage" : "Instantiation of [simple type, class io.druid.query.aggregation.post.JavaScriptPostAggregator] value failed: JavaScript is disabled. (through reference chain: java.util.ArrayList[0])",
"errorClass" : "com.fasterxml.jackson.databind.JsonMappingException",
"host" : null
}
I am currently running it in the 'Quickstart' configuration - single local machine. Any pointers? Thanks!
This is a sample JavaScript post-aggregation query for Druid that computes an average value. Save the query below as query.body and run the curl request:
curl -X POST "http://localhost:8082/druid/v2/?pretty" \
  -H 'content-type: application/json' -d @query.body
{
  "queryType": "groupBy",
  "dataSource": "whirldata",
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    {"name": "rows", "type": "count", "fieldName": "rows"},
    {"name": "TargetDOS", "type": "doubleSum", "fieldName": "Target DOS"}
  ],
  "postAggregations": [
    {
      "type": "javascript",
      "name": "Target DOS Average",
      "fieldNames": ["TargetDOS", "rows"],
      "function": "function(TargetDOS, rows) { return Math.abs(TargetDOS) / rows; }"
    }
  ],
  "intervals": ["2006-01-01T00:00:00.000Z/2020-01-01T00:00:00.000Z"]
}
The part you are missing is likely that the quickstart reads configs from conf-quickstart rather than conf. So try editing conf-quickstart/druid/_common/common.runtime.properties.

Spark REST API: Failed to find data source: com.databricks.spark.csv

I have a PySpark file stored on S3. I am trying to run it using the Spark REST API.
I am running the following command:
curl -X POST http://<ip-address>:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "testing.py" ],
  "appResource" : "s3n://accessKey:secretKey/<bucket-name>/testing.py",
  "clientSparkVersion" : "1.6.1",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "org.apache.spark.deploy.SparkSubmit",
  "sparkProperties" : {
    "spark.driver.supervise" : "false",
    "spark.app.name" : "Simple App",
    "spark.eventLog.enabled" : "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://<ip-address>:6066",
    "spark.jars" : "spark-csv_2.10-1.4.0.jar",
    "spark.jars.packages" : "com.databricks:spark-csv_2.10:1.4.0"
  }
}'
and the testing.py file has a code snippet:
myContext = SQLContext(sc)
format = "com.databricks.spark.csv"
dataFrame1 = myContext.read.format(format).option("header", "true").option("inferSchema", "true").option("delimiter",",").load(location1).repartition(1)
dataFrame2 = myContext.read.format(format).option("header", "true").option("inferSchema", "true").option("delimiter",",").load(location2).repartition(1)
outDataFrame = dataFrame1.join(dataFrame2, dataFrame1.values == dataFrame2.valuesId)
outDataFrame.write.format(format).option("header", "true").option("nullValue","").save(outLocation)
But on this line:
dataFrame1 = myContext.read.format(format).option("header", "true").option("inferSchema", "true").option("delimiter",",").load(location1).repartition(1)
I get exception:
java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
Caused by: java.lang.ClassNotFoundException: com.databricks.spark.csv.DefaultSource
I was trying different things out and one of those things was that I logged into the ip-address machine and ran this command:
./bin/spark-shell --packages com.databricks:spark-csv_2.10:1.4.0
so that it would download spark-csv into the .ivy2/cache folder. But that didn't solve the problem. What am I doing wrong?
(Posted on behalf of the OP).
I first added spark-csv_2.10-1.4.0.jar on the driver and worker machines and added:
"spark.driver.extraClassPath" : "absolute/path/to/spark-csv_2.10-1.4.0.jar",
"spark.executor.extraClassPath" : "absolute/path/to/spark-csv_2.10-1.4.0.jar",
Then I got the following error:
java.lang.NoClassDefFoundError: org/apache/commons/csv/CSVFormat
Caused by: java.lang.ClassNotFoundException: org.apache.commons.csv.CSVFormat
And then I added commons-csv-1.4.jar on both machines and added:
"spark.driver.extraClassPath" : "/absolute/path/to/spark-csv_2.10-1.4.0.jar:/absolute/path/to/commons-csv-1.4.jar",
"spark.executor.extraClassPath" : "/absolute/path/to/spark-csv_2.10-1.4.0.jar:/absolute/path/to/commons-csv-1.4.jar",
And that solved my problem.