zeppelin: Error while running any operation on resulting dataframe - scala

I am using a Zeppelin notebook with Spark 1.3.1 and Hadoop 2.6. I am able to run the entire tutorial shipped with it without any issues.
Then I created a new notebook to run simple code that fetches data from a parquet file stored in HDFS on my local machine. The following is the code:
val param_alarms = sqlc.parquetFile("hdfs://localhost/ge_alarm/alarm_param")
param_alarms.registerTempTable("Alarm")
param_alarms.count()
This fails with the following error message:
param_alarms: org.apache.spark.sql.DataFrame = [PARAMETER_ID: int, PARAMETER_NAME: string, OSM_NAME: string, DESCRIPTION: string, TIMESTAMP: bigint, Value: int]
java.lang.NoSuchMethodError: org.json4s.JsonDSL$.string2jvalue(Ljava/lang/String;)Lorg/json4s/JsonAST$JValue;
at org.apache.spark.sql.types.StructType$$anonfun$jsonValue$5.apply(dataTypes.scala:1065)
at org.apache.spark.sql.types.StructType$$anonfun$jsonValue$5.apply(dataTypes.scala:1065)
at org.json4s.JsonDSL$JsonAssoc.$tilde(JsonDSL.scala:86)
at org.apache.spark.sql.types.StructType.jsonValue(dataTypes.scala:1065)
at org.apache.spark.sql.types.StructType.jsonValue(dataTypes.scala:1015)
at org.apache.spark.sql.types.DataType.json(dataTypes.scala:265)
at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToString(ParquetTypes.scala:404)
at org.apache.spark.sql.parquet.ParquetRelation2.buildScan(newParquet.scala:437)
at org.apache.spark.sql.sources.DataSourceStrategy$$anonfun$1.apply(DataSourceStrategy.scala:38)
at org.apache.spark.sql.sources.DataSourceStrategy$$anonfun$1.apply(DataSourceStrategy.scala:38)
at org.apache.spark.sql.sources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:107)
at org.apache.spark.sql.sources.DataSourceStrategy$.apply(DataSourceStrategy.scala:34)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
at org.apache.spark.sql.execution.SparkStrategies$HashAggregation$.apply(SparkStrategies.scala:152)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:1081)
at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:1079)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:1085)
at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:1085)
at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:815)
at org.apache.spark.sql.DataFrame.count(DataFrame.scala:827)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:50)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:54)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:56)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:58)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:60)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:62)
at $iwC$$iwC$$iwC.<init>(<console>:64)
at $iwC$$iwC.<init>(<console>:66)
at $iwC.<init>(<console>:68)
at <init>(<console>:70)
at .<init>(<console>:74)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:582)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:558)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:551)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:277)
at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I also tried adding the following dependency:
%dep
z.reset()
z.load("org.json4s:json4s-native_2.11:3.2.10")
By the way, when I try running %sql select * from Alarm I get the following error:
java.lang.reflect.InvocationTargetException
No luck with this either. Can someone help?
One of my colleagues finally found the solution: the root cause is an incompatible json4s version. In case others are facing the same problem, here is what I needed to change to fix the issue:
Modify zeppelin-server/pom.xml to use swagger-jersey-jaxrs_2.10 version 1.3.11
Modify zeppelin-engine/pom.xml to use org.reflections version 0.9.9-RC1
export MAVEN_OPTS='-Xmx2048m -XX:MaxPermSize=2048m'
Compile for Hadoop 2.6 and Spark 1.3:
mvn clean package -Pspark-1.3 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
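As a footnote on the %dep attempt in the question: if the root cause is a json4s binary mismatch, any artifact loaded that way would also need to match the Scala version Spark 1.3.1 is built against. This is a minimal sketch only, assuming Spark 1.3.1 ships json4s 3.2.10 compiled for Scala 2.10 (rebuilding Zeppelin as above is what actually resolved it for us):
%dep
z.reset()
// Assumption: Spark 1.3.1 is built against Scala 2.10 and bundles json4s 3.2.10,
// so the artifact suffix would need to be _2.10 rather than _2.11.
z.load("org.json4s:json4s-native_2.10:3.2.10")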


Related

Spark fails to read from Elasticsearch/Opensearch. Invalid map received dynamic_date_formats

Hi, I'm using Scala 2.11.12, Spark 2.3.0 and elasticsearch-spark-20 7.7.0 to read from an OpenSearch 1.3.4 index with the following code:
spark.read.format("org.elasticsearch.spark.sql")
.load("myIndex")
.filter('Timestamp === lit(dateToRead))
But I get this error:
22/08/17 15:30:42 ERROR EventManager$: Unexpected error retrieving offsets. Bailing out...
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: invalid map received dynamic_date_formats=[yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS||yyyy-MM-dd||yyyy-MM-dd'T'HH||yyyy-MM-dd'T'HH:mm]
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseField(FieldParser.java:146)
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseMapping(FieldParser.java:88)
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseIndexMappings(FieldParser.java:69)
at org.elasticsearch.hadoop.serialization.dto.mapping.FieldParser.parseMappings(FieldParser.java:40)
at org.elasticsearch.hadoop.rest.RestClient.getMappings(RestClient.java:321)
at org.elasticsearch.hadoop.rest.RestClient.getMappings(RestClient.java:307)
at org.elasticsearch.hadoop.rest.RestRepository.getMappings(RestRepository.java:293)
at org.elasticsearch.spark.sql.SchemaUtils$.discoverMappingAndGeoFields(SchemaUtils.scala:103)
at org.elasticsearch.spark.sql.SchemaUtils$.discoverMapping(SchemaUtils.scala:91)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema$lzycompute(DefaultSource.scala:229)
at org.elasticsearch.spark.sql.ElasticsearchRelation.lazySchema(DefaultSource.scala:229)
at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:233)
at org.elasticsearch.spark.sql.ElasticsearchRelation$$anonfun$schema$1.apply(DefaultSource.scala:233)
at scala.Option.getOrElse(Option.scala:121)
at org.elasticsearch.spark.sql.ElasticsearchRelation.schema(DefaultSource.scala:233)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:431)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at com.MyCalass$$anonfun$myMethod2$1.apply(MyCalass.scala:130)
at com.MyCalass$$anonfun$myMethod2$1.apply(MyCalass.scala:126)
at scala.util.Try$.apply(Try.scala:192)
at com.MyCalass$.myMethod2(MyCalass.scala:126)
at com.MyCalass$.myMethod(MyCalass.scala:55)
at com.MyApp$.MyApp$$myMethod(MyApp.scala:107)
at com.MyApp$$anonfun$main$2.apply(MyApp.scala:86)
at com.MyApp$$anonfun$main$2.apply(MyApp.scala:76)
at scala.Option.fold(Option.scala:158)
at com.MyApp$.main(MyApp.scala:76)
at com.MyApp.main(MyApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Command exiting with ret '1'
I've set the dynamic date mapping in OpenSearch, and I am able to write to the index with the correct mapping, but when I try to read, it fails.
I found the problem: the Elasticsearch connector is not working properly and tries to treat the cluster as Elasticsearch 1.3.4 instead of OpenSearch 1.3.4. To solve this, add compatibility.override_main_response_version: true to your opensearch.yml file.

Problems with Kafka Source initialization in Siddhi

I can't create a stream from a Kafka topic using Siddhi, even if I create the stream with Design View.
I copied all required jars to the lib and bundle folders, and even started Kafka with Zookeeper locally (I'm not sure why I need it locally, but never mind).
On tooling.sh startup I get the following error:
[2020-02-26 22:15:43,041] WARNING {org.wso2.carbon.launcher.extensions.OSGiLibBundleDeployerUtils lambda$getBundlesInfo$1} - Error when loading the OSGi bundle information from /home/Hed/StreamProcessor/siddhi-tooling-5.1.2/lib/kafka-clients-2.3.0.jar
java.io.IOException: Required bundle manifest headers do not exist
at org.wso2.carbon.launcher.extensions.OSGiLibBundleDeployerUtils.getBundleInfo(OSGiLibBundleDeployerUtils.java:183)
at org.wso2.carbon.launcher.extensions.OSGiLibBundleDeployerUtils.lambda$getBundlesInfo$1(OSGiLibBundleDeployerUtils.java:135)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:313)
at java.util.stream.StreamSpliterators$DistinctSpliterator.forEachRemaining(StreamSpliterators.java:1291)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747)
at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721)
at java.util.stream.AbstractTask.compute(AbstractTask.java:327)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
For this script:
#App:name("HelloKafka")
#App:description('Consume events from a Kafka Topic and publish to a different Kafka Topic')
#source(type='kafka',
topic.list='kafka_topic',
partition.no.list='0',
threading.option='single.thread',
group.id="group",
bootstrap.servers='localhost:9092',
#map(type='json'))
define stream SweetProductionStream (name string, amount double);
I see the following error on the Run command:
io.siddhi.core.exception.SiddhiAppCreationException: Error on 'HelloKafka' # Line: 10. Position: 26, near '#source(type='kafka',
topic.list='kafka_topic',
partition.no.list='0',
threading.option='single.thread',
group.id="group",
bootstrap.servers='localhost:9092',
#map(type='json'))'. org/apache/kafka/clients/producer/Producer
at io.siddhi.core.util.ExceptionUtil.populateQueryContext(ExceptionUtil.java:43)
at io.siddhi.core.util.parser.helper.DefinitionParserHelper.addEventSource(DefinitionParserHelper.java:388)
at io.siddhi.core.util.SiddhiAppRuntimeBuilder.defineStream(SiddhiAppRuntimeBuilder.java:117)
at io.siddhi.core.util.parser.SiddhiAppParser.defineStreamDefinitions(SiddhiAppParser.java:374)
at io.siddhi.core.util.parser.SiddhiAppParser.parse(SiddhiAppParser.java:230)
at io.siddhi.core.SiddhiManager.createSiddhiAppRuntime(SiddhiManager.java:85)
at io.siddhi.core.SiddhiManager.createSiddhiAppRuntime(SiddhiManager.java:95)
at io.siddhi.distribution.editor.core.internal.DebugRuntime.createRuntime(DebugRuntime.java:201)
at io.siddhi.distribution.editor.core.internal.DebugRuntime.<init>(DebugRuntime.java:56)
at io.siddhi.distribution.editor.core.internal.DebugProcessorService.start(DebugProcessorService.java:38)
at io.siddhi.distribution.editor.core.internal.EditorMicroservice.start(EditorMicroservice.java:761)
at io.siddhi.distribution.editor.core.internal.EditorMicroservice.startWithVariables(EditorMicroservice.java:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.wso2.msf4j.internal.router.HttpMethodInfo.invokeResource(HttpMethodInfo.java:187)
at org.wso2.msf4j.internal.router.HttpMethodInfo.invoke(HttpMethodInfo.java:143)
at org.wso2.msf4j.internal.MSF4JHttpConnectorListener.dispatchMethod(MSF4JHttpConnectorListener.java:218)
at org.wso2.msf4j.internal.MSF4JHttpConnectorListener.lambda$onMessage$58(MSF4JHttpConnectorListener.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/kafka/clients/producer/Producer
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.newInstance(Class.java:412)
at io.siddhi.core.util.SiddhiClassLoader.loadClass(SiddhiClassLoader.java:32)
at io.siddhi.core.util.SiddhiClassLoader.loadExtensionImplementation(SiddhiClassLoader.java:48)
at io.siddhi.core.util.parser.helper.DefinitionParserHelper.addEventSource(DefinitionParserHelper.java:346)
... 21 more
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.clients.producer.Producer cannot be found by siddhi-io-kafka_5.0.7
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:448)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:361)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:353)
at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:161)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
... 28 more
Can somebody tell me what I am doing wrong? :(
Please make sure you have added the OSGi-converted jars to "C:\Program Files\WSO2\Enterprise Integrator\7.0.2\streaming-integrator\lib".
The OSGi-converted jar list:
kafka_2.12_2.3.0_1.0.0
kafka_clients_2.3.0_1.0.0
metrics_core_2.2.0_1.0.0
scala_library_2.12.8_1.0.0
zkclient_0.11_1.0.0
zookeeper_3.4.14_1.0.0
Then, copy the original jars to "C:\Program Files\WSO2\Enterprise Integrator\7.0.2\streaming-integrator\samples\sample-clients\lib".
The list of original jars:
kafka_2.12-2.3.0
kafka-clients-2.3.0
metrics-core-2.2.0
scala-library-2.12.8
zkclient-0.11
zookeeper-3.4.14
In order to generate the OSGi-converted jars, copy all original jars to a folder called "source" and create an empty folder called "destination". Then run the following command in the terminal:
MINGW32 /c/Program Files/WSO2/Enterprise Integrator/7.0.2/streaming-integrator/bin
$ ./jartobundle.sh C:/DevTools/source C:/DevTools/destination
Finally, distribute the OSGi-converted and original jars to the directories listed above.
PS1: In my case I am using kafka_2.12-2.4.1, but the base names of the jars do not change.
PS2: Adapt the directories to your installation path.
For more details, check the WSO2 documentation: Kafka transport

Getting error while running scala test runner

Any idea why I am getting this error while running a Scala test?
java.lang.IllegalArgumentException: ERROR: -r has been deprecated for a very long time and is no longer supported, to prepare for reusing it for a different purpose in the near future. Please change all uses of -r to -C.
at org.scalatest.tools.ArgsParser$.checkArgsForValidity(ArgsParser.scala:41)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:857)
at org.scalatest.tools.Runner$.run(Runner.scala:850)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:141)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:32)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Is this an issue with my dependencies having been updated on their own? This test ran earlier without any issues.
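The error message itself points at the fix: the deprecated -r flag (custom reporter) has to be replaced by -C. Here the offending arguments come from the IntelliJ ScalaTest runner rather than from my own code, but purely as an illustration, this is a hedged sketch of how a corrected Runner invocation would look (the reporter and suite class names are placeholders):
import org.scalatest.tools.Runner

// Runner.run takes the same command-line style arguments the IDE passes;
// -C now designates a custom Reporter class where -r used to be accepted.
val allPassed: Boolean = Runner.run(Array(
  "-C", "com.example.MyCustomReporter", // was: -r com.example.MyCustomReporter
  "-s", "com.example.MySpec"            // the suite to run
))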

Why does %AddJar command in Bluemix give an error?

Sorry if this is a silly question. I have some simple code in a Scala notebook on a Bluemix Spark instance. I am trying to add a jar from a GitHub repository in the manner indicated in the tutorials (https://console.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index-gentopic2.html#developing_with_notebooks):
import scala.collection.breakOut
%AddJar https://github.com/IBM-Bluemix/cf-deployment-tracker-client-java/blob/master/dep-jar/com.ibm.json4j_1.0.9.jar
The Notebook output informs me that the download is finished but an exception is then thrown:
Starting download from https://github.com/IBM-Bluemix/cf-deployment-tracker-client-java/blob/master/dep-jar/com.ibm.json4j_1.0.9.jar
Finished download of com.ibm.json4j_1.0.9.jar
Out[30]:
Name: java.lang.NullPointerException
Message: null
StackTrace: scala.reflect.io.ZipArchive$.fromFile(ZipArchive.scala:36)
scala.reflect.io.ZipArchive$.fromFile(ZipArchive.scala:34)
scala.reflect.io.AbstractFile$.getDirectory(AbstractFile.scala:48)
scala.reflect.io.AbstractFile$.getDirectory(AbstractFile.scala:36)
scala.tools.nsc.Global.scala$tools$nsc$Global$$matchesCanonical$1(Global.scala:920)
scala.tools.nsc.Global$$anonfun$16.apply(Global.scala:924)
scala.tools.nsc.Global$$anonfun$16.apply(Global.scala:924)
scala.collection.Iterator$class.find(Iterator.scala:780)
scala.collection.AbstractIterator.find(Iterator.scala:1157)
scala.collection.IterableLike$class.find(IterableLike.scala:79)
scala.collection.AbstractIterable.find(Iterable.scala:54)
scala.tools.nsc.Global.scala$tools$nsc$Global$$assoc$1(Global.scala:924)
scala.tools.nsc.Global$$anonfun$17.apply(Global.scala:933)
scala.tools.nsc.Global$$anonfun$17.apply(Global.scala:933)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
scala.tools.nsc.Global.invalidateClassPathEntries(Global.scala:933)
com.ibm.spark.interpreter.ScalaInterpreter.updateCompilerClassPath(ScalaInterpreter.scala:167)
com.ibm.spark.interpreter.ScalaInterpreter.addJars(ScalaInterpreter.scala:90)
com.ibm.spark.magic.builtin.AddJar.execute(AddJar.scala:121)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
java.lang.reflect.Method.invoke(Method.java:507)
com.ibm.spark.utils.DynamicReflectionSupport.invokeMethod(DynamicReflectionSupport.scala:106)
com.ibm.spark.utils.DynamicReflectionSupport.applyDynamic(DynamicReflectionSupport.scala:78)
com.ibm.spark.magic.MagicExecutor.executeMagic(MagicExecutor.scala:32)
com.ibm.spark.magic.MagicExecutor.applyDynamic(MagicExecutor.scala:21)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:52)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:57)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:59)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:61)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:63)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:65)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:67)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:69)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:71)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:73)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:75)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:77)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:79)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:81)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:83)
$line144.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:85)
$line144.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:87)
$line144.$read$$iwC$$iwC$$iwC.<init>(<console>:89)
$line144.$read$$iwC$$iwC.<init>(<console>:91)
$line144.$read$$iwC.<init>(<console>:93)
$line144.$read.<init>(<console>:95)
$line144.$read$.<init>(<console>:99)
$line144.$read$.<clinit>(<console>)
$line144.$eval$.<init>(<console>:7)
$line144.$eval$.<clinit>(<console>)
$line144.$eval.$print(<console>)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
java.lang.reflect.Method.invoke(Method.java:507)
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
com.ibm.spark.interpreter.ScalaInterpreter$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreter.scala:296)
com.ibm.spark.interpreter.ScalaInterpreter$$anonfun$interpretAddTask$1$$anonfun$apply$3.apply(ScalaInterpreter.scala:291)
com.ibm.spark.global.StreamState$.withStreams(StreamState.scala:80)
com.ibm.spark.interpreter.ScalaInterpreter$$anonfun$interpretAddTask$1.apply(ScalaInterpreter.scala:290)
com.ibm.spark.interpreter.ScalaInterpreter$$anonfun$interpretAddTask$1.apply(ScalaInterpreter.scala:290)
com.ibm.spark.utils.TaskManager$$anonfun$add$2$$anon$1.run(TaskManager.scala:123)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.lang.Thread.run(Thread.java:785)
I followed the tutorial pretty closely but it seems I have done something wrong. Any pointers would be much appreciated.
Thanks
You used the wrong URL for the jar; do not use the blob path.
Try the raw path instead:
%AddJar https://github.com/IBM-Bluemix/cf-deployment-tracker-client-java/raw/master/dep-jar/com.ibm.json4j_1.0.9.jar
I have tried it; you can add the jar, but you may face an issue with an assertion.
In case it does not work (because the wrong URL was used previously), restart your kernel or use the -f flag:
%AddJar https://github.com/IBM-Bluemix/cf-deployment-tracker-client-java/raw/master/dep-jar/com.ibm.json4j_1.0.9.jar -f

Why spark-shell fails with NullPointerException?

I am trying to execute spark-shell on Windows 10, but I keep getting this error every time I run it.
I used both the latest version and spark-1.5.0-bin-hadoop2.4.
15/09/22 18:46:24 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/09/22 18:46:24 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/09/22 18:46:27 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/09/22 18:46:27 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/09/22 18:46:27 WARN : Your hostname, DESKTOP-8JS2RD5 resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:c0a8:103%net1, but we couldn't find any external IP address!
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:163)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:161)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:168)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
at $iwC$$iwC.<init>(<console>:9)
at $iwC.<init>(<console>:18)
at <init>(<console>:20)
at .<init>(<console>:24)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(Unknown Source)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:559)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:534)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 56 more
<console>:10: error: not found: value sqlContext
import sqlContext.implicits._
^
<console>:10: error: not found: value sqlContext
import sqlContext.sql
^
I used Spark 1.5.2 with Hadoop 2.6 and had similar problems. I solved them with the following steps:
Download winutils.exe from the repository to some local folder, e.g. C:\hadoop\bin.
Set HADOOP_HOME to C:\hadoop.
Create c:\tmp\hive directory (using Windows Explorer or any other tool).
Open command prompt with admin rights.
Run C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive
With that, I am still getting some warnings, but no ERRORs and can run Spark applications just fine.
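A hedged, code-level alternative (not part of the steps above): Hadoop's shell utilities consult the hadoop.home.dir system property before the HADOOP_HOME environment variable, so the same effect can be achieved from Scala before any Hive/Hadoop code runs. The path below is only an example matching the folder used above.
// Assumption: C:\hadoop\bin contains winutils.exe, as in the steps above.
// Setting the system property early has the same effect as setting HADOOP_HOME.
System.setProperty("hadoop.home.dir", "C:\\hadoop")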
I was facing a similar issue and got it resolved by putting winutils.exe inside the bin folder. HADOOP_HOME should be set to C:\Winutils and winutils.exe placed in C:\Winutils\bin.
Windows 10 64-bit winutils is available at https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.0/bin
Also ensure that the command prompt has administrative access.
Refer https://wiki.apache.org/hadoop/WindowsProblems
My guess is that you're running into https://issues.apache.org/jira/browse/SPARK-10528. I was seeing the same issue running on Windows 7. Initially I was getting the NullPointerException as you did. When I put winutils into the bin directory and set HADOOP_HOME to point to the Spark directory, I got the error described in the JIRA issue.
Or perhaps this link below will be easier to follow:
https://wiki.apache.org/hadoop/WindowsProblems
Basically, download and copy winutils.exe to your spark\bin folder, then re-run spark-shell.
If you have not set your /tmp/hive to a writable state, please do so.
You need to give permissions to the /tmp/hive directory to resolve this exception.
Hopefully you already have winutils.exe and have set the HADOOP_HOME environment variable. Then open the command prompt and run the following command as administrator:
If winutils.exe is present at D:\winutils\bin and \tmp\hive is also on the D drive:
D:\winutils\bin\winutils.exe chmod 777 D:\tmp\hive
For more details, you can refer to the following links:
Frequent Issues occurred during Spark Development
How to run Apache Spark on Windows7 in standalone mode
You can resolve this issue by placing the MySQL connector jar in the spark-1.6.0/libs folder and restarting. It works.
The important thing here is that instead of running plain spark-shell you should run
spark-shell --driver-class-path /home/username/spark-1.6.0-libs-mysqlconnector.jar
Hopefully that will work.
For Python: create a SparkSession in your Python code (this config section is only for Windows):
spark = SparkSession.builder.config("spark.sql.warehouse.dir", "C:/temp").appName("SparkSQL").getOrCreate()
Copy winutils.exe to C:\winutils\bin and execute the command below:
C:\Windows\system32>C:\winutils\bin\winutils.exe chmod 777 C:/temp
Run the command prompt in admin mode (Run as Administrator).
My issue was having other .exe files/jars inside the winutils/bin folder. So I cleared out all the others and was left with winutils.exe alone. I was using Spark 2.1.1.
In my case the issue was resolved after installing the correct Java version (Java 8) and setting the environment variables. Make sure you run winutils.exe to create a temporary directory as below.
c:\winutils\bin\winutils.exe chmod 777 \tmp\hive
The above should not return any error. Use java -version to verify the Java version you are using before invoking spark-shell.
On Windows, you need to clone "winutils":
git clone https://github.com/steveloughran/winutils.git
And set the HADOOP_HOME variable to DIR_CLONED\hadoop-{version}
Remember to choose the version matching your Hadoop.
Setting SPARK_LOCAL_HOSTNAME to localhost (on Windows 10) resolved the problem for me.