I'm trying to use dbutils in Scala Spark. I'm submitting this job on Databricks using spark-submit, but I'm getting a NullPointerException.
import com.databricks.dbutils_v1.DBUtilsHolder.dbutils

try {
  val s3_ls = dbutils.fs.ls(targetS3Dir)
} catch {
  case e: Exception =>
    logger.error(e)
}
I have added the following dependency in build.sbt:
"com.databricks" %% "dbutils-api" % "0.0.4"
I'm even adding com.databricks:dbutils-api:0.0.4 to --packages in spark-submit.
I'm building a jar and passing it to the spark-submit command, but I'm still getting the NullPointerException. Is there anything I'm missing here?
This library is just a placeholder so that you can compile your code locally (and you need to mark the dependency as provided), but it doesn't contain the actual implementation. You don't need to include it when submitting a job, because the real jar is part of the Databricks Runtime.
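For illustration, a minimal build.sbt sketch (using the coordinates already given in the question) that marks the stub as provided:

// Compile-only stub; the real dbutils implementation ships with the Databricks Runtime
libraryDependencies += "com.databricks" %% "dbutils-api" % "0.0.4" % "provided"

With the dependency marked as provided, you can also drop com.databricks:dbutils-api:0.0.4 from the --packages argument of spark-submit.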
I am using Spark 3.0.1 in my Kotlin project. Compilation fails with the following error:
e: org.jetbrains.kotlin.util.KotlinFrontEndException: Exception while analyzing expression at (51,45) in /home/user/project/src/main/kotlin/ModelBuilder.kt
...
Caused by: java.lang.IllegalStateException: No parameter with index 0-0 (name=reverser$module$1 access=16) in method scala.collection.TraversableOnce.reverser$2
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.AnnotationsAndParameterCollectorMethodVisitor.visitParameter(Annotations.kt:48)
at org.jetbrains.org.objectweb.asm.ClassReader.readMethod(ClassReader.java:1149)
at org.jetbrains.org.objectweb.asm.ClassReader.accept(ClassReader.java:680)
at org.jetbrains.org.objectweb.asm.ClassReader.accept(ClassReader.java:392)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.BinaryJavaClass.<init>(BinaryJavaClass.kt:77)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.BinaryJavaClass.<init>(BinaryJavaClass.kt:40)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl.findClass(KotlinCliJavaFileManagerImpl.kt:115)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl.findClass(KotlinCliJavaFileManagerImpl.kt:85)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl$findClass$$inlined$getOrPut$lambda$1.invoke(KotlinCliJavaFileManagerImpl.kt:113)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl$findClass$$inlined$getOrPut$lambda$1.invoke(KotlinCliJavaFileManagerImpl.kt:48)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.ClassifierResolutionContext.resolveClass(ClassifierResolutionContext.kt:60)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.ClassifierResolutionContext.resolveByInternalName$frontend_java(ClassifierResolutionContext.kt:101)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.BinaryClassSignatureParser$parseParameterizedClassRefSignature$1.invoke(BinaryClassSignatureParser.kt:141)
I've cleaned and rebuilt the project several times, removed the build directory, and tried building from the command line with Gradle.
The code where this happens:
val data = listOf(...)
val schema = StructType(arrayOf(
    StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
    StructField("sentence", DataTypes.StringType, false, Metadata.empty())
))
val dataframe = spark.createDataFrame(data, schema) // <- offending line.
I was using Kotlin version 1.4.0 and upgraded to 1.4.10 without any change; I still get the same error.
It looks like this bug (and this one) has already been reported to JetBrains, but is it really not possible to use Spark 3 (local mode) with Kotlin 1.4?
I managed to get it working with Spring Boot (2.3.5) by adding the following to the dependencyManagement block:
dependencies {
    dependencySet("org.scala-lang:2.12.10") {
        entry("scala-library")
    }
}
This downgrades the scala-library jar from version 2.12.12 to 2.12.10, which matches the version of the scala-reflect jar in my project. I'm also using Kotlin 1.4.10.
Are you trying to use this API?
https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/SparkSession.html#createDataFrame-java.util.List-java.lang.Class-
There is no method that takes a java.util.List together with a schema object, as far as I know.
I am trying to use an external jar in my code. It works fine in my local Eclipse setup, where I added the jar as a Referenced Library. However, when I build a jar of my code and deploy it in an Azure Data Factory Spark Activity, the external jar cannot be found on the cluster (I am not sure whether the problem is on the driver node or the executor nodes).
I have created the Spark session as below:
val spark = SparkSession.builder()
  .appName("test_cluster")
  .master("yarn")
  .getOrCreate()
And I have added the external jar later in the code as below:
spark.sparkContext.addJar(jarPath)
spark.conf.set("spark.driver.extraClassPath", jarPath)
spark.conf.set("spark.executor.extraClassPath", jarPath)
Please let me know where I am going wrong, as I am getting the below error message:
java.lang.ClassNotFoundException: Failed to find data source
I am trying to create a Spark session in a unit test case using the below code:
val spark = SparkSession.builder.appName("local").master("local").getOrCreate()
but while running the tests, I am getting the below error:
java.lang.ClassNotFoundException: org.apache.hadoop.fs.GlobalStorageStatistics$StorageStatisticsProvider
I have tried to add the dependency but to no avail. Can someone point out the cause and the solution to this issue?
This can happen for two reasons.
1. You may have incompatible versions of the Spark and Hadoop stacks. For example, HBase 0.9 is incompatible with Spark 2.0; such a mismatch results in class-not-found or method-not-found exceptions.
2. You may have multiple versions of the same library because of dependency hell. You may need to print the dependency tree to make sure this is not the case, as in the sketch below.
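For example, assuming an sbt build (the question doesn't say which build tool is used), sbt 1.4+ ships a dependency-tree plugin that just needs to be enabled:

// project/plugins.sbt — enables the dependencyTree task bundled with sbt 1.4+
addDependencyTreePlugin

Running sbt dependencyTree then prints the full tree so you can spot duplicate or conflicting versions; with Maven the equivalent is mvn dependency:tree.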
I am receiving an error while building my Spark application (Scala) in the IntelliJ IDE.
It is a simple application that uses a Kafka stream for further processing. I have added all the jars, and the IDE does not show any unresolved imports or code statements.
However, when I try to build the artifact, I get two errors stating that
Error:(13, 35) object kafka is not a member of package org.apache.spark.streaming
import org.apache.spark.streaming.kafka.KafkaUtils
Error:(35, 60) not found: value KafkaUtils
val messages: ReceiverInputDStream[(String, String)] = KafkaUtils.createStream(streamingContext,zkQuorum,"myGroup",topics)
I have seen similar questions, but most people complain about this issue while submitting to Spark. I am one step behind that: I am merely building the jar file that would ultimately be submitted to Spark. On top of that, I am using the IntelliJ IDE and am fairly new to Spark and Scala, so I am lost here.
Below is the snapshot of the IntelliJ error:
[IntelliJ Error screenshot]
Thanks
Omer
The reason is that you need to add the spark-streaming-kafka-K.version-Sc.version.jar to your pom.xml as well as to your Spark lib directory.
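If you are building with sbt instead of Maven, the dependency would look roughly like the sketch below. The 0-8 artifact is an assumption based on the org.apache.spark.streaming.kafka package and the receiver-based KafkaUtils.createStream call shown in the question; adjust the artifact and version to match the Spark/Kafka combination you actually run:

// build.sbt — Kafka integration for the receiver-based streaming API
val sparkVersion = "2.2.0" // placeholder; use the Spark version of your cluster
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % sparkVersion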
I am new to Scala and am trying to read a file using the following code:
scala> val textFile = sc.textFile("README.md")
scala> textFile.count()
But I keep getting the following error
error: not found: value sc
I have tried everything, but nothing seems to work. I am using Scala version 2.10.4 and Spark 1.1.0 (I have even tried Spark 1.2.0, but it doesn't work either). I have sbt installed and compiled, yet I am not able to run sbt/sbt assembly. Is the error because of this?
You should run this code using ./spark-shell. It is the Scala REPL with a SparkContext (sc) already provided. You can find it in your Apache Spark distribution, in the folder spark-1.4.1/bin.
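If you are instead running the code as a standalone application (outside spark-shell), sc is not created for you; a minimal sketch for the Spark 1.x API used in the question (the object name is arbitrary):

import org.apache.spark.{SparkConf, SparkContext}

object ReadmeCount {
  def main(args: Array[String]): Unit = {
    // spark-shell creates this context automatically and binds it to `sc`
    val conf = new SparkConf().setAppName("ReadmeCount").setMaster("local")
    val sc = new SparkContext(conf)

    val textFile = sc.textFile("README.md")
    println(textFile.count())

    sc.stop()
  }
}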