Unable to use Apache Commons CLI Option.builder() in Scala - scala

In a spark shell or application (written in Scala/maven build), I am unable to use the static builder method from the Apache Commons CLI package. I have confirmed that I am including the jar in the class path and have access to the Option class along with other classes in the package like Options, DefaultParser, etc. Why can I not use this public static method in Scala?
import org.apache.commons.cli.Option
val opt = Option.builder("foo").build()
error: value builder is not a member of object org.apache.commons.cli.Option
I can however see the static fields Option.UNINITIALIZED and Option.UNLIMITED_VALUES
using commons-cli 1.3.1
Scala version: 2.11.8
Spark version: 2.2.0
command to start the shell: spark-shell --jars .m2/repository/commons-cli/commons-cli/1.3.1/commons-cli-1.3.1.jar

Let me help you clarify your problem scenario.
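If you open your .idea folder, you will find that the project already has some internal jar dependencies, and commons_cli is in that list, but at version 1.2.
This leads to a class collision: the 1.2 Option class, which has no builder() method (it was added in 1.3), is picked up instead of the one from your 1.3.1 jar.
The solution is straightforward: refer to the docs and use a constructor that exists in both versions.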
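One way to confirm a collision like the one described above is to ask the JVM where it actually loaded the class from. This is only a minimal sketch: ClassOrigin and locationOf are names made up for this illustration, and only the Class, ProtectionDomain and CodeSource calls are standard JDK API. scala.Option is written out in full because the commons-cli import in the question shadows the name Option.

```scala
object ClassOrigin {
  // Returns the jar or directory a class was loaded from, or None when the
  // JVM reports no code source (as it does for bootstrap classes such as
  // java.lang.String).
  // scala.Option is spelled out so that this still compiles after
  // `import org.apache.commons.cli.Option`.
  def locationOf(cls: Class[_]): scala.Option[String] =
    for {
      domain <- scala.Option(cls.getProtectionDomain)
      source <- scala.Option(domain.getCodeSource)
      url    <- scala.Option(source.getLocation)
    } yield url.toString
}
```

In the shell from the question, ClassOrigin.locationOf(classOf[org.apache.commons.cli.Option]) would be expected to point at a commons-cli 1.2 jar if this diagnosis applies. In that case a constructor such as new org.apache.commons.cli.Option("foo", "description of foo") is available in both 1.2 and 1.3.1.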

Related

Why does a Maven build work, while adding the Spark jars as external jars gives the compile error "object apache is not a member of package org"?

In Eclipse, while setting up Spark, even after adding the external jars from spark-2.4.3-bin-hadoop2.7/jars/<_all.jar> to the build path,
the compiler complains: "object apache is not a member of package org".
Yes, building the dependencies via Maven or SBT would fix it. That has already been asked in:
scalac compile yields "object apache is not a member of package org"
But the question here is: WHY does the traditional way fail like this?
If we refer to "Scala/Spark version compatibility", we can see a similar issue. The problem is that Scala is NOT binary compatible across major versions (2.10, 2.11, 2.12). Hence each Spark module is compiled against a specific Scala library version. But when we run from Eclipse, the Eclipse Scala environment may not be compatible with the Scala version that the Spark libraries we set up were built for.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD

Please note that I am a better data miner than programmer.
I am trying to run the examples from the book "Advanced Analytics with Spark" by Sandy Ryza (the code examples can be downloaded from "https://github.com/sryza/aas"),
and I ran into the following problem.
When I open this project in IntelliJ IDEA and try to run it, I get the error "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/rdd/RDD"
Does anyone know how to solve this issue?
Does this mean I am using the wrong version of Spark?
When I first tried to run this code, I got the error "Exception in thread "main" java.lang.NoClassDefFoundError: scala/product", but I solved it by setting scala-lib to compile scope in Maven.
I use Maven 3.3.9, Java 1.7.0_79, Scala 2.11.7 and Spark 1.6.1. I tried both IntelliJ IDEA 14 and 15, and different versions of Java (1.7), Scala (2.10) and Spark, but without success.
I am also using Windows 7.
My SPARK_HOME and Path variables are set, and I can execute spark-shell from the command line.
The examples in this book will show a --master argument to spark-shell, but you will need to specify arguments as appropriate for your environment. If you don't have Hadoop installed, you need to start the spark-shell locally. To execute the samples you can simply pass paths as local file references (file:///) rather than HDFS references (hdfs://).
The author suggests a hybrid development approach:
Keep the frontier of development in the REPL, and, as pieces of code
harden, move them over into a compiled library.
Hence the sample code is meant to be used as compiled libraries rather than as a standalone application. You make the compiled JAR available to spark-shell by passing it to the --jars option, while Maven is used for compiling and managing dependencies.
In the book the author describes how the simplesparkproject can be executed:
use maven to compile and package the project
cd simplesparkproject/
mvn package
start the spark-shell with the jar dependencies
spark-shell --master local[2] --driver-memory 2g --jars ../simplesparkproject-0.0.1.jar ../README.md
Then you can access your object within the spark-shell as follows:
val myApp = com.cloudera.datascience.MyApp
However, if you want to run the sample code as a standalone application and execute it within IDEA, you need to modify the pom.xml.
Some of the dependencies are required for compilation but are already available in a Spark runtime environment. Therefore these dependencies are marked with scope provided in the pom.xml.
<!--<scope>provided</scope>-->
If you comment out the provided scope as shown, you will be able to run the samples within IDEA. But you can then no longer pass this jar as a dependency to the spark shell.
Note: I used Maven 3.0.5 and Java 7+. With Maven 3.3.x I had problems with the plugin versions.

Eclipse (set up with Scala environment): object apache is not a member of package org

Eclipse reports an error when I import the Spark packages. Please help. When I hover over the import, it shows "object apache is not a member of package org".
I searched for this error, and the results say the Spark jars have not been imported. So I imported "spark-assembly-1.4.1-hadoop2.2.0.jar" too, but I still get the same error. Below is what I actually want to run:
import org.apache.spark.{SparkConf, SparkContext}

object ABC {
  // Scala main method
  def main(args: Array[String]): Unit = {
    println("Spark Configuration")
    val conf = new SparkConf()
    conf.setAppName("My First Spark Scala Application")
    conf.setMaster("spark://ip-10-237-224-94:7077")
    println("Creating Spark Context")
  }
}
Adding the spark-core jar to your classpath should resolve your issue. Also, if you are using a build tool like Maven or Gradle (if not, you should, because spark-core has a lot of dependencies and you would keep hitting this problem for different jars), try the Eclipse task provided by these tools to set the classpath in your project properly.
I was also getting the same error; in my case it was a compatibility issue. Spark 2.2.1 is not compatible with Scala 2.12 (it is compatible with 2.11.8), and my IDE was using Scala 2.12.3.
I resolved my error by:
1) Importing the jar files from the Spark installation folder. After installing Spark on the C drive there is a folder named Spark, which contains a jars folder holding all the basic jar files.
In Eclipse, right-click on the project -> Properties -> Java Build Path. Under the 'Libraries' tab there is an option ADD EXTERNAL JARs. Select it, import all the jar files from the 'jars' folder, and click Apply.
2) Going to Properties again -> Scala Compiler -> Scala Installation -> Latest 2.11 bundle (dynamic)*
*Before selecting this option, check the compatibility of your Spark and Scala versions.
The problem is that Scala is NOT binary compatible across major versions. Hence each Spark module is compiled against a specific Scala library version. When we run from Eclipse, there is one Scala version that was used to compile the Spark dependency jars we add to the build path, and a second Scala version that is the Eclipse runtime environment. The two may conflict.
This is a hard reality, although we might wish Scala were backward compatible, or at least that a compiled jar file could be.
Hence the recommendation is to use Maven or a similar tool, where dependency versions can be managed.
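The compatibility rule described in these answers can be made concrete by comparing only the major.minor part of two Scala 2 version strings. This is a rough sketch, not an official check: ScalaBinaryVersion and its methods are names invented here, while scala.util.Properties.versionNumberString is the standard library call that reports the Scala version the code is running on.

```scala
object ScalaBinaryVersion {
  // "2.11.8" -> "2.11". For Scala 2, jars built for the same major.minor
  // series are expected to be binary compatible with each other.
  def binaryOf(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // True when two full Scala 2 versions belong to the same binary series.
  def sameBinary(a: String, b: String): Boolean =
    binaryOf(a) == binaryOf(b)

  // Scala version of the environment this code runs in (IDE, REPL, spark-shell).
  def running: String = scala.util.Properties.versionNumberString
}
```

For the versions mentioned above, ScalaBinaryVersion.sameBinary("2.12.3", "2.11.8") is false, which matches the reported failure, while 2.11.7 and 2.11.8 fall in the same series. Comparing ScalaBinaryVersion.running against the _2.xx suffix in the Spark jar names is one quick way to spot a mismatch.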
If you are doing this in the context of Scala within a Jupyter Notebook, you'll get this error. You have to install the Apache Toree kernel:
https://github.com/apache/incubator-toree
and create your notebooks with that kernel.
You also have to start the Jupyter Notebook with:
pyspark

How to use Phantom in Scala IDE

I want to use phantom with my Scala IDE. For this I cloned the GitHub repository and created a .jar file of phantom using sbt -> compile -> package. I added this .jar file to the build path in my Scala IDE, but the import
import com.websudos.phantom.connectors._
still throws the error
object connector is not a member of com.websudos.phantom.
When I use the auto-complete function of Scala IDE, it only offers the import
import com.websudos.phantom.example
I don't understand why the jar contains the example package but not the others.
I searched the internet, but every suggestion is to add the dependency to the sbt build, which I don't want to do.
Use sbt-assembly instead to create a fat jar.
https://github.com/sbt/sbt-assembly
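As a rough sketch of what this involves: sbt-assembly is enabled with one line in the build's plugin definition, which is itself Scala. The file location and the version number below are assumptions for illustration; check the sbt-assembly README for the release that matches your sbt version.

```scala
// project/plugins.sbt
// Assumed version: pick the release listed in the sbt-assembly README
// for the sbt version your checkout uses.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```

Running sbt assembly then produces a single jar that bundles the compiled classes together with their dependencies, and that jar can be added to the IDE build path.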

class java.lang.RuntimeException/Scala class file does not contain Scala annotation

I'm trying to run some Scala code using Scala 2.10.2; the code uses some jar libraries compiled with sbt.
I get the following error:
scala: error while loading Order, class file '..\prestashop-scala-client-0.2.4\target\prestasac-0.2.4.jar(co/orderly/prestasac/representations/Order.class)' is broken
(class java.lang.RuntimeException/Scala class file does not contain Scala annotation)
Sources of the prestasac-0.2.4.jar are on github: Order.class
Is there anything I can do to fix this issue?
Thank you
Looks like the library is configured to compile against Scala 2.9.1. Major versions of Scala are not binary compatible.
I put the necessary SBT changes here: https://github.com/mpartel/prestashop-scala-client/commit/e9a1df40bfe35518aaebac899e438b9b6fa6d728
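For orientation, the kind of change involved is to build the library for the same Scala series as the code that uses it. The fragment below is only a sketch of such a build.sbt setting, not the contents of the linked commit; the version values are taken from the question and the answer above.

```scala
// build.sbt (sketch)
// Compile the library with the Scala version used by the calling code.
scalaVersion := "2.10.2"

// Optionally keep building for the old series as well; running
// `sbt +package` then produces one jar per listed version.
crossScalaVersions := Seq("2.9.1", "2.10.2")
```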