Spark Coalesce error - no such method - scala

I got this error, and I'm not sure why, because there is a coalesce method in org.apache.spark.rdd.RDD.
Any ideas?
Am I running incompatible versions of Spark and org.apache.spark.rdd.RDD?
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.rdd.RDD.coalesce$default$3(IZ)Lscala/math/Ordering;
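The call site itself is ordinary; a minimal sketch of the kind of code involved (local master, illustrative names):

import org.apache.spark.{SparkConf, SparkContext}

// Minimal coalesce call; the NoSuchMethodError above is not about this source
// code but about which Spark version the calling bytecode was compiled against.
object CoalesceCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("coalesce-check").setMaster("local[*]"))
    val fewer = sc.parallelize(1 to 100, numSlices = 8).coalesce(2)
    println(fewer.getNumPartitions)
    sc.stop()
  }
}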

It was because some part of your code, or one of your project's dependencies, was compiled against an old Spark version (before 2.0.0) while the Spark on your runtime classpath is 2.0.0 or newer; the signature of RDD.coalesce changed in 2.0.0, so the compiled call no longer resolves (coalesce itself still exists, but the binary signature differs).
To fix this, you can either downgrade your Spark runtime environment to a version below 2.0.0, or upgrade your SDK's Spark version to 2.0.0 or above and upgrade your project dependencies to builds that are compatible with Spark 2.0.0 or above.
For more details, please see these threads:
https://github.com/twitter/algebird/issues/549
https://github.com/EugenCepoi/algebird/commit/0dc7d314cba3be588897915c8dcfb14964933c31
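Assuming an sbt build (the same idea applies to a Maven POM), keeping everything on one Spark line looks roughly like this sketch; any third-party library that was itself compiled against Spark must target the same line:

// build.sbt - versions are illustrative; match them to the Spark you run against.
scalaVersion := "2.11.8"

val sparkVersion = "2.0.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
)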

As I suspected, this is a library compatibility issue. Everything works (no code change) after downgrading Spark alone.
Before:
scala 2.11.8
spark 2.0.1
Java 1.8.0_92
After:
scala 2.11.8
spark 1.6.2
Java 1.8.0_92
OS: OSX 10.11.6

Related

java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V - Flink on EMR

I am trying to run a Flink (v1.13.1) application on EMR (v5.34.0).
My Flink application uses Scallop (v4.1.0) to parse the arguments passed.
The Scala version used for the Flink application is 2.12.7.
I keep getting the error below when I submit the Flink application to the cluster. Any clue or help is highly appreciated.
java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
at org.rogach.scallop.Scallop.<init>(Scallop.scala:63)
at org.rogach.scallop.Scallop$.apply(Scallop.scala:13)
Resolved the issue by downgrading Scala to 2.11. The Flink 1.13.1 Scala shell REPL on EMR reported Scala version 2.11.12, so I downgraded to that version of Scala and the problem disappeared.
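Assuming an sbt build (the same applies to the _2.11 artifact suffixes in a Maven POM), the fix looks roughly like this sketch; versions are illustrative, and each dependency must actually publish a _2.11 build:

// build.sbt - pin the application's Scala binary version to the one the Flink
// runtime on EMR uses, and let %% select the matching _2.11 artifacts.
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % "1.13.1" % "provided",
  "org.rogach"       %% "scallop"               % "4.1.0"
)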

Not a version: 9 exception with Scala 2.11.12

A Scala application built with Scala 2.11.12 is throwing the following error while executing a certain piece of code.
The environment configuration is as follows:
Scala IDE with Eclipse: version 4.7
Eclipse Version: 2019-06 (4.12.0)
Spark Version: 2.4.4
Java Version: "1.8.0_221"
However, the same configuration works fine in the Eclipse IDE with Scala version 2.11.11.
Exception in thread "main" java.lang.NumberFormatException: Not a version: 9
at scala.util.PropertiesTrait$class.parts$1(Properties.scala:184)
at scala.util.PropertiesTrait$class.isJavaAtLeast(Properties.scala:187)
at scala.util.Properties$.isJavaAtLeast(Properties.scala:17)
at scala.tools.util.PathResolverBase$Calculated$.javaBootClasspath(PathResolver.scala:276)
at scala.tools.util.PathResolverBase$Calculated$.basis(PathResolver.scala:283)
at scala.tools.util.PathResolverBase$Calculated$.containers$lzycompute(PathResolver.scala:293)
at scala.tools.util.PathResolverBase$Calculated$.containers(PathResolver.scala:293)
at scala.tools.util.PathResolverBase.containers(PathResolver.scala:309)
at scala.tools.util.PathResolver.computeResult(PathResolver.scala:341)
at scala.tools.util.PathResolver.computeResult(PathResolver.scala:332)
at scala.tools.util.PathResolverBase.result(PathResolver.scala:314)
at scala.tools.nsc.backend.JavaPlatform$class.classPath(JavaPlatform.scala:28)
at scala.tools.nsc.Global$GlobalPlatform.classPath(Global.scala:115)
at scala.tools.nsc.Global.scala$tools$nsc$Global$$recursiveClassPath(Global.scala:131)
at scala.tools.nsc.Global.classPath(Global.scala:128)
at scala.tools.nsc.backend.jvm.BTypesFromSymbols.<init>(BTypesFromSymbols.scala:39)
at scala.tools.nsc.backend.jvm.BCodeIdiomatic.<init>(BCodeIdiomatic.scala:24)
at scala.tools.nsc.backend.jvm.BCodeHelpers.<init>(BCodeHelpers.scala:23)
at scala.tools.nsc.backend.jvm.BCodeSkelBuilder.<init>(BCodeSkelBuilder.scala:25)
at scala.tools.nsc.backend.jvm.BCodeBodyBuilder.<init>(BCodeBodyBuilder.scala:25)
at scala.tools.nsc.backend.jvm.BCodeSyncAndTry.<init>(BCodeSyncAndTry.scala:21)
at scala.tools.nsc.backend.jvm.GenBCode.<init>(GenBCode.scala:47)
at scala.tools.nsc.Global$genBCode$.<init>(Global.scala:675)
at scala.tools.nsc.Global.genBCode$lzycompute(Global.scala:671)
at scala.tools.nsc.Global.genBCode(Global.scala:671)
at scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.serialVUID(GenASM.scala:1240)
at scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genClass(GenASM.scala:1329)
at scala.tools.nsc.backend.jvm.GenASM$AsmPhase.emitFor$1(GenASM.scala:198)
at scala.tools.nsc.backend.jvm.GenASM$AsmPhase.run(GenASM.scala:204)
at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1528)
at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1513)
at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.wrapInPackageAndCompile(ToolBoxFactory.scala:197)
at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$ToolBoxGlobal.compile(ToolBoxFactory.scala:252)
at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:429)
at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$$anonfun$compile$2.apply(ToolBoxFactory.scala:422)
at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.liftedTree2$1(ToolBoxFactory.scala:355)
at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl$withCompilerApi$.apply(ToolBoxFactory.scala:355)
at scala.tools.reflect.ToolBoxFactory$ToolBoxImpl.compile(ToolBoxFactory.scala:422)
at com.slb.itdataplatform.dq.DataQualityValidation$$anonfun$compile$1.apply(DataQualityValidation.scala:112)
at scala.util.Try$.apply(Try.scala:192)
at com.slb.itdataplatform.dq.DataQualityValidation$.compile(DataQualityValidation.scala:109)
at com.slb.itdataplatform.dq.DataQualityValidation$.generateVerifier(DataQualityValidation.scala:104)
at com.slb.itdataplatform.dq.DataQualityValidation$.main(DataQualityValidation.scala:49)
at com.slb.itdataplatform.dq.DataQualityValidation.main(DataQualityValidation.scala)
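For reference, the failing path is runtime compilation through the Scala toolbox (the ToolBoxFactory frames above); the path resolver it triggers is what calls scala.util.Properties.isJavaAtLeast, where the NumberFormatException is thrown. Stripped of Spark and the project code, that path is roughly this sketch (illustrative, not the actual DataQualityValidation code):

import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox
import scala.util.Try

// Minimal runtime compilation via the Scala toolbox (needs scala-compiler and
// scala-reflect on the classpath); the real code compiles something
// project-specific instead of "1 + 1".
object ToolboxSketch {
  def main(args: Array[String]): Unit = {
    val toolbox = currentMirror.mkToolBox()
    val result = Try(toolbox.eval(toolbox.parse("1 + 1")))
    println(result) // Success(2) when the compiler classpath is consistent
  }
}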
I could work with the same environment configuration, but Spark 2.4.4's underlying Scala version is 2.11.12, hence I want to use the same version in my application to avoid any conflicts (as my Spark apps are otherwise not initializing: Unable to initialize Spark job).
What could be the possible root cause of this error, and how can it be resolved?

Should I use Scala 2.11.0 or 2.11.8 with Spark 2.3.0?

I'm trying to upgrade the Spark version from 1.6.2 to 2.3.0. I'm currently using Scala version 2.10, but I need to upgrade to Scala version 2.11.x, since Scala 2.10 is no longer supported. My question is: which sub-version should I upgrade to? I'm struggling to determine which one to use.
I can't find anything that compares the different sub-versions of Scala, but I have come across posts that recommend Scala 2.11.8 and mention a bug in Scala 2.11.0 when used with Spark (not sure how true that is).
What are your experiences and which sub versions do you recommend? Thanks!
As the documentation says, you can use a compatible version (2.11.x).
So just use the latest version (2.11.12 as of today).
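If in doubt, a quick runtime check shows which Scala and Spark versions actually ended up on the driver classpath (a sketch; run it with the same jars your job uses):

object VersionCheck {
  def main(args: Array[String]): Unit = {
    // Both values come from the standard library and the Spark package object.
    println(s"Scala: ${scala.util.Properties.versionString}") // e.g. "version 2.11.12"
    println(s"Spark: ${org.apache.spark.SPARK_VERSION}")      // e.g. "2.3.0"
  }
}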

Upgrading from Spark 1.6 to 2.1 - Incompatible Class Change Error

I'm upgrading from Spark 1.6 to version 2.1 (Hortonworks Distribution). Below I describe the Stage 1 and Stage 2 scenarios: Stage 1 executes successfully, while Stage 2 fails.
Stage 1
The POM XML dependencies for Spark 1.6.3 (which works absolutely fine) are:
scala tools version 2.10
scala version 2.10.5
scala compiler version 2.10.6
spark-core_2.10
spark-sql_2.10
spark version 1.6.3
There is a common set of libraries used, which are:
commons-csv-1.4.jar
commons-configuration2-2.1.1.jar
commons-beanutils-1.9.2.jar
commons-email-1.4.jar
javax.mail-1.5.2.jar
sqoop-1.4.6.2.3.0.12-7.jar
avro-mapred-1.8.2.jar
avro-1.8.2.jar
guava-14.0.jar
commons-logging-1.1.3.jar
jackson-module-scala_2.10-2.4.4.jar
jackson-databind-2.4.4.jar
jackson-core-2.4.4.jar
xdb6.jar
jackson-mapper-asl-1.9.13.jar
ojdbc7-12.1.0.2.jar
Stage 2
When I change the dependencies and Spark version in POM.xml to:
scala tools version 2.11
scala version 2.11.8
scala compiler version 2.11.8
spark-core_2.11
spark-sql_2.11
spark version 2.1.0
Also, from the common set of libraries, I'm only changing:
jackson-module-scala_2.11-2.6.5.jar
jackson-core-2.6.5.jar
jackson-databind-2.6.5.jar
When I take a build and try to run it on the cluster with the Spark configuration set to 2.1 and Scala to 2.11.8, it fails with the error below.
INFO: Exception in thread "main" java.lang.IncompatibleClassChangeError: class com.xxx.xxx.xxx.DQListener has interface org.apache.spark.scheduler.SparkListener as super class
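For reference, the listener mentioned in the error extends the Spark listener API and is of this general shape (an illustrative sketch; the real DQListener body is omitted):

import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd}

// Illustrative stand-in for DQListener, compiled against the Spark 2.1
// artifacts; the actual listener logic is not shown here.
class DQListenerSketch extends SparkListener {
  override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit =
    println(s"Application ended at ${applicationEnd.time}")
}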
I'm sure the issue is with importing the correct jars, but I'm not able to find out which one. I would appreciate it if anybody could help resolve this.

Spark Kafka - Issue while running from Eclipse IDE

I am experimenting with Spark-Kafka integration, and I want to test the code from my Eclipse IDE. However, I got the error below:
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.<init>(Pool.scala:28)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:60)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39)
at org.apache.spark.streaming.kafka.KafkaCluster.connect(KafkaCluster.scala:52)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:345)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:342)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:342)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:125)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:403)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:532)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at com.capiot.platform.spark.SparkTelemetryReceiverFromKafkaStream.executeStreamingCalculations(SparkTelemetryReceiverFromKafkaStream.java:248)
at com.capiot.platform.spark.SparkTelemetryReceiverFromKafkaStream.main(SparkTelemetryReceiverFromKafkaStream.java:84)
UPDATE:
The versions that I am using are:
scala - 2.11
spark-streaming-kafka - 1.4.1
spark - 1.4.1
Can anyone resolve the issue? Thanks in advance.
You have the wrong version of Scala. You need 2.10.x per
https://spark.apache.org/docs/1.4.1/
"For the Scala API, Spark 1.4.1 uses Scala 2.10."
Might be late to help the OP, but when using Kafka streaming with Spark, you need to make sure that you use the right jar file.
For example, in my case I have Scala 2.11 (the minimum required for Spark 2.0, which I'm using), and since the Kafka connector has to match that Spark version (2.0.0), I have to use the artifact spark-streaming-kafka-0-8-assembly_2.11-2.0.0-preview.jar.
Notice that my Scala version and the Spark version can both be seen in the artifact name: 2.11-2.0.0.
Hope this helps (someone).