When trying to save data to Cassandra (in Scala), I get the following exception:
java.lang.ClassCastException:
com.datastax.driver.core.DefaultResultSetFuture cannot be cast to
com.google.common.util.concurrent.ListenableFuture
Note that I do not get this error every time; it comes up intermittently, which makes it all the more dangerous in production.
I am using YARN and I have shaded com.google.** to avoid the Guava symbol clash.
Here's the code snippet:
import com.datastax.spark.connector._ // gives RDDs the saveToCassandra method
rdd.saveToCassandra(keyspace, "movie_attributes", SomeColumns("movie_id", "movie_title", "genre"))
Any help would be much appreciated.
UPDATE
Adding details from the pom file as requested:
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.5.0</version>
</dependency>
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector-java_2.10</artifactId>
    <version>1.5.0</version>
</dependency>
**Shading guava**
<relocation> <!-- Conflicts between Cassandra Java driver and YARN -->
    <pattern>com.google</pattern>
    <shadedPattern>oryx.com.google</shadedPattern>
    <includes>
        <include>com.google.common.**</include>
    </includes>
</relocation>
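For reference, a relocation like this sits inside the maven-shade-plugin configuration; a minimal sketch of the surrounding setup (the plugin version here is an assumption, use whatever your build already pins):
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version> <!-- assumed version -->
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <!-- the <relocation> block above goes here -->
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>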
Spark version: 1.5.2
Cassandra version: 2.2.3
Almost everyone who works on C* and Spark has seen this type of error. The root cause is explained here.
The C* driver depends on a relatively new version of Guava, while Spark depends on an older one. To solve this before connector 1.6.2, you need to explicitly embed the C* driver and Guava with your application.
Since 1.6.2 and 2.0.0-M3, the connector ships by default with the correct C* driver and a shaded Guava, so you should be OK with just the connector artifact included in your project.
Things get tricky if your Spark application uses other libraries that depend on the C* driver. Then you have to manually include the unshaded version of the connector, the correct C* driver, and a shaded Guava, and deploy a fat jar; you essentially build your own connector package. In that case you can no longer use --packages when submitting your Spark application.
tl;dr
Use connector 1.6.2/2.0.0-M3 or above; 99% of the time you should be OK.
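In Maven that means bumping the connector artifact, e.g. (keeping the 2.10 suffix from the question's pom):
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.10</artifactId>
    <version>1.6.2</version>
</dependency>
Also check the connector's version compatibility table against your Spark version, since connector lines track specific Spark releases.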
Related
I am trying to run Glue locally using Scala, so I added the dependency below as per the AWS Glue documentation (https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html):
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>AWSGlueETL</artifactId>
    <version>2.0</version>
    <!-- A "provided" dependency; this will be ignored when you package your application -->
</dependency>
But this dependency is not found (it does not resolve).
Please let me know if this dependency has moved to some other name.
Thank you
AWSGlueETL should resolve in pom.xml.
I found that https://aws-glue-etl-artifacts.s3.amazonaws.com/ hosts only 0.9.0, 1.0.0, and 3.0.0 artifacts; there is no published 2.0.0 version. I found this related issue: https://github.com/awslabs/aws-glue-libs/issues/15
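If you just need a version that exists, a minimal pom sketch based on the listing above (the repository id is arbitrary, and the /release/ URL path is an assumption to verify against the linked AWS docs):
<repositories>
    <repository>
        <id>aws-glue-etl-artifacts</id>
        <!-- URL path assumed from the AWS Glue docs; verify against the linked page -->
        <url>https://aws-glue-etl-artifacts.s3.amazonaws.com/release/</url>
    </repository>
</repositories>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>AWSGlueETL</artifactId>
    <!-- 2.0.0 was never published; 0.9.0, 1.0.0, and 3.0.0 are, per the bucket listing above -->
    <version>3.0.0</version>
    <scope>provided</scope>
</dependency>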
My Java Spring Boot project fails at startup with the given error:
Failed to instantiate [io.r2dbc.spi.ConnectionFactory]: Factory method 'connectionFactory' threw exception; nested exception is java.lang.NoSuchFieldError: LOCK_WAIT_TIMEOUT
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-r2dbc</artifactId>
    <version>1.4.3</version>
</dependency>
<dependency>
    <groupId>com.oracle.database.r2dbc</groupId>
    <artifactId>oracle-r2dbc</artifactId>
    <version>0.4.0</version>
</dependency>
How do I fix it?
tl;dr: As of this writing, use oracle-r2dbc 0.1.0 in Spring projects.
oracle-r2dbc 0.4.0 depends on a newer version of io.r2dbc:r2dbc-spi than spring-data-r2dbc 1.4.3.
You may have to clear your build tool's or IDE's cache for the change to take effect (in my case, mvn clean plus IntelliJ's "Invalidate Caches...").
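For reference, the downgraded dependency matching the tl;dr (same coordinates as in the question, version swapped):
<dependency>
    <groupId>com.oracle.database.r2dbc</groupId>
    <artifactId>oracle-r2dbc</artifactId>
    <version>0.1.0</version>
</dependency>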
Btw, when I downgraded, I think it broke using descriptors in the r2dbc URL, so I had to switch to a URL like this: r2dbc:oracle://<host>:<port>/<service-name>
Update: I just saw that Oracle's driver docs for R2DBC say:
Use the 0.1.0 version of Oracle R2DBC if you are programming with
Spring. The later versions of Oracle R2DBC implement the 0.9.x
versions of the R2DBC SPI. Currently, Spring only supports drivers
that implement the 0.8.x versions of the SPI.
https://github.com/oracle/oracle-r2dbc
DO NOT USE 0.2.0 with Spring. It is even worse than an error: queries just hang forever. I spent over a day trying to debug it. Not only did I have the wrong version, but when I actually did try changing the version, my IDE was caching the old version, out of sync with Maven.
I have the following dependency versions
<storm.version>1.0.2</storm.version>
<kafka.version>0.9.0.1</kafka.version> <!-- 0.8.2.2/0.9.0.1 -->
<spark.version>1.6.2</spark.version>
<scala.lib.version>2.11.7</scala.lib.version>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
I use Eclipse Luna as my runtime environment.
However, when I run my Spark program (with Kafka), I see the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.<init>(Pool.scala:28)
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala:60)
It seems this is a problem with the Scala version used by Kafka and Spark. I've tried Scala versions 2.10.6, 2.11.7, and 2.11.8 in the pom, but it made no difference. Any suggestions, please?
Thanks
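One thing to check: this kind of NoClassDefFoundError usually means artifacts built for different Scala binary versions are mixed on the classpath, and the _2.x suffix in artifact ids like kafka_2.9.2 must match your scala.lib.version. A hedged sketch, assuming Scala 2.11 and the kafka.version property above:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <!-- the suffix must match the Scala binary version on the classpath -->
    <artifactId>kafka_2.11</artifactId>
    <version>0.9.0.1</version>
</dependency>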
The problem is pretty weird. If I work with the uncompressed file, there is no issue; but if I work with the compressed bz2 file, I get an index-out-of-bounds error.
From what I've read, it is apparently the spark-csv parser failing to detect the end-of-line character and reading the whole file as one huge line. The fact that it works on the uncompressed csv but not on the .csv.bz2 file is pretty weird to me.
Also, like I said, it only happens when doing a DataFrame union. I tried an RDD union with the Spark context: same error.
My whole problem was that I was using Scala-IDE. I thought I was using Hadoop 2.7, but I hadn't run mvn eclipse:eclipse to update my M2_REPO, so I was still using Hadoop 2.2 (in the referenced libraries, since the latest spark-core version references Hadoop 2.2 by default; I don't know why).
All in all, for future reference: if you plan on using spark-csv, don't forget to specify the Hadoop version in your pom.xml, even though spark-core references a version of Hadoop by itself.
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.1</version>
</dependency>
I am getting an error when trying to run an example on Spark. Can anybody please let me know what changes I need to make to my pom.xml to run programs with Spark?
Currently Spark only works with Scala 2.9.3. It does not work with later versions of Scala. I saw the error you describe when I tried to run the SparkPi example with SCALA_HOME pointing to a 2.10.2 installation. When I pointed SCALA_HOME at a 2.9.3 installation instead, things worked for me. Details here.
You should add a dependency on scala-reflect to your Maven build:
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-reflect</artifactId>
    <version>2.10.2</version>
</dependency>
I ran into the same issue using the Scala-Redis 2.9 client (incompatible with Scala 2.10), and including a dependency on scala-reflect does not help. Indeed, scala-reflect is packaged as its own jar, but it does not include the missing class, which has been deprecated since Scala 2.10.0 (see this thread).
The correct answer is to point to an installation of Scala which includes this class. (In my case, using the Scala-Redis client, McNeill's answer helped: I pointed to Scala 2.9.3 using SBT and everything worked as expected.)
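If you are on Maven rather than SBT, the equivalent move is pinning the Scala library itself; a minimal sketch, assuming the 2.9.3 version mentioned above:
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.9.3</version>
</dependency>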
In my case, the error was raised in Kafka's API. Changing the dependency from
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.9.2</artifactId>
    <version>0.8.1.1</version>
</dependency>
to
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
fixed the problem.