Spark and HBase integration version compatibility? - scala

I am trying to integrate Spark 3.2 with HBase 2.0, and I am currently using Hadoop 3.2.2. Can somebody tell me which version of HBase is compatible with Spark 3.2?

Related

Running Scala 2.12 on EMR 5.29.0

I have a jar file compiled with Scala 2.12 and I now want to run it on EMR 5.29.0. How do I run it, given that the default Scala version on EMR 5.29.0 is 2.11?
As per this thread on the AWS Forums, all Spark versions on EMR are built with Scala 2.11, as it's the stable version:
"On EMR, Spark is built with Scala-2.11.x, which is currently the stable version. As per https://spark.apache.org/releases/spark-release-2-4-0.html, Scala-2.12 is still under experimental support. Our service team is already aware of this feature request, and they shall be adding Scala-2.12.0 support in coming releases, once it becomes stable."
So you'll have to wait until they add support in a future EMR release, or you can build Spark with Scala 2.12 yourself and install it on EMR. See Building and Deploying Custom Applications with Apache Bigtop and Amazon EMR and Building a Spark Distribution for EMR.
UPDATE:
Since release 6.0.0, Scala 2.12 can be used with Spark on EMR. Under "Changes, Enhancements, and Resolved Issues" in the release notes, the Scala entry reads: "Scala 2.12 is used with Apache Spark and Apache Livy."
Just an idea, if waiting is not an option!
Is it possible to package the newer Scala jars with the application, with an appropriate Maven scope defined, and point Spark at those packages with the spark.jars.repositories property?
You may have to figure out a way to transfer the jars to the driver node. If S3 is an option, it can be used as intermediate storage.
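If that route is explored, here is a rough sketch of what those properties could look like when set from the driver code; the repository URL, package coordinates and S3 path are placeholders, not tested values.

import org.apache.spark.sql.SparkSession

// Sketch only: the coordinates, repository URL and S3 path below are hypothetical placeholders.
val spark = SparkSession.builder()
  .appName("scala-212-app")
  // extra Maven repositories used to resolve spark.jars.packages
  .config("spark.jars.repositories", "https://repo1.maven.org/maven2")
  // additional artifacts to resolve and ship to the driver and executors
  .config("spark.jars.packages", "com.example:some-lib_2.12:1.0.0")
  // or jars already staged on S3, used as intermediate storage
  .config("spark.jars", "s3://my-bucket/jars/extra-lib_2.12.jar")
  .getOrCreate()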

How to read an ORC file in Scala without the Hadoop ecosystem

I have Snappy-compressed ORC data in an S3 bucket that I need to read into a Scala application.
I don't use any of the Hadoop ecosystem components.
Is there any way to read the file into the app without using Hadoop?
I am using Windows 10, JDK 1.8, and Scala 2.12.8.
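In case it helps, a minimal sketch using the plain ORC Java library (org.apache.orc:orc-core, which still pulls in hadoop-common as a jar dependency for Path and Configuration, but needs no Hadoop cluster or HDFS) might look roughly like this; the local file path and the LONG column type are assumptions, and the file would first have to be fetched from S3, e.g. with the AWS SDK:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
import org.apache.orc.OrcFile

// Sketch only: path and column type are hypothetical; adjust to the real schema.
val reader = OrcFile.createReader(
  new Path("data/part-00000.snappy.orc"),
  OrcFile.readerOptions(new Configuration()))

val rows  = reader.rows()
val batch = reader.getSchema.createRowBatch()

while (rows.nextBatch(batch)) {
  val col0 = batch.cols(0).asInstanceOf[LongColumnVector]   // first column assumed LONG
  for (i <- 0 until batch.size) {
    println(col0.vector(i))                                  // process each value
  }
}
rows.close()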

Compatibility of Apache Spark 2.3.1 and 2.0.0

I would like to use an application developed with Apache Spark 2.0.0 (GitHub repo here), but I only have Spark 2.3.1 installed on my iMac (it seems to be the only version supported by Homebrew at the moment). I can compile it successfully with sbt assembly, but when I run the first example given here I get the following error:
java.lang.NoSuchMethodError: breeze.linalg.DenseVector$.canDotD()Lbreeze/generic/UFunc$UImpl2;
Is this a compatibility issue between the two different versions of the Breeze library used by Spark 2.0.0 and Spark 2.3.1? Is there a way to easily change the code so it can be used with Spark 2.3.1? (I have never used Scala before.)
It probably is.
You can always manually download the required version of Apache Spark (not via Homebrew, but by downloading the tar.gz archive from the official downloads page and just extracting it).
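Alternatively, if rebuilding the application against the installed Spark is acceptable, a build.sbt sketch along these lines could be tried; the Scala version and the module list are assumptions about the project, not taken from its repo:

// Hypothetical sketch: align the Spark (and therefore Breeze) version the application
// compiles against with the Spark version actually installed on the machine.
scalaVersion := "2.11.12"              // Spark 2.3.1 is published for Scala 2.11

val sparkVersion = "2.3.1"             // match the locally installed Spark

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"  // MLlib pulls in Breeze
)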

How to run a Spark application assembled with Spark 1.6 on a cluster with Spark 2.0?

I'm trying to execute an application that was assembled against Spark 1.6 on an EMR cluster that has Spark 2.0 installed.
Is that possible?

Spark 2.0 and Scala 2.11 on Dataproc?

Spark 2.0 final was released last week; when can we expect to see it on Dataproc? Spark 2.0 uses Scala 2.11 by default; does this mean that Dataproc will use Scala 2.11?
Scala 2.11 is actually the feature we are most eagerly awaiting. Scala 2.10 is getting old, and there are a couple of libraries we would like to use but cannot, because they don't support 2.10.
To follow this up: Cloud Dataproc 1.1, newly released this week, has Spark 2.0 and Scala 2.11 and is intended for production use. See the release notes for more details!
Dataproc has a preview image version with Spark 2.0 (currently an earlier RC) and Scala 2.11, which you can try out today.
It is not supported for production use cases because it is less vetted, and we will push breaking changes as we refine the preview into a fully supported image version. Once it has fully stabilized, we will ship a new image version with Spark 2 and Scala 2.11.