How to add spark-core-assembly-0.7.0.jar to classpath in Ubuntu to run a Spark project - scala

I am new to Spark. I am trying to run a simple Spark project on my local system.
Based on the tutorials, I have run 'sbt/sbt assembly', and the jar file has been created at core/target/scala-2.9.2/spark-core-assembly-0.7.0.jar. To run the samples, could you please tell me where and how I have to add this jar to the classpath?
Regards,
Dinesh

The Spark documentation's quick start guide covers developing standalone applications with Spark in Scala and Java, and shows how to add the Spark dependency to a Maven or SBT project.
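For an SBT project, for instance, the build definition only needs the Spark dependency. A minimal build.sbt sketch; the exact version string and Scala version below are my understanding of the 0.9.x release, so double-check them against the docs for the release you're using:

    // build.sbt -- minimal sketch; adjust versions to match the Spark release you target
    name := "simple-spark-app"

    scalaVersion := "2.10.3"  // Spark 0.9.x was built for Scala 2.10

    // pulls in spark-core_2.10; the 0.9.0 release was published as "0.9.0-incubating"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"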
If you're not using Maven or SBT to build your project, you'll have to pass the appropriate flags to javac and java to add the Spark assembly JAR to your classpath, the same as you'd do for any other JAR dependency.
As an aside, 0.7.0 is a pretty old version of Spark (it was released almost a year ago); I'd recommend using a newer version, such as 0.9.0.
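Whichever version you end up on, the standalone program itself is tiny. A rough sketch in the style of the quick start guide (the input file name is just a placeholder):

    // Minimal sketch of a standalone app; compile it against the Spark jar on your classpath.
    // (On 0.7.x the package was plain `spark` rather than `org.apache.spark`.)
    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object SimpleApp {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "Simple App")   // run locally, no cluster needed
        val lines = sc.textFile("README.md")               // placeholder: any local text file
        println("Lines containing 'Spark': " + lines.filter(_.contains("Spark")).count())
        sc.stop()
      }
    }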

Related

Running Scala 2.12 on EMR 5.29.0

I have a jar file compiled with Scala 2.12, and now I want to run it on EMR 5.29.0. How do I run it, given that the default Scala version on EMR 5.29.0 is 2.11?
As per this thread in the AWS Forums, all Spark versions on EMR are built with Scala 2.11, as it's the stable version:
On EMR, Spark is built with Scala-2.11.x, which is currently the stable version. As per https://spark.apache.org/releases/spark-release-2-4-0.html, Scala-2.12 is still under experimental support. Our service team is already aware of this feature request, and they shall be adding Scala-2.12.0 support in coming releases, once it becomes stable.
So you'll have to wait until they add support in future EMR releases, or you may want to build Spark with Scala 2.12 yourself and install it on EMR. See Building and Deploying Custom Applications with Apache Bigtop and Amazon EMR and Building a Spark Distribution for EMR.
UPDATE:
Since release 6.0.0, Scala 2.12 can be used with Spark on EMR. From the release notes ("Changes, Enhancements, and Resolved Issues" / Scala):
Scala 2.12 is used with Apache Spark and Apache Livy.
Just an idea, if waiting is not an option: is it possible to package the latest Scala jars with the application, with an appropriate Maven scope defined, and point to those packages with the Spark property --properties spark.jars.repositories? You may have to figure out a way to transfer the jars to the driver node; if S3 is an option, it can be used as intermediary storage. (A rough sketch of the property mechanism follows below.)
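An untested sketch of that idea: the property names below (spark.jars.repositories, spark.jars) are standard Spark configuration keys, but the repository URL, bucket, and jar coordinates are placeholders, and this by itself does not resolve the underlying Scala 2.11 vs 2.12 binary incompatibility:

    // Hedged sketch only -- paths and coordinates are made up for illustration.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("my-app")
      // extra Maven repositories to resolve spark.jars.packages / --packages from
      .config("spark.jars.repositories", "https://repo1.maven.org/maven2")
      // or ship the jars yourself, e.g. staged in S3 so driver and executors can fetch them
      .config("spark.jars", "s3://my-bucket/libs/extra-lib_2.12-1.0.0.jar")
      .getOrCreate()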

Compatibility of Apache Spark 2.3.1 and 2.0.0

I would like to use an application developed with Apache Spark 2.0.0 (GitHub repo here), but I only have Spark 2.3.1 installed on my iMac (it seems to be the only version supported by Homebrew at the moment). I can successfully compile it with sbt assembly, but when I run the first example given here I get the following error:
java.lang.NoSuchMethodError: breeze.linalg.DenseVector$.canDotD()Lbreeze/generic/UFunc$UImpl2;
Is this a compatibility issue between the different versions of Breeze used by Spark 2.0.0 and Spark 2.3.1? Is there a way to easily change the code in order to be able to use it with Spark 2.3.1? (I have never used Scala before.)
It probably is.
You can always manually download the required version of Apache Spark (not via Homebrew, but by downloading the tar.gz archive from the official downloads page and just extracting it).
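If you go that route, the project's build can simply pin the Spark modules to 2.0.0 and mark them provided, so the matching Breeze comes in transitively and the downloaded distribution's spark-submit runs the assembly. A hedged build.sbt sketch (the module list here is a guess; adjust it to whatever the project actually depends on):

    // build.sbt sketch: pin Spark to 2.0.0 and let the downloaded distribution run the assembly
    scalaVersion := "2.11.8"   // Spark 2.0.0 was built for Scala 2.11

    val sparkVersion = "2.0.0"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"  % sparkVersion % "provided",
      "org.apache.spark" %% "spark-sql"   % sparkVersion % "provided",
      "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"  // brings the matching Breeze
    )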

Eclipse not recognizing the new jars when migrating from Play Framework 2.4.2 to 2.5.0

I've been trying to migrate from Play 2.4 to 2.5 by following the Migration Guide, and I've upgraded my sbt version to 0.13.11 and ensured that I'm using Scala 2.11. I believe I've been able to successfully migrate to 2.5 because I've changed my routes to fit the new default InjectedRoutesGenerator, but I can't seem to use the new play.libs.streams.Accumulator in a custom BodyParser I want to make.
Any ideas as to why I might not be able to reference Accumulator? If it helps, even when I clean, build, and refresh my project in Eclipse, the referenced jars stay as <jar_name>_2.11-2.4.2.jar.
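One thing worth checking, as a guess based on those <jar_name>_2.11-2.4.2.jar names rather than anything confirmed in this thread: whether project/plugins.sbt still declares the 2.4.2 Play sbt plugin, since that is what pins the Play artifacts that end up on the Eclipse classpath. A minimal sketch of the bump:

    // project/plugins.sbt -- hypothetical check; verify against the Play 2.5 Migration Guide
    addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.5.0")

After changing it, regenerating the Eclipse project files (for example with sbteclipse's eclipse task) should pick up the 2.5.0 jars.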

Using Hadoop 2.2.0 jar files in NetBeans

I was previously using Hadoop 1.2.1 in one of my NetBeans projects. I did this by including the various jar files from the 1.2.1 distribution I downloaded from Hadoop's website.
I was wondering, is a similar approach possible with Hadoop 2.2.0? Namely, can I just include a bunch of jar files in my NetBeans project and plug into Hadoop that way?
Thanks in advance!
You can. There are more jars in the 2.x distributions of Hadoop, but the same principle should work.
On a side note, you may also want to look into using Maven for dependency management; it will manage the list of included jars in NetBeans for you.

Hadoop plugin (1.0.3) for Eclipse

I'm new to Hadoop. Can anyone tell me how to create the Hadoop plugin (version 1.0.3) for Eclipse? In fact, the plugin has been removed from /hadoop-x.x.x/contrib/ (in my case, x.x.x = 1.0.3).
There's an eclipse-plugin in /hadoop-x.x.x/src/contrib/.
By the way, what's the "typical way" to develop a MapReduce app (word count, for example) using Eclipse, in terms of:
Configuration (standalone or pseudo-distributed...)
Coding conventions (folder structure, code, debugging...)
Once you have the Hadoop Eclipse plugin installed and configured, create a MapReduce project in Eclipse; it provides the required Hadoop dependencies and other jars.
Then you need to create a Main class, a Mapper class, and a Reducer class; in the Main class you configure the Job, as in the word count example (sketched below).
Once that's done, you can run the main program with "Run on Hadoop"; there is no need to start Hadoop before running the program.
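To make those steps concrete, here is a compact word count sketch using the new org.apache.hadoop.mapreduce API. The plugin's wizard would normally generate Java classes; the same three pieces (a Mapper, a Reducer, and a main that configures the Job) are shown here in Scala to match the rest of this page:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Emits (word, 1) for every whitespace-separated token in each input line.
    class TokenMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
      private val one  = new IntWritable(1)
      private val word = new Text()
      override def map(key: LongWritable, value: Text,
                       context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
        value.toString.split("\\s+").filter(_.nonEmpty).foreach { token =>
          word.set(token)
          context.write(word, one)
        }
      }
    }

    // Sums the counts emitted for each word.
    class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                          context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
        var sum = 0
        val it = values.iterator()
        while (it.hasNext) sum += it.next().get()
        context.write(key, new IntWritable(sum))
      }
    }

    // The "Main" class: configures the Job and points it at input/output paths.
    object WordCount {
      def main(args: Array[String]): Unit = {
        val job = new Job(new Configuration(), "word count")
        job.setJarByClass(classOf[TokenMapper])
        job.setMapperClass(classOf[TokenMapper])
        job.setReducerClass(classOf[SumReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }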
For the 1.0.3 plugin:
Apache has removed the plugin from the Hadoop installation folder. Instead, you can find the Eclipse plugin source code, along with its build.xml file, at "${HADOOP_HOME}\hadoop-1.0.3\src\contrib\eclipse-plugin", or you can simply download it from here.