How to recompile examples using the Spark project's sources?

I am new to Java/Scala, and coming from Python I am not familiar with the compile phases.
I changed a Scala example that ships with Spark, but when I re-execute it, the changes are not taken into account. I guess that is because I am not used to compiling my scripts.
What are the commands to compile a Scala script?
Namely: ~/spark/examples/src/main/scala/org/apache/spark/examples/streaming/FlumePollingEventCount.scala
scalac /Users/romain/spark/examples/src/main/scala/org/apache/spark/examples/streaming/FlumePollingEventCount2.scala
/Users/romain/spark/examples/src/main/scala/org/apache/spark/examples/streaming/FlumePollingEventCount2.scala:21: error: object SparkConf is not a member of package org.apache.spark
import org.apache.spark.SparkConf
                        ^

According to Running Build Targets For Individual Projects:
$ # sbt
$ build/sbt package
$ # Maven
$ build/mvn package -DskipTests -pl assembly
In your case it'd be build/sbt examples:package.
You may want to read up on Building Spark to learn about Spark's build process.
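As a minimal sketch of the full edit-rebuild-run cycle, assuming you work from the Spark source root and use the bundled run-example launcher (the host and port arguments here are placeholders):
$ # rebuild the examples module after editing the source
$ build/sbt examples:package
$ # re-run the (modified) example
$ bin/run-example streaming.FlumePollingEventCount localhost 9999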

Related

How to run Scala 3 applications in the command line with Coursier

If you follow the steps at the official Scala 3 sites, like Dotty or Scala Lang, they recommend using Coursier to install Scala 3. The problem is that neither of these explains how to run a compiled Scala 3 application after following the steps.
Scala 2:
> cs install scala
> scalac HelloScala2.scala
> scala HelloScala2
Hello, Scala 2!
Scala 3:
> cs install scala3-compiler
> scala3-compiler HelloScala3.scala
Now how do you run the compiled application with Scala 3?
Currently there does not seem to be a way to launch a runner for Scala 3 using Coursier, see this issue. As a workaround, you can install the binaries from the GitHub releases page. Scroll all the way down past the contribution list to see the .zip file, then download and unpack it to some local folder and put the unpacked bin directory on your path. After a restart you will get the scala command (and scalac etc.) in the terminal.
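For illustration, the manual install might look like this on a Unix-like system (the archive name and install directory are placeholders):
$ unzip scala3-3.0.0.zip -d ~/tools
$ export PATH="$PATH:$HOME/tools/scala3-3.0.0/bin"
$ scalac HelloScala3.scala
$ scala HelloScala3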
Another workaround is using the java runner directly with a classpath from coursier by this command:
java -cp $(cs fetch -p org.scala-lang:scala3-library_3:3.0.0):. myMain
Replace myMain with the name of your @main def function. If it is in a package myPack you need to say myPack.myMain (as usual).
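For reference, a minimal Scala 3 source file matching those placeholder names might look like this:
// MyMain.scala (hypothetical file name)
package myPack

// @main generates a runnable main class named myPack.myMain
@main def myMain(): Unit =
  println("Hello, Scala 3!")
Compile it with scala3-compiler and then run it with the java command above, passing myPack.myMain as the main class.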
Finally, it seems it is possible to run a Scala application the same way as with Scala 2, using scala3 from Coursier:
cs install scala3
Then, you can compile it with scala3-compiler and run with scala3:
scala3-compiler Main.scala
scala3 Main.scala
This work-around seems to work for me:
cs launch scala3-repl:3+ -M dotty.tools.MainGenericRunner -- YourScala3File.scala
This way, you don't even have to compile the source code first.
In case your source depends on third-party libraries, you can specify the dependencies like this:
cs launch scala3-repl:3+ -M dotty.tools.MainGenericRunner -- -classpath \
$(cs fetch --classpath io.circe:circe-generic_3:0.14.1):. \
YourScala3File.scala
This would be an example where you use the circe library that's compiled with Scala 3. You should be able to specify multiple third-party libraries with the fetch sub-command.
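As an illustrative sketch (not from the original answer), YourScala3File.scala could look like this, using circe's automatic derivation:
import io.circe.syntax.*
import io.circe.generic.auto.*

// The JSON encoder for Person is derived automatically by circe-generic
case class Person(name: String, age: Int)

@main def run(): Unit =
  println(Person("Ada", 36).asJson.noSpaces)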

Compiling with scalac does not find sbt dependencies

I tried running my Scala code in the VSCode editor. I am able to run my script via the spark-submit command. But when I try to compile with scalac, I get:
.\src\main\scala\sample.scala:1: error: object apache is not a member of package org
import org.apache.spark.sql.{SQLContext,SparkSession}
I have already added respective library dependencies to build.sbt.
Have you tried running sbt compile?
Running scalac directly means you're compiling only one file, without the benefits of sbt and especially the dependencies that you have added in your build.sbt file.
In an sbt project, there's no reason to use scalac directly; it defeats the purpose of sbt.
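A minimal build.sbt sketch for such a project might look like the following (the Spark and Scala versions are illustrative):
name := "sample"
scalaVersion := "2.12.15"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.1.2" % "provided"
)
sbt compile resolves these dependencies and puts them on the compile classpath; a bare scalac invocation knows nothing about them, hence the "object apache is not a member of package org" error.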

Compiling Spark MLlib using sbt

I have been building some modifications inside Spark MLlib for a while now, and every time I want to compile Spark I have to do the following:
sbt update
sbt compile
sbt clean
sbt package
While this procedure produces what I want, I find it unnecessary to compile and package the other Spark modules. Is there a quick command to do what I want?
For the compile part, just do sbt compile, which is incremental by default.
You might be able to package subsets of the project with something like sbt mllib/package.
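For example, from the sbt shell in the Spark source root (assuming the module is named mllib in the build, as the answer suggests):
$ build/sbt
> mllib/compile
> mllib/package
Running the tasks from the interactive shell also avoids restarting the JVM for each sbt invocation.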

Run Spark in standalone mode with Scala 2.11?

I follow the instructions to build Spark with Scala 2.11:
mvn -Dscala-2.11 -DskipTests clean package
Then I launch per instructions:
./sbin/start-master.sh
It fails with two lines in the log file:
Failed to find Spark assembly in /etc/spark-1.2.1/assembly/target/scala-2.10
You need to build Spark before running this program.
Obviously, it's looking for a Scala 2.10 build, but I did a Scala 2.11 build. I tried the obvious -Dscala-2.11 flag, but that didn't change anything. The docs don't mention anything about how to run in standalone mode with Scala 2.11.
Thanks in advance!
Before building, you must run the script:
dev/change-version-to-2.11.sh
which should replace references to 2.10 with 2.11.
Note that this will not necessarily work as intended with non-GNU sed (e.g. on OS X).
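Putting it together, the full sequence (using only the commands from the question and this answer) would be:
$ dev/change-version-to-2.11.sh
$ mvn -Dscala-2.11 -DskipTests clean package
$ ./sbin/start-master.sh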

"is not a member of package" error when importing package in Scala with SBT

(Relative beginner here, please be gentle...)
I've got a Scala program that I can build with sbt. I can (from within sbt) run compile and test-compile with no errors. I've defined a package by putting package com.mycompany.mypackagename at the top of several .scala files. When I do console to get a Scala REPL, this happens:
scala> import com.mycompany.mypackagename._
<console>:5: error: value mypackagename is not a member of package com.mycompany
import com.mycompany.mypackagename._
Any variation of this also fails. When I just do import com.mycompany I get no problems.
I thought that running the Scala console from within sbt would properly set the classpath based on the current project? What (completely obvious) thing am I missing?
I ran into this same problem, and then I realized I was running Scala 2.10.0 on the command line while IDEA was using Scala 2.9.2. The fix was to make both use the same version, and then run:
sbt clean
What happens if you import an actual class name instead of the wildcard?
import com.mycompany.mypackagename.ActualClassName