How to build Spark from the sources on the Download Spark page? - scala

I tried to install and build Spark 2.0.0 on an Ubuntu 16.04 VM as follows:
Install Java
sudo apt-add-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
Install Scala
Go to the Downloads page on the Scala site: scala-lang.org/download/all.html
I used Scala 2.11.8.
sudo mkdir /usr/local/src/scala
sudo tar -xvf scala-2.11.8.tgz -C /usr/local/src/scala/
Modify the .bashrc file and include the path for scala:
export SCALA_HOME=/usr/local/src/scala/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
then type:
source ~/.bashrc
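A quick sanity check at this point (assuming the paths above):
scala -version   # should report 2.11.8
which scala      # should point at /usr/local/src/scala/scala-2.11.8/bin/scala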
Install git
sudo apt-get install git
Download and build Spark
Go to: http://spark.apache.org/downloads.html
Download Spark 2.0.0 (Build from Source - for standalone mode).
tar -xvf spark-2.0.0.tgz
cd into the Spark folder (that has been extracted).
now type:
./build/sbt assembly
After it's done installing, I get the message:
[success] Total time: 1940 s, completed...
followed by date and time...
Run Spark shell
bin/spark-shell
That's when all hell breaks loose and I start getting the error. I go into the assembly folder to look for a folder called target. But there's no such folder there. The only things visible in assembly are: pom.xml, README, and src.
I looked it up online for quite a while and I haven't been able to find a single concrete solution that would help solve the error. Can someone please provide explicit step-by-step instructions on how to solve this?!? It's driving me nuts now... (T.T)
Screenshot of the error:

For some reason, Scala 2.11.8 is not working well while building, but if I switch over to Scala 2.10.6 then it builds properly. I guess the reason I need Scala in the first place is to get access to sbt so I can build Spark. Once it's built, I need to go into the spark folder and type:
build/sbt package
This builds the missing JAR files for me using Scala 2.11... kinda weird, but that's how it's working (I'm assuming so, by looking at the logs).
Once Spark builds again, type bin/spark-shell (while in the spark folder) and you'll have access to the Spark shell.

Type sbt package in the Spark directory, not in the build directory.
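In other words (the path below is only an example; use wherever you extracted the sources):
cd ~/spark-2.0.0     # the extracted source root
build/sbt package    # run it from here, not from inside the build/ sub-directory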

If your goal is really to build your custom Spark package from the sources you've downloaded from http://spark.apache.org/downloads.html, you should do the following instead:
./build/mvn -Phadoop-2.7,yarn,mesos,hive,hive-thriftserver -DskipTests clean install
You may want to read the official document Building Spark.
NB You don't have to install Scala and git packages to build Spark so you could have skipped "2. Install Scala" and "3. Install git" steps.
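If what you want is a runnable, deployable build (a tarball you can unpack elsewhere) rather than artifacts installed into the local Maven repo, the sources also ship a helper script that wraps this same Maven build. A rough sketch with a subset of the profiles above (double-check the script and flags against your Spark version):
./dev/make-distribution.sh --name custom-spark --tgz -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver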

Related

sbt installation does not set up scala; cant run scala files

I followed the instructions on this site because I didn't want to work with IntelliJ.
With this installation, scala is not available as a command. How would I go about running .scala files?
Installing sbt doesn't get you the Scala REPL. If you have sbt in your PATH variable, you can use the sbt console command to run and verify simple Scala expressions.
Otherwise you need to install Scala separately.
The easy way to install Scala and sbt is to use sdkman. Follow these steps:
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
sdk install sbt
sdk install scala
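Once sdkman finishes, a quick check that both tools are on your PATH:
scala -version   # prints the Scala code runner version
sbt about        # starts sbt and prints its version and plugins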

Locate Scala installation for Spark

I'm using EMR and can launch spark-shell, but I want to run the Scala REPL. Currently, when I type the scala command in the shell it says:
-bash: scala: command not found
How do I locate and run the Scala REPL, given that Spark is already installed and configured?
As already mentioned by @ernest_k, EMR doesn't come with a Scala installation.
That said, I used the steps mentioned in this blog to install Scala on EMR:
# perform customary yum update
sudo yum update
# download Scala installer RPM
wget https://downloads.lightbend.com/scala/2.13.0-M5/scala-2.13.0-M5.rpm
# install Scala RPM
sudo yum install scala-2.13.0-M5.rpm
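After the install finishes, the REPL should be reachable from the shell:
scala -version   # confirms the command is now found
scala            # starts the Scala REPL (:quit to exit)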

Building spark-jobserver Using SBT and Scala

Can anyone suggest better documentation for spark-jobserver? I have gone through the spark-jobserver URL but am unable to follow it. It would be great if someone could explain step by step how to use spark-jobserver.
Tools used in building the project.
sbt launcher version 0.13.5
Scala code runner version 2.11.6
With the above mentioned tools I am getting errors while building the spark-jobserver.
The documentation provided in the jobserver repo is indeed confusing.
Here are the steps I followed to manually build and run Spark Job Server on a local machine.
1. git clone https://github.com/spark-jobserver/spark-jobserver
2. sudo mkdir -p /var/log/job-server
3. sudo chown user1:user1 /var/log/job-server
4. cd spark-jobserver
5. sbt job-server/assembly
6. cd config
7. cp local.sh.template abc.sh # Note that the same name 'abc' is used in steps 8 and 10 as well
8. cp ec2.conf.template abc.conf
9. cd .. # The jobserver root dir
10. ./bin/server_package.sh abc # This script copies the files and packages necessary to run job server into a single dir [ default - /tmp/job-server]
11. cd /tmp/job-server [This is where the files and packages necessary to run job server are published by default]
12. ./server_start.sh
13. Run ./server_stop.sh to stop the server
Hope this helps
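Once the server is up, a quick smoke test against its REST API. This assumes the default port 8090; the test jar path and example class below come from the job-server-tests module and vary by version, so treat them as placeholders:
curl localhost:8090/jars   # list deployed jars (empty on a fresh server)
curl --data-binary @/path/to/job-server-tests.jar localhost:8090/jars/test
curl -d "input.string = a b c a b" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&sync=true'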
Here are the steps that I used to install:
Clone the jobserver repo.
Get sbt using wget https://dl.bintray.com/sbt/native-packages/sbt/0.13.8/sbt-0.13.8.tgz
Move "sbt-launch.jar" in sbt/bin to /bin
Create a script /bin/sbt, contents found here, making sure to change the pointer to java if necessary
Make the above script executable
Now cd into the spark jobserver directory, and run sbt publish-local
Assuming the above was successful, run sbt in the same directory
Finally, use the command re-start, and if it succeeds the server is now running!

stuck at command "sbt compile" in docker ubuntu

I'm trying to include sbt in a Docker image. However, it never works and always gets stuck at Getting org.scala-sbt sbt 0.13.7 ... Changing the sbt version does not help either.
Here is a snippet of the Dockerfile:
FROM ubuntu:14.04
RUN echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
RUN sudo apt-get update
RUN sudo apt-get install sbt   # I also tried adding --force-yes
I also tried to install it in the container manually using:
wget http://dl.bintray.com/sbt/debian/sbt-0.13.5.deb
sudo apt-get update
sudo dpkg -i sbt-0.13.5.deb
When I run sbt compile, it also gets stuck trying to get Getting org.scala-sbt ...
but sbt --version works.
Basically, I don't know why sbt gets stuck at Getting org.scala-sbt ...
You will need a Java Virtual Machine for sbt, so I think it's a good thing to start from the official java Docker image. Here is a basic Dockerfile that uses the official Ubuntu installation method:
FROM java
RUN echo "deb http://dl.bintray.com/sbt/debian /" | tee -a /etc/apt/sources.list.d/sbt.list
RUN apt-key update
RUN apt-get update
RUN apt-get -y --force-yes install sbt
NOTE: --force-yes has to be there because it is not an authenticated package. I tried adding RUN apt-key update, but that didn't make a difference so you can omit this line.
Then, build a test image: docker build -t test/sbt ., create an interactive container docker run -i -t test/sbt sbt and play with it.
This works for me, but I noticed the download times were slow for launching SBT, so be patient at this step.
This is because the SBT executable in itself is really light and will fetch a bunch of libraries on the first run to accomplish its task. It's also a way for SBT to support multiple projects using multiple SBT versions. If you are stuck at libraries resolution, check your networking configuration. For SBT errors, they are mostly printed on the command line, but you can configure logging if you want.
What's left for you to figure out is to add your project files and issue a compile command to test it.
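For example, a minimal continuation of the Dockerfile above, assuming your sbt project sits next to the Dockerfile:
COPY . /app
WORKDIR /app
# resolve dependencies and compile at image-build time
RUN sbt compile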
sbt will try to download a higher version of itself if the project requires a higher version than the currently installed one. Usually there is a {projectFolder}/project/build.properties which specifies the desired sbt version for the project. For example, sbt.version=0.13.7 requires version 0.13.7.
You seem to get stuck at Getting org.scala-sbt sbt 0.13.7 ..., but I believe sbt is actually trying to download sbt 0.13.7 to your local machine. As the package is not small, depending on your network speed, it may take a while.
It is also likely that there is a network connectivity issue that prevents sbt from downloading its package. So you can try to verify first your network connectivity is not a problem.
If your network is fine, another approach you can try is to go to sbt site to download 0.13.7 package manually to your docker and install it there by following instructions you can find from sbt site.
Hope this helps.
Sometimes sbt gets stuck when downloading files. You can periodically check the size of the ~/.ivy2 folder, and if it isn't growing, kill the sbt process and rerun sbt.
For me, sbt only downloaded all the files after 5 such restarts!
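A simple way to do that size check (assuming the default Ivy cache location):
du -sh ~/.ivy2               # run this every minute or two
watch -n 60 du -sh ~/.ivy2   # or let watch repeat it for you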

How to reconfigure the version of Scala on Ubuntu?

I installed Scala with apt-get install scala and then ran into another problem: where exactly is the Scala home now?
This Where is SCALA_HOME on Ubuntu? question suggests:
/usr/share/java
To be sure, I also downloaded and untarred the latest Scala version.
I put all of this at /opt/scala and added it to the system environment.
But I still have the default installed version of Scala and don't know how to point it to the new location.
Here is how it looks:
nazar@nazar-desktop:~$ scala -version
Scala code runner version 2.9.2 -- Copyright 2002-2011, LAMP/EPFL
nazar@nazar-desktop:~$ echo $SCALA_HOME
/opt/scala/scala-2.10.3
I want to switch the installed version from 2.9.2 to the untarred one.
How do I solve this?
I personally wouldn't bother.
The binary incompatibilities across major versions mean you'll likely want more than one version available if you ever work on more than one project.
My advice is to install the latest version of SBT and use that to manage versions at a per-project level. You'll still be able to get a REPL via sbt console.
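A minimal sketch of what that looks like per project (the version below is just an example):
mkdir myproject && cd myproject
echo 'scalaVersion := "2.10.3"' > build.sbt
sbt console   # starts a REPL using the Scala version pinned in build.sbt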
If you put $SCALA_HOME/bin at the front of your PATH variable, like this:
export PATH="$SCALA_HOME/bin:$PATH"
that should fix it.
However, you need to type hash -r in each terminal window in which you have tried to run scala, to make the change take effect.
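To verify the switch took effect (paths taken from the question):
hash -r          # forget the shell's cached location of scala
which scala      # should now resolve to /opt/scala/scala-2.10.3/bin/scala
scala -version   # should report 2.10.3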
I untarred Scala in /usr/local, then made scala a symbolic link to scala-2.10.0.
The reason for this is to make an upgrade easier: just alter the symlink.
Next, I added /usr/local/scala/bin to the PATH in .bashrc.
After this, typing scala in a terminal gives me the prompt.
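The commands for that setup look roughly like this (tarball location and version are examples):
cd /usr/local
sudo tar -xzf ~/Downloads/scala-2.10.0.tgz
sudo ln -s scala-2.10.0 scala
echo 'export PATH=/usr/local/scala/bin:$PATH' >> ~/.bashrc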
Installing Scala into Eclipse was considerably more complicated, but I got that to work too by untarring things into /usr/local.
I didn't bother with apt-get or .deb packages as I run Ubuntu LTS and, as far as I know, there is no PPA that tracks a current version.
Also, about this SCALA_HOME thing: there is a function in the shell script scala that finds it this way:
findScalaHome () {
  # see SI-2092 and SI-5792
  local source="${BASH_SOURCE[0]}"
  while [ -h "$source" ] ; do
    local linked="$(readlink "$source")"
    local dir="$( cd -P $(dirname "$source") && cd -P $(dirname "$linked") && pwd )"
    source="$dir/$(basename "$linked")"
  done
  ( cd -P "$(dirname "$source")/.." && pwd )
}