I am trying to compile this simple Scala code to run it in Spark:
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: HDFStEST <file>")
      System.exit(1)
    }
    val spark = SparkSession.builder.appName("TesteHelena").getOrCreate()
    println("hellooo")
    spark.stop()
  }
}
I don't know how to make scalac find the dependency org.apache.spark.sql.SparkSession.
I tried to tell it where the jar files are with the following command:
scalac main.scala -cp C:\Spark\spark-2.4.0-bin-hadoop2.7\jars -d main.jar
which gives me the error:
main.scala:1: error: object apache is not a member of package org
import org.apache.spark.sql.SparkSession
and if I just pass every jar file with the command:
scalac main.scala -cp C:\Spark\spark-2.4.0-bin-hadoop2.7\jars\* org.apache.spark.sql.SparkSession -d main.jar
it gives the error:
error: IO error while decoding C:\Spark\spark-2.4.0-bin-hadoop2.7\jars\aircompressor-0.10.jar with UTF-8
for every jar file.
The command:
scalac main.scala -cp org.apache.spark.sql.SparkSession -d main.jar
returns:
main.scala:1: error: object apache is not a member of package org
import org.apache.spark.sql.SparkSession
So, is there a way to use the Spark dependency with scalac to compile a program? I cannot use a dependency/build tool such as sbt or Gradle, because my terminal has no internet access (a security restriction at my job) and those tools fetch their dependencies from remote repositories.
I solved my issue with the command:
scalac -cp C:\Spark\spark-2.4.0-bin-hadoop2.7\jars\ -extdirs C:\Spark\spark-2.4.0-bin-hadoop2.7\jars\ main.scala -d main1.jar
I added scalac's "-extdirs" option, which overrides the location of installed extensions, and it worked!
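Note that -extdirs only takes care of compilation; the Spark jars still have to be on the classpath at runtime. A minimal sketch of running the result, assuming spark-submit from the same Spark distribution is on the PATH and somefile.txt is just a placeholder argument:
spark-submit --class Main main1.jar somefile.txt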
Related
I am trying to run a ScalaTest suite from the command line without SBT and I am failing, even though I followed the documentation line by line.
import collection.mutable.Stack
import org.scalatest._
import flatspec._
import matchers._

class FirstSpec extends AnyFlatSpec with should.Matchers {
  "A Stack" should "pop values in last-in-first-out order" in {
    val stack = new Stack[Int]
    stack.push(1)
    stack.push(2)
    stack.pop() should be (2)
    stack.pop() should be (1)
  }
}
Error message:
> scala -cp scalatest_2.13-3.2.5.jar org.scalatest.tools.Runner -R . -o -s FirstSpec.scala
No such file or class on classpath: org.scalatest.tools.Runner
ScalaTest has been modularised since 3.2.0, so the scalatest.jar artifact alone is not sufficient when running from a raw shell. Build tools such as sbt usually resolve all the transitive dependencies automatically; if you are not using a build tool, it is necessary to do that manually.
So download all the transitive dependencies and run something like:
scala -cp scalatest_2.13-3.2.4.jar:scalatest-compatible-3.2.4.jar:scalatest-core_2.13-3.2.4.jar:scalactic_2.13-3.2.4.jar:scalatest-diagrams_2.13-3.2.4.jar:scalatest-matchers-core_2.13-3.2.4.jar:scalatest-shouldmatchers_2.13-3.2.4.jar:scalatest-flatspec_2.13-3.2.4.jar:scala-xml_2.13-1.3.0.jar org.scalatest.run ExampleSpec
or, if all the transitive jars are in the same directory,
scala -cp '*' org.scalatest.run ExampleSpec
or coursier can help you fetch and build the correct classpath:
scala -cp "$(cs fetch --classpath org.scalatest:scalatest_2.13:3.2.4)" org.scalatest.run ExampleSpec
or use coursier to launch the main class from the directory containing the compiled tests:
cs launch org.scalatest:scalatest_2.13:3.2.4 -M org.scalatest.run
or launch the default main runner, which provides a basic GUI, by passing the run path with -R:
cs launch org.scalatest:scalatest_2.13:3.2.4 -- -R .
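In every case the spec has to be compiled before it can be run. A minimal sketch, assuming all the ScalaTest jars and their transitive dependencies sit in the current directory and that scalac accepts the same wildcard classpath syntax:
scalac -cp '*' FirstSpec.scala
scala -cp '*:.' org.scalatest.run FirstSpec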
For the record, here are all the transitive dependencies of scalatest.jar:
cs resolve org.scalatest:scalatest_2.13:3.2.5
org.scala-lang:scala-library:2.13.4:default
org.scala-lang:scala-reflect:2.13.4:default
org.scala-lang.modules:scala-xml_2.13:1.2.0:default
org.scalactic:scalactic_2.13:3.2.5:default
org.scalatest:scalatest-compatible:3.2.5:default
org.scalatest:scalatest-core_2.13:3.2.5:default
org.scalatest:scalatest-diagrams_2.13:3.2.5:default
org.scalatest:scalatest-featurespec_2.13:3.2.5:default
org.scalatest:scalatest-flatspec_2.13:3.2.5:default
org.scalatest:scalatest-freespec_2.13:3.2.5:default
org.scalatest:scalatest-funspec_2.13:3.2.5:default
org.scalatest:scalatest-funsuite_2.13:3.2.5:default
org.scalatest:scalatest-matchers-core_2.13:3.2.5:default
org.scalatest:scalatest-mustmatchers_2.13:3.2.5:default
org.scalatest:scalatest-propspec_2.13:3.2.5:default
org.scalatest:scalatest-refspec_2.13:3.2.5:default
org.scalatest:scalatest-shouldmatchers_2.13:3.2.5:default
org.scalatest:scalatest-wordspec_2.13:3.2.5:default
org.scalatest:scalatest_2.13:3.2.5:default
I have started to build a small Scala program.
It will be expanded to open a SparkContext and do some data processing.
For now I have only imported three Spark libraries to run a simple piece of code.
I am getting the error "error: object apache is not a member of package org".
The question is: how can I compile manually using scalac, so that the compilation includes the Spark libraries, without Maven or sbt?
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object HelloWorld {
  def main(args: Array[String]) {
    println("Hello, world!")
  }
}
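The -extdirs workaround from the first question applies here as well. A sketch, assuming the Spark distribution is unpacked under /opt/spark (a placeholder path) and the file is saved as HelloWorld.scala:
scalac -cp /opt/spark/jars/ -extdirs /opt/spark/jars/ HelloWorld.scala -d helloworld.jar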
I am trying to run Scala code using java -jar <> and I am getting the issue below.
ERROR:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataOutputStream
    at com.cargill.finance.cdp.blackline.Ingest.main(Ingest.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataOutputStream
The same code runs fine with spark-submit.
I am trying to write data to an HDFS file.
I have imported the classes below:
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FSDataOutputStream
You need to add all dependencies (including transitive dependencies, i.e. dependencies of dependencies) to the -cp argument. If you just look at the direct dependencies of hadoop-core you'll see why you should never do this manually. Instead, use a build system. If you followed e.g. https://spark.apache.org/docs/latest/quick-start.html, it actually sets up SBT, so you can do sbt run to run the main class (like java -cp <lots of libraries>:<jarfile> <main class> would). If you didn't, add a build.sbt as described there.
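One detail worth knowing about the original command: when java is started with -jar, the -cp/-classpath option is ignored entirely, so even a correct classpath will not help in that form. Either build a fat/assembly jar, keep using spark-submit (which puts the Hadoop and Spark jars on the classpath for you), or invoke the main class explicitly. A sketch of the last option, assuming the Hadoop/Spark jars sit in a local jars/ directory and your application jar is app.jar (both placeholders):
java -cp "jars/*:app.jar" com.cargill.finance.cdp.blackline.Ingest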
I'm trying to run a file called test.scala which refers to a library called moarcoref-assembly-1.jar. I'm quite sure that the library jar file contains the class edu.berkeley.nlp.coref.NumberGenderComputer, but scala keeps complaining about the import. It doesn't even seem to find the edu package.
$ cat test.scala
import java.io._
import scala.collection.mutable.ListBuffer
import scala.collection.mutable.ArrayBuffer
import scala.io.Source
import edu.berkeley.nlp.coref.NumberGenderComputer
println("Hello, world!")
This is the error:
$ scala -cp "moarcoref-assembly-1.jar:." test.scala
/data/EvEn/nn_coref/modifiedBCS/test.scala:5: error: not found: value edu
import edu.berkeley.nlp.coref.NumberGenderComputer
^
one error found
I'm using version 2.11.6 and can't install a newer one.
$ scala -version
Scala code runner version 2.11.6 -- Copyright 2002-2013, LAMP/EPFL
I am adding jars to my Scala REPL like so:
scala> :cp scalaj-http_2.10-2.2.1.jar
Added '/home/XXX/scalaj-http_2.10-2.2.1.jar'. Your new classpath is:
".:/home/XXX/json4s-native_2.10-3.3.0.RC3.jar:/home/XXX/scalaj-http_2.10-2.2.1.jar"
Nothing to replay.
Now when I try and import that jar for use I get an error:
scala> import scalaj.http._
<console>:7: error: not found: value scalaj
import scalaj.http._
I've tried this on another jar:
scala> :cp json4s-native_2.10-3.3.0.RC3.jar
Added '/home/XXX/json4s-native_2.10-3.3.0.RC3.jar'. Your new classpath is:
".:/home/XXX/json4s-native_2.10-3.3.0.RC3.jar"
Nothing to replay.
scala> import org.json4s.JsonDSL._
<console>:7: error: object json4s is not a member of package org
import org.json4s.JsonDSL._
I've read multiple tutorials online that all do it this way, but my REPL does not seem to be behaving in the same manner.
I am using Scala 2.10.
Double-check your path. If it still is not working, you can try adding the jars at the time you start the REPL (this has always worked for me, even with v2.10):
scala -cp /home/XXX/json4s-native_2.10-3.3.0.RC3.jar:/home/XXX/scalaj-http_2.10-2.2.1.jar
Note that the delimiter between jars is ; on Windows and : otherwise.
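Once the REPL is started with both jars on the classpath, the imports should resolve. A minimal sketch; the HTTP call is only illustrative and assumes the scalaj-http 2.x API:
scala> import scalaj.http._
scala> import org.json4s.JsonDSL._
scala> val response = Http("http://example.com").asString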