main class not found in spark scala program - eclipse

//package com.jsonReader
import play.api.libs.json._
import play.api.libs.json.Reads._
import play.api.libs.json.Json.JsValueWrapper
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SQLContext
//import org.apache.spark.implicits._
//import sqlContext.implicits._

object json {

  def flatten(js: JsValue, prefix: String = ""): JsObject = js.as[JsObject].fields.foldLeft(Json.obj()) {
    case (acc, (k, v: JsObject)) =>
      val nk = if (prefix.isEmpty) k else s"$prefix.$k"
      acc.deepMerge(flatten(v, nk))
    case (acc, (k, v: JsArray)) =>
      val nk = if (prefix.isEmpty) k else s"$prefix.$k"
      val arr = flattenArray(v, nk).foldLeft(Json.obj())(_ ++ _)
      acc.deepMerge(arr)
    case (acc, (k, v)) =>
      val nk = if (prefix.isEmpty) k else s"$prefix.$k"
      acc + (nk -> v)
  }

  def flattenArray(a: JsArray, k: String = ""): Seq[JsObject] = {
    flattenSeq(a.value.zipWithIndex.map {
      case (o: JsObject, i: Int) =>
        flatten(o, s"$k[$i]")
      case (o: JsArray, i: Int) =>
        flattenArray(o, s"$k[$i]")
      case a =>
        Json.obj(s"$k[${a._2}]" -> a._1)
    })
  }

  def flattenSeq(s: Seq[Any], b: Seq[JsObject] = Seq()): Seq[JsObject] = {
    s.foldLeft[Seq[JsObject]](b) {
      case (acc, v: JsObject) =>
        acc :+ v
      case (acc, v: Seq[Any]) =>
        flattenSeq(v, acc)
    }
  }

  def main(args: Array[String]) {
    val appName = "Stream example 1"
    val conf = new SparkConf().setAppName(appName).setMaster("local[*]")
    //val spark = new SparkContext(conf)
    val sc = new SparkContext(conf)
    //val sqlContext = new SQLContext(sc)
    val sqlContext = new SQLContext(sc)
    //val spark=sqlContext.sparkSession
    val spark = SparkSession.builder().appName("json Reader")
    val df = sqlContext.read.json("C://Users//ashda//Desktop//test.json")
    val set = df.select($"user", $"status", $"reason", explode($"dates")).show()
    val read = flatten(df)
    read.printSchema()
    df.show()
  }
}
I'm trying to use this code to flatten a highly nested JSON. For this I created a project and converted it to a Maven project. I edited the pom.xml and included the libraries I needed, but when I run the program it says "Error: Could not find or load main class".
I tried converting the code to an sbt project and running it, but I get the same error. I also tried packaging the code and running it through spark-submit, which gives me the same error. Please let me know what I am missing here; I have tried everything I could.
Thanks

Hard to say, but maybe you have many classes that qualify as main, so the build tool does not know which one to choose. Maybe try to clean the project first with sbt clean.
Anyway, in Scala the preferred way to define a main class is to extend the App trait:
object SomeApp extends App
Then the whole object body becomes your main method.
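For example, a minimal runnable app could look like this (just an illustration, reusing the SomeApp name from above):

object SomeApp extends App {
  // Everything in the object body runs as the main method; `args` comes from App.
  println("Hello from SomeApp, args: " + args.mkString(", "))
}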
You can also set the main class in your build.sbt. This is necessary if you have many objects that extend the App trait:
mainClass in (Compile, run) := Some("io.example.SomeApp")

I am answering this question for sbt configurations. I ran into the same issue recently and had made some basic mistakes that I would like you to note:
1. Configure your sbt file
Go to your build.sbt file and check that the Scala version you are using is compatible with Spark. As of Spark 2.4.0 (https://spark.apache.org/docs/latest/), the required Scala version is 2.11.x, not 2.12.x. So even if your IDE (Eclipse/IntelliJ) shows the latest Scala version, or the one you downloaded, change it to a compatible version. Also, include this line of code:
libraryDependencies += "org.scala-lang" % "scala-library" % "2.11.6"
where 2.11.x is your Scala version.
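Putting the pieces together, a minimal build.sbt might look like this (the version numbers and artifact list are illustrative; adjust them to your setup):

name := "jsonReader"
scalaVersion := "2.11.12" // any 2.11.x works with Spark 2.4.x
libraryDependencies ++= Seq(
  // %% appends the Scala binary version (_2.11) to the artifact name
  "org.apache.spark" %% "spark-core" % "2.4.0",
  "org.apache.spark" %% "spark-sql"  % "2.4.0",
  "com.typesafe.play" %% "play-json" % "2.6.10" // for play.api.libs.json
)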
2. File Hierarchy
Make sure your Scala files are under src/main/scala only.
3. Terminal
If your IDE allows you to launch a terminal within it, launch it there (IntelliJ does; I'm not sure about Eclipse or others), or go to a terminal and change to your project directory, then run:
sbt clean
This will clear any previously loaded libraries and any folders created by earlier compilations.
sbt package
This will package your files into a single jar file under target/scala-<version>/.
Then submit to Spark:
spark-submit --class "com.jsonReader.json" --master local[*] target/scala-<version>/<.jar file>
Note that the options must come before the application jar, that --class takes the fully qualified object name (com.jsonReader.json in your case, once the package declaration is uncommented), and that --jars is only needed for additional jars, not for the application jar itself.


How to fix the problem "Fatal error: java.lang.Object is missing called from core module analyzer" in ScalaJS Linking

I'm trying to defer ScalaJS Linking to runtime, which allows multi-stage compilation to be more flexible and less dependent on sbt.
The setup looks like this:
Instead of using scalajs-sbt plugin, I chose to invoke scalajs-compiler directly as a scala compiler plugin:
scalaCompilerPlugins("org.scala-js:scalajs-compiler_${vs.scalaV}:${vs.scalaJSV}")
This can successfully generate the "sjsir" files under project output directory, but no further.
Use the solution in this post:
Build / Compile latest ScalaJS (1.3+) using gradle on a windows machine?
"Linking scala.js yourself" to invoke the linker on all the compiled sjsir files to produce js files, this is my implementation:
In both compile-time and runtime dependencies, add the Scala.js basics and scalajs-linker:
bothImpl("org.scala-js:scalajs-library_${vs.scalaBinaryV}:${vs.scalaJSV}")
bothImpl("org.scala-js:scalajs-linker_${vs.scalaBinaryV}:${vs.scalaJSV}")
bothImpl("org.scala-js:scalajs-dom_${vs.scalaJSSuffix}:2.1.0")
Write the following code:
import org.scalajs.linker.interface.{Report, StandardConfig}
import org.scalajs.linker.{PathIRContainer, PathOutputDirectory, StandardImpl}
import org.scalajs.logging.{Level, ScalaConsoleLogger}
import java.nio.file.{Path, Paths}
import java.util.Collections
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext}
object JSLinker {

  implicit def gec = ExecutionContext.global

  def link(classpath: Seq[Path], outputDir: Path): Report = {
    val logger = new ScalaConsoleLogger(Level.Warn)
    val linkerConfig = StandardConfig() // look at the API of this, lots of options.
    val linker = StandardImpl.linker(linkerConfig)
    // Same as scalaJSModuleInitializers in sbt, add if needed.
    val moduleInitializers = Seq()
    val cache = StandardImpl.irFileCache().newCache
    val result = PathIRContainer
      .fromClasspath(classpath)
      .map(_._1)
      .flatMap(cache.cached _)
      .flatMap(linker.link(_, moduleInitializers, PathOutputDirectory(outputDir), logger))
    Await.result(result, Duration.Inf)
  }

  def linkClasses(outputDir: Path = Paths.get("./")): Report = {
    import scala.jdk.CollectionConverters._
    val cl = Thread.currentThread().getContextClassLoader
    val resources = cl.getResources("")
    val rList = Collections.list(resources).asScala.toSeq.map { v =>
      Paths.get(v.toURI)
    }
    link(rList, outputDir)
  }

  lazy val linkOnce = {
    linkClasses()
  }
}
The resource detection was successful; all roots containing sjsir files are detected:
rList = {$colon$colon#1629} "::" size = 4
0 = {UnixPath#1917} "/home/peng/git-scaffold/scaffold-gradle-kts/build/classes/scala/test"
1 = {UnixPath#1918} "/home/peng/git-scaffold/scaffold-gradle-kts/build/classes/scala/testFixtures"
2 = {UnixPath#1919} "/home/peng/git-scaffold/scaffold-gradle-kts/build/classes/scala/main"
3 = {UnixPath#1920} "/home/peng/git-scaffold/scaffold-gradle-kts/build/resources/main"
But linking still fails:
Fatal error: java.lang.Object is missing
called from core module analyzer
There were linking errors
org.scalajs.linker.interface.LinkingException: There were linking errors
at org.scalajs.linker.frontend.BaseLinker.reportErrors$1(BaseLinker.scala:91)
at org.scalajs.linker.frontend.BaseLinker.$anonfun$analyze$5(BaseLinker.scala:100)
at scala.concurrent.impl.Promise$Transformation.run$$$capture(Promise.scala:467)
at scala.concurrent.impl.Promise$Transformation.run(Promise.scala)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
I wonder what this error message entails. Clearly java.lang.Object is not compiled into sjsir. Does this error message make sense? How do I fix it?
Thanks to @sjrd, I now have a working runtime compilation stack. There were two problems in my old setup:
It turns out that cl.getResources("") indeed cannot discover the full classpath, so I switched to the system property java.class.path, which contains the classpaths of all dependencies.
moduleInitializers has to be set manually to point to a main method, which will be invoked when the js function is called.
After correcting them, the compilation class becomes:
import org.scalajs.linker.interface.{ModuleInitializer, Report, StandardConfig}
import org.scalajs.linker.{PathIRContainer, PathOutputDirectory, StandardImpl}
import org.scalajs.logging.{Level, ScalaConsoleLogger}
import java.nio.file.{Files, Path, Paths}
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, ExecutionContextExecutor}
object JSLinker {

  implicit def gec: ExecutionContextExecutor = ExecutionContext.global

  val logger = new ScalaConsoleLogger(Level.Info) // TODO: cannot be lazy val, why?

  lazy val linkerConf: StandardConfig = {
    StandardConfig()
  } // look at the API of this, lots of options.

  def link(classpath: Seq[Path], outputDir: Path): Report = {
    val linker = StandardImpl.linker(linkerConf)
    // Same as scalaJSModuleInitializers in sbt, add if needed.
    val moduleInitializers = Seq(
      ModuleInitializer.mainMethodWithArgs(SlinkyHelloWorld.getClass.getName.stripSuffix("$"), "main")
    )
    Files.createDirectories(outputDir)
    val cache = StandardImpl.irFileCache().newCache
    val result = PathIRContainer
      .fromClasspath(classpath)
      .map(_._1)
      .flatMap(cache.cached _)
      .flatMap { v =>
        linker.link(v, moduleInitializers, PathOutputDirectory(outputDir), logger)
      }
    Await.result(result, Duration.Inf)
  }

  def linkClasses(outputDir: Path = Paths.get("./ui/build/js")): Report = {
    val rList = getClassPaths
    link(rList, outputDir)
  }

  def getClassPaths: Seq[Path] = {
    val str = System.getProperty("java.class.path")
    // NB: ':' is the Unix path separator; File.pathSeparatorChar would be portable.
    val paths = str.split(':').map { v =>
      Paths.get(v)
    }
    paths
  }

  lazy val linkOnce: Report = {
    val report = linkClasses()
    logger.info(
      s"""
         |=== [Linked] ===
         |${report.toString()}
         |""".stripMargin
    )
    report
  }
}
This is all it takes to convert sjsir artefacts to a single main.js file.
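For reference, triggering the link step at runtime is then just a matter of touching the lazy val. A hypothetical usage sketch (LinkDemo is made up for illustration):

object LinkDemo {
  def main(args: Array[String]): Unit = {
    // Forces the one-time link; later accesses reuse the same Report.
    val report = JSLinker.linkOnce
    println(report)
  }
}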

how to import a class in scala

I have recently been learning Scala, and packages in Scala confuse me.
I have a file named StockPriceFinder.scala:
// StockPriceFinder.scala
object StockPriceFinder {
  def getTickersAndUnits() = {
    val stocksAndUnitsXML = scala.xml.XML.load("stocks.xml")
    (Map[String, Int]() /: (stocksAndUnitsXML \ "symbol")) { (map, symbolNode) =>
      val ticker = (symbolNode \ "@ticker").toString
      val units = (symbolNode \ "units").text.toInt
      map ++ Map(ticker -> units)
    }
  }
}
then I want to use StockPriceFinder in test.scala which is in the same folder:
val symbolAndUnits = StockPriceFinder.getTickersAndUnits
but when I run it with scala test.scala, I get the error: error: not found: value StockPriceFinder. In Java, if two source files are in the same folder, I do not need to import anything and can use the class directly. How can I import StockPriceFinder correctly in Scala?
I have tried to use import StockPriceFinder in test.scala, but it still does not work.
You don't need to import StockPriceFinder if the files are in the same package (not folder).
But you do need to compile StockPriceFinder.scala first and pass the correct classpath to scala:
scalac StockPriceFinder.scala
scala -cp . test.scala
should work (the exact invocation might be a bit off). However, you shouldn't do this manually, since it becomes unmanageable very quickly; use sbt or another build tool (Maven, Gradle).
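For reference, test.scala can then be a plain script that simply uses the compiled object, along these lines:

// test.scala -- run with `scala -cp . test.scala` after compiling StockPriceFinder
val symbolAndUnits = StockPriceFinder.getTickersAndUnits()
symbolAndUnits.foreach { case (ticker, units) => println(s"$ticker -> $units") }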

Create custom Arbitrary generator for testing java code from ScalaTest ScalaCheck

Is it possible to create a custom Arbitrary generator in a ScalaTest suite (mixing in Checkers for ScalaCheck properties) that tests Java code? For example, the following steps are required for each test within forAll:
val fund = new Fund()
val fundAccount = new Account(Account.RETIREMENT)
val consumer = new Consumer("John")
  .createAccount(fundAccount)
fund.addConsumer(consumer)
fundAccount.deposit(amount)
The above is preparation code that runs before asserting results, etc.
You sure can. This should get you started.
import org.scalacheck._
import Arbitrary._
import Prop._

case class Consumer(name: String)

object ConsumerChecks extends Properties("Consumer") {

  lazy val genConsumer: Gen[Consumer] = for {
    name <- arbitrary[String]
  } yield Consumer(name)

  implicit lazy val arbConsumer: Arbitrary[Consumer] = Arbitrary(genConsumer)

  property("some prop") = forAll { c: Consumer =>
    // Check stuff
    true
  }
}
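To drive the Java prep code from your question, you can push it all into the generator. A sketch, assuming Fund, Account and Consumer are the Java classes shown in your question (untested against your real API):

import org.scalacheck.{Arbitrary, Gen}

// Generate a fully prepared Fund plus its Account, mirroring the prep code above.
lazy val genFundWithAccount: Gen[(Fund, Account)] = for {
  name   <- Gen.alphaStr suchThat (_.nonEmpty)
  amount <- Gen.choose(1, 10000)
} yield {
  val fund = new Fund()
  val fundAccount = new Account(Account.RETIREMENT)
  val consumer = new Consumer(name).createAccount(fundAccount)
  fund.addConsumer(consumer)
  fundAccount.deposit(amount)
  (fund, fundAccount)
}

implicit lazy val arbFundWithAccount: Arbitrary[(Fund, Account)] = Arbitrary(genFundWithAccount)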

Does TypeSafe Slick work on Scala 2.9.3?

Does TypeSafe Slick work on Scala 2.9.3? I get
[ERROR] exception when typing query.list
scala.tools.nsc.symtab.Types$TypeError: class file needed by StatementInvoker is missing.
[INFO] class file needed by StatementInvoker is missing.
[INFO] reference type Either of object package refers to nonexisting symbol.
which goes away when I use Scala 2.10.x, but I'm too new to Scala to understand why.
import slick.session.Database
import scala.slick.jdbc.StaticQuery
import Database.threadLocalSession
import com.typesafe.config.ConfigFactory

object PostgresDao {

  protected val conf = ConfigFactory.load

  def findFoo(a: Int, b: String): Option[Int] = {
    Database.forURL("jdbc:postgresql://localhost/bar", driver = "org.postgresql.Driver") withSession {
      val query = StaticQuery.query[(Int, String), Int](
        """
          |select some_int
          |from some_table t
          |where t.a = ? and t.b = ?
        """.stripMargin)
      val list: List[Int] = query.list((a, b))
      if (list.isEmpty) None
      else Some(list.head)
    }
  }
}
I shamelessly copy-paste the official doc here:
Slick requires Scala 2.10. (For Scala 2.9 please use ScalaQuery, the predecessor of Slick.)
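In sbt terms, the practical fix is simply to pin Scala 2.10 and pull in Slick (the exact versions here are illustrative):

scalaVersion := "2.10.4"
libraryDependencies += "com.typesafe.slick" %% "slick" % "1.0.1"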

How do I provide basic configuration for a Scala application?

I am working on a small GUI application written in Scala. There are a few settings that the user will set in the GUI and I want them to persist between program executions. Basically I want a scala.collections.mutable.Map that automatically persists to a file when modified.
This seems like it must be a common problem, but I have been unable to find a lightweight solution. How is this problem typically solved?
I do a lot of this, and I use .properties files (they're idiomatic in Java-land). I keep my config pretty straightforward by design, though. If you have nested config constructs you might want a different format like YAML (if humans are the main authors) or JSON or XML (if machines are the authors).
Here's some example code for loading props, manipulating as Scala Map, then saving as .properties again:
import java.io._
import java.util._
import scala.collection.JavaConverters._
val f = new File("test.properties")
// test.properties:
// foo=bar
// baz=123
val props = new Properties
// Note: in real code make sure all these streams are
// closed carefully in try/finally
val fis = new InputStreamReader(new FileInputStream(f), "UTF-8")
props.load(fis)
fis.close()
println(props) // {baz=123, foo=bar}
val map = props.asScala // Get to Scala Map via JavaConverters
map("foo") = "42"
map("quux") = "newvalue"
println(map) // Map(baz -> 123, quux -> newvalue, foo -> 42)
println(props) // {baz=123, quux=newvalue, foo=42}
val fos = new OutputStreamWriter(new FileOutputStream(f), "UTF-8")
props.store(fos, "")
fos.close()
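If you really want the "automatically persists when modified" behavior from your question, you can wrap this in a mutable Map that rewrites the file on every update. A sketch (the PersistentMap name and the save-on-every-write policy are mine, not a library API; this targets Scala 2.12-style mutable.Map, where 2.13 renames += / -= to addOne / subtractOne):

import java.io._
import java.util.Properties
import scala.collection.JavaConverters._
import scala.collection.mutable

// A mutable Map backed by a .properties file, rewritten on each update.
class PersistentMap(file: File) extends mutable.Map[String, String] {
  private val props = new Properties
  if (file.exists) {
    val in = new InputStreamReader(new FileInputStream(file), "UTF-8")
    try props.load(in) finally in.close()
  }
  private def save(): Unit = {
    val out = new OutputStreamWriter(new FileOutputStream(file), "UTF-8")
    try props.store(out, "") finally out.close()
  }
  def get(key: String): Option[String] = Option(props.getProperty(key))
  def iterator: Iterator[(String, String)] = props.asScala.iterator
  def +=(kv: (String, String)): this.type = { props.setProperty(kv._1, kv._2); save(); this }
  def -=(key: String): this.type = { props.remove(key); save(); this }
}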
Here's an example of using XML and a case class for reading a config; a real class can be nicer than a map. (You could also do what sbt and at least one other project do: take the config as Scala source and compile it in, though saving it is less automatic. Or treat it as a REPL script. I haven't googled it, but someone must have done that.)
Here's the simpler code.
This version uses a case class:
import scala.xml.Node

case class PluginDescription(name: String, classname: String) {
  def toXML: Node = {
    <plugin>
      <name>{name}</name>
      <classname>{classname}</classname>
    </plugin>
  }
}

object PluginDescription {
  def fromXML(xml: Node): PluginDescription = {
    // extract one field
    def getField(field: String): Option[String] = {
      val text = (xml \\ field).text.trim
      if (text == "") None else Some(text)
    }
    def extracted = {
      val name = "name"
      val claas = "classname"
      val vs = Map(name -> getField(name), claas -> getField(claas))
      if (vs.values exists (_.isEmpty)) fail()
      else PluginDescription(name = vs(name).get, classname = vs(claas).get)
    }
    def fail() = throw new RuntimeException("Bad plugin descriptor.")
    // check the top-level tag
    xml match {
      case <plugin>{_*}</plugin> => extracted
      case _ => fail()
    }
  }
}
This code reflectively calls the apply method of a case class. The use case is that fields missing from the config can be supplied by default arguments. There are no type conversions here. E.g., case class Config(foo: String = "bar").
// isn't it easier to write a quick loop to reflect the field names?
import scala.reflect.runtime.{currentMirror => cm, universe => ru}
import ru._

def fromXML(xml: Node): Option[PluginDescription] = {
  def extract[A]()(implicit tt: TypeTag[A]): Option[A] = {
    // extract one field
    def getField(field: String): Option[String] = {
      val text = (xml \\ field).text.trim
      if (text == "") None else Some(text)
    }
    val apply = ru.newTermName("apply")
    val module = ru.typeOf[A].typeSymbol.companionSymbol.asModule
    val ts = module.moduleClass.typeSignature
    val m = (ts member apply).asMethod
    val im = cm reflect (cm reflectModule module).instance
    val mm = im reflectMethod m
    def getDefault(i: Int): Option[Any] = {
      val n = ru.newTermName("apply$default$" + (i + 1))
      val m = ts member n
      if (m == NoSymbol) None
      else Some((im reflectMethod m.asMethod)())
    }
    def extractArgs(pss: List[List[Symbol]]): List[Option[Any]] =
      pss.flatten.zipWithIndex map (p => getField(p._1.name.encoded) orElse getDefault(p._2))
    val args = extractArgs(m.paramss)
    if (args exists (!_.isDefined)) None
    else Some(mm(args.flatten: _*).asInstanceOf[A])
  }
  // check the top-level tag
  xml match {
    case <plugin>{_*}</plugin> => extract[PluginDescription]()
    case _ => None
  }
}
XML has loadFile and save; it's too bad there seems to be no one-liner for Properties.
$ scala
Welcome to Scala version 2.10.0-RC5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_06).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import reflect.io._
import reflect.io._
scala> import java.util._
import java.util._
scala> import java.io.{StringReader, File=>JFile}
import java.io.{StringReader, File=>JFile}
scala> import scala.collection.JavaConverters._
import scala.collection.JavaConverters._
scala> val p = new Properties
p: java.util.Properties = {}
scala> p load new StringReader(
| (new File(new JFile("t.properties"))).slurp)
scala> p.asScala
res2: scala.collection.mutable.Map[String,String] = Map(foo -> bar)
As it all boils down to serializing a map / object to a file, your choices are:
classic serialization to Bytecode
serialization to XML
serialization to JSON (easy using Jackson, or Lift-JSON)
use of a properties file (ugly, no utf-8 support)
serialization to a proprietary format (ugly, why reinvent the wheel)
I suggest converting the Map to Properties and vice versa. *.properties files are the standard for storing configuration in the Java world, so why not use them for Scala too?
The common formats are *.properties and *.xml. Since Scala supports XML natively, using an XML config is easier in Scala than in Java.
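For example, reading a simple XML config natively takes only a couple of lines (a sketch; settings.xml and its fields are made up for illustration):

// settings.xml: <settings><width>800</width><height>600</height></settings>
val settings = scala.xml.XML.loadFile("settings.xml")
val width = (settings \ "width").text.toInt
val height = (settings \ "height").text.toInt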