Spark context cannot be resolved in MLUtils.loadLibSVMFile with IntelliJ / Eclipse

I am trying to run the multilayer perceptron classifier example from https://spark.apache.org/docs/1.5.2/ml-ann.html. It works well in spark-shell, but not in an IDE such as IntelliJ or Eclipse. The problem comes from this line:
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_multiclass_classification_data.txt").toDF()
The IDE reports that it cannot resolve the symbol sc (the SparkContext), even though the library paths are configured correctly. Any help would be appreciated, thanks!

Actually, there is no value named sc by default. spark-shell creates it for you on startup. In any ordinary Scala/Java/Python program you have to create it yourself.
I recently wrote a (very low quality) answer to a related question; the part about sbt and libraries in it may be useful here.
After that, you can use something like the following code as a starting template.
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}
object Spark extends App {
  // Run locally with two worker threads
  val config = new SparkConf().setAppName("odo").setMaster("local[2]").set("spark.driver.host", "localhost")
  val sc = new SparkContext(config)
  // Pass the SparkContext created above
  val sqlc = new SQLContext(sc)
  // Needed for .toDF() on RDDs
  import sqlc.implicits._
  // your code follows here
}
Then you can run it in IntelliJ with Ctrl+Shift+F10.
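For the libraries part, a minimal build.sbt sketch might look like the following. The project name and the Scala version are assumptions (Spark 1.5.2 artifacts are published for Scala 2.10 and 2.11); the Spark version is taken from the documentation link in the question. spark-mllib provides MLUtils, and spark-sql provides SQLContext and toDF().

```scala
// build.sbt: a sketch, not a verified build for this exact project
name := "mlp-example"        // hypothetical project name

scalaVersion := "2.10.6"     // assumption: any Scala release supported by Spark 1.5.2

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.5.2",
  "org.apache.spark" %% "spark-sql"   % "1.5.2", // SQLContext, toDF()
  "org.apache.spark" %% "spark-mllib" % "1.5.2"  // MLUtils.loadLibSVMFile
)
```

After re-importing the sbt project in the IDE, the Spark classes should resolve; sc itself still has to be created in your own code as in the template above.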

Related

Getting error while importing downloaded github project (scala)?

So I'm pretty new at scala. I'm trying to use this library in my other project: https://www.github.com/desmondyeung/scala-hashing
I downloaded it, and looked up a guide on how to use downloaded projects (https://www.oreilly.com/library/view/scala-cookbook/9781449340292/ch18s11.html), and it said to make a build.scala and then put some such in it. The way I've tried to do that is this:
import sbt._
object MyBuild extends Build {
  lazy val root = Project("root", file(".")) dependsOn(xxHash)
  lazy val xxHash = RootProject(uri("file:///Users/[other stuff]/documents/GitHub/scala-hashing-master/project"))
}
And that's all that's in that build.scala file. It's under the project folder.
I tried to run it using some simple test code:
import xxHash._
object hashtest {
  def main(): Unit = {
    println(XxHash.com.desmondyeung.hashing.Xxhash64.hashByteArray(Array[Byte](xs = 123), seed = 0))
  }
}
But I'm getting an error: "not found: object xxHash". I must be missing something, because I don't think the guide tells me how to reference it. I also tried just using import com.desmondyeung.hashing.XxHash64, but that didn't work either; it says "object desmondyeung is not a member of package com".
I googled that, and the suggestion was to put _root_ before com, but that did not work.

Importing spark.implicits._ inside a Jupyter notebook

In order to use $"my_column" constructs in Spark SQL we need:
import spark.implicits._
However, as far as I can tell, this does not work inside a Jupyter notebook cell. The result is:
Name: Compile Error
Message: <console>:49: error: stable identifier required, but this.$line7$read.spark.implicits found.
import spark.implicits._
^
I have seen notebooks in the past where this did work, but they may have been Zeppelin. Is there a way to get this working in Jupyter?
Here is a hack that works:
import org.apache.spark.sql.SparkSession
val spark2: SparkSession = spark
import spark2.implicits._
The spark2 reference is now "stable", apparently.
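Why re-binding helps can be illustrated without Spark. Scala only allows imports from a stable identifier (a val, object, or package), not from a var or def; presumably the notebook kernel exposes spark in a form the compiler does not treat as stable. The sketch below uses a hypothetical Session class as a stand-in for SparkSession; none of these names come from Spark.

```scala
object StableIdentifierDemo {
  // Hypothetical stand-in for SparkSession: implicits live in a member object.
  class Session {
    object implicits {
      implicit class ColumnSyntax(val name: String) {
        def col: String = "column(" + name + ")"
      }
    }
  }

  // A var (like a def) is not a stable identifier, so this import is rejected:
  var session = new Session
  // import session.implicits._   // error: stable identifier required

  // Re-binding to a val gives the compiler a stable path to import from.
  val stableSession: Session = session
  import stableSession.implicits._

  // The implicit class is now in scope.
  def demo: String = "my_column".col
}
```

This mirrors the hack above: the val introduces no new session, only a name the compiler accepts as an import prefix.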

How to run a shell file in Apache Spark using scala

I need to execute a shell script at the end of my Spark code, which is written in Scala. I use the count and groupBy functions in my code. I should mention that my code works perfectly without the last line:
(import sys.process._
/////////////////////////linux commands
val procresult="./Users/saeedtkh/Desktop/seprator.sh".!!)
Could you please help me fix it?
You need the sys.process._ package from the Scala standard library and its DSL with !:
import sys.process._
val result = "ls -al".!
Or do the same with scala.sys.process.Process:
import scala.sys.process._
Process("cat data.txt").!
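The difference between ! and !! is easy to miss, so here is a small sketch using only the standard library. The object and method names are made up for illustration, and the example assumes an echo executable is available on the PATH.

```scala
import scala.sys.process._

object ShellDemo {
  // `!` runs the command, lets its output go to the console,
  // and returns the exit code.
  def exitCode(command: Seq[String]): Int = command.!

  // `!!` runs the command and returns its standard output as a String;
  // it throws a RuntimeException if the exit code is non-zero.
  def output(command: Seq[String]): String = command.!!
}
```

For example, ShellDemo.output(Seq("echo", "hello")) returns the text hello followed by a newline. Passing the command as a Seq avoids problems with spaces in arguments. Also note that a path starting with ./ is resolved relative to the JVM's working directory, so ./Users/... only works if a Users directory exists there; an absolute path starts with / instead.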

Calling into Play framework app from the Scala console

I have a Play Framework 2.3 app. I can drop into a Scala console with activator console. However, when I try to call into code from my app, specifically some helper function which uses WS, which uses the implicit import play.api.Play.current to retrieve the currently running app, I get the error message java.lang.RuntimeException: There is no started application.
What steps do I have to take to be able to load my app into the current console session?
There is a similar existing question, but the accepted answer appears to be using a mock app from the framework's test helpers. Preferably, I would like to run in the context of my actual app. If I must use a fake app, would it be possible to make it match my development environment (what I get when running activator run) rather than my test environment (what I get when running the unit tests)?
Thanks in advance!
In this specific case you can just create an Application instance and use it instead of the implicit one:
// Tested in 2.3.7
import play.api.{Play, Mode, DefaultApplication}
import java.io.File
import play.api.libs.ws.WS
val application = new DefaultApplication(
  new File("."),
  Thread.currentThread().getContextClassLoader(),
  None,
  Mode.Dev
)
import scala.concurrent.ExecutionContext.Implicits.global
WS.client(application).url("http://www.google.com").get().map((x) => println(x.body))
For future readers, for Play framework 2.5.x:
import play.api._
val env = Environment(new java.io.File("."), this.getClass.getClassLoader, Mode.Dev)
val context = ApplicationLoader.createContext(env)
val loader = ApplicationLoader(context)
val app = loader.load(context)
Play.start(app)
Source: https://www.playframework.com/documentation/2.5.x/PlayConsole#Launch-the-interactive-console

Scala IDE: can't read XML file into scala worksheet

New to Scala and having problems reading an XML file in a Scala worksheet. So far I have:
downloaded the Scala IDE (for Windows) and unzipped it to my C:\ drive
created a Scala project with the following file path: C:\eclipse\workspace\xml_data
created the xml file ...\xml_data\music.xml using the following data
created a package sample_data and create the following object (with file path: ...\xml_data\src\sample_data\SampleData.scala):
package sample_data
import scala.xml.XML
object SampleData {
  val data = XML.loadFile("music.xml")
}
object PrintSampleData extends Application {
  println(SampleData.data)
}
This runs OK, however, when I create the Scala worksheet test_sample_data.sc:
import sample_data.SampleData
object test {
  println(SampleData.data)
}
I get a java.lang.ExceptionInInitializerError which includes: Caused by: java.io.FileNotFoundException: music.xml (The system cannot find the file specified).
The workspace is C:\eclipse\workspace. Any help or insight much appreciated. Cheers!
UPDATE:
Following aepurniet's advice, I ran new java.io.File(".").getAbsolutePath() and got the following respectively:
SampleData.scala: C:\eclipse\workspace\xml_data\.
test_sample_data.sc: C:\eclipse\.
So this is what is causing the problem. Does anyone know why this occurs? Absolute file paths resolve the problem. Is this the best solution?
Regarding what causes the different user directory between the Scala class and the worksheet:
You are likely hitting the Eclipse IDE issue listed here:
https://github.com/scala-ide/scala-worksheet/issues/102
FYI, I used IntelliJ and the issue is not reproducible there.
Regarding absolute paths:
Using an absolute path works fine for quick testing, but is NOT good practice for the actual implementation. Consider passing the path along with the filename as input to SampleData.
A hack for getting the base path of the workspace from the Scala worksheet is described here: Configure working directory of Scala worksheet
If this is just for your own testing, hardcoding the absolute path of the workspace inside the worksheet might be the easiest option.
SampleData.scala:
package sample_data
import scala.xml.XML
object SampleData {
  // The caller decides where the file lives
  def data(filename: String) = XML.loadFile(filename)
}
object PrintSampleData extends Application {
  println(SampleData.data(System.getProperty("user.dir") + "/music.xml"))
}
Scala worksheet:
import sample_data.SampleData
object test {
  val workDir = ... // Using the hack or hardcoding
  println(SampleData.data(workDir + "/music.xml"))
}
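As an alternative to string concatenation, the path handling can be sketched with java.nio.file, which resolves a file name against an explicit base directory instead of relying on whatever working directory the IDE happens to use. PathDemo and its method names are hypothetical and not part of the original code.

```scala
import java.nio.file.{Path, Paths}

object PathDemo {
  // The directory that relative paths are resolved against by default.
  // This is the value that differed between the class and the worksheet.
  def workingDir: Path =
    Paths.get(System.getProperty("user.dir")).toAbsolutePath

  // Resolve a file name against an explicit base directory, so the result
  // does not depend on the JVM's working directory.
  def resolveIn(baseDir: String, filename: String): Path =
    Paths.get(baseDir).resolve(filename).toAbsolutePath.normalize
}
```

The result of PathDemo.resolveIn(workDir, "music.xml").toString could then be passed to SampleData.data in place of workDir + "/music.xml".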