How to use BLAS library in Spark?

I'm new to Scala, and I'm writing a Spark application in which I need to use the axpy function from org.apache.spark.mllib.linalg.BLAS. However, it does not appear to be accessible to users. Instead, I tried to import com.github.fommil.netlib and access it directly, but I couldn't do that either. I need to multiply two DenseVectors.

Right now, the BLAS class within mllib is marked private[spark] in the Spark source code. This means it is not accessible outside of Spark itself, as you seem to have figured out. In short, you can't use it in your own code.
If you want to use netlib-java classes directly, you need to add the following dependency to your project
libraryDependencies += "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly()
That should allow you to import the BLAS class. Note, I haven't really tried to use it, but I am able to execute BLAS.getInstance() without a problem. There might be some installation complexities on some Linux platforms, as described at https://github.com/fommil/netlib-java.
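For illustration, here's a minimal sketch of calling the routine directly through netlib-java, assuming the standard daxpy signature (y := a*x + y); I haven't benchmarked it against a native backend.
import com.github.fommil.netlib.BLAS

// Obtain whichever BLAS implementation netlib-java resolves
// (a native library if available, otherwise the pure-Java F2J fallback).
val blas = BLAS.getInstance()

val a = 2.0
val x = Array(1.0, 2.0, 3.0)
val y = Array(4.0, 5.0, 6.0)

// daxpy updates y in place: y(i) += a * x(i)
blas.daxpy(x.length, a, x, 1, y, 1)

println(y.mkString(", ")) // 6.0, 9.0, 12.0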

Add mllib to your project
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.3.0"

Related

How can online code editor Scastie read an input file?

I need to pass a very big input file to Scastie. How can Scastie, which is an online code editor, read a file that is available on my local machine? For example:
val lines = sc.textFile("....mdb/u.data")
Someone asked this on the team's Gitter channel.
A Scastie team member first asked how big the file is, then recommended putting it in a Gist on GitHub and using the raw URL to read it in.
This works only for small files; the size limits for Gist files are explained in GitHub's Developer Guide:
If you need the full contents of the file, you can make a GET request to the URL specified by raw_url. Be aware that for files larger than ten megabytes, you'll need to clone the gist via the URL provided by git_pull_url.
So 10 MB is your limit. Also note that you can't use a SparkContext (denoted by sc in your question) without identifying the library to the online environment.
To do that, you'll have to add the SBT dependency.
Navigate to Build Settings on the left part of the interface.
Set the Scala Version to a version compatible with the Spark version we'll choose, in our case 2.11.12.
Under Extra Sbt Configuration place the following dependencies:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.3",
  "org.apache.spark" %% "spark-sql" % "2.4.3"
)
You won't be able to read URL content directly using sc.textFile; that is only for reading local/HDFS text files. You'll have to fetch the content first, wrangle it into shape, and build a DataFrame out of it.
The answer shown here describes how to access a web URL using Source from the Scala standard library.
At the request of the OP, here's an implementation on Scastie.
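For reference, here is a sketch of that approach. The Gist URL is a placeholder, and the column layout assumes the MovieLens u.data format of tab-separated userId, movieId, rating, timestamp.
import scala.io.Source
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("gist-reader")
  .getOrCreate()
import spark.implicits._

// Placeholder raw Gist URL; substitute the raw_url of your own Gist
val url = "https://gist.githubusercontent.com/<user>/<gist-id>/raw/u.data"

// Fetch the contents over HTTP with the standard library, then hand them to Spark
val source = Source.fromURL(url)
val lines = try source.getLines().toList finally source.close()

val df = lines
  .map(_.split("\t"))
  .collect { case Array(user, item, rating, ts) =>
    (user.toInt, item.toInt, rating.toDouble, ts.toLong)
  }
  .toDF("userId", "movieId", "rating", "timestamp")

df.show(5)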

Trying To Understand Which Play Library To Use

What is the difference between the following libraries:
libraryDependencies += "com.typesafe.play" %% "play-ahc-ws-standalone" % "LATEST_VERSION"
and
libraryDependencies += "com.typesafe.play" %% "play-ahc-ws" % "LATEST_VERSION"
I am just trying to figure out which is the correct one to use. What I did was create a Play module in a separate library, and I want to inject that into a Play application. When I used the first library listed above, it only offers a StandaloneWSClient; when I injected that into the Play application, it couldn't bind an implementation to it. When I switched to the second library, it offers a WSClient, and the Play application could find an implementation to bind to it, since it already provides one that you enable in build.sbt via ws.
Within a Play project you should use play-ahc-ws, which is usually added like so:
libraryDependencies += ws
The ws value comes from Play's sbt plugin
addSbtPlugin("com.typesafe.play" % "sbt-plugin" % "2.8.1")
On the other hand, play-ahc-ws-standalone is an HTTP client in its own right that can be used outside Play projects, just as one could use, for example, the scalaj-http or requests-scala HTTP clients, which are in no way aware of Play.
The difference is documented in the Play 2.6 Migration Guide.
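To make the difference concrete, here is a hypothetical service class for your separate library module; it depends only on the WSClient interface, and the Play application that adds ws to its build.sbt supplies the bound implementation at injection time.
import javax.inject.Inject
import play.api.libs.ws.WSClient
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical service: the Play application binds WSClient, so this class
// only needs the interface pulled in via the ws dependency.
class ZenService @Inject()(ws: WSClient)(implicit ec: ExecutionContext) {
  def fetchZen(): Future[String] =
    ws.url("https://api.github.com/zen").get().map(_.body)
}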

easiest way for users to include scalamacros paradise

I have just added some macro annotations to my library. In my library build, I include
addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)
to enable macro paradise.
In my users' projects that consume the macros, I know they need to include scalamacros somehow as well. Right now, in my example project, I do it the same as above. I was wondering if there was a briefer or less complicated way for the users to bring in the macros? For instance, is there some way I can leave off the cross CrossVersion.full? (As the user is probably not cross-compiling.)
That's as simple as it gets, really. Macro paradise versions are published against full Scala versions (like 2.11.8) rather than binary versions (like 2.11). cross CrossVersion.full ensures that the full scalaVersion from a build is appended to the artifact name, so assuming scalaVersion := "2.11.8":
addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)
becomes
addCompilerPlugin("org.scalamacros" % "paradise_2.11.8" % "2.1.0")
It's simple enough to ask your users to include a single line in their build.
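For reference, a consuming project's build.sbt could be as small as the sketch below; the library coordinates are placeholders for whatever you actually publish.
// Hypothetical build.sbt for a project consuming the macro annotations
scalaVersion := "2.11.8"

libraryDependencies += "com.example" %% "my-macro-library" % "0.1.0"

addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)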

using specs2 from Play or from the normal distribution?

I am working on a legacy project where the developers before me used the specs2 from Play, added through
libraryDependencies += specs2 % Test
as well as the normal distribution
libraryDependencies ++= Seq("org.specs2" %% "specs2-core" % "3.8.5" % "test")
I am wondering which is the better option, as I want to have just one of them and don't see the need for both. What are the advantages of one over the other? I also want just one because the conflicting jars make it hard to debug from the IDE.
I'm not sure which version of Play! you are on, but looking at 2.5.8, specs2 from Play (play.sbt.PlayImport.specs2) refers to the play-specs2 library, which in turn depends on specs2 3.8.5 but also adds some Play-specific testing logic. Using play.sbt.PlayImport.specs2 as a dependency will allow your module to use both specs2 and the play-specs2 library.
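As a rough illustration of what the Play flavour adds, here is a hypothetical spec using play-specs2's helpers; plain specs2-core alone would not give you PlaySpecification or WithApplication.
import play.api.test.{PlaySpecification, WithApplication}

// PlaySpecification mixes specs2 together with Play's test helpers,
// and WithApplication starts a Play application for the example.
class ApplicationSpec extends PlaySpecification {

  "the application" should {
    "start without errors" in new WithApplication {
      success
    }
  }
}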

How to export properties of shared case classes

I am trying to share a case class between server and client. I am using upickle on both ends. The objects and their data are nicely available on both ends.
shared class
case class Foo(var id: Long, var title: Description)
However I need to export the fields of the case class on the client side. I could add the @JSExportAll annotation, but that means pulling the Scala.js libraries into the server project.
Is there perhaps a better way of exposing the members id and title to JavaScript?
The right way to export things to JavaScript is to use the @JSExportAll annotation. You cannot, and should not, pull the Scala.js libraries into the server project, though. For this use case, we have a dedicated JVM artifact, scalajs-stubs, which you can add to your JVM project like this:
libraryDependencies += "org.scala-js" %% "scalajs-stubs" % scalaJSVersion % "provided"
As a "provided" dependency, it will not be present at runtime. But it will allow you to compile the JVM project even though it references JSExportAll.
See also the ScalaDoc of scalajs-stubs.
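Putting it together, the shared case class could look like the sketch below (String stands in for your Description type); on the JS side the annotation comes from the Scala.js library, and on the JVM side it compiles against scalajs-stubs.
import scala.scalajs.js.annotation.JSExportAll

// Exposes all public members, including id and title, to JavaScript code
// when instances of Foo reach the JS side.
@JSExportAll
case class Foo(var id: Long, var title: String)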