How to use JDBC from within the Spark/Scala interpreter (REPL)?

I'm attempting to access a database in the Scala interpreter for Spark, but am having no success.
First, I have imported the DriverManager, and I have added my SQL Server JDBC driver to the class path with the following commands:
scala> import java.sql._
import java.sql._
scala> :cp sqljdbc41.jar
The REPL crashes with a long dump message:
Added 'C:\spark\sqljdbc41.jar'. Your new classpath is:
";;C:\spark\bin\..\conf;C:\spark\bin\..\lib\spark-assembly-1.1.1-hadoop2.4.0.jar;;C:\spark\bin\..\lib\datanucleus-api-jdo-3.2.1.jar;C:\spark\bin\..\lib\datanucleus-core-3.2.2.jar;C:\spark\bin\..\lib\datanucleus-rdbms-3.2.1.jar;;C:\spark\sqljdbc41.jar"
Replaying: import java.sql._
error:
while compiling: <console>
during phase: jvm
library version: version 2.10.4
compiler version: version 2.10.4
reconstructed args:
last tree to typer: Apply(constructor $read)
symbol: constructor $read in class $read (flags: <method> <triedcooking>)
symbol definition: def <init>(): $line10.$read
tpe: $line10.$read
symbol owners: constructor $read -> class $read -> package $line10
context owners: class iwC -> package $line10
== Enclosing template or block ==
Template( // val <local $iwC>: <notype>, tree.tpe=$line10.iwC
"java.lang.Object", "scala.Serializable" // parents
ValDef(
private
"_"
<tpt>
<empty>
)
...
== Expanded type of tree ==
TypeRef(TypeSymbol(class $read extends Serializable))
uncaught exception during compilation: java.lang.AssertionError
java.lang.AssertionError: assertion failed: Tried to find '$line10' in 'C:\Users\Username\AppData\Local\Temp\spark-28055904-e7d2-4052-9354-ae3769266cb4' but it is not a directory
That entry seems to have slain the compiler. Shall I replay
your session? I can re-run each line except the last one.
I am able to run a Scala program with the driver and everything works just fine.
How can I initialize my REPL to allow me to access data from SQL Server through JDBC?

It looks like the interactive :cp command does not work on Windows. However, I found that if I launch the Spark shell with the following command, the JDBC driver is loaded and available:
C:\spark> .\bin\spark-shell --jars sqljdbc41.jar
In this case, I had copied my jar file into the C:\spark folder.
(You can also see the other options available at launch with --help.)
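Once the shell starts with the jar on its classpath, ordinary JDBC calls work from the prompt. A minimal sketch, pasteable into the REPL; the connection URL, credentials, table, and column names are placeholders for your own setup:
import java.sql._

// Placeholders below: adjust server, database, credentials and query for your setup
Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver") // usually optional with JDBC 4+ drivers
val url = "jdbc:sqlserver://localhost:1433;databaseName=MyDatabase;user=myUser;password=myPassword"
val conn = DriverManager.getConnection(url)
try {
  val stmt = conn.createStatement()
  val rs = stmt.executeQuery("SELECT TOP 10 name FROM MyTable")
  while (rs.next()) println(rs.getString("name"))
} finally {
  conn.close()
}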

Related

How to differentiate between a script and normal class files in Scala?

In the book Programming in Scala, 5th Edition, the author says the following about two classes:
Neither ChecksumAccumulator.scala nor Summer.scala are scripts, because they end in a definition. A script, by contrast, must end in a result expression.
The CheckSumAccumulator.scala file is as follows:
import scala.collection.mutable

class CheckSumAccumulator:
  private var sum = 0
  def add(b: Byte): Unit = sum += b
  def checksum(): Int = ~(sum & 0xFF) + 1

object CheckSumAccumulator:
  private val cache = mutable.Map.empty[String, Int]
  def calculate(s: String): Int =
    if cache.contains(s) then
      cache(s)
    else
      val acc = new CheckSumAccumulator
      for c <- s do
        acc.add((c >> 8).toByte)
        acc.add(c.toByte)
      val cs = acc.checksum()
      cache += (s -> cs)
      cs
whereas the Summer.scala is as follows:
import CheckSumAccumulator.calculate

object Summer:
  def main(args: Array[String]): Unit =
    for arg <- args do
      println(arg + ": " + calculate(arg))
But when I run the Summer.scala file, I get a different error from the one the author mentions:
➜ learning-scala git:(main) ./scala3-3.0.0-RC3/bin/scala Summer.scala
-- [E006] Not Found Error: /Users/avirals/dev/learning-scala/Summer.scala:1:7 --
1 |import CheckSumAccumulator.calculate
| ^^^^^^^^^^^^^^^^^^^
| Not found: CheckSumAccumulator
longer explanation available when compiling with `-explain`
1 error found
Error: Errors encountered during compilation
➜ learning-scala git:(main)
The author suggests the error would be about the file not ending in a result expression.
I also tried to compile CheckSumAccumulator only and then run Summer.scala as a script without compiling it:
➜ learning-scala git:(main) ./scala3-3.0.0-RC3/bin/scalac CheckSumAccumulator.scala
➜ learning-scala git:(main) ✗ ./scala3-3.0.0-RC3/bin/scala Summer.scala
<No output, given no input>
➜ learning-scala git:(main) ✗ ./scala3-3.0.0-RC3/bin/scala Summer.scala Summer of love
Summer: -121
of: -213
love: -182
It works.
Obviously, when I compile both and then run Summer.scala, it works as expected. However, the distinction between Summer.scala as a script versus a normal file is still unclear to me.
Let's start top-down...
The most common way to compile Scala is to use a build tool such as sbt, Maven, Mill, or Gradle. The build tool helps with a few things: downloading dependencies/libraries, downloading the Scala compiler (optionally), setting up the classpath, and, most importantly, running the scalac compiler and passing all flags to it. Additionally, it can package compiled class files into JARs and other formats, and do much more. The most relevant parts here are the classpath and the compilation flags.
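As a concrete illustration, a minimal sbt build definition covering those responsibilities might look like this (the dependency and version numbers are purely illustrative):
// build.sbt -- minimal sketch; the dependency and versions are illustrative
scalaVersion := "2.13.6"                // sbt fetches this compiler for you

// dependency management: sbt downloads the jar and puts it on the classpath
libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.9" % Test

// compilation flags passed through to scalac
scalacOptions ++= Seq("-deprecation", "-feature")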
If you strip away the build tool, you can compile your project by invoking scalac manually with all the required arguments, making sure your working directory matches the package structure, i.e. that you are in the right directory. This can be tedious, because you need to download all libraries manually and make sure they are on the classpath.
So far, the build tool and manual compiler invocation are very similar to what you can also do in Java.
If you want an ad-hoc way of running some Scala code, there are two options: scala lets you run scripts or a REPL by simply compiling your uncompiled code before executing it.
However, there are some caveats. Essentially the REPL and scripts work the same way: Scala wraps your code in an anonymous object and then runs it. This way you can write any expression without having to follow the convention of defining a main function or extending the App trait (which provides main). The runner will compile the script you are trying to run, but it has no idea about classes imported from other uncompiled files. You can either compile those beforehand or make one large script that contains all the code. Of course, if it starts getting too large, it is time to make a proper project.
So in a sense there is no such thing as a script vs. a normal file, because both contain Scala code. The file you are running with scala is a script if it is uncompiled code (XXX.scala) and a "normal" compiled class (XXX.class) otherwise. If you ignore the object wrapping mentioned above, the rest is the same; only the steps used to compile and run them differ.
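To make the single-file route concrete with the code from the question: everything Summer needs can live in one file (the name below is arbitrary) that scala compiles and runs in one step, e.g. ./scala3-3.0.0-RC3/bin/scala AllInOne.scala Summer of love:
// AllInOne.scala -- hypothetical single file combining both definitions from the question
// (the caching is written a bit more compactly here)
import scala.collection.mutable

class CheckSumAccumulator:
  private var sum = 0
  def add(b: Byte): Unit = sum += b
  def checksum(): Int = ~(sum & 0xFF) + 1

object CheckSumAccumulator:
  private val cache = mutable.Map.empty[String, Int]
  def calculate(s: String): Int =
    cache.getOrElseUpdate(s, {
      val acc = new CheckSumAccumulator
      for c <- s do
        acc.add((c >> 8).toByte)
        acc.add(c.toByte)
      acc.checksum()
    })

object Summer:
  def main(args: Array[String]): Unit =
    for arg <- args do
      println(arg + ": " + CheckSumAccumulator.calculate(arg))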
Here is the snippet from the traditional Scala 2.x runner covering all the possible options:
def runTarget(): Option[Throwable] = howToRun match {
  case AsObject =>
    ObjectRunner.runAndCatch(settings.classpathURLs, thingToRun, command.arguments)
  case AsScript if isE =>
    ScriptRunner(settings).runScriptText(combinedCode, thingToRun +: command.arguments)
  case AsScript =>
    ScriptRunner(settings).runScript(thingToRun, command.arguments)
  case AsJar =>
    JarRunner.runJar(settings, thingToRun, command.arguments)
  case Error =>
    None
  case _ =>
    // We start the repl when no arguments are given.
    if (settings.Wconf.isDefault && settings.lint.isDefault) {
      // If user is agnostic about -Wconf and -Xlint, enable -deprecation and -feature
      settings.deprecation.value = true
      settings.feature.value = true
    }
    val config = ShellConfig(settings)
    new ILoop(config).run(settings)
    None
}
This is what's getting invoked when you run scala.
In Dotty/Scala 3 the idea is similar but split into multiple classes, and the classpath logic might be different: there is a REPL driver and a script runner, and the script runner invokes the REPL.

Avoiding an error when accessing a class that doesn't have access to its type information

Some classes in a library have an import that cannot be resolved.
For example, org.scalatest.tools.Framework in ScalaTest.
If I add scalatest as a dependency, it is added to the test classpath, but the import below still cannot be resolved there, because the sbt testing module is not on the test classpath.
import sbt.testing.{Event => SbtEvent, Framework => SbtFramework, Runner => SbtRunner, Status => SbtStatus, _}
In a macro, I need to scan for classes under a specific package arrangement and search for classes with specific features.
def collectXxx(targets: List[c.Symbol]): List[c.Symbol] =
  targets.filter { x =>
    (
      x.isModule || (
        x.isClass &&
          !x.isAbstract &&
          x.asClass.primaryConstructor.isMethod
      )
    ) && x.typeSignature.baseClasses.contains(XxxTag.typeSymbol)
  }
This filters to symbols that are an object or a class and that inherit from Xxx.
This works in most cases, but if targets contains a class that cannot be compiled as is, such as the ScalaTest class above, the compile error cannot be avoided.
The moment baseClasses is accessed, the macro expansion is flagged with a global error.
[error] <macro>:1:26: Symbol 'type sbt.testing.Framework' is missing from the classpath.
[error] This symbol is required by 'class org.scalatest.tools.Framework'.
[error] Make sure that type Framework is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
[error] A full rebuild may help if 'Framework.class' was compiled against an incompatible version of sbt.testing.
[error] type `fresh$macro$612` = org.scalatest.tools.Framework
[error] ^
Looking at the stack trace in debug mode, the global error is raised when each property of the StubClassSymbol is accessed.
java.lang.Throwable
at scala.reflect.internal.Symbols$StubSymbol.fail(Symbols.scala:3552)
at scala.reflect.internal.Symbols$StubSymbol.info(Symbols.scala:3563)
at scala.reflect.internal.Symbols$StubSymbol.info$(Symbols.scala:3563)
at scala.reflect.internal.Symbols$StubClassSymbol.info(Symbols.scala:3567)
at scala.reflect.internal.Types$TypeRef.baseClasses(Types.scala:2593)
at scala.reflect.internal.Types.computeBaseClasses(Types.scala:1703)
at scala.reflect.internal.Types.computeBaseClasses$(Types.scala:1680)
at scala.reflect.internal.SymbolTable.computeBaseClasses(SymbolTable.scala:28)
at scala.reflect.internal.Types.$anonfun$defineBaseClassesOfCompoundType$2(Types.scala:1781)
at scala.reflect.internal.Types$CompoundType.memo(Types.scala:1651)
at scala.reflect.internal.Types.defineBaseClassesOfCompoundType(Types.scala:1781)
at scala.reflect.internal.Types.defineBaseClassesOfCompoundType$(Types.scala:1773)
at scala.reflect.internal.SymbolTable.defineBaseClassesOfCompoundType(SymbolTable.scala:28)
at scala.reflect.internal.Types$CompoundType.baseClasses(Types.scala:1634)
at refuel.internal.AutoDIExtractor.$anonfun$recursivePackageExplore$3(AutoDIExtractor.scala:119)
I thought of a way to get around this.
Apparently, when an import fails to resolve, the corresponding TypeSymbol becomes a StubClassSymbol.
So I inspected the structure of the failing Symbol and added a condition that filters out any symbol with a StubClassSymbol among its parents. This works:
!x.typeSignature.asInstanceOf[ClassInfoTypeApi].parents.exists { pr =>
  pr.typeSymbol.isClass &&
    pr.typeSymbol.asClass.isInstanceOf[scala.reflect.internal.Symbols#StubClassSymbol]
}
But this feels like quite a hack. Is there another way around it? And I wonder whether it really covers all cases.
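For what it's worth, here is a sketch of how that extra check could be folded back into collectXxx so that baseClasses is only forced on symbols whose parents actually resolved (same caveats as above; it simply combines the two snippets):
def collectXxx(targets: List[c.Symbol]): List[c.Symbol] =
  targets.filter { x =>
    val isConcrete =
      x.isModule || (x.isClass && !x.isAbstract && x.asClass.primaryConstructor.isMethod)
    // Skip symbols with unresolved (stub) parents so baseClasses is never forced on them
    def parentsResolved =
      !x.typeSignature.asInstanceOf[ClassInfoTypeApi].parents.exists { pr =>
        pr.typeSymbol.isClass &&
          pr.typeSymbol.asClass.isInstanceOf[scala.reflect.internal.Symbols#StubClassSymbol]
      }
    isConcrete && parentsResolved &&
      x.typeSignature.baseClasses.contains(XxxTag.typeSymbol)
  }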

How to use Gremlin from a Scala script?

I'm trying to use JanusGraph in a Scala script with TinkerPop 3. I use the gremlin-scala library (https://github.com/mpollmeier/gremlin-scala), but I get an error about HNil (see below). How can I use Gremlin and JanusGraph from a Scala script?
import gremlin.scala._
import org.apache.commons.configuration.BaseConfiguration
import org.janusgraph.core.JanusGraphFactory
import org.apache.tinkerpop.gremlin.structure.Graph

object Janus {
  def main(args: Array[String]): Unit = {
    val conf = new BaseConfiguration()
    conf.setProperty("storage.backend", "inmemory")
    val graph = JanusGraphFactory.open(conf)
    val v1 = graph.graph.addV("test")
  }
}
Error:(11, 14) Symbol 'type scala.ScalaObject' is missing from the classpath.
This symbol is required by 'trait shapeless.HNil'.
Make sure that type ScalaObject is in your classpath and check for conflicting dependencies with -Ylog-classpath.
A full rebuild may help if 'HNil.class' was compiled against an incompatible version of scala.
val v1 = graph.graph.addV("test")
Not sure what you mean by 'scala script', but it looks like you're missing many (all?) dependencies. Did you have a look at https://github.com/mpollmeier/gremlin-scala-examples/ ? It contains an example setup for janusgraph.
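For reference, the dependency side of such a setup might look roughly like the sketch below in build.sbt; the version numbers are assumptions, so take mutually compatible ones from the gremlin-scala-examples janusgraph project:
// build.sbt -- sketch only; pick versions known to work together from gremlin-scala-examples
val gremlinScalaVersion = "3.3.3.3" // assumption: must line up with the TinkerPop version JanusGraph uses
val janusGraphVersion   = "0.3.1"   // assumption

scalaVersion := "2.12.8"

libraryDependencies ++= Seq(
  "com.michaelpollmeier" %% "gremlin-scala"   % gremlinScalaVersion,
  "org.janusgraph"        % "janusgraph-core" % janusGraphVersion
)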

Type mismatch when utilising a case class in a package object

I am receiving the following error when I try to run my code:
Error:(104, 63) type mismatch;
found : hydrant.spark.hydrant.spark.IPPortPair
required: hydrant.spark.(some other)hydrant.spark.IPPortPair
IPPortPair(it.getHeader.getDestinationIP, it.getHeader.getDestinationPort))
My code uses a case class defined in the package object spark to set up the IP/Port map for each connection.
The package object looks like this:
package object spark {
  case class IPPortPair(ip: Long, port: Long)
}
And the code using the package object looks like this:
package hydrant.spark

import java.io.{File, PrintStream}

object identifyCustomers{
  ……………
  def mapCustomers(suspectTraffic: RDD[Generic]) = {
    suspectTraffic.filter(
      it => !it.getHeader.isEmtpy
    ).map(
      it => IPPortPair(it.getHeader.getDestinationIP, it.getHeader.getDestinationPort)
    )
  }
}
I am conscious of the strange way my packages are being displayed, as the error makes it seem that I am in hydrant.spark.hydrant.spark, which does not exist.
I am also using IntelliJ, if that makes a difference.
You need to run sbt clean (or the IntelliJ equivalent). You changed something in the project (e.g. Scala version) and this is how the incompatibility manifests.

Scala + stax compile problem during deploy process

I developed an app in Scala IDE (the Eclipse plugin) with no errors or warnings. Now I'm trying to deploy it to the Stax cloud:
$ stax deploy
But it fails to compile it:
compile:
[scalac] Compiling 2 source files to /home/gleontiev/workspace/rss2lj/webapp/WEB-INF/classes
error: error while loading FlickrUtils, Scala signature FlickrUtils has wrong version
expected: 4.1
found: 5.0
/home/gleontiev/workspace/rss2lj/src/scala/example/snippet/DisplaySnippet.scala:8: error: com.folone.logic.FlickrUtils does not have a constructor
val dispatcher = new FlickrUtils("8196243#N02")
^
error: error while loading Photo, Scala signature Photo has wrong version
expected: 4.1
found: 5.0
/home/gleontiev/workspace/rss2lj/src/scala/example/snippet/DisplaySnippet.scala:9: error: value link is not a member of com.folone.logic.Photo
val linksGetter = (p:Photo) => p.link
^
/home/gleontiev/workspace/rss2lj/src/scala/example/snippet/DisplaySnippet.scala:15: error: com.folone.logic.FlickrUtils does not have a constructor
val dispatcher = new FlickrUtils("8196243#N02")
^
/home/gleontiev/workspace/rss2lj/src/scala/example/snippet/DisplaySnippet.scala:16: error: value medium1 is not a member of com.folone.logic.Photo
val picsGetter = (p:Photo) => p.medium1
^
/home/gleontiev/workspace/rss2lj/src/scala/example/snippet/RefreshSnippet.scala:12: error: com.folone.logic.FlickrUtils does not have a constructor
val dispatcher = new FlickrUtils("8196243#N02")
^
7 errors found
ERROR: : The following error occurred while executing this line:
/home/gleontiev/workspace/rss2lj/build.xml:61: Compile failed with 7 errors; see the compiler error output for details.
I see two kinds of errors it is complaining about. The first is the FlickrUtils class constructor, which is defined like this:
class FlickrUtils(val userId: String) {
  //...
}
The second is that two fields are reported missing from the Photo class, which is:
class Photo(val photoId: String, val userId: String, val secret: String, val server: String) {
  private val _medium1 = "/sizes/m/in/photostream"
  val link = "http://flickr.com/photos/" + userId + "/" + photoId
  val medium1 = link + _medium1
}
It seems like the Stax SDK uses the wrong compiler (?). How do I make it use the right one? If that is not the issue, what is the problem here, and what are some ways to resolve it?
Edit: $ scala -version says
Scala code runner version 2.8.0.final -- Copyright 2002-2010, LAMP/EPFL
I tried compiling everything with scalac manually, putting everything in its place, and running stax deploy afterwards -- same result.
I actually resolved this by moving the FlickrUtils and Photo classes into the package where the snippets originally are, but I still don't understand why it was not able to compile and use them from the other package.