How is an imported name resolved in Scala? (Spark / Zeppelin)

I have a script running in a paragraph with the Spark interpreter in Zeppelin. It has an import and the name imported can be resolved from the global namespace and also from a function, but not from a method inside a class.
This runs well on my computer's installation of Scala (2.12) but it doesn't work in Zeppelin (Scala 2.11).
import java.util.Calendar

def myFun: String = {
  // this works
  return Calendar.getInstance.toString
}

class MyClass {
  def myFun(): String = {
    // this doesn't
    return Calendar.getInstance.toString
    // this works
    return java.util.Calendar.getInstance.toString
  }
}
The error message is like:
import java.util.Calendar
myFun: String
<console>:15: error: not found: value Calendar
return Calendar.getInstance.toString
What am I missing?

In 0.8.0, Zeppelin introduced a new SparkInterpreter; I suppose that is why the global imports don't work and the imports have to be scoped within a wrapper.
As a workaround, the property zeppelin.spark.useNew can be set to "false". This disables Spark's new interpreter.
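If reverting to the old interpreter is not desirable, another option consistent with the scoping note above is to repeat the import inside the class itself. A minimal, untested sketch of what that looks like for the original example:

import java.util.Calendar

class MyClass {
  // Re-importing inside the class body keeps the name visible to the code
  // that the new interpreter wraps around class definitions.
  import java.util.Calendar

  def myFun(): String = Calendar.getInstance.toString
}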

Related

Scala can't import a class that Java can

Why can I do this in Java:
import javax.swing.GroupLayout.Group;
but if I do the same in Scala (by using Ammonite), I get this:
value Group is not a member of object javax.swing.GroupLayout
possible cause: maybe a semicolon is missing before `value Group'?
import javax.swing.GroupLayout.Group
Is it due to the fact that Group is a public class derived from a private class called Spring?
I can import neither SequentialGroup nor ParallelGroup.
Is it a bug in Scala?
I'm using Java 11 and Scala 2.12.10.
Scala 2.13.1 also fails. :-(
I need the import for defining a generic method that can take a Group parameter, which could be either a ParallelGroup or a SequentialGroup.

I'd like to generate a generic method that takes as a parameter a Group, that could be either a ParallelGroup or a SequentialGroup
That would be a type projection:
def method(group: GroupLayout#Group) = ...
or if you also have the layout the group belongs to,
def method(layout: GroupLayout)(group: layout.Group) = ...
or
val layout: GroupLayout = ...
def method(group: layout.Group) = ...
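To make the type-projection version concrete, here is a small, self-contained sketch (my own illustration, not from the original answer) showing that both group kinds are accepted:

import javax.swing.{ GroupLayout, JPanel }

object GroupDemo {
  // A type projection accepts any Group, regardless of which layout created it.
  def withGap(group: GroupLayout#Group): GroupLayout#Group =
    group.addGap(10)

  def main(args: Array[String]): Unit = {
    val layout = new GroupLayout(new JPanel())
    // SequentialGroup and ParallelGroup are both subtypes of GroupLayout#Group.
    withGap(layout.createSequentialGroup())
    withGap(layout.createParallelGroup())
    println("both group kinds accepted")
  }
}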

DSE 5.0 Custom Codecs in Scala - IntelliJ does not compile

I am trying to write a custom codec to convert Cassandra columns of type timestamp to org.joda.time.DateTime.
I am building my project with sbt version 0.13.13.
I wrote a test that serializes and deserializes a DateTime object. When I run the test via the command line with sbt "test:testOnly *DateTimeCodecTest", the project builds and the test passes.
However, if I try to build the project inside IntelliJ, I receive the following error:
Error:(17, 22) overloaded method constructor TypeCodec with alternatives:
(x$1: com.datastax.driver.core.DataType,x$2: shade.com.datastax.spark.connector.google.common.reflect.TypeToken[org.joda.time.DateTime])com.datastax.driver.core.TypeCodec[org.joda.time.DateTime] <and>
(x$1: com.datastax.driver.core.DataType,x$2: Class[org.joda.time.DateTime])com.datastax.driver.core.TypeCodec[org.joda.time.DateTime]
cannot be applied to (com.datastax.driver.core.DataType, com.google.common.reflect.TypeToken[org.joda.time.DateTime])
object DateTimeCodec extends TypeCodec[DateTime](DataType.timestamp(), TypeToken.of(classOf[DateTime]).wrap()) {
Here is the codec:
import java.nio.ByteBuffer

import com.datastax.driver.core.exceptions.InvalidTypeException
import com.datastax.driver.core.{ DataType, ProtocolVersion, TypeCodec }
import com.google.common.reflect.TypeToken
import org.joda.time.{ DateTime, DateTimeZone }

/**
 * Provides serialization between Cassandra types and org.joda.time.DateTime
 *
 * Reference for writing custom codecs in Scala:
 * https://www.datastax.com/dev/blog/writing-scala-codecs-for-the-java-driver
 */
object DateTimeCodec extends TypeCodec[DateTime](DataType.timestamp(), TypeToken.of(classOf[DateTime]).wrap()) {

  override def serialize(value: DateTime, protocolVersion: ProtocolVersion): ByteBuffer = {
    if (value == null) return null
    val millis: Long = value.getMillis
    TypeCodec.bigint().serializeNoBoxing(millis, protocolVersion)
  }

  override def deserialize(bytes: ByteBuffer, protocolVersion: ProtocolVersion): DateTime = {
    val millis: Long = TypeCodec.bigint().deserializeNoBoxing(bytes, protocolVersion)
    new DateTime(millis).withZone(DateTimeZone.UTC)
  }

  // Do we need a formatter?
  override def format(value: DateTime): String = value.getMillis.toString

  // Do we need a formatter?
  override def parse(value: String): DateTime = {
    try {
      if (value == null ||
        value.isEmpty ||
        value.equalsIgnoreCase("NULL")) throw new Exception("Cannot produce a DateTime object from empty value")
      // Do we need a formatter?
      else new DateTime(value)
    } catch {
      // TODO: Determine the more specific exception that would be thrown in this case
      case e: Exception =>
        throw new InvalidTypeException(s"""Cannot parse DateTime from "$value"""", e)
    }
  }
}
and here is the test:
import com.datastax.driver.core.ProtocolVersion
import org.joda.time.{ DateTime, DateTimeZone }
import org.scalatest.FunSpec

class DateTimeCodecTest extends FunSpec {
  describe("Serialization") {
    it("should serialize between Cassandra types and org.joda.time.DateTime") {
      val now = new DateTime().withZone(DateTimeZone.UTC)
      val result = DateTimeCodec.deserialize(
        // TODO: Verify correct ProtocolVersion for DSE 5.0
        DateTimeCodec.serialize(now, ProtocolVersion.V4), ProtocolVersion.V4
      )
      assertResult(now)(result)
    }
  }
}
I make extensive use of the debugger within IntelliJ, as well as the ability to quickly run a single test using some hotkeys. Losing the ability to compile within the IDE is almost as bad as losing the ability to compile at all. Any help would be appreciated, and I am more than happy to provide any additional information about my project / environment if anyone needs it.
Edit, update:
The project compiles within IntelliJ if I provide an instance of com.google.common.reflect.TypeToken as opposed to shade.com.datastax.spark.connector.google.common.reflect.TypeToken.
However, this breaks the build within sbt.
You must create a default constructor for DateTimeCodec.
I resolved the issue.
The issue stemmed from conflicting versions of spark-cassandra-connector on the classpath. Both shaded and unshaded versions of the dependency were on the classpath, and removing the shaded dependency fixed the issue.
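When tracking down a conflict like this, a quick check such as the following (my own sketch, using only JDK reflection) prints which jar each TypeToken variant is loaded from, which makes the duplicate easy to spot:

object ClasspathCheck {
  def main(args: Array[String]): Unit = {
    Seq(
      "com.google.common.reflect.TypeToken",
      "shade.com.datastax.spark.connector.google.common.reflect.TypeToken"
    ).foreach { name =>
      // getCodeSource is null for JDK bootstrap classes, but jar-loaded classes report their jar.
      val location =
        try Class.forName(name).getProtectionDomain.getCodeSource.getLocation.toString
        catch { case _: ClassNotFoundException => "not on classpath" }
      println(s"$name -> $location")
    }
  }
}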

Using private package classes in Scala repl

I have several classes that are marked as
package com.salil.mypackage
private [mypackage] MyClass{
}
However, I would like to use them in a Scala REPL. I tried using :paste -raw with code like:
package com.salil.mypackage {
  val my = new MyClass()
}
but that fails with:
<console>:1: error: illegal start of definition
Is there any way to access these classes in a REPL?
You can use them using :paste -raw in the REPL without a problem, as you've tried. Your issue is that your Scala is invalid.
This definition is invalid syntax; the class keyword is missing:
private [mypackage] MyClass
The following is also invalid syntax, because you cannot place vals in the root of a package. You can make it work if you use a package object, though.
package com.salil

package object mypackage {
  val my = new MyClass()
}
scala> com.salil.mypackage.my
res12: com.salil.mypackage.MyClass = com.salil.mypackage.MyClass@56eae567
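For completeness, a single :paste -raw block that combines the class fix and the package object might look like this sketch (MyClass here stands in for the real class):

// entered after :paste -raw
package com.salil

package mypackage {
  // the original snippet was missing the class keyword
  private[mypackage] class MyClass
}

package object mypackage {
  val my = new MyClass()
}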

snakeyaml and spark results in an inability to construct objects

The following code executes fine in a Scala shell with snakeyaml version 1.17:
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor
import scala.collection.mutable.ListBuffer
import scala.beans.BeanProperty

class EmailAccount {
  @scala.beans.BeanProperty var accountName: String = null

  override def toString: String = {
    return s"acct ($accountName)"
  }
}

val text = """accountName: Ymail Account"""
val yaml = new Yaml(new Constructor(classOf[EmailAccount]))
val e = yaml.load(text).asInstanceOf[EmailAccount]
println(e)
However, when running in Spark (2.0.0 in this case), the resulting error is:
org.yaml.snakeyaml.constructor.ConstructorException: Can't construct a java object for tag:yaml.org,2002:EmailAccount; exception=java.lang.NoSuchMethodException: EmailAccount.<init>()
in 'string', line 1, column 1:
accountName: Ymail Account
^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:350)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
at org.yaml.snakeyaml.Yaml.load(Yaml.java:369)
... 48 elided
Caused by: org.yaml.snakeyaml.error.YAMLException: java.lang.NoSuchMethodException: EmailAccount.<init>()
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.createEmptyJavaBean(Constructor.java:220)
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:190)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:346)
... 53 more
Caused by: java.lang.NoSuchMethodException: EmailAccount.<init>()
at java.lang.Class.getConstructor0(Class.java:2810)
at java.lang.Class.getDeclaredConstructor(Class.java:2053)
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.createEmptyJavaBean(Constructor.java:216)
... 55 more
I launched the scala shell with
scala -classpath "/home/placey/snakeyaml-1.17.jar"
I launched the spark shell with
/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-shell --master local --jars /home/placey/snakeyaml-1.17.jar
Solution
Create a self-contained application and run it using spark-submit instead of using spark-shell.
I've created a minimal project for you as a gist here. All you need to do is put both files (build.sbt and Main.scala) in some directory, then run:
sbt package
in order to create a JAR. The JAR will be in target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar or a similar location. You can get SBT from here if you haven't used it yet. Finally, you can run the project:
/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-submit --class "Main" --master local --jars /home/placey/snakeyaml-1.17.jar target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar
The output should be:
[many lines of Spark's log]
acct (Ymail Account)
[more lines of Spark's log]
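For reference, a minimal Main.scala along these lines might look like the sketch below (my own reconstruction; the linked gist may differ in details). The key point is that EmailAccount is a normal top-level class compiled by scalac, so it keeps its zero-argument constructor:

import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor
import scala.beans.BeanProperty

class EmailAccount {
  @BeanProperty var accountName: String = null
  override def toString: String = s"acct ($accountName)"
}

object Main {
  def main(args: Array[String]): Unit = {
    val text = "accountName: Ymail Account"
    val yaml = new Yaml(new Constructor(classOf[EmailAccount]))
    // Outside the REPL there is no $iw wrapping, so SnakeYAML can instantiate EmailAccount.
    println(yaml.load(text).asInstanceOf[EmailAccount])
  }
}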
Explanation
Spark's shell (REPL) transforms all classes you define in it by adding a $iw parameter to your constructors. I've explained it here. SnakeYAML expects a zero-parameter constructor for JavaBean-like classes, but there isn't one, so it fails.
You can try this yourself:
scala> class Foo() {}
defined class Foo
scala> classOf[Foo].getConstructors()
res0: Array[java.lang.reflect.Constructor[_]] = Array(public Foo($iw))
scala> classOf[Foo].getConstructors()(0).getParameterCount
res1: Int = 1
As you can see, Spark transforms the constructor by adding a parameter of type $iw.
Alternative solutions
Define your own Constructor
If you really need to get it working in the shell, you could define your own class extending org.yaml.snakeyaml.constructor.BaseConstructor and make sure that $iw gets passed to constructors, but this is a lot of work (I actually wrote my own Constructor in Scala for security reasons some time ago, so I have some experience with this).
You could also define a custom Constructor hard-coded to instantiate a specific class (EmailAccount in your case) similar to the DiceConstructor shown in SnakeYAML's documentation. This is much easier, but requires writing code for each class you want to support.
Example:
case class EmailAccount(accountName: String)

class EmailAccountConstructor extends org.yaml.snakeyaml.constructor.Constructor {
  val emailAccountTag = new org.yaml.snakeyaml.nodes.Tag("!emailAccount")
  this.rootTag = emailAccountTag
  this.yamlConstructors.put(emailAccountTag, new ConstructEmailAccount)

  private class ConstructEmailAccount extends org.yaml.snakeyaml.constructor.AbstractConstruct {
    def construct(node: org.yaml.snakeyaml.nodes.Node): Object = {
      // TODO: This is fine for quick prototyping in a REPL, but in a real
      // application you should probably add type checks.
      val mnode = node.asInstanceOf[org.yaml.snakeyaml.nodes.MappingNode]
      val mapping = constructMapping(mnode)
      val name = mapping.get("accountName").asInstanceOf[String]
      new EmailAccount(name)
    }
  }
}
You can save this as a file and load it in the REPL using :load filename.scala.
A bonus advantage of this solution is that it can create immutable case class instances directly. Unfortunately, the Scala REPL seems to have issues with imports here, so I've used fully qualified names.
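For completeness, a usage sketch (my addition, assuming the rootTag override applies to the untagged document as the answer intends):

val yaml = new org.yaml.snakeyaml.Yaml(new EmailAccountConstructor())
val account = yaml.load("accountName: Ymail Account").asInstanceOf[EmailAccount]
println(account)  // expected: EmailAccount(Ymail Account)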
Don't use JavaBeans
You can also just parse YAML documents as simple Java maps:
scala> val yaml2 = new Yaml()
yaml2: org.yaml.snakeyaml.Yaml = Yaml:1141996301
scala> val e2 = yaml2.load(text)
e2: Object = {accountName=Ymail Account}
scala> val map = e2.asInstanceOf[java.util.Map[String, Any]]
map: java.util.Map[String,Any] = {accountName=Ymail Account}
scala> map.get("accountName")
res4: Any = Ymail Account
This way SnakeYAML won't need to use reflection.
However, since you're using Scala, I recommend trying
MoultingYAML, which is a Scala wrapper for SnakeYAML. It parses YAML documents to simple Java types and then maps them to Scala types (even your own types like EmailAccount).
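A rough sketch of what that could look like with MoultingYAML's spray-json-style API (my own illustration; the artifact coordinates and version below are assumptions, so check the project's README for the exact setup):

// build.sbt (assumed coordinate): "net.jcazevedo" %% "moultingyaml" % "0.4.2"
import net.jcazevedo.moultingyaml._
import net.jcazevedo.moultingyaml.DefaultYamlProtocol._

case class EmailAccount(accountName: String)

// Derives a YamlFormat for the one-field case class, like spray-json's jsonFormat1.
implicit val emailAccountFormat = yamlFormat1(EmailAccount)

val account = "accountName: Ymail Account".parseYaml.convertTo[EmailAccount]
println(account)  // EmailAccount(Ymail Account)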

private field in object doesn't compile

I tried to run an example from Programming in Scala, but the compiler gives me an error:
Description Resource Path Location Type
illegal start of statement (no modifiers allowed here) ChecksumAcc.sc /HelloWorld/src line 3 Scala Problem
Basically, it complains about private:
import scala.collection.mutable.Map

object ChecksumAcc {
  private val cache = Map[String, Int]()
}
I'm using the Eclipse Scala worksheet; the same happens after updating. I believe it uses the 2.9.3 Scala compiler. Why doesn't it compile?
Not sure what your actual question is, but the Scala worksheet has some special rules (as indicated by the very clear error message...). One thing you can do, if you have to use the worksheet, is to put all your code inside a Worksheet object like this:
object Worksheet {
  import scala.collection.mutable.Map

  object ChecksumAcc {
    private val cache = Map[String, Int]()
  }
}
Or alternatively, use Eclipse's "New Scala object..." and use that instead of the worksheet.
To avoid the error message you are seeing when working in an Eclipse Scala worksheet, wrap the class definition and its companion (singleton object) in a single enclosing object:
object worksheet {
  class CheckSumAccumulator {
    ...
  }

  object CheckSumAccumulator {
    ...
  }

  CheckSumAccumulator.calculate("foobar")
}
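For a concrete version, here is a sketch of that worksheet with the elided bodies filled in, adapted from the book's checksum-accumulator example (my reconstruction, not the original poster's code):

object worksheet {
  import scala.collection.mutable.Map

  class CheckSumAccumulator {
    private var sum = 0
    def add(b: Byte): Unit = { sum += b }
    def checksum(): Int = ~(sum & 0xFF) + 1
  }

  object CheckSumAccumulator {
    private val cache = Map[String, Int]()

    // Caches the checksum per string so repeated calls don't recompute it.
    def calculate(s: String): Int =
      cache.getOrElseUpdate(s, {
        val acc = new CheckSumAccumulator
        for (c <- s) acc.add(c.toByte)
        acc.checksum()
      })
  }

  CheckSumAccumulator.calculate("foobar")
}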