How can I convert a Scala HashSet to a Python set using py4j?

I have an object in Scala that returns a Set[String]:
object MyObject {
val setOfStrings = Set[String]("string1","string2")
}
I can successfully refer to that val from my Python code (using Py4J) with this:
jvm.com.package.MyObject.setOfStrings()
However, I can't figure out how to do anything useful with it. I want to convert it into a Python set.
I managed to do it using this:
eval(
str(
jvm.com.package.MyObject.setOfStrings()
)
.replace("Set(", "{\"")
.replace(")", "\"}")
.replace(", ", "\", \"")
)
but that's a horrifically brittle way of achieving it, and I assume Py4J provides a much better way.
I'm aware that SetConverter can convert a Python set to a Java set, but I want to do the opposite. Does anyone know how?

You could use a java.util.Set on the Scala side, which will be converted by Py4J into the Python equivalent MutableSet.
If you don't want to use Java collections in your Scala code base, then you might want to convert a scala.collection.immutable.Set into its Java equivalent when you want to communicate with Python.
I didn't run this code, but you could probably do this:
object YourObject {
val setOfStrings = Set[String]("string1","string2")
}
object ViewObject {
import scala.jdk.CollectionConverters._
def setOfStrings: java.util.Set[String] = YourObject.setOfStrings.asJava
}
and use it in Python:
jvm.com.package.ViewObject.setOfStrings()
Note that in this case, you won't be able to modify the collection. If you want to be able to update the view in Python and have the changes reflected back in Scala, then you'll have to use a scala.collection.mutable.Set in your Scala code.
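The difference between the read-only and the updatable view can be sketched on the Scala side alone. This is a minimal sketch assuming Scala 2.13 or later (for scala.jdk.CollectionConverters); the object and member names are hypothetical and not part of Py4J.

```scala
import scala.collection.mutable
import scala.jdk.CollectionConverters._

// Hypothetical names, for illustration only
object SetViewSketch {
  private val fixed: Set[String] = Set("string1", "string2")
  val shared: mutable.Set[String] = mutable.Set("string1", "string2")

  // Wrapper around an immutable Set: calling add/remove on it
  // throws UnsupportedOperationException
  def readOnlyView: java.util.Set[String] = fixed.asJava

  // Wrapper around a mutable Set: add/remove write through to `shared`
  def liveView: java.util.Set[String] = shared.asJava
}
```

Anything added through liveView (for example from the Python side, via the java.util.Set that Py4J exposes) should then be visible in SetViewSketch.shared.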

Related

How to get the full class name of a dynamically created class in Scala

I have a situation where I have to get the fully qualified name of a class I generate dynamically in Scala. Here's what I have so far.
import scala.reflect.runtime.universe
import scala.tools.reflect.ToolBox
val tb = universe.runtimeMirror(getClass.getClassLoader).mkToolBox()
val generatedClass = "class Foo { def addOne(i: Int) = i + 1 }"
tb.compile(tb.parse(generatedClass))
val fooClass:String = ???
Clearly this is just a toy example, but I just don't know how to get the fully qualified name of Foo. I tried sticking a package declaration into the code but that threw an error when calling tb.compile.
Does anyone know how to get the fully qualified class name or (even better) to specify the package that Foo gets compiled under?
Thanks
EDIT
After using the proposed solution I was able to get the class name. However, the next step is to register this class so I can take some actions later. Specifically, I'm trying to use the UDTRegistration within Apache Spark to handle my own custom UserDefinedTypes. This strategy works fine when I manually create all the types; however, I want to use them to extend other types I may not know about.
After reading this it seems like what I'm trying to do might not be possible using code compiled at runtime using reflection. Maybe a better solution is to use Scala macros, but I'm very new to that area.
You can use define instead of compile to generate the new class and get its package:
val cls = tb.define(tb.parse(generatedClass).asInstanceOf[universe.ImplDef])
println(cls.fullName) //__wrapper$1$d1de39015284494799acd2875643f78e.Foo

Storing an object to a file

I want to save an object (an instance of a class) to a file. I didn't find any valuable example of it. Do I need to use serialization for it?
How do I do that?
UPDATE:
Here is how I tried to do that
import scala.util.Marshal
import scala.io.Source
import scala.collection.immutable
import java.io._
object Example {
class Foo(val message: String) extends scala.Serializable
val foo = new Foo("qweqwe")
val out = new FileOutputStream("out123.txt")
out.write(Marshal.dump(foo))
out.close
}
First of all, out123.txt contains a lot of extra data and is in the wrong encoding. My gut tells me there should be a more proper way.
At the last Scala Days, Heather introduced a new library that provides a new mechanism for serialization: pickling. I think it would be an idiomatic way to do serialization in Scala, and just what you want.
Check out the paper on this topic, and the slides and talk from Scala Days '13.
It is also possible to serialize to and deserialize from JSON using Jackson.
A nice wrapper that makes it Scala-friendly is Jacks.
JSON has the following advantages:
a simple human readable text
a rather efficient format byte wise
it can be used directly by Javascript
and even be natively stored and queried using a DB like Mongo DB
(Edit) Example Usage
Serializing to JSON:
val json = JacksMapper.writeValueAsString[MyClass](instance)
... and deserializing
val obj = JacksMapper.readValue[MyClass](json)
Take a look at Twitter Chill to handle your serialization: https://github.com/twitter/chill. It's a Scala helper for the Kryo serialization library. The documentation and examples on the GitHub page look sufficient for your needs.
I'm adding my answer here for the convenience of anyone in the same situation.
The pickling library mentioned by #4lex1v only supports Scala 2.10/2.11, but I'm using Scala 2.12, so I can't use it in my project.
Then I found BooPickle. It supports Scala 2.11 as well as 2.12!
Here's the example:
import boopickle.Default._
val data = Seq("Hello", "World!")
val buf = Pickle.intoBytes(data)
val helloWorld = Unpickle[Seq[String]].fromBytes(buf)
For more details, please check here.
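If no extra library is wanted, plain Java serialization from java.io is another option. It produces a binary format, so the file will not be readable as text. The following is a minimal sketch; the object, class, and method names are hypothetical.

```scala
import java.io.{FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical names, for illustration only
object FileStoreSketch {
  // Case classes are serializable by default
  case class Foo(message: String)

  // Write any serializable object to the given path in Java's binary format
  def save(obj: java.io.Serializable, path: String): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(path))
    try out.writeObject(obj) finally out.close()
  }

  // Read the object back; the caller states the expected type
  def load[A](path: String): A = {
    val in = new ObjectInputStream(new FileInputStream(path))
    try in.readObject().asInstanceOf[A] finally in.close()
  }
}
```

Usage would look like FileStoreSketch.save(FileStoreSketch.Foo("qweqwe"), "out123.bin") followed by FileStoreSketch.load[FileStoreSketch.Foo]("out123.bin"). The cast in load is unchecked, so a wrong type parameter only fails when the value is used.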

how to read properties file in scala

I am new to Scala programming and I wanted to read a properties file in Scala.
I can't find any APIs to read a property file in Scala.
Please let me know if there are any API for this or other way to read properties files in Scala.
Besides the Java API, there is a library by Typesafe called config with a good API for working with configuration files of different types.
You will have to do it in a similar way to converting between a Scala Map and a java.util.Map. java.util.Properties extends java.util.Hashtable, which extends java.util.Dictionary.
scala.collection.JavaConverters has functions to convert back and forth between a Dictionary and a Scala mutable.Map:
val x = new Properties
//load from .properties file here.
import scala.collection.JavaConverters._
scala> x.asScala
res4: scala.collection.mutable.Map[String,String] = Map()
You can then use the Map above to look up and update values. But if you wish to convert it back to the Properties type (to store it back, etc.), you might have to cast it manually.
You can just use the Java API.
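A minimal sketch of that approach with java.util.Properties follows. The object and method names are hypothetical; only Properties.load and Properties.getProperty come from the Java API.

```scala
import java.io.Reader
import java.util.Properties

// Hypothetical helper names, for illustration only
object PropertiesSketch {
  // Load key=value pairs from any Reader (e.g. a java.io.FileReader) and close it
  def load(reader: Reader): Properties = {
    val props = new Properties()
    try props.load(reader) finally reader.close()
    props
  }

  // getProperty returns null for a missing key, so wrap the result in Option
  def get(props: Properties, key: String): Option[String] =
    Option(props.getProperty(key))
}
```

For a file on disk this would be called as PropertiesSketch.load(new java.io.FileReader("app.properties")), where the file name is an assumption.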
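Consider something along these lines:
import scala.io.Source
// fileName is assumed to be defined elsewhere
def getPropertyX: Option[String] = {
  val source = Source.fromFile(fileName)
  try source.getLines().find(_.startsWith("propertyX=")).map(_.stripPrefix("propertyX="))
  finally source.close() // release the file handle
}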

How to save and reuse Scala utility code

What is a nice approach for collecting and using useful Scala utility functions across projects? The focus here is on really simple, standalone functions like:
def toBinary(i: Int, digits: Int = 8) =
String.format("%" + digits + "s", i.toBinaryString).replace(' ', '0')
def concat(ss: String*) = ss filter (_.nonEmpty) mkString ", "
concat: (ss: String*)String
This question is basic, I know ;-) but I've learned that there is always an optimal way to do something. For example, reusing code from within the Scala interactive shell, IDEA, or Eclipse, with or without SBT, having the library hosted on GitHub, etc., could quickly lead to both optimal and non-optimal approaches to such a simple problem.
You might want to put such methods in a package object.
You could also put them in a normal object and import everything in the object when you need those methods.
object Utilities {
def toBinary(i: Int, digits: Int = 8) = // ...
}
// Import everything in the Utilities object
import Utilities._
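As a fuller sketch of the object approach, here are the two functions from the question placed in a holder object, with explicit return types added. The object name is hypothetical; the same bodies could equally live in a package object.

```scala
// Hypothetical holder object for the question's two helpers
object UtilitiesSketch {
  // Left-pad the binary representation of i with zeros to `digits` characters
  def toBinary(i: Int, digits: Int = 8): String =
    String.format("%" + digits + "s", i.toBinaryString).replace(' ', '0')

  // Join the non-empty strings with ", "
  def concat(ss: String*): String =
    ss.filter(_.nonEmpty).mkString(", ")
}

object UtilitiesDemo {
  import UtilitiesSketch._

  val padded: String = toBinary(5)            // "00000101"
  val joined: String = concat("a", "", "b")   // "a, b"
}
```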
If you want it trivially accessible from everywhere, your best bet is actually to stick it inside the scala-library jar. Personally, I have all my custom stuff in /jvm/S.jar (or something like that) and add it to the classpath every time I need it.
Repacking the Scala library jar is really easy--unpack it, move your class hierarchy in, and pack it up again. (You should have it inside some package and/or package object.)

dynamically create class in scala, should I use interpreter?

I want to create a class at run time in Scala. For now, consider a simple case where I want to make the equivalent of a Java bean with some attributes that I only know at run time.
How can I create the Scala class? I am willing to create it from a Scala source file if there is a way to compile and load it at run time; I may want to, as I sometimes have a complex function I want to add to the class. How can I do it?
I worry that the Scala interpreter I read about sandboxes the interpreted code it loads, so that the code won't be available to the application hosting the interpreter. If that is the case, I wouldn't be able to use the dynamically loaded Scala class.
Anyway, the question is: how can I dynamically create a Scala class at run time and use it in my application? The best case is to load it from a Scala source file at run time, something like interpreterSource("file.scala"), so that it's loaded into my current runtime. The second-best case is to create it at run time by calling methods, i.e. createClass(...).
Thanks, Phil
There's not enough information to know the best answer, but do remember that you're running on the JVM, so any techniques or bytecode engineering libraries valid for Java should also be valid here.
There are hundreds of techniques you might use, but the best choice depends totally on your exact use case, as many aren't general purpose. Here's a couple of ideas though:
For a simple bean, you may as well just use a map, or look into the DynaBean class from Apache Commons.
For more advanced behaviour you could invoke the compiler explicitly and then grab the resulting .class file via a classloader (this is largely how JSPs do it).
A parser and custom DSL fit well in some cases, as does BeanShell scripting.
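The first idea, a map standing in for a bean whose attributes are only known at run time, could be sketched as follows. The class and method names are hypothetical and unrelated to DynaBean's actual API.

```scala
// Hypothetical map-backed "bean"; attribute names are only known at run time
object DynamicBeanSketch {
  final class MapBean(initial: Map[String, Any]) {
    private var attrs: Map[String, Any] = initial

    // Look up an attribute; the cast is unchecked, so the caller must know the type
    def get[A](name: String): Option[A] = attrs.get(name).map(_.asInstanceOf[A])

    // Add or overwrite an attribute
    def set(name: String, value: Any): Unit = attrs = attrs.updated(name, value)

    // Names of all attributes currently set
    def names: Set[String] = attrs.keySet
  }
}
```

This trades compile-time type safety for flexibility, which is why the compiler-based options above may be preferable once real methods are needed on the generated class.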
Check out the ScalaDays video here: http://days2010.scala-lang.org/node/138/146
which demonstrates the use of Scala as a JSR-223 compliant scripting language.
This should cover most scenarios where you'd want to evaluate Scala at runtime.
You'll also want to look at the email thread here: http://scala-programming-language.1934581.n4.nabble.com/Compiler-API-td1992165.html#a1992165
This contains the following sample code:
// We currently call the compiler directly
// To reduce coupling, we could instead use ant and the scalac ant task
import scala.tools.nsc.{Global, Settings}
import scala.tools.nsc.reporters.ConsoleReporter
{
// called in the event of a compilation error
def error(message: String): Nothing = ...
val settings = new Settings(error)
settings.outdir.value = classesDir.getPath
settings.deprecation.value = true // enable detailed deprecation warnings
settings.unchecked.value = true // enable detailed unchecked warnings
val reporter = new ConsoleReporter(settings)
val compiler = new Global(settings, reporter)
(new compiler.Run).compile(filenames)
reporter.printSummary
if (reporter.hasErrors || reporter.WARNING.count > 0)
{
...
}
}
val mainMethod: Method = {
val urls = Array[URL]( classesDir.toURL )
val loader = new URLClassLoader(urls)
try {
val clazz: Class = loader.loadClass(...)
val method: Method = clazz.getMethod("main", Array[Class]( classOf[Array[String]] ))
if (Modifier.isStatic(method.getModifiers)) {
method
} else {
...
}
} catch {
case cnf: ClassNotFoundException => ...
case nsm: NoSuchMethodException => ...
}
}
mainMethod.invoke(null, Array[Object]( args ))