How to use Array in JCommander in Scala

I want to use JCommander to parse args.
I wrote some code:
import com.beust.jcommander.{JCommander, Parameter}
import scala.collection.mutable.ArrayBuffer

object Config {
  @Parameter(names = Array("--categories"), required = true)
  var categories = new ArrayBuffer[String]
}

object Main {
  def main(args: Array[String]): Unit = {
    val cfg = Config
    JCommander
      .newBuilder()
      .addObject(cfg)
      .build()
      .parse(args.toArray: _*)
    println(cfg.categories)
  }
}
However it fails with:
com.beust.jcommander.ParameterException: Could not invoke null
Reason: Can not set static scala.collection.mutable.ArrayBuffer field InterestRulesConfig$.categories to java.lang.String
What am I doing wrong?

JCommander uses Java reflection and the Java types of the annotated fields to map values onto parameters, and Java has no type scala.collection.mutable.ArrayBuffer; for a multi-valued option it expects a java.util.List. If you want to use JCommander, you have to stick to Java's built-in types.
If you want to use Scala's types, use one of the Scala libraries that handle this in a more idiomatic manner, such as scopt or decline (a rough scopt sketch follows the working example below).

Working example
import java.util
import com.beust.jcommander.{JCommander, Parameter}
import scala.jdk.CollectionConverters._

object Config {
  @Parameter(names = Array("--categories"), required = true)
  var categories: java.util.List[Integer] = new util.ArrayList[Integer]()
}

object Hello {
  def main(args: Array[String]): Unit = {
    val cfg = Config
    JCommander
      .newBuilder()
      .addObject(cfg)
      .build()
      .parse(args.toArray: _*)
    println(cfg.categories)
    println(cfg.categories.getClass())
    val a = cfg.categories.asScala
    for (x <- a) {
      println(x.toInt)
      println(x.toInt.getClass())
    }
  }
}
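For comparison, here is a minimal sketch of the same required --categories option using scopt 4, which keeps everything in plain Scala collections. The CliConfig case class and the program name are only illustrative, and it assumes scopt 4.x is on the classpath:

import scopt.OParser

case class CliConfig(categories: Seq[String] = Seq.empty)

object ScoptMain {
  def main(args: Array[String]): Unit = {
    val builder = OParser.builder[CliConfig]
    val parser = {
      import builder._
      OParser.sequence(
        programName("my-app"), // illustrative name
        opt[Seq[String]]("categories")
          .required()
          .action((xs, c) => c.copy(categories = xs))
      )
    }
    OParser.parse(parser, args, CliConfig()) match {
      case Some(cfg) => println(cfg.categories) // an ordinary Seq[String], no Java conversion needed
      case None      => // scopt has already printed the error/usage message
    }
  }
}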

Related

Registering UDFs dynamically using Scala reflect not working

I am registering my UDFs dynamically using Scala reflection as shown below, and this code works fine. However, when I try to list the Spark functions using spark.catalog, I don't see my UDF. Can you please help me understand what I am missing here:
spark.catalog.listFunctions().foreach {
  fun =>
    if (fun.name == "ModelIdToModelYear") {
      println(fun.name)
    }
}

def registerUDF(spark: SparkSession): Unit = {
  val runtimeMirror = scala.reflect.runtime.universe.runtimeMirror(getClass.getClassLoader)
  val moduleSymbol = runtimeMirror.moduleSymbol(Class.forName("com.local.practice.udf.UdfModelIdToModelYear"))
  val targetMethod = moduleSymbol.typeSignature.members.filter {
    x => x.isMethod && x.name.toString == "ModelIdToModelYear"
  }.head.asMethod
  val function = runtimeMirror.reflect(runtimeMirror.reflectModule(moduleSymbol).instance).reflectMethod(targetMethod)
  function(spark.udf)
}
Below is my UDF definition
package com.local.practice.udf

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf

//noinspection ScalaStyle
object UdfModelIdToModelYear {
  val ModelIdToModelYear: UserDefinedFunction = udf((model_id: String) => {
    val numPattern = "(\\d{2})_.+".r
    numPattern.findFirstIn(model_id).getOrElse("0").toInt
  })
}

json4s, how to deserialize json with FullTypeHints w/o explicitly setting TypeHints

I do specify FullTypeHints before serialization:
def serialize(definition: Definition): String = {
  val hints = definition.tasks.map(_.getClass).groupBy(_.getName).values.map(_.head).toList
  implicit val formats = Serialization.formats(FullTypeHints(hints))
  writePretty(definition)
}
It produces json with type hints, great!
{
  "name": "My definition",
  "tasks": [
    {
      "jsonClass": "com.soft.RootTask",
      "name": "Root"
    }
  ]
}
Deserialization doesn't work, though; it ignores the "jsonClass" field that carries the type hint:
def deserialize(jsonString: String): Definition = {
  implicit val formats = DefaultFormats.withTypeHintFieldName("jsonClass")
  read[Definition](jsonString)
}
Why should I have to repeat the type hints via Serialization.formats(FullTypeHints(hints)) for deserialization if the hints are already in the JSON string?
Can json4s infer them from the JSON?
The deserialiser is not ignoring the type hint field name, it just does not have anything to map it with. This is where the hints come in. Thus, you have to declare and assign your hints list object once again and pass it to the DefaultFormats object either by using the withHints method or by overriding the value when creating a new instance of DefaultFormats. Here's an example using the latter approach.
val hints = definition.tasks.map(_.getClass).groupBy(_.getName).values.map(_.head).toList
implicit val formats: Formats = new DefaultFormats {
  outer =>
  override val typeHintFieldName = "jsonClass"
  override val typeHints: TypeHints = FullTypeHints(hints)
}
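With those formats in implicit scope, the read call from the question stays the same; a minimal sketch of the call site, assuming the same Definition type and json4s read as above:

def deserialize(jsonString: String): Definition = {
  // the `formats` value defined above must be implicitly visible here
  read[Definition](jsonString)
}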
I did it this way since I have a contract:
the type hint field name is known in advance
the type hint field contains a fully qualified class name, and it's always a case class
def deserialize(jsonString: String): Definition = {
  import org.json4s._
  import org.json4s.native.JsonMethods._
  import org.json4s.JsonDSL._

  val json = parse(jsonString)
  val classNames: List[String] = (json \\ $$definitionTypes$$ \\ classOf[JString])
  val hints: List[Class[_]] = classNames.map(clz => Try(Class.forName(clz)).getOrElse(throw new RuntimeException(s"Can't get class for $clz")))
  implicit val formats = Serialization.formats(FullTypeHints(hints)).withTypeHintFieldName($$definitionTypes$$)
  read[Definition](jsonString)
}

udf No TypeTag available for type string

I don't understand a behavior of Spark.
I create a UDF which returns an Integer, like below:
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object Show {
  def main(args: Array[String]): Unit = {
    val (sc, sqlContext) = iniSparkConf("test")
    val testInt_udf = sqlContext.udf.register("testInt_udf", testInt _)
  }

  def iniSparkConf(appName: String): (SparkContext, SQLContext) = {
    val conf = new SparkConf().setAppName(appName) //.setExecutorEnv("spark.ui.port", "4046")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")
    val sqlContext = new SQLContext(sc)
    (sc, sqlContext)
  }

  def testInt(): Int = {
    return 2
  }
}
It works perfectly, but if I change the return type of the method from Int to String
val testString_udf = sqlContext.udf.register("testString_udf", testString _)

def testString(): String = {
  return "myString"
}
I get the following error
Error:(34, 43) No TypeTag available for String
val testString_udf = sqlContext.udf.register("testString_udf", testString _)
Error:(34, 43) not enough arguments for method register: (implicit evidence$1: reflect.runtime.universe.TypeTag[String])org.apache.spark.sql.UserDefinedFunction.
Unspecified value parameter evidence$1.
val testString_udf = sqlContext.udf.register("testString_udf", testString _)
here are my embedded jars:
datanucleus-api-jdo-3.2.6
datanucleus-core-3.2.10
datanucleus-rdbms-3.2.9
spark-1.6.1-yarn-shuffle
spark-assembly-1.6.1-hadoop2.6.0
spark-examples-1.6.1-hadoop2.6.0
I am a little bit lost... Do you have any idea?
Since I can't reproduce the issue by copy-pasting just your example code into a new file, I bet that in your real code String is actually shadowed by something else. To verify this theory you can try to change your signature to
def testString(): scala.Predef.String = {
  return "myString"
}
or
def testString(): java.lang.String = {
  return "myString"
}
If this one compiles, search for "String" to see where you shadowed the standard type. If you use IntelliJ IDEA, you can use "Ctrl+B" (Go To Declaration) to find it. The most obvious candidate is that you used String as the name of a generic type parameter, but there are other possibilities.
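For illustration only, here is a hypothetical shape such shadowing could take; the Wrapper class and its type parameter name are made up:

// Inside Wrapper, `String` refers to the type parameter, not scala.Predef.String,
// so no TypeTag for it can be found at a udf.register call site.
class Wrapper[String] {
  def testString(): String = ???
  // sqlContext.udf.register("testString_udf", testString _) // would fail: No TypeTag available for String
}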

How do I turn a Scala case class into a mongo Document

I'd like to build a generic method for transforming Scala Case Classes to Mongo Documents.
A promising Document constructor is
fromSeq(ts: Seq[(String, BsonValue)]): Document
I can turn a case class into a Map[String, Any], but then I've lost the type information I need to use the implicit conversions to BsonValues. Maybe TypeTags can help with this?
Here's what I've tried:
import org.mongodb.scala.bson.BsonTransformer
import org.mongodb.scala.bson.collection.immutable.Document
import org.mongodb.scala.bson.BsonValue

case class Person(age: Int, name: String)

// transform Scala values into BsonValues
def transform[T](v: T)(implicit transformer: BsonTransformer[T]): BsonValue = transformer(v)

// turn any case class into a Map[String, Any]
def caseClassToMap(cc: Product) = {
  val values = cc.productIterator
  cc.getClass.getDeclaredFields.map(_.getName -> values.next).toMap
}

// transform a Person into a Document
def personToDocument(person: Person): Document = {
  val map = caseClassToMap(person)
  val bsonValues = map.toSeq.map { case (key, value) =>
    (key, transform(value))
  }
  Document.fromSeq(bsonValues)
}
<console>:24: error: No bson implicit transformer found for type Any. Implement or import an implicit BsonTransformer for this type.
(key, transform(value))
def personToDocument(person: Person): Document = {
  Document("age" -> person.age, "name" -> person.name)
}
The code below works without manual conversion of the object.
import reactivemongo.api.bson.{BSON, BSONDocument, Macros}

case class Person(name: String = "SomeName", age: Int = 20)

implicit val personHandler = Macros.handler[Person]
val bsonPerson = BSON.writeDocument[Person](Person())
println(s"${BSONDocument.pretty(bsonPerson.getOrElse(BSONDocument.empty))}")
You can use Salat https://github.com/salat/salat. A nice example can be found here - https://gist.github.com/bhameyie/8276017. This is the piece of code that will help you -
import salat._
val dBObject = grater[Artist].asDBObject(artist)
artistsCollection.save(dBObject, WriteConcern.Safe)
I was able to serialize a case class to a BsonDocument using the org.bson.BsonDocumentWriter. The code below runs with Scala 2.12 and mongo-scala-driver_2.12 version 2.6.0.
My quest for this solution was aided by this answer (where they are trying to serialize in the opposite direction): Serialize to object using scala mongo driver?
import org.mongodb.scala.bson.codecs.Macros
import org.mongodb.scala.bson.codecs.DEFAULT_CODEC_REGISTRY
import org.bson.codecs.configuration.CodecRegistries.{fromRegistries, fromProviders}
import org.bson.codecs.EncoderContext
import org.bson.BsonDocumentWriter
import org.mongodb.scala.bson.BsonDocument
import org.bson.codecs.configuration.CodecRegistry
import org.bson.codecs.Codec

case class Animal(name: String, species: String, genus: String, weight: Int)

object TempApp {
  def main(args: Array[String]) {
    val jaguar = Animal("Jenny", "Jaguar", "Panthera", 190)

    val codecProvider = Macros.createCodecProvider[Animal]()
    val codecRegistry: CodecRegistry = fromRegistries(fromProviders(codecProvider), DEFAULT_CODEC_REGISTRY)
    val codec = Macros.createCodec[Animal](codecRegistry)
    val encoderContext = EncoderContext.builder.isEncodingCollectibleDocument(true).build()

    var doc = BsonDocument()
    val writr = new BsonDocumentWriter(doc) // need to call new since it's a Java lib without a companion object
    codec.encode(writr, jaguar, encoderContext)
    print(doc)
  }
}

Casbah: No implicit view available error

In a Play app, using Salat and Casbah, I am trying to de-serialize a DBObject into an object of type Task, but I am getting this error when calling .asObject:
No implicit view available from com.mongodb.casbah.Imports.DBObject =>
com.mongodb.casbah.Imports.MongoDBObject. Error occurred in an
application involving default arguments.
The object is serialized correctly with .asDBObject, and written to the database as expected.
What is causing this behaviour, and what can be done to solve it? Here's the model involved:
package models

import db.{MongoFactory, MongoConnection}
import com.novus.salat._
import com.novus.salat.global._
import com.novus.salat.annotations._
import com.mongodb.casbah.Imports._
import com.mongodb.casbah.commons.Imports._
import play.api.Play

case class Task(label: String, _id: ObjectId = new ObjectId)

object Task {
  implicit val ctx = new Context {
    val name = "Custom_Classloader"
  }
  ctx.registerClassLoader(Play.classloader(Play.current))

  val taskCollection = MongoFactory.database("tasks")

  def create(label: String): Task = {
    val task = new Task(label)
    val dbObject = grater[Task].asDBObject(task)
    taskCollection.save(dbObject)
    grater[Task].asObject(dbObject)
  }

  def all(): List[Task] = {
    val results = taskCollection.find()
    val tasks = for (item <- results) yield grater[Task].asObject(item)
    tasks.toList
  }
}
Versions
casbah: "2.8.1"
scala: "2.11.6"
salat: "1.9.9"
Instructions on creating a custom context:
First, define a custom context as
implicit val ctx = new Context { /* custom behaviour */ }
in a package object
Stop importing com.novus.salat.global._
Import your own custom context everywhere instead (a short sketch of these steps follows the source link below).
Source: https://github.com/novus/salat/wiki/CustomContext
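Putting those steps together with the context already shown in the question's Task model, a rough sketch of the package object might look like this; the models package name is simply the one used above:

package object models {
  import com.novus.salat.Context
  import play.api.Play

  // Custom Salat context registered with Play's classloader,
  // used instead of importing com.novus.salat.global._
  implicit val ctx = new Context {
    val name = "Custom_Classloader"
  }
  ctx.registerClassLoader(Play.classloader(Play.current))
}

The Task object can then drop its locally defined ctx and stop importing com.novus.salat.global._; code inside package models picks up the implicit context from the package object, so every grater call uses the same custom context.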