I am not able to understand how contramap works when writing a case class with a single field as JSON.
Say I have a case class and I want to create its JSON:
case class SomeThingContainer(something: Something)
I would write its Writes as follows:
implicit val somethingContainerWrites: Writes[SomeThingContainer] =
  (JsPath \ "something").write[Something]
    .contramap((somethingContainer: SomeThingContainer) => somethingContainer.something)
If I have a model as follows:
val somethingContainerVariable = SomeThingContainer(something)
Somewhere in the application I would call toJson on this model. That looks up an implicit Writes[SomeThingContainer], which resolves to somethingContainerWrites. From there, how is the JSON getting created? I can somewhat understand how unapply _ works when there are multiple fields, but I am not able to comprehend how contramap does its magic.
contramap is used for composing a function with a (contravariant) functor.
A Writes[X] is a (contravariant) functor X => JsValue: it knows how to create a JsValue from a given X.
Apparently, you already have defined a Writes[Something] somewhere (since you're calling JsPath.write[Something] which implicitly requires it).
Inside the parens, you create a new Writes[Something] which just uses the former to write a Something to a different path in a JSON object.
Now, to be able to create a JsValue from a SomeThingContainer all you need to do is to convert the SomeThingContainer to a Something -- since you already have a Writes[Something] at hand -- and use that.
This is what the contramap call does: from the Writes[Something] you have defined it creates a new Writes[SomeThingContainer] which, when given a SomeThingContainer, first calls the given "conversion function" producing a Something. Then, it calls the Writes[Something] with that.
To illustrate what contramap does:
Writes[X]: Functor[X => JsValue]
C: Y => X
Writes[X].contramap(C) <==> Writes[Y]: Functor[Y => X => JsValue]
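To make that concrete, here is a small self-contained sketch. The Something case class and its Writes are assumptions of mine, since the question only shows the container:

import play.api.libs.json._

case class Something(value: String)
case class SomeThingContainer(something: Something)

// assumed to exist already somewhere in the application
implicit val somethingWrites: Writes[Something] =
  (JsPath \ "value").write[String].contramap((s: Something) => s.value)

// contramap prepends the "unwrap" step SomeThingContainer => Something
// to the existing Writes[Something], yielding a Writes[SomeThingContainer]
implicit val somethingContainerWrites: Writes[SomeThingContainer] =
  (JsPath \ "something").write[Something]
    .contramap((c: SomeThingContainer) => c.something)

Json.toJson(SomeThingContainer(Something("hi")))
// {"something":{"value":"hi"}}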
I need to be able to register a UDF from a string which I get from a web service, i.e. at run time I call a web service to get the Scala code which constitutes the UDF, compile it and register it as a UDF in the Spark context. As an example, let's say my web service returns the following Scala code in a JSON response:
(row: Row, field: String) => {
  import scala.util.{Try, Success, Failure}
  val index: Int = Try(row.fieldIndex(field)) match {
    case Success(_) => 1
    case Failure(_) => 0
  }
  index
}
I want to compile this code on the fly and then register it as a UDF. I have already explored multiple options, such as using ToolBox, Twitter's Eval util, etc., but found that I need to explicitly specify the argument types of the function while creating an instance, for example:
import scala.reflect.runtime.currentMirror
import scala.reflect.runtime.universe._
import scala.tools.reflect.ToolBox

val toolBox = currentMirror.mkToolBox()

val code =
  q"""
  (a: String, b: String) => {
    a + b
  }
  """
val compiledCode = toolBox.compile(code)
val compiledFunc = compiledCode().asInstanceOf[(String, String) => Option[Any]]
This UDF takes two strings as arguments, hence I need to specify the types while casting the compiled object, like:
compiledCode().asInstanceOf[(String, String) => Option[Any]]
The other option I explored is
https://stackoverflow.com/a/34371343/1218856
In both cases I have to know the number of arguments, the argument types and the return type beforehand to instantiate the code as a function. But in my case, as the UDFs are created by my users, I have no control over the number of arguments and their types, so I would like to know if there is any way I can register the UDF by compiling the Scala code without knowing the argument number and type information.
In a nutshell: I get the code as a string, compile it, and register it as a UDF without knowing the type information.
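For context, registering a UDF with a statically known signature looks roughly like this (a sketch assuming Spark 2.x; the function and names are purely illustrative). The register call needs the argument and return types at compile time, which is exactly what is missing here:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("udf-from-string").getOrCreate()

// works only because (String, String) => String is known statically
val concat2: (String, String) => String = (a, b) => a + b
spark.udf.register("concat2", concat2)

spark.sql("SELECT concat2('a', 'b')").show()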
I think you'd be much better off not trying to generate/execute code directly but defining a different kind of expression language and executing that. Something like ANTLR could help you with writing the grammar of that expression language and generating the parser and the abstract syntax trees, or even Scala's parser combinators. It's of course more work, but also a far less risky and error-prone way of allowing custom function execution.
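As a rough illustration of the parser-combinator route, here is a tiny arithmetic expression language. This is only a sketch: it assumes the scala-parser-combinators module is available, and the grammar is purely illustrative, not the expression language you would actually design:

import scala.util.parsing.combinator.JavaTokenParsers

object ExprParser extends JavaTokenParsers {
  // expr handles + and -, term handles * and /, factor handles numbers and parentheses
  def expr: Parser[Double] = term ~ rep(("+" | "-") ~ term) ^^ {
    case first ~ ops => ops.foldLeft(first) {
      case (acc, "+" ~ rhs) => acc + rhs
      case (acc, "-" ~ rhs) => acc - rhs
    }
  }

  def term: Parser[Double] = factor ~ rep(("*" | "/") ~ factor) ^^ {
    case first ~ ops => ops.foldLeft(first) {
      case (acc, "*" ~ rhs) => acc * rhs
      case (acc, "/" ~ rhs) => acc / rhs
    }
  }

  def factor: Parser[Double] =
    floatingPointNumber ^^ (_.toDouble) | "(" ~> expr <~ ")"

  def eval(input: String): Double = parseAll(expr, input).get
}

// ExprParser.eval("1 + 2 * 3")  // 7.0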
Does Scala have a way to get the contained class(es) of a collection? i.e. if I have:
val foo = Vector[Int]()
Is there a way to get back classOf[Int] from it?
(Just checking the first element doesn't work since it might be empty.)
You can use TypeTag:
import scala.reflect.runtime.universe._
def getType[F[_], A: TypeTag](as: F[A]) = typeOf[A]
val foo = Vector[Int]()
getType(foo)
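// res: reflect.runtime.universe.Type = Int  (works even though foo is empty)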
Not from the collection itself, but if you receive it as a parameter to a method, you can add an implicit TypeTag to that method to obtain the type at runtime, e.g.
def mymethod[T](x: Vector[T])(implicit tag: TypeTag[T]) = ...
See https://docs.scala-lang.org/.../typetags-manifests.html for details.
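A minimal sketch of that idea; the body here simply returns the runtime Type carried by the tag (mymethod is the name used above):

import scala.reflect.runtime.universe._

def mymethod[T](x: Vector[T])(implicit tag: TypeTag[T]): Type = tag.tpe

mymethod(Vector[Int]())     // Int
mymethod(Vector("a", "b"))  // String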
Technically you can do it by using TypeTag, or Typeable/TypeCase from the Shapeless library (see link). But I just want to note that all these tricks are really advanced solutions, for when there is no better way to get the task done without digging into the type parameters.
All type parameters in Scala and Java are affected by type erasure at runtime, and if you catch yourself thinking about extracting this information from the class, it might be a good sign that you should redesign the solution you are trying to implement.
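For completeness, a small sketch of the Typeable/TypeCase approach mentioned above, shown with List as in the Shapeless documentation. Note the caveat that an empty collection conforms to any element type, so it cannot distinguish an empty List[Int] from an empty List[String]:

import shapeless.TypeCase

val `List[Int]`    = TypeCase[List[Int]]
val `List[String]` = TypeCase[List[String]]

def describe(value: Any): String = value match {
  case `List[Int]`(ints)    => s"ints: $ints"
  case `List[String]`(strs) => s"strings: $strs"
  case _                    => "something else"
}

describe(List(1, 2, 3))       // "ints: List(1, 2, 3)"
describe(List("a", "b"))      // "strings: List(a, b)"
describe(List.empty[String])  // matches the first case: "ints: List()"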
I'm working on writing some unit tests for my Scala Spark application.
In order to do so I need to create different DataFrames in my tests, so I wrote a very short DFsBuilder class that basically allows me to add new rows and eventually create the DataFrame. The code is:
class DFsBuilder[T](private val sqlContext: SQLContext, private val columnNames: Array[String]) {
  var rows = new ListBuffer[T]()

  def add(row: T): DFsBuilder[T] = {
    rows += row
    this
  }

  def build(): DataFrame = {
    import sqlContext.implicits._
    rows.toList.toDF(columnNames: _*) // UPDATE: added :_* because it was accidentally removed in the original question
  }
}
However, the toDF call doesn't compile, with a "cannot resolve symbol toDF" error.
I wrote this builder code with generics since I need to create different kinds of DataFrames (different numbers of columns and different column types). The way I would like to use it is to define some case class in the unit test and use it with the builder.
I know this issue somehow relates to the fact that I'm using generics (probably some kind of type erasure issue), but I can't quite put my finger on what the problem is exactly.
And so my questions are:
Can anyone show me where the problem is, and also hopefully how to fix it?
If this issue cannot be solved this way, could someone perhaps offer another elegant way to create dataframes? (I prefer not to pollute my unit tests with the creation code)
I obviously googled this issue first but only found examples where people forgot to import sqlContext.implicits._ or something about a case class being out of scope, which is probably not the same issue as I'm having.
Thanks in advance
If you look at the signatures of toDF and of SQLImplicits.localSeqToDataFrameHolder (which is the implicit function used) you'll be able to detect two issues:
Type T must be a subclass of Product (the parent trait of all case classes and tuples), and you must provide an implicit TypeTag for it. To fix this, change the declaration of your class to:
class DFsBuilder[T <: Product : TypeTag](...) { ... }
The column-names argument of toDF is not of type Array, it's a "repeated parameter" (like Java's varargs; see section 4.6.2 here), so you have to expand the array into individual arguments:
rows.toList.toDF(columnNames: _*)
With these two changes, your code compiles (and works).
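For reference, a minimal sketch with both fixes applied, plus a hypothetical usage from a test. The Person case class and the column names are made up for illustration, and the case class is kept at the top level, which avoids the usual TypeTag issues with locally defined case classes:

import scala.collection.mutable.ListBuffer
import scala.reflect.runtime.universe.TypeTag
import org.apache.spark.sql.{DataFrame, SQLContext}

class DFsBuilder[T <: Product : TypeTag](private val sqlContext: SQLContext,
                                         private val columnNames: Array[String]) {
  private val rows = new ListBuffer[T]()

  def add(row: T): DFsBuilder[T] = {
    rows += row
    this
  }

  def build(): DataFrame = {
    import sqlContext.implicits._
    rows.toList.toDF(columnNames: _*)
  }
}

// hypothetical usage in a test
case class Person(name: String, age: Int)

def makeDf(sqlContext: SQLContext): DataFrame =
  new DFsBuilder[Person](sqlContext, Array("name", "age"))
    .add(Person("alice", 30))
    .add(Person("bob", 40))
    .build()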
I have written a parser which transforms a String to a Seq[String] following some rules. This will be used in a library.
I am trying to transform this Seq[String] to a case class. The case class would be provided by the user (so there is no way to guess what it will be).
I have thought of the Shapeless library because it seems to offer the right features and it seems mature, but I have no idea how to proceed.
I have found this question with an interesting answer, but I can't see how to adapt it to my needs. Indeed, in that answer there is only one type to parse (String), and the library iterates inside the String itself. It probably requires a deep change in the way things are done, and I have no clue how.
Moreover, if possible, I would like to make this process as easy as possible for the user of my library. So, if possible, unlike the answer in the link above, the HList type would be guessed from the case class itself (however, according to my research, it seems the compiler needs this information).
I am a bit new to the type system and all these beautiful things; if anyone is able to give me advice on how to do this, I would be very happy!
Kind Regards
--- EDIT ---
As ziggystar requested, here is a sketch of the intended usage and signatures:
// Let's say we are just parsing a CSV.

// on the user side
case class UserClass(i: Int, j: Int, s: String)

val list = Seq("1,2,toto", "3,4,titi")

// The user turns his case class into a function with something like:
val f = (UserClass.apply _).curried

// The function created above is injected into the parser
val parser = new Parser(f)

// The Strings to convert to case classes are provided as an argument to the parse() method.
val finalResult: Seq[UserClass] = parser.parse(list)

// The transformation is done in three steps inside the parse() method:
// 1/ first we have: val list = Seq("1,2,toto", "3,4,titi")
// 2/ then a call to internalParserImplementedSomewhereElse(list);
//    parseResult is now equal to Seq(Seq("1", "2", "toto"), Seq("3", "4", "titi"))
// 3/ finally Shapeless does its magic trick and we have Seq(UserClass(1,2,"toto"), UserClass(3,4,"titi"))
// inside the library
class Parser[A](function: A) {

  // The internal parser takes each String provided through the argument of the method and
  // transforms it into a Seq[String], so the Seq[String] provided becomes a Seq[Seq[String]].
  private def internalParserImplementedSomewhereElse(l: Seq[String]): Seq[Seq[String]] = {
    ...
  }

  /*
   * Types A and B are both related to the case class provided by the user:
   * - A is the type of the case class viewed as a function,
   * - B is the type of the original case class (can be guessed from type A).
   */
  private def convert2CaseClass[B](list: Seq[String]): B = {
    // do something with Shapeless
    // I don't know what to put inside ???
  }

  def parse(l: Seq[String]) = {
    val parseResult: Seq[Seq[String]] = internalParserImplementedSomewhereElse(l)
    val finalResult = parseResult.map(convert2CaseClass)
    finalResult // it is a Seq[CaseClassProvidedByUser]
  }
}
Inside the library, some implicits would be available to convert each String to the correct type as guessed by Shapeless (similar to the answer proposed in the link above), like string.toInt, string.toDouble, and so on.
Maybe there are other ways to design it; it's just what I have in mind after playing with Shapeless for a few hours.
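For what it's worth, a rough sketch of how Shapeless's Generic could drive the convert2CaseClass step. CellParser, RowParser and readRow are hypothetical names of mine, only Int and String cells are handled, and error handling is omitted:

import shapeless._

// hypothetical type class: how to parse one CSV cell into an A
trait CellParser[A] { def parse(cell: String): A }
object CellParser {
  implicit val intCell: CellParser[Int]       = new CellParser[Int]    { def parse(cell: String) = cell.toInt }
  implicit val stringCell: CellParser[String] = new CellParser[String] { def parse(cell: String) = cell }
}

// hypothetical type class: how to parse a row of cells into an HList
trait RowParser[L <: HList] { def parse(cells: Seq[String]): L }
object RowParser {
  implicit val hnilParser: RowParser[HNil] =
    new RowParser[HNil] { def parse(cells: Seq[String]) = HNil }

  implicit def hconsParser[H, T <: HList](implicit head: CellParser[H], tail: RowParser[T]): RowParser[H :: T] =
    new RowParser[H :: T] {
      def parse(cells: Seq[String]) = head.parse(cells.head) :: tail.parse(cells.tail)
    }
}

// Generic.Aux[A, L] links the user's case class A to its HList representation L,
// so the caller only has to name the case class
class RowReader[A] {
  def apply[L <: HList](cells: Seq[String])(implicit gen: Generic.Aux[A, L], parser: RowParser[L]): A =
    gen.from(parser.parse(cells))
}
def readRow[A] = new RowReader[A]

// hypothetical usage:
// readRow[UserClass](Seq("1", "2", "toto"))  // UserClass(1, 2, "toto")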
This uses a very simple library called product-collections:
import com.github.marklister.collections.io._
case class UserClass(i:Int, j:Int, s:String)
val csv = Seq("1,2,toto", "3,4,titi").mkString("\n")
csv: String =
1,2,toto
3,4,titi
CsvParser(UserClass).parse(new java.io.StringReader(csv))
res28: Seq[UserClass] = List(UserClass(1,2,toto), UserClass(3,4,titi))
And to serialize the other way:
scala> res28.csvIterator.toList
res30: List[String] = List(1,2,"toto", 3,4,"titi")
product-collections is oriented towards CSV and a java.io.Reader, hence the shims above.
This answer will not tell you how to do exactly what you want, but it will solve your problem. I think you're overcomplicating things.
What is it you want to do? It appears to me that you're simply looking for a way to serialize and deserialize your case classes, i.e. convert your Scala objects to a generic string format and the generic string format back to Scala objects. You seem to have already defined your serialization step, and you're asking how to do the deserialization.
There are a few serialization/deserialization options available for Scala. You do not have to hack away with Shapeless or Scalaz to do it yourself. Try to take a look at these solutions:
Java serialization/deserialization. The regular serialization/deserialization facilities provided by the Java environment. Requires explicit casting and gives you no control over the serialization format, but it's built in and doesn't require much work to implement.
JSON serialization: there are many libraries that provide JSON generation and parsing for Scala. Take a look at play-json, spray-json and Argonaut, for example.
The Scala Pickling library is a more general library for serialization/deserialization. Out of the box it comes with some binary and some JSON format, but you can create your own formats.
Out of these solutions, at least play-json and Scala Pickling use macros to generate serializers and deserializers for you at compile time. That means that they should both be typesafe and performant.
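As a small illustration of the macro-based option, play-json can derive a Format for a case class in one line. A sketch, reusing the UserClass from the question:

import play.api.libs.json._

case class UserClass(i: Int, j: Int, s: String)

// Json.format derives both the serializer (Writes) and the deserializer (Reads) at compile time
implicit val userClassFormat: OFormat[UserClass] = Json.format[UserClass]

val js = Json.toJson(UserClass(1, 2, "toto"))  // {"i":1,"j":2,"s":"toto"}
val back = js.as[UserClass]                    // UserClass(1,2,toto)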