Map[String,Any] to compact json string using json4s - scala

I am currently extracting some metrics from different data sources and storing them in a map of type Map[String,Any] where the key corresponds to the metric name and the value corresponds to the metric value. I need this to be more or less generic, which means that values types can be primitive types or lists of primitive types.
I would like to serialize this map to a JSON-formatted string and for that I am using json4s library. The thing is that it does not seem possible and I don't see a possible solution for that. I would expect something like the following to work out of the box :)
val myMap: Map[String,Any] = ... // extract metrics
val json = myMap.reduceLeft(_ ~ _) // create JSON of metrics
Navigating through source code I've seen json4s provides implicit conversions in order to transform primitive types to JValue's and also to convert Traversable[A]/Map[String,A]/Option[A] to JValue's (under the restriction of being available an implicit conversion from A to JValue, which I understand it actually means A is a primitive type). The ~ operator offers a nice way of constructing JObject's out of JField's, which is just a type alias for (String, JValue).
In this case, map values type is Any, so implicit conversions don't take place and hence the compiler throws the following error:
value ~ is not a member of (String, Any)
[error] val json = r.reduceLeft(_ ~ _)
Is there a solution for what I want to accomplish?

Since you are actually only looking for the JSON string representation of myMap, you can use the Serialization object directly. Here is a small example (if using the native version of json4s change the import to org.json4s.native.Serialization):
EDIT: added formats implicit
import org.json4s.jackson.Serialization
implicit val formats = org.json4s.DefaultFormats
val m: Map[String, Any] = Map(
"name "-> "joe",
"children" -> List(
Map("name" -> "Mary", "age" -> 5),
Map("name" -> "Mazy", "age" -> 3)
)
)
// prints {"name ":"joe","children":[{"name":"Mary","age":5},{"name":"Mazy","age":3}]}
println(Serialization.write(m))

json4s has method for it.
pretty(render(yourMap))

Related

How to call Scala method with Map Mutable signature?

I have a Scala code:
import collection.mutable._
def myMethod(mycollection: Map[A, B]) = {
...
}
How do I call this method?
Tried this:
myMethod(["test1", "test2"])
Got error:
Identifier expected but 'def' found
Thanks.
A Map is a data structure that maps a key (of some type K) to a value (of some type V). In Scala, such a pair can be denoted by the syntax key -> value. If your intent is to have a single String key "test1" that maps to a String value of "test2", then you can do that as follows:
Map("test1" -> "test2")
Your declaration of myMethod is invalid: you need to either define actual types for A and B or make them generic parameters for your method (so that the method is generic):
// With specific types (assuming both keys and values have String types):
def myMethod(mycollection: Map[String, String]) = //...
// In generic form (allows any key type A, or value type B):
def myMethod[A, B](mycollection: Map[A, B]) = //...
Either way, you can then use the result as the argument in a call to your method as follows:
myMethod(Map("test1" -> "test2"))
Some points to note:
Square brackets are used when defining generic type parameters, or specifying the types used as type parameters.
Type parameters can be inferred from the values supplied. For example Map("test1" -> "test2") uses String as the type for both the key and the value, and is equivalent to Map[String, String]("test1" -> "test2").
If you need more than one key/value pair, list them with a comma separator, for example: Map("key1" -> "value1", "key2" -> "value2", "key3" -> "value3")
I strongly recommend that you read a good book on Scala, such as the excellent Programming in Scala, 3rd Edition by Odersky, Spoon & Venners, in order to become familiar with its syntax and standard library.
As a final point, I would strongly recommend that you use the immutable version of Map whenever possible. If you're not familiar with functional programming principles, this will seem unusual at first, but the benefits are huge.

Scala: list to set using flatMap

I have class with field of type Set[String]. Also, I have list of objects of this class. I'd like to collect all strings from all sets of these objects into one set. Here is how I can do it already:
case class MyClass(field: Set[String])
val list = List(
MyClass(Set("123")),
MyClass(Set("456", "798")),
MyClass(Set("123", "798"))
)
list.flatMap(_.field).toSet // Set(123, 456, 798)
It works, but I think, I can achieve the same using only flatMap, without toSet invocation. I tried this, but it had given compilation error:
// error: Cannot construct a collection of type Set[String]
// with elements of type String based on a collection of type List[MyClass].
list.flatMap[String, Set[String]](_.field)
If I change type of list to Set (i.e., val list = Set(...)), then such flatMap invocation works.
So, can I use somehow Set.canBuildFrom or any other CanBuildFrom object to invoke flatMap on List object, so that I'll get Set as a result?
The CanBuildFrom instance you want is called breakOut and has to be provided as a second parameter:
import scala.collection.breakOut
case class MyClass(field: Set[String])
val list = List(
MyClass(Set("123")),
MyClass(Set("456", "798")),
MyClass(Set("123", "798"))
)
val s: Set[String] = list.flatMap(_.field)(breakOut)
Note that explicit type annotation on variable s is mandatory - that's how the type is chosen.
Edit:
If you're using Scalaz or cats, you can use foldMap as well:
import scalaz._, Scalaz._
list.foldMap(_.field)
This does essentially what mdms answer proposes, except the Set.empty and ++ parts are already baked in.
The way flatMap work in Scala is that it can only remove one wrapper for the same type of wrappers i.e. List[List[String]] -> flatMap -> List[String]
if you apply flatMap on different wrapper data types then you will always get the final outcome as higher level wrapper data type i.e.List[Set[String]] -> flatMap -> List[String]
if you want to apply the flatMap on different wrapper type i.e. List[Set[String]] -> flatMap -> Set[String] in you have 2 options :-
Explicitly cast the one datatype wrapper to another i.e. list.flatMap(_.field).toSet or
By providing implicit converter ie. implicit def listToSet(list: List[String]): Set[String] = list.toSet and the you can get val set:Set[String] = list.flatMap(_.field)
only then what you are trying to achieve will be accomplished.
Conclusion:- if you apply flatMap on 2 wrapped data type then you will always get the final result as op type which is on top of wrapper data type i.e. List[Set[String]] -> flatMap -> List[String] and if you want to convert or cast to different datatype then either you need to implicitly or explicitly cast it.
You could maybe provide a specific CanBuildFrom, but why not to use a fold instead?
list.foldLeft(Set.empty[String]){case (set, myClass) => set ++ myClass.field}
Still just one pass through the collection, and if you are sure the list is not empty, you could even user reduceLeft instead.

Spark SQL UDF returning scala immutable Map with df.WithColumn()

I have case class
case class MyCaseClass(City : String, Extras : Map[String, String])
and user defined function which returns scala.collection.immutable.Map
def extrasUdf = spark.udf.register(
"extras_udf",
(age : Int, name : String) => Map("age" -> age.toString, "name" -> name)
)
but this breaks with Exception:
import spark.implicits._
spark.read.options(...).load(...)
.select('City, 'Age, 'Name)
.withColumn("Extras", extrasUdf('Age, 'Name))
.drop('Age)
.drop('Name)
.as[MyCaseClass]
I should use spark sql's MapType(DataTypes.StringType, DataTypes.IntegerType)
but I can't find any working example...
And this works if I use scala.collection.Map but I need immutable Map
There are many problems with your code:
You are using def extrastUdf =, which creates a function for registering a UDF as opposed to actually creating/registering a UDF. Use val extrasUdf = instead.
You are mixing value types in your map (String and Int), which makes the map be Map[String, Any] as Any is the common superclass of String and Int. Spark does not support Any. You can do at least two things: (a) switch to using a string map (with age.toString, in which case you don't need a UDF as you can simply use map()) or (b) switch to using named structs using named_struct() (again, without the need for a UDF). As a rule, only write a UDF if you cannot do what you need to do with the existing functions. I prefer to look at the Hive documentation because the Spark docs are rather sparse.
Also, keep in mind that type specification in Spark schema (e.g., MapType) is completely different from Scala types (e.g., Map[_, _]) and separate from how types are represented internally and mapped between Scala & Spark data structures. In other words, this has nothing to do with mutable vs. immutable collections.
Hope this helps!

Creating a `Decoder` for arbitrary JSON

I am building a GraphQL endpoint for an API using Finch, Circe and Sangria. The variables that come through in a GraphQL query are basically an arbitrary JSON object (let's assume there's no nesting). So for example, in my test code as Strings, here are two examples:
val variables = List(
"{\n \"foo\": 123\n}",
"{\n \"foo\": \"bar\"\n}"
)
The Sangria API expects a type for these of Map[String, Any].
I've tried a bunch of ways but have so far been unable to write a Decoder for this in Circe. Any help appreciated.
The Sangria API expects a type for these of Map[String, Any]
This is not true. Variables for an execution in sangria can be of an arbitrary type T, the only requirement that you have an instance of InputUnmarshaller[T] type class for it. All marshalling integration libraries provide an instance of InputUnmarshaller for correspondent JSON AST type.
This means that sangria-circe defines InputUnmarshaller[io.circe.Json] and you can import it with import sangria.marshalling.circe._.
Here is a small and self-contained example of how you can use circe Json as a variables:
import io.circe.Json
import sangria.schema._
import sangria.execution._
import sangria.macros._
import sangria.marshalling.circe._
val query =
graphql"""
query ($$foo: Int!, $$bar: Int!) {
add(a: $$foo, b: $$bar)
}
"""
val QueryType = ObjectType("Query", fields[Unit, Unit](
Field("add", IntType,
arguments = Argument("a", IntType) :: Argument("b", IntType) :: Nil,
resolve = c ⇒ c.arg[Int]("a") + c.arg[Int]("b"))))
val schema = Schema(QueryType)
val vars = Json.obj(
"foo" → Json.fromInt(123),
"bar" → Json.fromInt(456))
val result: Future[Json] =
Executor.execute(schema, query, variables = vars)
As you can see in this example, I used io.circe.Json as variables for an execution. The execution would produce following result JSON:
{
"data": {
"add": 579
}
}
Here's a decoder that works.
type GraphQLVariables = Map[String, Any]
val graphQlVariablesDecoder: Decoder[GraphQLVariables] = Decoder.instance { c =>
val variablesString = c.downField("variables").focus.flatMap(_.asString)
val parsedVariables = variablesString.flatMap { str =>
val variablesJsonObject = io.circe.jawn.parse(str).toOption.flatMap(_.asObject)
variablesJsonObject.map(j => j.toMap.transform { (_, value: Json) =>
val transformedValue: Any = value.fold(
(),
bool => bool,
number => number.toDouble,
str => str,
array => array.map(_.toString),
obj => obj.toMap.transform((s: String, json: Json) => json.toString)
)
transformedValue
})
}
parsedVariables match {
case None => left(DecodingFailure(s"Unable to decode GraphQL variables", c.history))
case Some(variables) => right(variables)
}
}
We basically parse the JSON, turn it into a JsonObject, then transform the values within the object fairly simplistically.
Although the above answers work for the specific case of Sangria, I'm interested in the original question: What's the best approach in Circe (which generally assumes that all types are known up front) for dealing with arbitrary chunks of Json?
It's fairly common when encoding/decoding Json that 95% of the Json is specified, but the last 5% is some type of "additional properties" chunk which can be any Json object.
Solutions I've played with:
Encode/Decode the free-form chunk as Map[String,Any]. This means you'll have to introduce implicit encoders/decoders for Map[String, Any], which can be done, but is dangerous as that implicit can be pulled into places you didn't intend.
Encode/Decode the free-form chunk as Map[String, Json]. This is the easiest approach and is supported out of the box in Circe. But now the Json serialization logic has leaked out into your API (often you'll want to keep the Json stuff completely wrapped, so you can swap in other non-json formats later).
Encode/Decode to a String, where the string is required to be a valid Json chunk. At least you haven't locked your API into a specific Json library, but it doesn't feel very nice to have to ask your users to create Json chunks in this manual way.
Create a custom trait hierarchy to hold the data (e.g sealed trait Free; FreeInt(i: Int) extends Free; FreeMap(m: Map[String, Free] extends Free; ...). Now you you can create specific encoders/decoders for it. But what you've really done is replicate the Json type hierarchy that already exists in Circe.
I'm leaning more toward option 3. Since it's the most flexible, and will introduce the least dependencies in the API. But none of them are entirely satisfying. Any other ideas?

Type alias for immutable collections

What is the best way to resolve the compilation error in the example below? Assume that 'm' must be of type GenMap and I do not have control over the arguments of myFun.
import scala.collection.GenMap
object Test {
def myFun(m: Map[Int, String]) = m
val m: GenMap[Int, String] = Map(1 -> "One", 2 -> "two")
//Build error here on m.seq
// Found scala.collection.Map[Int, String]
// Required scala.collection.immutable.Map[Int, String]
val result = myFun(m.seq)
}
EDIT:
I should have been clearer. In my actual use-case I don't have control over myFun, so I have to pass it a Map. The 'm' also arises from another scala component as a GenMap. I need to convert one to another, but there appears to be a conflict between collection.Map and collection.immutable.Map
m.seq.toMap will solve your problem.
According to the signature presented in the API toMap returns a scala.collection.immutable.Map which is said to be required in your error message. scala.collection.Map returned by the seq method is a more general trait which besides being a parent to immutable map is also a parent to the mutable and concurrent map.