Java function with Map.Class parameter in Scala

I'm trying to use Jackson's ObjectMapper method convertValue.
It takes 2 parameters, with 3 overloads for the second one:
(Object, Class)
(Object, TypeReference)
(Object, JavaType)
I have the following code:
val m = new ObjectMapper()
val map: Map[String, Object] = m.convertValue(bean, classOf[Map])
which doesn't compile, failing with a type mismatch: expected JavaType, actual Class[Map].
I also tried classOf[java.util.Map], Map.getClass, etc., but can't make it work.
How should I send that parameter?

Step 1: look at https://fasterxml.github.io/jackson-databind/javadoc/2.8/com/fasterxml/jackson/databind/JavaType.html, which says:
Instances can (only) be constructed by com.fasterxml.jackson.databind.type.TypeFactory.
Step 2: look at https://fasterxml.github.io/jackson-databind/javadoc/2.8/com/fasterxml/jackson/databind/type/TypeFactory.html.
From there you can see it can be used, for example, as
m.getTypeFactory.constructMapType(classOf[java.util.Map[_, _]], classOf[YourKey], classOf[YourValue])
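Putting the two steps together, a minimal sketch, assuming m is the ObjectMapper from the question and that String keys and Object values fit your bean (substitute your actual key/value classes):
val mapType = m.getTypeFactory.constructMapType(classOf[java.util.Map[_, _]], classOf[String], classOf[Object])
val map: java.util.Map[String, Object] = m.convertValue(bean, mapType)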

You can use the mapper itself to get a JavaType, for example:
val stringType: JavaType = mapper.constructType(classOf[String])
You can try the following for your problem:
val m = new ObjectMapper()
val mapType: JavaType = m.constructType(classOf[java.util.Map[_, _]])
val map: java.util.Map[String, Object] = m.convertValue(bean, mapType)
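If you want a Scala Map rather than a java.util.Map (as in the original snippet), one possible follow-up, assuming JavaConverters is acceptable:
import scala.collection.JavaConverters._
val javaMap: java.util.Map[String, Object] = m.convertValue(bean, mapType)
val scalaMap: Map[String, Object] = javaMap.asScala.toMap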

Related

Cannot splat an Array into a function's arguments that accept varargs

I'm trying to write a function that enriches a given DataFrame with a "session" column using a window function, so I need to use partitionBy and orderBy.
val by_uuid_per_date = Window.partitionBy("uuid").orderBy("year","month","day")
// A Session = A day of events for a certain user. uuid x (year+month+day)
val enriched_df = df
  .withColumn("session", dense_rank().over(by_uuid_per_date))
  .orderBy("uuid", "timestamp")
  .select("uuid", "year", "month", "day", "session")
This works perfectly, but when I try to make a function that encapsulates this behavior:
PS: I used the _* splat operator.
def enrich_with_session(df: DataFrame,
                        window_partition_cols: Array[String],
                        window_order_by_cols: Array[String],
                        presentation_order_by_cols: Array[String]): DataFrame = {
  val by_uuid_per_date = Window.partitionBy(window_partition_cols: _*).orderBy(window_order_by_cols: _*)
  df.withColumn("session", dense_rank().over(by_uuid_per_date))
    .orderBy(presentation_order_by_cols: _*)
    .select("uuid", "year", "month", "mday", "session")
}
I get the following error:
notebook:6: error: no `: _*' annotation allowed here
(such annotations are only allowed in arguments to *-parameters)
val by_uuid_per_date = Window.partitionBy(window_partition_cols: _*).orderBy(window_order_by_cols: _*)
partitionBy and orderBy only accept a splat (: _*) for their Column varargs overloads, so you need to pass a Seq[Column] or
Array[Column] rather than an Array[String], see below:
val data = Seq(
  (1, 99),
  (1, 99),
  (1, 70),
  (1, 20)
).toDF("id", "value")
data.select('id,'value, rank().over(Window.partitionBy('id).orderBy('value))).show()
val partitionBy: Seq[Column] = Seq(data("id"))
val orderBy: Seq[Column] = Seq(data("value"))
data.select('id,'value, rank().over(Window.partitionBy(partitionBy:_*).orderBy(orderBy:_*))).show()
So in this case, your code should look like this:
def enrich_with_session(df: DataFrame,
                        window_partition_cols: Array[String],
                        window_order_by_cols: Array[String],
                        presentation_order_by_cols: Array[String]): DataFrame = {
  val window_partition_cols_2: Array[Column] = window_partition_cols.map(df(_))
  val window_order_by_cols_2: Array[Column] = window_order_by_cols.map(df(_))
  val presentation_order_by_cols_2: Array[Column] = presentation_order_by_cols.map(df(_))
  val by_uuid_per_date = Window.partitionBy(window_partition_cols_2: _*).orderBy(window_order_by_cols_2: _*)
  df.withColumn("session", dense_rank().over(by_uuid_per_date))
    .orderBy(presentation_order_by_cols_2: _*)
    .select("uuid", "year", "month", "mday", "session")
}
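For example, calling it with the columns from the original snippet (a usage sketch; the column names are just the ones assumed in the question):
val enriched_df = enrich_with_session(
  df,
  window_partition_cols = Array("uuid"),
  window_order_by_cols = Array("year", "month", "day"),
  presentation_order_by_cols = Array("uuid", "timestamp")
)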

How to convert Java LinkedHashMap to Scala LinkedHashMap?

I'm new to Scala. I've been trying to convert a Java LinkedHashMap to an equivalent collection (LinkedHashMap?) in Scala in order to preserve the insertion order.
I tried the following things, as suggested in other threads, but nothing seems to work:
scalaAsMap() - messes up the order
TreeMap() - sorting on keys, values, etc. is not what I'm looking for
Explicit conversion is not working:
val f = new java.util.LinkedHashMap[String, java.util.Map[String, String]]
var g: scala.collection.mutable.LinkedHashMap[String, java.util.Map[String, String]] = f
Hmm, how about:
import scala.collection.JavaConverters._

val javaMap = new java.util.LinkedHashMap[String, String]()
val scalaMap = javaMap.asScala
The static type of scalaMap is scala.collection.mutable.Map[String, String], but it is just a wrapper around the original LinkedHashMap, so it preserves insertion order when you iterate over it.
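If you specifically need a scala.collection.mutable.LinkedHashMap value rather than the wrapper, a minimal sketch (assuming the JavaConverters import above) is to copy the entries across; order is kept because the wrapper iterates in the Java map's insertion order:
import scala.collection.mutable

val scalaLinked: mutable.LinkedHashMap[String, String] =
  mutable.LinkedHashMap[String, String]() ++= javaMap.asScala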

How to convert Spark's TableRDD to RDD[Array[Double]] in Scala?

I am trying to perform a Scala operation on Shark. I am creating an RDD as follows:
val tmp: shark.api.TableRDD = sc.sql2rdd("select duration from test")
I need to convert it to RDD[Array[Double]]. I tried toArray, but it doesn't seem to work.
I also tried converting it to an Array[String] and then converting it using map, as follows:
val tmp_2 = tmp.map(row => row.getString(0))
val tmp_3 = tmp_2.map { row =>
  val features = Array[Double](row(0))
}
But this gives me a Spark RDD[Unit], which cannot be used in the function. Is there any other way to proceed with this type conversion?
Edit: I also tried using toDouble, but this gives me an RDD[Double], not RDD[Array[Double]]:
val tmp_5 = tmp_2.map(_.toDouble)
Edit 2:
I managed to do this as follows:
A sample of the data:
296.98567000000003
230.84362999999999
212.89751000000001
914.02404000000001
305.55383
A Shark TableRDD was created first.
val tmp = sc.sql2rdd("select duration from test")
I used getString to turn it into an RDD[String] and then converted that to an RDD[Array[Double]]:
val duration = tmp.map(row => Array[Double](row.getString(0).toDouble))

Iterate over a java.util.Set in Scala and create a new java.util.HashSet

I have a Java API that returns a java.util.Set. I want to iterate over the set up to size - 1 and create a new java.util.HashSet in Scala.
I tried the following:
val keys = CalltoJavaAPI()
val newHashSet = new java.util.HashSet()
val size = keys.size();
newHashSet.add(keys.take(keys.size() - 1))
But I am getting the following error:
Caused by: java.lang.UnsupportedOperationException
at java.util.AbstractCollection.add(AbstractCollection.java:221)
I tried the following, but it's still not working:
val keys = CalltoJavaAPI().asScala
var newHashSet = new scala.collection.mutable.HashSet[Any]()
newHashSet.add(keys.take(keys.size - 1))
Use scala.collection.JavaConversions for implicit conversions between Scala and Java collections.
In the following approach we convert the Java HashSet to a Scala Set, take the keys of interest, and convert the result back into a new Java HashSet:
import scala.collection.JavaConversions._
val javaKeys = new java.util.HashSet[Any](CalltoJavaAPI())
val n = javaKeys.size
val scalaSet = javaKeys.toSet.take(n-1)
val newJavaHashSet = new java.util.HashSet[Any]()
newJavaHashSet.addAll(scalaSet)
I think you should use newHashSet.addAll(...) instead of newHashSet.add(...), since keys.take(...) returns a collection of elements rather than a single element.
From the docs:
public boolean add(E e): Adds the specified element to this set if it is not already present.
public boolean addAll(Collection c): Adds all of the elements in the specified collection to this collection
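A small sketch of the difference, using JavaConverters to hand the Scala collection to addAll (the set contents here are purely illustrative):
import scala.collection.JavaConverters._

val keys = Set("a", "b", "c")            // stand-in for the result of CalltoJavaAPI()
val firstN = keys.take(keys.size - 1)    // a collection of elements, not a single element

val newHashSet = new java.util.HashSet[String]()
newHashSet.addAll(firstN.asJava)         // adds each element of firstN individually
// newHashSet.add(firstN) would not compile: the set holds Strings, not collections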

Why does creating a map in Scala not need (or allow) the new operator?

We create a new map in Scala using:
val treasureMap = Map[Int, String]()
But why is it illegal to use the new operator here?
val treasureMap = new Map[Int, String]()
I thought new is for creating a new object, and in the example above I AM creating a new object.
Map is a trait (like an interface in Java): a contract without a concrete implementation, so it cannot be instantiated with new.
Without new you are using the factory method apply of the singleton object named Map:
val treasureMap = Map.apply[Int, String]()
In Scala you can call the apply method of any object by placing parentheses after the object name:
val functionIncrement = (_: Int) + 1
functionIncrement(2)
// 3
functionIncrement.apply(2)
// 3
val treasureMap = Map.apply(1 -> "a")
treasureMap(1)
// a
treasureMap.apply(1)
// a
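By contrast, new does work once you name a concrete class; a small sketch (mutable.HashMap is just one possible implementation to illustrate the point):
import scala.collection.mutable

// Map is a trait, so this does not compile:
// val m = new Map[Int, String]()

// A concrete class can be instantiated with new:
val treasureMap = new mutable.HashMap[Int, String]()
treasureMap(1) = "a"

// Map(...) without new goes through Map.apply, the companion object's factory method:
val immutableTreasureMap = Map(1 -> "a")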