I'm trying to create a map based on conditions. Here's the general workflow in pseudocode:
def createMyMap(input: String): Map[String, String] = {
val stringArray = input.split(",")
stringArray.map(element => {
if (condition) {
newKey -> newVal
}
}).toMap
}
and I see two compile errors:
No implicits found for parameter ev: Any <:< (T_, U_) in the toMap call
For the method createMyMap
Type mismatch.
Required: scala.Predef.Map [String, String]
Found: scala.collection.immutable.Map[Nothing, Nothing]
This makes sense since the compiler doesn't know how to create the map if the condition isn't fulfilled. For example, if I add this in the method:
if (condition) {
  newKey -> newVal
} else {
  null
}
then it'll compile. I'm just not too sure how to approach the else - how do I avoid this kind of problem? I'm running into this because I only want to create a map entry if a condition is fulfilled.
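To make the cause of the two errors concrete, here is a small sketch under Scala 2 typing rules. The sample input and the `contains('.')` condition are assumptions chosen for illustration. An `if` without an `else` is treated as `if (...) x else ()`, so the element type widens to `Any`, which `toMap` cannot turn into key/value pairs. Adding `else null` restores the tuple type, but only by putting `null` entries into the array.

```scala
// Hypothetical sample input, not from the original question.
val parts: Array[String] = "a.b,c,d.e".split(",")

// No else branch: the missing branch is Unit, so the elements are only known as Any.
// Calling .toMap on this is what produces the "No implicits found" error.
val widened: Array[Any] = parts.map(s => if (s.contains('.')) s -> "data")

// With `else null` the element type is a tuple again (Null conforms to it),
// but every non-matching element is now a null entry.
val withNulls: Array[(String, String)] =
  parts.map(s => if (s.contains('.')) s -> "data" else null)

// Count how many entries are null placeholders rather than real pairs.
val nullCount: Int = withNulls.count(_ == null)
```

So the `else null` version compiles, but calling `toMap` on it would be expected to fail at runtime as soon as one condition is not fulfilled.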
It's not clear how newKey and newVal are derived, but here is template code using collect:
def createMyMap(input: String): Map[String, String] =
input.split(",").collect {
case s if <condition> =>
newKey -> newVal
}.to(Map)
e.g.
def createMyMap(input: String): Map[String, String] =
input.split(",").collect {
case s if s.contains('.') =>
s -> "data"
}.to(Map)
You have a few good options that I'll summarize from the comments:
1. filter + map + toMap
// stringArray: Traversable[String]
// condition: String => Boolean
// transform: String => (String, String)
val result: Map[String, String] = stringArray
.filter(condition)
.map(transform)
.toMap
2. map + flatten + toMap
val result: Map[String, String] = stringArray
.map { k =>
if (condition(k)) Some(transform(k))
else None
}
.flatten
.toMap
3. flatMap + toMap
val result: Map[String, String] = stringArray
.flatMap { k =>
if (condition(k)) Some(transform(k))
else None
}
.toMap
4. collect + toMap
val result: Map[String, String] = stringArray
  .collect {
    case k if condition(k) => transform(k)
  }
  .toMap
See documentation.
In short, methods 3 and 4 are especially clean (although all are good IMO). However, the one that is semantically the most readable IMO is 4, which uses collect (with option 1 being a close second).
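As a quick check that the four options are interchangeable, here is a self-contained sketch. The concrete `stringArray`, `condition` and `transform` values are assumptions made up for the example; only the four pipelines come from the answer.

```scala
// Hypothetical inputs, chosen only to make the snippets runnable.
val stringArray: Seq[String] = "a.b,c,d.e".split(",").toSeq
val condition: String => Boolean = _.contains('.')
val transform: String => (String, String) = s => s -> "data"

// 1. filter + map + toMap
val viaFilter: Map[String, String] =
  stringArray.filter(condition).map(transform).toMap

// 2. map + flatten + toMap
val viaFlatten: Map[String, String] =
  stringArray.map(k => if (condition(k)) Some(transform(k)) else None).flatten.toMap

// 3. flatMap + toMap
val viaFlatMap: Map[String, String] =
  stringArray.flatMap(k => if (condition(k)) Some(transform(k)) else None).toMap

// 4. collect + toMap
val viaCollect: Map[String, String] =
  stringArray.collect { case k if condition(k) => transform(k) }.toMap
```

All four produce the same map here; the difference is only in how directly each one expresses "keep some elements and transform them".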
Want functions
row is assumed to be a nested structure. The output map keys should be the full paths of the field keys (i.e. column names). E.g. if the input structure is {foo: {bar: 1}, bob: "alice"}, then the output map should be Map("foo" -> Map("foo.bar" -> 1), "bob" -> "alice").
def rowToMap(row: Row): Map[String, Any]
I'm hoping there is a neat way to do this; if not, I will likely have to recurse over row.schema.
Similarly, given a nested map, e.g. Map("foo" -> Map("bar" -> 1), "bob" -> "alice"), build the corresponding Row (note that here we don't need to parse full paths):
def mapToRow(map: Map[String, Any]): Row
object RDDOfMapToDataFrame {
def apply(rdd: RDD[Map[String, Any]], schema: StructType)
(implicit sparkSession: SparkSession): DataFrame =
sparkSession.createDataFrame(rdd.map(mapToRow(_, schema)), schema)
def getStructTypeFromStructType(field: String, schema: StructType): StructType =
schema.fields(schema.fieldIndex(field)).dataType.asInstanceOf[StructType]
def getStructTypeFromArrayType(field: String, schema: StructType): StructType =
schema.fields(schema.fieldIndex(field)).dataType.asInstanceOf[ArrayType].elementType.asInstanceOf[StructType]
def mapToRow(m: Map[String, Any], schema: StructType): Row = Row.fromSeq(m.toList.map {
case (key, struct: Map[String, Any] @unchecked) =>
schema.fieldIndex(key) -> mapToRow(struct, getStructTypeFromStructType(key, schema))
// Intellij is confused by this line, please leave as is
case (key, mapList) if mapList.isInstanceOf[TraversableOnce[_]]
&& mapList.asInstanceOf[TraversableOnce[Any]].toSeq.headOption.exists(_.isInstanceOf[Map[_, _]]) =>
schema.fieldIndex(key) ->
mapList.asInstanceOf[TraversableOnce[Any]]
.toSeq
.map(_.asInstanceOf[Map[String, Any]])
.map(mapToRow(_, getStructTypeFromArrayType(key, schema)))
case (key, None) =>
schema.fieldIndex(key) -> null
case (key, Some(other: Map[_, _])) =>
schema.fieldIndex(key) -> mapToRow(other.asInstanceOf[Map[String, Any]], getStructTypeFromStructType(key, schema))
case (key, Some(mapList))
if mapList.isInstanceOf[TraversableOnce[_]]
&& mapList.asInstanceOf[TraversableOnce[Any]].toSeq.headOption.exists(_.isInstanceOf[Map[_, _]]) =>
schema.fieldIndex(key) ->
mapList.asInstanceOf[TraversableOnce[Any]]
.toSeq
.map(_.asInstanceOf[Map[String, Any]])
.map(mapToRow(_, getStructTypeFromArrayType(key, schema)))
case x@(key, Some(other)) =>
schema.fieldIndex(key) -> other
case (key, other) =>
schema.fieldIndex(key) -> other
}.sortBy(_._1).map(_._2))
def rowToMap(row: Row): Map[String, Any] = row.schema.fieldNames.zip(row.toSeq.map {
case row: Row => rowToMap(row)
// Intellij is confused by this line, please leave as is
case seqOfRow@((_: Row) :: _) => seqOfRow.map(_.asInstanceOf[Row]).map(rowToMap)
case any => any
}).toMap
}
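The rowToMap above keeps the plain field names as keys. The full-path keys asked for in the question could be added as a separate pass over the resulting nested map. The following is only a sketch on plain Scala maps (no Spark involved), and `withFullPaths` is a hypothetical helper name, not part of the code above.

```scala
// Hypothetical helper: rewrites every key as its dotted full path while
// keeping the nesting, e.g. "bar" under "foo" becomes "foo.bar".
def withFullPaths(m: Map[String, Any], prefix: String = ""): Map[String, Any] =
  m.map { case (key, value) =>
    // Top-level keys stay as they are; nested keys get the parent path prepended.
    val path = if (prefix.isEmpty) key else s"$prefix.$key"
    value match {
      case nested: Map[_, _] =>
        // Assumes nested maps are keyed by String, as rowToMap produces them.
        path -> withFullPaths(nested.asInstanceOf[Map[String, Any]], path)
      case other =>
        path -> other
    }
  }

val flattenedKeys = withFullPaths(Map("foo" -> Map("bar" -> 1), "bob" -> "alice"))
```

Sequences of nested maps are left untouched in this sketch; they would need an extra case if the paths should also apply inside arrays.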
I have the following list in input:
val listInput1 =
List(
"itemA,CATs,2,4",
"itemA,CATS,3,1",
"itemB,CATQ,4,5",
"itemB,CATQ,4,6",
"itemC,CARC,5,10")
and I want to write a single function in Scala, using groupBy and foldLeft, that sums the third and fourth columns for lines with the same title (the first column here). The wanted output is:
val listOutput1 =
List(
"itemA,CATS,5,5",
"itemB,CATQ,8,11",
"itemC,CARC,5,10"
)
def sumIndex (listIn:List[String]):List[String]={
listIn.map(_.split(",")).groupBy(_(0)).map{
case (title, label) =>
"%s,%s,%d,%d".format(
title,
label.head.apply(1),
label.map(_(2).toInt).sum,
label.map(_(3).toInt).sum)}.toList
}
Kind regards
The logic in your code looks sound. Here it is again with a case class, which handles edge cases more cleanly:
// represents a 'row' in the original list
case class Item(
name: String,
category: String,
amount: Int,
price: Int
)
// converts a row of strings into the case class; throws if the row does not have exactly 4 entries
def stringsToItem(strings: Array[String]): Item = {
  if (strings.length != 4) {
    throw new Exception(s"Invalid row: ${strings.mkString(",")}; must contain exactly 4 entries!")
  } else {
    val n = strings.headOption.getOrElse("N/A")
    val cat = strings.lift(1).getOrElse("N/A")
    // "+" rather than "*": an empty string must not reach toInt
    val amt = strings.lift(2).filter(_.matches("^[0-9]+$")).map(_.toInt).getOrElse(0)
    val p = strings.lastOption.filter(_.matches("^[0-9]+$")).map(_.toInt).getOrElse(0)
    Item(n, cat, amt, p)
  }
}
// original code with case class and method above used
listInput1.map(_.split(","))
.map(stringsToItem)
.groupBy(_.name)
.map { case (name, items) =>
Item(
name,
category = items.head.category,
amount = items.map(_.amount).sum,
price = items.map(_.price).sum
)
}.toList
You can solve it with a single foldLeft, iterating the input list only once. Use a Map to aggregate the result.
listInput1.map(_.split(",")).foldLeft(Map.empty[String, Int]) {
(acc: Map[String, Int], curr: Array[String]) =>
val label: String = curr(0)
val oldValue: Int = acc.getOrElse(label, 0)
val newValue: Int = oldValue + curr(2).toInt + curr(3).toInt
acc.updated(label, newValue)
}
result: Map(itemA -> 10, itemB -> 19, itemC -> 15)
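Note that the fold above adds both numeric columns into one total per title. To get the output format from the question, the accumulator can carry the label and the two sums separately. This is a sketch with two assumptions of mine: the last label seen for a title wins (which gives CATS for itemA), and the result is sorted by title, since a plain Map does not guarantee order. `sumByTitle` is a hypothetical name.

```scala
val listInput1 = List(
  "itemA,CATs,2,4",
  "itemA,CATS,3,1",
  "itemB,CATQ,4,5",
  "itemB,CATQ,4,6",
  "itemC,CARC,5,10")

// Single pass with foldLeft: title -> (label, sum of column 3, sum of column 4).
def sumByTitle(lines: List[String]): List[String] =
  lines
    .map(_.split(","))
    .foldLeft(Map.empty[String, (String, Int, Int)]) { (acc, cols) =>
      // Start from zero sums the first time a title is seen.
      val (_, sum3, sum4) = acc.getOrElse(cols(0), (cols(1), 0, 0))
      acc.updated(cols(0), (cols(1), sum3 + cols(2).toInt, sum4 + cols(3).toInt))
    }
    .toList
    .sortBy(_._1) // assumption: order the output by title
    .map { case (title, (label, sum3, sum4)) => s"$title,$label,$sum3,$sum4" }

val summed = sumByTitle(listInput1)
```

Like the original, this assumes every line has four comma-separated fields with numeric third and fourth columns; malformed lines would throw.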
If you have a list as
val listInput1 =
List(
"itemA,CATs,2,4",
"itemA,CATS,3,1",
"itemB,CATQ,4,5",
"itemB,CATQ,4,6",
"itemC,CARC,5,10")
Then you can write a general function that can be used with foldLeft and reduceLeft as
def accumulateLeft(x: Map[String, Tuple3[String, Int, Int]], y: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] ={
val key = y.keySet.toList(0)
if(x.keySet.contains(key)){
val oldTuple = x(key)
x.updated(key, (y(key)._1, oldTuple._2+y(key)._2, oldTuple._3+y(key)._3))
}
else{
x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
}
}
and you can call them as
foldLeft
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.foldLeft(Map.empty[String, Tuple3[String, Int, Int]])(accumulateLeft)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res0: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
reduceLeft
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.reduceLeft(accumulateLeft)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res1: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
Similarly you can just interchange the variables in the general function so that it can be used with foldRight and reduceRight as
def accumulateRight(y: Map[String, Tuple3[String, Int, Int]], x: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] ={
val key = y.keySet.toList(0)
if(x.keySet.contains(key)){
val oldTuple = x(key)
x.updated(key, (y(key)._1, oldTuple._2+y(key)._2, oldTuple._3+y(key)._3))
}
else{
x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
}
}
and calling the function would give you
foldRight
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.foldRight(Map.empty[String, Tuple3[String, Int, Int]])(accumulateRight)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res2: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
reduceRight
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.reduceRight(accumulateRight)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res3: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
So you don't really need a groupBy and can use any of the foldLeft, foldRight, reduceLeft or reduceRight functions to get your desired output.
I want to write a Scala macro that overrides field values of a case class based on map entries, with a simple type check.
If the original field type and the override value type are compatible, set the new value; otherwise keep the original value.
So far I have the following code:
import language.experimental.macros
import scala.reflect.macros.Context
object ProductUtils {
def withOverrides[T](entity: T, overrides: Map[String, Any]): T =
macro withOverridesImpl[T]
def withOverridesImpl[T: c.WeakTypeTag](c: Context)
(entity: c.Expr[T], overrides: c.Expr[Map[String, Any]]): c.Expr[T] = {
import c.universe._
val originalEntityTree = reify(entity.splice).tree
val originalEntityCopy = entity.actualType.member(newTermName("copy"))
val originalEntity =
weakTypeOf[T].declarations.collect {
case m: MethodSymbol if m.isCaseAccessor =>
(m.name, c.Expr[T](Select(originalEntityTree, m.name)), m.returnType)
}
val values =
originalEntity.map {
case (name, value, ctype) =>
AssignOrNamedArg(
Ident(name),
{
def reifyWithType[K: WeakTypeTag] = reify {
overrides
.splice
.asInstanceOf[Map[String, Any]]
.get(c.literal(name.decoded).splice) match {
case Some(newValue : K) => newValue
case _ => value.splice
}
}
reifyWithType(c.WeakTypeTag(ctype)).tree
}
)
}.toList
originalEntityCopy match {
case s: MethodSymbol =>
c.Expr[T](
Apply(Select(originalEntityTree, originalEntityCopy), values))
case _ => c.abort(c.enclosingPosition, "No eligible copy method!")
}
}
}
Executed like this:
import macros.ProductUtils
case class Example(field1: String, field2: Int, field3: String)
object MacrosTest {
def main(args: Array[String]) {
val overrides = Map("field1" -> "new value", "field2" -> "wrong type")
println(ProductUtils.withOverrides(Example("", 0, ""), overrides)) // Example("new value", 0, "")
}
}
As you can see, I've managed to get the type of the original field and now want to pattern match on it in reifyWithType.
Unfortunately, with the current implementation I'm getting a warning during compilation:
warning: abstract type pattern K is unchecked since it is eliminated by erasure
case Some(newValue : K) => newValue
and a compiler crash in IntelliJ:
Exception in thread "main" java.lang.NullPointerException
at scala.tools.nsc.transform.Erasure$ErasureTransformer$$anon$1.preEraseAsInstanceOf$1(Erasure.scala:1032)
at scala.tools.nsc.transform.Erasure$ErasureTransformer$$anon$1.preEraseNormalApply(Erasure.scala:1083)
at scala.tools.nsc.transform.Erasure$ErasureTransformer$$anon$1.preEraseApply(Erasure.scala:1187)
at scala.tools.nsc.transform.Erasure$ErasureTransformer$$anon$1.preErase(Erasure.scala:1193)
at scala.tools.nsc.transform.Erasure$ErasureTransformer$$anon$1.transform(Erasure.scala:1268)
at scala.tools.nsc.transform.Erasure$ErasureTransformer$$anon$1.transform(Erasure.scala:1018)
at scala.reflect.internal.Trees$class.itransform(Trees.scala:1217)
at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:13)
at scala.reflect.internal.SymbolTable.itransform(SymbolTable.scala:13)
at scala.reflect.api.Trees$Transformer.transform(Trees.scala:2897)
at scala.tools.nsc.transform.TypingTransformers$TypingTransformer.transform(TypingTransformers.scala:48)
at scala.tools.nsc.transform.Erasure$ErasureTransformer$$anon$1.transform(Erasure.scala:1280)
at scala.tools.nsc.transform.Erasure$ErasureTransformer$$anon$1.transform(Erasure.scala:1018)
So the questions are:
* Is it possible to compare a type obtained in the macro with the runtime type of a value?
* Or is there a better approach to solve this task?
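For comparison, this is how the same "unchecked" pattern behaves in ordinary (non-macro) code once a ClassTag is available. It is only a sketch to illustrate the erasure problem: `pickOrElse` is a hypothetical helper, and the check it performs is against the erased runtime class only.

```scala
import scala.reflect.ClassTag

// Hypothetical helper: returns the candidate if its runtime class matches K,
// otherwise the fallback. The implicit ClassTag[K] makes the type pattern checked.
def pickOrElse[K: ClassTag](candidate: Option[Any], fallback: K): K =
  candidate match {
    case Some(value: K) => value
    case _              => fallback
  }

// Compatible type: the override is used.
val picked: String = pickOrElse[String](Some("new value"), "")

// Incompatible type: the original value is kept.
val kept: Int = pickOrElse[Int](Some("wrong type"), 0)

// Erasure limit: only the List class is checked, not its element type,
// so a List of strings is accepted where a List[Int] is expected.
val leaked: List[Int] = pickOrElse[List[Int]](Some(List("a")), Nil)
```

The last call shows why a class-level check is not enough for generic fields: the element type is not part of what is compared.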
In the end, I arrived at the following solution:
import language.experimental.macros
import scala.reflect.macros.Context
object ProductUtils {
def withOverrides[T](entity: T, overrides: Map[String, Any]): T =
macro withOverridesImpl[T]
def withOverridesImpl[T: c.WeakTypeTag](c: Context)(entity: c.Expr[T], overrides: c.Expr[Map[String, Any]]): c.Expr[T] = {
import c.universe._
val originalEntityTree = reify(entity.splice).tree
val originalEntityCopy = entity.actualType.member(newTermName("copy"))
val originalEntity =
weakTypeOf[T].declarations.collect {
case m: MethodSymbol if m.isCaseAccessor =>
(m.name, c.Expr[T](Select(c.resetAllAttrs(originalEntityTree), m.name)), m.returnType)
}
val values =
originalEntity.map {
case (name, value, ctype) =>
AssignOrNamedArg(
Ident(name),
{
val ruClass = c.reifyRuntimeClass(ctype)
val mtag = c.reifyType(treeBuild.mkRuntimeUniverseRef, Select(treeBuild.mkRuntimeUniverseRef, newTermName("rootMirror")), ctype)
val mtree = Select(mtag, newTermName("tpe"))
def reifyWithType[K: c.WeakTypeTag] = reify {
def tryNewValue[A: scala.reflect.runtime.universe.TypeTag](candidate: Option[A]): Option[K] =
if (candidate.isEmpty) {
None
} else {
val cc = c.Expr[Class[_]](ruClass).splice
val candidateValue = candidate.get
val candidateType = scala.reflect.runtime.universe.typeOf[A]
val expectedType = c.Expr[scala.reflect.runtime.universe.Type](mtree).splice
val ok = (cc.isPrimitive, candidateValue) match {
case (true, _: java.lang.Integer) => cc == java.lang.Integer.TYPE
case (true, _: java.lang.Long) => cc == java.lang.Long.TYPE
case (true, _: java.lang.Double) => cc == java.lang.Double.TYPE
case (true, _: java.lang.Character) => cc == java.lang.Character.TYPE
case (true, _: java.lang.Float) => cc == java.lang.Float.TYPE
case (true, _: java.lang.Byte) => cc == java.lang.Byte.TYPE
case (true, _: java.lang.Short) => cc == java.lang.Short.TYPE
case (true, _: java.lang.Boolean) => cc == java.lang.Boolean.TYPE
case (true, _: Unit) => cc == java.lang.Void.TYPE
case _ =>
val args = candidateType.asInstanceOf[scala.reflect.runtime.universe.TypeRefApi].args
if (!args.contains(scala.reflect.runtime.universe.typeOf[Any])
&& !(candidateType =:= scala.reflect.runtime.universe.typeOf[Any]))
candidateType =:= expectedType
else cc.isInstance(candidateValue)
}
if (ok)
Some(candidateValue.asInstanceOf[K])
else None
}
tryNewValue(overrides.splice.get(c.literal(name.decoded).splice)).getOrElse(value.splice)
}
reifyWithType(c.WeakTypeTag(ctype)).tree
}
)
}.toList
originalEntityCopy match {
case s: MethodSymbol =>
c.Expr[T](
Apply(Select(originalEntityTree, originalEntityCopy), values))
case _ => c.abort(c.enclosingPosition, "No eligible copy method!")
}
}
}
It more or less satisfies the original requirements:
class ProductUtilsTest extends FunSuite {
case class A(a: String, b: String)
case class B(a: String, b: Int)
case class C(a: List[Int], b: List[String])
case class D(a: Map[Int, String], b: Double)
case class E(a: A, b: B)
test("simple overrides works"){
val overrides = Map("a" -> "A", "b" -> "B")
assert(ProductUtils.withOverrides(A("", ""), overrides) === A("A", "B"))
}
test("simple overrides works 1"){
val overrides = Map("a" -> "A", "b" -> 1)
assert(ProductUtils.withOverrides(B("", 0), overrides) === B("A", 1))
}
test("do not override if types do not match"){
val overrides = Map("a" -> 0, "b" -> List("B"))
assert(ProductUtils.withOverrides(B("", 0), overrides) === B("", 0))
}
test("complex types also works"){
val overrides = Map("a" -> List(1), "b" -> List("A"))
assert(ProductUtils.withOverrides(C(List(0), List("")), overrides) === C(List(1), List("A")))
}
test("complex types also works 1"){
val overrides = Map("a" -> List(new Date()), "b" -> 2.0d)
assert(ProductUtils.withOverrides(D(Map(), 1.0), overrides) === D(Map(), 2.0))
}
test("complex types also works 2"){
val overrides = Map("a" -> A("AA", "BB"), "b" -> 2.0d)
assert(ProductUtils.withOverrides(E(A("", ""), B("", 0)), overrides) === E(A("AA", "BB"), B("", 0)))
}
}
Unfortunately, because of type erasure in Java/Scala, it is hard to enforce type equality before replacing the value, so you can still do something like this:
scala> case class C(a: List[Int], b: List[String])
defined class C
scala> val overrides = Map("a" -> List(new Date()), "b" -> List(1.0))
overrides: scala.collection.immutable.Map[String,List[Any]] = Map(a -> List(Mon Aug 26 15:52:27 CEST 2013), b -> List(1.0))
scala> ProductUtils.withOverrides(C(List(0), List("")), overrides)
res0: C = C(List(Mon Aug 26 15:52:27 CEST 2013),List(1.0))
scala> res0.a.head + 1
java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.Integer
at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
at .<init>(<console>:14)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)