In my application there are many places where I need to take a list of tuples, group it by the first element of each tuple, and remove that element from the rest. For example, given the tuples
(1, "Joe", "Account"), (1, "Tom", "Employer"), (2, "John", "Account"), the result should be Map(1 -> List(("Joe", "Account"), ("Tom", "Employer")), 2 -> List(("John", "Account")))
It is easily implemented as
data.groupBy(_._1).map { case (k, v) => k -> v.map(f => (f._2, f._3)) }
But I am looking for a general solution, because I can have tuples of different arities: 2, 3, 4 or even 7.
I think Shapeless or Scalaz can help here, but my experience with those libraries is limited, so please point me to an example.
This is easily implemented using shapeless (for simplicity, I won't generalize it to all collection types). There is a specific type class for tuples that can deconstruct them into head and tail, called IsComposite:
import shapeless.ops.tuple.IsComposite
def groupTail[P, H, T](tuples: List[P])(
implicit ic: IsComposite.Aux[P, H, T]): Map[H, List[T]] = {
tuples
.groupBy(ic.head)
.map { case (k, vs) => (k, vs.map(ic.tail)) }
}
This works for your case:
val data =
List((1, "Joe", "Account"), (1, "Tom", "Employer"), (2, "John", "Account"))
assert {
groupTail(data) == Map(
1 -> List(("Joe", "Account"), ("Tom", "Employer")),
2 -> List(("John", "Account"))
)
}
As well as for a Tuple4 of different types:
val data2 = List((1, 1, "a", 'a), (1, 2, "b", 'b), (2, 1, "a", 'b))
assert {
groupTail(data2) == Map(
1 -> List((1, "a", 'a), (2, "b", 'b)),
2 -> List((1, "a", 'b))
)
}
Runnable code is available at Scastie
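If you later do need this for collections other than List, a minimal sketch of a generalization (the name groupTailSeq is mine, and this assumes Scala 2-style eta-expansion of ic.head, just like the List version above):
import shapeless.ops.tuple.IsComposite

// same idea as groupTail, but accepts any Seq and keeps the result as a Seq
def groupTailSeq[P, H, T](tuples: Seq[P])(
    implicit ic: IsComposite.Aux[P, H, T]): Map[H, Seq[T]] =
  tuples.groupBy(ic.head).map { case (k, vs) => (k, vs.map(ic.tail)) }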
I have a peculiar case where I want to declare simple configuration like so
val config = List((("a", "b", "c"), ("first")),
(("d", "e"), ("second")),
(("f"), ("third")))
which at run time, I would like to have a map, which maps like
"a" -> "first"
"b" -> "first"
"c" -> "first"
"d" -> "second"
"e" -> "second"
"f" -> "third"
Using toMap, I was able to convert the config to a Map
scala> config.toMap
res42: scala.collection.immutable.Map[java.io.Serializable,String] = Map((a,b,c) -> first, (d,e) -> second, f -> third)
But I am not able to figure out how to flatten the grouped keys into individual keys so I get the final desired form. How do I solve this?
If you structure your config using List the code is very simple:
val config = List(
(List("a", "b", "c"), ("first")),
(List("d", "e"), ("second")),
(List("f"), ("third")))
config.flatMap{ case (k, v) => k.map(_ -> v) }.toMap
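For reference, evaluating that expression gives the map from the question (entry order in the printed Map may differ):
// Map(a -> first, b -> first, c -> first, d -> second, e -> second, f -> third)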
You can try the solution below:
val config = List(
(("a", "b", "c"), ("first")),
(("d", "e"), ("second")),
(("f"), ("third")))
val result = config.map {
  case (k, v) =>
    // strip the parentheses from the tuple's toString and split on commas
    // to recover the individual keys as an Array[String]
    (k.toString().replace(")", "")
      .replace("(", "")
      .split(","), v)
}
val res = result.map {
case (key,value) => key.map{ data =>
(data,value)
}.toList
}.flatten.toMap
If you change the config structure to something like the one below, the solution is much simpler:
val config1 = List (
(List("a", "b", "c"), "first"),
(List("d", "e"), "second"),
(List("f"), "third")
)
config1.flatMap{
case (k,v) => k.map{data => (data,v)}
}.toMap
I think the above answers are good practical answers. If you're in a situation where you have no control over the input and you're stuck with Tuples instead of Lists, I'd do it this way:
val result: Map[String, String] = config.flatMap {
case (s: String, v) => List(s -> v)
case (ks: Product, v) => ks.productIterator.collect { case s: String => s -> v }
case _ => Nil //Prevent throwing
}.toMap
This will throw away anything that's not a String in the keys.
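With the tuple-based config from the question this also produces the expected flattened map (entry order may differ):
// result == Map(a -> first, b -> first, c -> first, d -> second, e -> second, f -> third)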
By using built-in Spark SQL functions:
import org.apache.spark.sql.functions
import spark.implicits._ // assumes a SparkSession named spark is in scope (needed for toDF and $)

val config = List((Array("a", "b", "c"), ("first")),
(Array("d", "e"), ("second")),
(Array("f"), ("third"))).toDF(List("col1","col2") : _*)
config.withColumn("exploded",functions.explode_outer($"col1")).drop("col1").show()
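If the goal is an in-memory Map rather than a DataFrame, one could also collect the exploded frame; a minimal sketch, assuming the same config DataFrame as above:
val exploded = config.withColumn("exploded", functions.explode_outer($"col1")).drop("col1")
val asMap = exploded.collect().map(r => r.getAs[String]("exploded") -> r.getAs[String]("col2")).toMap
// Map(a -> first, b -> first, c -> first, d -> second, e -> second, f -> third)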
I wrote the following code to merge two maps and update the common keys. Is there a better way to write this?
case class Test(index: Int, min: Int, max: Int, aggMin: Int, aggMax: Int)
def mergeMaps(oldMap: Map[Int, Test], newMap: Map[Int, Test]): Map[Int, Test] = {
val intersect: Map[Int, Test] = oldMap.keySet.intersect(newMap.keySet)
.map(indexKey => indexKey -> (Test(newMap(indexKey).index, newMap(indexKey).min, newMap(indexKey).max,
oldMap(indexKey).aggMin.min(newMap(indexKey).aggMin), oldMap(indexKey).aggMax.max(newMap(indexKey).aggMax)))).toMap
val merge = (oldMap ++ newMap ++ intersect)
merge
}
Here is my test case
it("test my case"){
val oldMap = Map(10 -> Test(10, 1, 2, 1, 2), 25 -> Test(25, 3, 4, 3, 4), 46 -> Test(46, 3, 4, 3, 4), 26 -> Test(26, 1, 2, 1, 2))
val newMap = Map(32 -> Test(32, 5, 6, 5, 6), 26 -> Test(26, 5, 6, 5, 6))
val result = mergeMaps(oldMap, newMap)
//Total element count should be the number of distinct keys across both maps (4 + 2 - 1 common = 5)
assert(result.size == 5)
//For a common key, keep the min aggMin and max aggMax across both maps, and take min and max from the second map's entry
assert(result.get(26).get.aggMin == 1)//min aggMin -> min(1,5)
assert(result.get(26).get.aggMax == 6)//max aggMax -> max(2,6)
assert(result.get(26).get.min == 5)// 5 from second map
assert(result.get(26).get.max == 6)//6 from second map
}
Here's a slightly different take on a solution.
def mergeMaps(oldMap :Map[Int,Test], newMap :Map[Int,Test]) :Map[Int,Test] =
(oldMap.values ++ newMap.values)
.groupBy(_.index)
.map{ case (k,v) =>
k -> v.reduceLeft((a,b) =>
Test(k, b.min, b.max, a.aggMin min b.aggMin, a.aggMax max b.aggMax))
}
I could have followed the groupBy() with mapValues() instead of map(), but mapValues returns a lazy view over the values rather than a strict Map.
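For completeness, a mapValues variant would need an explicit .toMap at the end (in Scala 2.13 mapValues returns a lazy MapView); the name mergeMapsView is mine:
def mergeMapsView(oldMap: Map[Int, Test], newMap: Map[Int, Test]): Map[Int, Test] =
  (oldMap.values ++ newMap.values)
    .groupBy(_.index)
    .mapValues(_.reduceLeft((a, b) =>
      Test(b.index, b.min, b.max, a.aggMin min b.aggMin, a.aggMax max b.aggMax)))
    .toMap // forces the lazy view into a strict Map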
Another version to do the same task.
def mergeMaps(oldMap: Map[Int, Test], newMap: Map[Int, Test]): Map[Int, Test] = {
(newMap ++ oldMap).map(key => {
val _newMapData = newMap.get(key._1)
if (_newMapData.isDefined) {
val _newMapDataValue = _newMapData.get
val oldMapValue = key._2
val result = Test(_newMapDataValue.index, _newMapDataValue.min, _newMapDataValue.max,
oldMapValue.aggMin.min(_newMapDataValue.aggMin), oldMapValue.aggMax.max(_newMapDataValue.aggMax))
(key._1 -> result)
} else (key._1 -> key._2)
})
}
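Either version passes the test above; for the common key 26 the merged entry keeps min/max from the new map and widens the aggregates:
mergeMaps(oldMap, newMap).get(26)
// Some(Test(26, 5, 6, 1, 6))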
How can I convert a rather small DataFrame in Spark (max 300 MB) to a nested map in order to improve Spark's DAG? I believe this operation will be quicker than a join later on (Spark's dynamic DAG is a lot slower than a hard-coded one), as the transformed values were created during the train step of a custom estimator. Now I just want to apply them quickly during the predict step of the pipeline.
val inputSmall = Seq(
("A", 0.3, "B", 0.25),
("A", 0.3, "g", 0.4),
("d", 0.0, "f", 0.1),
("d", 0.0, "d", 0.7),
("A", 0.3, "d", 0.7),
("d", 0.0, "g", 0.4),
("c", 0.2, "B", 0.25)).toDF("column1", "transformedCol1", "column2", "transformedCol2")
This gives the wrong type of map
val inputToMap = inputSmall.collect.map(r => Map(inputSmall.columns.zip(r.toSeq):_*))
I would rather have something like:
Map[String, Map[String, Double]]("column1" -> Map("A" -> 0.3, "d" -> 0.0, ...), "column2" -> Map("B" -> 0.25, "g" -> 0.4, ...))
Edit: removed collect operation from final map
If you are using Spark 2+, here's a suggestion:
import org.apache.spark.sql.functions.map // the map() column function used below

val inputToMap = inputSmall.select(
map($"column1", $"transformedCol1").as("column1"),
map($"column2", $"transformedCol2").as("column2")
)
val cols = inputToMap.columns
val localData = inputToMap.collect
cols.map { colName =>
colName -> localData.flatMap(_.getAs[Map[String, Double]](colName)).toMap
}.toMap
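Against the sample data this should yield roughly (Map ordering aside):
// Map(column1 -> Map(A -> 0.3, d -> 0.0, c -> 0.2),
//     column2 -> Map(B -> 0.25, g -> 0.4, d -> 0.7, f -> 0.1))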
I'm not sure I follow the motivation, but I think this is the transformation that would get you the result you're after:
// collect from DF (by your assumption - it is small enough)
val data: Array[Row] = inputSmall.collect()
// Create the "column pairs" -
// can be replaced with hard-coded value: List(("column1", "transformedCol1"), ("column2", "transformedCol2"))
val columnPairs: List[(String, String)] = inputSmall.columns
.grouped(2)
.collect { case Array(k, v) => (k, v) }
.toList
// for each pair, get data and group it by left-column's value, choosing first match
val result: Map[String, Map[String, Double]] = columnPairs
.map { case (k, v) => k -> data.map(r => (r.getAs[String](k), r.getAs[Double](v))) }
.toMap
.mapValues(l => l.groupBy(_._1).map { case (c, l2) => l2.head })
result.foreach(println)
// prints:
// (column1,Map(A -> 0.3, d -> 0.0, c -> 0.2))
// (column2,Map(d -> 0.7, g -> 0.4, f -> 0.1, B -> 0.25))
Let us consider that I have a collection of employees as a List of tuples, where t._1 is the department id, t._2 is the salary, and t._3 is the employee's name.
val employees = List((1, 8000, "Sally"),(1, 9999, "Tom"), (2, 5000, "Pam"), (4, 500, "NK"), (4, 999, "Robert"))
Expected result: ((2,5000,Pam), (4,999,Robert), (1,9999,Tom))
I am trying the following, but I am getting an error:
val maxSal1 = employees.map(t => (t._1, (t._2, t._3))).groupBy(a => a._1).map(k => {
k._2.foldLeft(0, "dummy")((aa, bb) => {
if (aa._1 > bb._1) aa else bb
})
})
Don't overcomplicate things: avoid unnecessary operations and don't carry redundant information around. Just be explicit, and spell out the transformations you need at each step. Simplicity is your friend.
employees.groupBy(_._1).values.map(_.maxBy(_._2))
scala> List((1, 8000, "Sally"),(1, 9999, "Tom"), (2, 5000, "Pam"), (4, 500, "NK"), (4, 999, "Robert")).groupBy {
| case (dept, salary, employee) => dept
| }
res6: scala.collection.immutable.Map[Int,List[(Int, Int, String)]] = Map(2 -> List((2,5000,Pam)), 4 -> List((4,500,NK), (4,999,Robert)), 1 -> List((1,8000,Sally), (1,9999,Tom)))
scala> res6.map {
| case (dept, employees) => employees.maxBy(_._2)
| }
res5: scala.collection.immutable.Iterable[(Int, Int, String)] = List((2,5000,Pam), (4,999,Robert), (1,9999,Tom))
But note that maxBy throws on an empty collection:
scala> List[Int]().maxBy(x => x)
java.lang.UnsupportedOperationException: empty.maxBy
As a side note, I'd use a case class Employee with 3 fields rather than a tuple; I believe it's more readable.
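A minimal sketch of that case class variant (the field names are mine):
case class Employee(dept: Int, salary: Int, name: String)

val staff = List(
  Employee(1, 8000, "Sally"), Employee(1, 9999, "Tom"),
  Employee(2, 5000, "Pam"), Employee(4, 500, "NK"), Employee(4, 999, "Robert"))

staff.groupBy(_.dept).values.map(_.maxBy(_.salary))
// roughly: List(Employee(2,5000,Pam), Employee(4,999,Robert), Employee(1,9999,Tom))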
I tried this option and it seems to give the right result:
val maxsal1 = employees.map(t => (t._1, t._2, t._3)).groupBy(_._1).values.map(t => t.foldLeft((0, 1, "dummy"))((aa, bb) => {
if (aa._2 > bb._2) aa else bb
}))
Output: List((2,5000,Pam), (4,999,Robert), (1,9999,Tom))
If I have a list like this in Scala:
val list = List(
Map("val1" -> 1, "val2" -> 2),
Map("val1" -> 3, "val2" -> 4),
Map("val1" -> 5, "val2" -> 6),
Map("val1" -> 7, "val2" -> 8)
)
And I'd like to create another list whose elements match a certain condition, like this:
val newList = list map { el /*match (el("val1") < 5) here*/ =>
el /*if condition is met, add element to new list*/
}
Then the result would be something like this:
List(
Map("val1" -> 1, "val2" -> 2),
Map("val1" -> 3, "val2" -> 4)
)
Is something like this possible, and if so, how? I'd like to make this work from a functional programming perspective.
Use list.filter:
val filteredList = list.filter(_("val1") < 5)
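If you prefer a single pass that combines the predicate with a transformation (as in the map-based sketch from the question), collect is an equivalent alternative:
val filteredList = list.collect { case el if el("val1") < 5 => el }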