In my application there are many places where I need to take a list of tuples, group it by the first element of each tuple, and drop that element from the grouped values. For example, given the tuples
(1, "Joe", "Account"), (1, "Tom", "Employer"), (2, "John", "Account"), the result should be Map(1 -> List(("Joe", "Account"), ("Tom", "Employer")), 2 -> List(("John", "Account")))
This is easily implemented as
data.groupBy(_._1).map { case (k, v) => k -> v.map(f => (f._2, f._3)) }
But I am looking for a general solution, because the tuples can have different arities: 2, 3, 4 or even 7.
I think Shapeless or Scalaz can help, but I have little experience with those libraries. Could you point me to an example?
This is easily implemented using shapeless (for simplicity, I won't be generalizing it for all collection types). There's a specific type class for tuples, called IsComposite, that can deconstruct them into head and tail:
import shapeless.ops.tuple.IsComposite
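def groupTail[P, H, T](tuples: List[P])(
    implicit ic: IsComposite.Aux[P, H, T]): Map[H, List[T]] = {
  tuples
    .groupBy(ic.head)                             // group by the first element
    .map { case (k, vs) => (k, vs.map(ic.tail)) } // keep only the tail of each tuple
}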
This works for your case:
val data =
List((1, "Joe", "Account"), (1, "Tom", "Employer"), (2, "John", "Account"))
assert {
  groupTail(data) == Map(
    1 -> List(("Joe", "Account"), ("Tom", "Employer")),
    2 -> List(("John", "Account"))
  )
}
As well as for Tuple4 of different types:
val data2 = List((1, 1, "a", 'a), (1, 2, "b", 'b), (2, 1, "a", 'b))
assert {
  groupTail(data2) == Map(
    1 -> List((1, "a", 'a), (2, "b", 'b)),
    2 -> List((1, "a", 'b))
  )
}
Runnable code is available at Scastie
I have a peculiar case where I want to declare simple configuration like so
val config = List((("a", "b", "c"), ("first")),
(("d", "e"), ("second")),
(("f"), ("third")))
At run time, I would like to have a map with the following entries:
"a" -> "first"
"b" -> "first"
"c" -> "first"
"d" -> "second"
"e" -> "second"
"f" -> "third"
Using toMap, I was able to convert the config to a Map
scala> config.toMap
res42: scala.collection.immutable.Map[,String] = Map((a,b,c) -> first, (d,e) -> second, f -> third)
But I am not able to figure out how to flatten the list of keys into keys so I get the final desirable form. How do I solve this?
If you structure your config using List the code is very simple:
val config = List(
(List("a", "b", "c"), ("first")),
(List("d", "e"), ("second")),
(List("f"), ("third")))
config.flatMap { case (k, v) => k.map(_ -> v) }.toMap
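The List-based approach can be wrapped in a small generic helper so it works for any key and value types. This is only a sketch: `expandKeys` and `ConfigFlatten` are hypothetical names chosen for illustration, not part of any library.

```scala
object ConfigFlatten {
  // expandKeys is a hypothetical helper name used for illustration only.
  // It emits one (key -> value) pair per key and collects the pairs into a Map.
  def expandKeys[K, V](config: List[(List[K], V)]): Map[K, V] =
    config.flatMap { case (keys, value) => keys.map(_ -> value) }.toMap

  // Each entry pairs a list of keys with the single value they share.
  val config = List(
    (List("a", "b", "c"), "first"),
    (List("d", "e"), "second"),
    (List("f"), "third"))

  val flattened: Map[String, String] = expandKeys(config)
}
```

Note that if the same key appears in more than one group, `toMap` keeps the value from the last group that mentions it.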
You can try the solution below:
val config = List(
(("a", "b", "c"), ("first")),
(("d", "e"), ("second")),
(("f"), ("third")))
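// Turn each tuple key into its string form, e.g. "(a,b,c)", and split it into the individual keys
val result = config.map {
  case (k, v) =>
    (k.toString().replace(")", "")
      .replace("(", "")
      .split(","), v)
}
// Pair every individual key with its value
val res = result.flatMap {
  case (key, value) => key.map { data =>
    (data, value)
  }
}.toMap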
If you change the config structure to something like the one below, the solution is much simpler:
val config1 = List(
  (List("a", "b", "c"), "first"),
  (List("d", "e"), "second"),
  (List("f"), "third")
)
val res1 = config1.flatMap {
  case (k, v) => k.map { data => (data, v) }
}.toMap
I think the above answers are good practical answers. If you're in a situation where you have no control over the input and you're stuck with Tuples instead of Lists, I'd do it this way:
val result: Map[String, String] = config.flatMap {
  case (s: String, v) => List(s -> v)          // single key
  case (ks: Product, v) => ks.productIterator.collect { case s: String => s -> v } // tuple of keys
  case _ => Nil //Prevent throwing
}.toMap
This will throw away anything that's not a String in the keys.
Alternatively, by using the built-in Spark SQL functions:
val config = List((Array("a", "b", "c"), ("first")),
(Array("d", "e"), ("second")),
(Array("f"), ("third"))).toDF(List("col1","col2") : _*)
I wrote the following code to merge two maps and update the common keys. Is there a better way to write this?
case class Test(index: Int, min: Int, max: Int, aggMin: Int, aggMax: Int)
def mergeMaps(oldMap: Map[Int, Test], newMap: Map[Int, Test]): Map[Int, Test] = {
val intersect: Map[Int, Test] = oldMap.keySet.intersect(newMap.keySet)
.map(indexKey => indexKey -> (Test(newMap(indexKey).index, newMap(indexKey).min, newMap(indexKey).max,
oldMap(indexKey).aggMin.min(newMap(indexKey).aggMin), oldMap(indexKey).aggMax.max(newMap(indexKey).aggMax)))).toMap
  val merge = (oldMap ++ newMap ++ intersect)
  merge
}
Here is my test case
it("test my case"){
val oldMap = Map(10 -> Test(10, 1, 2, 1, 2), 25 -> Test(25, 3, 4, 3, 4), 46 -> Test(46, 3, 4, 3, 4), 26 -> Test(26, 1, 2, 1, 2))
val newMap = Map(32 -> Test(32, 5, 6, 5, 6), 26 -> Test(26, 5, 6, 5, 6))
val result = mergeMaps(oldMap, newMap)
//Total element count should be map 1 elements + map 2 elements minus the common keys (4 + 2 - 1)
assert(result.size == 5)
//Common key element aggMin and aggMax should be updated, keep min aggMin and max aggMax from 2 common key elements and keep min and max of second map key
assert(result.get(26).get.aggMin == 1)//min aggMin -> min(1,5)
assert(result.get(26).get.aggMax == 6)//max aggMax -> max(2,6)
assert(result.get(26).get.min == 5)// 5 from second map
assert(result.get(26).get.max == 6)//6 from second map
}
Here's a slightly different take on a solution.
def mergeMaps(oldMap :Map[Int,Test], newMap :Map[Int,Test]) :Map[Int,Test] =
  (oldMap.values ++ newMap.values)
    .groupBy(_.index)   // old values come first within each group
    .map{ case (k,v) =>
      k -> v.reduceLeft((a,b) =>
        Test(k, b.min, b.max, a.aggMin min b.aggMin, a.aggMax max b.aggMax))
    }
I could have followed the groupBy() with mapValues() instead of map(), but mapValues() returns a lazy view over the grouped map rather than a strict Map.
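The same merge can also be expressed as a fold over the new map, which avoids building the intersection separately. This is a minimal sketch under the assumption that, for common keys, index, min and max come from the new entry while aggMin and aggMax are combined; `MergeSketch` is a hypothetical wrapper name.

```scala
object MergeSketch {
  case class Test(index: Int, min: Int, max: Int, aggMin: Int, aggMax: Int)

  // Start from oldMap and fold every entry of newMap into it.
  def mergeMaps(oldMap: Map[Int, Test], newMap: Map[Int, Test]): Map[Int, Test] =
    newMap.foldLeft(oldMap) { case (acc, (key, fresh)) =>
      val merged = acc.get(key) match {
        // Common key: keep the new index/min/max, combine the aggregates.
        case Some(old) =>
          fresh.copy(
            aggMin = old.aggMin min fresh.aggMin,
            aggMax = old.aggMax max fresh.aggMax)
        // Key only present in newMap: take the new entry as-is.
        case None => fresh
      }
      acc.updated(key, merged)
    }
}
```

Entries that exist only in oldMap are never touched, so they pass through unchanged.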
Another version to do the same task.
def mergeMaps(oldMap: Map[Int, Test], newMap: Map[Int, Test]): Map[Int, Test] = {
(newMap ++ oldMap).map(key => {
val _newMapData = newMap.get(key._1)
if (_newMapData.isDefined) {
val _newMapDataValue = _newMapData.get
val oldMapValue = key._2
val result = Test(_newMapDataValue.index, _newMapDataValue.min, _newMapDataValue.max,
oldMapValue.aggMin.min(_newMapDataValue.aggMin), oldMapValue.aggMax.max(_newMapDataValue.aggMax))
(key._1 -> result)
    } else (key._1 -> key._2)
  })
}
How can I convert a rather small data frame in Spark (max 300 MB) to a nested map in order to improve Spark's DAG? I believe this operation will be quicker than a join later on (Spark's dynamic DAG is a lot slower than, and different from, a hard-coded DAG), as the transformed values were created during the train step of a custom estimator. Now I just want to apply them quickly during the predict step of the pipeline.
val inputSmall = Seq(
("A", 0.3, "B", 0.25),
("A", 0.3, "g", 0.4),
("d", 0.0, "f", 0.1),
("d", 0.0, "d", 0.7),
("A", 0.3, "d", 0.7),
("d", 0.0, "g", 0.4),
("c", 0.2, "B", 0.25)).toDF("column1", "transformedCol1", "column2", "transformedCol2")
This gives the wrong type of map
val inputToMap = => Map(*))
I would rather want something like:
Map[String, Map[String, Double]]("column1" -> Map("A" -> 0.3, "d" -> 0.0, ...), "column2" -> Map("B" -> 0.25, "g" -> 0.4, ...))
Edit: removed collect operation from final map
If you are using Spark 2+, here's a suggestion:
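import org.apache.spark.sql.functions.map

// Build one single-entry map column per (value, transformed value) pair
val inputToMap = inputSmall.select(
  map($"column1", $"transformedCol1").as("column1"),
  map($"column2", $"transformedCol2").as("column2")
)
val cols = inputToMap.columns
val localData = inputToMap.collect
// Merge the per-row maps of each column into one local map
cols.map { colName =>
  colName -> localData.flatMap(_.getAs[Map[String, Double]](colName)).toMap
}.toMap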
I'm not sure I follow the motivation, but I think this is the transformation that would get you the result you're after:
// collect from DF (by your assumption - it is small enough)
val data: Array[Row] = inputSmall.collect()
// Create the "column pairs" -
// can be replaced with hard-coded value: List(("column1", "transformedCol1"), ("column2", "transformedCol2"))
val columnPairs: List[(String, String)] = inputSmall.columns
  .grouped(2)
  .collect { case Array(k, v) => (k, v) }
  .toList
// for each pair, get data and group it by left-column's value, choosing first match
val result: Map[String, Map[String, Double]] = columnPairs
  .map { case (k, v) => k -> data.map(r => (r.getAs[String](k), r.getAs[Double](v))) }
  .toMap
  .mapValues(l => l.groupBy(_._1).map { case (c, l2) => l2.head })
result.foreach(println)
// prints:
// (column1,Map(A -> 0.3, d -> 0.0, c -> 0.2))
// (column2,Map(d -> 0.7, g -> 0.4, f -> 0.1, B -> 0.25))
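Once the rows are on the driver, the reshaping itself needs nothing from Spark. The sketch below shows the same nested-map construction on plain Scala collections, assuming the collected rows are represented as 4-tuples in the column order of the question; `NestedMapSketch`, `rows` and `lookup` are hypothetical names.

```scala
object NestedMapSketch {
  // Stand-in for the collected rows:
  // (column1, transformedCol1, column2, transformedCol2)
  val rows: Seq[(String, Double, String, Double)] = Seq(
    ("A", 0.3, "B", 0.25),
    ("A", 0.3, "g", 0.4),
    ("d", 0.0, "f", 0.1),
    ("d", 0.0, "d", 0.7),
    ("A", 0.3, "d", 0.7),
    ("d", 0.0, "g", 0.4),
    ("c", 0.2, "B", 0.25))

  // One inner map per original column; duplicate keys collapse in toMap,
  // which is harmless here because each key always maps to the same value.
  val lookup: Map[String, Map[String, Double]] = Map(
    "column1" -> rows.map(r => r._1 -> r._2).toMap,
    "column2" -> rows.map(r => r._3 -> r._4).toMap)
}
```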
Suppose I have a collection of employees as a List of tuples, where t._1 is the department id, t._2 is the salary and t._3 is the name of the employee. I want the highest-paid employee of each department.
val eployees = List((1, 8000, "Sally"),(1, 9999, "Tom"), (2, 5000, "Pam"), (4, 500, "NK"), (4, 999, "Robert"))
Expected result: ((2,5000,Pam), (4,999,Robert), (1,9999,Tom))
I am trying the following, but I am getting an error:
val maxSal1 = eployees.map(t => (t._1, (t._2, t._3))).groupBy(a => a._1).map(k => {
  k._2.foldLeft(0, "dummy")((aa, bb) => {
    if (aa._1 > bb._1) aa else bb
  })
})
Don't overcomplicate things, avoid doing unnecessary operations, and carrying redundant information around. Just be explicit, and spell out the transformations you need at each step. Simplicity is your friend.
scala> List((1, 8000, "Sally"),(1, 9999, "Tom"), (2, 5000, "Pam"), (4, 500, "NK"), (4, 999, "Robert")).groupBy {
| case (dept, salary, employee) => dept
| }
res6: scala.collection.immutable.Map[Int,List[(Int, Int, String)]] = Map(2 -> List((2,5000,Pam)), 4 -> List((4,500,NK), (4,999,Robert)), 1 -> List((1,8000,Sally), (1,9999,Tom)))
scala> res6.map {
| case (dept, employees) => employees.maxBy(_._2)
| }
res5: scala.collection.immutable.Iterable[(Int, Int, String)] = List((2,5000,Pam), (4,999,Robert), (1,9999,Tom))
But note that maxBy is a partial function:
scala> List[Int]().maxBy(x => x)
java.lang.UnsupportedOperationException: empty.maxBy
As a side note, I'd use case class Employee with 3 fields rather than a tuple. I believe it's more readable.
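A minimal sketch of the case class variant follows. The field names (`dept`, `salary`, `name`) and the helper `highestPaid` are assumptions made for illustration. Groups produced by groupBy are never empty, so maxBy is safe there; for a list that may be empty, reduceOption avoids the exception.

```scala
object MaxSalarySketch {
  case class Employee(dept: Int, salary: Int, name: String)

  val employees = List(
    Employee(1, 8000, "Sally"), Employee(1, 9999, "Tom"),
    Employee(2, 5000, "Pam"), Employee(4, 500, "NK"), Employee(4, 999, "Robert"))

  // groupBy never yields an empty group, so maxBy cannot throw here.
  val topPaid: Map[Int, Employee] =
    employees.groupBy(_.dept).map { case (dept, staff) => dept -> staff.maxBy(_.salary) }

  // For input that may be empty, return an Option instead of throwing.
  def highestPaid(staff: List[Employee]): Option[Employee] =
    staff.reduceOption((a, b) => if (a.salary >= b.salary) a else b)
}
```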
I tried this option and it seems to give the expected result:
val maxsal1 = eployees.map(t => (t._1, t._2, t._3)).groupBy(_._1).values.map(t => t.foldLeft((0, 1, "dummy"))((aa, bb) => {
  if (aa._2 > bb._2) aa else bb
}))
Output: List((2,5000,Pam), (4,999,Robert), (1,9999,Tom))
If I have a list like this in scala:
val list = List(
Map("val1" -> 1, "val2" -> 2),
Map("val1" -> 3, "val2" -> 4),
Map("val1" -> 5, "val2" -> 6),
  Map("val1" -> 7, "val2" -> 8)
)
And I would like to create another list containing only the elements that match a certain condition, like this:
val newList = list map { el /*match (el("val1") < 5) here*/ =>
  el /*if condition is met, add element to new list*/
}
Then the result would be something like this:
List(
  Map("val1" -> 1, "val2" -> 2),
  Map("val1" -> 3, "val2" -> 4)
)
Is something like this possible, and if so, how? I'd like to make this work from a functional programming perspective.
Use list.filter:
val filteredList = list.filter(_("val1") < 5)
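One caveat worth illustrating: `_("val1")` throws a NoSuchElementException if some map in the list has no "val1" key. The sketch below shows the filter from the answer next to a lenient variant that simply drops such maps; `FilterSketch`, `strict` and `lenient` are hypothetical names.

```scala
object FilterSketch {
  val list = List(
    Map("val1" -> 1, "val2" -> 2),
    Map("val1" -> 3, "val2" -> 4),
    Map("val1" -> 5, "val2" -> 6),
    Map("val1" -> 7, "val2" -> 8))

  // Keeps maps whose "val1" is below 5; throws if a map has no "val1" key.
  val strict = list.filter(_("val1") < 5)

  // Same condition, but maps without a "val1" key are dropped instead of throwing.
  val lenient = list.filter(_.get("val1").exists(_ < 5))
}
```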