I have two Seqs that I want to use to add columns to a dataframe.
Seq one is something like:
Seq("red", "blue", "green", "yellow", "violet")
and Seq two is something like:
Seq("child", "teen", "adult", "senior")
I also have a string column in the format s"$color+$age-score=$score", containing every combination of the colors and ages with a resulting score, so 20 different color-age scores.
Currently, I am doing something like
finalDF.withColumn("red_child", getScore("red", "child"))
.withColumn("red_teen", getScore("red", "teen"))
.withColumn("red_adult", getScore("red", "adult"))
and so on, for all 20 possible combinations, with getScore being a helper function that takes care of the regex.
Since I am using withColumn 20 times, it makes the code very hard to read. I am wondering if there is any way to make this code look more clean, using the two Seqs for color and age to loop and add the columns to the dataframe.
Thanks.
You can simply select additional columns derived from the list of tuples generated by a for-comprehension, as shown below:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col

val colors = Seq("red", "blue", "green", "yellow", "violet")
val ageGroups = Seq("child", "teen", "adult", "senior")
val colPairs = for { c <- colors; a <- ageGroups } yield (c, a)

def getScore(c: String, a: String): Column = ???

df.select(df.columns.map(col) ++ colPairs.map { case (c, a) =>
  getScore(c, a).as(c + "_" + a)
}: _*)
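For completeness, here is a minimal sketch of what getScore might look like, assuming the raw string sits in a single column named "scores" (a made-up name here) and follows the color+age-score=value format from the question:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, regexp_extract}

// Hypothetical sketch: pull out the digits that follow "<color>+<age>-score="
// from the assumed "scores" column and cast them to an integer column
// (assumes the score itself is an integer).
def getScore(c: String, a: String): Column =
  regexp_extract(col("scores"), s"$c\\+$a-score=(\\d+)", 1).cast("int")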
Alternatively, use foldLeft to traverse the colPairs list to add columns via withColumn:
colPairs.foldLeft(df) { case (accDF, (c, a)) =>
  accDF.withColumn(c + "_" + a, getScore(c, a))
}
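Either way works; building all the new columns in a single select generally keeps the query plan flatter than chaining withColumn twenty times, so the select version is usually preferable when adding many derived columns.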
Related
I have two Scala lists with the same number and type of elements, like so:
val x = List("a", "b", "c")
val y = List("1", "2", "3")
The result I want is as follows:
List("a1", "b2", "c3")
How can this be done in Scala? I could figure this out using mutable structures but I think that would be unidiomatic for Scala.
Combine zip and map:
x zip y map { case (a, b) => a + b }
Strangely enough, this also works:
x zip y map (_.productIterator.mkString)
but I would strongly prefer the first version.
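A related variant (not from the original answer, but a common alternative) avoids materialising the intermediate list of pairs by zipping lazily:

// Same result without building the List of tuples first.
// On Scala 2.12 and earlier:
(x, y).zipped map (_ + _)        // List("a1", "b2", "c3")
// On Scala 2.13+, zipped is replaced by lazyZip:
x.lazyZip(y).map(_ + _)          // List("a1", "b2", "c3")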
Suppose I'm doing something like the following:
val a = complicatedChainOfSteps("c")
val b = complicatedChainOfSteps("d")
I'm interested in writing code like the following (to reduce code and copy/paste errors):
val Seq(a, b) = Seq("c", "d").map(complicatedChainOfSteps(_))
but having the compiler ensure that the number of elements matches, so the following don't compile:
val Seq(a, b) = Seq("c", "d", "e").map(s => s + s)
val Seq(a, b) = Seq("c").map(s => s + s)
I know that using tuples instead to ensure that the number of elements matches works when performing multiple assignment (e.g., val (a, b) = ("c", "d")), but you cannot map over tuples (which makes sense because they have heterogeneous types).
I also know I can just define my own types for sequence of 2 elements and sequence of 3 elements or whatever, but is there a convenient built in way of doing this? If not, what's the simplest way to define a type that is a sequence of a specific number of elements?
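There is no built-in fixed-arity sequence in the standard library, but as a rough sketch of the "define my own type" idea mentioned above (Two and its field names are made up for illustration), a two-element container with a map is enough to get the compile-time arity check:

// Hypothetical two-element holder: the arity is part of the type, so a
// mismatch is a compile error rather than a runtime MatchError.
case class Two[A](first: A, second: A) {
  def map[B](f: A => B): Two[B] = Two(f(first), f(second))
}

val Two(a, b) = Two("c", "d").map(complicatedChainOfSteps)
// Two("c", "d", "e") does not compile, so the mismatch is caught statically.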
Is it possible to use keys with wildcards for Scala Maps? For example tuples of the form (x,_)? Example:
scala> val t1 = ("x","y")
scala> val t2 = ("y","x")
scala> val m = Map(t1 -> "foo", t2 -> "foo")
scala> m(("x","y"))
res5: String = foo
scala> m(("x",_))
<console>:11: error: missing parameter type for expanded function ((x$1) => scala.Tuple2("x", x$1))
m(("x",_))
^
It would be great if there was a way to retrieve all (composite_key, value) pairs where only some part of the composite key is defined. Are there other ways to get the same functionality in Scala?
How about using collect?
Map(1 -> "1" -> "11", 2 -> "2" -> "22").collect { case (k @ (1, _), v) => k -> v }
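Applied to the m from the question, the same pattern keeps just the entries whose key starts with "x":

m.collect { case (k @ ("x", _), v) => k -> v }   // Map((x,y) -> foo)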
Using a for-comprehension like this:
for (a @ ((k1, k2), v) <- m if k1 == "x") yield a
In general, you can do something like
m.filter(e => e._1 == "x")
but in your particular example it will still return only one result, because a Map has only one entry per key. If your key itself is composite then it will indeed make more sense:
scala> Map((1,2)->"a", (1,3)->"b", (3,4)->"c").filter(m => (m._1._1 == 1))
res0: scala.collection.immutable.Map[(Int, Int),String] = Map((1,2) -> a, (1,3) -> b)
Think about what is happening under the hood of the Map. The default Map in Scala is scala.collection.immutable.HashMap, which stores things based on their hash codes. Do ("x", "y") and ("x", "y2") have hash codes that relate to each other in any way? No, they don't, so there is no efficient way to implement wildcards with this map. The other answers provide solutions, but they iterate over every key/value pair in the entire Map, which is not efficient.
If you expect you are going to want to do operations like this, use a TreeMap. This doesn't use a hash table internally, but instead puts elements into a tree based on an ordering. This is similar to the way a relational database uses B-Trees for its indices. Your wildcard query is like using a two-column index to filter on the first column in the index.
Here is an example:
import scala.collection.immutable.TreeMap
val t1 = ("x","y")
val t2 = ("x","y2")
val t3 = ("y","x")
val m = TreeMap(t1 -> "foo1", t2 -> "foo2", t3 -> "foo3")
// "" is < than all other strings
// "x\u0000" is the next > string after "x"
val submap = m.from(("x", "")).to(("x\u0000", ""))
submap.values.foreach(println) // prints foo1, foo2
I currently have two lists, List('a','b','a') and List(45,65,12), with many more elements; the elements of the second list are linked to the elements of the first list by a key-value relationship. I want to combine elements with the same key by adding their corresponding values, creating a map which should look like Map('a' -> 57, 'b' -> 65), since 57 = 45 + 12.
I have currently implemented it as
val keys = List('a','b','a')
val values = List(45,65,12)
val finalMap: scala.collection.mutable.Map[Char, Int] =
  scala.collection.mutable.Map().withDefaultValue(0)
0 until keys.length map (w => finalMap(keys(w)) += values(w))
I feel that there should be a better (functional) way of creating the desired map than how I am doing it. How could I improve my code and do the same thing in a more functional way?
val m = keys.zip(values).groupBy(_._1).mapValues(l => l.map(_._2).sum)
EDIT: To explain how the code works, zip pairs corresponding elements of two input sequences, so
keys.zip(values) = List((a, 45), (b, 65), (a, 12))
Now you want to group together all the pairs with the same first element. This can be done with groupBy:
keys.zip(values).groupBy(_._1) = Map((a, List((a, 45), (a, 12))), (b, List((b, 65))))
groupBy returns a map whose keys are the type being grouped on, and whose values are a list of the elements in the input sequence with the same key.
The keys of this map are the characters in keys, and the values are the lists of associated pairs from keys and values. Since the keys are already the ones you want in the output map, you only need to transform each value from List[(Char, Int)] to List[Int].
You can do this by summing the values from the second element of each pair in the list.
You can extract the values from each pair using map e.g.
List((a, 45), (a, 12)).map(_._2) = List(45,12)
Now you can sum these values using sum:
List(45, 12).sum = 57
You can apply this transform to all the values in the map using mapValues to get the result you want.
I was going to +1 Lee's first version, but mapValues is a view, and the variable name l always looks like a 1 to me. Just mentioning that so as not to seem petty.
scala> (keys zip values) groupBy (_._1) map { case (k,v) => (k, (v map (_._2)).sum) }
res0: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57)
Hey, the answer with fold disappeared. You can't blink on SO, the action is so fast.
I'm going to +1 Lee's typing speed anyway.
Edit: to explain how mapValues is a view:
scala> keys.zip(values).groupBy(_._1).mapValues(l => l.map { v =>
| println("OK mapping")
| v._2
| }.sum)
OK mapping
OK mapping
OK mapping
res2: scala.collection.immutable.Map[Char,Int] = Map(b -> 65, a -> 57)
scala> res2('a') // recomputes
OK mapping
OK mapping
res4: Int = 57
Sometimes that is what you want, but often it is surprising. I think there is a puzzler for it.
You were actually on the right track to a reasonably efficient functional solution. If we just switch to an immutable collection and use a fold on a key-value zip, we get:
( Map[Char,Int]() /: (keys, values).zipped ) ( (m, kv) =>
  m + ( kv._1 -> ( m.getOrElse(kv._1, 0) + kv._2 ) )
)
Or you could use withDefaultValue 0, as you did, if you want the final map to have that default. Note that .zipped is faster than zip because it doesn't create an intermediate collection. And a groupBy would create a number of other intermediate collections. Of course it may not be worth optimizing, and if it is you could do even better than this, but I wanted to show you that your line of thinking wasn't far off the mark.
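For readers less used to the /: symbol, here is the same fold spelled with foldLeft over a plain zip (an equivalent sketch, just slightly less economical with intermediate collections):

keys.zip(values).foldLeft(Map[Char, Int]()) { case (m, (k, v)) =>
  m + (k -> (m.getOrElse(k, 0) + v))
}
// -> Map(a -> 57, b -> 65)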
I am just starting out in Scala and for my first project, I am writing a Sudoku solver. I came across a great site explaining Sudoku and how to go about writing a solver: http://norvig.com/sudoku.html and from this site I am trying to create the corresponding Scala code.
The squares of a Sudoku grid are basically the cross product of the row names and the column names; this can be generated really easily in Python using a list comprehension:
# cross("AB", "12") = ["A1", "A2", "B1", "B2"]
def cross(A, B):
    "Cross product of elements in A and elements in B."
    return [a+b for a in A for b in B]
It took me awhile to think about how to do this elegantly in Scala, and this is what I came up with:
// cross("AB", "12") => List[String]("A1", "A2", "B1", "B2")
def cross(r: String, c: String) = {
  for (i <- r; j <- c) yield i + "" + j
}.toList
I was just curious if there is a better way to doing this in Scala? It would seem much cleaner if I could do yield i + j but that results in an Int for some reason. Any comments or suggestions would be appreciated.
Yes, addition on Char values is defined by adding their integer equivalents. I think your code is fine. You could also use string interpolation and spare the toList (you will get an immutable indexed sequence instead, which is just fine):
def cross(r: String, c: String) = for(i <- r; j <- c) yield s"$i$j"
EDIT
An IndexedSeq is at least as powerful as a List. Just check how you use the result afterwards: does it require a List, e.g. do you want to use head and tail or pattern match with ::? If not, there is no reason to enforce a List. If you use map and flatMap on the input arguments instead of the for syntactic sugar, you can pass the collection.breakOut argument to build a List directly:
def cross(r: String, c: String): List[String] =
  r.flatMap(i => c.map(j => s"$i$j"))(collection.breakOut)
Not as pretty, but faster than an extra toList.
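As a quick sanity check (assuming Scala 2.12, where collection.breakOut still exists), both versions produce the same elements:

cross("AB", "12")
// interpolation version: an immutable IndexedSeq, typically Vector(A1, A2, B1, B2)
// breakOut version:      List(A1, A2, B1, B2)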