Replace column values when matching keys in a Map - scala

I have a dataframe with a column named source_system that has the values contained in the keys of this Map:
val convertSourceSystem = Map (
"HON_MUREX3FXFI" -> "MX3_FXFI",
"MAD_MUREX3FXFI" -> "MX3_FXFI",
"MEX_MUREX3FXFI" -> "MX3_LT",
"MX3BRASIL" -> "MX3_BR",
"MX3EUROPAEQ_MAD" -> "MX3_EQ",
"MX3EUROPAEQ_POL" -> "MX3_EQ",
"MXEUROPA_MAD" -> "MX2_EU",
"MXEUROPA_PT" -> "MX2_EU",
"MXEUROPA_UK" -> "MX2_EU",
"MXLATAM_CHI" -> "MX2_LT",
"MXLATAM_NEW" -> "MX2_LT",
"MXLATAM_SOV" -> "MX2_LT",
"POR_MUREX3FXFI" -> "MX3_FXFI",
"SHN_MUREX3FXFI" -> "MX3_FXFI",
"UK_MUREX3FXFI" -> "MX3_FXFI",
"SOV_MX3LATAM" -> "MX3_LT"
)
I need to replace them with the short codes. Using a foldLeft with withColumn gives me only null values, because each iteration replaces every value in the column and the last source_system is not in the map:
val ssReplacedDf = irisToCreamSourceSystem.foldLeft(tempDf) { (acc, filter) =>
acc.withColumn("source_system", when( col("source_system").equalTo(lit(filter._1)),
lit(filter._2)))
}

I would suggest another solution: join with the translation table.
// convert Map to a DataFrame
val convertSourceSystemDF = convertSourceSystem.toSeq.toDF("source_system","source_system_short")
tempDf.join(broadcast(convertSourceSystemDF),Seq("source_system"),"left")
// override column with short name, alternatively use withColumnRenamed
.withColumn("source_system",$"source_system_short")
.drop("source_system_short")
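The join amounts to a keyed lookup. Below is a minimal sketch of that lookup on a plain Scala Seq rather than a Spark DataFrame, so it runs without Spark. The object and method names are made up for illustration and the map is shortened. The sketch keeps the original value when a key has no mapping; in the DataFrame version this would roughly correspond to overriding the column with coalesce($"source_system_short", $"source_system"), assuming unmatched rows should stay unchanged rather than become null.

```scala
object SourceSystemLookup {
  // Shortened copy of the translation map from the question
  val convertSourceSystem: Map[String, String] = Map(
    "HON_MUREX3FXFI" -> "MX3_FXFI",
    "MX3BRASIL"      -> "MX3_BR",
    "MXEUROPA_UK"    -> "MX2_EU"
  )

  // Translate each value; a value without a mapping is kept as-is
  // (assumption: unmatched values should not turn into null)
  def translate(values: Seq[String]): Seq[String] =
    values.map(v => convertSourceSystem.getOrElse(v, v))
}
```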

Related

How do I take a list of keys that contains the value of the inputted key in the table in Scala

Given the following immutable Map("CAT" -> "ET", "BAT" -> "ET", "DIAMOND" -> "AHND", "HAT" -> "ET"), how do I take a list of keys that contains the value of the inputted key in the table in Scala? If inputted key is not in table, return an empty list.
My Attempt:
val word = "CAT"
val table = Map("CAT" -> "ET", "BAT" -> "ET", "DIAMOND" -> "AHND",
"HAT" -> "ET")
if (table.get(find).isDefined) {
List(table.get(find))
}
Input: "CAT"
Output: List("CAT", "BAT", "HAT")
//"CAT" has value "ET"
//Return list of keys that contains the value of the inputted key in the table
table.keys.filter(table(_) == table("CAT"))
Another option is to use collect to perform the filter and the map in one step.
val target = "CAT"
val table = Map(
"CAT" -> "ET",
"BAT" -> "ET",
"DIAMOND" -> "AHND",
"HAT" -> "ET"
)
table.get(target).map { find =>
table.collect { case(key, value) if (value == find) => key }
}
// res0: Option[scala.collection.immutable.Iterable[String]] = Some(List(CAT, BAT, HAT))
If the Map does not have any key that matches target, you will get a None.
One possible solution, where find is the value stored under the input key:
table.filter { case (_, v) => v == find }.keys.toList
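The answers above can be combined into one helper that also covers the "return an empty list" requirement. This is only a sketch; the object and method names are invented for illustration.

```scala
object KeysSharingValue {
  // All keys whose value equals the value stored under `key`.
  // Returns an empty list when `key` is not in the table.
  def keysWithSameValue(table: Map[String, String], key: String): List[String] =
    table.get(key) match {
      case Some(target) => table.collect { case (k, v) if v == target => k }.toList
      case None         => Nil
    }
}
```

The order of the returned keys follows the iteration order of the Map, which is not guaranteed in general, so sort the result if a stable order matters.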

How to combine all the values with the same key in Scala?

I have a map like this:
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl", ("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")
where the same first element of the key appears multiple times.
How can I regroup all the values that share the same first element of the key? This would turn the map into:
Map("functional" -> List("scala","perl"), "orientedObject" -> List("java","C++"))
UPDATE: This answer is based upon your original question. If you need the more complex Map definition, using a tuple as the key, then the other answers will address your requirements. You may still find this approach simpler.
As has been pointed out, you can't actually have multiple entries with the same key in a map. In the REPL, you'll note that your declaration becomes:
scala> val programming = Map("functional" -> "scala", "functional" -> "perl", "orientedObject" -> "java", "orientedObject" -> "C++")
programming: scala.collection.immutable.Map[String,String] = Map(functional -> perl, orientedObject -> C++)
So you end up missing some values. If you make this a List instead, you can get what you want as follows:
scala> val programming = List("functional" -> "scala", "functional" -> "perl", "orientedObject" -> "java", "orientedObject" -> "C++")
programming: List[(String, String)] = List((functional,scala), (functional,perl), (orientedObject,java), (orientedObject,C++))
scala> programming.groupBy(_._1).map(p => p._1 -> p._2.map(_._2)).toMap
res0: scala.collection.immutable.Map[String,List[String]] = Map(functional -> List(scala, perl), orientedObject -> List(java, C++))
Based on your edit, you have a data structure that looks something like this
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl",
("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")
and you want to scrap the numerical indices and group by the string key. Fortunately, Scala provides a built-in that gets you close.
programming groupBy { case ((k, _), _) => k }
This will return a new map which contains submaps of the original, grouped by the key that we return from the "partial" function. But we want a map of lists, so let's ignore the keys in the submaps.
programming groupBy { case ((k, _), _) => k } mapValues { _.values }
This gets us a map of... some kind of Iterable. But we really want lists, so let's take the final step and convert to a list.
programming groupBy { case ((k, _), _) => k } mapValues { _.values.toList }
You should try the .groupBy method
programming.groupBy(_._1._1)
and you will get
scala> programming.groupBy(_._1._1)
res1: scala.collection.immutable.Map[String,scala.collection.immutable.Map[(String, Int),String]] = Map(functional -> Map((functional,1) -> scala, (functional,2) -> perl), orientedObject -> Map((orientedObject,1) -> java, (orientedObject,2) -> C++))
you can now "clean" by doing something like:
scala> res1.mapValues(m => m.values.toList)
res3: scala.collection.immutable.Map[String,List[String]] = Map(functional -> List(scala, perl), orientedObject -> List(java, C++))
Read the CSV file and create a map from each key to the list of its values.
import scala.io.Source

val fileStream = getClass.getResourceAsStream("/keyvaluepair.csv")
val lines = Source.fromInputStream(fileStream).getLines()
// Collect (key, value) pairs in a List so that duplicate keys are kept
val codeMap: List[(String, String)] = lines.map { line =>
  val cols = line.split(",").map(_.trim)
  cols(0) -> cols(1)
}.toList
// Group the pairs by key and keep only the values
val res: Map[String, List[String]] = codeMap.groupBy(_._1).map(p => p._1 -> p._2.map(_._2))
Since no one has put in the specific ordering he asked for:
programming.groupBy(_._1._1)
.mapValues(_.toSeq.map { case ((t, i), l) => (i, l) }.sortBy(_._1).map(_._2))
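For reference, here is a sketch of the same grouping written with map instead of mapValues. Since mapValues is deprecated in Scala 2.13 in favour of .view.mapValues, this form should behave the same across versions. The object and method names are invented for illustration.

```scala
object GroupByFirstKey {
  // Group values by the first element of the tuple key,
  // ordering each group by the second (numeric) element
  def groupValues(m: Map[(String, Int), String]): Map[String, List[String]] =
    m.toList
      .groupBy { case ((name, _), _) => name }
      .map { case (name, entries) =>
        name -> entries.sortBy { case ((_, idx), _) => idx }.map(_._2)
      }
}
```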

Scala group By First Occurrence

I have the following data:
List(Map("weight"->"1000","type"->"abc","match"->"first"),Map("weight"->"1000","type"->"abc","match"->"third"),Map("weight"->"182","type"->"bcd","match"->"second"),Map("weight"->"150","type"->"bcd","match"->"fourth"),Map("weight"->"40","type"->"aaa","match"->"fifth"))
After grouping by "type" I want the results ordered by first occurrence: first "abc", then "bcd", then "aaa".
When I apply groupBy on "type", the resulting map gives "aaa" as the first key, whereas I want the first key to be "abc" with all its corresponding values.
How can I achieve this?
As mentioned in the comments you need a sorted map, which is not created by a simple group by. What you could do is:
import scala.collection.immutable.SortedMap
val a = List(Map("weight"->"1000","type"->"abc","match"->"first"), Map("weight"->"1000","type"->"abc","match"->"third"), Map("weight"->"182","type"->"bcd","match"->"second"), Map("weight"->"150","type"->"bcd","match"->"fourth"), Map("weight"->"40","type"->"aaa","match"->"fifth"))
val sortedGrouping = SortedMap.empty[String, List[Map[String, String]]] ++ a.groupBy { x => x("type") }
println(sortedGrouping)
What you get (printed) is:
Map(aaa -> List(Map(weight -> 40, type -> aaa, match -> fifth)), abc -> List(Map(weight -> 1000, type -> abc, match -> first), Map(weight -> 1000, type -> abc, match -> third)), bcd -> List(Map(weight -> 182, type -> bcd, match -> second), Map(weight -> 150, type -> bcd, match -> fourth)))
I don't think there is an out-of-the-box solution for what you are trying to achieve. A Map is used in two ways: to get a value by key, and to iterate over it. The first is unaffected by sort order, so groupBy works perfectly well. The second can be achieved by creating a list of keys in the required order and then using that list to get key-value pairs from the Map in that order. For example:
val xs = List(Map("weight"->"1000","type"->"abc","match"->"first"),
Map("weight"->"1000","type"->"abc","match"->"third"),
Map("weight"->"182","type"->"bcd","match"->"second"),
Map("weight"->"150","type"->"bcd","match"->"fourth"),
Map("weight"->"40","type"->"aaa","match"->"fifth"))
import collection.mutable.{ListBuffer, Set}
// List of types in the order of appearance in your list
val sortedKeys = xs.map(_("type")).distinct
//> sortedKeys: List[String] = List(abc, bcd, aaa)
Now when you need to iterate you simply do:
val grouped = xs.groupBy(_("type"))
sortedKeys.map(k => (k, grouped(k)))
// List[(String, List[scala.collection.immutable.Map[String,String]])] =
// List(
// (abc,List(Map(weight -> 1000, type -> abc, match -> first),
// Map(weight -> 1000, type -> abc, match -> third))),
// (bcd,List(Map(weight -> 182, type -> bcd, match -> second),
// Map(weight -> 150, type -> bcd, match -> fourth))),
// (aaa,List(Map(weight -> 40, type -> aaa, match -> fifth)))
// )
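If a map-like result is preferred over a list of pairs, the same idea can be wrapped in a ListMap, which iterates in insertion order. This is a sketch only; the object and method names are invented for illustration.

```scala
import scala.collection.immutable.ListMap

object OrderedGroupBy {
  // Group `xs` by `key`, keeping the groups in the order
  // in which each key first appears in `xs`
  def groupByFirstSeen[A, K](xs: List[A])(key: A => K): ListMap[K, List[A]] = {
    val grouped     = xs.groupBy(key)
    val orderedKeys = xs.map(key).distinct
    ListMap(orderedKeys.map(k => k -> grouped(k)): _*)
  }
}
```

Calling OrderedGroupBy.groupByFirstSeen(xs)(_("type")) on the list above should yield the keys in the order abc, bcd, aaa. ListMap lookups are linear in the number of keys, so this is best suited to a small number of groups.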

How to sort a list of maps in scala?

I query data from multiple tables, and each has a customized key. I put the data from these tables into a list of maps and want to sort it by the id value.
What I end up with is:
var g = groups.map(i => Map("id" -> i._1, "job" -> i._2))
var p = people.map(i => Map("id" -> i._1, "job" -> i._2))
var party = g ++ p
Which gives me:
var party = List(
Map(id -> 1, job -> Group1),
Map(id -> 2, job -> Group2),
Map(id -> 2>1, job -> Person1Group2),
Map(id -> 1>1, job -> Person1Group1),
Map(id -> 1>2, job -> Person2Group1)
)
But I want to sort by id so that I can populate a tree structure from it:
var party = List(
Map(id -> 1, job -> Group1),
Map(id -> 1>1, job -> Person1Group1),
Map(id -> 1>2, job -> Person2Group1),
Map(id -> 2, job -> Group2),
Map(id -> 2>1, job -> Person1Group2)
)
How do I do this?
A minor refactoring of the associations in each Map by using case classes may simplify the subsequent coding; consider
case class Item(id: String, job: String)
and so by using (immutable) values,
val g = groups.map(i => Item(i._1, i._2))
val p = people.map(i => Item(i._1, i._2))
Then
(g ++ p).sortBy(_.id)
yields a list of items sorted by id.
If you wish to group jobs by id, consider
(g ++ p).groupBy(_.id)
which delivers a Map from ids onto lists of items with common id. From this Map you can use mapValues to extract the actual jobs.
As hinted above, party.sortBy(_("id")) should do it.
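Sorting by the id as a String works for the sample data, but it compares character by character, so an id such as "10" would sort before "2". If the ids can grow past single digits, one option is to compare them as numeric paths. The sketch below assumes every segment between ">" is a non-negative integer; the names pathOf, pathOrdering and sortItems are invented for illustration.

```scala
object SortByPath {
  case class Item(id: String, job: String)

  // Parse "1>2" into Vector(1, 2); assumes every segment is an integer
  def pathOf(id: String): Vector[Int] =
    id.split(">").map(_.trim.toInt).toVector

  // Compare paths segment by segment; a parent sorts before its children
  val pathOrdering: Ordering[Vector[Int]] = new Ordering[Vector[Int]] {
    def compare(a: Vector[Int], b: Vector[Int]): Int = {
      val firstDiff = a.zip(b).collectFirst {
        case (x, y) if x != y => Integer.compare(x, y)
      }
      firstDiff.getOrElse(Integer.compare(a.length, b.length))
    }
  }

  def sortItems(items: List[Item]): List[Item] =
    items.sortBy(i => pathOf(i.id))(pathOrdering)
}
```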

Creating a Map of a Map

I'm very new to Scala so I apologize for asking stupid questions. I'm coming from scripting languages such as python, perl, etc. that let you get away w/ a lot.
How do I create a map which contains a map? In Python, I can create the following:
{ 'key': { 'data': 'value' }}
...or in perl
%hash = ( 'key' => { 'data' => 'value' } );
Also, what is the difference between Map and scala.collection.mutable/immutable.Map, or is there a difference?
A slightly simpler way to create a map of maps:
Map("german" -> Map(1 -> "eins", 2 -> "zwei"),
"english" -> Map(1 -> "one", 2 -> "two"))
This way you do not have to specify the type explicitly. Regarding the difference between immutable and mutable: Once you have created an immutable map, you cannot change it. You can only create a new map based on the old one with some of the elements changed.
In Scala, if you want to fill a Map at creation, you can create it this way:
val mapa = Map(key1 -> value1, key2 -> value2)
Another way would be:
var mapb = Map[Key, Value]()
mapb += key1 -> value1
A map of maps could be created this way:
var mapOfMaps = Map[String, Map[Int, String]]()
mapOfMaps += ("english" -> Map(1 -> "one", 2 -> "two"))
mapOfMaps += ("french" -> Map(1 -> "un", 2 -> "deux"))
mapOfMaps += ("german" -> Map(1 -> "eins", 2 -> "zwei"))
Note that the inner Map is immutable in this example.
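To make the immutable versus mutable distinction concrete, here is a small sketch with invented names. The unqualified Map is an alias for scala.collection.immutable.Map, so "changing" a nested entry means building a new outer map, while scala.collection.mutable.Map can be updated in place.

```scala
import scala.collection.mutable

object NestedMaps {
  // Immutable: `updated` returns a new map, the original is untouched
  val original: Map[String, Map[Int, String]] =
    Map("english" -> Map(1 -> "one"))
  val extended: Map[String, Map[Int, String]] =
    original.updated("english", original("english") + (2 -> "two"))

  // Mutable: both the outer and the inner maps are changed in place
  def buildMutable(): mutable.Map[String, mutable.Map[Int, String]] = {
    val m = mutable.Map.empty[String, mutable.Map[Int, String]]
    // getOrElseUpdate creates the inner map on first use
    m.getOrElseUpdate("german", mutable.Map.empty[Int, String]).update(1, "eins")
    m.getOrElseUpdate("german", mutable.Map.empty[Int, String]).update(2, "zwei")
    m
  }
}
```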