val adjList = Map("Logging" -> List("Networking", "Game"))
// val adjList: Map[String, List[String]] = Map(Logging -> List(Networking, Game))
adjList.flatMap { case (v, vs) => vs.map(n => (v, n)) }.toList
// val res7: List[(String, String)] = List((Logging,Game))
adjList.map { case (v, vs) => vs.map(n => (v, n)) }.flatten.toList
// val res8: List[(String, String)] = List((Logging,Networking), (Logging,Game))
I am not sure what is happening here. I was expecting the same result from both of them.
.flatMap here is Map's .flatMap: the lambda returns key/value pairs, so the result is rebuilt as a Map. .map here is effectively Iterable's .map: the lambda returns a List rather than a pair, so the result cannot be a Map and stays an Iterable of Lists.
In a Map, "Logging" -> "Networking" and "Logging" -> "Game" collapse into just the latter, "Logging" -> "Game", because they share the same key.
val adjList: Map[String, List[String]] = Map("Logging" -> List("Networking", "Game"))
val x0: Map[String, String] = adjList.flatMap { case (v, vs) => vs.map(n => (v, n)) }
//Map(Logging -> Game)
val x: List[(String, String)] = x0.toList
//List((Logging,Game))
val adjList: Map[String, List[String]] = Map("Logging" -> List("Networking", "Game"))
val y0: immutable.Iterable[List[(String, String)]] = adjList.map { case (v, vs) => vs.map(n => (v, n)) }
//List(List((Logging,Networking), (Logging,Game)))
val y1: immutable.Iterable[(String, String)] = y0.flatten
//List((Logging,Networking), (Logging,Game))
val y: List[(String, String)] = y1.toList
//List((Logging,Networking), (Logging,Game))
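If the goal is all (vertex, neighbour) pairs, a common fix is to convert the Map to a List before flatMapping, so the result is no longer a Map and duplicate keys are never collapsed (a minimal sketch using the same adjList):

```scala
val adjList = Map("Logging" -> List("Networking", "Game"))

// .toList switches to List's flatMap, so pairs that share a first
// element are all kept instead of overwriting each other in a Map.
val edges: List[(String, String)] =
  adjList.toList.flatMap { case (v, vs) => vs.map(n => (v, n)) }
// List((Logging,Networking), (Logging,Game))
```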
See also: https://users.scala-lang.org/t/map-flatten-flatmap/4180
Related
I am trying to reverse a map that has a String as its key and a set of numbers as its value.
My goal is to create a list of tuples, each pairing a number with the list of strings whose value set contained that number.
I have this so far:
def flipMap(toFlip: Map[String, Set[Int]]): List[(Int, List[String])] = {
toFlip.flatMap(_._2).map(x => (x, toFlip.keys.toList)).toList
}
but it is pairing every Int with the full list of Strings
val map = Map(
"A" -> Set(1,2),
"B" -> Set(2,3)
)
should produce:
List((1, List(A)), (2, List(A, B)), (3, List(B)))
but is producing:
List((1, List(A, B)), (2, List(A, B)), (3, List(A, B)))
This works too, but it's not exactly the shape you asked for, and you may need some conversions to get the exact data type you need:
toFlip.foldLeft(Map.empty[Int, Set[String]]) {
case (acc, (key, numbersSet)) =>
numbersSet.foldLeft(acc) {
(updatingMap, newNumber) =>
updatingMap.updatedWith(newNumber) {
case Some(existingSet) => Some(existingSet + key)
case None => Some(Set(key))
}
}
}
I used a Set to avoid duplicate insertions into the inner collection, and a Map instead of the outer List for faster lookups.
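For example, one such conversion turns the Map[Int, Set[String]] the fold produces into the List[(Int, List[String])] shape the question asks for (a sketch; flipped stands in for the fold's result):

```scala
val flipped: Map[Int, Set[String]] =
  Map(1 -> Set("A"), 2 -> Set("A", "B"), 3 -> Set("B"))

// Turn each Set of keys into a sorted List and order by number,
// matching the List[(Int, List[String])] shape from the question.
val asList: List[(Int, List[String])] =
  flipped.toList
    .map { case (n, keys) => (n, keys.toList.sorted) }
    .sortBy(_._1)
// List((1, List(A)), (2, List(A, B)), (3, List(B)))
```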
You can do something like this:
def flipMap(toFlip: Map[String, Set[Int]]): List[(Int, List[String])] =
toFlip
.toList
.flatMap {
case (key, values) =>
values.map(value => value -> key)
}.groupMap(_._1)(_._2)
.view
.mapValues(_.distinct)
.toList
Note: I personally would return a Map instead of a List.
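Wired together as a self-contained check against the question's example (assumes Scala 2.13+ for groupMap; the results are sorted only to make the comparison independent of Map iteration order):

```scala
def flipMap(toFlip: Map[String, Set[Int]]): List[(Int, List[String])] =
  toFlip
    .toList
    .flatMap { case (key, values) => values.map(value => value -> key) }
    .groupMap(_._1)(_._2) // number -> all keys whose set contained it
    .view
    .mapValues(_.distinct)
    .toList

val result = flipMap(Map("A" -> Set(1, 2), "B" -> Set(2, 3)))

// Map iteration order is unspecified, so normalize before comparing.
val sorted = result.sortBy(_._1).map { case (n, ks) => (n, ks.sorted) }
// List((1, List(A)), (2, List(A, B)), (3, List(B)))
```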
Or, if you have cats in scope (import cats.syntax.all._ for combineAll):
def flipMap(toFlip: Map[String, Set[Int]]): Map[Int, Set[String]] =
toFlip.view.flatMap {
case (key, values) =>
values.map(value => Map(value -> Set(key)))
}.toList.combineAll
// both scala2 & scala3
scala> map.flatten{ case(k, s) => s.map(v => (k, v)) }.groupMapReduce{ case(k, v) => v }{case(k, v) => List(k)}{ _ ++ _ }
val res0: Map[Int, List[String]] = Map(1 -> List(A), 2 -> List(A, B), 3 -> List(B))
// scala3 only
scala> map.flatten((k, s) => s.map(v => (k, v))).groupMapReduce((k, v) => v)((k, v) => List(k))( _ ++ _ )
val res1: Map[Int, List[String]] = Map(1 -> List(A), 2 -> List(A, B), 3 -> List(B))
import scala.collection.mutable

val valueCountsMap: mutable.Map[String, Int] = mutable.Map[String, Int]()
valueCountsMap("a") = 1
valueCountsMap("b") = 1
valueCountsMap("c") = 1
val maxOccurredValueNCount: (String, Int) = valueCountsMap.maxBy(_._2)
// maxOccurredValueNCount: (String, Int) = (b,1)
How can I get None when there is no clear winner from maxBy on the values? I am wondering if there's any native solution already implemented for Scala mutable Maps.
No, there's no native solution for what you've described.
Here's how I might go about it.
implicit class UniqMax[K,V:Ordering](m: Map[K,V]) {
def uniqMaxByValue: Option[(K,V)] = {
m.headOption.fold(None:Option[(K,V)]){ hd =>
val ev = implicitly[Ordering[V]]
val (count, max) = m.tail.foldLeft((1,hd)) {case ((c, x), v) =>
if (ev.gt(v._2, x._2)) (1, v)
else if (v._2 == x._2) (c+1, x)
else (c, x)
}
if (count == 1) Some(max) else None
}
}
}
Usage:
Map("a"->11, "b"->12, "c"->11).uniqMaxByValue //res0: Option[(String, Int)] = Some((b,12))
Map(2->"abc", 1->"abx", 0->"ab").uniqMaxByValue //res1: Option[(Int, String)] = Some((1,abx))
Map.empty[Long,Boolean].uniqMaxByValue //res2: Option[(Long, Boolean)] = None
Map('c'->2.2, 'w'->2.2, 'x'->2.1).uniqMaxByValue //res3: Option[(Char, Double)] = None
This is my code:
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setMaster("local").setAppName("My app")
  val sc = new SparkContext(conf)
  val inputFile = "D:/test.txt"
  val inputData = sc.textFile(inputFile)
  val DupleRawData = inputData.map(_.split("\\<\\>").toList)
    .map(s => (s(8), s(18)))
    .map(s => (s, 1))
    .reduceByKey(_ + _)
  val UserShopCount = DupleRawData.groupBy(s => s._1._1)
    .map(s => (s._1, s._2.toList.sortBy(z => z._2).reverse))
  val ResultSet = UserShopCount.map(s => (s._1, s._2.take(1000).map(z => (z._1._2, z._2))))
  ResultSet.foreach(println)
  //(aaa,List((100,4), (200,4), (300,3), (800,1)))
  //(bbb,List((100,6), (400,5), (500,4)))
  //(ccc,List((300,7), (400,6), (700,3)))
  // this is where I am now
}
and this is the result I'm getting:
(aaa,List((100,4), (200,4), (300,3), (800,1)))
(bbb,List((100,6), (400,5), (500,4)))
(ccc,List((300,7), (400,6), (700,3)))
The final result RDD I want is:
// val ResultSet: org.apache.spark.rdd.RDD[(String, List[(String, Int)])]
(aaa, List((200,4), (800,1))) // 100 and 300 are removed because they also appear under bbb and ccc
(bbb, List((500,4))) // 100 and 400 also appear under aaa and ccc
(ccc, List((700,3))) // 300 and 400 also appear under aaa and bbb
Please give me a solution or some advice... sincerely.
Here is my attempt:
val data: Seq[(String, List[(Int, Int)])] = Seq(
("aaa",List((1,4), (2,4), (3,3), (8,1))),
("bbb",List((1,6), (4,5), (5,4))),
("ccc",List((3,7), (6,6), (7,3)))
)
val uniqKeys = data.flatMap {
  case (_, v) => v.map(_._1)
}.groupBy(identity).filter(_._2.size == 1).keySet

val result = data.map {
  case (pk, v) =>
    val finalValue = v.filter {
      case (k, _) => uniqKeys.contains(k)
    }
    (pk, finalValue)
}
Output:
result: Seq[(String, List[(Int, Int)])] = List((aaa,List((2,4), (8,1))), (bbb,List((4,5), (5,4))), (ccc,List((6,6), (7,3))))
I am assuming your ResultSet is an RDD[(String, List[(Int, Int)])]
val zeroVal1: (Long, String, (Int, Int)) = (Long.MaxValue, "", (0, 0))
val zeroVal2: List[(String, (Int, Int))] = List()
val yourNeededRdd = ResultSet
.zipWithIndex()
.flatMap({
  case ((key, list), index) => list.map(t => (t._1, (index, key, t)))
})
.aggregateByKey(zeroVal1)(
(t1, t2) => { if (t1._1 <= t2._1) t1 else t2 },
(t1, t2) => { if (t1._1 <= t2._1) t1 else t2 }
)
.map({ case (t_1, (index, key, t)) => (key, t) })
.aggregateByKey(zeroVal2)(
(l, t) => { t :: l },
(l1, l2) => { l1 ++ l2 }
)
I have an RDD that looks like:
uidProcessedKeywords: org.apache.spark.rdd.RDD[(Long, Map[String,Double])]
How do I flatten the map in the RDD to get this:
org.apache.spark.rdd.RDD[(Long, String, Double)]
val x = sc.parallelize(List((2, Map("a" -> 0.2, "b" -> 0.3))))
x.flatMap {
case (id, m) => m.map { case (k, v) => (id, k, v)}
}
.collect()
res1: Array[(Int, String, Double)] = Array((2,a,0.2), (2,b,0.3))
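The same flatMap reshaping works on plain Scala collections, which makes it easy to sanity-check the logic without a SparkContext (a small sketch):

```scala
val pairs = List((2L, Map("a" -> 0.2, "b" -> 0.3)))

// Explode each (id, map) pair into one (id, key, value) triple per map entry.
val flat: List[(Long, String, Double)] =
  pairs.flatMap { case (id, m) => m.map { case (k, v) => (id, k, v) } }
// List((2,a,0.2), (2,b,0.3))
```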
So I'm getting into Scala and I have a question if what I want to do is possible, or if there's a better way.
I want to be able to take as a parameter a Map whose keys are either strings, or 0-args function returning a string. So for example
def main(args: Array[String]): Unit = {
  val f = F(
    Map(
      "key" -> "value",
      "key2" -> (() => "value2")
    )
  )
  println(f("key"))
}
case class F(arg: Map[String, ???]) {
  def apply(s: String): String = arg(s)
}
This obviously doesn't compile. Is there any way to do this?
In this case you can use scala.Either
scala> val map: Map[String, Either[String, () => String]] = Map.empty
map: Map[String,Either[String,() => String]] = Map()
scala> map + ("key" -> Left("value"))
res0: scala.collection.immutable.Map[String,Either[String,() => String]] = Map(key -> Left(value))
scala> res0("key")
res1: Either[String,() => String] = Left(value)
scala> map + ("key2" -> Right(() => "value2"))
res2: scala.collection.immutable.Map[String,Either[String,() => String]] = Map(key2 -> Right(<function0>))
scala> res2("key2")
res3: Either[String,() => String] = Right(<function0>)
Update
You can hide the implementation from the caller using something like this:
def toEither[T <: Any : Manifest](x: T): Either[String, () => String] =
x match {
case x: String => Left(x)
case x: Function0[String] if manifest[T] <:< manifest[() => String] => Right(x)
case _ => throw new IllegalArgumentException
}
The actual type parameter of Function0 is eliminated by type erasure, but it can be verified with a Manifest:
scala> map + ("key" -> toEither("value"))
res1: scala.collection.immutable.Map[String,Either[String,() => String]] = Map(key -> Left(value))
scala> map + ("key2" -> toEither(() => "value2"))
res2: scala.collection.immutable.Map[String,Either[String,() => String]] = Map(key2 -> Right(<function0>))
scala> res2("key2").right.get()
res3: String = value2
scala> map + ("key2" -> toEither(() => 5))
java.lang.IllegalArgumentException
scala> map + ("key2" -> toEither(false))
java.lang.IllegalArgumentException
Update2
As @Submonoid rightly corrected me in the comments below, there is a much simpler way of dealing with Either:
type StringOrFun = Either[String, () => String]
implicit def either(x: String): StringOrFun = Left(x)
implicit def either(x: () => String): StringOrFun = Right(x)
val m: Map[String, StringOrFun] = Map("key" -> "value", "key2" -> (() => "value2"))
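To read a value back out, you can fold over the Either, returning the plain string or forcing the thunk (a sketch; lookup is a helper name I've introduced):

```scala
type StringOrFun = Either[String, () => String]

val m: Map[String, StringOrFun] =
  Map("key" -> Left("value"), "key2" -> Right(() => "value2"))

// Left already holds a String; Right holds a thunk that must be applied.
def lookup(m: Map[String, StringOrFun], k: String): String =
  m(k).fold(s => s, f => f())
// lookup(m, "key") == "value"; lookup(m, "key2") == "value2"
```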
Alternatively, you can wrap any string in a function which evaluates to that string:
implicit def delayed[A](a: A): () => A = () => a
val m = Map[String, () => String]("a" -> "b", "c" -> (() => "d"))