For exemple:
mylist: Map("Start" -> 2015-05-30T00:00:00.000Z, "Daily" -> 2015-06-02T00:00:00.000Z, "Hourly" -> 2015-06-03T08:00:00.000Z, "End" -> 2015-06-04T15:00:00.000Z)
I want to output as following format:
myout: List( ("Start" -> 2015-05-30T00:00:00.000Z, "Daily" -> 2015-06-02T00:00:00.000Z), ("Daily" -> 2015-06-02T00:00:00.000Z, "Hourly" -> 2015-06-03T08:00:00.000Z), ("Hourly" -> 2015-06-03T08:00:00.000Z, "End" -> 2015-06-04T15:00:00.000) )
OR
myout: List( ("Start", "Daily"), ("Daily", "Hourly"), ("Hourly", "End"))
Case 1: Always start with "Start" key, Anything comes before "Start" key ignore it. Same for last "End" key
mylist: Map(Hourly -> 2015-06-01T08:00:00.000Z, Start -> 2015-05-30T00:00:00.000Z, Daily -> 2015-06-02T00:00:00.000Z, End -> 2015-06-04T15:00:00.000Z, Weekly-> 2015-06-05T00:00:00.000Z)
output should like:
List((Start, Daily), (Daily, End))
I am looking output using scala.
import scala.collection.immutable.ListMap
val x = ListMap("Start" -> "x", "Daily" -> "y", "Hourly" -> "z", "End" -> "a")
x.toList.sliding(2).map( a => (a(0)._1, a(1)._1)).toList
List((Start,Daily), (Daily,Hourly), (Hourly,End))
Since a Map is not ordered, I have modified the input data to get stable results.
As for the 1st question
val m =
Map(
"1-Start" -> "2015-05-30T00:00:00.000Z",
"2-Daily" -> "2015-06-02T00:00:00.000Z",
"3-Hourly" -> "2015-06-03T08:00:00.000Z",
"4-End" -> "2015-06-04T15:00:00.000Z")
The basic idea is to zip the list of keys with its own tail to get the pairs:
scala> m.keys.toList.sorted.zip(m.keys.toList.sorted.tail)
res57: List[(String, String)] = List((1-Start,2-Daily), (2-Daily,3-Hourly),
(3-Hourly,4-End))
To simplify the expression a "pipe forward operator" is helpful:
object PipeForwardContainer {
implicit class PipeForward[T](val v: T) extends AnyVal {
def |>[R](f: T => R): R = {
f(v)
}
}
}
import PipeForwardContainer._
This operator provides a reference to the intermediate result. Therefore you can write:
scala> m.keys.toList.sorted |> { l => l.zip(l.tail) }
res97: List[(String, String)] = List((1-Start,2-Daily), (2-Daily,3-Hourly),
(3-Hourly,4-End))
As for the 2nd question
val m =
Map(
"1-Hourly" -> "2015-06-03T08:00:00.000Z",
"2-Start" -> "2015-05-30T00:00:00.000Z",
"3-Daily" -> "2015-06-02T00:00:00.000Z",
"4-End" -> "2015-06-04T15:00:00.000Z",
"5-Weekly"-> "2015-06-05T00:00:00.000Z")
To get the raw list you can slice out the relevant elements by index:
scala> m.keys.toList.sorted |> { l =>
l.slice(l.indexOf("2-Start"), l.indexOf("4-End") + 1) }
res96: List[String] = List(2-Start, 3-Daily, 4-End)
Again with zip to get the pairs:
scala> m.keys.toList.sorted |> { l =>
l.slice(l.indexOf("2-Start"), l.indexOf("4-End") + 1)
} |> { l => l.zip(l.tail) }
res98: List[(String, String)] = List((1-Start,2-Daily), (2-Daily,3-Hourly),
(3-Hourly,4-End))
Related
Imagine the following list of maps (which could be potentially longer):
List(
Map[String,String]("wind"->"high", "rain"->"heavy", "class"->"very late"),
Map[String,String]("wind"->"none", "rain"->"slight", "class"->"on time"),
Map[String,String]("wind"->"high", "rain"->"none", "class"->"very late"),
...
)
How can I get to the following form:
Map("very late" -> Set(("wind",Map("high" -> 2)), ("rain",Map("heavy" -> 1, "none" -> 1))),
"on time" -> Set(("wind",Map("none" -> 1)), ("rain",Map("slight" -> 1))))
This will get you what you want.
val maps = List(...)
maps.groupBy(_.getOrElse("class","no-class"))
.mapValues(_.flatMap(_ - "class").groupBy(_._1)
.mapValues(_.map(_._2).groupBy(identity)
.mapValues(_.length)
).toSet
)
The problem is, what you want isn't a good place to be.
The result type is Map[String,Set[(String,Map[String,Int])]] which is a terrible hodgepodge of collection types. Why Set? What purpose does that serve? How is this useful? How do you retrieve meaningful data from it?
This looks like an XY problem.
Here are two versions.
First using Set, which looks like what you want,
val grouped2 = maps.foldLeft(Map.empty[String, Set[(String, Map[String, Int])]]) {
case (acc, map) =>
map.get("class").fold(acc) { key =>
val keyOpt = acc.get(key)
if (keyOpt.isDefined) {
val updatedSet = (map - "class").foldLeft(Set.empty[(String, Map[String, Int])]) {
case (setAcc, (k1, v1)) =>
keyOpt.flatMap(_.find(_._1 == k1)).map { tup =>
setAcc + ((k1, tup._2.get(v1).fold(tup._2 ++ Map(v1 -> 1))(v => tup._2 + ((v1, v + 1)))))
}.getOrElse(setAcc + (k1 -> Map(v1 -> 1)))
}
acc.updated(key, updatedSet)
} else {
acc + (key -> (map - "class").map(tup => (tup._1, Map(tup._2 -> 1))).toSet)
}
}
}
and then using a Map,
val grouped1 = maps.foldLeft(Map.empty[String, Map[String, Map[String, Int]]]) {
case (acc, map) =>
map.get("class").fold(acc) { key =>
val keyOpt = acc.get(key)
if (keyOpt.isDefined) {
val updatedMap = (map - "class").foldLeft(Map.empty[String, Map[String, Int]]) {
case (mapAcc, (k1, v1)) =>
keyOpt.flatMap(_.get(k1)).map{ statMap =>
mapAcc + ((k1, statMap.get(v1).fold(statMap ++ Map(v1 -> 1))(v => statMap + ((v1, v + 1)))))
}.getOrElse(mapAcc + (k1 -> Map(v1 -> 1)))
}
acc.updated(key, updatedMap)
} else {
acc + (key -> (map - "class").map(tup => (tup._1, Map(tup._2 -> 1))))
}
}
}
I was playing with Map version and changed it to Set. In a few days, I don't think I will understand everything above just by looking at it. So, I have tried to make it as understandable as possible for me. Adapt this to your own solution or wait for others.
Here goes another implementation, which uses the keySet method of Map to join two maps together.
Also, note that I changed the output type from a Map to a Set of tuples whose second value is another Map. To three nested Maps which, IMHO, makes more sense.
def groupMaps[K, V](groupingKey: K, data: List[Map[K, V]]): Map[V, Map[K, Map[V, Int]]] =
data.foldLeft(Map.empty[V, Map[K, Map[V, Int]]]) {
case (acc, map) =>
map.get(key = groupingKey).fold(ifEmpty = acc) { groupingValue =>
val newValues = (map - groupingKey).map {
case (key, value) =>
key -> Map(value -> 1)
}
val finalValues = acc.get(key = groupingValue).fold(ifEmpty = newValues) { oldValues =>
(oldValues.keySet | newValues.keySet).iterator.map { key =>
val oldMap = oldValues.getOrElse(key = key, default = Map.empty[V, Int])
val newMap = newValues.getOrElse(key = key, default = Map.empty[V, Int])
val finalMap = (oldMap.keySet | newMap.keySet).iterator.map { value =>
val oldCount = oldMap.getOrElse(key = value, default = 0)
val newCount = newMap.getOrElse(key = value, default = 0)
value -> (oldCount + newCount)
}.toMap
key -> finalMap
}.toMap
}
acc.updated(key = groupingValue, finalValues)
}
}
Which can be used like this:
val maps =
List(
Map("wind" -> "none", "rain" -> "none", "class" -> "on time"),
Map("wind" -> "none", "rain" -> "slight", "class" -> "on time"),
Map("wind" -> "none", "rain" -> "slight", "class" -> "late"),
Map("wind" -> "none", "rain" -> "slight")
)
val result = groupMaps(groupingKey = "class", maps)
// val result: Map[Strig, Map[String, Map[String, Int]]] =
// Map(
// on time -> Map(wind -> Map(none -> 2), rain -> Map(none -> 1, slight -> 1)),
// late -> Map(wind -> Map(none -> 1), rain -> Map(slight -> 1))
// )
If you need to maintain the output type you asked for, then you can just do a .mapValue(_.toSet) at the end of the foldLeft
You can see the code running here
I'm working with a map object in scala where the key is a basket ID and the value is a set of item ID's contained within a basket. The goal is to ingest this map object and compute for each basket, a set of other basket ID's that contain at least one common item.
Say the input map object is
val basket = Map("b1" -> Set("i1", "i2", "i3"), "b2" -> Set("i2", "i4"), "b3" -> Set("i3", "i5"), "b4" -> Set("i6"))
Is it possible to perform the computation in spark such that I get the intersecting basket information back? For example
val intersects = Map("b1" -> Set("b2", "b3"), "b2" -> Set("b1"), "b3" -> Set("b1"), "b4" -> Set())
Thanks!
Something like...
val basket = Map("b1" -> Set("i1", "i2", "i3"), "b2" -> Set("i2", "i4"), "b3" -> Set("i3", "i5"), "b4" -> Set("i6"))
def intersectKeys( set : Set[String], map : Map[String,Set[String]] ) : Set[String] = {
val checks = map.map { case (k, v) =>
if (set.intersect(v).nonEmpty) Some(k) else None
}
checks.collect { case Some(k) => k }.toSet
}
// each set picks up its own key, which we don't want, so we subtract it back out
val intersects = basket.map { case (k,v) => (k, intersectKeys(v, basket) - k) }
I have a map like :
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl", ("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")
with the same first element of key appearing multiple times.
How to regroup all the values corresponding to the same first element of key ? Which would turn this map into :
Map("functional" -> List("scala","perl"), "orientedObject" -> List("java","C++"))
UPDATE: This answer is based upon your original question. If you need the more complex Map definition, using a tuple as the key, then the other answers will address your requirements. You may still find this approach simpler.
As has been pointed out, you can't actually have multiple keys with the same value in a map. In the REPL, you'll note that your declaration becomes:
scala> val programming = Map("functional" -> "scala", "functional" -> "perl", "orientedObject" -> "java", "orientedObject" -> "C++")
programming: scala.collection.immutable.Map[String,String] = Map(functional -> perl, orientedObject -> C++)
So you end up missing some values. If you make this a List instead, you can get what you want as follows:
scala> val programming = List("functional" -> "scala", "functional" -> "perl", "orientedObject" -> "java", "orientedObject" -> "C++")
programming: List[(String, String)] = List((functional,scala), (functional,perl), (orientedObject,java), (orientedObject,C++))
scala> programming.groupBy(_._1).map(p => p._1 -> p._2.map(_._2)).toMap
res0: scala.collection.immutable.Map[String,List[String]] = Map(functional -> List(scala, perl), orientedObject -> List(java, C++))
Based on your edit, you have a data structure that looks something like this
val programming = Map(("functional", 1) -> "scala", ("functional", 2) -> "perl",
("orientedObject", 1) -> "java", ("orientedObject", 2) -> "C++")
and you want to scrap the numerical indices and group by the string key. Fortunately, Scala provides a built-in that gets you close.
programming groupBy { case ((k, _), _) => k }
This will return a new map which contains submaps of the original, grouped by the key that we return from the "partial" function. But we want a map of lists, so let's ignore the keys in the submaps.
programming groupBy { case ((k, _), _) => k } mapValues { _.values }
This gets us a map of... some kind of Iterable. But we really want lists, so let's take the final step and convert to a list.
programming groupBy { case ((k, _), _) => k } mapValues { _.values.toList }
You should try the .groupBy method
programming.groupBy(_._1._1)
and you will get
scala> programming.groupBy(_._1._1)
res1: scala.collection.immutable.Map[String,scala.collection.immutable.Map[(String, Int),String]] = Map(functional -> Map((functional,1) -> scala, (functional,2) -> perl), orientedObject -> Map((orientedObject,1) -> java, (orientedObject,2) -> C++))
you can now "clean" by doing something like:
scala> res1.mapValues(m => m.values.toList)
res3: scala.collection.immutable.Map[String,List[String]] = Map(functional -> List(scala, perl), orientedObject -> List(java, C++))
Read the csv file and create a map that contains key and list of values.
val fileStream = getClass.getResourceAsStream("/keyvaluepair.csv")
val lines = Source.fromInputStream(fileStream).getLines
var mp = Seq[List[(String, String)]]();
var codeMap=List[(String, String)]();
var res = Map[String,List[String]]();
for(line <- lines )
{
val cols=line.split(",").map(_.trim())
codeMap ++= Map(cols(0)->cols(1))
}
res = codeMap.groupBy(_._1).map(p => p._1 -> p._2.map(_._2)).toMap
Since no one has put in the specific ordering he asked for:
programming.groupBy(_._1._1)
.mapValues(_.toSeq.map { case ((t, i), l) => (i, l) }.sortBy(_._1).map(_._2))
I am using scala to implement an algorithm. I have a case where I need to implement such scenario:
test = Map(t -> List((t,2)), B -> List((B,3), (B,1)), D -> List((D,1)))
I need to some the second member of every common tuples.
The desired result :
Map((t,2),(B,4),(D,1))
val resReduce = test.foldLeft(Map.empty[String, List[Map.empty[String, Int]]){(count, tup) => count + (tup -> (count.getOrElse(tup, 0) + 1))
I am trying to use "Reduce", I have to go through every group I did and sum their second member. Any idea how to do that.
If you know that all lists are nonempty and start with the same key (e.g. they were produced by groupBy), then you can just
test.mapValues(_.map(_._2).sum).toMap
Alternatively, you might want an intermediate step that allows you to perform error-checking:
test.map{ case(k,xs) =>
val v = {
if (xs.exists(_._1 != k)) ??? // Handle key-mismatch case
else xs.reduceOption((l,r) => l.copy(_2 = l._2 + r._2))
}
v.getOrElse(??? /* Handle empty-list case */)
}
You could do something like this:
test collect{
case (key, many) => (key, many.map(_._2).sum)
}
wherein you do not have to assume that the list has any members. However, if you want to exclude empty lists, add a guard
case (key, many) if many.nonEmpty =>
like that.
scala> val test = Map("t" -> List(("t",2)), "B" -> List(("B",3), ("B",1)), "D" -> List(("D",1)))
test: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(t -> List((t,2)), B -> List((B,3), (B,1)), D -> List((D,1)))
scala> test.map{case (k,v) => (k, v.map(t => t._2).sum)}
res32: scala.collection.immutable.Map[String,Int] = Map(t -> 2, B -> 4, D -> 1)
Yet another approach, in essence quite similar to what has already been suggested,
implicit class mapAcc(val m: Map[String,List[(String,Int)]]) extends AnyVal {
def mapCount() = for ( (k,v) <- m ) yield { (k,v.map {_._2}.sum) }
}
Then for a given
val test = Map("t" -> List(("t",2)), "B" -> List(("B",3), ("B",1)), "D" -> List(("D",1)))
a call
test.mapCount()
delivers
Map(t -> 2, B -> 4, D -> 1)
I have the list of Scala Tuples2 ,which i have to group them . I currently use the following way to perform it.
var matches:List[Tuple2[String,Int]]
var m = matches.toSeq.groupBy(i=>i._1).map(t=>(t._1,t._2)).toSeq.sortWith(_._2.size>_._2.size).sortWith(_._2.size>_._2.size)
The above grouping gives me
Seq[(String,Seq[(String,Int)])]
but i would like to have
Seq[(String,Seq[Int])]
I would like to know is there any better and efficient way to the same.
First off, some thoughts:
// You should use `val` instead of `var`
var matches: List[Tuple2[String, Int]] = List("a" -> 1, "a" -> 2, "b" -> 3, "c" -> 4, "c" -> 5)
var m = matches
.toSeq // This isn't necessary: it's already a Seq
.groupBy(i => i._1)
.map(t => (t._1, t._2)) // This isn't doing anything at all
.toSeq
.sortWith(_._2.size > _._2.size) // `sortBy` will reduce redundancy
.sortWith(_._2.size > _._2.size) // Not sure why you have this twice since clearly the
// second sorting isn't doing anything...
So try this:
val matches: List[Tuple2[String, Int]] = List("a" -> 1, "a" -> 2, "b" -> 3, "c" -> 4, "c" -> 5)
val m: Seq[(String, Seq[Int])] =
matches
.groupBy(_._1)
.map { case (k, vs) => k -> vs.map(_._2) } // Drop the String part of the value
.toVector
.sortBy(_._2.size)
println(m) // Vector((b,List(3)), (a,List(1, 2)), (c,List(4, 5)))