How to find the tuple with a unique value in a list using Scala?

I have the following list:
val list = List(("name1",20),("name2",20),("name1",30),("name2",30),
("name3",40),("name3",30),("name3",20))
I want the following output:
List(("name3",40))
I tried the following:
val distElements = list.map(_._2).distinct
list.groupBy(_._1).map{ case(k,v) =>
val h = v.map(_._2)
if(distElements.equals(h)) List.empty else distElements.diff(h)
}.flatten
But this is not what I am looking for.
Can anybody give me an answer or a hint on how to get the expected output?

I understand the question as looking for the element of the list whose _2 (number) occurs only once.
val list = List(("name1",20),("name2",20),("name1",30),("name2",30),
("name3",40),("name3",30),("name3",20))
First you group by the _2 element, which gives you a map whose keys are the distinct _2 values and whose values are lists of all elements sharing that _2:
val g = list.groupBy(_._2) // Map[Int, List[(String, Int)]]
Now you can pick out those entries whose list consists of only one element:
val opt = g.collectFirst { // Option[(String, Int)]
case (_, single :: Nil) => single
}
Or, if you expect that more than one value may occur only once:
val col = g.collect { // Map[String, Int]
case (_, single :: Nil) => single
}
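Because groupBy returns a Map, the order of the original list is not preserved in the result. A minimal sketch of an order-preserving variant, assuming the goal is "keep every tuple whose number occurs exactly once" (the names counts and unique are only illustrative):

```scala
val list = List(("name1", 20), ("name2", 20), ("name1", 30), ("name2", 30),
  ("name3", 40), ("name3", 30), ("name3", 20))

// How often each number (_2) occurs in the list.
val counts: Map[Int, Int] =
  list.groupBy(_._2).map { case (value, group) => value -> group.size }

// Filter the original list, so its order is kept.
val unique: List[(String, Int)] =
  list.filter { case (_, value) => counts(value) == 1 }
```

Here 20 and 30 each occur three times and 40 occurs once, so unique contains only ("name3", 40).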

Seems to me that you're looking to match against both the value of the left hand and the right hand at the same time while also preserving the type of collection you're looking at, a List. I would use collect:
val out = myList.collect{
  case item @ ("name3", 40) => item
}
which combines a PartialFunction with filter-like and map-like behaviour. It drops every value for which the PartialFunction is not defined and maps the values that do match. Here, the pattern only allows a single exact match.
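To make the "filter plus map" reading of collect concrete, here is a small sketch that uses a guard instead of a fixed value. The threshold 25 and the names viaCollect and viaFilter are assumptions chosen only for illustration:

```scala
val list = List(("name1", 20), ("name2", 20), ("name1", 30), ("name2", 30),
  ("name3", 40), ("name3", 30), ("name3", 20))

// collect keeps only the elements matched by the pattern (and its guard).
val viaCollect: List[(String, Int)] =
  list.collect { case item @ ("name3", n) if n > 25 => item }

// The same selection written as an explicit filter.
val viaFilter: List[(String, Int)] =
  list.filter { case (name, n) => name == "name3" && n > 25 }
```

Both expressions select ("name3", 40) and ("name3", 30), in list order.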

Related

How to create listBuffer in collect function

I thought that a List would be enough, but I need to add elements to my list.
I've tried to pass this to the ListBuffer constructor, but without success.
var leavesValues: ListBuffer[Double] =
leaves
.collect { case leaf: Leaf => leaf.value.toDouble }
.toList
Later on I'm going to add values to my list, so my expected output is a mutable list.
Solution of Raman Mishra
But what if I need to append a single value to the end of leavesValues?
I can reverse the list, but that's not good enough.
I can use ListBuffer like below, but I believe that there is a cleaner solution:
val leavesValues: ListBuffer[Double] = ListBuffer()
leavesValues.appendAll(leaves
.collect { case leaf: Leaf => leaf.value.toDouble }
.toList)
case class Leaf(value:String)
val leaves = List(Leaf("5"), Leaf("6"), Leaf("7"), Leaf("8") ,Leaf("9") )
val leavesValues: List[Double] =
leaves
.collect { case leaf: Leaf => leaf.value.toDouble }
val value = Leaf("10").value.toDouble
val answer = value :: leavesValues
println(answer)
You can do it like this: after getting the list of leavesValues, you can prepend the value you want to add to the list.
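Prepending with :: puts the new value at the front. If the value has to go at the end and the collection should stay mutable, one option is to fill a ListBuffer and append to it. This is a sketch under that assumption; the helper name leafValues and its extra parameter are hypothetical:

```scala
import scala.collection.mutable.ListBuffer

final case class Leaf(value: String)

// Builds a mutable buffer from the leaves, then appends one more value.
def leafValues(leaves: List[Leaf], extra: Double): ListBuffer[Double] = {
  val buffer = ListBuffer.empty[Double]
  buffer ++= leaves.map(_.value.toDouble) // bulk append, keeps list order
  buffer += extra                         // append a single value at the end
  buffer
}

val leavesValues = leafValues(List(Leaf("5"), Leaf("6"), Leaf("7")), 10.0)
```

For an immutable List, `list :+ value` also appends at the end, but it copies the list on every call.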

How to use map / flatMap on a scala Map?

I have two sequences, i.e. prices: Seq[Price] and overrides: Seq[Override]. I need to do some magic on them, yet only for a subset based on a shared id.
So I grouped each of them into a Map via groupBy:
val pricesById = prices.groupBy(_.someId) // Map[Int, Seq[Price]]
val overridesById = overrides.groupBy(_.someId) // Map[Int, Seq[Override]]
I expected to be able to create my wanted sequence via flatMap:
val applyOverrides: (Int, Seq[Price]) => Seq[Price] = (someId, prices) => {
val applicableOverrides = overridesById.getOrElse(someId, Seq())
magicMethod(prices, applicableOverrides) // returns Seq[Price]
}
val myPrices: Seq[Price] = pricesById.flatMap(applyOverrides)
I expected myPrices to contain just one big Seq[Price].
Yet I get a weird type mismatch within the flatMap method, mentioning NonInferedB, that I am unable to resolve.
In Scala, a Map is a collection of (key, value) tuples, so each entry is passed to your function as a single tuple rather than as two separate arguments.
The function for flatMap hence expects only one parameter, namely the tuple (key, value), and not two parameters key, value.
Since you can access the first element of a tuple via _1, the second via _2 and so on, you can call your function like so:
val pricesWithMagicApplied = pricesById.flatMap(tuple =>
  applyOverrides(tuple._1, tuple._2)
)
Another approach is to use case matching:
val pricesWithMagicApplied: Seq[Price] = pricesById.flatMap {
case (someId, prices) => applyOverrides(someId, prices)
}.toSeq
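A self-contained sketch of the pattern-matching approach follows. The fields of Price and Override and the body of magicMethod are assumptions made up for illustration, since the question does not show them:

```scala
// Hypothetical data model; only someId comes from the question.
final case class Price(someId: Int, amount: Double)
final case class Override(someId: Int, factor: Double)

// Hypothetical stand-in for magicMethod: scales prices by all override factors.
def magicMethod(prices: Seq[Price], overrides: Seq[Override]): Seq[Price] = {
  val factor = overrides.map(_.factor).product // 1.0 when there are no overrides
  prices.map(p => p.copy(amount = p.amount * factor))
}

val prices = Seq(Price(1, 10.0), Price(1, 20.0), Price(2, 5.0))
val overrides = Seq(Override(1, 2.0))

val pricesById = prices.groupBy(_.someId)
val overridesById = overrides.groupBy(_.someId)

// Each Map entry arrives as one (key, value) tuple, destructured by `case`.
val myPrices: Seq[Price] = pricesById.flatMap {
  case (someId, group) =>
    magicMethod(group, overridesById.getOrElse(someId, Seq.empty))
}.toSeq
```

The order of myPrices follows the Map's iteration order, which is not guaranteed to match the order of the original sequence.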

How to extract values from Some() in Scala

I have values of type Map[String, String] wrapped in Some(), such as
Array[Option[Any]] = Array(Some(Map(String, String)))
I want to return it as
Array(Map(String, String))
I've tried a few different ways of extracting it.
Let's say
val x = Array(Some(Map(String, String)))
val x1 = for (i <- 0 until x.length) yield { x.apply(i) }
but this returns IndexedSeq(Some(Map)), which is not what I want.
I tried pattern matching,
x.foreach { i =>
i match {
case Some(value) => value
case _ => println("nothing") }}
Another thing I tried that was somewhat successful was
x.apply(0).get.asInstanceOf[Map[String, String]]
which does what I want, but it only gets the element at index 0 of the array, and I want all the maps in the array.
How can I extract the Map type out of Some?
If you want an Array[Any] from your Array[Option[Any]], you can use this for expression:
for {
opt <- x
value <- opt
} yield value
This will put the values of all the non-empty Options inside a new array.
It is equivalent to this:
x.flatMap(_.toArray[Any])
Here, all options will be converted to an array of either 0 or 1 element. All these arrays will then be flattened back to one single array containing all the values.
Generally, the pattern is to use transformations on the Option[T], like map, flatMap, filter, etc., rather than unwrapping it by hand.
The problem is that we need a type cast to retrieve the underlying Map[String, String] from Any. So we use flatten to drop any None values and unwrap the Options, and asInstanceOf to recover the type:
scala> val y = Array(Some(Map("1" -> "1")), Some(Map("2" -> "2")), None)
y: Array[Option[scala.collection.immutable.Map[String,String]]] = Array(Some(Map(1 -> 1)), Some(Map(2 -> 2)), None)
scala> y.flatten.map(_.asInstanceOf[Map[String, String]])
res7: Array[Map[String,String]] = Array(Map(1 -> 1), Map(2 -> 2))
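Also, when you are dealing with just a single value, you can use Some("test").head (or .get) to unwrap it. For a value that may be null, wrap it with Option(...) instead of Some(...), since Option(null) is None.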
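The two cases discussed above can be sketched side by side. The sample values are made up; note that the cast in the first variant is unchecked because of type erasure, so it only verifies that the value is some Map:

```scala
// Elements typed as Option[Any]: match on the runtime class, then cast.
val x: Array[Option[Any]] =
  Array(Some(Map("a" -> "1")), None, Some(Map("b" -> "2")))

val maps: Array[Map[String, String]] = x.collect {
  case Some(m: Map[_, _]) => m.asInstanceOf[Map[String, String]]
}

// Elements already typed as Option[Map[String, String]]: flatten is enough.
val typed: Array[Option[Map[String, String]]] =
  Array(Some(Map("a" -> "1")), None)

val flattened: Array[Map[String, String]] = typed.flatten
```

If the element type can be kept precise from the start, the second form avoids the cast entirely.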

Access a tuple inside a tuple for an anonymous map job in Spark

This post is essentially about how to build joint and marginal histograms from a (String, String) RDD. I posted the code that I eventually used below as the answer.
I have an RDD that contains a set of tuples of type (String, String). Since they aren't unique, I want to see how many times each (String, String) combination occurs, so I use countByValue like so:
val PairCount = Pairs.countByValue().toSeq
which gives me tuples of the form ((String, String), Long), where the Long is the number of times that the (String, String) tuple appeared.
These Strings can be repeated in different combinations, and I essentially want to run a word count on this PairCount variable, so I tried something like this to start:
PairCount.map(x => (x._1._1, x._2))
But the output this produces is String1->1, String2->1, String3->1, etc.
How do I output a key-value pair from a map job in this case, where the key is one of the String values from the inner tuple and the value is the Long value from the outer tuple?
Update:
@vitalii gets me almost there. The answer gets me to a Seq[(String, Long)], but what I really need is to turn that into a map so that I can run reduceByKey on it afterwards. When I run
PairCount.flatMap{case((x,y),n) => Seq(x->n)}.toMap
for each unique x I get x->1.
For example, the above line of code generates mom->1 dad->1 even if the tuples out of the flatMap included (mom,30) (dad,59) (mom,2) (dad,14), in which case I would expect toMap to provide mom->30, dad->59, mom->2, dad->14. However, I'm new to Scala, so I might be misinterpreting the functionality.
How can I get the Tuple2 sequence converted to a map so that I can reduce on the map keys?
If I understand the question correctly, you need flatMap:
val pairCountRDD = pairs.countByValue() // RDD[((String, String), Int)]
val res : RDD[(String, Int)] = pairCountRDD.flatMap { case ((s1, s2), n) =>
Seq(s1 -> n, s2 -> n)
}
Update: I didn't quite understand what your final goal is, but here are a few more examples that may help you. By the way, the code above is incorrect: I missed the fact that countByValue returns a Map, not an RDD.
val pairs = sc.parallelize(
List(
"mom"-> "dad", "dad" -> "granny", "foo" -> "bar", "foo" -> "baz", "foo" -> "foo"
)
)
// don't use countByValue: if pairs is large you will run out of memory
val pairCountRDD = pairs.map(x => (x, 1)).reduceByKey(_ + _)
val wordCount = pairs.flatMap { case (a,b) => Seq(a -> 1, b ->1)}.reduceByKey(_ + _)
wordCount.take(10)
// count in how many pairs each word occurs, as key or as value:
val wordPairCount = pairs.flatMap { case (a,b) =>
if (a == b) {
Seq(a->1)
} else {
Seq(a -> 1, b ->1)
}
}.reduceByKey(_ + _)
wordPairCount.take(10)
To get the histograms for the (String, String) RDD, I used this code:
val Hist_X = histogram.map(x => (x._1-> 1.0)).reduceByKey(_+_).collect().toMap
val Hist_Y = histogram.map(x => (x._2-> 1.0)).reduceByKey(_+_).collect().toMap
val Hist_XY = histogram.map(x => (x-> 1.0)).reduceByKey(_+_)
where histogram is the (String, String) RDD.
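The toMap confusion in the question can be reproduced without Spark. The sketch below uses plain Scala collections and the counts quoted in the question, paired with made-up second strings. A Map holds one value per key, so toMap keeps only the last pair for each key, whereas a per-key sum (what reduceByKey(_ + _) does on an RDD) combines them:

```scala
// Local stand-in for the result of countByValue().toSeq.
val pairCount: Seq[((String, String), Long)] = Seq(
  (("mom", "dad"), 30L),
  (("dad", "granny"), 59L),
  (("mom", "granny"), 2L),
  (("dad", "mom"), 14L)
)

// Key by the first string of the inner tuple.
val byFirst: Seq[(String, Long)] = pairCount.map { case ((x, _), n) => x -> n }

// toMap silently overwrites earlier values for the same key.
val collapsed: Map[String, Long] = byFirst.toMap

// Summing per key gives the marginal count for each first string.
val summed: Map[String, Long] =
  byFirst.groupBy(_._1).map { case (key, group) => key -> group.map(_._2).sum }
```

With this data, collapsed is Map(mom -> 2, dad -> 14) and summed is Map(mom -> 32, dad -> 73).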

How to find unique elements from list of tuples based on some elements using scala?

I have the following list:
val a = List(("name1","add1","city1",10),("name1","add1","city1",10),
("name2","add2","city2",10),("name2","add2","city2",20),("name3","add3","city3",20))
I want the distinct elements of the above list based on the first three values of each tuple. The fourth value should not be considered when finding distinct elements.
I want the following output:
val output = List(("name1","add1","city1",10),("name2","add2","city2",10),
("name3","add3","city3",20))
Is it possible to get the above output?
As far as I know, distinct only works if the whole tuple is duplicated. I tried distinct as follows:
val b = List(("name1","add1","city1",10),("name1","add1","city1",10),("name2","add2","city2",10),
("name2","add2","city2",20),("name3","add3","city3",20)).distinct
but it gives this output:
List(("name1","add1","city1",10),("name2","add2","city2",10),
("name2","add2","city2",20),("name3","add3","city3",20))
Any alternative approach would also be appreciated.
Use groupBy like this:
a.groupBy( v => (v._1,v._2,v._3)).keys.toList
This constructs a Map where each key is by definition a unique triplet as required in the lambda function above.
If the result should also include the last element of the tuple, fetch the first element for each key, like this:
a.groupBy( v => (v._1,v._2,v._3)).mapValues(_.head)
If the order of the output list isn't important (i.e. you are happy to get List(("name3","add3","city3",20),("name1","add1","city1",10),("name2","add2","city2",10))), the following works as specified:
a.groupBy(v => (v._1,v._2,v._3)).values.map(_.head).toList
(Due to Scala collections design, you'll see the order kept for output lists up to 4 elements, but above that size HashMap will be used.) If you do need to keep the order, you can do something like (generalizing a bit)
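import scala.collection.mutable.LinkedHashMap

// Keeps the first element seen for each key, in encounter order.
// The second parameter list lets Scala 2 infer A before typing the lambda.
def distinctBy[A, B](xs: Seq[A])(f: A => B): List[A] = {
  val seen = LinkedHashMap.empty[B, A]
  xs.foreach { x =>
    val key = f(x)
    if (!seen.contains(key)) { seen.update(key, x) }
  }
  seen.values.toList
}
distinctBy(a)(v => (v._1, v._2, v._3))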
You could try
a.map{case x@(name, add, city, _) => (name,add,city) -> x}.toMap.values.toList
Note that toMap keeps the last tuple seen for each key. To make sure the first one in the list is kept,
type String3 = (String, String, String)
type String3Int = (String, String, String, Int)
a.foldLeft(collection.immutable.ListMap.empty[String3, String3Int]) {
  case (acc, t) =>
    val key = (t._1, t._2, t._3)
    // Only add the tuple if its key has not been seen yet.
    if (acc.contains(key)) acc else acc + (key -> t)
}.values.toList
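One simple option is to convert the List to a Set, since Sets don't contain duplicates (check the documentation). Note that, like distinct, this only removes tuples that are equal in all their elements, so it does not ignore the fourth value.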
val setOfTuples = a.toSet
println(setOfTuples)
Example output, for a list of integer pairs rather than the list a above: Set((1,1), (1,2), (1,3), (2,1))
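On Scala 2.13 or later, the standard library already provides distinctBy on sequences, which keeps the first element for each key and preserves the original order. A short sketch, assuming that Scala version is available:

```scala
val a = List(("name1", "add1", "city1", 10), ("name1", "add1", "city1", 10),
  ("name2", "add2", "city2", 10), ("name2", "add2", "city2", 20),
  ("name3", "add3", "city3", 20))

// Deduplicate on the first three fields only; the fourth is ignored for comparison.
val output: List[(String, String, String, Int)] =
  a.distinctBy(t => (t._1, t._2, t._3))
```

For the list above this yields the name1, name2 and name3 tuples with fourth values 10, 10 and 20, which matches the output requested in the question.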