The following Iterable can be o size one, two, or (up to) three.
org.apache.spark.rdd.RDD[Iterable[(String, String, String, String, Long)]] = MappedRDD[17] at map at <console>:75
The second element of each tuple can have any of the following values: A, B, C. Each of these values can appear (at most) once.
What I would like to do is sort them based on the following order (B , A , C), and then create a string by concatenating the elements of the 3rd place. If the corresponding tag is missing then concatenate with a blank space: ``. For example:
this:
CompactBuffer((blah,A,val1,blah,blah), (blah,B,val2,blah,blah), (blah,C,val3,blah,blah))
should result in:
val2,val1,val3
this:
CompactBuffer((blah,A,val1,blah,blah), (blah,C,val3,blah,blah))
should result in:
,val1,val3
this:
CompactBuffer((blah,A,val1,blah,blah), (blah,B,val2,blah,blah))
should result in:
val2,val1,
this:
CompactBuffer((blah,B,val2,blah,blah))
should result in:
val2,,
and so on so forth.
In your case when A, B and C appear at most once, you could add the corresponding values to a temporary map and retrieve the values from the map in the correct order.
If we use getOrElse to get the values from the map, we can specify the empty string as default value. This way we still get the correct result if our Iterable doesn't contain all the tuples with A, B and C.
type YourTuple = (String, String, String, String, Long)
def orderTuples(order: List[String])(iter: Iterable[YourTuple]) = {
val orderMap = iter.map { case (_, key, value, _, _) => key -> value }.toMap
order.map(s => orderMap.getOrElse(s, "")).mkString(",")
}
We can use this function as follows :
val a = ("blah","A","val1","blah",1L)
val b = ("blah","B","val2","blah",2L)
val c = ("blah","C","val3","blah",3L)
val order = List("B", "A", "C")
val bacOrder = orderTuples(order) _
bacOrder(Iterable(a, b, c)) // String = val2,val1,val3
bacOrder(Iterable(a, c)) // String = ,val1,val3
bacOrder(Iterable(a, b)) // String = val2,val1,
bacOrder(Iterable(b)) // String = val2,,
def orderTuples(xs: Iterable[(String, String, String, String, String)],
order: (String, String, String) = ("B", "A", "C")) = {
type T = Iterable[(String, String, String, String, String)]
type KV = Iterable[(String, String)]
val ord = List(order._1, order._2, order._3)
def loop(xs: T, acc: KV, vs: Iterable[String] = ord): KV = xs match {
case Nil if vs.isEmpty => acc
case Nil => vs.map((_, ",")) ++: acc
case x :: xs => loop(xs, List((x._2, x._3)) ++: acc, vs.filterNot(_ == x._2))
}
def comp(x: String) = ord.zipWithIndex.toMap.apply(x)
loop(xs, Nil).toList.sortBy(x => comp(x._1)).map(_._2).mkString(",")
}
Related
Let us have a collection of collections as below:
type Row = IndexedSeq[Any]
type RowTable = IndexedSeq[Row]
val table: RowTable = IndexedSeq(
IndexedSeq(2, "b", ... /* some elements of type Any*/),
IndexedSeq(1, "a", ...),
IndexedSeq(2, "c", ...))
Each Row in RowTable "has the same schema", meaning that as in example if the first row in the table contains Int, String, ..., then the second row in the table contains the elements of the same type in the same order, i.e., Int, String, ....
I would like to sort Rows in a RowTable by given indices of Row's elements and the sorting direction (ascending or descending sort) for that element.
For example, the collection above would be sorted this way for Index 0 ascending and Index 1 descending and the rest of elements are not important in sorting:
1, "a", ...
2, "c", ...
2, "b", ...
Since Row is IndexedSeq[Any], we do not know the type of each element to compare it; however, we know that it may be casted to Comparable[Any] and, thus, has compareTo() method to compare it with an element under the same index in another row.
The indices, as mentioned above, that will determine the sorting order are not known before we start sorting. How can I code this in Scala?
First of all, it's a bad design to compare a pair of Any.
By default, scala doesn't provide any way to get Ordering[Any]. Hence if you want to compare a pair of Any, you should implement Ordering[Any] by yourself:
object AnyOrdering extends Ordering[Any] {
override def compare(xRaw: Any, yRaw: Any): Int = {
(xRaw, yRaw) match {
case (x: Int, y: Int) => Ordering.Int.compare(x, y)
case (_: Int, _) => 1
case (_, _: Int) => -1
...
case (x: String, y: String) => Ordering.String.compare(x, y)
case (_: String, _) => 1
case (_, _: String) => -1
...
case (_, _) => 0
}
}
}
In your example, you want to compare two IndexedSeq[T] recursively. Scala doesn't provide any recursive Ordering and you need to implement it too:
def recOrdering[T](implicit ordering: Ordering[T]): Ordering[IndexedSeq[T]] = new Ordering[IndexedSeq[T]] {
override def compare(x: IndexedSeq[T], y: IndexedSeq[T]): Int = compareRec(x, y)
#tailrec
private def compareRec(x: IndexedSeq[T], y: IndexedSeq[T]): Int = {
(x.headOption, y.headOption) match {
case (Some(xHead), Some(yHead)) =>
val compare = ordering.compare(xHead, yHead)
if (compare == 0) {
compareRec(x.tail, y.tail)
} else {
compare
}
case (Some(_), None) => 1
case (None, Some(_)) => -1
}
}
}
After that you can finally sort your collection:
table.sorted(recOrdering(AnyOrdering))
(Sorry for unidiomatic (maybe not compiling) code; I can probably help with it upon request)
We can use the code below to sort a table
table.sortWith {
case (tupleL, tupleR) => isLessThan(tupleL, tupleR)
}
where isLessThan is defined as follows (unidiomatic to Scala, ik):
def isLessThan(tupleL: Row, tupleR: Row): Boolean = {
var i = 0
while (i < sortInfos.length) {
val sortInfo = sortInfos(i)
val result = tupleL(sortInfo.fieldIndex)
.asInstanceOf[Comparable[Any]].compareTo(
tupleR(sortInfo.fieldIndex)
.asInstanceOf[Comparable[Any]])
if (result != 0) {
if (sortInfo.isDescending) {
if (result > 0)
return true
else
return false
} else {
if (result < 0)
return true
else
return false
}
}
i += 1
}
true
}
where sortInfos is IndexedSeq[SortInfo] and
case class SortInfo(val fieldIndex: Int, val isDescending: Boolean)
Here is working Example with Ordering[IndexedSeq[Any]]:
val table: IndexedSeq[IndexedSeq[Any]] = IndexedSeq(
IndexedSeq(2, "b", "a"),
IndexedSeq(2, "b"),
IndexedSeq("c", 2),
IndexedSeq(1, "c"),
IndexedSeq("c", "c"),
//IndexedSeq((), "c"), //it will blow in runtime
IndexedSeq(2, "a"),
)
implicit val isaOrdering:Ordering[IndexedSeq[Any]] = { (a, b) =>
a.zip(b).filter {case (a, b)=> a != b}.collectFirst {
case (a:Int, b:Int) => a compare b
case (a:String, b:String) => a compare b
case (a:String, b:Int) => 1 //prefere ints over strings
case (a:Int, b:String) => -1 //prefere ints over strings
case _ => throw new RuntimeException(s"cannot compare $a to $b")
}.getOrElse(a.length compare b.length) //shorter will be first
}
println(table.sorted) //used implicitly
println(table.sorted(isaOrdering))
//Vector(Vector(1, c), Vector(2, a), Vector(2, b), Vector(2, b, a), Vector(c, 2), Vector(c, c))
https://scalafiddle.io/sf/yvLEnYL/4
or if you really need to compare different types somehow this is best I could find:
implicit val isaOrdering:Ordering[IndexedSeq[Any]] = { (a, b) =>
a.zip(b).filter {case (a, b)=> a != b}.collectFirst {
case (a:Int, b:Int) => a compare b
case (a:String, b:String) => a compare b
//add your known types here
// ...
//below is rule that cares about unknown cases.
//We don't know types at all, at best what we can do is compare equality.
//If they are equal then return 0... if not we throw
//this could be also very slow (don't tested)
case (a, b) =>
//not nice but it is stable at least
val ac = a.getClass.getName
val bc = b.getClass.getName
ac.compare(bc) match {
case 0 => if (ac == bc) 0 else throw new RuntimeException(s"cannot compare $a to $b")
case x => x
}
}.getOrElse(a.length compare b.length) //shorter will be first
}
https://scalafiddle.io/sf/yvLEnYL/5
This implementation will fail in runtime when we could not compare them.
I am currently working on a function that takes in a Map[String, List[String]] and a String as arguments. The map contains a user Id and the IDs of films that they liked. What I need to do is, to return a List[List[String]] which contains the other movies that where liked by the user who liked the movie that was passed into the function.
The function declaration looks as follows:
def movies(m: Map[String, List[String]], mov: String) : List[List[String]]= {
}
So lets imagine the following:
val m1 : [Map[Int, List[String]]] = Map(1 ‐> List("b", "a"), 2 ‐> List("y", "x"), 3 ‐> List("c", "a"))
val movieID = "a"
movies(m1, movieId)
This should return:
List(List("b"), List("c"))
I have tried using
m1.filter(x => x._2.contains(movieID))
So that only Lists containing movieID are kept in the map, but my problem is that I need to remove movieID from every list it occurs in, and then return the result as a List[List[String]].
You could use collect:
val m = Map("1" -> List("b", "a"), "2" -> List("y", "x"), "3" -> List("c", "a"))
def movies(m: Map[String, List[String]], mov: String) = m.collect {
case (_, l) if l.contains(mov) => l.filterNot(_ == mov)
}
movies(m, "a") //List(List(b), List(c))
Problem with this approach is, that it would iterate over every movie list twice, the first time with contains and the second time with filterNot. We could optimize it tail-recursive function, which would look for element and if found just return list without it:
import scala.annotation.tailrec
def movies(m: Map[String, List[String]], mov: String) = {
#tailrec
def withoutElement[T](l: List[T], mov: T, acc: List[T] = Nil): Option[List[T]] = {
l match {
case x :: xs if x == mov => Some(acc.reverse ++ xs)
case x :: xs => withoutElement(xs, mov, x :: acc)
case Nil => None
}
}
m.values.flatMap(withoutElement(_, mov))
}
The solution from Krzysztof is a good one. Here's an alternate way to traverse every List just once.
def movies(m: Map[String, List[String]], mov: String) =
m.values.toList.flatMap{ss =>
val tpl = ss.foldLeft((false, List.empty[String])){
case ((_,res), `mov`) => (true, res)
case ((keep,res), str) => (keep, str::res)
}
if (tpl._1) Some(tpl._2) else None
}
This should work for you:
object DemoAbc extends App {
val m1 = Map(1 -> List("b", "a"), 2 -> List("y", "x"), 3 -> List("c", "a"))
val movieID = "a"
def movies(m: Map[Int, List[String]], mov: String): List[List[String]] = {
val ans = m.foldLeft(List.empty[List[String]])((a: List[List[String]], b: (Int, List[String])) => {
if (b._2.contains(mov))
b._2.filter(_ != mov) :: a
else a
})
ans
}
print(movies(m1, movieID))
}
I have two lists with two different types, but both types have the same field to identify/compare.
My Requirement: I want to compare two lists based on some fields of objects, once matched delete element from both list/set.
For example:
case class Type1(name:String, surname: String, address: Int)
case class Type2(name:String, surname: String, address: Int, dummy: String)
So record will be matched if both lists have the same field data for both types.
My List:
val type1List = List(Type1("name1","surname1", 1),
Type1("name2","surname2", 2),
Type1("name3","surname3", 3)
)
val type2List = List(Type2("name1","surname1", 1),
Type2("name2","surname2", 2),
Type2("name4","surname4", 4)
)
Comparing type1List and type2List, removed the matched date from both lists.
type1List should contain only:
val type1List = List(
Type1("name3","surname3", 3)
)
type2List should contain only:
val type2List = List(
Type2("name4","surname4", 4)
)
I tried it with looping/interation , but that seems too complex and performance hit.
Thanks in advance.
Here is a generic approach to the task.
The idea is to find the common elements in both lists according to some custom functions, and then remove both of them.
def removeCommon[A, B, K](as: List[A], bs: List[B])
(asKey: A => K)
(bsKey: B => K): (List[A], List[B]) = {
def removeDuplicates[V](commonKeys: Set[K], map: Map[K, List[V]]): List[V] =
map
.iterator
.collect {
case (key, value) if (!commonKeys.contains(key)) =>
value.head
}.toList
val asByKey = as.groupBy(asKey)
val bsByKey = bs.groupBy(bsKey)
val commonKeys = asByKey.keySet & bsByKey.keySet
val uniqueAs = removeDuplicates(commonKeys, asByKey)
val uniqueBs = removeDuplicates(commonKeys, bsByKey)
(uniqueAs, uniqueBs)
}
Which you can use like following:
final case class Type1(name:String, surname: String, address: Int)
final case class Type2(name:String, surname: String, address: Int, dummy: String)
val type1List = List(
Type1("name1","surname1", 1),
Type1("name2","surname2", 2),
Type1("name3","surname3", 3)
)
val type2List = List(
Type2("name1","surname1", 1, "blah"),
Type2("name2","surname2", 2, "blah"),
Type2("name4","surname4", 4, "blah")
)
val (uniqueType1List, uniqueType2List) =
removeCommon(type1List, type2List) { type1 =>
(type1.name, type1.surname, type1.address)
} { type2 =>
(type2.name, type2.surname, type2.address)
}
// uniqueType1List: List[Type1] = List(Type1("name3", "surname3", 3))
// uniqueType2List: List[Type2] = List(Type2("name4", "surname4", 4, "blah"))
If you can not adjust the 2 Types, here a pragmatic solution:
First find the 'equals'
val equals: Seq[Type1] = type1List.filter {
case Type1(n, sn, a) =>
type2List.exists { case Type2(n2, sn2, a2, _) =>
n == n2 && sn == sn2 && a == a2
}
}
Then filter both Lists:
val filteredType1List = type1List.filterNot(t1 => equals.contains(t1))
val filteredType2List = type2List.filterNot {
case Type2(n2, sn2, a2, _) =>
equals.exists { case Type1(n, sn, a)=>
n == n2 && sn == sn2 && a == a2
}
}
Assuming you know what field you want to use to base your diff on, here is an outline of a solution.
First define a super class for both types:
abstract class Type(val name : String) {
def canEqual(a: Any) = a.isInstanceOf[Type]
override def equals(obj: Any): Boolean = obj match {
case obj : Type => obj.canEqual(this) && this.name == obj.name
case _ => false
}
override def hashCode(): Int = this.name.hashCode
}
Then define your types as sub types of the above class:
case class Type1(override val name: String, surname: String, address: Int) extends Type(name)
case class Type2(override val name: String, surname: String, address: Int, dummy: String) extends Type(name)
Now a simple
type1List.diff(type2List)
will result in:
List(Type1(name3,surname3,3))
and
type2List.diff(type1List)
will give:
List(Type2(name4,surname4,4,dum3))
Still, I would be careful with solutions like this. Because circumventing equals and hashcode opens up the code to all kinds of bugs. So it is better to make sure this is kept limited to the scope you are working in.
I have the following list in input:
val listInput1 =
List(
"itemA,CATs,2,4",
"itemA,CATS,3,1",
"itemB,CATQ,4,5",
"itemB,CATQ,4,6",
"itemC,CARC,5,10")
and I want to write a function in scala using groupBy and foldleft ( just one function) in order to sum up third and fourth colum for lines having the same title(first column here), the wanted output is :
val listOutput1 =
List(
"itemA,CATS,5,5",
"itemB,CATQ,8,11",
"itemC,CARC,5,10"
)
def sumIndex (listIn:List[String]):List[String]={
listIn.map(_.split(",")).groupBy(_(0)).map{
case (title, label) =>
"%s,%s,%d,%d".format(
title,
label.head.apply(1),
label.map(_(2).toInt).sum,
label.map(_(3).toInt).sum)}.toList
}
Kind regards
The logic in your code looks sound, here it is with a case class implemented as that handles edge cases more cleanly:
// represents a 'row' in the original list
case class Item(
name: String,
category: String,
amount: Int,
price: Int
)
// safely converts the row of strings into case class, throws exception otherwise
def stringsToItem(strings: Array[String]): Item = {
if (strings.length != 4) {
throw new Exception(s"Invalid row: ${strings.foreach(print)}; must contain only 4 entries!")
} else {
val n = strings.headOption.getOrElse("N/A")
val cat = strings.lift(1).getOrElse("N/A")
val amt = strings.lift(2).filter(_.matches("^[0-9]*$")).map(_.toInt).getOrElse(0)
val p = strings.lastOption.filter(_.matches("^[0-9]*$")).map(_.toInt).getOrElse(0)
Item(n, cat, amt, p)
}
}
// original code with case class and method above used
listInput1.map(_.split(","))
.map(stringsToItem)
.groupBy(_.name)
.map { case (name, items) =>
Item(
name,
category = items.head.category,
amount = items.map(_.amount).sum,
price = items.map(_.price).sum
)
}.toList
You can solve it with a single foldLeft, iterating the input list only once. Use a Map to aggregate the result.
listInput1.map(_.split(",")).foldLeft(Map.empty[String, Int]) {
(acc: Map[String, Int], curr: Array[String]) =>
val label: String = curr(0)
val oldValue: Int = acc.getOrElse(label, 0)
val newValue: Int = oldValue + curr(2).toInt + curr(3).toInt
acc.updated(label, newValue)
}
result: Map(itemA -> 10, itemB -> 19, itemC -> 15)
If you have a list as
val listInput1 =
List(
"itemA,CATs,2,4",
"itemA,CATS,3,1",
"itemB,CATQ,4,5",
"itemB,CATQ,4,6",
"itemC,CARC,5,10")
Then you can write a general function that can be used with foldLeft and reduceLeft as
def accumulateLeft(x: Map[String, Tuple3[String, Int, Int]], y: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] ={
val key = y.keySet.toList(0)
if(x.keySet.contains(key)){
val oldTuple = x(key)
x.updated(key, (y(key)._1, oldTuple._2+y(key)._2, oldTuple._3+y(key)._3))
}
else{
x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
}
}
and you can call them as
foldLeft
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.foldLeft(Map.empty[String, Tuple3[String, Int, Int]])(accumulateLeft)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res0: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
reduceLeft
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.reduceLeft(accumulateLeft)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res1: List[String] = List(itemA,CATS,5,5, itemB,CATQ,8,11, itemC,CARC,5,10)
Similarly you can just interchange the variables in the general function so that it can be used with foldRight and reduceRight as
def accumulateRight(y: Map[String, Tuple3[String, Int, Int]], x: Map[String, Tuple3[String, Int, Int]]): Map[String, Tuple3[String, Int, Int]] ={
val key = y.keySet.toList(0)
if(x.keySet.contains(key)){
val oldTuple = x(key)
x.updated(key, (y(key)._1, oldTuple._2+y(key)._2, oldTuple._3+y(key)._3))
}
else{
x.updated(key, (y(key)._1, y(key)._2, y(key)._3))
}
}
and calling the function would give you
foldRight
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.foldRight(Map.empty[String, Tuple3[String, Int, Int]])(accumulateRight)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res2: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
reduceRight
listInput1
.map(_.split(","))
.map(array => Map(array(0) -> (array(1), array(2).toInt, array(3).toInt)))
.reduceRight(accumulateRight)
.map(x => x._1+","+x._2._1+","+x._2._2+","+x._2._3)
.toList
//res3: List[String] = List(itemC,CARC,5,10, itemB,CATQ,8,11, itemA,CATs,5,5)
So you don't really need a groupBy and can use any of the foldLeft, foldRight, reduceLeft or reduceRight functions to get your desired output.
I am trying to find a key in a Map, given a value. I am using the 'find' function by not able to figure out the right predicate for it:
val colors = Map(1 -> "red", 2 -> "blue")
def keyForValue(map: Map[Int, String], value: String) = {
val bool = map.find{map.foreach{map.values(i) == value}}
bool.key
}
How do I iterate over the map and find the key when I know the value?
You use the same kind of predicate as with a List, but keep in mind you're evaluating it over (key,value) pairs, instead of just values (and getting a pair back as well!).
Simple example:
val default = (-1,"")
val value = "red"
colors.find(_._2==value).getOrElse(default)._1
The signature for find in Map is find(p: ((A, B)) ⇒ Boolean): Option[(A, B)]. So the predicate takes a Tuple2 and must return a Boolean. Note I changed value to an Int since the key in colors is also an Int.
scala> def keyForValue(map: Map[Int, String], value: Int) = {
| colors.find({case (a,b) => a == value})
| }
keyForValue: (map: Map[Int,String], value: Int)Option[(Int, String)]
Test:
scala> keyForValue(colors, 1)
res0: Option[(Int, String)] = Some((1,red))
You can also use get:
scala> colors.get(1)
res1: Option[String] = Some(red)
You can always go with the abstract solution and swap the keys with their values, store it in a new map, and then search the new map:
val colors = Map(1 -> "red", 2 -> "blue")
def keyForValue(map: Map[Int, String], value: String) = {
val revMap = map map {_.swap}
val key = revMap(value)
key
}
The third line swaps the keys with their values, and stores it in revMap. (map map means the name of the map, in this case, the parameter, map, then the word map, then {_.swap} to actually swap the keys with their values.
I would avoid passing the map to the find method, and rather pass only the map's keys to the find method.
This avoids dealing with an Option[Int,String] -- instead it is Option[Int].
// sample data
val colors = Map(1 -> "red", 2 -> "blue", 3 -> "yellow")
// function you need
def keyForValue(theMap: Map[Int, String], theValue: String): Int = {
val someKey = theMap.keys.find( k => theMap(k) == theValue )
someKey match {
case Some(key) => {
println(s"the map contains ${key} -> ${theValue}")
return key
}
case None => {
println(s"a key was not found for ${theValue}")
return -1
}
}
}
This gives:
scala> val result = keyForValue( colors, "blue" )
the map contains 2 -> blue
result: Int = 2
scala>