How to use 'groupBy' method on a list of Int, & Strings - scala

I managed to group a List of Strings by length:
List("Every", "student", "likes", "Scala").groupBy((element: String) => element.length)
I want to group a tuple, e.g.:
("Every", "student", "likes", "Scala", 1, 5, 54, 0, 1, 0)

The groupBy method takes a key function (not a predicate) as its parameter, applies it to every element, and collects the elements that share a key into a Map from key to group.
As per the Scala documentation, the definition of the groupBy method is as follows:
groupBy[K](f: (A) ⇒ K): immutable.Map[K, Repr]
Hence, assuming that you have a tuple of Ints and Strings and you want to group the Strings, I would perform the following steps:
1. Create a list from tuple
2. Filter out types other than String
3. Apply groupBy on the list
The code for this is as follows:
("Every", "student", "likes", "Scala", 1, 5, 54, 0, 1, 0)
.productIterator.toList
.filter(_.isInstanceOf[String])
.map(_.asInstanceOf[String])
.groupBy((element: String) => element.length)
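As a variation on the steps above, here is a minimal sketch that uses collect to filter and narrow the type in one pass. The names mixed, strings, ints, stringsByLength and intsByValue are only illustrative; grouping the Ints by their own value is an assumption about what the asker might want for the numeric part.

```scala
// The mixed tuple from the question (a Tuple10 of Strings and Ints).
val mixed = ("Every", "student", "likes", "Scala", 1, 5, 54, 0, 1, 0)

// collect keeps only the elements matching the pattern and narrows their type,
// so no separate isInstanceOf / asInstanceOf pair is needed.
val strings: List[String] = mixed.productIterator.collect { case s: String => s }.toList
val ints: List[Int] = mixed.productIterator.collect { case i: Int => i }.toList

// Group the Strings by length, and (as an assumption) the Ints by their own value.
val stringsByLength: Map[Int, List[String]] = strings.groupBy(_.length)
val intsByValue: Map[Int, List[Int]] = ints.groupBy(identity)
```

Within each group, groupBy keeps the elements in their original order, so the length-5 group is List(Every, likes, Scala).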

Related

Use aggregateBykey or reduceByKey to get aggregated records for a key

I have the input data of the format
RDD[
(Map1, RecordA),
(Map2, RecordX),
(Map1, RecordB),
(Map2, RecordY),
(Map1, RecordC),
(Map2, RecordZ)
]
Expected output format is (a List of RDDs):
List[
RDD[RecordA, RecordB, RecordC],
RDD[RecordX, RecordY, RecordZ]
]
I want the inner RDDs to be grouped by key (Map1, Map2), and I want to create an outer List as a collection of the inner RDDs.
I tried using the reduceByKey and aggregateByKey APIs and have not been successful so far.
Real World example :
RDD[
(Map("a"->"xyz", "b"->"per"), CustomRecord("test1", 1, "abc")),
(Map("a"->"xyz", "b"->"per"), CustomRecord("test2", 1, "xyz")),
(Map("a"->"xyz", "b"->"lmm"), CustomRecord("test3", 1, "blah")),
(Map("a"->"xyz", "b"->"lmm"), CustomRecord("test4", 1, "blah"))
]
final case class CustomRecord(
string1: String,
int1: Int,
string2: String)
Appreciate your help.
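To make the desired shape concrete, here is a rough sketch of the same grouping on plain Scala collections rather than on an RDD. It is only a local analogue under that assumption: the names input, grouped and outer are made up for illustration, and the Spark side is not shown. On an actual pair RDD the closest counterpart would be groupByKey, which yields an RDD[(K, Iterable[V])] rather than a List of RDDs.

```scala
final case class CustomRecord(string1: String, int1: Int, string2: String)

// Local stand-in for the RDD of (key, record) pairs from the question.
val input: List[(Map[String, String], CustomRecord)] = List(
  (Map("a" -> "xyz", "b" -> "per"), CustomRecord("test1", 1, "abc")),
  (Map("a" -> "xyz", "b" -> "per"), CustomRecord("test2", 1, "xyz")),
  (Map("a" -> "xyz", "b" -> "lmm"), CustomRecord("test3", 1, "blah")),
  (Map("a" -> "xyz", "b" -> "lmm"), CustomRecord("test4", 1, "blah"))
)

// Maps compare structurally, so equal key Maps end up in the same group.
val grouped: Map[Map[String, String], List[CustomRecord]] =
  input.groupBy(_._1).map { case (key, pairs) => key -> pairs.map(_._2) }

// One inner list per distinct key; the order of the outer list is unspecified.
val outer: List[List[CustomRecord]] = grouped.values.toList
```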

Extract list to multiple distinct list

How can I split a Scala list into a List of multiple lists, each containing only distinct values?
From
val l = List(1,2,6,3,5,4,4,3,4,1)
to
List(List(1,2,3,4,5,6),List(1,3,4),List(4))
Here's a (rather inefficient) way to do this: group by value, sort result by size of group, then use first group as a basis for per-index scan of the original groups to build the distinct lists:
scala> val l = List(1,2,6,3,5,4,4,3,4,1)
l: List[Int] = List(1, 2, 6, 3, 5, 4, 4, 3, 4, 1)
scala> val groups = l.groupBy(identity).values.toList.sortBy(- _.size)
groups: List[List[Int]] = List(List(4, 4, 4), List(1, 1), List(3, 3), List(5), List(6), List(2))
scala> groups.head.zipWithIndex.map { case (_, i) => groups.flatMap(_.drop(i).headOption) }
res9: List[List[Int]] = List(List(4, 1, 3, 5, 6, 2), List(4, 1, 3), List(4))
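The result above is not sorted inside each list, while the question shows sorted lists. A small sketch of the same group-then-scan idea wrapped in a function and sorted per list could look like this (distinctLists is a made-up name, and the sorting is an assumption based on the expected output in the question):

```scala
def distinctLists(l: List[Int]): List[List[Int]] = {
  // One group per distinct value, e.g. List(4, 4, 4) for three 4s.
  val groups = l.groupBy(identity).values.toList
  // The most frequent value decides how many output lists are needed.
  val maxRepeats = if (groups.isEmpty) 0 else groups.map(_.size).max
  // The i-th output list takes one element from every group that has more than i elements.
  (0 until maxRepeats).toList.map { i =>
    groups.filter(_.size > i).map(_.head).sorted
  }
}

val sortedResult = distinctLists(List(1, 2, 6, 3, 5, 4, 4, 3, 4, 1))
// sortedResult: List(List(1, 2, 3, 4, 5, 6), List(1, 3, 4), List(4))
```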
An alternative approach, after grouping like in the first answer by @TzachZohar, is to keep taking one element from each list until all lists are empty:
val groups = l.groupBy(identity).values
Iterator
// continue removing the first element from every sublist, and discard empty tails
.iterate(groups)(_ collect { case _ :: (rest @ (_ :: _)) => rest } )
// stop when all sublists become empty and are removed
.takeWhile(_.nonEmpty)
// build and sort result lists
.map(_.map(_.head).toList.sorted)
.toList
And here's another option - scanning the input N times, with N being the largest number of repetitions of a single value:
// this function splits input list into two:
// all duplicate values, and the longest list of unique values
def collectDistinct[A](l: List[A]): (List[A], List[A]) = l.foldLeft((List[A](), List[A]())) {
case ((remaining, distinct), candidate) if distinct.contains(candidate) => (candidate :: remaining, distinct)
case ((remaining, distinct), candidate) => (remaining, candidate :: distinct)
}
import scala.annotation.tailrec
// this recursive function takes a list of "remaining" values,
// and a list of distinct groups, and adds distinct groups to the list
// until "remaining" is empty
@tailrec
def distinctGroups[A](remaining: List[A], groups: List[List[A]]): List[List[A]] = remaining match {
case Nil => groups
case _ => collectDistinct(remaining) match {
case (next, group) => distinctGroups(next, group :: groups)
}
}
// call the second function with our input and an empty list of groups to begin with:
val result = distinctGroups(l, List())
Consider this approach:
trait Proc {
def process(v:Int): Proc
}
case object Empty extends Proc {
override def process(v:Int) = Processor(v, Map(0 -> List(v)), 0)
}
case class Processor(prev:Int, map:Map[Int, List[Int]], lastTarget:Int) extends Proc {
override def process(v:Int) = {
val target = if (prev==v) lastTarget+1 else 0
Processor(v, map + (target -> (v::map.getOrElse(target, Nil))), target)
}
}
l.sorted.foldLeft[Proc](Empty) {
case (acc, item) => acc.process(item)
}
Here we have a simple state machine. We iterate over the sorted list, starting from the initial state 'Empty'. Once 'Empty' processes an item, it produces the next state, a 'Processor'.
A Processor holds the previous value in 'prev' and the accumulated map of already grouped items. It also has lastTarget, the index of the list where the last write occurred.
The only thing 'Processor' does is calculate the target for the item currently being processed: if the item is the same as the previous one, it takes the next index; otherwise it starts again from index 0.
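For comparison, the same state can be carried in a plain tuple inside foldLeft, which also makes it easy to pull the final lists out at the end. This is only a sketch of the idea described above; distinctBuckets and the tuple layout are illustrative choices, not part of the original answer.

```scala
def distinctBuckets(l: List[Int]): List[List[Int]] = {
  // State: (previous value, index of the bucket last written to, buckets by index).
  val (_, _, buckets) =
    l.sorted.foldLeft((Option.empty[Int], 0, Map.empty[Int, List[Int]])) {
      case ((prev, lastTarget, acc), v) =>
        // A repeated value moves on to the next bucket; a new value starts again at bucket 0.
        val target = if (prev.contains(v)) lastTarget + 1 else 0
        (Some(v), target, acc + (target -> (v :: acc.getOrElse(target, Nil))))
    }
  // Buckets were built by prepending, so reverse each one to restore ascending order.
  buckets.toList.sortBy(_._1).map(_._2.reverse)
}

val buckets = distinctBuckets(List(1, 2, 6, 3, 5, 4, 4, 3, 4, 1))
// buckets: List(List(1, 2, 3, 4, 5, 6), List(1, 3, 4), List(4))
```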

Scala Array of different types of values

I am writing a function in Scala that accepts an Array/Tuples/Seq of different types of values and sorts it based on the first two values in each:
def sortFunction[T](input: Array[T]) = input(0)+ " " + input(1)
The input values I have are as below:
val data = Array((1, "alpha",88.9), (2, "alpha",77), (2, "beta"), (3, "alpha"), (1, "gamma",99))
Then I call the sortFunction as:
data.sortWith(sortFunction)
It gives the errors below:
- polymorphic expression cannot be instantiated to expected type; found : [T]scala.collection.mutable.Seq[T] ⇒ Int required: ((Int, String)) ⇒ ? Error occurred in an application involving default arguments.
- type mismatch; found : scala.collection.mutable.Seq[T] ⇒ Int required: ((Int, String)) ⇒ ? Error occurred in an application involving default arguments.
What am I doing wrong, or how do I get around this? I would be grateful for any suggestions.
If you know the type of the elements in Array[T], you can use pattern matching (when they all have the same type). But if you don't know it, the program can't decide how to sort your data.
One method is to simply compare the String representations, as below.
object Hello{
def sortFunction[T](input1: T, input2: T) =
input1 match {
case t : Product =>
val t2 = input2.asInstanceOf[Product]
t.productElement(0).toString < t2.productElement(0).toString
case v => input1.toString > input2.toString
}
def main(args: Array[String]): Unit = {
val data = Array((1, "alpha",88.9), (2, "alpha",77), (2, "beta", 99), (3, "alpha"), (1, "gamma",99))
println(data.sortWith(sortFunction).mkString)
}
}
If you want to know more about the Product trait, see http://www.scala-lang.org/api/rc2/scala/Product.html
If you have an Array of tuples that all have the same arity, such as tuples of (Int, String), then your sort function could look something like
def sortFunction[T](fst: (Int, String), scd: (Int, String)) = fst._1 < scd._1 // sort by first element
However, since you have an Array of tuples of varying arity, the Scala compiler can only type the elements as their nearest common type, Product. Then you can sort like this:
def sortFunction[T](fst: (Product), scd: (Product)) = fst.productElement(1).toString < scd.productElement(1).toString
val data = Array((1, "alpha", 99), (2, "alpha"), (2, "beta"), (3, "alpha"), (1, "gamma"))
data.sortWith(sortFunction) // Array((1,alpha,99), (2,alpha), (3,alpha), (2,beta), (1,gamma))
Note that this is really bad design though. You should create an abstract data type that encapsulates your data in a more structured way. I can't say what it should look like since I don't know where and how you are getting this information, but here's an example (called Foo, but you should of course name it meaningfully):
case class Foo(index: Int, name: String, parameters: List[Int])
I just assumed that the first element in each piece of data is "index" and the second one is "name". I also assumed that the rest of the elements will always be integers and that there may be zero, one or more of them, hence a List (if it's only zero or one, a better choice would be Option).
Then you could sort as:
def sortFunction[T](fst: Foo, scd: Foo) = fst.index < scd.index
or
def sortFunction[T](fst: Foo, scd: Foo) = fst.name < scd.name
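Since the question asks for sorting on the first two values, one possible sketch is to sort by a tuple key with sortBy. The sample values below are made up to fit the Foo shape (all parameters are Ints, which is the same assumption as above), and foos / sortedFoos are illustrative names.

```scala
final case class Foo(index: Int, name: String, parameters: List[Int])

// Hypothetical sample data shaped like the question's input.
val foos = List(
  Foo(1, "alpha", List(88)),
  Foo(2, "alpha", List(77)),
  Foo(2, "beta", Nil),
  Foo(3, "alpha", Nil),
  Foo(1, "gamma", List(99))
)

// Tuples have a built-in Ordering that compares element by element,
// so this sorts by index first and by name to break ties.
val sortedFoos = foos.sortBy(foo => (foo.index, foo.name))
```

With this data the resulting order of (index, name) is (1, alpha), (1, gamma), (2, alpha), (2, beta), (3, alpha).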

Scala Tuple to String (using mkString)

Suppose I have a list of tuples
('a', 1), ('b', 2)...
How would one go about converting it to a String in the format
a 1
b 2
I tried using collection.map(_.mkString('\t')). However, I'm getting an error, since I'm applying the operation to a tuple instead of a list. Using flatMap didn't help either.
For Tuple2 you can use:
val list = List(("1", 4), ("dfg", 67))
list.map { case (str, int) => s"$str $int"}
For any tuples try this code:
val list = List[Product](("dfsgd", 234), ("345345", 345, 456456))
list.map { tuple =>
tuple.productIterator.mkString("\t")
}
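To end up with a single String in the format from the question, the per-tuple strings still have to be joined with newlines. A minimal sketch, assuming a tab between the two fields as in the attempted mkString('\t') (pairs, text and genericText are illustrative names):

```scala
val pairs = List(('a', 1), ('b', 2))

// Format each pair as "<char><TAB><int>", then join the lines with newlines.
val text: String = pairs.map { case (c, n) => s"$c\t$n" }.mkString("\n")

// The productIterator variant works for tuples of any arity.
val genericText: String =
  List[Product](('a', 1), ('b', 2, 3)).map(_.productIterator.mkString("\t")).mkString("\n")
```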

How to sort scala maps by length of the key (assuming keys are strings)

I want to sort a scala map by key length. The map looks something like:
val randomMap = Map("short" -> "randomVal1", "muchlonger" -> "randomVal2")
I want to sort randomMap such that when I iterate over it, I will start with the longest key first...so it should iterate over the "muchlonger" element first.
Convert it to a sequence of key/value pairs and apply a sorting criterion, e.g.:
randomMap.toSeq.sortBy(_._1.length).reverse
(reverse because it sorts by shortest to longest by default).
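If the result should still behave like a Map while keeping that order, one option is to copy the sorted pairs into a ListMap, which iterates in insertion order. This is a sketch under that assumption; sortedPairs and ordered are illustrative names, and sorting by the negated length is just a way to avoid the extra reverse.

```scala
import scala.collection.immutable.ListMap

val randomMap = Map("short" -> "randomVal1", "muchlonger" -> "randomVal2")

// Negating the length sorts from longest key to shortest in one step.
val sortedPairs: Seq[(String, String)] = randomMap.toSeq.sortBy { case (key, _) => -key.length }

// ListMap remembers insertion order, so iteration starts with the longest key.
val ordered: ListMap[String, String] = ListMap(sortedPairs: _*)
```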
One option would be to define a custom ordering for a TreeMap. TreeMap is a sorted implementation of Map.
import scala.collection.immutable.TreeMap
implicit object LengthOrder extends Ordering[String] {
def compare(s1: String, s2: String) = s1.length - s2.length
}
val randomMap = TreeMap("111" -> "111", "1" -> "1", "11" -> "11")
//randomMap: TreeMap[String,String] = Map(1 -> 1, 11 -> 11, 111 -> 111)
val keys = randomMap.keys
//keys: Iterable[String] = Set(1, 11, 111)
Note that this will affect all TreeMap[String]s where LengthOrder is in scope. In your project you could nest LengthOrder in another object (or put it in its own package) and then only import it inside the specific code blocks that need it.
Edit:
@Bharadwaj made a good point about how this would destroy all but one of the keys that have the same length. Something like this would fix the issue:
implicit object LengthOrder extends Ordering[String] {
def compare(s1: String, s2: String) = s1.length - s2.length match {
case 0 => s1.compareTo(s2)
case x => x
}
}
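To avoid putting an implicit Ordering[String] in scope at all, the ordering can also be passed to TreeMap explicitly. The sketch below builds it with Ordering.by on a (length, string) tuple, which gives the same tie-breaking as the match above, and reverses it so the longest key comes first as the question asks. The names byLengthThenAlpha and longestFirst and the sample keys are only illustrative.

```scala
import scala.collection.immutable.TreeMap

// Compare by length first, then alphabetically, so equal-length keys stay distinct.
val byLengthThenAlpha: Ordering[String] = Ordering.by((s: String) => (s.length, s))

// Pass the reversed ordering explicitly in TreeMap's second parameter list.
val longestFirst: TreeMap[String, Int] =
  TreeMap("bb" -> 1, "a" -> 2, "cc" -> 3, "ddd" -> 4)(byLengthThenAlpha.reverse)

val keysInOrder = longestFirst.keys.toList
// keysInOrder: List(ddd, cc, bb, a)
```

Because the ordering is passed explicitly, other TreeMap[String, _] instances in the same scope keep the default String ordering.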