How to use MaxBy in Scala - scala

I have a list that I would like to group into groups and then, for each group, get the max value. For example, given a list of user actions, get the last action per user.
case class UserActions(userId: String, actionId: String, actionTime: java.sql.Timestamp) extends Ordered[UserActions] {
  def compare(that: UserActions) = this.actionTime.before(that.actionTime).compareTo(true)
}
val actions = List(UserActions("1","1",new java.sql.Timestamp(0L)),UserActions("1","1",new java.sql.Timestamp(1L)))
When I try the following groupBy:
actions.groupBy(_.userId)
I receive a Map
scala.collection.immutable.Map[String,List[UserActions]] = Map(1 -> List(UserActions(1,1,1970-01-01 00:00:00.0), UserActions(1,1,1970-01-01 00:00:00.001)))
Which is fine, but when I try to add the maxBy I get an error:
actions.groupBy(_.userId).maxBy(_._2)
<console>:13: error: diverging implicit expansion for type
Ordering[List[UserActions]]
starting with method $conforms in object Predef
actions.groupBy(_.userId).maxBy(_._2)
What should I change?
Thanks
Nir

So you have a Map of String (userId) -> List[UserActions] and you want each list reduced to just its max element?
actions.groupBy(_.userId).mapValues(_.max)
//res0: Map[String,UserActions] = Map(1 -> UserActions(1,1,1969-12-31 16:00:00.0))
You don't need maxBy() because you've already added the information needed to order/compare different UserActions elements.
Likewise if you just want the max from the original list.
actions.max
You'd use maxBy() if you wanted the maximum as measured by some other parameter.
actions.maxBy(_.actionId.length)

Your compare method should be:
def compare(that: UserActions) = this.actionTime.compareTo(that.actionTime)
Then do actions.groupBy(_.userId).mapValues(_.max) as @jwvh shows.
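Putting the two answers together, a minimal sketch; with the corrected compare, mapValues(_.max) picks the latest action per user (the timestamp rendering in the comment assumes UTC):
case class UserActions(userId: String, actionId: String, actionTime: java.sql.Timestamp) extends Ordered[UserActions] {
  def compare(that: UserActions) = this.actionTime.compareTo(that.actionTime)
}

val actions = List(
  UserActions("1", "1", new java.sql.Timestamp(0L)),
  UserActions("1", "1", new java.sql.Timestamp(1L))
)

actions.groupBy(_.userId).mapValues(_.max)
// Map(1 -> UserActions(1,1,1970-01-01 00:00:00.001))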

You want to group the values by userId, which you can do with groupBy():
actions.groupBy(_.userId)
Then you can take just the values from the key/value pairs:
actions.groupBy(_.userId).values
You used maxBy(_._2), but the grouped values are of type List[UserActions], whose elements have no _2. To get the latest action per group, compare on the action time (here via its millisecond value) using map():
actions.groupBy(_.userId).values.map(_.maxBy(_.actionTime.getTime))

Related

What is the error "error: type mismatch;" in Scala?

import java.time.LocalDate
object Main extends App {
  val scores: Seq[Score] = Seq(score1, score2, score3, score4)

  println(getDate(scores)(LocalDate.of(2020, 1, 30))("Alice"))

  def getDate(scoreSeq: Seq[Score]): Map[LocalDate, Map[String, Int]] =
    scores.groupMap(score => score.date)(score => Map(score.name -> (score.english + score.math + score.science)))
}
I would like to implement a function that maps the examination date to a map of student names and the total scores of the three subjects on that date; if there are multiple scores for the same student on the same date, the function should return the one with the highest total score. However, the function gives this error:
found   : scala.collection.immutable.Map[java.time.LocalDate,Seq[scala.collection.immutable.Map[String,Int]]]
required: Map[java.time.LocalDate,Map[String,Int]]
How can I resolve this?
The error means what it says: The type of the variable and the type of the value being assigned to the variable don't match. It even tells you what the two types are!
The actual problem is that groupMap isn't returning the type you think it should. The values in the resulting Map are a Seq of Maps rather than a single Map with all the results.
You can use groupMapReduce to reduce the Seq of Maps down to a single Map by concatenating them with ++:
scores.groupMapReduce(_.date)(score =>
  Map(score.name -> (score.english + score.math + score.science)))(_ ++ _)
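For reference, a self-contained sketch (Scala 2.13+, where groupMapReduce exists); the shape of Score here is assumed from the question's usage and may not match the real definition:
import java.time.LocalDate

// Assumed shape of Score, inferred from the question's code.
case class Score(name: String, date: LocalDate, english: Int, math: Int, science: Int)

val scores = Seq(
  Score("Alice", LocalDate.of(2020, 1, 30), 80, 90, 70),
  Score("Bob",   LocalDate.of(2020, 1, 30), 60, 65, 70)
)

val byDate: Map[LocalDate, Map[String, Int]] =
  scores.groupMapReduce(_.date)(s => Map(s.name -> (s.english + s.math + s.science)))(_ ++ _)

println(byDate(LocalDate.of(2020, 1, 30))("Alice")) // 240
// Note: ++ keeps the later entry when the same name appears twice on a date;
// picking the highest total instead would need a custom merge function.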

How to find the mean of values in an array in Scala - Apache Spark

I have an array of values as shown below:
scala> number.take(5)
res1: Array[Any] = Array(908.76, 901.74, 83.71, 39.36, 234.64)
I need to find the mean value of the array using an RDD method.
I have tried using the number.mean() method but it keeps giving me the following error:
error: could not find implicit value for parameter num: Numeric[Any]
I am new to Spark, please provide some suggestions. Thank you.
That's not Spark related. The compiler gives you a hint: there is no .mean() method for Array[Any], because mean requires the elements of the Array to be Numeric.
It would work if it were an Array of Doubles or Ints.
number.take(5) returned Array[Any] because somewhere above it you provided no guarantee that the Array would contain only numeric elements.
If you can't provide that guarantee, then you have to map over the array and explicitly convert each value to Double (or another Numeric type of your choice):
import scala.util.Try

implicit class AnyExtended(value: Any) {
  // Parse the value as a Double; None if it can't be converted
  def toDoubleO: Option[Double] = Try(value.toString.toDouble).toOption
}

val array: Array[Double] = number.take(5).flatMap(_.toDoubleO)
val mean: Double = array.sum / array.length
Note that instead of using a bare .toDouble I've written an implicit extension, because the conversion can fail with an exception. We wrap it in Try and turn it into an Option: in case of failure we get None, and that value is simply skipped by flatMap when computing the mean.
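If number is actually an RDD[Any] (as number.take(5) suggests), a similar sketch keeps the work in Spark and uses the RDD's built-in mean(), reusing the extension above:
// Sketch assuming `number` is an RDD[Any]; RDD[Double] gets mean() via DoubleRDDFunctions.
val doubles = number.flatMap(_.toDoubleO) // RDD[Double]; non-numeric values are dropped
val mean: Double = doubles.mean()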
If you are happy to convert to a DataFrame, then Spark will do this for you with minimal effort.
val number = List(908.76, 901.74, 83.71, 39.36, 234.64)
val numberRDD = sc.parallelize(number)
numberRDD.toDF("x").agg(avg(col("x"))).show()
This will produce the answer 433.642
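Outside spark-shell, the SparkSession, implicits, and function imports that the snippet relies on have to be set up explicitly; a minimal sketch (application name and master are illustrative):
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

val spark = SparkSession.builder().appName("mean-example").master("local[*]").getOrCreate()
import spark.implicits._ // needed for .toDF on an RDD

val numberRDD = spark.sparkContext.parallelize(List(908.76, 901.74, 83.71, 39.36, 234.64))
numberRDD.toDF("x").agg(avg(col("x"))).show() // prints a single-row DataFrame containing 433.642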

Scala: list to set using flatMap

I have a class with a field of type Set[String]. I also have a list of objects of this class. I'd like to collect all the strings from all the sets of these objects into one set. Here is how I can do it already:
case class MyClass(field: Set[String])
val list = List(
MyClass(Set("123")),
MyClass(Set("456", "798")),
MyClass(Set("123", "798"))
)
list.flatMap(_.field).toSet // Set(123, 456, 798)
It works, but I think I can achieve the same using only flatMap, without the toSet invocation. I tried this, but it gave a compilation error:
// error: Cannot construct a collection of type Set[String]
// with elements of type String based on a collection of type List[MyClass].
list.flatMap[String, Set[String]](_.field)
If I change the type of list to Set (i.e., val list = Set(...)), then that flatMap invocation works.
So, can I somehow use Set.canBuildFrom or any other CanBuildFrom object to invoke flatMap on a List, so that I'll get a Set as a result?
The CanBuildFrom instance you want is called breakOut and has to be provided as a second parameter:
import scala.collection.breakOut
case class MyClass(field: Set[String])
val list = List(
MyClass(Set("123")),
MyClass(Set("456", "798")),
MyClass(Set("123", "798"))
)
val s: Set[String] = list.flatMap(_.field)(breakOut)
Note that the explicit type annotation on the variable s is mandatory; that's how the target collection type is chosen.
Edit:
If you're using Scalaz or cats, you can use foldMap as well:
import scalaz._, Scalaz._
list.foldMap(_.field)
This does essentially what mdm's answer proposes, except the Set.empty and ++ parts are already baked in.
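With cats, a roughly equivalent sketch; cats supplies the union Monoid for Set and Foldable for List, so no empty value or combine function needs to be spelled out:
import cats.implicits._

list.foldMap(_.field) // Set(123, 456, 798)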
The way flatMap works in Scala is that it can only flatten one level of wrapping when the inner and outer wrappers are the same type, i.e. List[List[String]] -> flatMap -> List[String].
If you apply flatMap across different wrapper types, the result always keeps the outer wrapper type, i.e. List[Set[String]] -> flatMap -> List[String].
If you want flatMap over mixed wrappers to produce the inner type instead, i.e. List[Set[String]] -> flatMap -> Set[String], you have two options:
Explicitly convert one wrapper type to the other, i.e. list.flatMap(_.field).toSet, or
Provide an implicit converter, i.e. implicit def listToSet(list: List[String]): Set[String] = list.toSet, and then you can write val set: Set[String] = list.flatMap(_.field).
Only then will what you are trying to achieve be accomplished.
Conclusion: if you apply flatMap over two different wrapper types, the final result keeps the outer wrapper type, i.e. List[Set[String]] -> flatMap -> List[String]; if you want a different type, you have to convert it either explicitly or implicitly.
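For what it's worth, a small sketch of the implicit-converter option in Scala 2 (the language import silences the implicit-conversions feature warning), reusing the list defined in the question:
import scala.language.implicitConversions

implicit def listToSet(list: List[String]): Set[String] = list.toSet

val set: Set[String] = list.flatMap(_.field) // the List[String] result is implicitly converted
// set: Set[String] = Set(123, 456, 798)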
You could maybe provide a specific CanBuildFrom, but why not use a fold instead?
list.foldLeft(Set.empty[String]) { case (set, myClass) => set ++ myClass.field }
Still just one pass through the collection, and if you are sure the list is not empty, you could even use reduceLeft instead, as sketched below.
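A sketch of that reduce variant, mapping to the sets first so the elements can be combined:
// Works only on a non-empty list; reduceLeft throws on an empty one.
list.map(_.field).reduceLeft(_ ++ _) // Set(123, 456, 798)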

Scala compiler says: value & is not a member of java.lang.Object

I am very new to Scala. I'm working through some tutorials, but I don't understand the problem here:
val reverse = new mutable.HashMap[String, String]() with mutable.SynchronizedMap[String, String]

def search(query: String) = Future.value {
  val tokens = query.split(" ")
  val hits = tokens map { token => reverse.getOrElse(token, Set()) }
  if (hits.isEmpty)
    Nil
  else
    hits reduceLeft {_ & _} toList // value & is not a member of java.lang.Object
}
The compiler says value & is not a member of java.lang.Object. Can somebody explain to me why I am getting a compiler error? I took this from the tutorial here: https://twitter.github.io/scala_school/searchbird.html
"tokens" is of type Array[String]. Now, when you iterate over the array, there are two possibilities. Either reverse will have a value for the token or not. If it has, then the Array element get a string value otherwise an empty set.
For example: Lets say reverse has two values - ("a" -> "a1", "b" ->"b1") a maps to a1 and b maps to b1.
Suppose, The query string is "a c".
tokens will be ["a","c"] after splitting.
After mapping you will get in array ["a1", Set()] (a got mapped to a1 and there is no value for "c" in the map hence, you got an empty Set())
Now, the overall type of the hits array is Array[Object].
So, now you are getting an error as the last line will be "&" operator on 2 Objects according to the compiler.
Mohit has the right answer: you end up with an Array of Objects. This is because your HashMap reverse has a value type of String, so it returns a String for a key that is found, while your getOrElse returns a Set when the key is not found. These need to be the same type so that you don't end up with an Array[Object].
If you notice a few lines above in the tutorial you linked, reverse is defined as follows:
val reverse = new mutable.HashMap[String, Set[String]] with mutable.SynchronizedMap[String, Set[String]]
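With that definition, every element of hits is a Set[String], so & (set intersection) is defined and the original search compiles; a sketch mirroring the tutorial's code (Future.value here is Twitter Util's, as in the tutorial, and reverse is the corrected map above):
import scala.collection.mutable
import com.twitter.util.Future

def search(query: String) = Future.value {
  val tokens = query.split(" ")
  val hits = tokens map { token => reverse.getOrElse(token, Set[String]()) } // Array[Set[String]]
  if (hits.isEmpty)
    Nil
  else
    hits.reduceLeft(_ & _).toList // & is set intersection on Set[String]
}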

How to create a lookup Map in Scala

While I know there are a few ways to do this, I'm most interested in finding the most idiomatic and functional Scala method.
Given the following trite example:
case class User(id: String)
val users = List(User("1"), User("2"), User("3"), User("4"))
What's the best way to create an immutable lookup Map of user.id -> User so that I can perform quick lookups by user.id?
In Java I'd probably use Google Collections' Maps.uniqueIndex, although I care less about its uniqueness property.
You can keep the users in a List and use list.find:
users.find{_.id == "3"} //returns Option[User], either Some(User("3")) or None if no such user
or if you want to use a Map, map the list of users to a list of 2-tuples, then use the toMap method:
val umap = users.map{u => (u.id, u)}.toMap
which will return an immutable Map[String, User], then you can use
umap contains "1" //return true
or
umap.get("1") //returns Some(User("1"))
If you're sure all IDs are unique, the canonical way is
users.map(u => (u.id, u)).toMap
as @Dan Simon said. However, if you are not sure all IDs are unique, then the canonical way is:
users.groupBy(_.id)
This will generate a mapping from user IDs to a list of users that share that ID.
Thus, there is an alternate not-entirely-canonical way to generate the map from ID to single users:
users.groupBy(_.id).mapValues(_.head)
For expert users who want to avoid the intermediate step of creating a map of lists, or a list which then gets turned into a map, there is the handy scala.collection.breakOut method that builds the type that you want if there's a straightforward way to do it. It needs to know the type, though, so this will do the trick:
users.map(u => (u.id,u))(collection.breakOut): Map[String,User]
(You can also assign to a var or val of specified type.)
Convert the List into a Map and use it as a function:
case class User(id: String)
val users = List(User("1"), User("2"), User("3"))
val usersMap = users map { case user @ User(id) => id -> user } .toMap
usersMap("1")     // User(1)
usersMap.get("0") // None (usersMap("0") would throw NoSuchElementException)
If you would like to use a numeric index:
scala> users.map (u=> u.id.toInt -> u).toMap
res18: scala.collection.immutable.Map[Int,User] =
Map((1,User(1)), (2,User(2)), (3,User(3)))
Maps are functions too, their apply method provides access to the value associated with a particular key (or a NoSuchElementException is thrown for an unknown key) so this makes for a very clean lookup syntax. Following on from Dan Simon's answer and using a more semantically meaningful name:
scala> val Users = users map {u => (u.id, u)} toMap
Users: scala.collection.immutable.Map[String,User] = Map((1,User(1)), (2,User(2)), (3,User(3)))
which then provides the following lookup syntax:
scala> val user2 = Users("2")
user2: User = User(2)
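If a key might be missing, get or getOrElse avoid the exception. A small sketch using the same Users map; the User("default") fallback is just an illustrative placeholder:
Users.get("5")                        // None, instead of throwing
Users.getOrElse("2", User("default")) // User(2)
Users.contains("5")                   // false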