A newbie question. I am trying to learn Scala from examples, and I found some Spark code that creates AssociationRules (source code here):
def run[Item: ClassTag](freqItemsets: RDD[FreqItemset[Item]]): RDD[Rule[Item]] = {
// For candidate rule X => Y, generate (X, (Y, freq(X union Y)))
val candidates = freqItemsets.flatMap { itemset =>
val items = itemset.items
items.flatMap { item =>
items.partition(_ == item) match {
case (consequent, antecedent) if !antecedent.isEmpty =>
Some((antecedent.toSeq, (consequent.toSeq, itemset.freq)))
case _ => None
}
}
}
I am trying to understand how the run function works, and how the algorithm knows that the antecedent is X and the consequent is Y. How are the items divided?
Another question: how does the join function work (below)? Is freqAntecedent the same as x.freq? Where does freqUnion in the map come from?
candidates.join(freqItemsets.map(x => (x.items.toSeq, x.freq)))
.map { case (antecendent, ((consequent, freqUnion), freqAntecedent)) =>
new Rule(antecendent.toArray, consequent.toArray, freqUnion, freqAntecedent)
}.filter(_.confidence >= minConfidence)
Thanks for any help !
The comment generate (X, (Y, freq(X union Y))) means that the items are pairs (2-tuples) of values. A tuple has an unapply method that allows pattern matching on it, i.e. exactly what you see in the case statement. Any time a class implements unapply, you can use it in a case statement to break a value up into its attributes and assign each attribute to a variable.
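To make the splitting concrete, here is a small sketch that runs the same partition-and-match logic on a plain Array instead of an RDD. The itemset {a, b, c}, its frequency 5 and the antecedent frequency 8 are made-up values for illustration. Each item in turn becomes the single-element consequent, and the remaining items form the antecedent. The last lines mimic the shape a pair-RDD join produces, (key, (leftValue, rightValue)), which is why the map can destructure it as (antecedent, ((consequent, freqUnion), freqAntecedent)).

```scala
// Made-up stand-in for one frequent itemset {a, b, c} with frequency 5.
val items = Array("a", "b", "c")
val freq = 5L

// Same logic as the Spark snippet, but on a local collection.
val candidates: Seq[(Seq[String], (Seq[String], Long))] =
  items.toSeq.flatMap { item =>
    // partition returns (elements equal to item, all other elements)
    items.partition(_ == item) match {
      case (consequent, antecedent) if antecedent.nonEmpty =>
        Some((antecedent.toSeq, (consequent.toSeq, freq)))
      case _ => None
    }
  }

// A join on the antecedent key pairs the candidate value with the
// frequency of the antecedent itemset (8 is an assumed value here).
val joined = (Seq("b", "c"), ((Seq("a"), freq), 8L))
val (antecedent, ((consequent, freqUnion), freqAntecedent)) = joined
```

So freqUnion is the itemset.freq stored when the candidate was generated, and freqAntecedent is the x.freq of the frequent itemset whose items equal the antecedent.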
In my code, I very often need to process a list by performing operations on an internal model. For each processed element, the model is returned and then a 'new' model is used for the next element of the list.
Usually, I implement this by using a tail recursive method:
def createCar(myModel: Model, record: Record[Any]): Either[CarError, Model] = {
record match {
case c: Car =>
// Do car stuff...
val newModel: Model = myModel.createCar(record)
Right(newModel)
case _ => Left(CarError())
}
}
@tailrec
def processCars(myModel: Model, records: List[Record[Any]]): Either[CarError, Model] =
records match {
case x :: xs =>
createCar(myModel, x) match {
case Right(m) => processCars(m, xs)
case e @ Left(_) => e
}
case Nil => Right(myModel)
}
Since I keep repeating this kind of pattern, I am searching for ways to make it more concise and more functional (i.e., the Scala way).
I have looked into foldLeft, but cannot get it to work with Either:
recordsList.foldLeft(myModel) { (m, r) =>
// Do car stuff...
Right(m)
}
Is foldLeft a proper replacement? How can I get it to work?
Following up on my earlier comment, here's how to unfold() to get your result. [Note: Scala 2.13.x]
def processCars(myModel: Model
,records: List[Record[_]]
): Either[CarError, Model] =
LazyList.unfold((myModel,records)) { case (mdl,recs) =>
recs.headOption.map{
createCar(mdl, _).fold(Left(_) -> (mdl,Nil)
,m => Right(m) -> (m,recs.tail))
}
}.last
The advantages here are:
early termination - Iterating through the records stops after the 1st Left is returned or after all the records have been processed, whichever comes first.
memory efficient - Since we're building a LazyList, and nothing is holding on to the head of the resulting list, every element except the last should be immediately released for garbage collection.
You can do that with foldLeft like this:
def processCars(myModel: Model, records: List[Record[Any]]): Either[CarError, Model] = {
records.foldLeft[Either[CarError, Model]](Right(myModel))((m, r) => {
m.fold(Left.apply, { model =>
createCar(model, r).fold(Left.apply, Right.apply)
})
})
}
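Since Either is right-biased (Scala 2.12 and later), the same fold can be written with flatMap. The sketch below uses simplified, hypothetical stand-ins for Model, CarError and the record type so that it runs on its own; only the shape of processCars is the point.

```scala
// Hypothetical stand-ins for the question's types.
final case class Model(cars: List[String])
final case class CarError(msg: String)

// Assumed behaviour for illustration: an empty record is an error.
def createCar(model: Model, record: String): Either[CarError, Model] =
  if (record.nonEmpty) Right(Model(record :: model.cars))
  else Left(CarError("empty record"))

def processCars(model: Model, records: List[String]): Either[CarError, Model] =
  records.foldLeft[Either[CarError, Model]](Right(model)) { (acc, r) =>
    // Once acc is a Left, flatMap skips createCar and passes the Left along.
    acc.flatMap(createCar(_, r))
  }
```

Note that foldLeft still visits every remaining record after the first Left; it just does no work for them. If early termination matters, the tail-recursive version or the unfold approach is a better fit.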
I use pattern matching in Scala a lot. Often I need to do some calculations in the guard, and sometimes they are pretty expensive. Is there any way to bind the calculated value to a name so that the body can reuse it?
// I want to use the result of prettyExpensiveFunc in the body safely
people.collect {
case ...
case Some(Right((x, y))) if prettyExpensiveFunc(x, y) > 0 => prettyExpensiveFunc(x, y)
}
//ideally something like that could be helpful, but it doesn't compile:
people.collect {
case ...
case Some(Right((x, y))) if {val z = prettyExpensiveFunc(x, y); z > 0} => z
}
// this solution works, but it isn't safe for some `Seq` types and is risky when more cases are used.
var cache:Int = 0
people.collect {
case ...
case Some(Right((x, y))) if {cache = prettyExpensiveFunc(x, y); cache > 0} => cache
}
Is there any better solution?
PS: The example is simplified, and I don't expect answers showing that I don't need pattern matching here.
You can use cats.Eval to make expensive calculations lazy and memoized: create the Evals in .map, then extract .value (calculated at most once, and only if needed) in .collect:
values.map { value =>
val expensiveCheck1 = Eval.later { prettyExpensiveFunc(value) }
val expensiveCheck2 = Eval.later { anotherExpensiveFunc(value) }
(value, expensiveCheck1, expensiveCheck2)
}.collect {
case (value, lazyResult1, _) if lazyResult1.value > 0 => ...
case (value, _, lazyResult2) if lazyResult2.value > 0 => ...
case (value, lazyResult1, lazyResult2) if lazyResult1.value > lazyResult2.value => ...
...
}
I don't see a way of doing what you want without creating some implementation of lazy evaluation, and if you have to use one, you might as well use an existing one instead of rolling your own.
EDIT: Just in case you haven't noticed, you aren't losing the ability to pattern match by using a tuple here:
values.map {
// original value -> lazily evaluated memoized expensive calculation
case a @ Some(Right((x, y))) => a -> Some(Eval.later(prettyExpensiveFunc(x, y)))
case a => a -> None
}.collect {
// match type and calculation
...
case (Some(Right((x, y))), Some(lazyResult)) if lazyResult.value > 0 => ...
...
}
Why not run the function first for every element and then work with a tuple?
Seq(1,2,3,4,5).map(e => (e, prettyExpensiveFunc(e))).collect {
case ...
case (x, y) if y > 0 => y
}
I tried writing my own matchers, and the result is OK but not perfect. My matcher is untyped, and it is a bit ugly to make it fully typed.
class Matcher[T,E](f:PartialFunction[T, E]) {
def unapply(z: T): Option[E] = if (f.isDefinedAt(z)) Some(f(z)) else None
}
def newMatcherAny[E](f:PartialFunction[Any, E]) = new Matcher(f)
def newMatcher[T,E](f:PartialFunction[T, E]) = new Matcher(f)
def prettyExpensiveFunc(x:Int) = {println(s"-- prettyExpensiveFunc($x)"); x%2+x*x}
val x = Seq(
Some(Right(22)),
Some(Right(10)),
Some(Left("Oh now")),
None
)
val PersonAgeRank = newMatcherAny { case Some(Right(x:Int)) => (x, prettyExpensiveFunc(x)) }
x.collect {
case PersonAgeRank(age, rank) if rank > 100 => println("age:"+age + " rank:" + rank)
}
https://scalafiddle.io/sf/hFbcAqH/3
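For comparison, a fully typed variant can be sketched as a plain extractor object for one specific input type. The name AgeRank and the result tuple are choices made for this illustration. The expensive value is computed inside unapply and bound to a name, so the guard and the body share it.

```scala
def prettyExpensiveFunc(x: Int): Int = x % 2 + x * x

// Hypothetical typed extractor: matches only Some(Right(age)) and
// returns the age together with the computed rank.
object AgeRank {
  def unapply(p: Option[Either[String, Int]]): Option[(Int, Int)] = p match {
    case Some(Right(age)) => Some((age, prettyExpensiveFunc(age)))
    case _                => None
  }
}

val people: Seq[Option[Either[String, Int]]] =
  Seq(Some(Right(22)), Some(Right(10)), Some(Left("Oh now")), None)

// rank is bound by the extractor and reused in both guard and body.
val ranked: Seq[(Int, Int)] = people.collect {
  case AgeRank(age, rank) if rank > 100 => (age, rank)
}
```

The trade-off is one extractor per input shape, whereas the PartialFunction-based Matcher above is reusable but loses type checking when built with newMatcherAny.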
I find myself constantly doing things like the following:
val adjustedActions = actions.scanLeft((1.0, null: CorpAction)){
case ((runningSplitAdj, _), action) => action match {
case Dividend(date, amount) =>
(runningSplitAdj, Dividend(date, amount * runningSplitAdj))
case s @ Split(date, sharesForOne) =>
((runningSplitAdj * sharesForOne), s)
}
}
.drop(1).map(_._2)
Where I need to accumulate the runningSplitAdj, in this case, in order to correct the dividends in the actions list. Here, I use scan to maintain the state that I need in order to correct the actions, but in the end, I only need the actions. Hence, I need to use null for the initial action in the state, but in the end, drop that item and map away all the states.
Is there a more elegant way of structuring these? In the context of RxScala Observables, I actually made a new operator to do this (after some help from the RxJava mailing list):
implicit class ScanMappingObs[X](val obs: Observable[X]) extends AnyVal {
def scanMap[S,Y](f: (X,S) => (Y,S), s0: S): Observable[Y] = {
val y0: Y = null.asInstanceOf[Y]
// drop(1) because scan also emits initial state
obs.scan((y0, s0)){case ((y, s), x) => f(x, s)}.drop(1).map(_._1)
}
}
However, now I find myself doing it to Lists and Vectors too, so I wonder if there is something more general I can do?
The combinator you're describing (or at least something very similar) is often called mapAccum. Take the following simplified use of scanLeft:
val xs = (1 to 10).toList
val result1 = xs.scanLeft((1, 0.0)) {
case ((acc, _), i) => (acc + i, i.toDouble / acc)
}.tail.map(_._2)
This is equivalent to the following (which uses Scalaz's implementation of mapAccumLeft):
xs.mapAccumLeft[Double, Int](1, {
case (acc, i) => (acc + i, i.toDouble / acc)
})._2
mapAccumLeft returns a pair of the final state and a sequence of the results at each step, but it doesn't require you to specify a spurious initial result (that will just be ignored and then dropped), and you don't have to map over the entire collection to get rid of the state—you just take the second member of the pair.
Unfortunately mapAccumLeft isn't available in the standard library, but if you're looking for a name or for ideas about implementation, this is a place to start.
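A minimal standard-library sketch of the idea, assuming a List input. The name mapAccumLeft and the curried argument order are borrowed for illustration and are not Scalaz's actual signature.

```scala
// Threads a state S through the list and collects one output B per element.
// Returns the final state together with the outputs, in input order.
def mapAccumLeft[A, S, B](xs: List[A], s0: S)(f: (S, A) => (S, B)): (S, List[B]) = {
  val (finalState, reversed) =
    xs.foldLeft((s0, List.empty[B])) { case ((s, out), a) =>
      val (next, b) = f(s, a)
      (next, b :: out) // prepend, then reverse once at the end
    }
  (finalState, reversed.reverse)
}

// Same computation as the scanLeft example, on a shorter list.
val (total, ratios) =
  mapAccumLeft(List(1, 2, 3), 1)((acc, i) => (acc + i, i.toDouble / acc))
```

No dummy initial result is needed, and the state is separated from the outputs without a second pass over the collection.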
I have a map m
val m = Map(1->2, 3->4, 5->6, 7->8, 4->4, 9->9, 10->12, 11->11)
Now I want a map containing only the entries whose keys are equal to their values. So I do this:
def eq(k: Int, v: Int) = if (k == v) Some(k->v) else None
m.flatMap((k,v) => eq(k,v))
This gives me the error
error: wrong number of parameters; expected = 1
m.flatMap((k,v) => eq(k,v))
What's wrong with the above code? flatMap expects a one-argument function, and here I am passing one argument, which is a pair of integers.
Also this works
m.flatMap {
case (k,v) => eq(k,v)
}
but this does not
m.flatMap {
(k,v) => eq(k,v)
}
Looks like I am missing something. Help?
There is no such syntax:
m.flatMap((k,v) => eq(k,v))
Well, in fact there is such syntax, but it denotes a function of two arguments, as expected by methods like reduce:
List(1,2,3,4).reduce((acc, x) => acc + x)
The
m.flatMap {
case (k,v) => eq(k,v)
}
syntax works because in fact it is something like this:
val temp: PartialFunction[(Int, Int), Option[(Int, Int)]] = {
case (k,v) => eq(k,v) // using literal expression to construct function
}
m.flatMap(temp) // with braces omitted
The key thing here is the use of the case keyword (actually, there is a discussion about enabling your very syntax), which turns a usual braces expression like { ... } into a full-blown anonymous partial function.
(If you want to simply fix the error you're getting, see the 2nd solution (with flatMap); if you want a generally nicer solution, read from the beginning.)
What you need instead is filter not flatMap:
def eq(k: Int, v: Int) = k == v
val m = Map(1->2, 3->4, 5->6, 7->8, 4->4, 9->9, 10->12, 11->11)
m.filter((eq _).tupled)
...which of course reduces to just the following, without the need for eq:
m.filter { case (k, v) => k == v }
result:
Map(9 -> 9, 11 -> 11, 4 -> 4)
OR... If you want to stick with flatMap
First you must know that flatMap will pass to your function TUPLES not keys and values as separate arguments.
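Additionally, it is safest to change the Option returned by eq to a collection such as List, which can always be fed back to flatMap on sequences or maps (flatMap accepts any GenTraversableOnce, to be precise). Before Scala 2.13 an Option is only accepted where the compiler can convert it implicitly, as in the function-literal version from the question: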
def eq(k: Int, v: Int) = if (k == v) List(k -> v) else Nil
m.flatMap { case (k,v) => eq(k,v) } // use pattern matching to unpack the tuple
or the uglier but equivalent:
m.flatMap { x => eq(x._1, x._2) }
alternatively, you can convert eq to take a tuple instead:
m.flatMap((eq _).tupled)
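As a quick check, the sketch below (assuming Scala 2.13 or later, where Option is an IterableOnce) runs several of these variants side by side. The helper is named keepIfEqual here, a hypothetical name chosen to avoid any confusion with the eq method every object inherits from AnyRef.

```scala
val m = Map(1 -> 2, 3 -> 4, 5 -> 6, 7 -> 8, 4 -> 4, 9 -> 9, 10 -> 12, 11 -> 11)

// Hypothetical helper with the same logic as the question's eq.
def keepIfEqual(k: Int, v: Int): Option[(Int, Int)] =
  if (k == v) Some(k -> v) else None

// Pattern-matching function literal: unpacks the tuple into k and v.
val viaCase = m.flatMap { case (k, v) => keepIfEqual(k, v) }

// Ordinary one-argument lambda: the single argument is the tuple itself.
val viaTuple = m.flatMap(kv => keepIfEqual(kv._1, kv._2))

// filter expresses the intent directly and needs no helper.
val viaFilter = m.filter { case (k, v) => k == v }
```

All three values contain the same three entries (4 -> 4, 9 -> 9 and 11 -> 11); the printed order may differ because Map is unordered.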
I think that what you want is a single argument that is a pair, not two arguments. Something like this may work:
m.flatMap(k => eq(k._1, k._2))
The code snippet that works uses pattern matching. You give names to both elements of your pair. It's a partial function and can be used here in your flatMap.
You have to do:
m.flatMap { case (k,v) => eq(k,v) }
Note that here I switch to curly braces, which indicate a function block rather than parameters, and the function here is a case statement. This means that the function block I'm passing to flatMap is a PartialFunction that is only invoked for items that match the case statement.
Your eq function takes two parameters, that is why you are getting the type error. Try:
def f(p: (Int, Int)) = if (p._1 == p._2) Some(p) else None
m flatMap f
Suppose I have two Options and, if both are Some, I want to execute one code path, and if not, execute another. I'd like to do something like:
for (x <- xMaybe; y <- yMaybe) {
// do something
}
else {
// either x or y were None, handle this
}
Outside of if statements or pattern matching (which might not scale if I had more than two options), is there a better way of handling this?
Very close to your syntax proposal by using yield to wrap the for output in an Option:
val result = {
for (x <- xMaybe; y <- yMaybe) yield {
// do something
}
} getOrElse {
// either x or y were None, handle this
}
The getOrElse block is executed only if one or both options are None.
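A concrete, runnable instance of this pattern, with Int options and string results chosen purely for illustration:

```scala
// The for/yield produces Some(result) only when both options are defined;
// getOrElse supplies the fallback for every other combination.
def sumOrFallback(xMaybe: Option[Int], yMaybe: Option[Int]): String =
  (for (x <- xMaybe; y <- yMaybe) yield s"sum = ${x + y}")
    .getOrElse("x and/or y were None")
```

Adding a third option only requires another generator in the for, so the approach scales without nesting.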
You could pattern match both Options at the same time:
(xMaybe, yMaybe) match {
case (Some(x), Some(y)) => "x and y are there"
case _ => "x and/or y were None"
}
The traverse function in Scalaz generalises your problem here. It takes two arguments:
T[A]
A => F[B]
and returns F[T[B]]. The T is any traversable data structure such as List and the F is any applicative functor such as Option. Therefore, to specialise, your desired function has this type:
List[A] => (A => Option[B]) => Option[List[B]]
So put all your Option values in a List
val z = List(xMaybe, yMaybe)
Construct the function however you want to collect the results (here A is itself an Option, so the function takes an Option):
val f: Option[X] => Option[Y] = ...
and call traverse
val r = z traverse f
This programming pattern occurs very often. There is a paper that talks all about it, The Essence of the Iterator Pattern.
A useful link that works through it with Scala examples:
http://etorreborre.blogspot.com/2011/06/essence-of-iterator-pattern.html
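To show what traverse does without pulling in Scalaz, here is a hand-written sketch specialised to List and Option. It is an illustration of the idea only, not the library's implementation.

```scala
// Applies f to every element; the result is Some(list of results) only if
// f returned Some for every element, otherwise None.
def traverse[A, B](xs: List[A])(f: A => Option[B]): Option[List[B]] =
  xs.foldRight(Option(List.empty[B])) { (a, acc) =>
    for {
      b  <- f(a)
      bs <- acc
    } yield b :: bs
  }

val allPresent: List[Option[Int]] = List(Some(1), Some(2))
val oneMissing: List[Option[Int]] = List(Some(1), None)

// Traversing a list of options with a function on options:
val doubled = traverse(allPresent)(opt => opt.map(_ * 2))
val missing = traverse(oneMissing)(opt => opt.map(_ * 2))
```

Here doubled is Some(List(2, 4)) and missing is None, which is exactly the "all or nothing" behaviour the question asks for.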
Why would something like this not work?
val opts = List[Option[Int]](Some(1), None, Some(2))
if (opts contains None) {
// Has a None
} else {
// Launch the missiles
val values = opts.map(_.get) // We know that there is no None in the list so get will not throw
}
If you don't know the number of values you are dealing with, then Tony's answer is the best. If you do know the number of values you are dealing with then I would suggest using an applicative functor.
((xMaybe |@| yMaybe) { (x, y) => /* do something */ }).getOrElse(/* something else */)
You said you want the solution to be scalable:
val optional = List(Some(4), Some(3), None)
if(optional forall {_.isDefined}) {
//All defined
} else {
//At least one not defined
}
EDIT: Just saw that Emil Ivanov's solution is a bit more elegant.
Starting with Scala 2.13, we can alternatively use Option#zip, which combines two options into a Some of a tuple of their values if both options are defined, or else None:
opt1 zip opt2 match {
case Some((x, y)) => "x and y are there"
case None => "x and/or y were None"
}
Or with Option#fold:
(opt1 zip opt2).fold("x and/or y were None"){ case (x, y) => "x and y are there" }
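A self-contained version of the above (Scala 2.13 or later), plus a sketch of how zip chains for a third option. The function names and result strings are made up for illustration.

```scala
def describePair(opt1: Option[Int], opt2: Option[String]): String =
  opt1.zip(opt2) match {
    case Some((n, s)) => s"both present: $n and $s"
    case None         => "x and/or y were None"
  }

// zip can be chained; the values then arrive as a nested tuple.
def sumAllThree(a: Option[Int], b: Option[Int], c: Option[Int]): Option[Int] =
  a.zip(b).zip(c).map { case ((x, y), z) => x + y + z }
```

With more than two or three options the nested tuples get unwieldy, at which point a for comprehension or a traverse-style approach reads better.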
For scaling to many options, try something along these lines:
def runIfAllSome[A](func:(A)=>Unit, opts:Option[A]*) = {
if(opts.find((o)=>o==None) == None) for(opt<-opts) func(opt.get)
}
With this, you can do:
scala> def fun(i:Int) = println(i)
fun: (i: Int)Unit
scala> runIfAllSome(fun, Some(1), Some(2))
1
2
scala> runIfAllSome(fun, None, Some(1))
scala>
I think the key point here is to think in terms of types about what you want to do. As I understand it, you want to iterate over a list of Option pairs and then do something based on a certain condition. So the interesting part of your question is: what would the return type you expect look like? I think it would look something like this: Either[List[Option], List[Option,Option]]. On the error side (left) you would accumulate each option that was paired with a None (and was left alone, so to speak). On the right side you collect the non-empty options, which represent your successful values. So we just need a function that does exactly that: validate each pair and accumulate it according to its result (success or failure). I hope this helps; if not, please explain your use case in more detail. Some links on implementing what I described: http://applicative-errors-scala.googlecode.com/svn/artifacts/0.6/pdf/index.pdf and http://blog.tmorris.net/automated-validation-with-applicatives-and-semigroups-for-sanjiv/