scala: accumulate a var from collection in a functional manner (that is, no vars)

this is a newbie question
I have the following code:
var total = 0L
docs.foreach(total += _.length)
in docs I have a collection of objects with the .length property
I'd like something like:
val total = docs.[someScalaMethod](0, (element, acum) => acum + element.length )
I mean, a method that iterates each element passing an accumulator variable...
The first zero I pass should be the initial value of the accumulator var..
How can it be achieved?

This is called a fold. It's almost exactly what you stated:
docs.foldLeft(0)((accum, element) => accum + element.length)
for the version that traverses the collection from left to right (usually preferable; right to left is foldRight, and 2.9 has a fold that can start anywhere, but has limitations on how it can transform the type).
Once you get used to this, there is a short-hand version of fold left, where the accumulator goes on the left (think of it being pushed from left to right through the list), and you use placeholders for the variable names since you only use them once each: (0 /: docs)(_ + _.length)
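A minimal sketch of the fold described above, assuming a hypothetical Doc case class with a length field (the question does not show the real element type). The seed is 0L so the accumulator is a Long, as in the question's var total = 0L:

```scala
object FoldExample {
  // Hypothetical stand-in for the question's elements; only `length` matters here.
  final case class Doc(length: Long)

  val docs: List[Doc] = List(Doc(3L), Doc(5L), Doc(10L))

  // The accumulator comes first, the current element second.
  val total: Long = docs.foldLeft(0L)((accum, doc) => accum + doc.length)

  // Same fold with placeholders: the first _ is the accumulator, the second the element.
  val totalShort: Long = docs.foldLeft(0L)(_ + _.length)
}
```

Both values come out the same; the placeholder form is only a shorter spelling of the explicit lambda.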

docs map { _.length } reduce { _ + _ }
or (thanks go to Luigi Plinge)
docs.map(_.length).sum

Here is a Scalaz version:
docs.foldMap(_.length)
This is equivalent to docs.map(_.length).sum but takes only one pass. It also works with any monoid.
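To illustrate the idea without pulling in Scalaz, here is a rough sketch of what a foldMap does. The Monoid trait, the two instances and the foldMap helper below are hand-written for illustration only; they are not Scalaz's actual API:

```scala
object FoldMapSketch {
  // A minimal monoid: an identity element plus an associative combine.
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  val intSum: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  val stringConcat: Monoid[String] = new Monoid[String] {
    def empty: String = ""
    def combine(x: String, y: String): String = x + y
  }

  // Maps each element and combines the results in a single traversal.
  def foldMap[A, B](xs: Iterable[A])(f: A => B)(m: Monoid[B]): B =
    xs.foldLeft(m.empty)((acc, a) => m.combine(acc, f(a)))

  val words: List[String] = List("fold", "map", "sum")
  val totalLength: Int = foldMap(words)(_.length)(intSum)
  val joined: String = foldMap(words)(_.toUpperCase)(stringConcat)
}
```

The same helper sums lengths or concatenates strings depending only on which monoid is passed in, which is the point of "works with any monoid".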

Related

Filtering a Scala List

I need to create a function that takes a list of doubles and returns a new list, based on the first one, containing the absolute values of the elements of the first list that fall in the range <-5,12>. I need to use filtering. I have an idea, but it's not working. I'm sorry, maybe my question is easy, but I'm a beginner :)
var numbersReal = List(2.25, -1, -3, 7.32, 0.25, -6, 0, 2, 0, 1, 0, 2.99, 3.02, 0)
def magicFilter(list: List[Double]): List[Double] = {
var newList = List[Double]()
list.foreach {element => if (-5 <= element && element <= 12) newList += scala.math.abs(element) }
newList.toList
}
println(magicFilter(numbersReal))
Best Practice Solution
You can do this easily with a combination of
filter: keep only elements that satisfy a given predicate / condition. For us, it will be keeping only elements in [-5,12]
map: apply a function to every element. For us, it will be taking the absolute value.
numbersReal.filter(e => e >= -5 && e <= 12).map(math.abs)
Another way to achieve this in "one-shot" is to use collect which combines both filter and map:
numbersReal.collect { case e if e >= -5 && e <= 12 => math.abs(e) }
I personally find the first solution to be more readable in this particular case, but that's a matter of opinion.
Usually, these problems can be solved without resorting to a var or any mutable collection. Scala's collections are one of its greatest assets because they include a lot of these primitive operations, and most problems can be solved by combining them.
Note regarding your proposed solution
Your solution is not wrong per se, but it is very error-prone to reimplement logic that is already part of collection methods like filter, map and collect. If you wanted to fix your approach, you would just have to replace newList += ... with newList :+= .... This is because appending an element to an immutable List is done with list :+ element (or element +: list if you want to prepend). list :+= element is syntactic sugar for list = list :+ element. Again, these are not constructs you should encounter very often, because this style is generally frowned upon unless you know you have a very good reason to use mutability.
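For comparison, here are the three variants discussed above side by side. This is only a sketch: the inRange helper is introduced here for illustration, and the list literals are written as explicit doubles.

```scala
object MagicFilterExample {
  val numbersReal: List[Double] =
    List(2.25, -1.0, -3.0, 7.32, 0.25, -6.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.99, 3.02, 0.0)

  // Hypothetical helper: true for elements in the closed range [-5, 12].
  def inRange(e: Double): Boolean = e >= -5 && e <= 12

  // filter then map: two simple steps.
  val viaFilterMap: List[Double] = numbersReal.filter(inRange).map(x => math.abs(x))

  // collect: filtering and mapping in one partial function.
  val viaCollect: List[Double] =
    numbersReal.collect { case e if inRange(e) => math.abs(e) }

  // The question's loop, repaired with :+= (append to an immutable List held in a var).
  def magicFilter(list: List[Double]): List[Double] = {
    var newList = List.empty[Double]
    list.foreach { e => if (inRange(e)) newList :+= math.abs(e) }
    newList
  }
}
```

All three produce the same list; only -6 is dropped, and the negative values that remain come out as their absolute values.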

Check a condition within a foreach in scala

Is there a way to check a condition within a foreach loop in scala. The example is that I want to go through an array of integers and do some arbitrary math on the positive numbers.
val arr = Array(-1,2,3,0,-7,4) //The array
//What I want to do but doesn't work
arr.foreach{if(/*This condition is true*/)
{/*Do some math and print the answer*/}}
//Expected answer for division by six is 0.333333333 \n 0.5 \n 0.666666667
//Which is 2/6, 3/6 and 4/6 respectively as they appear in the array
I know how to do it with a normal for loop and if statement but I want to use this because I want to get away from java.
Thanks
foreach passes each item of the list/array to your function one at a time; you need to give that item a name (a parameter) before you can use it.
For example:
arr.foreach( variable_name => {
if(/*This condition is true*/){
/*Do some math and print the answer*/
}
})
The argument to foreach is a function taking one argument and returning Unit. That argument is the current element of the list, as has been pointed out in other answers. You can just give it a name, and reference it as you would any other variable.
arr.foreach { x => if(x > 0) println(x/6.0) }
It is generally better and more idiomatic to split your logic into a chain of simpler "atomic" transformations rather than putting everything into one long function:
arr
.iterator
.filter(_ > 0)
.map(_ / 6.0)
.foreach(println)
The underscore _ above is shorthand for the function argument. You can use it in short functions when you only need to reference the argument once, and a few other conditions are satisfied. The last line doesn't need to pass the argument to println explicitly, because println is itself passed to foreach as the function. I could have written .foreach(println(_)) or .foreach(x => println(x)); both do the same thing but are technically a little different: those forms create an anonymous function like def foo(x: Double) { println(x) } and pass it to foreach, whereas the way I wrote it originally just passes println itself as the argument.
Also, note the call to .iterator at the beginning. Everything would work the same way if you took it out. The difference is that iterators are lazy. As written, the code takes the first element from the array and sends it through filter; if filter returns false, it stops there and moves on to the second element; if filter returns true, it sends that element to map, prints it, then goes back to grab the next element, and so on.
Without .iterator call, it'd work differently: first, it would run the entire array through filter, and create a new array, containing only positive numbers, then, it'd run that new array through map, and create a new one, with the numbers divided by 6, then it'd go through this last array to print out the values. Using .iterator makes it more efficient by avoiding all the intermediate copies.
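The difference in evaluation order can be made visible by logging each step. The sketch below is only an illustration: the log buffer and the two trace methods are made up for this purpose, and side effects inside filter/map are used purely for demonstration.

```scala
import scala.collection.mutable.ArrayBuffer

object LazyVsStrict {
  val arr: Array[Int] = Array(-1, 2, 3, 0, -7, 4)

  // With .iterator: each element travels through filter and map before the next one is read.
  def lazyTrace: List[String] = {
    val log = ArrayBuffer.empty[String]
    arr.iterator
      .filter { x => log += s"filter $x"; x > 0 }
      .map { x => log += s"map $x"; x / 6.0 }
      .foreach(_ => ())
    log.toList
  }

  // Without .iterator: filter runs over the whole array first, then map over the result.
  def strictTrace: List[String] = {
    val log = ArrayBuffer.empty[String]
    arr
      .filter { x => log += s"filter $x"; x > 0 }
      .map { x => log += s"map $x"; x / 6.0 }
      .foreach(_ => ())
    log.toList
  }
}
```

In lazyTrace the "filter" and "map" entries are interleaved, while in strictTrace all six "filter" entries come before any "map" entry.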
First, you'll want to use map() instead of foreach() because map() returns a result whereas foreach() does not and can only be used for side effects (which should be avoided when possible).
As has been pointed out, you can filter() before the map(), or you can combine them using collect().
arr.collect{case x if x > 0 => x/6.0}
// res0: Array[Double] = Array(0.3333333333333333, 0.5, 0.6666666666666666)
Use the filter function before using the foreach.
arr.filter(_ > 0).foreach { value => ... }
var list: ListSet[Int] = ListSet(-1, -5, -3, 8, 7, 9, 4, 6, 2, 1, 0)
list.filter(p => p > 5).foreach(f => {
print(f + " ")
})
Output : 8 7 9 6
Just do a filter and a map.
Don't forget that Scala treats this array as Array[Int], so dividing by 6 performs integer division and gives you 0. Convert each element with .toDouble first.
val arr = Array(-1,2,3,0,-7,4)
val res = arr.filter(_>0).map(_.toDouble/6)
res.foreach(println)
Result:
0.3333333333333333
0.5
0.6666666666666666

Scala - Use of .indexOf() and .indexWhere()

I have a tuple like the following:
(Age, List(19,17,11,3,2))
and I would like to get the position of the first element where their position in the list is greater than their value. To do this I tried to use .indexOf() and .indexWhere() but I probably can't find exactly the right syntax and so I keep getting:
value indexWhere is not a member of org.apache.spark.rdd.RDD[(String,
Iterable[Int])]
My code so far is:
val test =("Age", List(19,17,11,3,2))
test.indexWhere(_.2(_)<=_.2(_).indexOf(_.2(_)) )
I also searched the documentation here with no result: http://www.scala-lang.org/api/current/index.html#scala.collection.immutable.List
If you want to perform this for each element in an RDD, you can use RDD's mapValues (which would only map the right-hand-side of the tuple) and pass a function that uses indexWhere:
rdd.mapValues(_.zipWithIndex.indexWhere { case (v, i) => i+1 > v} + 1)
Notes:
Your example seems wrong, if you want the last matching item it should be 5 (position of 2) and not 4
You did not define what should be done when no item matches your condition, e.g. for List(9,9,9) - in this case indexWhere returns -1, so the result would be 0, but not sure that's what you need
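The function passed to mapValues can be tried on a plain List without Spark. A small sketch, where firstPosition and firstPositionOpt are names made up for this example; the Option variant is one possible way to make the "no match" case explicit:

```scala
object IndexWhereExample {
  // 1-based position of the first element whose position exceeds its value; 0 if there is none.
  def firstPosition(values: Seq[Int]): Int =
    values.zipWithIndex.indexWhere { case (v, i) => i + 1 > v } + 1

  // Same search, but a missing match becomes None instead of 0.
  def firstPositionOpt(values: Seq[Int]): Option[Int] = {
    val idx = values.zipWithIndex.indexWhere { case (v, i) => i + 1 > v }
    if (idx >= 0) Some(idx + 1) else None
  }

  val found: Int = firstPosition(List(19, 17, 11, 3, 2)) // position 4 holds the value 3
  val missing: Int = firstPosition(List(9, 9, 9))        // no match: -1 + 1
}
```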

Using foldleft or some other operator to calculate point distances?

Ok so I thought this would be a snap, trying to practice Scala's collection operators and my example is a list of points.
The class can calculate and return the distance to another point (as double).
However, fold left doesn't seem to be the right solution - considering elements e1, e2, e3.. I need a moving window to calculate, I need the last element looked at to carry forward in the function - not just the sum
Sum {
e1.dist(e2)
e2.dist(e3)
etc
}
Reading the API I noticed a function called "sliding", perhaps that's the correct solution in conjunction with another operator. I know how to do this with loops of course, but trying to learn the scala way.
Thanks
import scala.math._
case class Point(x:Int, y:Int) {
def dist(p:Point) = sqrt( (p.x-x)^2+(p.y-y)^2 )
}
object Point {
//Unsure how to define this?
def dist(l:Seq[Point]) =l.foldLeft(0.0)((sum:Double,p:Point)=>)
}
I'm not quite sure what you want to do, but assuming you want the sum of the distances:
l.zip(l.tail).map { case (x,y) => x.dist(y) }.sum
Or with sliding:
l.sliding(2).map {
case List(fst,snd) => fst.dist(snd)
case _ => 0
}.sum
If you want to do it as a fold, you can, but you need the accumulator to keep both the total and the previous element:
l.foldLeft((l.head, 0.0)) {
case ((prev, sum), p) => (p, sum + p.dist(prev))
}._2
You finish with a tuple consisting of the last element and the sum, so use ._2 to get the sum part.
btw, ^ on Int is bitwise logical XOR, not power. Use math.pow.
The smartest way is probably using zipped, which is a kind of iterator so you don't traverse the list more than once as you would using zip:
(l, l.tail).zipped.map( _ dist _ ).sum
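Putting the pieces together, here is a self-contained sketch of the zip and sliding approaches. It assumes dist should be the Euclidean distance, computes the squares by multiplication instead of ^, and uses drop(1) and collect so that empty and single-element sequences yield 0.0; those last two choices are mine, not part of the answers above.

```scala
object PathLength {
  final case class Point(x: Int, y: Int) {
    // Euclidean distance; squares are computed by multiplication because ^ on Int is XOR.
    def dist(p: Point): Double = {
      val dx = (p.x - x).toDouble
      val dy = (p.y - y).toDouble
      math.sqrt(dx * dx + dy * dy)
    }
  }

  // Pair every point with its successor; drop(1) is safe on an empty sequence.
  def viaZip(l: Seq[Point]): Double =
    l.zip(l.drop(1)).map { case (a, b) => a.dist(b) }.sum

  // Windows of two consecutive points; collect skips a lone single-element window.
  def viaSliding(l: Seq[Point]): Double =
    l.sliding(2).collect { case Seq(a, b) => a.dist(b) }.sum

  val path: List[Point] = List(Point(0, 0), Point(3, 4), Point(3, 0))
  val total: Double = viaZip(path) // 5.0 + 4.0
}
```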

Infinite streams in Scala

Say I have a function, for example the old favourite
def factorial(n:Int) = (BigInt(1) /: (1 to n)) (_*_)
Now I want to find the biggest value of n for which factorial(n) fits in a Long. I could do
(1 to 100) takeWhile (factorial(_) <= Long.MaxValue) last
This works, but the 100 is an arbitrary large number; what I really want on the left hand side is an infinite stream that keeps generating higher numbers until the takeWhile condition is met.
I've come up with
val s = Stream.continually(1).zipWithIndex.map(p => p._1 + p._2)
but is there a better way?
(I'm also aware I could get a solution recursively but that's not what I'm looking for.)
Stream.from(1)
creates a stream starting from 1 and incrementing by 1. It's all in the API docs.
A Solution Using Iterators
You can also use an Iterator instead of a Stream. The Stream keeps references of all computed values. So if you plan to visit each value only once, an iterator is a more efficient approach. The downside of the iterator is its mutability, though.
There are some nice convenience methods for creating Iterators defined on its companion object.
Edit
Unfortunately there's no short (library supported) way I know of to achieve something like
Stream.from(1) takeWhile (factorial(_) <= Long.MaxValue) last
The approach I take to advance an Iterator for a certain number of elements is drop(n: Int) or dropWhile:
Iterator.from(1).dropWhile( factorial(_) <= Long.MaxValue).next - 1
The - 1 works for this special purpose but is not a general solution. But it should be no problem to implement a last method on an Iterator using pimp my library. The problem is taking the last element of an infinite Iterator could be problematic. So it should be implemented as method like lastWith integrating the takeWhile.
An ugly workaround can be done using sliding, which is implemented for Iterator:
scala> Iterator.from(1).sliding(2).dropWhile(_.tail.head < 10).next.head
res12: Int = 9
As @ziggystar pointed out, a Stream keeps the list of previously computed values in memory, so using an Iterator is a great improvement.
To further improve the answer, I would argue that "infinite streams" are usually computed (or can be computed) from previously computed values. If this is the case (and in your factorial stream it definitely is), I would suggest using Iterator.iterate instead.
would look roughly like this:
scala> val it = Iterator.iterate((1,BigInt(1))){case (i,f) => (i+1,f*(i+1))}
it: Iterator[(Int, scala.math.BigInt)] = non-empty iterator
then, you could do something like:
scala> it.find(_._2 >= Long.MaxValue).map(_._1).get - 1
res0: Int = 20
or use @ziggystar's sliding solution...
another easy example that comes to mind, would be fibonacci numbers:
scala> val it = Iterator.iterate((1,1)){case (a,b) => (b,a+b)}.map(_._1)
it: Iterator[Int] = non-empty iterator
In these cases, you're not computing each new element from scratch, but rather doing O(1) work per new element, which improves your running time even more.
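As a sketch of how the Iterator.iterate approach answers the original question, the pairs (n, n!) can be cut off with takeWhile once the factorial no longer fits in a Long. The object and value names here are made up for the example; the intermediate toList is safe because takeWhile leaves only a finite number of pairs.

```scala
object LargestFactorial {
  // Pairs (n, n!) where each factorial is built from the previous one.
  // A def, because an Iterator can only be traversed once.
  def factorials: Iterator[(Int, BigInt)] =
    Iterator.iterate((1, BigInt(1))) { case (n, f) => (n + 1, f * (n + 1)) }

  val limit: BigInt = BigInt(Long.MaxValue)

  // Keep the pairs whose factorial still fits in a Long; the last one holds the answer.
  val largestN: Int = factorials.takeWhile(_._2 <= limit).toList.last._1
}
```

Here largestN is 20, since 20! still fits in a Long and 21! does not.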
The original "factorial" function is not optimal, since factorials are computed from scratch every time. The simplest/immutable implementation using memoization is like this:
val f : Stream[BigInt] = 1 #:: (Stream.from(1) zip f).map { case (x,y) => x * y }
And now, the answer can be computed like this:
println( "count: " + (f takeWhile (_<Long.MaxValue)).length )
The following variant does not test the current, but the next integer, in order to find and return the last valid number:
Iterator.from(1).find(i => factorial(i+1) > Long.MaxValue).get
Using .get here is acceptable, since find on an infinite sequence will never return None.