For until square root - scala

In Scala, I want to write the equivalent of the following C++ code:
for(int i = 1 ; i * i < n ; i++)
So far I did this, but it looks ugly and I think it goes up until n:
for(i <- 1 to n
if(i * i < n))
Is there a nicer way of writing this code?

Not nicer but different approach
Using a stream
(1 to n).toStream map (i => i * i) takeWhile (_ < n)
Example for n = 100
scala> val res = (1 to 100).toStream map(i => i * i) takeWhile (_ < 100)
res: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> res.toList
res16: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64, 81)
Explanation
A Stream allows to request values on demand, i.e. lazy evaluation. So the function that is mapped will only be applied when the next value is requested.

First of all, declare a function to generate a lazy stream of squares:
def squares(i: Int = 1): Stream[Int] = Stream.cons(i * i, squares(i + 1))
then use takeWhile to get the value when i * i is smaller than n. For example:
scala> squares().takeWhile(_ < 50).foreach(println)
1
4
9
16
25
36
49

The solution you have might not be the nicest but it might be the most efficient, everything else is internally more complicated, so it might have some overhead. (In most situations not a notable overhead though, and it might be optimized very well.)
I would not go for the solution using Streams suggested in an other answer. While streams are computed lazily, they do cache the computed results, which is not required in this case and might take a lot of memory if the range iterated over is large. Instead I would use an Iterator. Operations on Iterators are typically lazy as well and do not cache anything.
If you need this more often, you could add an other "operator" using an implicit class like this:
implicit class UntilHelper(start: Int) {
def aslong(cond: Int => Boolean) =
Iterator.from(start).takeWhile(cond)
}
Your loop then looks like this:
for(i <- 1 aslong (Math.pow(_, 2) < 1000)) {
println(i)
}
From a quick micro-benchmark it looks like this is about 3 times faster than the stream solution and a little bit slower than a simple while loop. These things are however notoriously hard to measure without any context.
Remark on computing squares
A nice way of computing a sequence of Squares is by adding the difference between squares. This can be done using the scanLeft method on a Stream or an Iterator.
val itr = Iterator.from(1).scanLeft(1)((a,b)=>a + 2*b+1)
println(itr.take(10).toList)

Related

Are chained maps optimized by compiler?

Scala has an amazing way of converting a collection into another collection using map construct.
val l = List(1, 2, 3, 4)
l.map(_*_)
will return the squares of the elements in list l
I come across various instances where multiple maps are chained together say,
val l = List(1, 2, 3, 4)
val res = l.map(_ * _).map(_ + 1).filter(_ < 3)
What i believe happens underneath is equivalent to something below.
val l = List(1, 2, 3, 4)
val l1 = l.map(_*_)
val l2 = l1.map(_ + 1)
val res = l2.filter(_ < 3)
creating l1 and l2 might cause memory issues if the collection is too big.
To tackle this problem, does Scala compiler have any optimizations?
val l = List(1, 2, 3, 4)
val res = l1.map( _*_ + 1).filter(_ < 3)
in general if f, g, h are functions
val l = List(/*something*/)
val res = l.map(f(_)).map(g(_)).map(h(_))
can be converted into
val res = l.map(f _ andThen g _ andThen h _)
Scala offers Stream, which is a lazy ordered collection.
val s = Stream(1, 2, 3, 4)
// note i've changed your sequence of transformations
// a bit, so that it compiles and yields more than one result
val res = s.map(i => i * i).map(_ + 1).filter(_ < 11)
res is now a Stream. No actual evaluation has been performed yet, no blocks of memory related to the size of s have been used.
If you intend to use the elements of res one at a time, no more work is required. You can use res in a for statement or comprehension directly, for example.
for ( elem <- res ) println( s"A value is ${elem}" )
If you want res as a List, you can just call .toList at the end of the sequence of transformations. Instead of the above, use
val res = s.map(i => i * i).map(_ + 1).filter(_ < 11).toList
s will only be traversed once in creating the new List.
No, because this would require the compiler to know about the semantics of map and treat the standard library classes which implement it specially (since nobody stops you from writing a class where this doesn't hold). There is a research proposal which might end up implementing this... eventually.
There is also Scala-Blitz which optimizes some collection operations, but fusion and deforestation are listed as future work in this presentation and I don't think they are implemented yet.
As Steve Waldman's answer says, using Stream (or, better yet, Iterator) can help, but it won't eliminate the intermediate collections completely.

How to write an efficient groupBy-size filter in Scala, can be approximate

Given a List[Int] in Scala, I wish to get the Set[Int] of all Ints which appear at least thresh times. I can do this using groupBy or foldLeft, then filter. For example:
val thresh = 3
val myList = List(1,2,3,2,1,4,3,2,1)
myList.foldLeft(Map[Int,Int]()){case(m, i) => m + (i -> (m.getOrElse(i, 0) + 1))}.filter(_._2 >= thresh).keys
will give Set(1,2).
Now suppose the List[Int] is very large. How large it's hard to say but in any case this seems wasteful as I don't care about each of the Ints frequencies, and I only care if they're at least thresh. Once it passed thresh there's no need to check anymore, just add the Int to the Set[Int].
The question is: can I do this more efficiently for a very large List[Int],
a) if I need a true, accurate result (no room for mistakes)
b) if the result can be approximate, e.g. by using some Hashing trick or Bloom Filters, where Set[Int] might include some false-positives, or whether {the frequency of an Int > thresh} isn't really a Boolean but a Double in [0-1].
First of all, you can't do better than O(N), as you need to check each element of your initial array at least once. You current approach is O(N), presuming that operations with IntMap are effectively constant.
Now what you can try in order to increase efficiency:
update map only when current counter value is less or equal to threshold. This will eliminate huge number of most expensive operations — map updates
try faster map instead of IntMap. If you know that values of the initial List are in fixed range, you can use Array instead of IntMap (index as the key). Another possible option will be mutable HashMap with sufficient initail capacity. As my benchmark shows it actually makes significant difference
As #ixx proposed, after incrementing value in the map, check whether it's equal to 3 and in this case add it immediately to result list. This will save you one linear traversing (appears to be not that significant for large input)
I don't see how any approximate solution can be faster (only if you ignore some elements at random). Otherwise it will still be O(N).
Update
I created microbenchmark to measure the actual performance of different implementations. For sufficiently large input and output Ixx's suggestion regarding immediately adding elements to result list doesn't produce significant improvement. However similar approach could be used to eliminate unnecessary Map updates (which appears to be the most expensive operation).
Results of benchmarks (avg run times on 1000000 elems with pre-warming):
Authors solution:
447 ms
Ixx solution:
412 ms
Ixx solution2 (eliminated excessive map writes):
150 ms
My solution:
57 ms
My solution involves using mutable HashMap instead of immutable IntMap and includes all other possible optimizations.
Ixx's updated solution:
val tuple = (Map[Int, Int](), List[Int]())
val res = myList.foldLeft(tuple) {
case ((m, s), i) =>
val count = m.getOrElse(i, 0) + 1
(if (count <= 3) m + (i -> count) else m, if (count == thresh) i :: s else s)
}
My solution:
val map = new mutable.HashMap[Int, Int]()
val res = new ListBuffer[Int]
myList.foreach {
i =>
val c = map.getOrElse(i, 0) + 1
if (c == thresh) {
res += i
}
if (c <= thresh) {
map(i) = c
}
}
The full microbenchmark source is available here.
You could use the foldleft to collect the matching items, like this:
val tuple = (Map[Int,Int](), List[Int]())
myList.foldLeft(tuple) {
case((m, s), i) => {
val count = (m.getOrElse(i, 0) + 1)
(m + (i -> count), if (count == thresh) i :: s else s)
}
}
I could measure a performance improvement of about 40% with a small list, so it's definitely an improvement...
Edited to use List and prepend, which takes constant time (see comments).
If by "more efficiently" you mean the space efficiency (in extreme case when the list is infinite), there's a probabilistic data structure called Count Min Sketch to estimate the frequency of items inside it. Then you can discard those with frequency below your threshold.
There's a Scala implementation from Algebird library.
You can change your foldLeft example a bit using a mutable.Set that is build incrementally and at the same time used as filter for iterating over your Seq by using withFilter. However, because I'm using withFilteri cannot use foldLeft and have to make do with foreach and a mutable map:
import scala.collection.mutable
def getItems[A](in: Seq[A], threshold: Int): Set[A] = {
val counts: mutable.Map[A, Int] = mutable.Map.empty
val result: mutable.Set[A] = mutable.Set.empty
in.withFilter(!result(_)).foreach { x =>
counts.update(x, counts.getOrElse(x, 0) + 1)
if (counts(x) >= threshold) {
result += x
}
}
result.toSet
}
So, this would discard items that have already been added to the result set while running through the Seq the first time, because withFilterfilters the Seqin the appended function (map, flatMap, foreach) rather than returning a filtered Seq.
EDIT:
I changed my solution to not use Seq.count, which was stupid, as Aivean correctly pointed out.
Using Aiveans microbench I can see that it is still slightly slower than his approach, but still better than the authors first approach.
Authors solution
377
Ixx solution:
399
Ixx solution2 (eliminated excessive map writes):
110
Sascha Kolbergs solution:
72
Aivean solution:
54

Constructing BitSets in Scala from a predicate?

Suppose I want to construct a BitSet containing all integers from 0 until n satisfying some predicate f: Int => Boolean.
I could write something like
BitSet((0 until n):_*).filter(f)
which of course works. But it feels rather inefficient! I'm planning on doing this inside a pretty tight loop, and would like suggestions for more efficient ways.
This is the best I could come up with at the moment
BitSet((0 until n).view.filter(f):_*)
The view part makes the filter method lazy. This makes sure that when the BitSet is created from the given sequence, it will filter on the fly. Your original suggestion creates a new BitSet after the first one is created.
If performance is truly your major concern, the best option is probably to use a mutable.BitSet and a while loop, and then call toImmutable on the result.
val bitSet = {
val tmp = new scala.collection.mutable.BitSet(n)
var i = 0;
while (i < n) {
if (f(i)) {
tmp += i
}
i = i + 1
}
tmp.toImmutable
}
I think the most efficient "functional" way is to use foldLeft:
(1 to 5).foldLeft(BitSet())((s,i) => if (f(i)) s + i else s)
It doesn't create an intermediate collection but construct the collection from scratch while filtering.
The first thing I thought is to use breakOut, but it doesn't work for filter:
scala> val set: BitSet = (0 until 10).filter(f)(collection.breakOut)
<console>:11: error: polymorphic expression cannot be instantiated to expected type;
found : [From, T, To]scala.collection.generic.CanBuildFrom[From,T,To]
required: Int
val set: BitSet = (0 until 10).filter(f)(collection.breakOut)
^
scala> val set: BitSet = (0 until 10).map(_+1)(collection.breakOut)
set: scala.collection.immutable.BitSet = BitSet(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
breakOut doesn't create an intermediate collection too, but because filter doesn't have a second parameter list it can't work.

Why stream fold operation throws Out of memory exception?

I have following simple code
def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
(0l /: fib(1,1).take(10000000)) (_+_)
And it throws OutOfMemmoryError exception.
I can not understand why, because I think all the parts use constant memmory i.e. lazy evaluation streams and foldLeft...
Those code also don't work
fib(1,1).take(10000000).sum or max, min e.t.c.
How to correctly implement infinite streams and do iterative operations upon it?
Scala version: 2.9.0
Also scala javadoc said, that foldLeft operation is memmory safe for streams
/** Stream specialization of foldLeft which allows GC to collect
* along the way.
*/
#tailrec
override final def foldLeft[B](z: B)(op: (B, A) => B): B = {
if (this.isEmpty) z
else tail.foldLeft(op(z, head))(op)
}
EDIT:
Implementation with iterators still not useful, since it throws ${domainName} exception
def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++ fib(j, i + j)
How to define correctly infinite stream/iterator in Scala?
EDIT2:
I don't care about int overflow, I just want to understand how to create infinite stream/iterator etc in scala without side effects .
The reason to use Stream instead of Iterator is so that you don't have to calculate all the small terms in the series over again. But this means that you need to store ten million stream nodes. These are pretty large, unfortunately, so that could be enough to overflow the default memory. The only realistic way to overcome this is to start with more memory (e.g. scala -J-Xmx2G). (Also, note that you're going to overflow Long by an enormous margin; the Fibonacci series increases pretty quickly.)
P.S. The iterator implementation I have in mind is completely different; you don't build it out of concatenated singleton Iterators:
def fib(i: Long, j: Long) = Iterator.iterate((i,j)){ case (a,b) => (b,a+b) }.map(_._1)
Now when you fold, past results can be discarded.
The OutOfMemoryError happens indenpendently from the fact that you use Stream. As Rex Kerr mentioned above, Stream -- unlike Iterator -- stores everything in memory. The difference with List is that the elements of Stream are calculated lazily, but once you reach 10000000, there will be 10000000 elements, just like List.
Try with new Array[Int](10000000), you will have the same problem.
To calculate the fibonacci number as above you may want to use different approach. You can take into account the fact that you only need to have two numbers, instead of the whole fibonacci numbers discovered so far.
For example:
scala> def fib(i:Long,j:Long): Iterator[Long] = Iterator(i) ++ fib(j, i + j)
fib: (i: Long,j: Long)Iterator[Long]
And to get, for example, the index of the first fibonacci number exceeding 1000000:
scala> fib(1, 1).indexWhere(_ > 1000000)
res12: Int = 30
Edit: I added the following lines to cope with the StackOverflow
If you really want to work with 1 millionth fibonacci number, the iterator definition above will not work either for StackOverflowError. The following is the best I have in mind at the moment:
class FibIterator extends Iterator[BigDecimal] {
var i: BigDecimal = 1
var j: BigDecimal = 1
def next = {val temp = i
i = i + j
j = temp
j }
def hasNext = true
}
scala> new FibIterator().take(1000000).foldLeft(0:BigDecimal)(_ + _)
res49: BigDecimal = 82742358764415552005488531917024390424162251704439978804028473661823057748584031
0652444660067860068576582339667553466723534958196114093963106431270812950808725232290398073106383520
9370070837993419439389400053162345760603732435980206131237515815087375786729469542122086546698588361
1918333940290120089979292470743729680266332315132001038214604422938050077278662240891771323175496710
6543809955073045938575199742538064756142664237279428808177636434609546136862690895665103636058513818
5599492335097606599062280930533577747023889877591518250849190138449610994983754112730003192861138966
1418736269315695488126272680440194742866966916767696600932919528743675517065891097024715258730309025
7920682881137637647091134870921415447854373518256370737719553266719856028732647721347048627996967...
#yura's problem:
def fib(i:Long,j:Long):Stream[Long] = i #:: fib(j, i+j)
(0l /: fib(1,1).take(10000000)) (_+_)
besides using a Long which can't possibly hold the Fibonacci of 10,000,000, it does work. That is, if the foldLeft is written as:
fib(1,1).take(10000000).foldLeft(0L)(_+_)
Looking at the Streams.scala source, foldLeft() is clearly designed for Garbage Collection, but /: is not def'd.
The other answers alluded to another problem. The Fibonacci of 10 million is a big number and if BigInt is used, instead of just overflowing like with a Long, absolutely enormous numbers are being added to each over and over again.
Since Stream.foldLeft is optimized for GC it does look like the way to solve for really big Fibonacci numbers, rather than using a zip or tail recursion.
// Fibonacci using BigInt
def fib(i:BigInt,j:BigInt):Stream[BigInt] = i #:: fib(j, i+j)
fib(1,0).take(10000000).foldLeft(BigInt("0"))(_+_)
Results of the above code: 10,000,000 is a 8-figure number. How many figures in fib(10000000)? 2,089,877
fib(1,1).take(10000000) is the "this" of the method /:, it is likely that the JVM will consider the reference alive as long as the method runs, even if in this case, it might get rid of it.
So you keep a reference on the head of the stream all along, hence on the whole stream as you build it to 10M elements.
You could just use recursion, which is about as simple:
def fibSum(terms: Int, i: Long = 1, j: Long = 1, total: Long = 2): Long = {
if (terms == 2) total
else fibSum(terms - 1, j, i + j, total + i + j)
}
With this, you can "fold" a billion elements in only a couple of seconds, but as Rex points out, summing the Fibbonaci sequence overflows Long very quickly.
If you really wanted to know the answer to your original problem and don't mind sacrificing some accuracy you could do this:
def fibSum(terms: Int, i: Double = 1, j: Double = 1, tot: Double = 2,
exp: Int = 0): String = {
if (terms == 2) "%.6f".format(tot) + " E+" + exp
else {
val (i1, j1, tot1, exp1) =
if (tot + i + j > 10) (i/10, j/10, tot/10, exp + 1)
else (i, j, tot, exp)
fibSum(terms - 1, j1, i1 + j1, tot1 + i1 + j1, exp1)
}
}
scala> fibSum(10000000)
res54: String = 2.957945 E+2089876

Incrementing the for loop (loop variable) in scala by power of 5

I had asked this question on Javaranch, but couldn't get a response there. So posting it here as well:
I have this particular requirement where the increment in the loop variable is to be done by multiplying it with 5 after each iteration. In Java we could implement it this way:
for(int i=1;i<100;i=i*5){}
In scala I was trying the following code-
var j=1
for(i<-1.to(100).by(scala.math.pow(5,j).toInt))
{
println(i+" "+j)
j=j+1
}
But its printing the following output:
1 1
6 2
11 3
16 4
21 5
26 6
31 7
36 8
....
....
Its incrementing by 5 always. So how do I got about actually multiplying the increment by 5 instead of adding it.
Let's first explain the problem. This code:
var j=1
for(i<-1.to(100).by(scala.math.pow(5,j).toInt))
{
println(i+" "+j)
j=j+1
}
is equivalent to this:
var j = 1
val range: Range = Predef.intWrapper(1).to(100)
val increment: Int = scala.math.pow(5, j).toInt
val byRange: Range = range.by(increment)
byRange.foreach {
println(i+" "+j)
j=j+1
}
So, by the time you get to mutate j, increment and byRange have already been computed. And Range is an immutable object -- you can't change it. Even if you produced new ranges while you did the foreach, the object doing the foreach would still be the same.
Now, to the solution. Simply put, Range is not adequate for your needs. You want a geometric progression, not an arithmetic one. To me (and pretty much everyone else answering, it seems), the natural solution would be to use a Stream or Iterator created with iterate, which computes the next value based on the previous one.
for(i <- Iterator.iterate(1)(_ * 5) takeWhile (_ < 100)) {
println(i)
}
EDIT: About Stream vs Iterator
Stream and Iterator are very different data structures, that share the property of being non-strict. This property is what enables iterate to even exist, since this method is creating an infinite collection1, from which takeWhile will create a new2 collection which is finite. Let's see here:
val s1 = Stream.iterate(1)(_ * 5) // s1 is infinite
val s2 = s1.takeWhile(_ < 100) // s2 is finite
val i1 = Iterator.iterate(1)(_ * 5) // i1 is infinite
val i2 = i1.takeWhile(_ < 100) // i2 is finite
These infinite collections are possible because the collection is not pre-computed. On a List, all elements inside the list are actually stored somewhere by the time the list has been created. On the above examples, however, only the first element of each collection is known in advance. All others will only be computed if and when required.
As I mentioned, though, these are very different collections in other respects. Stream is an immutable data structure. For instance, you can print the contents of s2 as many times as you wish, and it will show the same output every time. On the other hand, Iterator is a mutable data structure. Once you used a value, that value will be forever gone. Print the contents of i2 twice, and it will be empty the second time around:
scala> s2 foreach println
1
5
25
scala> s2 foreach println
1
5
25
scala> i2 foreach println
1
5
25
scala> i2 foreach println
scala>
Stream, on the other hand, is a lazy collection. Once a value has been computed, it will stay computed, instead of being discarded or recomputed every time. See below one example of that behavior in action:
scala> val s2 = s1.takeWhile(_ < 100) // s2 is finite
s2: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> println(s2)
Stream(1, ?)
scala> s2 foreach println
1
5
25
scala> println(s2)
Stream(1, 5, 25)
So Stream can actually fill up the memory if one is not careful, whereas Iterator occupies constant space. On the other hand, one can be surprised by Iterator, because of its side effects.
(1) As a matter of fact, Iterator is not a collection at all, even though it shares a lot of the methods provided by collections. On the other hand, from the problem description you gave, you are not really interested in having a collection of numbers, just in iterating through them.
(2) Actually, though takeWhile will create a new Iterator on Scala 2.8.0, this new iterator will still be linked to the old one, and changes in one have side effects on the other. This is subject to discussion, and they might end up being truly independent in the future.
In a more functional style:
scala> Stream.iterate(1)(i => i * 5).takeWhile(i => i < 100).toList
res0: List[Int] = List(1, 5, 25)
And with more syntactic sugar:
scala> Stream.iterate(1)(_ * 5).takeWhile(_ < 100).toList
res1: List[Int] = List(1, 5, 25)
Maybe a simple while-loop would do?
var i=1;
while (i < 100)
{
println(i);
i*=5;
}
or if you want to also print the number of iterations
var i=1;
var j=1;
while (i < 100)
{
println(j + " : " + i);
i*=5;
j+=1;
}
it seems you guys likes functional so how about a recursive solution?
#tailrec def quints(n:Int): Unit = {
println(n);
if (n*5<100) quints(n*5);
}
Update: Thanks for spotting the error... it should of course be power, not multiply:
Annoyingly, there doesn't seem to be an integer pow function in the standard library!
Try this:
def pow5(i:Int) = math.pow(5,i).toInt
Iterator from 1 map pow5 takeWhile (100>=) toList
Or if you want to use it in-place:
Iterator from 1 map pow5 takeWhile (100>=) foreach {
j => println("number:" + j)
}
and with the indices:
val iter = Iterator from 1 map pow5 takeWhile (100>=)
iter.zipWithIndex foreach { case (j, i) => println(i + " = " + j) }
(0 to 2).map (math.pow (5, _).toInt).zipWithIndex
res25: scala.collection.immutable.IndexedSeq[(Int, Int)] = Vector((1,0), (5,1), (25,2))
produces a Vector, with i,j in reversed order.