How does LazyList.fill(n) actually work in Scala? - scala

I'm trying to understand how LazyList.fill actually works. I implemented retry logic using LazyList.fill(n), but it doesn't seem to work as expected.
import scala.util.Try

def retry[T](n: Int)(block: => T): Try[T] = {
  val lazyList = LazyList.fill(n)(Try(block))
  lazyList find (_.isSuccess) getOrElse lazyList.head
}
In the above piece of code, I am trying to execute block with retry logic: if the execution succeeds, return the result from block; otherwise retry until it succeeds, for a maximum of n attempts.
Is it that LazyList evaluates the first element and, if the predicate holds, skips the evaluation of the remaining elements in the list?

As I already mentioned in the comment, this is exactly what a LazyList is supposed to do.
The elements in a LazyList are materialized/computed only when there is demand from an actual consumer.
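Here is a quick sketch of that demand-driven behaviour (the println is only there to observe evaluation), assuming Scala 2.13's LazyList:

// Each element of LazyList.fill is computed (and then memoized) only
// when first demanded; nothing is printed at construction time.
val ll = LazyList.fill(3) { println("evaluating an element"); 42 }

ll.head // prints "evaluating an element" once, returns 42
ll(2)   // forces the remaining two elements: prints twice more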
And the find method of LazyList respects this laziness. You can find it clearly stated in the documentation as well - https://www.scala-lang.org/api/2.13.x/scala/collection/immutable/LazyList.html#find(p:A=%3EBoolean):Option[A]
def find(p: (A) => Boolean): Option[A]
// Finds the first element of the lazy list satisfying a predicate, if any.
// Note: may not terminate for infinite-sized collections.
// This method does not evaluate any elements further than the first element matching the predicate.
So, if the first element succeeds, it stops at the first element itself.
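To observe this with your retry, here is a small sketch (the mutable counter is only for illustration):

import scala.util.Try

var attempts = 0

// A block that fails twice and then succeeds.
def flaky(): Int = {
  attempts += 1
  if (attempts < 3) throw new RuntimeException(s"failed attempt $attempts") else 42
}

// retry as defined in the question
def retry[T](n: Int)(block: => T): Try[T] = {
  val lazyList = LazyList.fill(n)(Try(block))
  lazyList find (_.isSuccess) getOrElse lazyList.head
}

println(retry(5)(flaky())) // Success(42)
println(attempts)          // 3 -- the remaining two attempts were never evaluated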
Also, if you are writing a retry method, then you probably want to stop at the first success anyway. Why would you want to continue evaluating the block even after a success?
You might want to better clarify your exact requirements to get a more helpful answer.

Related

Monads being a mechanism for sequencing computations, is the list below still a monad even though the numbers are printed in a random order? - scala

for {
  i <- 1 to 5
} yield Future(println(i))
Desugared to:
List(1, 2, 3, 4, 5).map { i => Future(println(i)) }
The above code prints numbers in random order.
Now, if we see the multiple definitions of Monad:
a) Monad is a wrapper over an object
b) Monad is a mechanism for sequencing computations
The question I'm trying to answer is: shouldn't the map operation on the List monad wait for the first element in the list to be printed, and only then go on to the computation of the second element, regardless of Future?
Sorry, it might be simple and I'm overcomplicating it, but I'm struggling to find simple reasoning. Answers will be much appreciated :)
Compare:
for {
  _ <- Future(println(1))
  _ <- Future(println(2))
  _ <- Future(println(3))
  _ <- Future(println(4))
  _ <- Future(println(5))
} yield ()
or
Future(println(1)).flatMap { _ =>
  Future(println(2))
}.flatMap { _ =>
  Future(println(3))
}.flatMap { _ =>
  Future(println(4))
}.flatMap { _ =>
  Future(println(5))
}
with
List(
  Future(println(1)),
  Future(println(2)),
  Future(println(3)),
  Future(println(4)),
  Future(println(5))
)
The first two create the next Future only after the previous one has completed and made its result available. The last one creates all Futures at once (and it doesn't differ much in this regard from your example with List[Future]).
Future (as opposed to IO from Cats Effect, Monix's Task or ZIO) is eager, so it starts execution the moment you create it. For that reason you get sequential results in the first two examples, and random order (a race condition) in the third example.
If you used IO instead of Future it would be more apparent because you wouldn't be able to just have List[IO[Unit]] and execute side effects - you would have to somehow combine the different IOs into one, and the way you would do it would make it obvious whether the effects will be sequential or parallel.
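For illustration, here is a minimal sketch of that point, assuming Cats Effect 3 is on the classpath (imports differ in other versions):

import cats.effect.IO
import cats.effect.unsafe.implicits.global // CE3 runtime, needed for unsafeRunSync
import cats.syntax.all._

// IO is lazy: nothing runs while this list is being built.
val ios: List[IO[Unit]] = List(1, 2, 3, 4, 5).map(i => IO(println(i)))

// You must explicitly choose how to combine them, which makes the ordering visible:
val sequential: IO[List[Unit]] = ios.sequence    // runs one after another: prints 1 2 3 4 5
val parallel: IO[List[Unit]]   = ios.parSequence // may interleave: order not guaranteed

sequential.unsafeRunSync()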
The bottom line is: whether or not Future is a monad depends on how .flatMap behaves (and how it behaves in combination with Future.successful), so your results don't invalidate the claim that Future is a monad. (You can have some doubts if you start checking its behavior with exceptions, but that is another topic.)
The execution of map is indeed sequential, but when you wrap each computation in a Future it executes asynchronously: it is evaluated on another thread, and because of that it is impossible to know which thread will finish first, since that also depends on the operating system's thread scheduling and other considerations.
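As a sketch of that contrast: if each Future is created only after the previous one has completed, the prints do come out in order (Await is used here just to keep the example self-contained):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Chain creation with flatMap: each Future is constructed only after
// the previous one has completed, so this prints 1 2 3 4 5 in order.
val ordered: Future[Unit] =
  (1 to 5).foldLeft(Future.unit) { (acc, i) =>
    acc.flatMap(_ => Future(println(i)))
  }

Await.result(ordered, 5.seconds)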
Both of your code snippets still involve monads, loosely speaking. When you call .map() on your collection, map picks elements one by one, in order (from index 0 to index 4), and passes each one to the operation block (the body of map; map is a higher-order function that accepts a function of type f: This => That).
So the monadic operation's responsibility is to pick each element up and pass it as a parameter to a function.
In your case the actual function type is:
f: Int => Future[Unit]
For clarity, your function actually looks like this:
def someFunction(i: Int): Future[Unit] = {
  Future {
    println(i)
  }
}
So, what the map operation did here is pick one item at a time from your collection (in sequence) and call someFunction(i). And that's all the monadic operation does.
Now, to answer why your printlns appear in random order: it's because of JVM threads.
If you redefine the body of your map like this:
List(1, 2, 3, 4, 5)
  .map { i =>
    println(s"Going to invoke the println in another thread for $i")
    Future(println(i))
  }
You'll see that the first println is always in sequence! That proves .map() picks your elements in order, while the second println may or may not be out of sequence. This out-of-order behaviour is not caused by the monadic map operation but by the multithreaded execution on multi-core CPUs.

Queue implementation in Odersky Scala book. Chapter 19

I see this code on page 388 of the Odersky book on Scala:
class SlowAppendQueue[T](elems: List[T]) {
  def head = elems.head
  def tail = new SlowAppendQueue(elems.tail)
  def enqueue(x: T) = new SlowAppendQueue(elems ::: List(x))
}
class SlowHeadQueue[T](smele: List[T]) {
  def head = smele.last
  def tail = new SlowHeadQueue(smele.init)
  def enqueue(x: T) = new SlowHeadQueue(x :: smele)
}
Is the following correct to say:
Both implementations of tail take time proportional to the number of elements in the queue.
The second implementation of head is slower than the first. The second implementation takes time proportional to the length of the queue. Why is this? How is it implemented? Is it like a linked list where each element has a pointer to the next?
Why does Odersky say the second class' implementation of tail is problematic but not the first?
No. In the first case, tail works in constant time, because elems.tail is a constant-time operation (it just returns the tail of the list). The constructor new SlowAppendQueue(...) is also a constant-time operation, because it just wraps the list.
Because if smele has N > 1 elements, then smele.init must rebuild a new list with N - 1 elements from scratch. This takes linear time, therefore it is much slower than the O(1) operation from the first queue implementation.
O(N) operations are problematic because they are slow for large N, whereas O(1) operations are essentially never problematic.
I think you should take a closer look at how the immutable singly-linked list is implemented, and what it takes to prepend an element (O(1)), append an element (O(N)), access the tail (O(1)), or rebuild the init (O(N)). Then everything else becomes obvious.
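As a quick sketch of those costs on Scala's List (a singly-linked list with structural sharing):

val xs = List(1, 2, 3)

0 :: xs        // O(1): allocates one new node that points at the existing xs
xs.tail        // O(1): returns the existing tail node, no copying
xs ::: List(4) // O(N): must copy every node of xs to append at the end
xs.init        // O(N): must rebuild all nodes except the last one
xs.last        // O(N): must walk the whole list to reach the end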
No, the first tail implementation takes constant time. This is because List.tail is a constant time operation due to structural sharing, and wrapping the list in a new SlowAppendQueue is also a constant time operation.
The second implementation of head takes linear time because of the way functional singly-linked lists (including Scala's List class) work: each list node has a link only to the node after it, so reaching the last element means walking the entire list, and removing the last element via init means rebuilding the entire list.
In summary, List is fast when operating on the beginning, but not when solely operating on the end. See also the Scala docs for List.

Head of empty list in Scala

I've made this recursive method in Scala that returns a list made of all the distinct elements of another list.
object es20 extends App {
  def filledList: List[Int] = List()

  @scala.annotation.tailrec
  def distinct(l: List[Int]): List[Int] = {
    if (l.isEmpty) filledList
    if (filledList.forall(_ != l.head)) l.head :: filledList
    distinct(l.tail)
  }

  println(distinct(List(1, 1, 5, 6, 6, 3, 8, 3))) // Should print List(1,5,6,3,8)
}
However, when I compile the code and run it, I get this exception:
java.util.NoSuchElementException: head of empty list
I thought this exception was handled by the condition if (l.isEmpty).
How can I fix the code?
In Scala, a method returns the last expression of its block. In your case you have three expressions: two if-expressions (without else, so their results are discarded) and the call to distinct. The checks are therefore executed every time you call distinct, but distinct(l.tail) also runs unconditionally, no matter whether the list is empty or not.
To fix it you can use an if/else construct, pattern match on the input list, or operate on headOption.
Anyway, I doubt this code is correct: you are checking something on filledList, which is always empty.
You can fix this particular error by inserting else before the second if. However, as mentioned in the other answer, your code isn't correct and won't work anyway; you need to rewrite it.
Also, I understand that you are just trying to write this function as an exercise (if not, just do list.distinct), but I submit that implementing quadratic solutions to trivially linear problems is never a good exercise to begin with.
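For reference, here is a sketch of one possible tail-recursive rewrite (the helper names are mine), tracking seen elements in a Set so that it stays roughly linear:

import scala.annotation.tailrec

def distinct(l: List[Int]): List[Int] = {
  @tailrec
  def loop(rest: List[Int], seen: Set[Int], acc: List[Int]): List[Int] =
    rest match {
      case Nil => acc.reverse
      case h :: t =>
        if (seen(h)) loop(t, seen, acc)
        else loop(t, seen + h, h :: acc)
    }
  loop(l, Set.empty, Nil)
}

println(distinct(List(1, 1, 5, 6, 6, 3, 8, 3))) // List(1, 5, 6, 3, 8)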

Scala's collect inefficient in Spark?

I am currently starting to learn to use Spark with Scala. The problem I am working on requires me to read a file, split each line on a certain character, filter the lines where one of the columns matches a predicate, and finally remove a column. So the basic, naive implementation is a map, then a filter, then another map.
This means going through the collection three times, and that seemed quite unreasonable to me. So I tried replacing them with a single collect (the collect that takes a partial function as an argument). Much to my surprise, this made it run much slower. I tried it locally on regular Scala collections; as expected, the latter way of doing it is much faster there.
So why is that? My idea is that the map, filter, and map are not applied sequentially but rather fused into one operation; in other words, when an action forces evaluation, every element of the list is checked and the pending operations are executed. Is that right? But even so, why does the collect perform so badly?
EDIT: a code example to show what I want to do:
The naive way:
sc.textFile(...).map(l => {
  val s = l.split(" ")
  (s(0), s(1))
}).filter(_._2.contains("hello")).map(_._1)
The collect way:
sc.textFile(...).collect {
  case s if s.split(" ")(1).contains("hello") => s.split(" ")(0)
}
The answer lies in the implementation of collect:
/**
 * Return an RDD that contains all matching values by applying `f`.
 */
def collect[U: ClassTag](f: PartialFunction[T, U]): RDD[U] = withScope {
  val cleanF = sc.clean(f)
  filter(cleanF.isDefinedAt).map(cleanF)
}
As you can see, it's the same sequence of filter->map, but less efficient in your case.
In Scala, both the isDefinedAt and apply methods of a PartialFunction evaluate the if part (the pattern guard).
So, in your collect example, the split in the guard is performed twice for each input element: once inside isDefinedAt and once more inside apply.
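If you want a single pass that also splits only once per line, one possible sketch (using the same elided input as in the question) is flatMap with an Option:

// Split once, then keep the first column only when the second one matches.
sc.textFile(...).flatMap { l =>
  l.split(" ") match {
    case Array(first, second, _*) if second.contains("hello") => Some(first)
    case _ => None
  }
}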

Scala: Streams not acting lazy?

I know streams are supposed to be lazily evaluated sequences in Scala, but I think I am suffering from some sort of fundamental misunderstanding because they seem to be more eager than I would have expected.
In this example:
val initial = Stream(1)
lazy val bad = Stream(1/0)
println((initial ++ bad) take 1)
I get a java.lang.ArithmeticException, which seems to be caused by the division by zero. I would expect that bad would never get evaluated since I only asked for one element from the stream. What's wrong?
OK, so after commenting on other answers, I figured I might as well turn my comments into a proper answer.
Streams are indeed lazy, and will only compute their elements on demand (and you can use #:: to construct a stream element by element, much like :: for List). For example, the following will not throw any exception:
(1/2) #:: (1/0) #:: Stream.empty
This is because when applying #::, the tail is passed by name so as not to evaluate it eagerly, but only when needed (see ConsWrapper.#::, cons.apply and class Cons in Stream.scala for more details).
On the other hand, the head is passed by value, which means that it will always be eagerly evaluated, no matter what (as mentioned by Senthil). This means that the following will actually throw an ArithmeticException:
(1/0) #:: Stream.empty
It is a gotcha worth knowing about streams. However, this is not the issue you are facing.
In your case, the arithmetic exception happens before even instantiating a single Stream. When calling Stream.apply in lazy val bad = Stream(1/0), the argument is eagerly evaluated because it is not declared as a by-name parameter. Stream.apply actually takes a vararg parameter, and varargs are necessarily passed by value.
And even if it were passed by name, the ArithmeticException would be triggered shortly afterwards, because as said earlier the head of a Stream is always eagerly evaluated.
The fact that Streams are lazy doesn't change the fact that method arguments are evaluated eagerly.
Stream(1/0) expands to Stream.apply(1/0). The semantics of the language require that the arguments are evaluated before the method is called (since the Stream.apply method doesn't use call-by-name arguments), so it attempts to evaluate 1/0 to pass as the argument to the Stream.apply method, which causes your ArithmeticException.
There are a few ways you can get this working though. Since you've already declared bad as a lazy val, the easiest is probably to use the also-lazy #::: stream concatenation operator to avoid forcing evaluation:
val initial = Stream(1)
lazy val bad = Stream(1/0)
println((initial #::: bad) take 1)
// => Stream(1, ?)
A Stream eagerly evaluates its head, while the remaining tail is evaluated lazily. In your example, both streams consist of only a head, hence the error.
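A small sketch of that head-eager, tail-lazy behaviour, using Stream.cons (the named equivalent of #::):

// The head is passed by value (evaluated immediately); the tail is
// passed by name (evaluated only on demand).
val s = Stream.cons(1, Stream.cons({ println("tail forced"); 2 }, Stream.empty))

println(s.head) // prints 1; "tail forced" has not appeared yet
println(s(1))   // forces the tail: prints "tail forced", then 2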