Is it possible to use 'yield' to generate 'Iterator' instead of a list in Scala?

Is it possible to use yield to produce an Iterator, without evaluating every value?
It is a common situation: a complex list is easy to generate with yield, but then you need to convert it into an Iterator, because you don't need all of the results...

Sure. Actually, there are three options for non-strictness, which I list below. For the examples, assume:
val list = List.range(1, 10)
def compute(n: Int) = {
  println("Computing " + n)
  n * 2
}
Stream. A Stream is a lazily evaluated list. It will compute values on demand, but it will not recompute values once they have been computed. It is most useful if you'll reuse parts of the stream many times. For example, running the code below will print "Computing 1", "Computing 2" and "Computing 3", one time each.
val stream = for (n <- list.toStream) yield compute(n)
val third = stream(2)
println("%d %d" format (third, stream(2)))
A view. A view is a composition of operations over a base collection. When examining a view, each element examined is computed on demand. It is most useful if you will access the view randomly but only ever look at a small part of it. For example, running the code below prints "Computing 3" twice, and nothing else (besides the result).
val view = for (n <- list.view) yield compute(n)
val third = view(2)
println("%d %d" format (third, view(2)))
Iterator. An Iterator is used to walk lazily through a collection. One can think of it as a "one-shot" collection: it neither recomputes nor stores any elements, so once an element has been "computed", it cannot be used again. That makes it a bit trickier to use, but it is the most efficient option given these constraints. The example has to be written differently here, because Iterator does not support indexed access (and a view would perform badly if written this way). The code below prints "Computing 1" through "Computing 6", and then two different numbers (6 and 12). Strictly speaking, the Iterator documentation says the original iterator should be discarded after calling drop on it, so the second drop relies on unspecified behavior.
val iterator = for (n <- list.iterator) yield compute(n)
val third = iterator.drop(2).next
println("%d %d" format (third, iterator.drop(2).next))
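To check those claims without reading printed output, here is a minimal sketch that records which inputs compute runs on. It assumes Scala 2.13 or later (run as a script or in the REPL), where Stream is deprecated in favour of LazyList, so LazyList stands in for Stream here; the ListBuffer named computed is an illustrative addition, not part of the answer above. It also chains drop on the iterator returned by the first drop instead of reusing the original iterator, which keeps the Iterator example within documented behavior.

```scala
import scala.collection.mutable.ListBuffer

// Records every input that compute is called with, in call order.
val computed = ListBuffer.empty[Int]
def compute(n: Int): Int = { computed += n; n * 2 }

val list = List.range(1, 10)

// LazyList (2.13+ replacement for Stream): computed on demand, then memoized.
val lazyList = for (n <- LazyList.from(list)) yield compute(n)
val fromLazyList = (lazyList(2), lazyList(2))
val lazyListCalls = computed.toList
computed.clear()

// View: computed on demand, recomputed on every access.
val view = for (n <- list.view) yield compute(n)
val fromView = (view(2), view(2))
val viewCalls = computed.toList
computed.clear()

// Iterator: one-shot; each element pulled through it is computed once and then gone.
val iterator = for (n <- list.iterator) yield compute(n)
val afterTwo = iterator.drop(2)     // keep using the returned iterator from here on
val third = afterTwo.next()         // computes 1, 2, 3
val sixth = afterTwo.drop(2).next() // computes 4, 5, 6
val iteratorCalls = computed.toList
```

The view line is the one to watch: viewCalls contains 3 twice, because a view never caches what it computed.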

Use views if you want lazy evaluation; see Views.
The Scala 2.8 Collections API is a fantastic read if you're going to use the Scala collections a lot.

I have a List...
scala> List(1, 2, 3)
res0: List[Int] = List(1, 2, 3)
And a function...
scala> def foo(i : Int) : String = { println("Eval: " + i); i.toString + "Foo" }
foo: (i: Int)String
And now I'll use a for-comprehension with an Iterator...
scala> for { i <- res0.iterator } yield foo(i)
res2: Iterator[java.lang.String] = non-empty iterator
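You can use a for comprehension on any type that provides the methods it desugars to: map for a single generator with yield, flatMap for additional generators, and withFilter for guards. You could also use views: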
scala> for { i <- res0.view } yield foo(i)
res3: scala.collection.SeqView[String,Seq[_]] = SeqViewM(...)
Evaluation is non-strict in either case...
scala> res3.head
Eval: 1
res4: String = 1Foo
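Because the laziness comes from the generator's type, a guard and a take compose with it as well. The sketch below (Scala 2.13+, script or REPL; expensive and calls are hypothetical stand-ins for foo) counts calls instead of printing, and shows that only the three elements actually pulled are evaluated, even though the range has a million elements.

```scala
// Hypothetical stand-in for foo: counts calls instead of printing.
var calls = 0
def expensive(i: Int): String = { calls += 1; s"${i}Foo" }

// Generator over an Iterator, with a guard; building it evaluates nothing.
val results: Iterator[String] = for {
  i <- (1 to 1000000).iterator
  if i % 2 == 0
} yield expensive(i)
val callsBeforeUse = calls

// Pulling three elements evaluates expensive exactly three times.
val firstThree = results.take(3).toList
val callsAfterTake = calls
```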

Related

Why does iterating over multiple streams only iterate over the first element?

I've recently run into a bug in my code in which iterating over multiple streams only iterates through the first item. I converted my streams to buffers (I wasn't even aware that the function I was calling returns a stream) and the problem was fixed. I found this hard to believe, so I created a minimal verifiable example:
def f(as: Seq[String], bs: Seq[String]): Unit =
  for {
    a <- as
    b <- bs
  } yield println((a, b))
val seq = Seq(1, 2, 3).map(_.toString)
f(seq, seq)
println()
val stream = Stream.iterate(1)(_ + 1).map(_.toString).take(3)
f(stream, stream)
That is, a function that prints every combination of its inputs, invoked once with the Seq [1, 2, 3] and once with the Stream [1, 2, 3].
The result with the seq is:
(1,1)
(1,2)
(1,3)
(2,1)
(2,2)
(2,3)
(3,1)
(3,2)
(3,3)
And the result with the stream is:
(1,1)
I've only been able to replicate this when iterating through multiple generators; iterating through a single stream seems to work fine.
So my questions are: why does this happen, and how can I avoid this kind of glitch? That is, short of using .toBuffer or .to[Vector] before every multi-generator iteration?
Thanks.
The way you're using the for-comprehension (with the println in the yield) is a bit strange and probably not what you want. If you really just want to print the entries, use foreach instead; a for without yield desugars to foreach, which forces lazy sequences like Stream:
def f_strict(as: Seq[String], bs: Seq[String]): Unit = {
  for {
    a <- as
    b <- bs
  } println((a, b))
}
The reason you're getting the strange behavior with your f is that Streams are lazy: elements are only computed (and then memoized) as needed, except for the head, which a Stream evaluates eagerly. Since you never use the Stream created by f (necessarily, because your f returns Unit), only the head ever gets computed, which is why you see the single (1,1). If you instead have it return the sequence it generated (which has type Seq[Unit]), for example:
def f_new(as: Seq[String], bs: Seq[String]): Seq[Unit] = {
  for {
    a <- as
    b <- bs
  } yield println((a, b))
}
Then you get the following behavior, which should help clarify what's going on:
val xs = Stream(1, 2, 3)
val result = f_new(xs.map(_.toString), xs.map(_.toString))
// prints (1,1) as a result of evaluating the head of the resulting Stream
result.foreach(aUnit => {})
// prints the other elements as the rest of the entries of the Stream are computed, i.e.
// (1,2)
// (1,3)
// (2,1)
// ...
result.foreach(aUnit => {})
// prints nothing: the elements of the Stream have already been computed and memoized,
// so they are not computed again
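A minimal sketch of the same experiment, assuming Scala 2.13 or later (script or REPL), where LazyList replaces the deprecated Stream. One difference matters here: unlike Stream, LazyList does not evaluate its head eagerly, so the yield version records nothing at all until the result is traversed, instead of the single (1,1). The names record and seen are illustrative replacements for println.

```scala
import scala.collection.mutable.ListBuffer

// Records each combination instead of printing it, so evaluation is observable.
val seen = ListBuffer.empty[(String, String)]
def record(a: String, b: String): Unit = { seen += ((a, b)); () }

// yield: builds a Seq[Unit]; with LazyList inputs nothing runs until it is traversed.
def pairsYield(as: Seq[String], bs: Seq[String]): Seq[Unit] =
  for {
    a <- as
    b <- bs
  } yield record(a, b)

// No yield: desugars to foreach, which visits every combination immediately.
def pairsForeach(as: Seq[String], bs: Seq[String]): Unit =
  for {
    a <- as
    b <- bs
  } record(a, b)

val xs = LazyList("1", "2", "3")

val pairsResult = pairsYield(xs, xs)
val afterYield = seen.size        // nothing evaluated yet
pairsResult.foreach(_ => ())      // forces all 9 combinations
val afterFirstForce = seen.size
pairsResult.foreach(_ => ())      // memoized: nothing is recomputed
val afterSecondForce = seen.size

seen.clear()
pairsForeach(xs, xs)
val afterForeach = seen.toList
```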

How can I functionally iterate over a collection combining elements?

I have a sequence of values of type A that I want to transform to a sequence of type B.
Some of the elements with type A can be converted to a B, however some other elements need to be combined with the immediately previous element to produce a B.
I see it as a small state machine with two states: the first handles the transformation from A to B when only the current A is needed, or saves the A and moves to the second state if the next row is needed; the second combines the saved A with the new A to produce a B and then goes back to the first state.
I'm trying to use scalaz's Iteratees but I fear I'm overcomplicating it, and I'm forced to return a dummy B when the input has reached EOF.
What's the most elegant solution to do it?
What about invoking the sliding() method on your sequence?
You might have to put a dummy element at the head of the sequence so that the first element (the real head) is evaluated/converted correctly.
If you map() over the result from sliding(2) then map will "see" every element with its predecessor.
val input: Seq[A] = ??? // real data here (no dummy values)
val output: Seq[B] = (dummy +: input).sliding(2).flatMap(a2b).toSeq
def a2b(arg: Seq[A]): Seq[B] = {
  // arg holds 2 elements: the previous one and the current one
  // return a Seq() of zero or more elements
  ???
}
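To make the sliding idea concrete, here is a runnable sketch with a made-up rule standing in for A and B: both are String, and an element ending in a backslash is incomplete and must be joined with the element after it (the rule, isIncomplete, the empty-string dummy and joinLines are all hypothetical). It assumes no two consecutive elements are incomplete, and an incomplete element at the very end is silently dropped, which is where the EOF case from the question shows up.

```scala
// Hypothetical rule: an element ending in '\' continues in the next element.
def isIncomplete(s: String): Boolean = s.endsWith("\\")

// Each window is (previous, current); emits zero or one output element.
// Assumes no two consecutive elements are incomplete.
def a2b(window: Seq[String]): Seq[String] = window match {
  case Seq(_, cur) if isIncomplete(cur)     => Seq.empty                    // wait for the next element
  case Seq(prev, cur) if isIncomplete(prev) => Seq(prev.dropRight(1) + cur) // combine with the saved one
  case Seq(_, cur)                          => Seq(cur)                     // convert on its own
  case _                                    => Seq.empty                    // single window from empty input
}

// Hypothetical dummy head: it must count as complete so the first real element converts normally.
val dummy = ""

def joinLines(input: Seq[String]): List[String] =
  (dummy +: input).sliding(2).flatMap(a2b).toList
```

For example, joinLines(Seq("foo", "bar\\", "baz", "qux")) should give List("foo", "barbaz", "qux").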
Taking a stab at it:
Partition your list into two lists. The first is the one you can directly convert and the second is the one that you need to merge.
scala> val l = List("String", 1, 4, "Hello")
l: List[Any] = List(String, 1, 4, Hello)
scala> val (string, int) = l partition { case s:String => true case _ => false}
string: List[Any] = List(String, Hello)
int: List[Any] = List(1, 4)
Replace the logic in the partition block with whatever you need.
After you have the two lists, you can process the second one with something like this:
scala> string ::: int.collect{case i:Integer => i}.sliding(2).collect{
| case List(a, b) => a+b.toString}.toList
res4: List[Any] = List(String, Hello, 14)
Here a+b.toString concatenates the two values (hence the "14"); replace it with whatever your aggregate function is.
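On Scala 2.13 or later, partitionMap can do the split and the type narrowing in one step, so the lists come out as List[String] and List[Int] instead of List[Any], and no Integer match is needed. A minimal sketch reusing the same sample list; the string concatenation again stands in for your real aggregate function, and like the partition approach it does not preserve the original order.

```scala
val l: List[Any] = List("String", 1, 4, "Hello")

// Left for elements used as-is, Right for elements that need combining.
// Elements that are neither String nor Int would throw a MatchError.
val (strings, ints) = l.partitionMap {
  case s: String => Left(s)
  case i: Int    => Right(i)
}

// Pair each Int with its successor; concatenation is a placeholder aggregate.
val combined: List[String] =
  ints.sliding(2).collect { case List(a, b) => s"$a$b" }.toList

val merged: List[String] = strings ::: combined
```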
Hopefully this is helpful.

Populating an immutable List

Here I populate two lists, one mutable and one immutable:
var mutableList = scala.collection.mutable.MutableList[String]()
//> mutableList : scala.collection.mutable.MutableList[String] = MutableList()
for (a <- 1 to 100) {
  mutableList += a.toString
}
println(mutableList.size); //> 100
val immutableList = List[String]() //> immutableList : List[String] = List()
for (a <- 1 to 100) {
  immutableList :+ a.toString
}
println(immutableList.size); //> 0
When I print the size of immutableList, the output is 0. Is this because a new reference is created within the for loop that does not point to immutableList? Is there a functional equivalent to populating an immutable List from within a loop?
As Gabor answered in a comment, you want to use a fold, or simply keep the for and add a yield. What he did not explain is why you are getting a size of 0. The reason is that immutableList :+ a.toString returns a new list each time, which you never use; immutableList itself is exactly that, immutable.
Keep in mind that everything in Scala is an expression and therefore returns something. So you can turn your regular for (which acts like foreach) into a comprehension by adding yield, as below:
val immutableList = for (a <- 1 to 100) yield a.toString
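This desugars into something like:
(1 to 100).map(_.toString)
Note that the result is an IndexedSeq[String] rather than a List; append .toList if you need a List specifically.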
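The fold mentioned at the start of this answer threads the list through each step instead of discarding it. A minimal sketch (the names are illustrative): it prepends with :: and reverses once at the end, because :+ on a List copies the whole list on every append, while :: runs in constant time.

```scala
// :+ returns a new List; the original is left untouched.
val start = List.empty[String]
val appended = start :+ "1"

// foldLeft passes the accumulated list from one step to the next.
val folded: List[String] =
  (1 to 100).foldLeft(List.empty[String])((acc, a) => a.toString :: acc).reverse
```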
For completeness, method tabulate allows for creating and populating an immutable List, for instance as follows,
List.tabulate(100)(a => a.toString)
or equivalently
List.tabulate(100)(_.toString)

Function to return List of Map while iterating over String, kmer count

I am working on creating a k-mer frequency counter (similar to word count in Hadoop) written in Scala. I'm fairly new to Scala, but I have some programming experience.
The input is a text file containing a gene sequence and my task is to get the frequency of each k-mer where k is some specified length of the sequence.
For example, the sequence AGCTTTC has three 5-mers (AGCTT, GCTTT, CTTTC).
I've parsed the input and created one huge string containing the entire sequence, because the newlines throw off the k-mer counting: the end of one line's sequence should still form a k-mer with the beginning of the next line's sequence.
Now I am trying to write a function that generates a list of maps, List[Map[String, Int]], from which it should be easy to use Scala's groupBy function to count the common k-mers.
import scala.io.Source
object Main {
  def main(args: Array[String]) {
    // Get all of the lines from the input file
    val input = Source.fromFile("input.txt").getLines.toArray
    // Create one huge string which contains all the lines but the first
    val lines = input.tail.mkString.replace("\n","")
    val mappedKmers: List[Map[String,Int]] = getMappedKmers(5, lines)
  }
  def getMappedKmers(k: Int, seq: String): List[Map[String, Int]] = {
    for (i <- 0 until seq.length - k) {
      Map(seq.substring(i, i+k), 1) // Map the k-mer to a count of 1
    }
  }
}
Couple of questions:
How to create/generate List[Map[String,Int]]?
How would you do it?
Any help and/or advice is definitely appreciated!
You're pretty close; there are three fairly minor problems with your code.
The first is that for (i <- whatever) foo(i) is syntactic sugar for whatever.foreach(i => foo(i)), which means you're not actually doing anything with the contents of whatever. What you want is for (i <- whatever) yield foo(i), which is sugar for whatever.map(i => foo(i)) and returns the transformed collection.
The second issue is that 0 until seq.length - k is a Range, not a List, so even once you've added the yield, the result still won't line up with the declared return type.
The third issue is that Map(k, v) tries to create a map with two key-value pairs, k and v. You want Map(k -> v) or Map((k, v)), either of which is explicit about the fact that you have a single argument pair.
So the following should work:
def getMappedKmers(k: Int, seq: String): IndexedSeq[Map[String, Int]] = {
  // `to`, not `until`: the last k-mer starts at index seq.length - k
  for (i <- 0 to seq.length - k) yield {
    Map(seq.substring(i, i + k) -> 1) // Map the k-mer to a count of 1
  }
}
You could also convert either the range or the entire result to a list with .toList if you'd prefer a list at the end.
It's worth noting, by the way, that the sliding method on Seq does exactly what you want:
scala> "AGCTTTC".sliding(5).foreach(println)
AGCTT
GCTTT
CTTTC
I'd definitely suggest something like "AGCTTTC".sliding(5).toList.groupBy(identity) for real code.
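Putting that suggestion together with the counting step gives a full k-mer counter; here is a minimal sketch (countKmers is a hypothetical name). groupBy(identity) groups equal k-mers, and mapping each group to its size turns the groups into counts. The length guard is needed because sliding returns a single, shorter window when the string is shorter than k.

```scala
// Counts every k-mer (overlapping substring of length k) in seq.
def countKmers(k: Int, seq: String): Map[String, Int] =
  if (seq.length < k) Map.empty // sliding would otherwise yield one window shorter than k
  else
    seq.sliding(k).toList
      .groupBy(identity)
      .map { case (kmer, occurrences) => kmer -> occurrences.size }
```

For example, countKmers(2, "AAAA") should give Map("AA" -> 3), since the windows overlap.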

scala loop through a linkedlist

In Scala, what is a good way to loop through a linked list (scala.collection.mutable.LinkedList) of objects? For example, I want a 'for' loop to traverse each object in the linked list and process it.
With foreach:
Welcome to Scala version 2.8.0.final (Java HotSpot(TM) Client VM, Java 1.6.0_21).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val ll = scala.collection.mutable.LinkedList[Int](1,2,3)
ll: scala.collection.mutable.LinkedList[Int] = LinkedList(1, 2, 3)
scala> ll.foreach(i => println(i * 2))
2
4
6
or, if your processing of each object returns a new value, use map:
scala> ll.map(_ * 2)
res3: scala.collection.mutable.LinkedList[Int] = LinkedList(2, 4, 6)
Some people prefer for comprehensions instead of foreach and map. They look like this:
scala> for (i <- ll) println(i)
1
2
3
scala> for (i <- ll) yield i * 2
res5: scala.collection.mutable.LinkedList[Int] = LinkedList(2, 4, 6)
To expand on the previous answer...
foreach and map are higher-order functions (they take a function as an argument), and for is syntactic sugar for calls to them, so starting here:
val list = List(1,2,3)
list.foreach(i => println(i * 2))
You have a number of ways that you can make the code more declarative in nature, and cleaner at the same time.
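First, you don't always need a name like i for each member of the collection; you can often use _ as a placeholder instead, as in list.map(_ * 2). Be careful when the placeholder is nested inside another call, though: list.foreach(println(_ * 2)) does not compile, because _ * 2 expands to the anonymous function x => x * 2, which is then passed to println instead of to foreach. Placeholder syntax works when the placeholder is a direct argument:
list.foreach(println(_))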
You can also separate the logic out into a distinct method, and continue to use placeholder syntax:
def printTimesTwo(i:Int) = println(i * 2)
list.foreach(printTimesTwo(_))
Even cleaner, just pass the raw function without specifying parameters (look ma, no placeholders!)
list.foreach(printTimesTwo)
And to take it to its logical conclusion, this can be made cleaner still with infix syntax, shown here with a standard library method (you could even use a method imported from a Java library, if you wanted):
list foreach println
This thinking extends to anonymous functions and partially-applied functions and also to the map operation:
// "2 * _" creates an anonymous function that doubles its one-and-only argument
list map (2 * _)
For-comprehensions aren't very useful at this level; they just add boilerplate. But they do come into their own when working with deeper nested structures:
//a list of lists, print out all the numbers
val grid = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))
grid foreach { _ foreach println } //hmm, could get confusing
for(line <- grid; cell <- line) println(cell) //that's clearer
I didn't need the yield keyword there, as nothing is being returned. But if I wanted to get back a list of Strings (un-nested):
for(line <- grid; cell <- line) yield { cell.toString }
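For reference, a comprehension with two generators and a yield is shorthand for flatMap over the outer generator with a map inside; a minimal sketch (same grid as above) that checks the two forms agree:

```scala
val grid = List(List(1, 2, 3), List(4, 5, 6), List(7, 8, 9))

// Two generators with yield...
val viaFor = for (line <- grid; cell <- line) yield cell.toString

// ...desugar to flatMap over the outer generator and map over the inner one.
val viaFlatMap = grid.flatMap(line => line.map(cell => cell.toString))
```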
With lots of generators, you'll want to split them over multiple lines:
for {
  listOfGrids <- someMasterCollection
  grid <- listOfGrids
  line <- grid
  cell <- line
} yield {
  cell.toString
}