scalacheck: define a generator for an infinite stream with some dependence on previous elements - scalacheck

I'm trying to define a Gen[Stream[A]] for an infinite (lazily evaluated) stream of As where each element A can depend on previous elements.
As a minimal case, we can take Gen[Stream[Int]] where the next element is either +1 or +2 of the previous element. For reference here is a haskell implementation:
increasingInts :: Gen [Int]
increasingInts = arbitrary >>= go
where
go seed = do
inc <- choose (1,2)
let next = seed + inc
rest <- go next
return (next : rest)
I have tried Gen.sequence on a Stream[Gen[A]] but got a stackoverflow. I also tried defining the Gen from scratch, but the constructor gen for Gen is private and works with private methods/types.
This attempt also gives a stackoverflow.
def go(seed: Int): Gen[Stream[Int]] =
for {
inc <- Gen.choose(1, 2)
next = seed + inc
rest <- Gen.lzy(go(next))
} yield next #:: rest
val increasingInts: Gen[Stream[Int]] = go(0)
increasingInts(Gen.Parameters.default, Seed.random()).get foreach println
So I'm stuck. Any ideas?

What you want can be achieved with this:
val increasingInts = {
val increments = Gen.choose(1, 2)
val initialSeed = 0
for {
stream <- Gen.infiniteStream(increments)
} yield stream.scanLeft(initialSeed)(_ + _)
}
.scanLeft is like a .foldLeft but retains the intermediate values, thus giving you another Stream.
I've written about scanLeft here: https://www.scalawilliam.com/most-important-streaming-abstraction/

Related

How to access previous element when using yield in for loop chisel3

This is mix Chisel / Scala question.
Background, I need to sum up a lot of numbers (the number of input signals in configurable). Due to timing constrains I had to split it to groups of 4 and pipe(register it), then it is fed into next stage (which will be 4 times smaller, until I reach on)
this is my code:
// log4 Aux function //
def log4(n : Int): Int = math.ceil(math.log10(n.toDouble) / math.log10(4.0)).toInt
// stage //
def Adder4PipeStage(len: Int,in: Vec[SInt]) : Vec[SInt] = {
require(in.length % 4 == 0) // will not work if not a muliplication of 4
val pipe = RegInit(VecInit(Seq.fill(len/4)(0.S(in(0).getWidth.W))))
pipe.zipWithIndex.foreach {case(p,j) => p := in.slice(j*4,(j+1)*4).reduce(_ +& _)}
pipe
}
// the pipeline
val adderPiped = for(j <- 1 to log4(len)) yield Adder4PipeStage(len/j,if(j==1) io.in else <what here ?>)
how to I access the previous stage, I am also open to hear about other ways to implement the above
There are several things you could do here:
You could just use a var for the "previous" value:
var prev: Vec[SInt] = io.in
val adderPiped = for(j <- 1 to log4(len)) yield {
prev = Adder4PipeStage(len/j, prev)
prev
}
It is a little weird using a var with a for yield (since the former is fundamentally mutable while the latter tends to be used with immutable-style code).
You could alternatively use a fold building up a List
// Build up backwards and reverse (typical in functional programming)
val adderPiped = (1 to log4(len)).foldLeft(io.in :: Nil) {
case (pipes, j) => Adder4PipeStage(len/j, pipes.head) :: pipes
}.reverse
.tail // Tail drops "io.in" which was 1st element in the result List
If you don't like the backwards construction of the previous fold,
You could use a fold with a Vector (better for appending than a List):
val adderPiped = (1 to log4(len)).foldLeft(Vector(io.in)) {
case (pipes, j) => pipes :+ Adder4PipeStage(len/j, pipes.last)
}.tail // Tail drops "io.in" which was 1st element in the result Vector
Finally, if you don't like these immutable ways of doing it, you could always just embrace mutability and write something similar to what one would in Java or Python:
For loop and mutable collection
val pipes = new mutable.ArrayBuffer[Vec[SInt]]
for (j <- 1 to log4(len)) {
pipes += Adder4PipeStage(len/j, if (j == 1) io.in else pipes.last)
}

How to count the number of iterations in a for comprehension in Scala?

I am using a for comprehension on a stream and I would like to know how many iterations took to get o the final results.
In code:
var count = 0
for {
xs <- xs_generator
x <- xs
count = count + 1 //doesn't work!!
if (x prop)
yield x
}
Is there a way to achieve this?
Edit: If you don't want to return only the first item, but the entire stream of solutions, take a look at the second part.
Edit-2: Shorter version with zipWithIndex appended.
It's not entirely clear what you are attempting to do. To me it seems as if you are trying to find something in a stream of lists, and additionaly save the number of checked elements.
If this is what you want, consider doing something like this:
/** Returns `x` that satisfies predicate `prop`
* as well the the total number of tested `x`s
*/
def findTheX(): (Int, Int) = {
val xs_generator = Stream.from(1).map(a => (1 to a).toList).take(1000)
var count = 0
def prop(x: Int): Boolean = x % 317 == 0
for (xs <- xs_generator; x <- xs) {
count += 1
if (prop(x)) {
return (x, count)
}
}
throw new Exception("No solution exists")
}
println(findTheX())
// prints:
// (317,50403)
Several important points:
Scala's for-comprehension have nothing to do with Python's "yield". Just in case you thought they did: re-read the documentation on for-comprehensions.
There is no built-in syntax for breaking out of for-comprehensions. It's better to wrap it into a function, and then call return. There is also breakable though, but it works with Exceptions.
The function returns the found item and the total count of checked items, therefore the return type is (Int, Int).
The error in the end after the for-comprehension is to ensure that the return type is Nothing <: (Int, Int) instead of Unit, which is not a subtype of (Int, Int).
Think twice when you want to use Stream for such purposes in this way: after generating the first few elements, the Stream holds them in memory. This might lead to "GC-overhead limit exceeded"-errors if the Stream isn't used properly.
Just to emphasize it again: the yield in Scala for-comprehensions is unrelated to Python's yield. Scala has no built-in support for coroutines and generators. You don't need them as often as you might think, but it requires some readjustment.
EDIT
I've re-read your question again. In case that you want an entire stream of solutions together with a counter of how many different xs have been checked, you might use something like that instead:
val xs_generator = Stream.from(1).map(a => (1 to a).toList)
var count = 0
def prop(x: Int): Boolean = x % 317 == 0
val xsWithCounter = for {
xs <- xs_generator;
x <- xs
_ = { count = count + 1 }
if (prop(x))
} yield (x, count)
println(xsWithCounter.take(10).toList)
// prints:
// List(
// (317,50403), (317,50721), (317,51040), (317,51360), (317,51681),
// (317,52003), (317,52326), (317,52650), (317,52975), (317,53301)
// )
Note the _ = { ... } part. There is a limited number of things that can occur in a for-comprehension:
generators (the x <- things)
filters/guards (if-s)
value definitions
Here, we sort-of abuse the value-definition syntax to update the counter. We use the block { counter += 1 } as the right hand side of the assignment. It returns Unit. Since we don't need the result of the block, we use _ as the left hand side of the assignment. In this way, this block is executed once for every x.
EDIT-2
If mutating the counter is not your main goal, you can of course use the zipWithIndex directly:
val xsWithCounter =
xs_generator.flatten.zipWithIndex.filter{x => prop(x._1)}
It gives almost the same result as the previous version, but the indices are shifted by -1 (it's the indices, not the number of tried x-s).

Scala error "value map is not a member of Double"

Explanation: I accepted gzm0's answer because it rocked!
#Eduardo did come in with a comment suggesting:
(for(i <- 10..20; j=runTest(i)) yield i -> j).toMap
which also lets me run build, he just never posted an answer and #gzm0 answer was conceptually AWESOME so I accepted it.
Once I get this other issue figured out relating to "can't call constructor" I will be able to test these out by actually running the program LOL
Question: I have an error in this expression, specifically how to fix it but more generally what am I missing about FP or Scala to make this mistake?
timingsMap = for (i <- powersList; j <- runTest(i)) yield i -> j
I am Writing my first Gradle/Scala project for a Analysis of Algorithms assignment. Scala is not part of the assignment so I am not using a homework tag. Except for my work with Spark in Java, I am brand new to Functional Programming I am sure that's the problem.
Here's a snippet, the full .scala file is on GitHub, is that OK or I will post the full program here if I get flailed :)
val powersList = List(10 to 20)
// create a map to keep track of power of two and resulting timings
var timingsMap: Map[Integer, Double] = Map()
// call the runTest function once for each power of two we want, from powersList,
// assign timingsMap the power of 2 value and results of runTest for that tree size
timingsMap = for (i <- powersList; j <- runTest(i)) yield i -> j
The error is:
/home/jim/workspace/Scala/RedBlackTree4150/src/main/scala/Main.scala:36: value map is not a member of Double
timingsMap = for (i <- powersList; j <- runTest(i)) yield i -> j
What I think I am doing in that timingsMap = ... line is get all the elements of powersList mapped onto i for each iteration of the loop, and the return value for runTest(i) mapped onto j for each iteration, and then taking all those pairs and putting them into timingsMap. Is it the way I am trying to use i in the loop to call runTest(i) that causes the problem?
runTest looks like this:
def runTest(powerOfTwo: Range): Double = {
// create the tree here
var tree = new TreeMap[Int, Double]
// we only care to create a tree with integer keys, not what the value is
for (x <- powerOfTwo) {
// set next entry in map to key, random number
tree += (x -> math.random)
}
stopWatchInst.start()
// now go through and look up all the values in the tree,
for (x <- powerOfTwo) {
// get the value, don't take the time/overhead to store it in a var, as we don't need it
tree.get(x)
}
// stop watch check time report and return time
stopWatchInst.stop()
val totalTime = stopWatchInst.elapsed(TimeUnit.MILLISECONDS)
loggerInst.info("run for 2 to the power of " + powerOfTwo + " took " + totalTime + " ms")
return totalTime
}
Note: I've had a suggestions to change the j <- to = in j <- in this line: timingsMap = for (i <- powersList; j <- runTest(i)) yield i -> j
Another suggestion didn't like using yield at all and suggested replacing with (10 to 20).map...
The strange part is the existing code does not show an error in the IntelliJ editor, only breaks when I run it. The suggestions all give type mismatch errors in the IDE. I am really trying to figure out conceptually what I am doing wrong, thanks for any help! (and of course I need to get it to work!)
After trying gzm0 answer I am getting down the same road ... my code as presented doesn't show any type mismatches until I use gradle run ... whereas when I make the suggested changes it starts to give me errors right in the IDE ... but keep em coming! Here's the latest error based on gzm0s answer:
/home/jim/workspace/Scala/RedBlackTree4150/src/main/scala/Main.scala:37: type mismatch;
found : List[(scala.collection.immutable.Range.Inclusive, Double)]
required: Map[Integer,Double]
timingsMap = for (i <- powersList) yield i -> runTest(i)
You want:
for (i <- powersList)
yield i -> runTest(i)
The result of runTest is not a list, therefore you can't give it to the for statement. The reason that you get a bit of a strange error message, is due to how your for is desugared:
for (i <- powersList; j <- runTest(i)) yield i -> j
// turns into
powersList.flatMap { i => runTest(i).map { j => i -> j } }
However, the result of runTest(i) is a Double, which doesn't have a map method. So looking at the desugaring, the error message actually makes sense.
Note that my point about runTest's result not being a list is not really correct: Anything that has a map method that will allow the above statement (i.e. taking some kind of lambda) will do. However, this is beyond the scope of this answer.
We now have successfully created a list of tuples (since powersList is a List, the result of the for loop is a List as well). However, we want a Map. Luckily, you can call toMap on Lists that contain tuples to convert them into a Map:
val tups = for (i <- powersList) yield i -> runTest(i)
timingsMap = tups.toMap
A note aside: If you really want to keep the j inside the for-loop, you can use the equals sign instead of the left arrow:
for (i <- powersList; j = runTest(i)) yield i -> j
This will turn into:
powersList.map { i =>
val j = runTest(i)
i -> j
}
Which is what you want.

Function to return List of Map while iterating over String, kmer count

I am working on creating a k-mer frequency counter (similar to word count in Hadoop) written in Scala. I'm fairly new to Scala, but I have some programming experience.
The input is a text file containing a gene sequence and my task is to get the frequency of each k-mer where k is some specified length of the sequence.
Therefore, the sequence AGCTTTC has three 5-mers (AGCTT, GCTTT, CTTTC)
I've parsed through the input and created a huge string which is the entire sequence, the new lines throw off the k-mer counting as the end of one line's sequence should still form a k-mer with the beginning of the next line's sequence.
Now I am trying to write a function that will generate a list of maps List[Map[String, Int]] with which it should be easy to use scala's groupBy function to get the count of the common k-mers
import scala.io.Source
object Main {
def main(args: Array[String]) {
// Get all of the lines from the input file
val input = Source.fromFile("input.txt").getLines.toArray
// Create one huge string which contains all the lines but the first
val lines = input.tail.mkString.replace("\n","")
val mappedKmers: List[Map[String,Int]] = getMappedKmers(5, lines)
}
def getMappedKmers(k: Int, seq: String): List[Map[String, Int]] = {
for (i <- 0 until seq.length - k) {
Map(seq.substring(i, i+k), 1) // Map the k-mer to a count of 1
}
}
}
Couple of questions:
How to create/generate List[Map[String,Int]]?
How would you do it?
Any help and/or advice is definitely appreciated!
You're pretty close—there are three fairly minor problems with your code.
The first is that for (i <- whatever) foo(i) is syntactic sugar for whatever.foreach(i => foo(i)), which means you're not actually doing anything with the contents of whatever. What you want is for (i <- whatever) yield foo(i), which is sugar for whatever.map(i => foo(i)) and returns the transformed collection.
The second issue is that 0 until seq.length - k is a Range, not a List, so even once you've added the yield, the result still won't line up with the declared return type.
The third issue is that Map(k, v) tries to create a map with two key-value pairs, k and v. You want Map(k -> v) or Map((k, v)), either of which is explicit about the fact that you have a single argument pair.
So the following should work:
def getMappedKmers(k: Int, seq: String): IndexedSeq[Map[String, Int]] = {
for (i <- 0 until seq.length - k) yield {
Map(seq.substring(i, i + k) -> 1) // Map the k-mer to a count of 1
}
}
You could also convert either the range or the entire result to a list with .toList if you'd prefer a list at the end.
It's worth noting, by the way, that the sliding method on Seq does exactly what you want:
scala> "AGCTTTC".sliding(5).foreach(println)
AGCTT
GCTTT
CTTTC
I'd definitely suggest something like "AGCTTTC".sliding(5).toList.groupBy(identity) for real code.

What is the fastest way to subtract two arrays in scala

I have two arrays (that i have pulled out of a matrix (Array[Array[Int]]) and I need to subtract one from the other.
At the moment I am using this method however, when I profile it, it is the bottleneck.
def subRows(a: Array[Int], b: Array[Int], sizeHint: Int): Array[Int] = {
val l: Array[Int] = new Array(sizeHint)
var i = 0
while (i < sizeHint) {
l(i) = a(i) - b(i)
i += 1
}
l
}
I need to do this billions of times so any improvement in speed is a plus.
I have tried using a List instead of an Array to collect the differences and it is MUCH faster but I lose all benefit when I convert it back to an Array.
I did modify the downstream code to take a List to see if that would help but I need to access the contents of the list out of order so again there is loss of any gains there.
It seems like any conversion of one type to another is expensive and I am wondering if there is some way to use a map etc. that might be faster.
Is there a better way?
EDIT
Not sure what I did the first time!?
So the code I used to test it was this:
def subRowsArray(a: Array[Int], b: Array[Int], sizeHint: Int): Array[Int] = {
val l: Array[Int] = new Array(sizeHint)
var i = 0
while (i < sizeHint) {
l(i) = a(i) - b(i)
i += 1
}
l
}
def subRowsList(a: Array[Int], b: Array[Int], sizeHint: Int): List[Int] = {
var l: List[Int] = Nil
var i = 0
while (i < sizeHint) {
l = a(i) - b(i) :: l
i += 1
}
l
}
val a = Array.fill(100, 100)(scala.util.Random.nextInt(2))
val loops = 30000 * 10000
def runArray = for (i <- 1 to loops) subRowsArray(a(scala.util.Random.nextInt(100)), a(scala.util.Random.nextInt(100)), 100)
def runList = for (i <- 1 to loops) subRowsList(a(scala.util.Random.nextInt(100)), a(scala.util.Random.nextInt(100)), 100)
def optTimer(f: => Unit) = {
val s = System.currentTimeMillis
f
System.currentTimeMillis - s
}
The results I thought I got the first time I did this were the exact opposite... I must have misread or mixed up the methods.
My apologies for asking a bad question.
That code is the fastest you can manage single-threaded using a standard JVM. If you think List is faster, you're either fooling yourself or not actually telling us what you're doing. Putting an Int into List requires two object creations: one to create the list element, and one to box the integer. Object creations take about 10x longer than an array access. So it's really not a winning proposition to do it any other way.
If you really, really need to go faster, and must stay with a single thread, you should probably switch to C++ or the like and explicitly use SSE instructions. See this question, for example.
If you really, really need to go faster and can use multiple threads, then the easiest is to package up a chunk of work like this (i.e. a sensible number of pairs of vectors that need to be subtracted--probably at least a few million elements per chunk) into a list as long as the number of processors on your machine, and then call list.par.map(yourSubtractionRoutineThatActsOnTheChunkOfWork).
Finally, if you can be destructive,
a(i) -= b(i)
in the inner loop is, of course, faster. Likewise, if you can reuse space (e.g. with System.arraycopy), you're better off than if you have to keep allocating it. But that changes the interface from what you've shown.
You can use Scalameter to try a benchmark the two implementations which requires at least JRE 7 update 4 and Scala 2.10 to be run. I used scala 2.10 RC2.
Compile with scalac -cp scalameter_2.10-0.2.jar RangeBenchmark.scala.
Run with scala -cp scalameter_2.10-0.2.jar:. RangeBenchmark.
Here's the code I used:
import org.scalameter.api._
object RangeBenchmark extends PerformanceTest.Microbenchmark {
val limit = 100
val a = new Array[Int](limit)
val b = new Array[Int](limit)
val array: Array[Int] = new Array(limit)
var list: List[Int] = Nil
val ranges = for {
size <- Gen.single("size")(limit)
} yield 0 until size
measure method "subRowsArray" in {
using(ranges) curve("Range") in {
var i = 0
while (i < limit) {
array(i) = a(i) - b(i)
i += 1
}
r => array
}
}
measure method "subRowsList" in {
using(ranges) curve("Range") in {
var i = 0
while (i < limit) {
list = a(i) - b(i) :: list
i += 1
}
r => list
}
}
}
Here's the results:
::Benchmark subRowsArray::
Parameters(size -> 100): 8.26E-4
::Benchmark subRowsList::
Parameters(size -> 100): 7.94E-4
You can draw your own conclusions. :)
The stack blew up on larger values of limit. I'll guess it's because it's measuring the performance many times.