generating permutations with scalacheck - scala

I have some generators like this:
val fooRepr = oneOf(a, b, c, d, e)
val foo = for (s <- choose(1, 5); c <- listOfN(s, fooRepr)) yield c.mkString("$")
This leads to duplicates ... I might get two a's, etc. What I really want is to generate a random permutation with exactly 0 or 1 of each of a, b, c, d, or e (with at least one of something), in any order.
I was thinking there must be an easy way, but I'm struggling to even find a hard way. :)
Edited: Ok, this seems to work:
val foo = for (s <- choose(1, 5);
c <- permute(s, a, b, c, d, e)) yield c.mkString("$")
def permute[T](n: Int, gs: Gen[T]*): Gen[Seq[T]] = {
val perm = Random.shuffle(gs.toList)
for {
is <- pick(n, 1 until gs.size)
xs <- sequence[List,T](is.toList.map(perm(_)))
} yield xs
}
...borrowing heavily from Gen.pick.
Thanks for your help, -Eric

Rex, thanks for clarifying exactly what I'm trying to do, and that's useful code, but perhaps not so nice with scalacheck, particularly if the generators in question are quite complex. In my particular case the generators a, b, c, etc. are generating huge strings.
Anyhow, there was a bug in my solution above; what worked for me is below. I put a tiny project demonstrating how to do this at github
The guts of it is below. If there's a better way, I'd love to know it...
package powerset
import org.scalacheck._
import org.scalacheck.Gen._
import org.scalacheck.Gen
import scala.util.Random
object PowersetPermutations extends Properties("PowersetPermutations") {
def a: Gen[String] = value("a")
def b: Gen[String] = value("b")
def c: Gen[String] = value("c")
def d: Gen[String] = value("d")
def e: Gen[String] = value("e")
val foo = for (s <- choose(1, 5);
c <- permute(s, a, b, c, d, e)) yield c.mkString
def permute[T](n: Int, gs: Gen[T]*): Gen[Seq[T]] = {
val perm = Random.shuffle(gs.toList)
for {
is <- pick(n, 0 until gs.size)
xs <- sequence[List, T](is.toList.map(perm(_)))
} yield xs
}
implicit def arbString: Arbitrary[String] = Arbitrary(foo)
property("powerset") = Prop.forAll {
a: String => println(a); true
}
}
Thanks,
Eric

You're not describing a permutation, but the power set (minus the empty set). Edit: you're describing a combination of a power set and a permutation. The power set of an indexed set N is isomorphic to 2^N, so we can simply enumerate the bit patterns (in Scala alone; maybe you want to alter this for use with ScalaCheck):
def powerSet[X](xs: List[X]) = {
val xis = xs.zipWithIndex
(for (j <- 1 until (1<<xs.length)) yield {
for ((x,i) <- xis if ((j & (1<<i)) != 0)) yield x
}).toList
}
to generate all possible subsets given a set. Of course, explicit generation of power sets is unwise if the original set contains more than a handful of elements. If you don't want to generate all of them, just pass in a random number from 1 until (1<<xs.length) and run the inner loop. (Switch to Long if there are 33-64 elements, and to BitSet if there are more yet.) You can then permute the result to switch the order around if you wish.
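As a minimal sketch of the "pass in a random number" idea, assuming fewer than 31 elements so the mask stays a positive Int (the names subsetFor and randomNonEmptySubset are hypothetical, not part of any library):

```scala
import scala.util.Random

// Keep the elements whose index bit is set in `mask`.
def subsetFor[X](xs: List[X], mask: Int): List[X] =
  xs.zipWithIndex.collect { case (x, i) if (mask & (1 << i)) != 0 => x }

// Draw one mask from 1 to 2^n - 1 (so never the empty set),
// then shuffle the chosen elements to randomize their order.
def randomNonEmptySubset[X](xs: List[X], rnd: Random = new Random): List[X] = {
  require(xs.nonEmpty && xs.length < 31, "sketch only handles 1 to 30 elements")
  val mask = 1 + rnd.nextInt((1 << xs.length) - 1)
  rnd.shuffle(subsetFor(xs, mask))
}
```

Note that this uses scala.util.Random directly, so it ignores any seed ScalaCheck manages; wrapping it in a generator is left out here.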
Edit: there's another way to do this if you can generate permutations easily and you can add a dummy argument: make your list one longer, with a Stop token. Then permute and .takeWhile(_ != Stop). Ta-da! Permutations of arbitrary length. (Filter out the zero-length answer if need be.)
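The Stop-token trick might look roughly like this in plain Scala. This is only a sketch: None stands in for the Stop token, and the name randomPartialPermutation is made up for illustration.

```scala
import scala.util.Random

// Some(x) is a real element; None plays the role of the Stop token.
def randomPartialPermutation[A](xs: List[A], rnd: Random = new Random): List[A] = {
  val withStop: List[Option[A]] = None :: xs.map(x => Option(x))
  // Shuffle everything, then keep only what comes before the Stop token.
  rnd.shuffle(withStop).takeWhile(_.isDefined).flatten
}

// The result may be empty when the Stop token lands first; retry if that is not wanted.
def randomNonEmptyPartialPermutation[A](xs: List[A], rnd: Random = new Random): List[A] = {
  require(xs.nonEmpty, "need at least one element")
  Iterator.continually(randomPartialPermutation(xs, rnd)).find(_.nonEmpty).get
}
```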

Related

How to get a Long-typed product of a Seq[Int] in Scala?

Suppose I have val s: Seq[Int] and I would like to get the product of all its elements. The value is guaranteed to be greater than Int.MaxValue but less than Long.MaxValue, so I want the result to be a Long.
It seems I cannot use product/foldLeft/reduceLeft because Long and Int are different types with no relation between them; therefore I need to write a for-loop myself. Is there a decent way to achieve this?
Note: I'm just asking whether the built-in libraries can do this; I'm still fine with the "ugly" code below.
def product(a: Seq[Int]): Long = {
var p = 1L
for (e <- a) p = p * e
p
}
There's no need to mess about with asInstanceOf or your own loop. foldLeft works just fine
val xs = Seq(1,1000000000,1000000)
xs.foldLeft(1L)((a,e) => a*e)
//> res0: Long = 1000000000000000
How about
def product(s: Seq[Int]) = s.map(_.asInstanceOf[Long]).fold(1L)( _ * _ )
In fact, having re-read your question and learnt about the existence of product itself, you could just do:
def product(s: Seq[Int]) = s.map(_.asInstanceOf[Long]).product
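A variant of the same idea, offered only as a sketch: .toLong is the more usual way to widen an Int, and going through an iterator avoids building an intermediate Seq[Long].

```scala
// Widen each element to Long before multiplying, so the product is computed in Long.
def product(s: Seq[Int]): Long = s.iterator.map(_.toLong).product

val xs = Seq(1, 1000000000, 1000000)
val p = product(xs)
```

With the sample input from the foldLeft answer, p is 1000000000000000L.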

Scala list of tuples of different size zip issues?

Hi, my two lists are as follows:
val a = List((1430299869,"A",4200), (1430299869,"A",0))
val b = List((1430302366,"B",4100), (1430302366,"B",4200), (1430302366,"B",5000), (1430302366,"B",27017), (1430302366,"B",80), (1430302366,"B",9300), (1430302366,"B",9200), (1430302366,"A",5000), (1430302366,"A",4200), (1430302366,"A",80), (1430302366,"A",443), (1430302366,"C",4100), (1430302366,"C",4200), (1430302366,"C",27017), (1430302366,"C",5000), (1430302366,"C",80))
When I zip the two lists as below:
val c = a zip b
it returns:
List(((1430299869,A,4200),(1430302366,B,4100)), ((1430299869,A,0),(1430302366,B,4200)))
Not all of the tuples are included. How can I zip all of the above data?
EDIT
The expected result is the combination of the two lists, like:
List((1430299869,"A",4200), (1430299869,"A",0),(1430302366,"B",4100), (1430302366,"B",4200), (1430302366,"B",5000), (1430302366,"B",27017), (1430302366,"B",80), (1430302366,"B",9300), (1430302366,"B",9200), (1430302366,"A",5000), (1430302366,"A",4200), (1430302366,"A",80), (1430302366,"A",443), (1430302366,"C",4100), (1430302366,"C",4200), (1430302366,"C",27017), (1430302366,"C",5000), (1430302366,"C",80))
Second Edit
I tried this :
val d = for(((a,b,c),(d,e,f)) <- (a zip b)if(b.equals(e) && c.equals(f))) yield (d,e,f)
but it gives empty results because of (a zip b). When I replaced a zip b with a ++ b, it showed the following error:
constructor cannot be instantiated to expected type;
So how can I get matching tuples?
Just add one list to another:
a ++ b
According to your 2nd edit, what you need is:
for {
(a1,b1,c) <- a //rename extracted to a1 and b1 to avoid confusion
(d,e,f) <- b
if b1.equals(e) && c.equals(f)
} yield (d,e,f)
Or:
for {
(a1, b1, c) <- a
(d, `b1`, `c`) <- b //enclosing it in backticks avoids capture and matches against already defined values
} yield (d, b1, c)
Zipping won't help since you need to compare all tuples in a with all tuples in b , it seems.
a zip b creates a list of pairs of elements from a and b.
What you're most likely looking for is list concatenation, which is a ++ b
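A small illustration of the difference, using short made-up lists rather than the asker's data:

```scala
val short = List(1, 2)
val long  = List(10, 11, 12)

// zip pairs elements by position and stops at the end of the shorter list.
val zipped = short zip long

// ++ appends the second list to the first, keeping every element.
val concatenated = short ++ long
```

Here zipped is List((1,10), (2,11)), so 12 is dropped, while concatenated is List(1, 2, 10, 11, 12).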
On zipping (pairing) all data in the lists, consider first a briefer input for illustrating the case,
val a = (1 to 2).toList
val b = (10 to 12).toList
Then for instance a for comprehension may convey the needs,
for (i <- a; j <- b) yield (i,j)
which delivers
List((1,10), (1,11), (1,12),
(2,10), (2,11), (2,12))
Update
From OP latest update, consider a dedicated filtering function,
type triplet = (Int,String,Int)
def filtering(key: triplet, xs: List[triplet]) =
xs.filter( v => key._2 == v._2 && key._3 == v._3 )
and so apply it with flatMap,
a.flatMap(filtering(_, b))
List((1430302366,A,4200))
One additional step is to encapsulate this in an implicit class,
implicit class OpsFilter(val keys: List[triplet]) extends AnyVal {
def filtering(xs: List[triplet]) = {
keys.flatMap ( key => xs.filter( v => key._2 == v._2 && key._3 == v._3 ))
}
}
and likewise,
a.filtering(b)
List((1430302366,A,4200))

For comprehension and number of function creation

Recently I had an interview for Scala Developer position. I was asked such question
// matrix 100x100 (content unimportant)
val matrix = Seq.tabulate(100, 100) { case (x, y) => x + y }
// A
for {
row <- matrix
elem <- row
} print(elem)
// B
val func = print _
for {
row <- matrix
elem <- row
} func(elem)
and the question was: Which implementation, A or B, is more efficient?
We all know that for comprehensions can be translated to
// A
matrix.foreach(row => row.foreach(elem => print(elem)))
// B
matrix.foreach(row => row.foreach(func))
B can be written as matrix.foreach(row => row.foreach(print _))
Supposedly the correct answer is B, because A will create the function print 100 times more often.
I have checked Language Specification but still fail to understand the answer. Can somebody explain this to me?
In short:
Example A is faster in theory, in practice you shouldn't be able to measure any difference though.
Long answer:
As you already found out
for {xs <- xxs; x <- xs} f(x)
is translated to
xxs.foreach(xs => xs.foreach(x => f(x)))
This is explained in §6.19 SLS:
A for loop
for ( p <- e; p' <- e' ... ) e''
where ... is a (possibly empty) sequence of generators, definitions, or guards, is translated to
e .foreach { case p => for ( p' <- e' ... ) e'' }
Now when one writes a function literal, one gets a new instance every time the function needs to be called (§6.23 SLS). This means that
xs.foreach(x => f(x))
is equivalent to
xs.foreach(new scala.Function1 { def apply(x: T) = f(x)})
When you introduce a local function type
val g = f _; xxs.foreach(xs => xs.foreach(x => g(x)))
you are not introducing an optimization because you still pass a function literal to foreach. In fact the code is slower because the inner foreach is translated to
xs.foreach(new scala.Function1 { def apply(x: T) = g.apply(x) })
where an additional call to the apply method of g happens. Though, you can optimize when you write
val g = f _; xxs.foreach(xs => xs.foreach(g))
because the inner foreach now is translated to
xs.foreach(g)
which means that the function g itself is passed to foreach.
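One way to observe the difference between passing g and passing a literal that forwards to g is to record the function object a higher-order method receives. This is only a sketch; the helper each and the buffer received are hypothetical names, and it says nothing about speed.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper that records which function object it was handed.
val received = ListBuffer.empty[AnyRef]
def each(xs: Seq[Int])(f: Int => Unit): Unit = {
  received += f
  xs.foreach(f)
}

val g: Int => Unit = _ => ()

each(Seq(1, 2))(g)          // passes g itself
each(Seq(1, 2))(x => g(x))  // passes a separate function object that forwards to g

val passedDirectly = received(0) eq g  // same object as g
val passedWrapped  = received(1) eq g  // a different object wrapping g
```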
This would mean that B is faster in theory, because no anonymous function needs to be created each time the body of the for comprehension is executed. However, the optimization mentioned above (that the function is directly passed to foreach) is not applied to for comprehensions because, as the spec says, the translation includes the creation of function literals. Therefore there are always unnecessary function objects created (here I must say that the compiler could optimize that as well, but it doesn't because optimization of for comprehensions is difficult and still does not happen in 2.11). All in all it means that A is more efficient, but B would be more efficient if it were written without a for comprehension (and no function literal is created for the innermost function).
Nevertheless, all of these rules can only be applied in theory, because in practice there is the backend of scalac and the JVM itself which both can do optimizations - not to mention optimizations that are done by the CPU. Furthermore your example contains a syscall that is executed on every iteration - it is probably the most expensive operation here that outweighs everything else.
I'd agree with sschaef and say that A is the more efficient option.
Looking at the generated class files we get the following anonymous functions and their apply methods:
MethodA:
anonfun$2 -- row => row.foreach(new anonfun$2$$anonfun$1)
anonfun$2$$anonfun$1 -- elem => print(elem)
i.e. matrix.foreach(row => row.foreach(elem => print(elem)))
MethodB:
anonfun$3 -- x => print(x)
anonfun$4 -- row => row.foreach(new anonfun$4$$anonfun$2)
anonfun$4$$anonfun$2 -- elem => func(elem)
i.e. matrix.foreach(row => row.foreach(elem => func(elem)))
where func is just another indirection before calling to print. In addition func needs to be looked up, i.e. through a method call on an instance (this.func()) for each row.
So for Method B, 1 extra object is created (func) and there are # of elem additional function calls.
The most efficient option would be
matrix.foreach(row => row.foreach(func))
as this has the least number of objects created and does exactly as you would expect.
Benchmark
Summary
Method A is nearly 30% faster than method B.
Link to code: https://gist.github.com/ziggystar/490f693bc39d1396ef8d
Implementation Details
I added method C (two while loops) and D (fold, sum). I also increased the size of the matrix and used an IndexedSeq instead. Also I replaced the print with something less heavy (sum all entries).
Strangely the while construct is not the fastest. But if one uses Array instead of IndexedSeq it becomes the fastest by a large margin (factor 5, no boxing anymore). Using explicitly boxed integers, methods A, B, C are all equally fast. In particular they are faster by 50% compared to the implicitly boxed versions of A, B.
Results
A
4.907797735
4.369745787
4.375195012000001
4.7421321800000005
4.35150636
B
5.955951859000001
5.925475619
5.939570085000001
5.955592247
5.939672226000001
C
5.991946029
5.960122757000001
5.970733164
6.025532582
6.04999499
D
9.278486201
9.265983922
9.228320372
9.255641645
9.22281905
verify results
999000000
999000000
999000000
999000000
>$ scala -version
Scala code runner version 2.11.0 -- Copyright 2002-2013, LAMP/EPFL
Code excerpt
val matrix = IndexedSeq.tabulate(1000, 1000) { case (x, y) => x + y }
def variantA(): Int = {
var r = 0
for {
row <- matrix
elem <- row
}{
r += elem
}
r
}
def variantB(): Int = {
var r = 0
val f = (x:Int) => r += x
for {
row <- matrix
elem <- row
} f(elem)
r
}
def variantC(): Int = {
var r = 0
var i1 = 0
while(i1 < matrix.size){
var i2 = 0
val row = matrix(i1)
while(i2 < row.size){
r += row(i2)
i2 += 1
}
i1 += 1
}
r
}
def variantD(): Int = matrix.foldLeft(0)(_ + _.sum)
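For the Array observation mentioned under Implementation Details, the while-loop variant might look like the sketch below. The names arrayMatrix and variantCArray are made up here and are not taken from the linked gist; no timing claim is implied.

```scala
// Same contents as the benchmark matrix, but stored as unboxed Int arrays.
val arrayMatrix: Array[Array[Int]] = Array.tabulate(1000, 1000)((x, y) => x + y)

def variantCArray(): Int = {
  var r = 0
  var i1 = 0
  while (i1 < arrayMatrix.length) {
    val row = arrayMatrix(i1)
    var i2 = 0
    while (i2 < row.length) {
      r += row(i2) // reads a primitive Int, no boxing involved
      i2 += 1
    }
    i1 += 1
  }
  r
}
```

It should return the same 999000000 shown in the verification output above.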

Scalacheck: Generate list corresponding to list of generators

I want to generate a list of integers corresponding to a list of generators in ScalaCheck.
import org.scalacheck._
import Arbitrary.arbitrary
val smallInt = Gen.choose(0,10)
val bigInt = Gen.choose(1000, 1000000)
val zeroOrOneInt = Gen.choose(0, 1)
val smallEvenInt = smallInt suchThat (_ % 2 == 0)
val gens = List(smallInt, bigInt, zeroOrOneInt, smallEvenInt)
//val listGen: Gen[List[Int]] = ???
//println(listGen.sample) //should print something like List(2, 2000, 0, 6)
For the given gens, I would like to create a generator listGen whose valid sample can be List(2, 2000, 0, 6).
Here is my first attempt using tuples.
val gensTuple = (smallInt, bigInt, zeroOrOneInt, smallEvenInt)
val tupleGen = for {
a <- gensTuple._1
b <- gensTuple._2
c <- gensTuple._3
d <- gensTuple._4
} yield (a, b, c, d)
println(tupleGen.sample) // prints Some((1,318091,0,6))
This works, but I don't want to use tuples since the list of generators(gens) is created dynamically
and the size of the list is not fixed. Is there a way to do it with Lists?
I want to use the generator of the list (listGen) in ScalaCheck forAll property checking.
This looks like a toy problem but this is
the best I could do to create a standalone snippet reproducing the actual issue I am
facing.
How about using the Gen.sequence method? It transforms an Iterable[Gen[T]] into a Gen[C[T]], where C can be List:
def sequence[C[_],T](gs: Iterable[Gen[T]])(implicit b: Buildable[T,C]): Gen[C[T]] =
...
Just use Gen.sequence, but be careful as it will try to return a java.util.ArrayList[T] if you don't fully parameterize it (bug).
Full working example:
def genIntList(): Gen[List[Int]] = {
val gens = List(Gen.chooseNum(1, 2), Gen.chooseNum(3, 4))
Gen.sequence[List[Int], Int](gens)
}
println(genIntList.sample.get) // prints: List(1,4)
EDIT: Please disregard, this doesn't answer the asker's question
I can't comment on posts yet, so I'll have to venture a guess here. I presume the function 'sample' applies to the generators
Any reason why you can't do:
gens map (t=>t.sample)
For a more theoretical answer: the method you want is traverse, which is equivalent to sequence compose map although it might be more efficient. It is of the general form:
def traverse[C[_]: Traverse, F[_]: Applicative, A, B](f: A => F[B], t: C[A]): F[C[B]]
It behaves like map but allows you to carry around some extra Applicative structure during the traversal, sequencing it along the way.
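To make the shape of traverse concrete without pulling in a type-class library, here is a sketch specialized to List and Option, with Option standing in for Gen as the Applicative. That substitution is an assumption made for illustration; this is not ScalaCheck's implementation.

```scala
import scala.util.Try

// traverse for List with Option as the effect: all-or-nothing over the elements.
def traverse[A, B](xs: List[A])(f: A => Option[B]): Option[List[B]] =
  xs.foldRight(Option(List.empty[B])) { (a, acc) =>
    for (b <- f(a); bs <- acc) yield b :: bs
  }

// sequence is traverse with the identity function.
def sequence[A](xs: List[Option[A]]): Option[List[A]] = traverse(xs)(identity)

def parse(s: String): Option[Int] = Try(s.toInt).toOption

val allGood = traverse(List("1", "2", "3"))(parse)
val oneBad  = traverse(List("1", "x", "3"))(parse)
```

Here allGood is Some(List(1, 2, 3)) and oneBad is None, which mirrors how a sequenced generator can only produce a list when every component produces a value.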

How to implement lazy sequence (iterable) in scala?

I want to implement a lazy iterator that yields the next element in each call, in a 3-level nested loop.
Is there something similar in scala to this snippet of c#:
foreach (int i in ...)
{
foreach (int j in ...)
{
foreach (int k in ...)
{
yield return do(i,j,k);
}
}
}
Thanks, Dudu
Scala sequence types all have a .view method which produces a lazy equivalent of the collection. You can play around with the following in the REPL (after issuing :silent to stop it from forcing the collection to print command results):
def log[A](a: A) = { println(a); a }
for (i <- 1 to 10) yield log(i)
for (i <- (1 to 10) view) yield log(i)
The first will print out the numbers 1 to 10, the second will not until you actually try to access those elements of the result.
There is nothing in Scala directly equivalent to C#'s yield statement, which pauses the execution of a loop. You can achieve similar effects with the delimited continuations which were added for scala 2.8.
If you join iterators together with ++, you get a single iterator that runs over both. And the reduceLeft method helpfully joins together an entire collection. Thus,
def doIt(i: Int, j: Int, k: Int) = i+j+k
(1 to 2).map(i => {
(1 to 2).map(j => {
(1 to 2).iterator.map(k => doIt(i,j,k))
}).reduceLeft(_ ++ _)
}).reduceLeft(_ ++ _)
will produce the iterator you want. If you want it to be even more lazy than that, you can add .iterator after the first two (1 to 2) also. (Replace each (1 to 2) with your own more interesting collection or range, of course.)
You can use a Sequence Comprehension over Iterators to get what you want:
for {
i <- (1 to 10).iterator
j <- (1 to 10).iterator
k <- (1 to 10).iterator
} yield doFunc(i, j, k)
If you want to create a lazy Iterable (instead of a lazy Iterator) use Views instead:
for {
i <- (1 to 10).view
j <- (1 to 10).view
k <- (1 to 10).view
} yield doFunc(i, j, k)
Depending on how lazy you want to be, you may not need all of the calls to iterator / view.
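A quick way to check that the iterator version really is lazy is to count calls. In this sketch the body of doFunc and the counter calls are assumptions added for the demonstration:

```scala
var calls = 0
def doFunc(i: Int, j: Int, k: Int): Int = { calls += 1; i + j + k }

val it = for {
  i <- (1 to 10).iterator
  j <- (1 to 10).iterator
  k <- (1 to 10).iterator
} yield doFunc(i, j, k)

// Nothing has been computed yet; building the iterator does not call doFunc.
val callsBeforeUse = calls

// Pull only the first three results.
val firstThree = it.take(3).toList
val callsAfterUse = calls
```

callsBeforeUse is 0, and after taking three elements far fewer than the 1000 possible calls have been made.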
If your 3 iterators are generally small (i.e., you can fully iterate them without concern for memory or CPU) and the expensive part is computing the result given i, j, and k, you can use Scala's Stream class.
val tuples = for (i <- 1 to 3; j <- 1 to 3; k <- 1 to 3) yield (i, j, k)
val stream = Stream(tuples: _*) map { case (i, j, k) => i + j + k }
stream take 10 foreach println
If your iterators are too large for this approach, you could extend this idea and create a Stream of tuples that calculates the next value lazily by keeping state for each iterator. For example (although hopefully someone has a nicer way of defining the product method):
def product[A, B, C](a: Iterable[A], b: Iterable[B], c: Iterable[C]): Iterator[(A, B, C)] = {
if (a.isEmpty || b.isEmpty || c.isEmpty) Iterator.empty
else new Iterator[(A, B, C)] {
private val aItr = a.iterator
private var bItr = b.iterator
private var cItr = c.iterator
private var aValue: Option[A] = if (aItr.hasNext) Some(aItr.next) else None
private var bValue: Option[B] = if (bItr.hasNext) Some(bItr.next) else None
override def hasNext = cItr.hasNext || bItr.hasNext || aItr.hasNext
override def next = {
if (cItr.hasNext)
(aValue get, bValue get, cItr.next)
else {
cItr = c.iterator
if (bItr.hasNext) {
bValue = Some(bItr.next)
(aValue get, bValue get, cItr.next)
} else {
aValue = Some(aItr.next)
bItr = b.iterator
(aValue get, bValue get, cItr.next)
}
}
}
}
}
val stream = product(1 to 3, 1 to 3, 1 to 3).toStream map { case (i, j, k) => i + j + k }
stream take 10 foreach println
This approach fully supports infinitely sized inputs.
I think the below code is what you're actually looking for... I think the compiler ends up translating it into the equivalent of the map code Rex gave, but is closer to the syntax of your original example:
scala> def doIt(i:Int, j:Int) = { println(i + ","+j); (i,j); }
doIt: (i: Int, j: Int)(Int, Int)
scala> def x = for( i <- (1 to 5).iterator;
j <- (1 to 5).iterator ) yield doIt(i,j)
x: Iterator[(Int, Int)]
scala> x.foreach(print)
1,1
(1,1)1,2
(1,2)1,3
(1,3)1,4
(1,4)1,5
(1,5)2,1
(2,1)2,2
(2,2)2,3
(2,3)2,4
(2,4)2,5
(2,5)3,1
(3,1)3,2
(3,2)3,3
(3,3)3,4
(3,4)3,5
(3,5)4,1
(4,1)4,2
(4,2)4,3
(4,3)4,4
(4,4)4,5
(4,5)5,1
(5,1)5,2
(5,2)5,3
(5,3)5,4
(5,4)5,5
(5,5)
scala>
You can see from the output that the print in "doIt" isn't called until the next value of x is iterated over, and this style of for generator is a bit simpler to read/write than a bunch of nested maps.
Turn the problem upside down. Pass "do" in as a closure. That's the entire point of using a functional language
Iterator.zip will do it:
iterator1.zip(iterator2).zip(iterator3).map(tuple => doSomething(tuple))
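Note that zip pairs elements by position rather than forming the nested-loop combinations asked about, and chaining two zips yields nested tuples. A small sketch with made-up inputs:

```scala
val iterator1 = Iterator(1, 2)
val iterator2 = Iterator(10, 20)
val iterator3 = Iterator(100, 200)

// Each element has the shape ((i, j), k), so the pattern must match that nesting.
val sums = iterator1.zip(iterator2).zip(iterator3)
  .map { case ((i, j), k) => i + j + k }
  .toList
```

This gives List(111, 222): two results for two positions, not the eight a triple nested loop would produce.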
Just read the 20 or so first related links that are shown on the side (and, indeed, were shown to you when you first wrote the title of your question).