Sized generators in scalacheck - scala

The ScalaCheck User Guide mentions sized generators. The explanatory code
def matrix[T](g: Gen[T]): Gen[Seq[Seq[T]]] = Gen.sized { size =>
  val side = scala.Math.sqrt(size).asInstanceOf[Int] // small change to avoid a compile error
  Gen.vectorOf(side, Gen.vectorOf(side, g))
}
didn't explain anything to me. After some experimentation I found that the length of the generated sequence does not depend on the actual size of the generator. (There is a resize method in the Gen object that, according to the Scaladoc, "Creates a resized version of a generator"; maybe that means something different?)
val g = Gen.choose(1,5)
val g2 = Gen.resize(15, g)
println(matrix(g).sample) // (1)
println(matrix(g2).sample) // (2)
// (1) and (2) produce Seqs of the same length
Could you explain what I have missed, and give me some examples of how you use sized generators in test code?

The vectorOf generator (which has now been replaced by listOf) generates lists whose size depends (linearly) on the size parameter that ScalaCheck sets when it evaluates a generator. When ScalaCheck tests a property, it increases this size parameter for each test, so the property is tested with larger and larger lists (if listOf is used).
If you create a matrix generator by simply nesting listOf generators, you will get matrices whose size depends on the square of the size parameter. Hence, when using such a generator in a property, you might end up with very large matrices, since ScalaCheck increases the size parameter for each test run. However, if you use the sized generator combinator the way the ScalaCheck User Guide does (side = sqrt(size), so the matrix has roughly side × side ≈ size elements), the final matrix size depends linearly on the size parameter, resulting in better performance when testing your properties.
You should rarely need the resize generator combinator. If you need to generate lists bounded by some specific size, it is much better to do something like the example below, since there is no guarantee that the listOf/containerOf generators use the size parameter the way you expect.
def genBoundedList[T](maxSize: Int, g: Gen[T]): Gen[List[T]] = {
  Gen.choose(0, maxSize) flatMap { sz => Gen.listOfN(sz, g) }
}
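To address the "how do you use them in testing code" part, here is a minimal sketch that plugs a bounded generator into a property. It assumes ScalaCheck 1.14 or later on the classpath; the object name BoundedListProps and the bound of 5 are illustrative choices, not from the original answers.

```scala
import org.scalacheck.{Gen, Prop}

object BoundedListProps {
  // Same idea as genBoundedList above: pick a length first, then generate
  // exactly that many elements, independently of ScalaCheck's size parameter.
  def genBoundedList[T](maxSize: Int, g: Gen[T]): Gen[List[T]] =
    Gen.choose(0, maxSize).flatMap(sz => Gen.listOfN(sz, g))

  // Property: every generated list respects the bound and the element range,
  // no matter how large ScalaCheck's size parameter grows during the run.
  val boundedProp: Prop =
    Prop.forAll(genBoundedList(5, Gen.choose(1, 5))) { xs =>
      xs.size <= 5 && xs.forall(x => x >= 1 && x <= 5)
    }

  def main(args: Array[String]): Unit =
    boundedProp.check() // runs the default number of tests and prints a report
}
```

Because the length is chosen explicitly, the bound holds even when ScalaCheck raises the size parameter to 100.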

The vectorOf method that you use is deprecated, and you should use the listOf method instead. listOf generates a list of random length whose maximum length is limited by the size of the generator. You should therefore resize the generator that actually generates the list if you want to control the maximum number of elements generated:
scala> val g1 = Gen.choose(1,5)
g1: org.scalacheck.Gen[Int] = Gen()
scala> val g2 = Gen.listOf(g1)
g2: org.scalacheck.Gen[List[Int]] = Gen()
scala> g2.sample
res19: Option[List[Int]] = Some(List(4, 4, 4, 4, 2, 4, 2, 3, 5, 1, 1, 1, 4, 4, 1, 1, 4, 5, 5, 4, 3, 3, 4, 1, 3, 2, 2, 4, 3, 4, 3, 3, 4, 3, 2, 3, 1, 1, 3, 2, 5, 1, 5, 5, 1, 5, 5, 5, 5, 3, 2, 3, 1, 4, 3, 1, 4, 2, 1, 3, 4, 4, 1, 4, 1, 1, 4, 2, 1, 2, 4, 4, 2, 1, 5, 3, 5, 3, 4, 2, 1, 4, 3, 2, 1, 1, 1, 4, 3, 2, 2))
scala> val g3 = Gen.resize(10, g2)
g3: java.lang.Object with org.scalacheck.Gen[List[Int]] = Gen()
scala> g3.sample
res0: Option[List[Int]] = Some(List(1))
scala> g3.sample
res1: Option[List[Int]] = Some(List(4, 2))
scala> g3.sample
res2: Option[List[Int]] = Some(List(2, 1, 2, 4, 5, 4, 2, 5, 3))
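A minimal sketch of the point made in both answers, assuming ScalaCheck 1.14 or later: resize has to wrap the generator whose Gen.sized reads the size parameter. For the matrix generator that is the outer generator, not the cell generator, which is why matrix(g) and matrix(Gen.resize(15, g)) in the question behave the same. Here listOfN stands in for the older vectorOf(n, g); the object and value names are illustrative.

```scala
import org.scalacheck.Gen

object SizedMatrix {
  // User Guide style matrix: the side length comes from the size parameter,
  // so the total number of cells (side * side) grows roughly linearly with it.
  def matrix[T](g: Gen[T]): Gen[List[List[T]]] = Gen.sized { size =>
    val side = math.sqrt(size.toDouble).toInt
    Gen.listOfN(side, Gen.listOfN(side, g))
  }

  val cell: Gen[Int] = Gen.choose(1, 5)

  // Resizing only the cell generator: the outer Gen.sized still sees the
  // default size, so the matrix dimensions do not change.
  val resizedCells: Gen[List[List[Int]]] = matrix(Gen.resize(15, cell))

  // Resizing the matrix generator itself: Gen.sized now sees 16, so side = 4.
  val resizedMatrix: Gen[List[List[Int]]] = Gen.resize(16, matrix(cell))

  def main(args: Array[String]): Unit = {
    println(resizedCells.sample.map(m => (m.size, m.headOption.map(_.size))))
    println(resizedMatrix.sample.map(m => (m.size, m.headOption.map(_.size))))
  }
}
```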

Related

Printing specific output in Scala

I have the following array of arrays, representing cycles in a graph, that I want to print in the format below.
scala> result.collect
Array[Array[Long]] = Array(Array(0, 1, 4, 0), Array(1, 5, 2, 1), Array(1, 4, 0, 1), Array(2, 3, 5, 2), Array(2, 1, 5, 2), Array(3, 5, 2, 3), Array(4, 0, 1, 4), Array(5, 2, 3, 5), Array(5, 2, 1, 5))
0:0->1->4;
1:1->5->2;1->4->0;
2:2->3->5;2->1->5;
3:3->5->2;
4:4->0->1;
5:5->2->3;5->2->1;
How can I do this? I have tried a for loop with if statements, as in other languages, but Scala's ifs in for loops are for filtering, and I cannot use if/else to handle two different cases.
Example Python code:
for i in range(len(result) - 1):
    if result[i][0] == result[i + 1][0]:
        pass  # print thing needed
    else:
        pass  # print other thing
I also tried result.groupBy to make printing easier, but the grouped arrays then print as raw JVM references (such as [J@3677a08a) instead of their contents:
Array[(Long, Iterable[Array[Long]])] = Array((4,CompactBuffer([J@3677a08a)), (0,CompactBuffer([J@695fd7e)), (1,CompactBuffer([J@50b0f441, [J@142efc4d)), (3,CompactBuffer([J@1fd66db2)), (5,CompactBuffer([J@36811d3b, [J@61c4f556)), (2,CompactBuffer([J@2eba1b7, [J@2efcf7a5)))
Is there a way to nicely print the output needed in Scala?
This should do it:
result
  .groupBy(_.head)
  .toArray
  .sortBy(_._1)
  .map {
    case (node, cycles) =>
      val paths = cycles.map { cycle =>
        cycle
          .init // drop last node (it repeats the first one)
          .mkString("->")
      }
      paths.mkString(s"$node:", ";", ";") // every path ends with ';'
  }
  .mkString("\n")
This is the output for the sample input you provided:
0:0->1->4;
1:1->5->2;1->4->0;
2:2->3->5;2->1->5;
3:3->5->2;
4:4->0->1;
5:5->2->3;5->2->1;
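The question assumed that an if inside a for loop can only filter. That is only true for an if placed in the generator part (for (x <- xs if cond)); in the loop body, if/else is an ordinary expression. A minimal imperative sketch of the loop the question was attempting, assuming the RDD has already been collected into a local Array[Array[Long]]; CyclePrinter and format are illustrative names.

```scala
object CyclePrinter {
  // An if/else in the *body* of a for loop is an ordinary expression; only an
  // `if` inside the generator part (for (x <- xs if cond)) acts as a filter.
  def format(rows: Array[Array[Long]]): String = {
    val sb = new StringBuilder
    var current: Option[Long] = None // start node of the line being built
    for (row <- rows.sortBy(_.head)) { // stable sort keeps cycle order per node
      val prefix =
        if (current.contains(row.head)) ""        // same start node: continue the line
        else if (current.isEmpty) s"${row.head}:" // very first line
        else s"\n${row.head}:"                    // new start node: begin a new line
      sb.append(prefix).append(row.init.mkString("->")).append(';')
      current = Some(row.head)
    }
    sb.toString
  }

  def main(args: Array[String]): Unit = {
    val result = Array(
      Array(0L, 1L, 4L, 0L), Array(1L, 5L, 2L, 1L), Array(1L, 4L, 0L, 1L),
      Array(2L, 3L, 5L, 2L), Array(2L, 1L, 5L, 2L), Array(3L, 5L, 2L, 3L),
      Array(4L, 0L, 1L, 4L), Array(5L, 2L, 3L, 5L), Array(5L, 2L, 1L, 5L))
    println(format(result))
  }
}
```

The groupBy version is usually preferred in Scala, but this shows that branching per row is perfectly possible in a loop.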

Why are Sets of up to 4 elements ordered but larger ones are not?

Given
val xs1 = Set(3, 2, 1, 4, 5, 6, 7)
val ys1 = Set(7, 2, 1, 4, 5, 6, 3)
xs1 and ys1 both result in scala.collection.immutable.Set[Int] = Set(5, 1, 6, 2, 7, 3, 4)
but the smaller sets below
val xt1 = Set(1, 2, 3)
val yt1 = Set(3, 2, 1)
produce
xt1: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
yt1: scala.collection.immutable.Set[Int] = Set(3, 2, 1)
Why are the former not ordered, whilst the latter seem to be?
The difference in behaviour is due to optimisations for Sets of up to 4 elements:
The default implementation of an immutable set uses a representation
that adapts to the number of elements of the set. An empty set is
represented by just a singleton object. Sets of sizes up to four are
represented by a single object that stores all elements as fields.
Beyond that size, immutable sets are implemented as Compressed
Hash-Array Mapped Prefix-tree.
A similar explanation by Ben James:
Set is also a companion object* with an apply** method. When you call
Set(...), you're calling this factory method and getting a return
value which is some kind of Set. It might be a HashSet, but could be
some other implementation. According to 2, the default implementation
for an immutable set has special representation for empty set and sets
size up to 4. Immutable sets size 5 and above and mutable sets all use
hashSet.
Since the size of Set(3, 2, 1, 4, 5, 6, 7) is greater than 4, its concrete implementation is HashSet
Set(3, 2, 1, 4, 5, 6, 7).getClass
class scala.collection.immutable.HashSet
which does not guarantee insertion order. On the other hand, the concrete implementation of Set(1, 2, 3) is the dedicated class Set3
Set(1,2,3).getClass
class scala.collection.immutable.Set$Set3
which stores the three elements in three corresponding fields and iterates over them in field order, which is why the insertion order appears to be preserved
final class Set3[A] private[collection] (elem1: A, elem2: A, elem3: A) extends AbstractSet[A] ...
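A small sketch for observing this yourself: it builds sets of growing size from descending input and prints which concrete class the Set(...) factory picked, along with the iteration order. The exact class names of the hash-based sets differ between Scala versions (for example, 2.12 reports a HashSet subclass), so only the small-set behaviour is stated as certain here.

```scala
import scala.collection.immutable.HashSet

object SetRepresentations {
  def main(args: Array[String]): Unit = {
    // Sets of size 1 to 7, each built from descending input.
    for (n <- 1 to 7) {
      val s = Set((n to 1 by -1): _*)
      println(s"$n elements -> ${s.getClass.getName}: ${s.mkString(", ")}")
    }

    // Sizes 1 to 4 use Set1..Set4, which iterate in insertion order.
    println(Set(3, 2, 1).toList) // List(3, 2, 1)
    // Larger sets are hash-based, so iteration follows the hash structure.
    println(Set(3, 2, 1, 4, 5, 6, 7).isInstanceOf[HashSet[_]])
  }
}
```

If you actually need insertion order for larger sets, rely on a collection that documents it rather than on this size-based optimisation.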

How would I unit test a method that internally shuffles an array randomly

I have a method that looks like this:
def compute[T](l: List[T]): List[T] = {
  val shuffled = util.Random.shuffle(l)
  // do some more computations
  shuffled // placeholder result so the method type-checks
}
I want to seed the random number generator in my unit tests, so that I don't have to split my method into two and test only the computation, since this is the method that will be used externally. Is this possible in ScalaTest?
I don't have much background in ScalaTest, but if you call util.Random.setSeed(seed: Long): Unit before shuffling, you'll always get the same shuffle for any given seed value.
scala> util.Random.shuffle(Seq(1,2,3,4,5,6,7,8,9,0))
res0: Seq[Int] = List(6, 0, 8, 5, 4, 7, 2, 3, 1, 9)
scala> util.Random.setSeed(57L)
scala> util.Random.shuffle(Seq(1,2,3,4,5,6,7,8,9,0))
res1: Seq[Int] = List(5, 3, 2, 0, 6, 8, 7, 4, 1, 9)
scala> util.Random.setSeed(57L)
scala> util.Random.shuffle(Seq(1,2,3,4,5,6,7,8,9,0))
res2: Seq[Int] = List(5, 3, 2, 0, 6, 8, 7, 4, 1, 9)
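A variation on the seeding idea, sketched under the assumption that you can add a parameter to compute: pass a scala.util.Random instance into the method instead of reseeding the shared global util.Random. Reseeding the global generator also affects any other code that uses it (for example tests running in parallel), whereas a seeded instance is local to the call. The rng parameter and the Shuffling object are hypothetical additions, not part of the original method.

```scala
import scala.util.Random

object Shuffling {
  // The random source is a parameter: production code uses the shared global
  // generator, while tests can pass a seeded instance for reproducible shuffles.
  def compute[T](l: List[T], rng: Random = Random): List[T] = {
    val shuffled = rng.shuffle(l)
    // ... further computations would go here; this sketch just returns the shuffle
    shuffled
  }

  def main(args: Array[String]): Unit = {
    val input = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 0)
    // Two generators built with the same seed produce the same shuffle.
    println(compute(input, new Random(57L)) == compute(input, new Random(57L)))
  }
}
```

External callers keep calling compute(list) unchanged, thanks to the default argument.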
How would you test a method that depends on the current time? Or on database state? Or on a response from Google? Or on sleeping for some time?
Generally, when you test code that depends on some external state (time, another system, or in your case entropy/the seed), you refactor that code and extract the dependency. One way, as @jwvh said, is to extract the seed. But IMHO you should extract the whole transformation.
Therefore, you should create a method that receives the already-shuffled list, and test that method.
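A minimal sketch of that refactoring. The body of process is a hypothetical stand-in (it just reverses the list) for the unspecified "do some more computations"; the point is the split, where the public compute becomes a thin wrapper and all real logic lives in a deterministic method.

```scala
import scala.util.Random

object ExtractedCompute {
  // Hypothetical stand-in for "do some more computations": it only sees an
  // already-shuffled list, so it is deterministic and easy to unit test.
  def process[T](shuffled: List[T]): List[T] =
    shuffled.reverse

  // Thin public wrapper: the only non-deterministic part is the shuffle.
  def compute[T](l: List[T]): List[T] =
    process(Random.shuffle(l))

  def main(args: Array[String]): Unit = {
    // A test can feed process any fixed "shuffled" input and check the result.
    println(process(List(3, 1, 2))) // List(2, 1, 3)
  }
}
```

For compute itself, a test can then check only properties that hold for every shuffle, such as the result being a permutation of the input.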

Flattening a list of lists to a set with exceptions in scala

This feels like a peculiar problem, and I am very new to Scala, so I'm not sure how to ask the right questions to make progress on it.
As a demonstration, say I have a list of lists like this:
val data = List(List(1, 2, 3, 4), List(1, 2, 2, 3, 4), List(1, 2, 3, 3, 3, 4), List(1, 2, 3, 4), List(2, 3, 4))
and I want to reduce it to a List of integers that looks mostly like the distinct union of the lists, with one exception: where a list contains an integer more than once, I want that repetition represented in the final list. As a general rule, for each integer, the list with the most occurrences of it determines how many times it appears in the final list. So ideally this would give:
List(1, 2, 2, 3, 3, 3, 4)
I know I can do data.flatten.distinct and get:
List(1, 2, 3, 4)
but that's not what I want and I know there's probably a bit more work to get to the desired result.
Is there a good way to achieve the desired result in a functional style in Scala?
Try this
val data = List(List(1, 2, 3, 4), List(1, 2, 2, 3, 4), List(1, 2, 3, 3, 3, 4), List(1, 2, 3, 4), List(2, 3, 4))
val map = data.map(_.groupBy(identity)).foldLeft(Map[Int, List[Int]]()) {
  case (r, c) => r ++ c.map {
    case (k, v) => k -> (if (v.size > r.getOrElse(k, List()).size) v else r(k))
  }
}.values.flatten
//> map : Iterable[Int] = List(2, 2, 4, 1, 3, 3, 3)
It does not maintain the ordering; you can call sorted on the result afterwards.
Maybe this is cleaner
data.flatMap(_.groupBy(identity)).groupBy(_._1).mapValues(_.sortBy(_._2.size).reverse(0)._2).values.flatten
//> res0: Iterable[Int] = List(2, 2, 4, 1, 3, 3, 3)
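A variant of the same idea that works with counts instead of keeping whole sublists, and restores ascending order at the end so the result matches the List(1, 2, 2, 3, 3, 3, 4) asked for. This is a sketch; MaxMultiplicity and merge are illustrative names.

```scala
object MaxMultiplicity {
  // For every integer, keep the largest number of times it occurs in any single
  // inner list, then expand those counts back into a sorted list.
  def merge(data: List[List[Int]]): List[Int] = {
    val maxCounts: Map[Int, Int] =
      data
        .map(_.groupBy(identity).map { case (k, occurrences) => k -> occurrences.size })
        .foldLeft(Map.empty[Int, Int]) { (acc, counts) =>
          counts.foldLeft(acc) { case (m, (k, n)) =>
            m.updated(k, math.max(n, m.getOrElse(k, 0)))
          }
        }
    maxCounts.toList.sortBy(_._1).flatMap { case (k, n) => List.fill(n)(k) }
  }

  def main(args: Array[String]): Unit = {
    val data = List(List(1, 2, 3, 4), List(1, 2, 2, 3, 4), List(1, 2, 3, 3, 3, 4), List(1, 2, 3, 4), List(2, 3, 4))
    println(merge(data)) // List(1, 2, 2, 3, 3, 3, 4)
  }
}
```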
I don't quite understand the question, but you can simply sort the elements:
data.flatten.sorted
Which would give you
List(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4)
If you want them ordered by number of occurrences, you can do it like this:
data.flatten.groupBy(k => k).mapValues(_.size).toList.sortBy(_._2).map(_._1)
which would give you
List(1, 4, 2, 3)

Remove unique items from sequence [duplicate]

This question already has answers here:
How to get a set of all elements that occur multiple times in a list in Scala?
(2 answers)
Closed 8 years ago.
I find a lot about how to remove duplicates, but what is the most elegant way to first remove the unique items and then remove the duplicates from what remains?
E.g. a sequence (1, 2, 5, 2, 3, 4, 4, 0, 2) should be converted into (2, 4).
I can think of using a for-loop to add a count to each distinct item, but I could imagine that Scala has a more elegant way to achieve this.
distinct and diff will work for you:
val a = List(1, 2, 5, 2, 3, 4, 4, 0, 2)
> a: List[Int] = List(1, 2, 5, 2, 3, 4, 4, 0, 2)
val b = a diff a.distinct
> b: List[Int] = List(2, 4, 2)
val c = (a diff a.distinct).distinct
> c: List[Int] = List(2, 4)
Instead of the outer distinct, you can also use toSet (which gives a Set rather than a List).
Also keep in mind that i => i can be replaced by identity and map(_._1) by keys, like this:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(identity).filter(_._2.size > 1).keys.toSeq
This is where a countByKey method, such as the one that can be found in Spark's API, would be useful.
Pretty straightforward:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(i => i).filter(_._2.size > 1).map(_._1).toSeq
Using the link from Ende Neu I think your code would become this:
Seq(1, 2, 5, 2, 3, 4, 4, 0, 2).groupBy(identity).collect { case (v, l) if l.length > 1 => v }.toSeq
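The groupBy-based one-liners above go through a Map, so they do not promise any particular order for the result. If you want the duplicates in the order of their first appearance, as in the (2, 4) example, one sketch is to count once and then filter the distinct values; Duplicates and duplicatesInOrder are illustrative names.

```scala
object Duplicates {
  // Keep only values that occur more than once, each reported once,
  // in the order of their first occurrence in the input.
  def duplicatesInOrder[A](xs: Seq[A]): Seq[A] = {
    val counts: Map[A, Int] = xs.groupBy(identity).map { case (k, v) => k -> v.size }
    xs.distinct.filter(x => counts(x) > 1)
  }

  def main(args: Array[String]): Unit =
    println(duplicatesInOrder(Seq(1, 2, 5, 2, 3, 4, 4, 0, 2))) // List(2, 4)
}
```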