MathContexts in BigDecimals - ScalaCheck generator creates BigDecimals which can't be serialized then deserialized. How to use MathContexts correctly?

I discovered an issue in ScalaCheck whereby arbitrary[BigDecimal] can generate BigDecimals that cannot be converted to Strings and then back into BigDecimals. I'm working with the creator to find a fix for it, but I'm unsure how the MathContexts come into play.
The original generator looks like this:
/** Arbitrary BigDecimal */
implicit lazy val arbBigDecimal: Arbitrary[BigDecimal] = {
import java.math.MathContext._
val mcGen = oneOf(UNLIMITED, DECIMAL32, DECIMAL64, DECIMAL128)
val bdGen = for {
x <- arbBigInt.arbitrary
mc <- mcGen
limit <- const(if(mc == UNLIMITED) 0 else math.max(x.abs.toString.length - mc.getPrecision, 0))
scale <- Gen.chooseNum(Int.MinValue + limit , Int.MaxValue)
} yield {
try {
BigDecimal(x, scale, mc)
} catch {
case ae: java.lang.ArithmeticException => BigDecimal(x, scale, UNLIMITED) // Handle the case where scale/precision conflict
}
}
Arbitrary(bdGen)
}
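The problem is that the exponent of a BigDecimal is the negation of its scale (plus the number of digits minus one), so a scale close to Int.MinValue produces an exponent larger than Int.MaxValue (2^31 - 1). toString happily prints such a value, but the String constructor refuses to parse it: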
scala> val orig = BigDecimal(BigInt("-28334198897217871282176"), -2147483640, UNLIMITED)
orig: scala.math.BigDecimal = -2.8334198897217871282176E+2147483662
scala> BigDecimal(orig.toString)
java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:554)
at java.math.BigDecimal.<init>(BigDecimal.java:383)
at java.math.BigDecimal.<init>(BigDecimal.java:806)
at scala.math.BigDecimal$.exact(BigDecimal.scala:125)
at scala.math.BigDecimal$.apply(BigDecimal.scala:283)
... 33 elided
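The arithmetic behind this failure can be checked directly. The following is a minimal sketch using only java.math.BigDecimal; the object name ScaleOverflowDemo and its fields are made up for illustration. It recomputes the printed exponent as -scale + (digits - 1) and confirms that parsing the printed form fails.

```scala
import scala.util.Try

object ScaleOverflowDemo {
  val unscaled: BigInt = BigInt("-28334198897217871282176")
  val digits: Int = unscaled.abs.toString.length // 23 digits
  val scale: Int = -2147483640

  // Exponent shown by toString in scientific notation: -scale + (digits - 1).
  val exponent: Long = -scale.toLong + (digits - 1)

  // The (BigInteger, scale) constructor stores any Int scale without complaint...
  val original = new java.math.BigDecimal(unscaled.bigInteger, scale)

  // ...but parsing the printed form fails because the exponent exceeds Int.MaxValue.
  val roundTrip: Try[java.math.BigDecimal] =
    Try(new java.math.BigDecimal(original.toString))
}
```

Here exponent is 2147483640 + 22 = 2147483662, which matches the E+2147483662 in the REPL output above.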
The core of the fix is to raise the lower bound of the scale by the number of digits in the unscaled value, but the only way I have found to do that uses MathContext.UNLIMITED exclusively. I fear the generator becomes less thorough if we do that:
lazy val genBigDecimal: Gen[BigDecimal] = for {
unscaledVal <- arbitrary[BigInt]
scale <- Gen.chooseNum(Int.MinValue + unscaledVal.abs.toString.length, Int.MaxValue)
} yield BigDecimal(unscaledVal, scale)
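As a sanity check of that lower bound, here is a small sketch that works with java.math.BigDecimal directly and therefore says nothing about MathContext rounding. The names ScaleBoundDemo, minScale and roundTrips are hypothetical. With scale = Int.MinValue + digits the printed exponent lands exactly on Int.MaxValue, and one step lower no longer round-trips.

```scala
import scala.util.Try

object ScaleBoundDemo {
  val unscaled: BigInt = BigInt("-28334198897217871282176")
  val digits: Int = unscaled.abs.toString.length

  // Lower bound proposed above: Int.MinValue + number of digits.
  val minScale: Int = Int.MinValue + digits

  // Printed exponent at that bound: -scale + (digits - 1).
  val exponent: Long = -minScale.toLong + (digits - 1)

  // True when printing and re-parsing yields an equal value (same unscaled value and scale).
  def roundTrips(scale: Int): Boolean = {
    val bd = new java.math.BigDecimal(unscaled.bigInteger, scale)
    Try(new java.math.BigDecimal(bd.toString)).toOption.contains(bd)
  }
}
```

This only covers the unrounded case. A MathContext that rounds the value may change its number of digits, so the same bound is not automatically safe there.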
So, if we want to keep using the other MathContexts, what do we have to do to ensure we use them correctly?

Related

Returning a non-unit value from a scala while loop

private def canProceed: Boolean = {
val startTime = System.currentTimeMillis
val endTime = startTime + (5 * 1000)
while (System.currentTimeMillis < endTime) {
if (isSafe) { // method where my current implementation is just true or false for testing
true
} else {
println("Not safe. Trying again")
}
}
false
}
This just keeps iterating until the deadline: the true inside the conditional is discarded, because a Scala while loop always evaluates to Unit, so the final result is always false. Is there an idiomatic way to do this without using var or return?
Technically speaking, you could write your own tail-recursive function like the following:
def attemptWhile(cond: => Boolean)(check: => Boolean): Boolean = {
@annotation.tailrec
def loop(): Boolean = {
if (cond) check || loop()
else false
}
loop()
}
Which then you could use like:
private def canProceed: Boolean = {
val startTime = System.currentTimeMillis
val endTime = startTime + (5 * 1000)
attemptWhile(System.currentTimeMillis < endTime)(isSafe)
}
However, this still depends on mutable state, so it is not really functional.
At that point, you may wonder whether adding attemptWhile is worth the effort, or whether you should just use return.
BTW, if you want a fully functional solution, this looks like something that could be solved with a library such as fs2, but that may be too off-topic for you.
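As a further sketch that is not part of the answer above, the same retry-until-the-condition-fails shape can be written without var, return or explicit recursion by leaning on a lazy Iterator. The names RetryDemo and attemptWhileIter are hypothetical. Like attemptWhile, it still observes mutable state through its by-name arguments.

```scala
object RetryDemo {
  // Evaluates `cond` before every attempt; stops at the first successful `check`
  // or as soon as `cond` turns false.
  def attemptWhileIter(cond: => Boolean)(check: => Boolean): Boolean =
    Iterator.continually(()).takeWhile(_ => cond).exists(_ => check)
}
```

Used in the question's setting, this would read RetryDemo.attemptWhileIter(System.currentTimeMillis < endTime)(isSafe).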
You are basically asking whether or not you can do anything that's not a side effect inside a while loop and observe it on the outside.
Just like you cannot do anything worthwhile without side-effects in a while loop, you cannot observe anything on the outside of a while loop without side effects. The var and return keywords are directly related to side effects.
(Of course you can observe the heating up of your CPU or the slowdown of your program, but this is usually disregarded as it doesn't pertain to computation directly)
edit
The for ... yield construct is there to address exactly this "problem".
scala> for {
| x <- List(1,2,3)
| y <- List(4,5,6)
| } yield (x,y)
val res0: List[(Int, Int)] = List((1,4), (1,5), (1,6), (2,4), (2,5), (2,6), (3,4), (3,5), (3,6))
You may want to look into effect types (Cats effect, Zio). They deal with exactly the case of turning effects into values. In your case, there are 2 effects:
System.currentTimeMillis: this value is mutable and changes over time.
isSafe: this has to be mutable and changed from the outside.
edit2
Here's an example of how you can iterate a side effect with Cats Effect
import cats.effect.{ExitCode, IO, IOApp}
import cats.implicits.catsSyntaxMonad
import scala.concurrent.duration.DurationInt
object CheckTime extends IOApp {
val checkTime: IO[Long] = IO.delay { System.currentTimeMillis() }
val waitForSomeTime: IO[Long] = for {
start <- checkTime
_ <- IO.delay { println(s"Start time: $start") }
stopTime <- checkTime.iterateWhile(currentTime => currentTime - start < 10.seconds.toMillis)
_ <- IO.delay { println(s"Ended at $stopTime") }
} yield stopTime
override def run(args: List[String]): IO[ExitCode] = waitForSomeTime.as(ExitCode.Success)
}
Output:
Start time: 1608103950673
Ended at 1608103960673

TapeEquilibrium ScalaCheck

I have been trying to code some scalacheck property to verify the Codility TapeEquilibrium problem. For those who do not know the problem, see the following link: https://app.codility.com/programmers/lessons/3-time_complexity/tape_equilibrium/.
I coded the following yet incomplete code.
test("Lesson 3 property"){
val left = Gen.choose(-1000, 1000).sample.get
val right = Gen.choose(-1000, 1000).sample.get
val expectedSum = Math.abs(left - right)
val leftArray = Gen.listOfN(???, left) retryUntil (_.sum == left)
val rightArray = Gen.listOfN(???, right) retryUntil (_.sum == right)
val property = forAll(leftArray, rightArray){ (r: List[Int], l: List[Int]) =>
val array = (r ++ l).toArray
Lesson3.solution3(array) == expectedSum
}
property.check()
}
The idea is as follows. I choose two random numbers (the values left and right) and calculate their absolute difference. Then I generate two arrays of random numbers, one whose sum is "left" and one whose sum is "right". By concatenating these arrays, I should be able to verify this property.
My issue is then generating the leftArray and rightArray. This itself is a complex problem and I would have to code a solution for this. Therefore, writing this property seems over-complicated.
Is there any way to code this? Is coding this property an overkill?
Best.
Regarding "My issue is then generating the leftArray and rightArray":
One way to generate these arrays (or lists) is to provide a generator of non-empty lists whose elements sum to a given number, in other words, something defined by a method like this:
import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll
def listOfSumGen(expectedSum: Int): Gen[List[Int]] = ???
That verifies the property:
forAll(Gen.choose(-1000, 1000)){ sum: Int =>
forAll(listOfSumGen(sum)){ listOfSum: List[Int] =>
(listOfSum.sum == sum) && listOfSum.nonEmpty
}
}
Building such a list constrains only one of its elements, so here is a simple way to generate it:
1. Generate an arbitrary list.
2. Compute the constrained element as expectedSum minus the sum of that list.
3. Insert the constrained element at a random index of the list (any permutation of the list would work just as well).
So we get:
def listOfSumGen(expectedSum: Int): Gen[List[Int]] =
for {
list <- Gen.listOf(Gen.choose(-1000,1000))
constrainedElement = expectedSum - list.sum
index <- Gen.oneOf(0 to list.length)
} yield list.patch(index, List(constrainedElement), 0)
With the above generator, leftArray and rightArray can be defined as follows:
val leftArray = listOfSumGen(left)
val rightArray = listOfSumGen(right)
However, I think the overall approach of the property is incorrect: it builds an array in which one specific split yields expectedSum, but nothing prevents another split of the same array from yielding a smaller difference.
Here is a counter-example run-through:
val left = Gen.choose(-1000, 1000).sample.get // --> 4
val right = Gen.choose(-1000, 1000).sample.get // --> 9
val expectedSum = Math.abs(left - right) // --> |4 - 9| = 5
val leftArray = listOfSumGen(left) // Let's assume one of its sample would provide List(3,1) (whose sum equals 4)
val rightArray = listOfSumGen(right)// Let's assume one of its sample would provide List(2,4,3) (whose sum equals 9)
val property = forAll(leftArray, rightArray){ (l: List[Int], r: List[Int]) =>
// l = List(3,1)
// r = List(2,4,3)
val array = (l ++ r).toArray // --> Array(3,1,2,4,3) which is the array from the given example in the exercise
Lesson3.solution3(array) == expectedSum
// According to the example Lesson3.solution3(array) equals 1 which is different from 5
}
Here is an example of a correct property that essentially applies the definition:
def tapeDifference(index: Int, array: Array[Int]): Int = {
val (left, right) = array.splitAt(index)
Math.abs(left.sum - right.sum)
}
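// The task requires at least two elements and a split point P with 0 < P < N,
// so generate lists of length >= 2 (the upper limit of 100 is arbitrary) and indices from 1 to N - 1.
forAll(Gen.choose(2, 100).flatMap(n => Gen.listOfN(n, Gen.choose(-1000, 1000)))) { list: List[Int] =>
val array = list.toArray
forAll(Gen.choose(1, array.length - 1)) { index =>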
Lesson3.solution3(array) <= tapeDifference(index, array)
}
}
This property definition might coincide with the way the actual solution has been implemented (which is one of the potential pitfalls of ScalaCheck). However, an implementation written this way would be slow and inefficient, so the property is better seen as a way to check an optimized, fast implementation against a slow but correct one (see this presentation).
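To illustrate that last point, here is a minimal sketch of a slow reference implementation next to a linear-time candidate. Both functions and the object name TapeReference are hypothetical, since the source does not show Lesson3.solution3. Both assume an array of at least two elements.

```scala
object TapeReference {
  // Slow but easy to trust: try every split point p with 0 < p < N.
  def bruteForce(a: Array[Int]): Int =
    (1 until a.length).map { p =>
      val (left, right) = a.splitAt(p)
      math.abs(left.sum - right.sum)
    }.min

  // Linear-time candidate: with prefix sum s and total t,
  // the difference at a split is |(t - s) - s| = |t - 2 * s|.
  def fast(a: Array[Int]): Int = {
    val total = a.sum
    // Prefix sums of lengths 1 to N - 1.
    val prefixSums = a.init.scanLeft(0)(_ + _).tail
    prefixSums.map(s => math.abs(total - 2 * s)).min
  }
}
```

A property can then assert that the fast version agrees with bruteForce on every generated array, which checks the exact result rather than only an upper bound.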
Try this with c# :
using System;
using System.Collections.Generic;
using System.Linq;
private static int TapeEquilibrium(int[] A)
{
var sumA = A.Sum();
var size = A.Length;
var take = 0;
var res = new List<int>();
for (int i = 1; i < size; i++)
{
take = take + A[i-1];
var resp = Math.Abs((sumA - take) - take);
res.Add(resp);
if (resp == 0) return resp;
}
return res.Min();
}

How to get scalacheck to work on class with a Seq?

I have a case class that I am trying to test via ScalaCheck. The case class contains other classes.
Here are the classes:
case class Shop(name: String = "", colors: Seq[Color] = Nil)
case class Color(colorName: String = "", shades: Seq[Shade] = Nil)
case class Shade(shadeName: String, value: Int)
I have generators for each one
implicit def shopGen: Gen[Shop] =
for {
name <- Gen.alphaStr.suchThat(_.length > 0)
colors <- Gen.listOf(colorsGen)
} yield Shop(name, colors)
implicit def colorsGen: Gen[Color] =
for {
colorName <- Gen.alphaStr.suchThat(_.length > 0)
shades <- Gen.listOf(shadesGen)
} yield Color(colorName, shades)
implicit def shadesGen: Gen[Shade] =
for {
shadeName <- Gen.alphaStr.suchThat(_.length > 0) //**Note this**
value <- Gen.choose(1, Int.MaxValue)
} yield Shade(shadeName, value)
When I write my test and simply do the below:
property("Shops must encode/decode to/from JSON") {
"test" mustBe "test"
}
I get an error and the test hangs and stops after 51 tries. The error I get is Gave up after 1 successful property evaluation. 51 evaluations were discarded.
If I remove Gen.alphaStr.suchThat(_.length > 0) from shadesGen and just replace it with Gen.alphaStr then it works.
Question
Why does having Gen.alphaStr work for shadesGen, however, Gen.alphaStr.suchThat(_.length > 0) does not?
Also when I run test multiple times (with Gen.alphaStr) some pass while some don't. Why is this?
You probably see this behavior because of the way listOf is implemented. Internally it is based on buildableOf, which in turn is based on buildableOfN, which has the following comment:
... If the given generator fails generating a value, the
complete container generator will also fail.
Your data structure is essentially a list of lists, so even one failed generation causes the whole data structure to be discarded. Most of the failures obviously happen at the bottom level, which is why removing the filter for shadeName helps. To make it work, you should generate fewer invalid strings. You may replace Gen.alphaStr with a custom generator based on nonEmptyListOf, such as:
def nonemptyAlphaStr: Gen[String] = Gen.nonEmptyListOf(Gen.alphaChar).map(_.mkString)
Another simple way to work around this is to use retryUntil instead of suchThat, as in:
implicit def shadesGen: Gen[Shade] =
for {
//shadeName <- Gen.alphaStr.suchThat(_.length > 0) //**Note this**
shadeName <- Gen.alphaStr.retryUntil(_.length > 0)
value <- Gen.choose(1, Int.MaxValue)
} yield Shade(shadeName, value)

Scala how to decrease execution time

I have a method which generates UUIDs; the code is below:
def generate(number : Int): List[String] = {
List.fill(number)(Generators.randomBasedGenerator().generate().toString.replaceAll("-",""))
}
and I call it as below:
for(i <-0 to 100) {
val a = generate(1000000)
println(a)
}
But running the above for loop takes almost 8-9 minutes. Is there any way to minimise the execution time?
Note: I added the for loop here only for illustration; in the real situation the generate method will be called thousands of times by other requests at the same time.
The problem is the List. Filling a List with 1,000,000 generated and processed elements is going to take time (and memory) because every one of those elements has to be materialized.
You can generate an infinite number of processed UUID strings instantly if you don't have to materialize them until they are actually needed.
def genUUID :Stream[String] = Stream.continually {
Generators.randomBasedGenerator().generate().toString.filterNot(_ == '-')
}
val next5 = genUUID.take(5) //only the 1st (head) is materialized
next5.length //now all 5 are materialized
You can use Stream or Iterator for the infinite collection, whichever you find most conducive (or least annoying) to your work flow.
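For the Iterator variant, a minimal sketch could look like the following. To stay self-contained it swaps in java.util.UUID.randomUUID from the standard library in place of the java-uuid-generator library used in the question, so that substitution and the names LazyUuids and uuidStrings are assumptions of this example.

```scala
import java.util.UUID

object LazyUuids {
  // An endless, lazily evaluated source of dash-free UUID strings.
  // Nothing is generated until the iterator is actually consumed.
  def uuidStrings: Iterator[String] =
    Iterator.continually(UUID.randomUUID().toString.filterNot(_ == '-'))

  // Only these five elements are ever materialized.
  val firstFive: List[String] = uuidStrings.take(5).toList
}
```

Unlike a Stream, an Iterator does not retain the elements it has already produced, which may matter when very large numbers of them are consumed.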
Basically, you did not use the fastest implementation. The faster one is what you get when you pass a Random to the factory method: Generators.randomBasedGenerator(new Random(System.currentTimeMillis())). For the measurement I did the following:
Used Array instead of List (Array is faster)
Removed the string replacement, to measure the pure performance of generation
Dependency: "com.fasterxml.uuid" % "java-uuid-generator" % "3.1.5"
Result:
Generators.randomBasedGenerator(). Per iteration: 1579.6 ms
Generators.randomBasedGenerator() with passing Random Per iteration: 59.2 ms
Code:
import java.util.{Random, UUID}
import com.fasterxml.uuid.impl.RandomBasedGenerator
import com.fasterxml.uuid.{Generators, NoArgGenerator}
import org.scalatest.{FunSuiteLike, Matchers}
import scala.concurrent.duration.Deadline
class GeneratorTest extends FunSuiteLike
with Matchers {
val nTimes = 10
// Let's use Array instead of List - Array is faster!
// and use pure UUID generators
def generate(uuidGen: NoArgGenerator, number: Int): Seq[UUID] = {
Array.fill(number)(uuidGen.generate())
}
test("Generators.randomBasedGenerator() without passed Random (secure one)") {
// Slow generator
val uuidGen = Generators.randomBasedGenerator()
// Warm up JVM
benchGeneration(uuidGen, 3)
val startTime = Deadline.now
benchGeneration(uuidGen, nTimes)
val endTime = Deadline.now
val perItermTimeMs = (endTime - startTime).toMillis / nTimes.toDouble
println(s"Generators.randomBasedGenerator(). Per iteration: $perItermTimeMs ms")
}
test("Generators.randomBasedGenerator() with passing Random (not secure)") {
// Fast generator
val uuidGen = Generators.randomBasedGenerator(new Random(System.currentTimeMillis()))
// Warm up JVM
benchGeneration(uuidGen, 3)
val startTime = Deadline.now
benchGeneration(uuidGen, nTimes)
val endTime = Deadline.now
val perItermTimeMs = (endTime - startTime).toMillis / nTimes.toDouble
println(s"Generators.randomBasedGenerator() with passing Random Per iteration: $perItermTimeMs ms")
}
private def benchGeneration(uuidGen: RandomBasedGenerator, nTimes: Int) = {
var r: Long = 0
for (i <- 1 to nTimes) {
val a = generate(uuidGen, 1000000)
r += a.length
}
println(r)
}
}
You could use scala's parallel collections to split the load on multiple cores/threads.
You could also avoid creating a new generator every time:
class Generator {
val gen = Generators.randomBasedGenerator()
def generate(number : Int): List[String] = {
List.fill(number)(gen.generate().toString.replaceAll("-",""))
}
}

Scala Datatype for numeric real range

Is there an idiomatic Scala type that limits a floating point value to a range defined by an upper and a lower bound?
Concretely, I want a float type that is only allowed to have values between 0.0 and 1.0.
More concretely, I am about to write a function that takes an Int and another function that maps this Int to the range between 0.0 and 1.0, in pseudo-Scala:
def foo(x : Int, f : (Int => {0.0,...,1.0})) {
// ....
}
I already searched the boards but found nothing appropriate. Some implicit magic or a custom typedef would also be OK for me.
I wouldn't know how to do it statically, except with dependent types (example), which Scala doesn't have. If you only dealt with constants it should be possible to use macros or a compiler plug-in that performs the necessary checks, but if you have arbitrary float-typed expressions it is very likely that you have to resort to runtime checks.
Here is an approach. Define a class that performs a runtime check to ensure that the float value is in the required range:
abstract class AbstractRangedFloat(lb: Float, ub: Float) {
require (lb <= value && value <= ub, s"Requires $lb <= $value <= $ub to hold")
def value: Float
}
You could use it as follows:
case class NormalisedFloat(val value: Float)
extends AbstractRangedFloat(0.0f, 1.0f)
NormalisedFloat(0.99f)
NormalisedFloat(-0.1f) // Exception
Or as:
case class RangedFloat(val lb: Float, val ub: Float)(val value: Float)
extends AbstractRangedFloat(lb, ub)
val RF = RangedFloat(-0.1f, 0.1f) _
RF(0.0f)
RF(0.2f) // Exception
It would be nice if one could use value classes in order to gain some performance, but the call to require in the constructor (currently) prohibits that.
EDIT: addressing comments by @paradigmatic
Here is an intuitive argument why types depending on natural numbers can be encoded in a type system that does not (fully) support dependent types, but ranged floats probably cannot: The natural numbers are an enumerable set, which makes it possible to encode each element as path-dependent types using Peano numerals. The real numbers, however, are not enumerable any more, and it is thus no longer possible to systematically create types corresponding to each element of the reals.
Now, computer floats and reals are ultimately finite sets, but still way too large to be enumerated reasonably efficiently in a type system. The set of computer natural numbers is of course also very large and thus poses a problem for arithmetic over Peano numerals encoded as types, see the last paragraph of this article. However, I claim that it is often sufficient to work with the first n (for a rather small n) natural numbers, as, for example, evidenced by HLists. Making the corresponding claim for floats is less convincing - would it be better to encode 10,000 floats between 0.0 and 1.0, or rather 10,000 between 0.0 and 100.0?
Here is another approach using an implicit class:
object ImplicitMyFloatClassContainer {
implicit class MyFloat(val f: Float) {
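// Initialize the flag before the first check; otherwise it is still false while the constructor runs
val checksEnabled = true
check(f)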
override def toString: String = {
// The "*" is just to show that this method gets called actually
f.toString() + "*"
}
@inline
def check(f: Float): Unit = {
if (checksEnabled) {
print(s"Checking $f")
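// require throws IllegalArgumentException, which the demo's catch clause expects (assert would throw AssertionError)
require(0.0 <= f && f <= 1.0, "Out of range")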
println(" OK")
}
}
@inline
def add(f2: Float): MyFloat = {
check(f2)
val result = f + f2
check(result)
result
}
@inline
def +(f2: Float): MyFloat = add(f2)
}
}
object MyFloatDemo {
def main(args: Array[String]): Unit = {
import ImplicitMyFloatClassContainer._
println("= Checked =")
val a: MyFloat = 0.3f
val b = a + 0.4f
println(s"Result 1: $b")
val c = 0.3f add 0.5f
println("Result 2: " + c)
println("= Unchecked =")
val x = 0.3f + 0.8f
println(x)
val f = 0.5f
val r = f + 0.3f
println(r)
println("= Check applied =")
try {
println(0.3f add 0.9f)
} catch {
case e: IllegalArgumentException => println("Failed as expected")
}
}
}
The compiler needs a hint to use the implicit class, either by typing the summands explicitly or by choosing a method which is not provided by Scala's Float.
This way the checks are at least centralized, so you can turn them off if performance is an issue. As mhs pointed out, if this class is converted to an implicit value class, the checks must be removed from the constructor.
I have added @inline annotations, but I'm not sure whether this is helpful or necessary with implicit classes.
Finally, I have had no success in unimporting the Scala Float "+" with
import scala.{Float => RealFloat}
import scala.Predef.{float2Float => _}
import scala.Predef.{Float2float => _}
Possibly there is another way to achieve this, in order to push the compiler to use the implicit class.
You can use value classes, as pointed out by mhs:
case class Prob private( val x: Double ) extends AnyVal {
def *( that: Prob ) = Prob( this.x * that.x )
def opposite = Prob( 1-x )
}
object Prob {
def make( x: Double ) =
if( x >=0 && x <= 1 )
Prob(x)
else
throw new RuntimeException( "X must be between 0 and 1" )
}
Instances must be created using the factory method in the companion object, which checks that the value is within the range:
scala> val x = Prob.make(0.5)
x: Prob = Prob(0.5)
scala> val y = Prob.make(1.1)
java.lang.RuntimeException: X must be between 0 and 1
However, operations that can never produce a number outside the range, such as * or opposite, do not require a validity check.
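As a variation on the same idea, here is a sketch in which the factory returns an Option instead of throwing, tied back to the foo function from the question. The names ProbSketch, fromDouble and the body of foo are hypothetical and not part of the answer above.

```scala
object ProbSketch {
  // Value class with a private constructor: instances come only from the companion.
  final class Prob private (val x: Double) extends AnyVal {
    // Both operations stay inside [0, 1], so they skip the range check.
    def *(that: Prob): Prob = new Prob(x * that.x)
    def opposite: Prob = new Prob(1 - x)
  }

  object Prob {
    // Returns None instead of throwing when x is outside [0, 1].
    def fromDouble(x: Double): Option[Prob] =
      if (x >= 0 && x <= 1) Some(new Prob(x)) else None
  }

  // The question's foo: the type of f now documents the range of its result.
  def foo(x: Int, f: Int => Prob): Double = f(x).x
}
```

The check still happens at runtime, but it is confined to fromDouble, and callers have to handle the out-of-range case explicitly.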