Common practice: how to deal with integer overflows? - scala

What's the common practice for dealing with integer overflows like 999999 * 999999 (result > Integer.MAX_VALUE) from an application development team's point of view?
One could just make BigInt mandatory and prohibit the use of Int, but is that a good or bad idea?

If it is extremely important that the integer not overflow, you can define your own overflow-catching operations, e.g.:
def +?+(i: Int, j: Int): Int = {
  // Do the addition at Long precision, then check that the result
  // still fits in an Int before narrowing.
  val ans = i.toLong + j.toLong
  if (ans < Int.MinValue || ans > Int.MaxValue)
    throw new ArithmeticException("Int out of bounds")
  ans.toInt
}
You may be able to use the enrich-your-library pattern to turn this into operators; if the JVM manages to do escape analysis properly, you won't get too much of a penalty for it:
class SafePlusInt(i: Int) {
  def +?+(j: Int): Int = { /* as before, except without the i parameter */ }
}
implicit def int_can_be_safe(i: Int) = new SafePlusInt(i)
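On Scala 2.10 and later, the same enrichment is usually written as an implicit value class, which avoids allocating the wrapper object entirely:
implicit class SafePlusInt(val i: Int) extends AnyVal {
  def +?+(j: Int): Int = {
    val ans = i.toLong + j.toLong
    if (ans < Int.MinValue || ans > Int.MaxValue)
      throw new ArithmeticException("Int out of bounds")
    ans.toInt
  }
}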
For example:
scala> 1000000000 +?+ 1000000000
res0: Int = 2000000000
scala> 2000000000 +?+ 2000000000
java.lang.ArithmeticException: Int out of bounds
at SafePlusInt.$plus$qmark$plus(<console>:12)
...
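On Java 8 and later, the JDK ships equivalent checks, so if a library method rather than a custom operator is acceptable, Math.addExact and friends do the same thing:
// Math.addExact, subtractExact, and multiplyExact (JDK 8+) throw
// java.lang.ArithmeticException on overflow, much like +?+ above:
Math.addExact(1000000000, 1000000000) // 2000000000
// Math.addExact(2000000000, 2000000000) would throw ArithmeticException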
If it is not extremely important, then standard unit testing and code reviews and such should catch the problem in the large majority of cases. Using BigInt is possible, but will slow your arithmetic down by a factor of 100 or so, and won't help you when you have to use an existing method that takes an Int.

By far the most common practice regarding integer overflows is that programmers are expected to know that the issue exists, to watch for cases where it might happen, and to make the appropriate checks or rearrange the math so that overflows can't happen, for example computing a * (b / c) rather than (a * b) / c. If the project uses unit tests, they will include cases that try to force overflows to happen.
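As a small illustration (values chosen purely for demonstration), the two groupings differ exactly when the intermediate product no longer fits in an Int:
val a = 100000
val b = 2000000
val c = 1000

(a * b) / c // wrong: a * b exceeds Int.MaxValue and wraps around
a * (b / c) // 200000000, correct: b / c is computed first and stays small
// (the rearrangement assumes c divides b; otherwise precision is lost)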
I have never worked on or seen code from a team that required more than that, so I'm going to say that's good enough for almost all software.
The one embedded application I've seen that actually, honest-to-spaghetti-monster NEEDED to prevent overflows did it by proving that overflows weren't possible on each line where it looked like they might happen.

If you're using Scala (and based on the tag I'm assuming you are), one very generic solution is to write your library code against the scala.math.Integral type class:
def naturals[A](implicit f: Integral[A]) =
  Stream.iterate(f.one)(f.plus(_, f.one))
You can also use context bounds and Integral.Implicits for nicer syntax:
import scala.math.Integral.Implicits._
def squares[A: Integral] = naturals.map(n => n * n)
Now you can use these methods with either Int or Long or BigInt as needed, since instances of Integral exist for all of them:
scala> squares[Int].take(10).toList
res0: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
scala> squares[Long].take(10).toList
res1: List[Long] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
scala> squares[BigInt].take(10).toList
res2: List[BigInt] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
No need to change the library code: just use Long or BigInt where overflow is a concern and Int otherwise.
You will pay some penalty in terms of performance, but the genericity and the ability to defer the Int-or-BigInt decision may be worth it.

In addition to simple mindfulness, as noted by @mjfgates, there are a couple of practices that I always use when dealing with scaled-decimal (non-floating-point) real-world quantities. This may not be on point for your particular application - apologies in advance if not.
First, if there are multiple units of measure in use, values must always clearly identify what they are. This can be by naming convention, or by using a separate class for each unit of measure. I've always just used names - a suffix on every variable name. In addition to eliminating errors from confusion over the units, it encourages thinking about overflow because the measures are less likely to be thought of as just numbers.
Second, my most frequent source of overflow concern is usually rescaling - converting from one measure to another - when it requires a lot of significant digits. For example, the conversion factor from cm to inches is 0.393700787402. In order to avoid both overflow and loss of significant digits, you need to be careful to multiply and divide in the right order. I haven't done this in a long time, but I believe what you want is something like:
Add to Rational.scala, from The Book (Programming in Scala):
def rescale(i: Int): Int = {
  (i * (numer / denom)) + ((i / denom) * (numer % denom))
}
Then you get as results (shortened from a specs2 test):
val InchesToCm = new Rational(1000000000,393700787)
InchesToCm.rescale(393700787) must_== 1000000000
InchesToCm.rescale(1) must_== 2
This doesn't round, or deal with negative scaling factors.
A production implementation may want to factor out numer/denom and numer % denom.
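For illustration, here is a self-contained sketch of that factoring (a minimal stand-in for the book's Rational class, not the full implementation):
// Writing numer = denom * q + r, we have
//   i * numer / denom == i * q + (i * r) / denom
// and approximating the second term as (i / denom) * r avoids ever
// forming the large intermediate product i * numer.
class Rational(val numer: Int, val denom: Int) {
  private val q = numer / denom // whole part of the scaling factor
  private val r = numer % denom // remainder

  def rescale(i: Int): Int = i * q + (i / denom) * r
}

val InchesToCm = new Rational(1000000000, 393700787)
InchesToCm.rescale(393700787) // 1000000000
InchesToCm.rescale(1)         // 2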

Related

Warn about or avoid integer division (resulting in truncation) in scala

Consider
1 / 2
or
val x: Int = ..
val n: Int = ..
x / n
Both of these evaluate to 0, since integer division results in truncation.
Also (this is my typical use case):
val averageListSize = myLists.map(_.length).sum / myLists.length
This has bitten me a few times when it occurs in the middle of long calculations: the first impulse is to look for logical errors that have been introduced. Only after some period of debugging and head scratching does the true culprit emerge.
Is there any way to expose this behavior more clearly - e.g. a warning or some (unknown-to-me) language setting or construction that would either alert to or avoid this intermittent scenario?
To the best of my knowledge, the Scala compiler does not provide a warning flag that would let you raise a warning here (documentation here).
What you could do, however, if you find the effort worth it, is to use Scalafix and write your own custom rule that detects integer divisions and reports warnings for them.
The following is a short example of a rule that can detect integer division on integer literals:
import scalafix.lint.{Diagnostic, LintSeverity}
import scalafix.patch.Patch
import scalafix.v1.{SemanticDocument, SemanticRule}

import scala.meta.inputs.Position
import scala.meta.{Lit, Term}

class IntDivision extends SemanticRule("IntDivision") {
  override def fix(implicit doc: SemanticDocument): Patch =
    doc.tree.collect({
      // Match an infix "/" whose operands are both Int literals.
      case term @ Term.ApplyInfix(_: Lit.Int, Term.Name("/"), Nil, List(_: Lit.Int)) =>
        Patch.lint(new Diagnostic {
          override final val severity: LintSeverity = LintSeverity.Warning
          override final val message: String = "Integer division"
          override final val position: Position = term.pos
        })
    }).asPatch
}
When run on the following piece of code:
object Main {
  def main(args: Array[String]): Unit = {
    println(1 / 2)
  }
}
Scalafix will produce the following warning:
[warn] /path/to/Main.scala:3:13: warning: [IntDivision] Integer division
[warn] println(1 / 2)
[warn] ^^^^^
If the / op doesn't work for you, make one that does.
implicit class Divider[N](numer: N)(implicit evN: Numeric[N]) {
  def /![D](denom: D)(implicit evD: Numeric[D]): Double =
    evN.toDouble(numer) / evD.toDouble(denom)
}
testing:
1 /! 2 //res0: Double = 0.5
5.2 /! 2 //res1: Double = 2.6
22 /! 1.1 //res2: Double = 20.0
2.2 /! 1.1 //res3: Double = 2.0
Any division operation can result in truncation or rounding. This is most noticeable with Int but can happen with all numeric types (e.g. 1.0/3.0). All data types have a restricted range and accuracy, and so the result of any calculation may be adjusted to fit into the resulting data type.
It is not clear that adding warnings for the specific case of Int division is going to help. It is not possible to catch all such issues, and giving warnings in some cases may lead to a false sense of security. It is also going to cause lots of warnings for perfectly valid code.
The solution is to look carefully at any calculations in a program and be aware of the range and accuracy limitations of each operation. If there is any serious computation involved it is a good idea to get a basic grounding in Numerical Analysis.

Expensive flatMap() operation on streams originating from Stream.emits()

I just encountered an issue with degrading fs2 performance when using a stream of strings to be written to a file via text.utf8Encode. I tried to change my source to use chunked strings to increase performance, but observed performance degradation instead.
As far as I can see, it boils down to the following: invoking flatMap on a stream that originates from Stream.emits() can be very expensive. Time usage seems to grow exponentially with the size of the sequence passed to Stream.emits(). The code snippet below shows an example:
/*
  Test done with Scala 2.11.11 and fs2 version 0.10.0-M7.
*/
val rangeSize = 20000
val integers = (1 to rangeSize).toVector

// The flatMap calls are only there to put load on the streams;
// streamA and streamB should otherwise be equivalent.
val streamA = Stream.emits(integers).flatMap(Stream.emit(_))
val streamB = Stream.range(1, rangeSize + 1).flatMap(Stream.emit(_))

streamA.toVector // takes approx. 25 seconds (!)
streamB.toVector // takes approx. 15 milliseconds
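For reference, a crude timing helper like the following (my own sketch, not part of fs2; absolute numbers will vary by machine) is enough to reproduce the gap:
// Hypothetical helper: time a single evaluation with System.nanoTime.
def time[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  println(s"$label: ${(System.nanoTime() - start) / 1e6} ms")
  result
}

time("emits + flatMap")(streamA.toVector)
time("range + flatMap")(streamB.toVector)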
Is this a bug, or should usage of Stream.emits() for large sequences be avoided?
TLDR: Allocations.
Longer answer:
Interesting question. I ran a JFR profile on both methods separately and looked at the results. The first thing that immediately caught my eye was the number of allocations.
(JFR allocation profile screenshots for Stream.emit and Stream.range omitted.)
We can see that Stream.emit allocates a significant number of Append instances, the concrete implementation of Catenable[A], which is the type used in Stream.emit to fold:
private[fs2] final case class Append[A](left: Catenable[A], right: Catenable[A]) extends Catenable[A]
This comes from how the stream is built up using Catenable[A]'s foldLeft:
foldLeft(empty: Catenable[B])((acc, a) => acc :+ f(a))
where :+ allocates a new Append object for each element. This means we're generating at least 20,000 such Append objects.
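Schematically, folding with :+ builds a chain of Append nodes nested entirely to the left (a minimal model of the structure, not fs2's actual Catenable):
sealed trait Cat[+A]
case object Empty extends Cat[Nothing]
final case class Single[A](a: A) extends Cat[A]
final case class Append[A](left: Cat[A], right: Cat[A]) extends Cat[A]

// Folding 1 to 4 with :+ produces:
//   Append(Append(Append(Single(1), Single(2)), Single(3)), Single(4))
// so every element costs one allocation, and consuming the stream has to
// dig back through the whole chain.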
There is also a hint in the documentation of Stream.range: it produces the range lazily rather than in one single chunk, which matters when we're generating a big range:
/**
 * Lazily produce the range `[start, stopExclusive)`. If you want to produce
 * the sequence in one chunk, instead of lazily, use
 * `emits(start until stopExclusive)`.
 *
 * @example {{{
 * scala> Stream.range(10, 20, 2).toList
 * res0: List[Int] = List(10, 12, 14, 16, 18)
 * }}}
 */
def range(start: Int, stopExclusive: Int, by: Int = 1): Stream[Pure, Int] =
  unfold(start) { i =>
    if ((by > 0 && i < stopExclusive && start < stopExclusive) ||
        (by < 0 && i > stopExclusive && start > stopExclusive))
      Some((i, i + by))
    else None
  }
You can see that there is no additional wrapping here, only the integers that get emitted as part of the range. Stream.emits, on the other hand, creates an Append object for every element in the sequence, with left containing everything accumulated so far and right containing the newly appended value.
Is this a bug? I would say no, but I would definitely open this up as a performance issue to the fs2 library maintainers.

chisel3 arithmetic operations on Doubles

I'm having trouble with arithmetic operations on Doubles in Chisel. The examples I have seen use only the following types: Int, UInt, and SInt.
I saw here that arithmetic operations are described only for SInt and UInt. What about Double?
I tried to declare my output out as Double, but didn't know how, because the output of my computation is a Double.
Is there a way to declare an input and an output of type Double in a Bundle?
Here is my code:
class hashfunc(val k: Int, val n: Int) extends Module {
  val a = k + k
  val io = IO(new Bundle {
    val b   = Input(UInt(k.W))
    val w   = Input(UInt(k.W))
    var out = Output(UInt(a.W))
  })

  val tabHash1 = new Array[Array[Double]](n)
  val x        = new ArrayBuffer[(Double, Data)]
  val tabHash  = new Array[Double](tabHash1.size)

  for (ind <- tabHash1.indices) {
    var sum = 0.0
    for (ind2 <- 0 until x.size) {
      sum += x(ind2) * tabHash1(ind)(ind2)
    }
    tabHash(ind) = (sum + io.b) / io.w
  }
  io.out := tabHash.reduce(_ + _)
}
When I compile the code, I get the following error:
(compiler error screenshot omitted)
Thank you for your kind attention, looking forward to your responses.
Chisel does have a native FixedPoint type, which may be of use. It is in the experimental package:
import chisel3.experimental.FixedPoint
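For example, here is a minimal sketch of a module with fixed-point ports (based on the chisel3.experimental API of that era; the widths, binary points, and module name are arbitrary choices for illustration):
import chisel3._
import chisel3.experimental._

// Hypothetical module multiplying two fixed-point inputs.
// 16.W is the total width, 8.BP the number of fractional (binary point) bits.
class FixedMul extends Module {
  val io = IO(new Bundle {
    val a   = Input(FixedPoint(16.W, 8.BP))
    val b   = Input(FixedPoint(16.W, 8.BP))
    val out = Output(FixedPoint(32.W, 16.BP))
  })
  io.out := io.a * io.b
}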
There is also a project, DspTools, that has simulation support for Doubles. It has some nice features, e.g. it allows modules to be parameterized over the numeric type (Complex, Double, FixedPoint, SInt), so that you can run simulations with Double to validate the desired mathematical behavior and then switch to a synthesizable number format that meets your precision criteria.
DspTools is an ongoing research project, and the team would appreciate feedback from outside users.
Operations on floating point numbers (Double in this case) are not supported directly by any HDL. The reason is that while addition/subtraction/multiplication of fixed point numbers is well defined, there are a lot of design-space trade-offs for floating point hardware, as it is a much more complex piece of hardware.
That is to say, a high-performance floating point unit is a significant piece of hardware in its own right and would be time-shared in any realistic design.

Generic Numeric division

As a general rule, we can take any value of any number type, and divide it by any non-zero value of any number type, and get a reasonable result.
212.7 / 6 // Double = 35.449999999999996
77L / 2.1F // Float = 36.666668
The one exception I've found is that we can't mix a BigInt with a fractional type (Float or Double).
In the realm of generics, however, there's this interesting distinction between Integral and Fractional types.
// can do this
def divideI[I](a: I, b: I)(implicit ev: Integral[I]) = ev.quot(a,b)
// or this
def divideF[F](a: F, b: F)(implicit ev: Fractional[F]) = ev.div(a,b)
// but not this
def divideN[N](a: N, b: N)(implicit ev: Numeric[N]) = ev.???(a,b)
While I am curious as to why this is, the real question is: Is there some kind of workaround available to sidestep this limitation?
The reason is that integer division and floating-point division are two very different operations, so not all Numerics share a common division operation, even though humans might think of both as "division."
The workaround is to create four division operations: Integral/Integral, Integral/Fractional, Fractional/Integral, and Fractional/Fractional. Do the calculation in whatever application-specific way you feel is appropriate. When I did this for a calculator I wrote, I kept the result Integral if possible and cast to Double otherwise.
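For the same-type case, a minimal sketch of such a dispatch (my own helper, not a standard API) looks like this:
// Use div when the evidence is actually Fractional, quot when it is
// Integral, and reject any other Numeric. The type tests are on the
// evidence object itself, so type erasure is not a problem here.
def divide[N](a: N, b: N)(implicit num: Numeric[N]): N = num match {
  case f: Fractional[N] => f.div(a, b)
  case i: Integral[N]   => i.quot(a, b)
  case _                => throw new UnsupportedOperationException("Numeric is neither Fractional nor Integral")
}

divide(212.7, 6.0) // Double = 35.449999999999996
divide(77L, 2L)    // Long = 38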
My understanding is that these traits describe sets closed under defined operations:
Numeric is closed under plus, minus, times, negate,
Fractional adds div (i.e. plus, minus, times, negate, div),
Integral adds quot and rem (i.e. plus, minus, times, negate, quot, rem).
Why do you want to sidestep it?

What's the best way to round a double or float in Scala?

// Purpose: Determine attendance based on ticket price
// Example: attendance(4.90) == 135
def attendance: Double => Int = {
  (ticket_price: Double) => {
    120 + math.ceil(150 * (5.00 - ticket_price)).toInt
  }
}                                //> attendance: => Double => Int

attendance(4.90)                 //> res0: Int = 135
assert(attendance(4.90) == 135)
Basically the assert was blowing up and attendance was returning 134 instead of 135: 5.00 - 4.90 evaluates to 0.09999999999999964 as Doubles, so 150 * that is just under 15, and .toInt truncates it to 14. So I threw math.ceil at it and it worked. But I was just wondering if that's the best/proper/idiomatic way to do it.
For those who wonder where this code came from: attendance code
When working with money, you should not use float/double types. I know these ways:
a) Use integer numbers (i.e. Short, Int, Long etc.) denominated in the smallest sensible unit (e.g. cents, satoshis, ...). This might be enhanced by value classes in Scala.
b) Use precise arithmetic like BigDecimal.
c) Use fixed-point arithmetic with arbitrary precision. (This is basically the same as a).)
Note that you should be aware of integer overflows when working with money.
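As a small illustration of option b) applied to the attendance example above (a sketch; the choice of rounding mode is a policy decision, not something the original code dictates):
import scala.math.BigDecimal.RoundingMode

// 5.00 - 4.90 is exactly 0.10 as BigDecimal, so no ceil trick is needed.
def attendance(ticketPrice: BigDecimal): Int =
  120 + ((BigDecimal("5.00") - ticketPrice) * 150)
    .setScale(0, RoundingMode.HALF_UP)
    .toInt

attendance(BigDecimal("4.90")) // 135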