Warn about or avoid integer division (resulting in truncation) in Scala

Consider
1 / 2
or
val x: Int = ..
val n: Int = ..
x / n
Both of these evaluate to 0, since integer division results in truncation.
Also: (this is my typical use case):
val averageListSize = myLists.map(_.length).sum / myLists.length
This has bitten me a few times when it occurs in the middle of long calculations: the first impulse is to check what logical errors have been introduced; only after some period of debugging and head scratching does the true culprit emerge.
Is there any way to expose this behavior more clearly - e.g. a warning, or some (unknown-to-me) language setting or construct that would either alert me to it or avoid it?

To the best of my knowledge, the Scala compiler does not provide a flag that would raise a warning for this (there is nothing relevant among the documented compiler options).
What you could do, however, if you find the effort worthwhile, is use Scalafix and write your own custom rule that detects integer divisions and reports warnings about them.
The following is a short example of a rule that can detect integer division on integer literals:
import scalafix.lint.{Diagnostic, LintSeverity}
import scalafix.patch.Patch
import scalafix.v1.{SemanticDocument, SemanticRule}

import scala.meta.inputs.Position
import scala.meta.{Lit, Term}

class IntDivision extends SemanticRule("IntDivision") {
  override def fix(implicit doc: SemanticDocument): Patch =
    doc.tree.collect({
      // Match infix `/` applications whose operands are Int literals
      case term @ Term.ApplyInfix((_: Lit.Int, Term.Name("/"), Nil, _: List[Lit.Int])) =>
        Patch.lint(new Diagnostic {
          override final val severity: LintSeverity = LintSeverity.Warning
          override final val message: String = "Integer division"
          override final val position: Position = term.pos
        })
    }).asPatch
}
When run on the following piece of code:
object Main {
  def main(args: Array[String]): Unit = {
    println(1 / 2)
  }
}
Scalafix will produce the following warning:
[warn] /path/to/Main.scala:3:13: warning: [IntDivision] Integer division
[warn]     println(1 / 2)
[warn]             ^^^^^
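To run such a custom rule as part of a build, it would roughly need to be published and then pulled in through the sbt-scalafix plugin; a sketch of the wiring, with placeholder coordinates for wherever the rule is published:

// build.sbt (requires the sbt-scalafix plugin; the coordinates below are placeholders)
ThisBuild / scalafixDependencies += "your.org" %% "int-division-rule" % "0.1.0"

// .scalafix.conf then enables the rule by name:
// rules = [ IntDivision ]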

If the / op doesn't work for you, make one that does.
implicit class Divider[N](numer: N)(implicit evN: Numeric[N]) {
  def /![D](denom: D)(implicit evD: Numeric[D]): Double =
    evN.toDouble(numer) / evD.toDouble(denom)
}
testing:
1 /! 2 //res0: Double = 0.5
5.2 /! 2 //res1: Double = 2.6
22 /! 1.1 //res2: Double = 20.0
2.2 /! 1.1 //res3: Double = 2.0
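A possible refinement (my addition, not part of the answer above): moving the Numeric evidence onto the method lets the wrapper be a value class, which usually avoids allocating a wrapper object per call. Value classes must be defined at the top level or inside an object:

// Variant of Divider as a value class; behavior is the same as above.
implicit class SafeDivider[N](val numer: N) extends AnyVal {
  def /![D](denom: D)(implicit evN: Numeric[N], evD: Numeric[D]): Double =
    evN.toDouble(numer) / evD.toDouble(denom)
}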

Any division operation can result in truncation or rounding. This is most noticeable with Int but can happen with all numeric types (e.g. 1.0/3.0). All data types have a restricted range and accuracy, and so the result of any calculation may be adjusted to fit into the resulting data type.
It is not clear that adding warnings for the specific case of Int division is going to help. It is not possible to catch all such issues, and giving warnings in some cases may lead to a false sense of security. It is also going to cause lots of warnings for perfectly valid code.
The solution is to look carefully at any calculations in a program and be aware of the range and accuracy limitations of each operation. If there is any serious computation involved it is a good idea to get a basic grounding in Numerical Analysis.
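To illustrate the point with a few one-liners (my examples, not from the original answer):

1 / 2               // Int division truncates:  0
1.0 / 3.0           // Double rounds:           0.3333333333333333
(0.1 + 0.2) == 0.3  // false: 0.1 and 0.2 have no exact binary representation
Int.MaxValue + 1    // silent wrap-around:      -2147483648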

Related

What is the difference between generating Range and NumericRange in Scala

I am new to Scala, and I tried to generate some Range objects.
val a = 0 to 10
// val a: scala.collection.immutable.Range.Inclusive = Range 0 to 10
This statement works perfectly fine and generates a range from 0 to 10; the to keyword works without any imports.
But when I try to generate a NumericRange with floating point numbers, I have to import an implicit conversion from the BigDecimal object, as follows, to be able to use the to keyword.
import scala.math.BigDecimal.double2bigDecimal
val f = 0.1 to 10.1 by 0.5
// val f: scala.collection.immutable.NumericRange.Inclusive[scala.math.BigDecimal] = NumericRange 0.1 to 10.1 by 0.5
Can someone explain the reason for this, and the mechanism behind range generation?
Thank you.
The import you are adding provides an "automatic conversion" from Double to BigDecimal, as its name suggests.
It is necessary because NumericRange only works with types T for which an Integral[T] instance exists; unfortunately no such instance exists for Double, but one does exist for BigDecimal.
Bringing the automatic conversion into scope turns the Doubles into BigDecimals, so the NumericRange can be defined.
You could achieve the same range without the import by declaring the numbers directly as BigDecimals:
BigDecimal("0.1") to BigDecimal("10.1") by BigDecimal("0.5")

Expensive flatMap() operation on streams originating from Stream.emits()

I just encountered an issue with degrading fs2 performance when using a stream of strings that is written to a file via text.utf8encode. I tried to change my source to use chunked strings to increase performance, but instead I observed performance degradation.
As far as I can see, it boils down to the following: Invoking flatMap on a stream that originates from Stream.emits() can be very expensive. Time usage seems to be exponential based on the size of the sequence passed to Stream.emits(). The code snippet below shows an example:
/*
 * Test done with Scala 2.11.11 and fs2 version 0.10.0-M7.
 */
val rangeSize = 20000
val integers = (1 to rangeSize).toVector

// Note that the flatMaps are just added to show the extreme load on streamA.
val streamA = Stream.emits(integers).flatMap(Stream.emit(_))
val streamB = Stream.range(1, rangeSize + 1).flatMap(Stream.emit(_))

streamA.toVector // takes approx. 25 seconds (!)
streamB.toVector // takes approx. 15 milliseconds
Is this a bug, or should usage of Stream.emits() for large sequences be avoided?
TLDR: Allocations.
Longer answer:
Interesting question. I ran a JFR profile on both methods separately and looked at the results. The first thing that immediately caught my eye was the number of allocations.
(Screenshots of the JFR allocation profiles for Stream.emit and Stream.range omitted; the Stream.emit profile shows far more allocations.)
We can see that Stream.emit allocates a significant number of Append instances, which are the concrete implementation of Catenable[A], the type used in Stream.emit to fold:
private[fs2] final case class Append[A](left: Catenable[A], right: Catenable[A]) extends Catenable[A]
This comes from the way foldLeft is used to build up the Catenable[A]:
foldLeft(empty: Catenable[B])((acc, a) => acc :+ f(a))
Where :+ allocates a new Append object for each element. This means we're at least generating 20000 such Append objects.
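To make that shape concrete, here is a toy model (my sketch, not fs2's actual Catenable) of the structure such a fold builds:

sealed trait Cat[+A]
case object Empty extends Cat[Nothing]
final case class Single[A](a: A) extends Cat[A]
final case class Append[A](left: Cat[A], right: Cat[A]) extends Cat[A]

// Each `acc :+ a` wraps the whole accumulator in one more Append node,
// producing a deeply left-nested tree:
// Append(Append(Append(Empty, Single(1)), Single(2)), Single(3)) ...
val chain = (1 to 20000).foldLeft(Empty: Cat[Int])((acc, a) => Append(acc, Single(a)))

Traversing that left-nested tree one element at a time is what makes the subsequent flatMap expensive.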
There is also a hint in the documentation of Stream.range: it produces the range lazily, whereas emits produces the whole sequence in a single chunk, which may be bad if it is a big range we're generating:
/**
 * Lazily produce the range `[start, stopExclusive)`. If you want to produce
 * the sequence in one chunk, instead of lazily, use
 * `emits(start until stopExclusive)`.
 *
 * @example {{{
 * scala> Stream.range(10, 20, 2).toList
 * res0: List[Int] = List(10, 12, 14, 16, 18)
 * }}}
 */
def range(start: Int, stopExclusive: Int, by: Int = 1): Stream[Pure, Int] =
  unfold(start) { i =>
    if ((by > 0 && i < stopExclusive && start < stopExclusive) ||
        (by < 0 && i > stopExclusive && start > stopExclusive))
      Some((i, i + by))
    else None
  }
You can see that there is no additional wrapping here, only the integers that get emitted as part of the range. On the other hand, Stream.emits creates an Append object for every element in the sequence, with left containing the accumulated prefix of the stream and right containing the current value.
Is this a bug? I would say no, but I would definitely open a performance issue with the fs2 library maintainers.

chisel3 arithmetic operations on Doubles

I am having problems with arithmetic operations on Doubles in Chisel. The examples I have seen use only the following types: Int, UInt, SInt.
I saw here that arithmetic operations were described only for SInt and UInt. What about Double?
I tried to declare my output out as Double, but did not know how; the output of my computation is a Double.
Is there a way to declare an input and an output of type Double in a Bundle?
Here is my code:
class hashfunc(val k: Int, val n: Int) extends Module {
  val a = k + k
  val io = IO(new Bundle {
    val b   = Input(UInt(k.W))
    val w   = Input(UInt(k.W))
    var out = Output(UInt(a.W))
  })
  val tabHash1 = new Array[Array[Double]](n)
  val x = new ArrayBuffer[(Double, Data)]
  val tabHash = new Array[Double](tabHash1.size)
  for (ind <- tabHash1.indices) {
    var sum = 0.0
    for (ind2 <- 0 until x.size) {
      sum += (x(ind2) * tabHash1(ind)(ind2))
    }
    tabHash(ind) = ((sum + io.b) / io.w)
  }
  io.out := tabHash.reduce(_ + _)
}
When I compile the code, I get an error (the error output is not reproduced here).
Thank you for your kind attention, looking forward to your responses.
Chisel does have a native FixedPoint type, which may be of use. It is in the experimental package:
import chisel3.experimental.FixedPoint
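For example, a minimal sketch (my illustration; the module name and widths are not from the question, and this assumes a Chisel 3 version in which FixedPoint still lives in chisel3.experimental):

import chisel3._
import chisel3.experimental._

// 16-bit fixed-point inputs with 8 fractional bits; the full product
// needs 32 bits with 16 fractional bits.
class FixedMultiply extends Module {
  val io = IO(new Bundle {
    val a   = Input(FixedPoint(16.W, 8.BP))
    val b   = Input(FixedPoint(16.W, 8.BP))
    val out = Output(FixedPoint(32.W, 16.BP))
  })
  io.out := io.a * io.b
}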
There is also a project, DspTools, that has simulation support for Doubles. It has some nice features, e.g. it allows modules to be parameterized on the numeric type (Complex, Double, FixedPoint, SInt), so that you can run simulations on Double to validate the desired mathematical behavior and then switch to a synthesizable number format that meets your precision criteria.
DspTools is an ongoing research project, and the team would appreciate feedback from outside users.
Operations on floating point numbers (Double in this case) are not supported directly by any HDL. The reason for this is that while addition/subtraction/multiplication of fixed point numbers is well defined there are a lot of design space trade-offs for floating point hardware as it is a much more complex piece of hardware.
That is to say, a high-performance floating point unit is a significant piece of hardware in its own right and would be time-shared in any realistic design.

Chisel compiler is very slow

I am working on a matrix-summation kind of design. The compiler takes 4+ hours to generate 1+ million lines of code, where every line is an "assign ...". I don't know if this is inefficiency in the compiler or if my coding style is bad. If someone could suggest some alternatives, that would be great!
Here is the description of the code:
The input is ANDed with a random matrix, element by element, and summed up using .reduce, so the result matrix should be a 140x6 Vec; concatenating them together gives me an 840-bit output.
(rndvec is supposed to be a 140x840x6-bit random matrix. Since I don't know how to generate random values, I started with a fixed 140x6 vector to represent one row and feed it with the input over and over again.)
The following is my code:
import Chisel._
import scala.collection.mutable.HashMap
import util.Random

class LBio(n: Int) extends Bundle {
  var myinput  = UInt(INPUT, 840)
  var myoutput = UInt(OUTPUT, 840)
}

class Lbi(q: Int, n: Int, m: Int) extends Module {
  def mask(orig: Vec[UInt], maska: UInt, mi: Int) = {
    val result = Vec.fill(840){ UInt(width = 6) }
    for (i <- 0 until 840) {
      result(i) := orig(i) & Fill(6, maska(i)) // every bit of input ANDed with random vector
    }
    result
  }

  val io = new LBio(840)
  val rndvec = Vec.fill(840){ UInt("h13", 6) } // random vector; for now just replication of 0x13
  val resultvec = Vec.fill(140){ UInt(width = 6) }
  for (i <- 0 until 140) {
    resultvec(i) := mask(rndvec, io.myinput, m).reduce(_ + _) // sum the entire row of 6-bit elements with reduce
  }
  io.myoutput := resultvec.toBits
}
The terminal report:
started inference
finished inference (4)
start width checking
finished width checking
started flattenning
finished flattening (941783)
resolving nodes to the components
finished resolving
started transforms
finished transforms
checking for combinational loops
NO COMBINATIONAL LOOP FOUND
COMPILING class TutorialExamples.Lbi 0 CHILDREN (0,0)
[success] Total time: 33453 s, completed Oct 16, 2013 10:32:10 PM
There's nothing obviously wrong with your Chisel code, but I should point out that if rndvec is 140x840x6 bits, that's ~689 kilobits of state! And your reduce operation is on ~5 kilobits of state.
Chisel uses "assign" statements because your code is entirely combinational and Chisel produces a very structural form of Verilog.
I suspect the part that is killing the compile time (aside from the huge amount of state) is that you are generating and manipulating 140 Vecs with the mask() function.
I tried my hand at rewriting your code and got it down from 941,783 nodes to 202,723 (takes about 10-15 minutes to compile, but generates 11MB of Verilog code). I'm pretty sure this does what your code was doing:
class Hello(q: Int, dim_n: Int) extends Module {
  val io = new LBio(dim_n)
  val rndvec = Vec.fill(dim_n){ UInt("h13", 6) }
  val resultvec = Vec.fill(dim_n/6){ UInt(width = 6) }

  // lift this work outside of the for loop
  val padded_input = Vec.fill(dim_n){ UInt(width = 6) }
  for (i <- 0 until dim_n) {
    padded_input(i) := Fill(6, io.myinput(i))
  }

  for (i <- 0 until dim_n/6) {
    val result = Bits(width = dim_n*6)
    result := rndvec.toBits & padded_input.toBits
    var sum = UInt(0) // advanced Chisel - be careful with the use of var!
    for (j <- 0 until dim_n by 6) {
      sum = sum + result(j+5, j) // next 6-bit element (bit ranges are inclusive)
    }
    resultvec(i) := sum
  }

  io.myoutput := resultvec.toBits
}
What I did was avoid doing the same work over and over again - like padding out myinput inside of the for loop's mask() function. I also kept everything in Bits() instead of Vecs. Sadly it means I lose the awesome .reduce() function.
I think maybe the answer is "be cognizant of how much state you're creating" and "Vecs are awesome, but use carefully".
Do you have a Verilog version that's short and concise? It'd be interesting to see if there are areas where Chisel is losing out efficiency wise.

Common practice how to deal with Integer overflows?

What's the common practice to deal with Integer overflows like 999999*999999 (result > Integer.MAX_VALUE) from an Application Development Team point of view?
One could just make BigInt mandatory and prohibit the use of Integer, but is that a good/bad idea?
If it is extremely important that the integer not overflow, you can define your own overflow-catching operations, e.g.:
def +?+(i: Int, j: Int) = {
  val ans = i.toLong + j.toLong
  if (ans < Int.MinValue || ans > Int.MaxValue) {
    throw new ArithmeticException("Int out of bounds")
  }
  ans.toInt
}
You may be able to use the enrich-your-library pattern to turn this into operators; if the JVM manages to do escape analysis properly, you won't get too much of a penalty for it:
class SafePlusInt(i: Int) {
  def +?+(j: Int) = { /* as before, except without the i param */ }
}
implicit def int_can_be_safe(i: Int) = new SafePlusInt(i)
For example:
scala> 1000000000 +?+ 1000000000
res0: Int = 2000000000
scala> 2000000000 +?+ 2000000000
java.lang.ArithmeticException: Int out of bounds
at SafePlusInt.$plus$qmark$plus(<console>:12)
...
If it is not extremely important, then standard unit testing and code reviews and such should catch the problem in the large majority of cases. Using BigInt is possible, but will slow your arithmetic down by a factor of 100 or so, and won't help you when you have to use an existing method that takes an Int.
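As an aside (my addition, not part of this answer): on Java 8 and later, java.lang.Math provides exact-arithmetic helpers that throw on overflow and can stand in for the hand-rolled version above:

Math.addExact(1000000000, 1000000000)  // 2000000000
Math.multiplyExact(999999, 999999)     // throws ArithmeticException: integer overflow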
By far the most common practice regarding integer overflows is that programmers are expected to know that the issue exists, to watch for cases where it might happen, and to make the appropriate checks or rearrange the math so that overflows won't happen - things like doing a * (b / c) rather than (a * b) / c. If the project uses unit tests, they will include cases that try to force overflows to happen.
I have never worked on or seen code from a team that required more than that, so I'm going to say that's good enough for almost all software.
The one embedded application I've seen that actually, honest-to-spaghetti-monster NEEDED to prevent overflows did so by proving that overflows weren't possible in each line where they looked like they might happen.
If you're using Scala (and based on the tag I'm assuming you are), one very generic solution is to write your library code against the scala.math.Integral type class:
def naturals[A](implicit f: Integral[A]) =
  Stream.iterate(f.one)(f.plus(_, f.one))
You can also use context bounds and Integral.Implicits for nicer syntax:
import scala.math.Integral.Implicits._
def squares[A: Integral] = naturals.map(n => n * n)
Now you can use these methods with either Int or Long or BigInt as needed, since instances of Integral exist for all of them:
scala> squares[Int].take(10).toList
res0: List[Int] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

scala> squares[Long].take(10).toList
res1: List[Long] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

scala> squares[BigInt].take(10).toList
res2: List[BigInt] = List(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
No need to change the library code: just use Long or BigInt where overflow is a concern and Int otherwise.
You will pay some penalty in terms of performance, but the genericity and the ability to defer the Int-or-BigInt decision may be worth it.
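To see where that genericity pays off (my example, not from the original answer): with Int the squares silently wrap once they pass Int.MaxValue, while BigInt stays exact:

squares[Int].drop(46340).head    // -2147479015 (46341 * 46341 wrapped around)
squares[BigInt].drop(46340).head // 2147488281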
In addition to simple mindfulness, as noted by @mjfgates, there are a couple of practices that I always use when dealing with scaled-decimal (non-floating-point) real-world quantities. This may not be on point for your particular application - apologies in advance if not.
First, if there are multiple units of measure in use, values must always clearly identify what they are. This can be by naming convention, or by using a separate class for each unit of measure. I've always just used names - a suffix on every variable name. In addition to eliminating errors from confusion over the units, it encourages thinking about overflow because the measures are less likely to be thought of as just numbers.
Second, my most frequent source of overflow concern is usually rescaling - converting from one measure to another - when it requires a lot of significant digits. For example, the conversion factor from cm to inches is 0.393700787402. In order to avoid both overflow and loss of significant digits, you need to be careful to multiply and divide in the right order. I haven't done this in a long time, but I believe what you want is something like:
Add to Rational.scala, from The Book:
def rescale(i: Int): Int = {
  (i * (numer / denom)) + (i / denom * (numer % denom))
}
Then you get as results (shortened from a specs2 test):
val InchesToCm = new Rational(1000000000,393700787)
InchesToCm.rescale(393700787) must_== 1000000000
InchesToCm.rescale(1) must_== 2
This doesn't round, or deal with negative scaling factors.
A production implementation may want to factor out numer/denom and numer % denom.
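A sketch of that refactoring (my addition; it assumes the numer and denom fields of the Rational class from the book, without its gcd normalization):

class Rational(val numer: Int, val denom: Int) {
  // Hoisted out of rescale: one division and one remainder, computed once.
  private val quot = numer / denom
  private val rem  = numer % denom

  def rescale(i: Int): Int =
    i * quot + (i / denom) * rem
}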