What is the difference between generating Range and NumericRange in Scala

I am new to Scala, and I tried to generate some Range objects.
val a = 0 to 10
// val a: scala.collection.immutable.Range.Inclusive = Range 0 to 10
This statement works perfectly fine and generates a range from 0 to 10, and the to keyword works without any imports.
But when I try to generate a NumericRange with floating-point numbers, I have to import a conversion from the BigDecimal object, as follows, to be able to use the to keyword.
import scala.math.BigDecimal.double2bigDecimal
val f = 0.1 to 10.1 by 0.5
// val f: scala.collection.immutable.NumericRange.Inclusive[scala.math.BigDecimal] = NumericRange 0.1 to 10.1 by 0.5
Can someone explain the reason for this and the mechanism behind range generation?
Thank you.

The import you are adding brings an "automatic conversion" from Double to BigDecimal into scope, as the name suggests.
It's necessary because NumericRange only works with types T for which an Integral[T] exists; unfortunately one doesn't exist for Double, but one does exist for BigDecimal.
Bringing the automatic conversion into scope converts the Doubles to BigDecimals, so that the NumericRange can be defined.
You could achieve the same range without the import by declaring the numbers directly as BigDecimals:
BigDecimal("0.1") to BigDecimal("10.1") by BigDecimal("0.5")

Related

How to convert a type Any List to a type Double (Scala)

I am new to Scala and I would like to understand some basic stuff.
First of all, I need to calculate the average of a certain column of a DataFrame and use the result as a double type variable.
After some Internet research I was able to calculate the average and at the same time pass it into a List of type Any by using the following command:
val avgX_List = mainDataFrame.groupBy().agg(mean("_c1")).collect().map(_(0)).toList
where "_c1" is the second column of my dataframe. This line of code returns a List with type List[Any].
To pass the result into a variable I used the following command:
var avgX = avgX_List(0)
hoping that the var avgX would automatically be of type Double, but that obviously didn't happen.
So now let the questions begin:
What does map(_(0)) do? I know the basic definition of the map() transformation, but I can't find an explanation for this exact argument.
I know that by using the .toList method at the end of the command my result will be a List of type Any. Is there a way to change this into a List containing Double elements, or to convert this one?
Do you think that it would be much more appropriate to pass the column of my Dataframe into a List[Double] and then calculate the average of its elements?
Is the solution I showed above correct in any way, given my problem? I know that "it is working" is different from "correct solution".
Summing up, I need to calculate the average of a certain column of a Dataframe and have the result as a double type variable.
Note that: I am Greek and I find it hard sometimes to understand some English coding "slang".
map(_(0)) is a shortcut for map( (r: Row) => r(0) ), which is in turn a shortcut for map( (r: Row) => r.apply(0) ). The apply method returns Any, and so you are losing the right type. Try using map(_.getAs[Double](0)) or map(_.getDouble(0)) instead.
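For example, staying with the question's own pipeline (a sketch; mainDataFrame and "_c1" are the question's DataFrame and column):
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.mean

val mainDataFrame: DataFrame = ???   // the question's DataFrame

// apply(0) returns Any, so the element type is lost:
val asAny: List[Any] =
  mainDataFrame.groupBy().agg(mean("_c1")).collect().map(_(0)).toList

// getDouble(0) (or getAs[Double](0)) keeps the type:
val asDouble: List[Double] =
  mainDataFrame.groupBy().agg(mean("_c1")).collect().map(_.getDouble(0)).toList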
Collecting all entries of the column and then computing the average would be highly counterproductive, because you'd have to send huge amounts of data to the master node, and then do all the calculations on this single central node. That would be the exact opposite of what Spark is good for.
You also don't need collect(...).toList, because you can access the 0-th entry directly (it doesn't matter whether you get it from an Array or from a List). Since you are collapsing everything into a single Row anyway, you could get rid of the map step entirely by reordering the methods a little bit:
val avgX = mainDataFrame.groupBy().agg(mean("_c1")).collect()(0).getDouble(0)
It can be written even shorter using the first() method:
val avgX = mainDataFrame.groupBy().agg(mean("_c1")).first().getDouble(0)
The Any data type in Scala can't be directly converted to Double.
Use toString and then toDouble on the final captured result, e.g.:
scala> x
res22: Any = 1.0
scala> x.toString.toDouble
res23: Double = 1.0
Note: instead of using map(...).toList, use (0)(0) directly to get the final value from your result set.
Test sample (Scala):
val wa = Array("one","two","two")
val wrdd = sc.parallelize(wa,3).map(x=>(x,1))
val wdf = wrdd.toDF("col1","col2")
val x = wdf.groupBy().agg(mean("col2")).collect()(0)(0).toString.toDouble
Output:
scala> val x = wdf.groupBy().agg(mean("col2")).collect()(0)(0).toString.toDouble
x: Double = 1.0

Warn about or avoid integer division (resulting in truncation) in scala

Consider
1 / 2
or
val x: Int = ..
val n: Int = ..
x / n
Both of these equal 0, since integer division results in truncation.
Also (this is my typical use case):
val averageListSize = myLists.map(_.length).sum / myLists.length
This has bitten me a few times when it occurs in the middle of long calculations: the first impulse is to check what logical errors have been introduced. Only after some period of debugging and head scratching does the true culprit emerge.
Is there any way to expose this behavior more clearly - e.g. a warning or some (unknown-to-me) language setting or construction that would either alert to or avoid this intermittent scenario?
To the best of my knowledge, the Scala compiler does not provide a flag that would raise a warning for this (documentation here).
What you could do, however, if you find the effort worth it, is using Scalafix and write your own custom rule to detect integer divisions and report warnings about it.
The following is a short example of a rule that can detect integer division on integer literals:
import scalafix.lint.{Diagnostic, LintSeverity}
import scalafix.patch.Patch
import scalafix.v1.{SemanticDocument, SemanticRule}

import scala.meta.inputs.Position
import scala.meta.{Lit, Term}

class IntDivision extends SemanticRule("IntDivision") {
  override def fix(implicit doc: SemanticDocument): Patch =
    doc.tree.collect({
      case term @ Term.ApplyInfix((_: Lit.Int, Term.Name("/"), Nil, _: List[Lit.Int])) =>
        Patch.lint(new Diagnostic {
          override final val severity: LintSeverity = LintSeverity.Warning
          override final val message: String = "Integer division"
          override final val position: Position = term.pos
        })
    }).asPatch
}
When run on the following piece of code:
object Main {
  def main(args: Array[String]): Unit = {
    println(1 / 2)
  }
}
Scalafix will produce the following warning:
[warn] /path/to/Main.scala:3:13: warning: [IntDivision] Integer division
[warn] println(1 / 2)
[warn] ^^^^^
If the / op doesn't work for you, make one that does.
implicit class Divider[N](numer: N)(implicit evN: Numeric[N]) {
  def /![D](denom: D)(implicit evD: Numeric[D]): Double =
    evN.toDouble(numer) / evD.toDouble(denom)
}
testing:
1 /! 2 //res0: Double = 0.5
5.2 /! 2 //res1: Double = 2.6
22 /! 1.1 //res2: Double = 20.0
2.2 /! 1.1 //res3: Double = 2.0
Any division operation can result in truncation or rounding. This is most noticeable with Int but can happen with all numeric types (e.g. 1.0/3.0). All data types have a restricted range and accuracy, and so the result of any calculation may be adjusted to fit into the resulting data type.
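A quick illustration of both effects (a sketch; the printed digits are simply what a Scala REPL shows):
1 / 2       // Int = 0, integer division truncates toward zero
-7 / 2      // Int = -3, truncation rather than flooring (floor would give -4)
1.0 / 3.0   // Double = 0.3333333333333333, rounded to the nearest representable Double
1 / 2.0     // Double = 0.5, promoting either operand avoids the truncation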
It is not clear that adding warnings for the specific case of Int division is going to help. It is not possible to catch all such issues, and giving warnings in some cases may lead to a false sense of security. It is also going to cause lots of warnings for perfectly valid code.
The solution is to look carefully at any calculations in a program and be aware of the range and accuracy limitations of each operation. If there is any serious computation involved it is a good idea to get a basic grounding in Numerical Analysis.

chisel3 arithmetic operations on Doubles

I am having problems with arithmetic operations on Doubles in Chisel. The examples I have seen use only the following types: Int, UInt, SInt.
I saw here that arithmetic operations were described only for SInt and UInt. What about Double?
I tried to declare my output out as Double, but I didn't know how, because the output of my code is a Double.
Is there a way to declare an input and an output of type Double in a Bundle?
Here is my code:
class hashfunc(val k: Int, val n: Int) extends Module {
  val a = k + k
  val io = IO(new Bundle {
    val b   = Input(UInt(k.W))
    val w   = Input(UInt(k.W))
    var out = Output(UInt(a.W))
  })
  val tabHash1 = new Array[Array[Double]](n)
  val x = new ArrayBuffer[(Double, Data)]
  val tabHash = new Array[Double](tabHash1.size)
  for (ind <- tabHash1.indices) {
    var sum = 0.0
    for (ind2 <- 0 until x.size) {
      sum += (x(ind2) * tabHash1(ind)(ind2))
    }
    tabHash(ind) = ((sum + io.b) / io.w)
  }
  io.out := tabHash.reduce(_ + _)
}
When I compile the code, I get the following error:
code error
Thank you for your kind attention, looking forward to your responses.
Chisel does have a native FixedPoint type which may be of use. It is in the experimental package:
import chisel3.experimental.FixedPoint
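For instance, a small module with FixedPoint ports might look like this (a sketch, assuming a Chisel 3 version where FixedPoint is still in chisel3.experimental; the widths and binary points are arbitrary):
import chisel3._
import chisel3.experimental._   // FixedPoint and the .BP binary-point syntax

class FixedMul extends Module {
  val io = IO(new Bundle {
    val a   = Input(FixedPoint(16.W, 8.BP))    // 16 bits total, 8 fractional bits
    val b   = Input(FixedPoint(16.W, 8.BP))
    val out = Output(FixedPoint(32.W, 16.BP))  // a * b grows to width 32, binary point 16
  })
  io.out := io.a * io.b
}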
There is also a project, DspTools, that has simulation support for Doubles. It has some nice features, e.g. it allows modules to be parameterized on the numeric type (Complex, Double, FixedPoint, SInt), so that you can run simulations on Double to validate the desired mathematical behavior and then switch to a synthesizable number format that meets your precision criteria.
DspTools is an ongoing research project and the team would appreciate feedback from outside users.
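As a rough sketch of that parameterization idea (assuming the dsptools library and its Ring type class; the module and signal names here are illustrative):
import chisel3._
import dsptools.numbers._   // Ring type class and its operator syntax

// Works for any hardware number type T with a Ring instance,
// e.g. FixedPoint, SInt, DspReal or DspComplex.
class Mac[T <: Data : Ring](genIn: T, genOut: T) extends Module {
  val io = IO(new Bundle {
    val a   = Input(genIn.cloneType)
    val b   = Input(genIn.cloneType)
    val c   = Input(genIn.cloneType)
    val out = Output(genOut.cloneType)
  })
  io.out := io.a * io.b + io.c
}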
Operations on floating point numbers (Double in this case) are not supported directly by any HDL. The reason is that, while addition/subtraction/multiplication of fixed point numbers is well defined, there are a lot of design-space trade-offs for floating point hardware, as it is a much more complex piece of hardware.
That is to say, a high-performance floating point unit is a significant piece of hardware in its own right and would be time-shared in any realistic design.

Spark case class - decimal type encoder error "Cannot up cast from decimal"

I'm extracting data from MySQL/MariaDB, and during creation of a Dataset an error occurs with the data types:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
Cannot up cast AMOUNT from decimal(30,6) to decimal(38,18) as it may truncate
The type path of the target object is:
- field (class: "org.apache.spark.sql.types.Decimal", name: "AMOUNT")
- root class: "com.misp.spark.Deal"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
The case class is defined like this:
case class Deal(
  AMOUNT: Decimal
)
Does anyone know how to fix it without touching the database?
That error says that Apache Spark can't automatically convert BigDecimal(30,6) from the database to the BigDecimal(38,18) wanted in the Dataset (I don't know why it needs the fixed parameters 38,18, and it is even stranger that Spark can't automatically convert a type with low precision to one with higher precision).
A bug has been reported: https://issues.apache.org/jira/browse/SPARK-20162 (maybe it was you). Anyway, I found a good workaround: read the data by casting the columns to BigDecimal(38,18) in the DataFrame and then cast the DataFrame to the Dataset.
//first read data to dataframe with any way suitable for you
var df: DataFrame = ???
val dfSchema = df.schema

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DecimalType

dfSchema.foreach { field =>
  field.dataType match {
    case t: DecimalType if t != DecimalType(38, 18) =>
      df = df.withColumn(field.name, col(field.name).cast(DecimalType(38, 18)))
    case _ => // leave other columns (and decimals that are already (38,18)) unchanged
  }
}

df.as[YourCaseClassWithBigDecimal]
It should solve problems with reading (but not with writing I guess)
As was previously stated, since your DB uses DecimalType(30,6), you have 30 slots total and 6 slots past the decimal point, which leaves 30-6=24 for the area in front of the decimal point. I like to call it a (24 left, 6 right) big-decimal. This of course does not fit into a (20 left, 18 right) (i.e. DecimalType(38,18)), since the latter does not have enough slots on the left (20 vs 24 needed). We only have 20 left-slots in a DecimalType(38,18), but we need 24 left-slots to accommodate your DecimalType(30,6).
What we can do here is down-cast the (24 left, 6 right) into a (20 left, 6 right) (i.e. DecimalType(26,6)) so that when it's being auto-casted to a (20 left, 18 right) (i.e. DecimalType(38,18)) both sides will fit. Your DecimalType(26,6) will have 20 left-slots, allowing it to fit inside a DecimalType(38,18), and of course the 6 right slots will fit into the 18.
The way you do that is, before converting anything to a Dataset, to run the following operation on the DataFrame:
val downCastableData =
  originalData.withColumn("amount", $"amount".cast(DecimalType(26,6)))
Then converting to Dataset should work.
(Actually, you can cast to anything that's (20 left, 6 right) or less, e.g. (19 left, 5 right), etc.)
While I don't have a solution, here is my understanding of what is going on:
By default, Spark will infer the schema of the Decimal type (or BigDecimal) in a case class to be DecimalType(38, 18) (see org.apache.spark.sql.types.DecimalType.SYSTEM_DEFAULT). The 38 means the Decimal can hold 38 digits total (for both left and right of the decimal point), while the 18 means 18 of those 38 digits are reserved for the right of the decimal point. That means a Decimal(38, 18) may have 20 digits for the left of the decimal point. Your MySQL schema is decimal(30, 6), which means it may contain values with 24 digits (30 - 6) to the left of the decimal point and 6 digits to the right of the decimal point. Since 24 digits is greater than 20 digits, there could be values that are truncated when converting from your MySQL schema to that Decimal type.
Unfortunately, inferring the schema from a Scala case class is considered a convenience by the Spark developers, and they have chosen not to support allowing the programmer to specify the precision and scale for Decimal or BigDecimal types within the case class (see https://issues.apache.org/jira/browse/SPARK-18484).
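For illustration, a minimal sketch of that inference (assuming a local SparkSession; the Deal case class mirrors the one in the question, using BigDecimal instead of Decimal):
import org.apache.spark.sql.SparkSession

case class Deal(AMOUNT: BigDecimal)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

Seq(Deal(BigDecimal("1.5"))).toDS().printSchema()
// root
//  |-- AMOUNT: decimal(38,18) (nullable = true)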
Building on @user2737635's answer, you can use a foldLeft rather than foreach to avoid defining your dataset as a var and redefining it:
//first read data to dataframe with any way suitable for you
val df: DataFrame = ???
val dfSchema = df.schema

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DecimalType

dfSchema.foldLeft(df) { (dataframe, field) =>
  field.dataType match {
    case t: DecimalType if t != DecimalType(38, 18) =>
      dataframe.withColumn(field.name, col(field.name).cast(DecimalType(38, 18)))
    case _ => dataframe
  }
}.as[YourCaseClassWithBigDecimal]
We are working on a workaround by defining our own Encoder, which we use at the .as call site. We generate the Encoder from the StructType, which knows the correct precision and scale (see the link below for code).
https://issues.apache.org/jira/browse/SPARK-27339
According to the PySpark documentation, DecimalType(38, 18) is the default:
When create a DecimalType, the default precision and scale is (10, 0). When infer schema from decimal.Decimal objects, it will be DecimalType(38, 18).

How to round up a number if it's not an integer?

I want to calculate a simple number, and if the number is not an integer I want to round it up.
For instance, if after a calculation I get 1.2, I want to change it to 2. If the number is 3.7, I want to change it to 4 and so on.
You can use math.ceil to round a Double up and toInt to convert the Double to an Int.
def roundUp(d: Double) = math.ceil(d).toInt
roundUp(1.2) // Int = 2
roundUp(3.7) // Int = 4
roundUp(5) // Int = 5
The ceil function is also directly accessible on the Double:
3.7.ceil.toInt // 4
Having first imported math
import scala.math._ (the final dot & underscore are crucial for what comes next)
you can simply write
ceil(1.2)
floor(3.7)
plus a bunch of other useful math functions like
exp(1)
pow(2,2)
sqrt(pow(2,2))