This is my DataFrame:
df.groupBy($"label").count.show
+-----+---------+
|label| count|
+-----+---------+
| 0.0|400000000|
| 1.0| 10000000|
+-----+---------+
I am trying to subsample the records with label == 0.0 with the following:
val r = scala.util.Random
val df2 = df.filter($"label" === 1.0 || r.nextDouble > 0.5) // keep 50% of 0.0
My output looks like this:
df2.groupBy($"label").count.show
+-----+--------+
|label| count|
+-----+--------+
| 1.0|10000000|
+-----+--------+
r.nextDouble is evaluated only once, on the driver, when the Column expression is built, so it contributes a constant and the actual evaluation is quite different from what you mean. Depending on the sampled value the expression is either
scala> r.setSeed(0)
scala> $"label" === 1.0 || r.nextDouble > 0.5
res0: org.apache.spark.sql.Column = ((label = 1.0) OR true)
or
scala> r.setSeed(4096)
scala> $"label" === 1.0 || r.nextDouble > 0.5
res3: org.apache.spark.sql.Column = ((label = 1.0) OR false)
so after simplification it is just:
true
(keeping all the records) or
label = 1.0
(keeping only the 1.0 records, the case you observed), respectively.
To generate random numbers per row you should use the corresponding SQL function:
scala> import org.apache.spark.sql.functions.rand
import org.apache.spark.sql.functions.rand
scala> $"label" === 1.0 || rand > 0.5
res1: org.apache.spark.sql.Column = ((label = 1.0) OR (rand(3801516599083917286) > 0.5))
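Applied to the original filter, this keeps all label == 1.0 rows and roughly half of the remaining ones (a minimal sketch; the seed 42 is arbitrary and can be omitted):

val df2 = df.filter($"label" === 1.0 || rand(42) > 0.5)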
That said, Spark already provides stratified sampling tools:
df.stat.sampleBy(
  "label",                     // column
  Map(0.0 -> 0.5, 1.0 -> 1.0), // fractions
  42                           // seed
)
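Note that sampleBy performs per-stratum Bernoulli sampling, so the fractions are approximate rather than exact. A quick sanity check (the counts should come out near 200000000 and 10000000):

df.stat.sampleBy("label", Map(0.0 -> 0.5, 1.0 -> 1.0), 42)
  .groupBy($"label").count.show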
I'm trying to fit a curve with SimpleCurveFitter from commons.math3.fitting in Scala, but I get an exception:
org.apache.commons.math3.exception.ConvergenceException: Unable to perform
QR decomposition on jacobian
However, I have checked my gradient calculations and I still don't see why the exception is raised.
See the code for yourself:
import breeze.linalg.{DenseVector, linspace}
import org.apache.commons.math3.analysis.ParametricUnivariateFunction
import org.apache.commons.math3.fitting.{SimpleCurveFitter, WeightedObservedPoint}
import scala.collection.JavaConverters
import scala.math.{exp, log, pow}

def main(args: Array[String]): Unit = {
  // sample data: 1 for x < 1, exponential decay afterwards
  val xv: DenseVector[Double] = linspace(0, 3, 300)
  val yv: DenseVector[Double] = DenseVector.zeros(300)
  for (i <- xv.findAll(x => x < 1.0)) yv.update(i, 1)
  for (i <- xv.findAll(x => x >= 1.0)) yv.update(i, exp(-(xv(i) - 1.0) / 1))

  val wop: Array[WeightedObservedPoint] = new Array[WeightedObservedPoint](xv.length)
  for (i <- 0 to xv.length - 1) wop.update(i, new WeightedObservedPoint(1, xv(i), yv(i)))

  // model: 1 / (1 + a * x^(2b)), with its analytic gradient
  val f: ParametricUnivariateFunction = new ParametricUnivariateFunction {
    override def value(x: Double, parameters: Double*): Double = {
      val a = parameters(0)
      val b = parameters(1)
      1.0 / (1.0 + a * pow(x, 2 * b))
    }

    override def gradient(x: Double, parameters: Double*): Array[Double] = {
      val a = parameters(0)
      val b = parameters(1)
      val ga = -pow(x, 2 * b) / pow(1 + a * pow(x, 2 * b), 2)
      val gb = -(2 * a * pow(x, 2 * b) * log(x)) / pow(1 + a * pow(x, 2 * b), 2)
      Array(ga, gb)
    }
  }

  val wopc = JavaConverters.asJavaCollection(wop)
  val cf = SimpleCurveFitter.create(f, Array(1.0, 1.0))
  val param = cf.fit(wopc)
  println(param(0), param(1))
}
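For reference, the gradient can be sanity-checked against a central finite difference; a minimal sketch with a hypothetical helper (not part of commons.math3) that agrees with the analytic formulas for x > 0:

// Hypothetical helper: central finite-difference gradient of 1 / (1 + a * x^(2b))
def numGradient(x: Double, a: Double, b: Double, eps: Double = 1e-6): Array[Double] = {
  def v(a: Double, b: Double): Double = 1.0 / (1.0 + a * math.pow(x, 2 * b))
  Array(
    (v(a + eps, b) - v(a - eps, b)) / (2 * eps), // d/da
    (v(a, b + eps) - v(a, b - eps)) / (2 * eps)  // d/db
  )
}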
Thank you for your help :)
I'm quite new to Scala and Spark, and have some questions about displaying results in an output file.
I actually have a Map in which each key is associated with a list of lists (Map[Int, List[List[Double]]]), such as:
(2, List(List(x1, x2, x3), List(y1, y2, y3), ...)).
I am supposed to display for each key the values inside the lists of lists, such as:
2 x1,x2,x3
2 y1,y2,y3
1 z1,z2,z3
and so on.
When I use the saveAsTextFile function, it doesn't give me what I want in the output. Does anybody know how I can do it?
EDIT :
This is one of my functions:
def PrintCluster(vectorsByKey: Map[Int, List[Double]], vectCentroidPairs: Map[Int, Int]): Map[Int, List[Double]] = {
  var vectorsByCentroid: Map[Int, List[Double]] = Map()
  val SortedCentroid = vectCentroidPairs.groupBy(_._2).mapValues(x => x.map(_._1).toList).toSeq.sortBy(_._1).toMap
  SortedCentroid.foreach { case (centroid, vect) =>
    val nbVectors = vect.length
    for (i <- 0 to nbVectors - 1) {
      val vectValues = vectorsByKey(vect(i))
      println(centroid + " " + vectValues)
      vectorsByCentroid += (centroid -> vectValues)
    }
  }
  return vectorsByCentroid
}
I know it's wrong, because I can only associate one unique key with a group of values. That is why it returns only the first list for each key in the Map. I thought that in order to use the saveAsTextFile function I necessarily had to use a Map structure, but I don't really know.
Create a sample RDD as per your input data:
import org.apache.spark.rdd.RDD

val rdd: RDD[Map[Int, List[List[Double]]]] = spark.sparkContext.parallelize(
  Seq(Map(
    2 -> List(List(-4.4, -2.0, 1.5), List(-3.3, -5.4, 3.9), List(-5.8, -3.3, 2.3), List(-5.2, -4.0, 2.8)),
    1 -> List(List(7.3, 1.0, -2.0), List(9.8, 0.4, -1.0), List(7.5, 0.3, -3.0), List(6.1, -0.5, -0.6), List(7.8, 2.2, -0.7), List(6.6, 1.4, -1.1), List(8.1, -0.0, 2.7)),
    3 -> List(List(-3.0, 4.0, 1.4), List(-4.0, 3.9, 0.8), List(-1.4, 4.3, -0.5), List(-1.6, 5.2, 1.0))
  ))
)
Transform the RDD[Map[Int, List[List[Double]]]] into an RDD[(Int, String)]:
val result: RDD[(Int, String)] = rdd.flatMap(i => {
  i.map {
    case (x, y) => y.map(list => (x, list.mkString(" ")))
  }
}).flatMap(z => z)
result.foreach(println)
result.saveAsTextFile("location")
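Note that saveAsTextFile writes each tuple via its toString, so the file will contain lines like (2,-4.4 -2.0 1.5). To get exactly key<TAB>values instead, map to a string first (a minimal sketch):

result
  .map { case (key, values) => s"$key\t$values" }
  .saveAsTextFile("location")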
If you have a plain Map[Int, List[List[Double]]], printing it in the wanted format is simple: first convert the map to a list, then apply flatMap. Using the data supplied in a comment:
val map: Map[Int, List[List[Double]]] = Map(
  2 -> List(List(-4.4, -2.0, 1.5), List(-3.3, -5.4, 3.9), List(-5.8, -3.3, 2.3), List(-5.2, -4.0, 2.8)),
  1 -> List(List(7.3, 1.0, -2.0), List(9.8, 0.4, -1.0), List(7.5, 0.3, -3.0), List(6.1, -0.5, -0.6), List(7.8, 2.2, -0.7), List(6.6, 1.4, -1.1), List(8.1, -0.0, 2.7)),
  3 -> List(List(-3.0, 4.0, 1.4), List(-4.0, 3.9, 0.8), List(-1.4, 4.3, -0.5), List(-1.6, 5.2, 1.0))
)
val list = map.toList.flatMap(t => t._2.map((t._1, _)))
val result = for (t <- list) yield t._1 + "\t" + t._2.mkString(",")
// Saving the result to file
import java.io._
val pw = new PrintWriter(new File("fileName.txt"))
result.foreach{ line => pw.println(line)}
pw.close
Will print out:
2 -4.4,-2.0,1.5
2 -3.3,-5.4,3.9
2 -5.8,-3.3,2.3
2 -5.2,-4.0,2.8
1 7.3,1.0,-2.0
1 9.8,0.4,-1.0
1 7.5,0.3,-3.0
1 6.1,-0.5,-0.6
1 7.8,2.2,-0.7
1 6.6,1.4,-1.1
1 8.1,-0.0,2.7
3 -3.0,4.0,1.4
3 -4.0,3.9,0.8
3 -1.4,4.3,-0.5
3 -1.6,5.2,1.0
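As a side note, if producing the lines can fail midway, a try/finally keeps the writer from leaking (a minimal sketch of the same write as above):

val pw = new PrintWriter(new File("fileName.txt"))
try result.foreach(pw.println)
finally pw.close()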
I am new to functional programming. I have a Seq[Double], and for each value I'd like to check whether it is higher (1), lower (-1), or equal (0) compared to the previous value, like:
val g = Seq(0.1, 0.3, 0.5, 0.5, 0.5, 0.3)
and I'd like to have a result like:
val result = Seq(1, 1, 0, 0, -1)
Is there a more concise way than:
val g = Seq(0.1, 0.3, 0.5, 0.5, 0.5, 0.3)
g.sliding(2).toList.map(xs =>
  if (xs(0) == xs(1)) {
    0
  } else if (xs(0) > xs(1)) {
    -1
  } else {
    1
  }
)
Use compare:
g.sliding(2).map{ case Seq(x, y) => y compare x }.toList
compare is added by an enrichment trait called OrderedProxy.
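For example, with the g above:
scala> g.sliding(2).map{ case Seq(x, y) => y compare x }.toList
res2: List[Int] = List(1, 1, 0, 0, -1)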
That's rather concise in my opinion, but I'd make it a function and pass it into map to make it more readable. I used pattern matching and guards.
//High, low, equal
scala> def hlo(x: Double, y: Double): Int = y - x match {
| case 0.0 => 0
| case x if x < 0.0 => -1
| case x if x > 0.0 => 1
| }
hlo: (x: Double, y: Double)Int
scala> g.sliding(2).map(xs => hlo(xs(0), xs(1))).toList
res9: List[Int] = List(1, 1, 0, 0, -1)
I agree with Travis Brown's comment from above, so I am proposing it as an answer.
Reversing the order of the values in the zip, just to match the order of g. This has the added benefit of using tuples instead of a sequence, so no pattern matching is needed.
(g, g.tail).zipped.toList.map(t => t._2 compare t._1)
res0: List[Int] = List(1, 1, 0, 0, -1)
I wonder if there is a simple way to do something like this in Scala:
case class Pot(width: Int, height: Int, flowers: Seq[FlowerInPot])
case class FlowerInPot(x: Int, y: Int, flower: String)
val flowers = Seq("tulip", "rose")
val height = 3
val width = 3
val res =
  for (flower <- flowers;
       h <- 0 to height;
       w <- 0 to width) yield {
    // ??
  }
and as output I'd like to have a Seq of Pots with all possible combinations of flowers placed in them. So in the following example, the output should be:
Seq(
Pot(3, 3, Seq(FlowerInPot(0, 0, "tulip"), FlowerInPot(0, 1, "rose"))),
Pot(3, 3, Seq(FlowerInPot(0, 0, "tulip"), FlowerInPot(0, 2, "rose"))),
Pot(3, 3, Seq(FlowerInPot(0, 0, "tulip"), FlowerInPot(1, 0, "rose"))),
Pot(3, 3, Seq(FlowerInPot(0, 0, "tulip"), FlowerInPot(1, 1, "rose"))),
...
Pot(3, 3, Seq(FlowerInPot(2, 2, "tulip"), FlowerInPot(2, 1, "rose")))
)
Any ideas?
Is this what you want?
case class FlowerInPot(x: Int, y: Int, flower: String)
case class Pot(width: Int, height: Int, flowers: Seq[FlowerInPot])

val flowers = Seq("tulip", "rose")
val height = 3
val width = 3

val res = for {
  h <- 0 to height
  w <- 0 to width
} yield Pot(width, height, flowers.map(flower => FlowerInPot(w, h, flower)))
I figured it out; for now this solution seems to work:
val res = for {
  h <- 0 to height
  w <- 0 to width
  flower <- flowers
} yield (h, w, flower)

val pots: Seq[Pot] = res
  .sliding(flowers.size)
  .map(l => Pot(width, height, l.map(f => FlowerInPot(f._1, f._2, f._3))))
  .toList
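Note that sliding produces overlapping windows (grouped(flowers.size) would give disjoint groups), and each group places all flowers at the same cell. If the intent is the expected output above, with every flower at a distinct position, here is a sketch using combinations and permutations (using 0 until so the coordinates stay in 0..2, matching the example):

case class FlowerInPot(x: Int, y: Int, flower: String)
case class Pot(width: Int, height: Int, flowers: Seq[FlowerInPot])

val flowers = Seq("tulip", "rose")
val height = 3
val width = 3

// every grid cell
val positions = for { h <- 0 until height; w <- 0 until width } yield (h, w)

// choose flowers.size distinct cells, in every order, and pair them with the flowers
val pots: Seq[Pot] =
  positions.combinations(flowers.size).flatMap(_.permutations).map { ps =>
    Pot(width, height, ps.zip(flowers).map { case ((h, w), f) => FlowerInPot(h, w, f) })
  }.toSeq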