I'm trying to fit a curve with SimpleCurveFitter from commons.math3.fitting in Scala, but I get an exception:
org.apache.commons.math3.exception.ConvergenceException: Unable to perform QR decomposition on jacobian
However, I have checked my gradient calculations, and I still don't see why the exception is raised.
Here is the code:
import breeze.linalg.{DenseVector, linspace}
import org.apache.commons.math3.analysis.ParametricUnivariateFunction
import org.apache.commons.math3.fitting.{SimpleCurveFitter, WeightedObservedPoint}
import scala.collection.JavaConverters
import scala.math.{exp, log, pow}
def main(args: Array[String]): Unit = {
  // sample data: y = 1 for x < 1, then exponential decay exp(-(x - 1))
  val xv: DenseVector[Double] = linspace(0, 3, 300)
  val yv: DenseVector[Double] = DenseVector.zeros[Double](300)
  for (i <- xv.findAll(x => x < 1.0)) yv.update(i, 1.0)
  for (i <- xv.findAll(x => x >= 1.0)) yv.update(i, exp(-(xv(i) - 1.0) / 1))
  val wop: Array[WeightedObservedPoint] = new Array[WeightedObservedPoint](xv.length)
  for (i <- 0 until xv.length) wop.update(i, new WeightedObservedPoint(1, xv(i), yv(i)))
  // model: f(x; a, b) = 1 / (1 + a * x^(2b))
  val f: ParametricUnivariateFunction = new ParametricUnivariateFunction {
    override def value(x: Double, parameters: Double*): Double = {
      val a = parameters(0)
      val b = parameters(1)
      1.0 / (1.0 + a * pow(x, 2 * b))
    }
    // partial derivatives of the model with respect to a and b
    override def gradient(x: Double, parameters: Double*): Array[Double] = {
      val a = parameters(0)
      val b = parameters(1)
      val ga = -pow(x, 2 * b) / pow(1 + a * pow(x, 2 * b), 2)
      val gb = -(2 * a * pow(x, 2 * b) * log(x)) / pow(1 + a * pow(x, 2 * b), 2)
      Array(ga, gb)
    }
  }
  val wopc = JavaConverters.asJavaCollection(wop)
  val cf = SimpleCurveFitter.create(f, Array(1.0, 1.0))
  val param = cf.fit(wopc)
  println(s"a = ${param(0)}, b = ${param(1)}")
}
Thank you for your help :)
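One likely culprit (my own observation, not confirmed in the thread): linspace(0, 3, 300) starts at x = 0, where log(x) is -Infinity and pow(0, 2 * b) is 0, so the product in gb evaluates to NaN. As far as I can tell from the commons-math source, LevenbergMarquardtOptimizer (the default optimizer behind SimpleCurveFitter) throws exactly this ConvergenceException when a Jacobian column norm is NaN or infinite. The sketch below reproduces the question's gradient formulas without any commons-math dependency, adds a guarded version that uses the limit x^(2b) * log(x) -> 0 (valid for b > 0), and cross-checks against central finite differences. GradientCheck and its methods are hypothetical names.

```scala
object GradientCheck {
  // The model from the question: f(x; a, b) = 1 / (1 + a * x^(2b))
  def value(x: Double, a: Double, b: Double): Double =
    1.0 / (1.0 + a * math.pow(x, 2 * b))

  // Same formulas as gradient() in the question
  def gradientAsWritten(x: Double, a: Double, b: Double): Array[Double] = {
    val p = math.pow(x, 2 * b)
    val denom = math.pow(1 + a * p, 2)
    Array(-p / denom, -(2 * a * p * math.log(x)) / denom)
  }

  // Guarded version: x^(2b) * log(x) -> 0 as x -> 0 when b > 0, so use 0 at x = 0
  def gradientGuarded(x: Double, a: Double, b: Double): Array[Double] = {
    val p = math.pow(x, 2 * b)
    val denom = math.pow(1 + a * p, 2)
    val gb = if (x == 0.0) 0.0 else -(2 * a * p * math.log(x)) / denom
    Array(-p / denom, gb)
  }

  // Central finite differences, to cross-check an analytic gradient at x > 0
  def numericGradient(x: Double, a: Double, b: Double, h: Double = 1e-6): Array[Double] =
    Array(
      (value(x, a + h, b) - value(x, a - h, b)) / (2 * h),
      (value(x, a, b + h) - value(x, a, b - h)) / (2 * h)
    )
}
```

With a = b = 1, gradientAsWritten(0.0, 1.0, 1.0) returns NaN for the b component, while the formulas agree with the finite differences away from 0. Dropping the x = 0 sample, or guarding gb as above, should remove the NaN; whether the fit then converges to sensible parameters is a separate question.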
I would like to find the fastest way to write Euclidean distance in Scala. After some attempts, this is where I am:
import scala.math.{pow, sqrt}
def euclidean[V <: Seq[Double]](dot1: V, dot2: V): Double = {
  var d = 0D
  var i = 0
  while (i < dot1.size) {
    val toPow2 = dot1(i) - dot2(i)
    d += toPow2 * toPow2
    i += 1
  }
  sqrt(d)
}
The fastest results are obtained with mutable.ArrayBuffer[Double] as V. Parallel collections (collection.parallel._) are not allowed, and vector sizes range from 2 up to 10000.
For those who want to test Breeze, it is slower with the following distance function:
import breeze.linalg.{DenseVector, norm}
def euclideanDV(v1: DenseVector[Double], v2: DenseVector[Double]) = norm(v1 - v2)
If anyone knows of pure Scala code or a library that could improve speed, it would be greatly appreciated.
Here is how I tested speed:
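// v1 and v2 are the test vectors; euclidean1 and euclidean2 are the versions being compared
var te1 = 0L
var te2 = 0L
val runNumber = 100000
val warmUp = 60000
(0 until runNumber).foreach { x =>
  val t1 = System.nanoTime
  euclidean1(v1, v2)
  val t2 = System.nanoTime
  euclidean2(v1, v2)
  val t3 = System.nanoTime
  // only accumulate timings after the warm-up iterations
  if (x >= warmUp) {
    te1 += t2 - t1
    te2 += t3 - t2
  }
}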
Here are some of my attempts:
// Fast on ArrayBuffer, quadratic on List
def euclidean1[V <: Seq[Double]](v1: V, v2: V) = {
  var d = 0D
  var i = 0
  while (i < v1.size) {
    val toPow2 = v1(i) - v2(i)
    d += toPow2 * toPow2
    i += 1
  }
  sqrt(d)
}
// Breeze test
def euclideanDV(v1: DenseVector[Double], v2: DenseVector[Double]) = norm(v1 - v2)
// Slower than euclidean1
def euclidean2[V <: Seq[Double]](v1: V, v2: V) = {
  var d = 0D
  var i = 0
  while (i < v1.size) {
    d += pow(v1(i) - v2(i), 2)
    i += 1
  }
  sqrt(d)
}
// Slower than euclidean1 for vector sizes below ~1000, slightly faster above 1000 on ArrayBuffer
def euclidean3[V <: Seq[Double]](v1: V, v2: V) = {
  var d = 0D
  (0 until v1.size).foreach { i =>
    val toPow2 = v1(i) - v2(i)
    d += toPow2 * toPow2
  }
  sqrt(d)
}
// Slower than euclidean1 for vector sizes below ~1000, slightly faster above 1000 on ArrayBuffer
def euclidean3bis(dot1: Seq[Double], dot2: Seq[Double]): Double = {
  var sum = 0D
  dot1.indices.foreach { id =>
    val toPow2 = dot1(id) - dot2(id)
    sum += toPow2 * toPow2
  }
  sqrt(sum)
}
// Slower than euclidean1
def euclidean4[V <: Seq[Double]](v1: V, v2: V) = {
  var d = 0D
  var i = 0
  val vz = v1.zip(v2)
  while (i < vz.size) {
    val (a, b) = vz(i)
    val toPow2 = a - b
    d += toPow2 * toPow2
    i += 1
  }
  sqrt(d)
}
// Slower than euclidean1
def euclideanL1(v1: List[Double], v2: List[Double]) =
  sqrt(v1.zip(v2).map { case (a, b) =>
    val toPow2 = a - b
    toPow2 * toPow2
  }.sum)
// Slower than euclidean1
def euclidean5(dot1: Seq[Double], dot2: Seq[Double]): Double = {
  var sum = 0D
  dot1.zipWithIndex.foreach { case (a, id) =>
    val toPow2 = a - dot2(id)
    sum += toPow2 * toPow2
  }
  sqrt(sum)
}
// super super slow
def euclidean6(v1: Seq[Double], v2: Seq[Double]) = sqrt(v1.zip(v2).map{ case (a, b) => pow(a - b, 2) }.sum)
// Slower than euclidean1
def euclidean7(dot1: Seq[Double], dot2: Seq[Double]): Double = {
  var sum = 0D
  dot1.zip(dot2).foreach { case (a, b) => sum += pow(a - b, 2) }
  sqrt(sum)
}
// Slower than euclidean1
def euclidean8(v1: Seq[Double], v2: Seq[Double]) = {
  def inc(n: Int, v: Double) = {
    val toPow2 = v1(n) - v2(n)
    v + toPow2 * toPow2
  }
  @annotation.tailrec
  def go(n: Int, v: Double): Double =
    if (n < v1.size) go(n + 1, inc(n, v))
    else v
  sqrt(go(0, 0D))
}
// Slower than euclidean1
def euclideanL2(v1: List[Double], v2: List[Double]) = {
  def inc(vzz: List[(Double, Double)], v: Double): Double = {
    val (a, b) = vzz.head
    val toPow2 = a - b
    v + toPow2 * toPow2
  }
  @annotation.tailrec
  def go(vzz: List[(Double, Double)], v: Double): Double =
    if (vzz.isEmpty) v
    else go(vzz.tail, inc(vzz, v))
  sqrt(go(v1.zip(v2), 0D))
}
I tried tail recursion on List, but it is not efficient enough on ArrayBuffer. I fully agree that proper tools like JMH are needed to measure speed reliably, but when the difference is on the order of 10-50%, we can be confident that one version is better.
Even though the signature says V <: Seq[Double], this approach is NOT appropriate for List, only for array-like structures with constant-time indexed access.
Here is my proposal:
def euclideanF[V <: Seq[Double]](v1: V, v2: V) = {
  @annotation.tailrec
  def go(d: Double, i: Int): Double = {
    if (i < v1.size) {
      val toPow2 = v1(i) - v2(i)
      go(d + toPow2 * toPow2, i + 1)
    } else d
  }
  sqrt(go(0D, 0))
}
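Building on the point that this only suits array-like structures, one option is to let the compiler enforce it: bounding the parameters by scala.collection.IndexedSeq rejects List while still accepting ArrayBuffer and Vector. A second variant takes Array[Double], which on the JVM is a primitive double[] and so avoids the boxing that ArrayBuffer[Double] incurs per element. This is an unbenchmarked sketch; FastEuclidean and both method names are hypothetical, and any speed gain should be confirmed with JMH.

```scala
object FastEuclidean {
  // Accepts ArrayBuffer, Vector, etc.; List does not compile because it is not an IndexedSeq
  def euclideanIdx(v1: scala.collection.IndexedSeq[Double], v2: scala.collection.IndexedSeq[Double]): Double = {
    require(v1.length == v2.length, "vectors must have the same length")
    var d = 0.0
    var i = 0
    while (i < v1.length) {
      val diff = v1(i) - v2(i)
      d += diff * diff
      i += 1
    }
    math.sqrt(d)
  }

  // Array[Double] is backed by a primitive double[], so elements are not boxed
  def euclideanArr(v1: Array[Double], v2: Array[Double]): Double = {
    require(v1.length == v2.length, "vectors must have the same length")
    var d = 0.0
    var i = 0
    while (i < v1.length) {
      val diff = v1(i) - v2(i)
      d += diff * diff
      i += 1
    }
    math.sqrt(d)
  }
}
```

The require call adds a length check the original versions do not have; drop it if every nanosecond counts and the inputs are trusted.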
I am wondering if there's a way to deal with a while (n > 0) loop in a more functional way. I have a small Scala app that counts the number of digits equal to K in the range from 1 to N.
For example, N = 30 and K = 3 would return 4 (3, 13, 23, 30).
object NumKCount {
  def main(args: Array[String]): Unit = {
    println(countK(30, 3))
  }
  def countKDigit(n: Int, k: Int): Int = {
    var num = n
    var count = 0
    // >= so that two-digit values like 10 still have their last digit examined
    while (num >= 10) {
      val digit = num % 10
      if (digit == k) { count += 1 }
      num = num / 10
    }
    if (num == k) { count += 1 }
    count
  }
  def countK(n: Int, k: Int): Int = {
    1.to(n).foldLeft(0)((acc, x) => acc + countKDigit(x, k))
  }
}
I'm looking for a way to define the function countKDigit using a purely functional approach.
First, expand the number n into a sequence of digits:
def digits(n: Int): Seq[Int] = {
  if (n < 10) Seq(n)
  else digits(n / 10) :+ n % 10
}
Then reduce the sequence by counting the occurrences of k:
def countKDigit(n: Int, k: Int): Int = {
  digits(n).count(_ == k)
}
Or you can avoid countKDigit entirely by using flatMap:
def countK(n: Int, k: Int): Int = {
  1.to(n).flatMap(digits).count(_ == k)
}
Assuming that K is always a single digit, you can convert n to a String and use collect or filter, as below (there's not much functional stuff you can do with an Int):
def countKDigit(n: Int, k: Int): Int = {
  n.toString.collect({ case c if c.asDigit == k => true }).size
}
or
def countKDigit(n: Int, k: Int): Int = {
  n.toString.filter(c => c.asDigit == k).length
}
For example:
scala> 343.toString.collect({ case c if c.asDigit == 3 => true }).size
res18: Int = 2
scala> 343.toString.filter(c => c.asDigit == 3).length
res22: Int = 2
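Both versions can also be collapsed into a single count call, which skips the intermediate collection. A minimal sketch under the same single-digit assumption for k; DigitCount is a hypothetical wrapper name:

```scala
object DigitCount {
  // Counts the characters of n's decimal representation whose digit value equals k
  def countKDigit(n: Int, k: Int): Int = n.toString.count(_.asDigit == k)
}
```

For instance, DigitCount.countKDigit(343, 3) gives 2, and summing it over 1 to 30 with k = 3 gives the 4 from the question.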
What about the following approach:
scala> val myInt = 346763
myInt: Int = 346763
scala> val target = 3
target: Int = 3
scala> val temp = List.tabulate(math.log10(myInt).toInt + 1)(x => math.pow(10, x).toInt)
temp: List[Int] = List(1, 10, 100, 1000, 10000, 100000)
scala> temp.map(x => myInt / x % 10)
res17: List[Int] = List(3, 6, 7, 6, 4, 3)
scala> temp.count(x => myInt / x % 10 == target)
res18: Int = 2
This counts the occurrences of a single digit across the whole number sequence:
def countK(n: Int, k: Int): Int = {
  assert(k >= 0 && k <= 9)
  1.to(n).mkString.count(_ == '0' + k)
}
If you really only want to rewrite countKDigit() in a more functional style, there's always recursion:
def countKDigit(n: Int, k: Int, acc: Int = 0): Int =
  if (n == 0) acc
  else countKDigit(n / 10, k, if (n % 10 == k) acc + 1 else acc)
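The while (n > 0) loop itself can also be expressed as an unfold without explicit recursion: Iterator.iterate produces n, n / 10, n / 100, ..., and takeWhile stops at 0. A sketch assuming non-negative input (0 is special-cased so it still has one digit); DigitsUnfold and its methods are hypothetical names:

```scala
object DigitsUnfold {
  // Digits of a non-negative n, least significant first; 0 yields List(0)
  def digits(n: Int): List[Int] =
    if (n == 0) List(0)
    else Iterator.iterate(n)(_ / 10).takeWhile(_ > 0).map(_ % 10).toList

  def countKDigit(n: Int, k: Int): Int = digits(n).count(_ == k)

  // Sum the per-number counts over 1..n, as in the question's countK
  def countK(n: Int, k: Int): Int = (1 to n).map(countKDigit(_, k)).sum
}
```

Digit order does not matter for counting, which is why the least-significant-first order produced by the unfold is fine here.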
def interpolate(l: List[Tuple2[String, String]]): List[Tuple2[java.util.Date, Long]] = {
  val mapped: List[Tuple2[java.util.Date, Long]] = l.map(item => (format.parse(item._1), item._2.toLong))
  val results = ListBuffer[Tuple2[java.util.Date, Long]]()
  val last: Option[Tuple2[java.util.Date, Long]] = None
  mapped.foreach( item =>
    if (!last.isEmpty) {
      val daysItem = item._1.getTime() / 1000 / 60 / 60 / 24
      val daysLast = last.get._1.getTime() / 1000 / 60 / 60 / 24
      if (daysItem - daysLast > 1) {
        val slope = (item._2 - last.get._2) / (daysItem - daysLast)
        val days = daysLast until daysItem
        val missingChunk: List[Tuple2[java.util.Date, Long]] = days.map(day => (new Date(day * 24 * 60 * 60 * 1000), slope * day)).toList
        results ++= missingChunk
      }
    }
    //results += item
    last = Some(item)
  )
  results.toList
}
Error:
<console>:45: error: value last is not a member of Any
possible cause: maybe a semicolon is missing before `value last'?
last = Some(item)
^
There are 2 problems here:
1) Multiple statements need {...} braces:
From:
mapped.foreach((item: (Date, Long)) =>
  XXX // OK
  YYY // NO
)
To:
mapped.foreach { (item: (Date, Long)) =>
  XXX // OK
  YYY // OK
}
2) A val can't be reassigned:
From:
val last: Option[Tuple2[java.util.Date, Long]] = None
To:
var last: Option[Tuple2[java.util.Date, Long]] = None
Refactor #1
I would try to avoid using var. The condition if (last.isDefined) suggests that we are really pairing each element with its predecessor, which is what zipping the list with its own tail does:
scala> val l = List(1, 2, 3, 4, 5)
l: List[Int] = List(1, 2, 3, 4, 5)
scala> l.zip(l.tail)
res0: List[(Int, Int)] = List((1,2), (2,3), (3,4), (4,5))
Refactoring your example:
import java.util.Date
import scala.collection.mutable.ListBuffer
// `format` is the date parser from the question (not shown), e.g. a java.text.SimpleDateFormat
def interpolate(l: List[(String, String)]): List[(Date, Long)] = {
  val mapped: List[(Date, Long)] = l.map(item => (format.parse(item._1), item._2.toLong))
  val results = ListBuffer[(Date, Long)]()
  // pair each element with its successor; drop(1) instead of tail so an empty list doesn't throw
  mapped.zip(mapped.drop(1)).foreach { case ((lastDate, lastLong), (itemDate, itemLong)) =>
    val daysItem = itemDate.getTime / 1000 / 60 / 60 / 24
    val daysLast = lastDate.getTime / 1000 / 60 / 60 / 24
    if (daysItem - daysLast > 1) {
      val slope = (itemLong - lastLong) / (daysItem - daysLast)
      val days = daysLast until daysItem
      val missingChunk: List[(Date, Long)] = days.map(day => (new Date(day * 24 * 60 * 60 * 1000), slope * day)).toList
      results ++= missingChunk
    }
  }
  results.toList
}
Refactor #2
ListBuffer is a mutable collection. In our scenario, all we really do is concatenate the missing chunks, so we can map each pair to its chunk and flatten the result.
Keep refactoring:
def interpolate(l: List[(String, String)]): List[(Date, Long)] = {
  val mapped: List[(Date, Long)] = l.map(item => (format.parse(item._1), item._2.toLong))
  // drop(1) instead of tail, so an empty input doesn't throw
  val missingChunks = mapped.zip(mapped.drop(1)).map { case ((lastDate, lastLong), (itemDate, itemLong)) =>
    val daysItem = itemDate.getTime / 1000 / 60 / 60 / 24
    val daysLast = lastDate.getTime / 1000 / 60 / 60 / 24
    if (daysItem - daysLast > 1) {
      val slope = (itemLong - lastLong) / (daysItem - daysLast)
      val days = daysLast until daysItem
      days.map(day => (new Date(day * 24 * 60 * 60 * 1000), slope * day)).toList
    } else List.empty[(Date, Long)]
  }
  missingChunks.flatten
}
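Two details of the original logic may deserve a second look (my reading, not raised in the thread): slope * day does not start from lastLong, so the filled-in values don't lie on the line between the two neighbouring points, and daysLast until daysItem re-emits the day of the last point. If linear interpolation between neighbours is the intent, and the original points should be kept (as the commented-out results += item suggests), a sketch could use flatMap directly. The yyyy-MM-dd pattern, the UTC time zone, and the Interpolation/parser names are assumptions, since the question's format value is not shown.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

object Interpolation {
  private val MillisPerDay = 24L * 60 * 60 * 1000

  // Hypothetical parser; UTC keeps every parsed date on an exact day boundary
  def parser: SimpleDateFormat = {
    val f = new SimpleDateFormat("yyyy-MM-dd")
    f.setTimeZone(TimeZone.getTimeZone("UTC"))
    f
  }

  def interpolate(l: List[(String, String)]): List[(Date, Long)] = {
    val format = parser
    val mapped = l.map { case (d, v) => (format.parse(d), v.toLong) }
    val filled = mapped.zip(mapped.drop(1)).flatMap { case ((lastDate, lastLong), (itemDate, itemLong)) =>
      val daysLast = lastDate.getTime / MillisPerDay
      val gap = itemDate.getTime / MillisPerDay - daysLast
      // the earlier point itself, then one entry per missing day strictly between the two points
      (lastDate, lastLong) :: (1L until gap).toList.map { k =>
        val value = lastLong + (itemLong - lastLong) * k / gap // multiply before dividing (Long arithmetic)
        (new Date((daysLast + k) * MillisPerDay), value)
      }
    }
    // the final point has no successor, so append it explicitly
    filled ++ mapped.lastOption.toList
  }
}
```

For example, 10 on 2020-01-01 and 40 on 2020-01-04 yield the values 10, 20, 30, 40 on four consecutive days. Values are still truncated by Long division when the difference is not a multiple of the gap.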
How can I split this data, T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0, into two columns using a Hive function?
For example:
T 32
P 1
A 420
H 60
R 0.30841494477846165
S 0
You can do this with a regex implementation:
import java.util.regex.Pattern
def main(args: Array[String]): Unit = {
  val s = "T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0"
  // a capital letter, an underscore, then an integer or decimal number
  val pattern = "[A-Z]\\_\\d+\\.?\\d*"
  var buff = new String()
  val r = Pattern.compile(pattern)
  val m = r.matcher(s)
  while (m.find()) {
    buff = buff + m.group(0)
    buff = buff + "\n"
  }
  // turn "T_32" into "T 32"
  buff = buff.replaceAll("\\_", " ")
  println("output:\n" + buff)
}
Output:
output:
T 32
P 1
A 420
H 60
R 0.30841494477846165
S 0
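In Scala, the same regex idea can return the pairs directly by using capture groups with scala.util.matching.Regex, instead of building a string and replacing the underscores afterwards. A minimal sketch (this is Scala, not a Hive function; SplitPairs and pairs are hypothetical names):

```scala
object SplitPairs {
  // Group 1: one capital letter; group 2: an integer or decimal number after the underscore
  private val KeyValue = """([A-Z])_(\d+(?:\.\d+)?)""".r

  def pairs(s: String): List[(String, String)] =
    KeyValue.findAllMatchIn(s).map(m => (m.group(1), m.group(2))).toList
}
```

Each tuple corresponds to one output row, so printing them is just pairs(s).foreach { case (k, v) => println(s"$k $v") }.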
If you need to collect the data for further processing, and you're guaranteed it's always paired correctly, you could do something like this:
scala> val str = "T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0"
str: String = T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0
scala> val data = str.split("_").sliding(2,2)
data: Iterator[Array[String]] = non-empty iterator
scala> data.toList // just to see it
res29: List[Array[String]] = List(Array(T, 32), Array(P, 1), Array(A, 420), Array(H, 60), Array(R, 0.30841494477846165), Array(S, 0))
You can split your string into an array, call zipWithIndex, and filter on the index to get two arrays, col1 and col2, which you can then use for printing:
val str = "T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0"
val tmp = str.split('_').zipWithIndex
val col1 = tmp.filter( p => p._2 % 2 == 0 ).map( p => p._1)
val col2 = tmp.filter( p => p._2 % 2 != 0 ).map( p => p._1)
//col1: Array[String] = Array(T, P, A, H, R, S)
//col2: Array[String] = Array(32, 1, 420, 60, ...
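If you would rather build both columns in one pass, grouped(2) pairs consecutive tokens and unzip splits the pairs into the two columns. A sketch assuming the tokens always alternate key, value (TwoColumns, columns and render are hypothetical names):

```scala
object TwoColumns {
  // Assumes an even number of tokens: key, value, key, value, ...
  def columns(s: String): (Array[String], Array[String]) = {
    val pairs = s.split('_').grouped(2).collect { case Array(k, v) => (k, v) }.toArray
    pairs.unzip
  }

  // One "key value" line per pair, matching the layout in the question
  def render(s: String): String = {
    val (col1, col2) = columns(s)
    col1.zip(col2).map { case (k, v) => s"$k $v" }.mkString("\n")
  }
}
```

A trailing unpaired token is silently dropped by the collect, which may or may not be what you want.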
I'm trying to generate a list in Scala according to the formula:
for n > 1, f(n) = 4*n^2 - 6*n + 6, and for n == 1, f(n) = 1
Currently I have:
def lGen(end: Int): List[Int] = {
  for { n <- List.range(3, end + 1, 2) } yield { 4*n*n - 6*n + 6 }
}
For end = 5 this would give the list:
List(24 , 76)
Right now I'm stuck trying to find a graceful way to make this function give
List(1, 24, 76)
Any suggestions would be greatly appreciated.
-Lee
I'd separate out the "formula" from the list generation:
val f: Int => Int = {
  case 1 => 1
  case x if x > 1 => 4*x*x - 6*x + 6
}
def lGen(end: Int) = (1 to end by 2 map f).toList
or
def lGen(end: Int) = List.range(1, end + 1, 2) map f
How about this:
scala> def lGen(end: Int): List[Int] =
1 :: List.range(3, end+1, 2).map(n => 4*n*n - 6*n + 6)
scala> lGen(5)
res0: List[Int] = List(1, 24, 76)