Relative sum of squared error with SciPy least_squares

I am relatively new to model fitting and SciPy; apologies in advance for any ignorance.
I am trying to fit a non-linear model using scipy.optimize.least_squares.
Here's the function:
def growthfunction(theta, t):
    return theta[0] * np.exp(-np.exp(-theta[1] * (t - theta[2])))
and some data
t = [1, 2, 3, 4]
observed = [3, 10, 14, 17]
I first define the residual function
def fun(theta):
    return growthfunction(theta, t) - observed
Select some random starting parameters to be optimized below:
theta0 = [1, 1, 1]
Then I use least_squares to optimize:
res1 = least_squares(fun, theta0)
This works great, except that least_squares is minimizing the absolute error here. My data changes with time, meaning an error of 5 at time point 1 is proportionally much larger than an error of 5 at time point 100. I would like it to minimize the relative error instead.
I tried doing it manually, but if I divide by the predicted values in fun(theta) like so:
def fun(theta):
    return (growthfunction(theta, t) - observed) / growthfunction(theta, t)
least_squares raises an error saying there are too many parameters and cannot optimize.

The following works by taking the relative error:
from scipy.optimize import least_squares
import numpy as np

def growthfunction(theta, t):
    return theta[0] * np.exp(-np.exp(-theta[1] * (t - theta[2])))

t = np.array([1, 2, 3, 4])
observed = np.array([3, 10, 14, 17])

def fun(theta):
    return (growthfunction(theta, t) - observed) / growthfunction(theta, t)

theta0 = [1, 1, 1]
res1 = least_squares(fun, theta0)
print(res1)
Output:
active_mask: array([0., 0., 0.])
cost: 0.0011991963091748607
fun: array([ 0.00255037, -0.0175105 , 0.0397808 , -0.02242228])
grad: array([ 3.15774533e-13, -2.50283465e-08, -1.46139239e-08])
jac: array([[ 0.05617851, -0.92486809, -1.94678829],
[ 0.05730839, 0.28751647, -0.6615416 ],
[ 0.05408162, 0.27956135, -0.20795969],
[ 0.05758503, 0.166258 , -0.07376148]])
message: '`ftol` termination condition is satisfied.'
nfev: 10
njev: 10
optimality: 2.5028346541978996e-08
status: 2
success: True
x: array([17.7550016 , 1.09927597, 1.52223722])
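The fitted parameters are then in res1.x. As a quick sanity check, a one-line sketch that reuses growthfunction, t, and observed from the code above:
print(growthfunction(res1.x, t))  # should land close to observed = [3, 10, 14, 17]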

Without a minimal reproducible example it is very hard to help you, but you can try a more traditional version of relative least squares, which is
def fun(theta):
    return (growthfunction(theta, t) - observed) / observed
or, perhaps, to guard against small or zero observed values,
def fun(theta):
    cutoff = 1e-4
    return (growthfunction(theta, t) - observed) / np.maximum(np.abs(observed), cutoff)
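For instance, with the data from the question, a runnable sketch of this observed-value weighting might look as follows (growthfunction and the data are copied from the code above):

import numpy as np
from scipy.optimize import least_squares

def growthfunction(theta, t):
    # Gompertz-style growth curve from the question
    return theta[0] * np.exp(-np.exp(-theta[1] * (t - theta[2])))

t = np.array([1, 2, 3, 4])
observed = np.array([3, 10, 14, 17])

def fun(theta):
    # residuals scaled by the observations, so each point contributes
    # its relative error rather than its absolute error
    return (growthfunction(theta, t) - observed) / observed

res = least_squares(fun, [1, 1, 1])
print(res.x)

Dividing by observed rather than by the model keeps the weights fixed during the optimization, so the problem remains an ordinary weighted least-squares fit.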

Related

pytest use data from one test as parameter to another test

I am wondering if it is possible to use data generated by one test as a parameter to another test. In my case I need to modify a variable (a list), and it would be great if I could use this list as a parameter (running as many tests as the list has elements).
Here is the code (it is not working, but maybe you can give me some hints):
import pytest

class TestCheck:
    x = [1, 4]

    @classmethod
    def setup_class(cls):
        print('here setup')

    @classmethod
    def teardown_class(cls):
        print('here is finish')

    def test_5(self):
        self.x.append(6)
        assert 1 == 2

    @pytest.mark.parametrize("region", x)
    def test_6(self, region):
        assert region > 5, f"x: {self.x}"
Output:
FAILED sandbox_tests3.py::TestCheck::test_5 - assert 1 == 2
FAILED sandbox_tests3.py::TestCheck::test_6[1] - AssertionError: x: [1, 4, 6]
FAILED sandbox_tests3.py::TestCheck::test_6[4] - AssertionError: x: [1, 4, 6]
So it looks like x contains the right values, but the new values are not visible to the parametrized test.
I was also trying to use pytest_cases, but the results are very similar.
Any help is appreciated.
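The likely explanation is that pytest evaluates @pytest.mark.parametrize arguments at collection time, before any test runs, so values appended inside test_5 can never show up in test_6's parameter list. A minimal sketch of that timing (the names here are illustrative, not from the original post):

import pytest

values = [1, 4]  # read exactly once, when pytest collects the module

@pytest.mark.parametrize("region", values)
def test_region(region):
    # By the time this body runs, mutating `values` has no effect:
    # the parameter set was frozen during collection.
    assert region > 0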

Root finding using a loop

I have one equation defined in the function
def fun(x, y, z, v, b):
    Y = (z * (np.sign(x) * np.abs(x)**(y - 1))) - (v * np.sign(b) * np.abs(b)**(v - 1)) / (1 - b**v)
    return Y.flatten()
that I want to solve for the value of x, given the values of Z0, SS (year 1: Z0=1.2, SS=2, ...) and different combinations of alpha and kappa, for which I am creating a grid.
Z0 = [1.2, 5, 3, 2.5, 4.2]
SS = [2, 3, 2.2, 3.5, 5]
ngrid = 10
kv = np.linspace(0.05, 2, ngrid)
av = np.linspace(1.5, 4, ngrid)
q0 = []
for z in range(len(Z0)):
    zz = Z0[z]
    ss = SS[z]
    for i in range(ngrid):
        for j in range(ngrid):
            kappa = kv[i]
            alpha = av[j]
            res0 = root(lambda x: fun(x, alpha, zz, kappa, ss), x0=np.ones(range(ngrid)))
            q0 = res0.x
print(q0)
where y = alpha, v = kappa, z = Z0, and b = SS.
I am getting all empty arrays ([], [], ...).
Not sure what is going on. Thanks for your help.
Before you attempt to use res0.x, check res0.success. In this case, you'll find that it is False in each case. When res0.success is False, take a look at res0.message for information about why root failed.
During development and debugging, you might also consider getting the solver working for just one set of parameter values before you embed root in three nested loops. For example, here are a few lines from an ipython session (variables were defined in previous lines, not shown):
In [37]: res0 = root(lambda x: fun(x, av[0], Z0[0], kv[0], SS[0]), x0=np.ones(range(ngrid)))
In [38]: res0.success
Out[38]: False
In [39]: res0.message
Out[39]: 'Improper input parameters were entered.'
The message suggests that something is wrong with the input parameters. You call root like this:
res0 = root(lambda x: fun(x, alpha, zz, kappa, ss), x0=np.ones(range(ngrid)))
A close look at that line shows the problem: the initial guess is np.ones(range(ngrid)):
In [41]: np.ones(range(ngrid))
Out[41]: array([], shape=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), dtype=float64)
That's not what you want! The use of range looks like a simple typo (or "thinko"). The initial guess should be
x0=np.ones(ngrid)
In ipython, we get:
In [50]: res0 = root(lambda x: fun(x, av[0], Z0[0], kv[0], SS[0]), x0=np.ones(ngrid))
In [51]: res0.success
Out[51]: True
In [52]: res0.x
Out[52]:
array([-0.37405428, -0.37405428, -0.37405428, -0.37405428, -0.37405428,
-0.37405428, -0.37405428, -0.37405428, -0.37405428, -0.37405428])
All the return values are the same (and this happens for other parameter values too), which suggests that you are solving a scalar equation. A closer look at fun shows that you only use x in element-wise operations, so you are in fact solving just a scalar equation. In that case, you can use x0=1:
In [65]: res0 = root(lambda x: fun(x, av[0], Z0[0], kv[0], SS[0]), x0=1)
In [66]: res0.success
Out[66]: True
In [67]: res0.x
Out[67]: array([-0.37405428])
You could also consider using root_scalar instead of root.
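For instance, a sketch with root_scalar, reusing fun and the parameter arrays from the question (the bracket endpoints are illustrative guesses; any interval on which fun changes sign will do):

from scipy.optimize import root_scalar

# fun returns a length-1 array, so wrap the scalar x and take element [0]
sol = root_scalar(
    lambda x: fun(np.atleast_1d(x), av[0], Z0[0], kv[0], SS[0])[0],
    bracket=[-5, 5],  # illustrative: endpoints where fun has opposite signs
    method='brentq',
)
print(sol.root)  # should agree with the root found above, about -0.374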

calculating kurtosis of an array[Double] field in Spark Scala

How do I calculate the kurtosis of an array field in Spark? The Spark built-in function fails on the array field with:
due to data type mismatch: argument 1 requires double type, however, 'SERIES' is of array<double> type.;;
Example in Python:
from scipy.stats import kurtosis
kurtosis([1, 2, 3, 4, 5])
-1.3
I used the Spark built-in function:
df.withColumn("newcolumn", when(col("SERIES").isNotNull, kurtosis(columnName)))
Using the Twitter Algebird package, I can get the kurtosis value:
import com.twitter.algebird._

val y = List(1, 2, 3, 4, 5)

def getMoments(xs: List[Int]): Moments =
  xs.foldLeft(MomentsGroup.zero) { (m, x) =>
    MomentsGroup.plus(m, Moments(x))
  }

println(getMoments(y).kurtosis) // -1.3

What is a BitVector and how to use it as return in Breeze Scala?

I am comparing two Breeze DenseVectors in the following way: a :< b, and what I get as a return value is a BitVector.
I haven't worked with this before, and everything I read about it was not helpful enough.
Can anyone explain to me how this works?
Additionally, by printing the output, I get {0, 1, 2, 3, 4}. What is this supposed to mean?
You can check BitVectorTest.scala for more detailed usage examples.
Basically, a :< b gives you a BitVector that indicates which elements in a are smaller than the corresponding elements in b.
For example, given
val a = DenseVector[Int](4, 9, 3)
val b = DenseVector[Int](8, 2, 5)
a :< b gives you BitVector(0, 2), meaning that a(0) < b(0) and a(2) < b(2), which is correct.

Spark - correlation matrix from file of ratings

I'm pretty new to Scala and Spark and I'm not able to create a correlation matrix from a file of ratings. It's similar to this question but I have sparse data in the matrix form. My data looks like this:
<user-id>, <rating-for-movie-1-or-null>, ... <rating-for-movie-n-or-null>
123, , , 3, , 4.5
456, 1, 2, 3, , 4
...
The code that is most promising so far looks like this:
val corTest = sc.textFile("data/collab_filter_data.txt").map(_.split(","))
Statistics.corr(corTest, "pearson")
(I know the user_ids in there are a defect, but I'm willing to live with that for the moment)
I'm expecting output like:
1, .123, .345
.123, 1, .454
.345, .454, 1
It's a matrix showing how each user is correlated to every other user. Graphically, it would be a correlogram.
It's a total noob problem but I've been fighting with it for a few hours and can't seem to Google my way out of it.
I believe this code should accomplish what you want:
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.mllib.linalg._
...
val corTest = input.map { case (line: String) =>
  val split = line.split(",").drop(1)
  split.map(elem => if (elem.trim.isEmpty) 0.0 else elem.toDouble)
}.map(arr => Vectors.dense(arr))
val corrMatrix = Statistics.corr(corTest)
Here, we map each input line into a String array, drop the user-id element, replace the whitespace-only entries with zeros, and finally create a dense vector from the resulting array. Note that Pearson's method is used by default if no method is supplied. Also note that Statistics.corr correlates the columns of its input, so with one row per user this yields a movie-to-movie matrix; to get the user-to-user matrix described in the question, the data would have to be transposed first.
When run in shell with some examples, I see the following:
scala> val input = sc.parallelize(Array("123, , , 3, , 4.5", "456, 1, 2, 3, , 4", "789, 4, 2.5, , 0.5, 4", "000, 5, 3.5, , 4.5, "))
input: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[18] at parallelize at <console>:16
scala> val corTest = ...
corTest: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MappedRDD[20] at map at <console>:18
scala> val corrMatrix = Statistics.corr(corTest)
...
corrMatrix: org.apache.spark.mllib.linalg.Matrix =
1.0 0.9037378388935388 -0.9701425001453317 ... (5 total)
0.9037378388935388 1.0 -0.7844645405527361 ...
-0.9701425001453317 -0.7844645405527361 1.0 ...
0.7709910794438823 0.7273340668525836 -0.6622661785325219 ...
-0.7513578452729373 -0.7560667258329613 0.6195855517393626 ...