How to make a Weibull fit on an existing histogram? - scipy

I created a histogram plot based on my datasets. I would like to create a Weibull fit for this histogram.
I used scipy and the stats.weibull function, but unfortunately, it does not work.
Do you have an idea of how to use the stats.weibull in this case?
Here is the code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
data = 'Figures/Histogram/Histogram.xlsx'
hist= pd.read_excel('Histogram/Histogram.xlsx')
# x= hist['DeltaT_value']
x= hist['DeltaT_-250_2017']
X=x[(x>0)]
plt.figure(figsize=(15,4))
plt.hist(X, bins= np.arange (0,1500,25), color='#0504aa', edgecolor ='red', rwidth= 0.8)
plt.ylabel('Number of EL')
plt.xlabel('Delta T (years CE) between EL')
plt.xlim(0, 401)
plt.xticks(np.arange(0,401,25))
plt.yticks(np.arange(0,2.2,1))
# Weibull
####
shape, loc, scale = stats.weibull_min.fit(X)
x = np.linspace(stats.weibull_min.ppf(0.01, shape, loc=loc, scale=scale), stats.weibull_min.ppf(0.99, shape, loc=loc, scale=scale), 100)
plt.plot(x, stats.weibull_min.pdf(x, shape, loc=loc, scale=scale), 'r-', lw=5, alpha=0.6, label='weibull')
Unfortunately, it seems another graph is created on top of the histogram instead of a fit.

After working on it for a little while, I found a solution:
shape, loc, scale = stats.weibull_min.fit(X, floc = 0, f0 = 1)
W = np.linspace(stats.weibull_min.ppf(0.001, shape, loc=loc, scale=scale), stats.weibull_min.ppf(0.99, shape, loc=loc, scale=scale), 1000)
p = stats.weibull_min.pdf(W, shape, scale = scale)
Robert Dodier's comments on my original post are very interesting and I will try to see how I can use a "log likelihood function" on the original dataset rather than a fit on the histogram. This is the first time I will use a log likelihood function. If anyone has some advice, I would gladly take it.
Thanks.
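One reason a fitted curve can look like "another graph" rather than a fit is a scale mismatch: plt.hist plots counts by default, while weibull_min.pdf returns a density that integrates to 1. Below is a minimal sketch of one way to reconcile the two. It uses synthetic data in place of the DeltaT column; the sample values, seed and variable names are illustrative assumptions, not taken from the original post.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the DeltaT data (assumption: not the original dataset).
rng = np.random.default_rng(0)
X = 136.0 * rng.weibull(1.5, size=200)

# Same binning as in the question: 25-year-wide bins.
bins = np.arange(0, 1500, 25)
counts, edges = np.histogram(X, bins=bins)
bin_width = edges[1] - edges[0]
centers = 0.5 * (edges[:-1] + edges[1:])

# Fit on the raw samples; floc=0 pins the location so only shape and scale vary.
shape, loc, scale = stats.weibull_min.fit(X, floc=0)

# The pdf is a density. Multiplying by N * bin_width turns it into the
# expected number of samples per bin, which is what a count histogram shows.
density = stats.weibull_min.pdf(centers, shape, loc=loc, scale=scale)
expected_counts = len(X) * bin_width * density
```

Plotting centers against expected_counts on the same axes as the count histogram should put the curve on the histogram's scale. Alternatively, passing density=True to plt.hist rescales the bars instead, so the unscaled pdf can be drawn directly.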

Mat, here's what I get with the approach I was arguing for, which is to work with the likelihood function as derived from quantile data (i.e., the cumulative summation of histogram bars).
I've devised some code for the computer algebra system Maxima (https://maxima.sourceforge.io) to do the calculations. I'll link to the code, but it's probably difficult to use at this point; mostly what I want to say is that this business about working with the likelihood function is a good path forward. See also my comments on this other question about deriving the likelihood function for binned data: https://stats.stackexchange.com/questions/11176/can-anyone-explain-quantile-maximum-probability-estimation-qmpe/442966#442966
The code I'm using is the package robert-dodier/qmpe under https://github.com/maxima-project-on-github/maxima-packages . That in turn makes use of as-yet-unreleased functions (to be released in the next version of Maxima, namely 5.47) from the Maxima package distrib; that unreleased version may be found at https://sourceforge.net/p/maxima/code/ci/master/tree/share/distrib/ .
Looks like the data shown in the histograms is just:
q: [0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325];
n: [0, 1, 3, 2, 2, 3, 0, 2, 1, 0, 1, 1, 0, 1];
with q being the quantiles (0, 25, 50, ... 325 years) and n being the number of events in each bin (0, 1, 2, or 3 being the only observed values).
I tried a lognormal distribution and a Weibull distribution. Other distributions are possible by the same method, I just haven't written the code for them yet; I might still try it.
For the lognormal, I get mu = 4.546, sigma = 0.7794, with negative log likelihood nll = 2.528. For the Weibull, I get shape = 1.488, scale = 136.2, with nll = 2.491. Here are figures comparing lognormal to data and Weibull to data. The green line is the final fit; the red line, barely visible in the Weibull plot, is from the initial parameters for the optimization.
As you can see, the Weibull fit is a little better as determined from nll, but neither one is really very good, and in fact it's likely there's no distribution that gives a good fit -- there are only a few data per bin and they're pretty noisy.
As a postscript, here is the code I am working with to generate the figures shown. As I was saying, I don't expect anyone to run this code; it's just to present the general method of working with the likelihood function as derived from the histogram data.
/* inspired by: https://stackoverflow.com/questions/75090377/how-to-make-a-weibull-fit-on-an-existing-histogram */
/* data shown in histogram https://i.stack.imgur.com/uYt70.png
* values appear to be just 0, 1, 2, or 3
*/
q: [0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325];
n: [0, 1, 3, 2, 2, 3, 0, 2, 1, 0, 1, 1, 0, 1];
p_unnormalized: makelist (lsum (n1, n1, firstn (n, k)), k, 1, length (n));
p: p_unnormalized / last (p_unnormalized);
load (distrib);
load ("qmpe.mac");
quux: construct_qmpe (cdf_weibull (u, shape, scale), 'u, '[shape, scale], mle_weibull, 1e-4, [1, 0]);
mumble: quux (q, p);
set_plot_option ([svg_file, "./SO-75090377-weibull-fit-compare-cdf.svg"]);
plot_qmpe_comparison_cdf (q, p, cdf_weibull (u, shape, scale), u, '[shape, scale], quantile_weibull (0.99, shape, scale), assoc ('initial, mumble), assoc ('final, mumble));
set_plot_option ([svg_file, "./SO-75090377-weibull-fit-compare-pdf.svg"]);
plot_qmpe_comparison_pdf (q, p, pdf_weibull (u, shape, scale), u, '[shape, scale], quantile_weibull (0.99, shape, scale), assoc ('initial, mumble), assoc ('final, mumble));
foo: construct_qmpe (cdf_lognormal (u, location, scale), 'u, '[location, scale], mle_lognormal, 1e-4, [1, 0]);
baz: foo (q, p);
set_plot_option ([svg_file, "./SO-75090377-lognormal-fit-compare-cdf.svg"]);
plot_qmpe_comparison_cdf (q, p, cdf_lognormal (u, location, scale), u, '[location, scale], quantile_lognormal (0.99, location, scale), assoc ('initial, baz), assoc ('final, baz));
set_plot_option ([svg_file, "./SO-75090377-lognormal-fit-compare-pdf.svg"]);
plot_qmpe_comparison_pdf (q, p, pdf_lognormal (u, location, scale), u, '[location, scale], quantile_lognormal (0.99, location, scale), assoc ('initial, baz), assoc ('final, baz));
ev (assoc ('nll, mumble), assoc ('final, mumble), nouns, numer);
ev (assoc ('nll, baz), assoc ('final, baz), nouns, numer);
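For readers who would rather stay in Python, the general idea of a likelihood built from binned data can be sketched with SciPy. This is not a port of the qmpe package: it is a plain multinomial (binned) likelihood, it assumes n[k] counts the events falling in the bin (q[k-1], q[k]], and the starting values and optimizer are arbitrary choices. The fitted numbers need not match the Maxima output exactly, since the objective may be defined or normalized differently there.

```python
import numpy as np
from scipy import optimize, stats

# Bin edges (years) and counts, as read off the histogram in the answer.
q = np.array([0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325],
             dtype=float)
n = np.array([0, 1, 3, 2, 2, 3, 0, 2, 1, 0, 1, 1, 0, 1])
counts = n[1:]  # assumption: n[k] is the count in the bin (q[k-1], q[k]]


def neg_log_likelihood(log_params):
    """Negative log likelihood of the bin counts under a Weibull distribution."""
    shape, scale = np.exp(log_params)  # optimize in log space to stay positive
    cdf = stats.weibull_min.cdf(q, shape, loc=0, scale=scale)
    bin_prob = np.clip(np.diff(cdf), 1e-300, None)  # guard against log(0)
    return -np.sum(counts * np.log(bin_prob))


# Arbitrary starting point: shape 1, scale 100.
result = optimize.minimize(neg_log_likelihood, x0=np.log([1.0, 100.0]),
                           method="Nelder-Mead")
shape_hat, scale_hat = np.exp(result.x)
print(shape_hat, scale_hat, result.fun)
```

Swapping stats.weibull_min for another distribution's cdf (for example stats.lognorm) gives the corresponding fit, and the minimized values can be compared in the same way as the nll values above.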


How do I fill Matrix4 with translation, skew and scale values in flutter?

Suppose, I have these values for a container of height 200 and width 300:
scaleX = 0.9198
scaleY = 0.9198
skewX = -0.3923
skewY = 0.3923
translateX = 150
translateY = 150
Now, how do I fill these values in Matrix4 correctly?
I tried doing this:
Matrix4(
0.9198, 0, 0, 0, //
0, 0.9198, 0, 0, //
0, 0, 1, 0, //
150, 150, 0, 1,
)
which is,
Matrix4(
scaleX, 0, 0, 0, //
0, scaleY, 0, 0, //
0, 0, 1, 0, //
translateX, translateY, 0, 1,
)
But I am not sure where to put skewX and skewY values in this matrix. Please help me with this.
Skew Values
This is a bit of a nuanced topic, as it could be interpreted in a couple of different ways. There are specific cells of a matrix that are associated with specific names, as identified in your question, translate x, translate y, scale x, and scale y. In this context, you most likely mean the values from a matrix that are called skew x and skew y (also sometimes known as shear x and shear y), which refers to indices 4 and 1 (zero-based, column-major order). They're called these names because when put into an identity matrix by themselves, they do that operation (translate, scale, or skew), but it gets more complicated when there are multiple values.
On the other hand, this could also be interpreted as a series of operations (e.g. scale by (0.9198, 0.9198, 1), then skew by (-0.3923, 0.3923), then translate by (150, 150, 0)), and then it's a series of matrix multiplications that would ultimately result in a similar-looking, but numerically different matrix. I'll assume you don't mean this for this question. You can read more about it here though.
You can consult the Flutter Matrix4 documentation, which also provides implementation notes for Matrix4.skewX and Matrix4.skewY. The skews are stored at (zero-based) indices 4 and 1, as the tangent of the skew angle.
Matrix4(
scaleX, skewY, 0, 0, // skewY could also be tan(ySkewAngle)
skewX, scaleY, 0, 0, // skewX could also be tan(xSkewAngle)
0, 0, 1, 0, //
translateX, translateY, 0, 1,
)
A note for those who aren't familiar with Flutter's data structures: values are stored in column-major order, which means that each row in the above code is actually a column. If you were to write the matrix as a conventional transformation matrix, it would be the transpose of what is shown.
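The column-major layout can be checked without Flutter. The sketch below is plain Python, not Dart, and the helper names entry and transform are made up for illustration. It stores the 16 values in the same order as the Matrix4 constructor call above, rebuilds the conventional row view, and applies the matrix to a point.

```python
# Values from the question.
scale_x, scale_y = 0.9198, 0.9198
skew_x, skew_y = -0.3923, 0.3923
translate_x, translate_y = 150.0, 150.0

# 16 values in column-major order: each line below is one COLUMN of the matrix.
storage = [
    scale_x, skew_y, 0.0, 0.0,
    skew_x, scale_y, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0,
    translate_x, translate_y, 0.0, 1.0,
]


def entry(row, col):
    """Element at (row, col) of a 4x4 matrix stored column-major."""
    return storage[col * 4 + row]


def transform(point):
    """Apply the matrix to the homogeneous column vector (x, y, z, 1)."""
    x, y, z = point
    vec = (x, y, z, 1.0)
    return tuple(sum(entry(r, c) * vec[c] for c in range(4)) for r in range(3))


# Conventional (row-by-row) view of the same matrix.
rows = [[entry(r, c) for c in range(4)] for r in range(4)]
for row in rows:
    print(row)

print(transform((1.0, 0.0, 0.0)))
```

In the row view, skew_x lands in the first row (it adds a multiple of y to x) and skew_y in the second row, with the translation in the last column, which is the transpose of the constructor layout.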
More information:
Transformation Matrices: https://en.wikipedia.org/wiki/Transformation_matrix
How matrices are used with CSS Transforms: How do I use the matrix transform and other transform CSS properties?

How to make custom 'any object' Cascade (.xml) for opencv-python?

I want to make a Haar cascade so that I can use it to detect an object in opencv-python. For example, I want to detect a watch. I tried making a cascade using Cascade Trainer GUI, but it isn't giving me the expected results.
Well, before training, search through the internet. Maybe the object you want to detect has already been trained, so you don't need to train again.
For example, you want to detect a watch. The haar-file is available here.
I tried the file to check whether it works; the result is:
Code:
import cv2

# Load the pre-trained watch cascade (the .xml file must be in the working directory)
w_cascade = cv2.CascadeClassifier('watchcascade10stage.xml')
cap = cv2.VideoCapture(0)
while True:
    ret, img = cap.read()
    if ret:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Each detection is returned as an (x, y, w, h) rectangle
        watches = w_cascade.detectMultiScale(image=gray,
                                             scaleFactor=1.3,
                                             minNeighbors=50)
        for (x, y, w, h) in watches:
            cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2)
            font = cv2.FONT_HERSHEY_SIMPLEX
            cv2.putText(img, 'Watch', (x - w, y - h), font, 0.5, (11, 255, 255), 2, cv2.LINE_AA)
        cv2.imshow('img', img)
    # Wait 1 ms per frame so the video keeps running; press Esc to quit
    k = cv2.waitKey(1) & 0xff
    if k == 27:
        break
cap.release()
cv2.destroyAllWindows()
You can find other tutorials by searching the internet. For instance, start with this video.
The thing is, a Haar cascade is not a detector or even a classifier. It is a feature extractor. If you are going to use a Haar cascade, you will use it in conjunction with an SVM (support vector machine) for classification and then implement a sliding window to detect watches.
The steps are as follows:
1. Extract image patches using a sliding window.
2. Pass each patch to an SVM trained on the Haar cascade features.
3. Draw a rectangle if the prediction is true.
I recommend this tutorial series: https://pythonprogramming.net/haar-cascade-object-detection-python-opencv-tutorial/. Please do reach out to me if you still need help.
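The three steps above can be sketched in plain Python. This is only an illustration of the sliding-window loop: the is_match callable stands in for whatever trained classifier is used, the image is a toy list of lists, and none of the names below are OpenCV API.

```python
def sliding_windows(width, height, win_w, win_h, step):
    """Yield (x, y, w, h) for every window position that fits in the image."""
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield x, y, win_w, win_h


def detect(image, win_w, win_h, step, is_match):
    """Return the windows for which the (hypothetical) classifier says yes."""
    height, width = len(image), len(image[0])
    hits = []
    for x, y, w, h in sliding_windows(width, height, win_w, win_h, step):
        patch = [row[x:x + w] for row in image[y:y + h]]  # step 1: extract patch
        if is_match(patch):                               # step 2: classify it
            hits.append((x, y, w, h))                     # step 3: keep the box
    return hits


# Toy 8x6 grayscale image with one bright 2x2 square.
image = [[0] * 8 for _ in range(6)]
for r in range(2, 4):
    for c in range(4, 6):
        image[r][c] = 255


def all_bright(patch):
    """Stand-in classifier: true when every pixel in the patch is 255."""
    return all(value == 255 for row in patch for value in row)


hits = detect(image, win_w=2, win_h=2, step=1, is_match=all_bright)
print(hits)
```

In a real pipeline the returned boxes would be drawn with cv2.rectangle, as in the code of the previous answer, and the window would usually be run at several scales.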

How to create Bezier curves from B-Splines in Sympy?

I need to draw a smooth curve through some points, which I then want to show as an SVG path. So I create a B-Spline with scipy.interpolate, and can access some arrays that I suppose fully define it. Does someone know a reasonably simple way to create Bezier curves from these arrays?
import numpy as np
from scipy import interpolate
x = np.array([-1, 0, 2])
y = np.array([ 0, 2, 0])
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
tck, u = interpolate.splprep([x, y], s=0, per=True)
cx = tck[1][0]
cy = tck[1][1]
print( 'knots: ', list(tck[0]) )
print( 'coefficients x: ', list(cx) )
print( 'coefficients y: ', list(cy) )
print( 'degree: ', tck[2] )
print( 'parameter: ', list(u) )
The red points are the 3 initial points in x and y. The green points are the 6 coefficients in cx and cy. (Their values repeat after the 3rd, so each green point has two green index numbers.)
The return values tck and u are described in the scipy.interpolate.splprep documentation.
knots: [-1.0, -0.722, -0.372, 0.0, 0.277, 0.627, 1.0, 1.277, 1.627, 2.0]
# 0 1 2 3 4 5
coefficients x: [ 3.719, -2.137, -0.053, 3.719, -2.137, -0.053]
coefficients y: [-0.752, -0.930, 3.336, -0.752, -0.930, 3.336]
degree: 3
parameter: [0.0, 0.277, 0.627, 1.0]
Not sure starting with a B-Spline makes sense: why not form a Catmull-Rom curve through the points (with the virtual "before first" and "after last" points overlaid on real points) and then convert that to a Bezier curve using a relatively trivial transform? A Catmull-Rom segment with control points {a,b,c,d} draws the piece from b to c, so given your points p0, p1, and p2, the control points {p2,p0,p1,p2} yield the segment p0--p1, {p0,p1,p2,p0} yield p1--p2, and {p1,p2,p0,p1} yield p2--p0. Then you trivially convert those and now you have your SVG path.
As demonstrator, hit up https://editor.p5js.org/ and paste in the following code:
var points = [{x:150, y:100 },{x:50, y:300 },{x:300, y:300 }];
// add virtual points:
points = points.concat(points);
// slider that controls the curve tension (created in setup)
let tension;
function setup() {
  createCanvas(400, 400);
  tension = createSlider(1, 200, 100);
}
function draw() {
  background(220);
  points.forEach(p => ellipse(p.x, p.y, 4));
  for (let n=0; n<3; n++) {
    // c2--c3 is the segment being drawn; c1 and c4 are its neighbours
    let [c1, c2, c3, c4] = points.slice(n,n+4);
    let t = 0.06 * tension.value();
    bezier(
      // on-curve start point
      c2.x, c2.y,
      // control point 1
      c2.x + (c3.x - c1.x)/t,
      c2.y + (c3.y - c1.y)/t,
      // control point 2
      c3.x - (c4.x - c2.x)/t,
      c3.y - (c4.y - c2.y)/t,
      // on-curve end point
      c3.x, c3.y
    );
  }
}
Which will look like this:
Converting that to Python code should be an almost effortless exercise: there is barely any code for us to write =)
And, of course, now you're left with creating the SVG path, but that's hardly an issue: you know all the Bezier points now, so just start building your <path d=...> string while you iterate.
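A possible Python version of the p5.js demo, including the SVG path string, might look like the sketch below. The function names and the closed-loop indexing are my own choices; the tension parameter is assumed to play the role of the slider value divided by 100, so tension=1.0 matches the demo's default.

```python
def catmull_rom_to_bezier(points, tension=1.0):
    """Convert a closed Catmull-Rom loop through `points` to cubic Bezier segments.

    Returns one (start, control1, control2, end) tuple per input point.
    """
    n = len(points)
    t = 6.0 * tension  # same role as `0.06 * tension.value()` in the p5.js demo
    segments = []
    for i in range(n):
        # Neighbours wrap around because the curve is closed.
        p0, p1, p2, p3 = (points[(i + k - 1) % n] for k in range(4))
        c1 = (p1[0] + (p2[0] - p0[0]) / t, p1[1] + (p2[1] - p0[1]) / t)
        c2 = (p2[0] - (p3[0] - p1[0]) / t, p2[1] - (p3[1] - p1[1]) / t)
        segments.append((p1, c1, c2, p2))
    return segments


def svg_path(segments):
    """Build the `d` attribute of an SVG <path> from cubic Bezier segments."""
    start = segments[0][0]
    parts = [f"M {start[0]:g} {start[1]:g}"]
    for _, c1, c2, end in segments:
        parts.append(
            f"C {c1[0]:g} {c1[1]:g}, {c2[0]:g} {c2[1]:g}, {end[0]:g} {end[1]:g}"
        )
    parts.append("Z")
    return " ".join(parts)


# The three points from the question.
points = [(-1, 0), (0, 2), (2, 0)]
segments = catmull_rom_to_bezier(points)
path_d = svg_path(segments)
print(path_d)
```

The resulting string can be dropped into a path element's d attribute. Larger tension values pull the control points toward the on-curve points and make the loop tighter.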
A B-spline curve is just a collection of Bezier curves joined together. Therefore, it is certainly possible to convert it back to multiple Bezier curves without any loss of shape fidelity. The technique involved is called "knot insertion", and there are different ways to do it, the two best-known being Boehm's algorithm and the Oslo algorithm. You can refer to this link for more details.
Here is an almost direct answer to your question (but for the non-periodic case):
import aggdraw
import numpy as np
import scipy.interpolate as si
from PIL import Image

# from https://stackoverflow.com/a/35007804/2849934
def scipy_bspline(cv, degree=3):
    """ cv:     Array of control vertices
        degree: Curve degree
    """
    count = cv.shape[0]
    degree = np.clip(degree, 1, count-1)
    # clamped knot vector for the non-periodic case
    kv = np.clip(np.arange(count+degree+1)-degree, 0, count-degree)
    max_param = count - degree
    spline = si.BSpline(kv, cv, degree)
    return spline, max_param

# based on https://math.stackexchange.com/a/421572/396192
def bspline_to_bezier(cv):
    cv_len = cv.shape[0]
    assert cv_len >= 4, "Provide at least 4 control vertices"
    spline, max_param = scipy_bspline(cv, degree=3)
    # raise every interior knot to multiplicity 3 (knot insertion)
    for i in range(1, max_param):
        spline = si.insert(i, spline, 2)
    return spline.c[:3 * max_param + 1]

def draw_bezier(d, bezier):
    path = aggdraw.Path()
    path.moveto(*bezier[0])
    # each cubic segment uses two control points and one end point
    for i in range(1, len(bezier) - 1, 3):
        v1, v2, v = bezier[i:i+3]
        path.curveto(*v1, *v2, *v)
    d.path(path, aggdraw.Pen("black", 2))

cv = np.array([[ 40., 148.], [ 40.,  48.],
               [244.,  24.], [160., 120.],
               [240., 144.], [210., 260.],
               [110., 250.]])
im = Image.fromarray(np.ones((400, 400, 3), dtype=np.uint8) * 255)
bezier = bspline_to_bezier(cv)
d = aggdraw.Draw(im)
draw_bezier(d, bezier)
d.flush()
# show/save im
I didn't look much into the periodic case, but hopefully it's not too difficult.

Matlab gradient equivalent in opencv

I am trying to migrate some code from Matlab to Opencv and need an exact replica of the gradient function. I have tried the cv::Sobel function but for some reason the values in the resulting cv::Mat are not the same as the values in the Matlab version. I need the X and Y gradient in separate matrices for further calculations.
Any workaround that could achieve this would be great
Sobel can only compute the second derivative of the image pixel which is not what we want.
(f(i+1,j) + f(i-1,j) - 2f(i,j)) / 2
What we want is
(f(i+1,j)-f(i-1,j)) / 2
So we need to apply
Mat kernelx = (Mat_<float>(1,3)<<-0.5, 0, 0.5);
Mat kernely = (Mat_<float>(3,1)<<-0.5, 0, 0.5);
filter2D(src, fx, -1, kernelx);
filter2D(src, fy, -1, kernely);
Matlab treats border pixels differently from inner pixels, so the code above gives wrong values at the border. One can use BORDER_CONSTANT to extend the border with a constant value; unfortunately, the constant used by OpenCV is -1 and cannot be changed to 0 (which is what we want).
So as to border values, I do not have a very neat answer to it. Just try to compute the first derivative by hand...
You have to call Sobel 2 times, with arguments:
xorder = 1, yorder = 0
and
xorder = 0, yorder = 1
You have to select the appropriate kernel size.
See documentation
It might still be that the MatLab implementation was different, ideally you should retrieve which kernel was used there...
Edit:
If you need to specify your own kernel, you can use the more generic filter2D. Your destination depth will be CV_16S (16bit signed).
Matlab computes the gradient differently for interior rows and border rows (the same is true for the columns of course). At the borders, it is a simple forward difference gradY(1) = row(2) - row(1). The gradient for interior rows is computed by the central difference gradY(2) = (row(3) - row(1)) / 2.
I think you cannot achieve the same result with just running a single convolution filter over the whole matrix in OpenCV. Use cv::Sobel() with ksize = 1, then treat the borders (either manually or by applying a [ 1 -1 ] filter).
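The border rule described above (one-sided differences at the two ends, central differences in between) is easy to check on small inputs. The following is a plain-Python sketch of that rule, not OpenCV or Matlab code; the function names are illustrative.

```python
def gradient_1d(values):
    """Central differences inside, one-sided differences at both ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    grad = [0.0] * n
    grad[0] = values[1] - values[0]       # forward difference at the first element
    grad[-1] = values[-1] - values[-2]    # backward difference at the last element
    for i in range(1, n - 1):
        grad[i] = (values[i + 1] - values[i - 1]) / 2  # central difference
    return grad


def gradient_2d(matrix):
    """Return (gx, gy): differences along each row and along each column."""
    gx = [gradient_1d(row) for row in matrix]
    per_column = [gradient_1d(list(col)) for col in zip(*matrix)]
    gy = [list(row) for row in zip(*per_column)]  # transpose back to row layout
    return gx, gy


gx, gy = gradient_2d([[1, 2, 4],
                      [2, 4, 8],
                      [4, 8, 16]])
print(gx)
print(gy)
```

NumPy's np.gradient applies the same scheme by default, so it can serve as a reference when validating an OpenCV port.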
Pei's answer is partly correct. Matlab uses these calculations for the borders:
G(:,1) = A(:,2) - A(:,1);
G(:,N) = A(:,N) - A(:,N-1);
so I used the following OpenCV code to complete the gradient:
static cv::Mat kernelx = (cv::Mat_<double>(1, 3) << -0.5, 0, 0.5);
static cv::Mat kernely = (cv::Mat_<double>(3, 1) << -0.5, 0, 0.5);
cv::Mat fx, fy;
cv::filter2D(Image, fx, -1, kernelx, cv::Point(-1, -1), 0, cv::BORDER_REPLICATE);
cv::filter2D(Image, fy, -1, kernely, cv::Point(-1, -1), 0, cv::BORDER_REPLICATE);
fx.col(fx.cols - 1) *= 2;
fx.col(0) *= 2;
fy.row(fy.rows - 1) *= 2;
fy.row(0) *= 2;
Jorrit's answer is partly correct.
In some cases, the value of the directional derivative may be negative, and MATLAB will retain these negative numbers, but OpenCV Mat will set the negative number to 0.

Program for specific sequence of Integers

I am solving the steady-state heat equation with a boundary condition that varies like this: 10,0,0,10,0,0,10,0,0,10,0,0,10,... and so on, depending on the number of points I select.
I want to construct a matrix for these boundary conditions, but I am unable to specify the logic for the sequence in terms of the i-th element of a matrix.
I am using Mathematica for this; however, I only need the formula, in the way that odd numbers can be written as 2n+1 and even numbers as 2n, for the sequence 10,0,0,10,0,0,10,0,0,10,...
In MATLAB, it would be
M = zeros(1000, 1);
M(1:3:1000) = 10;
to make a vector of length 1000 with this structure. 1:3:1000 is 1,4,7,....
Since you specifically want a mathematical formula let me suggest a method:
seq = PadRight[{}, 30, {10, 0, 0}];
func = FindSequenceFunction[seq]
10/3 (1 + Cos[2/3 \[Pi] (-1 + #1)] + Cos[4/3 \[Pi] (-1 + #1)]) &
Test it:
Array[func, 10]
{10, 0, 0, 10, 0, 0, 10, 0, 0, 10}
There are surely simpler programs to generate this sequence, such as:
Array[10 Boole[1 == Mod[#, 3]] &, 10]
{10, 0, 0, 10, 0, 0, 10, 0, 0, 10}
A way to do this in Mathematica:
Take[Flatten[ConstantArray[{10, 0, 0}, Ceiling[1000/3] ], 1],1000]
Another way
Table[Boole[Mod[i,3]==1]*10, {i,1,1000}]
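For comparison, both kinds of formula from the answers above, the modulo test and the cosine closed form, can be written as functions of a 1-based index i. This is a small Python sketch with made-up function names.

```python
import math


def bc_mod(i):
    """10 when i = 1, 4, 7, ... (i mod 3 == 1), otherwise 0."""
    return 10 if i % 3 == 1 else 0


def bc_cos(i):
    """Closed form matching the FindSequenceFunction result above."""
    angle = 2 * math.pi / 3 * (i - 1)
    return 10 / 3 * (1 + math.cos(angle) + math.cos(2 * angle))


sequence = [bc_mod(i) for i in range(1, 11)]
print(sequence)
```

The cosine version agrees with the modulo version only up to floating-point rounding, so the modulo form is the safer choice when exact zeros matter in the boundary matrix.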