Survival probabilities in AFT survival model Pyspark

I would like to know how to calculate the survival probabilities in pyspark with the AFTSurvivalRegression method. I have seen this example on the web:
from pyspark.ml.regression import AFTSurvivalRegression
from pyspark.ml.linalg import Vectors
training = spark.createDataFrame([
    (1.218, 1.0, Vectors.dense(1.560, -0.605)),
    (2.949, 0.0, Vectors.dense(0.346, 2.158)),
    (3.627, 0.0, Vectors.dense(1.380, 0.231)),
    (0.273, 1.0, Vectors.dense(0.520, 1.151)),
    (4.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor", "features"])
quantileProbabilities = [0.3, 0.6]
aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
                            quantilesCol="quantiles")
model = aft.fit(training)
# Print the coefficients, intercept and scale parameter for AFT survival regression
print("Coefficients: " + str(model.coefficients))
print("Intercept: " + str(model.intercept))
print("Scale: " + str(model.scale))
model.transform(training).show(truncate=False)
But with this I can only predict the survival times. I can also get the quantiles for the given quantile probabilities, but I do not know exactly how they work. My question is: how can I get the probability that a person will survive up to a specific time?
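One possible way to get at this, sketched in plain Python: Spark's documentation describes AFTSurvivalRegression as a Weibull AFT model, so, assuming that parameterization, the `prediction` column plays the role of the Weibull scale parameter and `1 / model.scale` the shape parameter. Under that assumption the survival function is S(t) = exp(-(t / prediction)^(1 / scale)), and each entry of the `quantiles` column is the time t at which P(T <= t) equals the corresponding quantile probability. The function names and the numeric values below are hypothetical, chosen only for illustration.

```python
import math

def weibull_aft_survival(t, predicted_time, scale):
    """P(T > t) under a Weibull AFT model (assumed parameterization).

    predicted_time: value of the model's `prediction` column for one row
    scale:          the fitted `model.scale`
    """
    if t <= 0:
        return 1.0
    return math.exp(-((t / predicted_time) ** (1.0 / scale)))

def weibull_aft_quantile(p, predicted_time, scale):
    """Time t such that P(T <= t) = p, i.e. what a quantile entry represents."""
    return predicted_time * (-math.log(1.0 - p)) ** scale

# Hypothetical values standing in for one row's prediction and the fitted scale.
predicted_time = 5.7
scale = 1.5

t30 = weibull_aft_quantile(0.3, predicted_time, scale)
# At the 0.3 quantile, 30% have failed, so the survival probability is about 0.7.
print(t30, weibull_aft_survival(t30, predicted_time, scale))
```

In a Spark job the same formula could be applied to the `prediction` column with a column expression or a UDF; checking it against the `quantiles` column, as done above, is a quick way to confirm the assumed parameterization matches the fitted model.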

Related

getting 'StructField' object has no attribute '_get_object_id' on BinaryClassificationMetrics

I was trying to get the binary classification report in PySpark and ran into this error:
'StructField' object has no attribute '_get_object_id'
Here is my code:
%%spark
from pyspark.mllib.evaluation import BinaryClassificationMetrics
predictionAndLabels = test_pred.rdd.map(lambda Row : (float(Row['label']) , Row['prediction']))
metrics = BinaryClassificationMetrics(predictionAndLabels)
Also, based on the documentation, it apparently does not support the F1 measure, recall, etc. Any idea why, or how we can extract them without low-level coding?

I don't think you have to go that deep. Taking the example data from the binary classification section of the documentation you linked, and assuming a cutoff threshold of p = 0.5, you can do something like this:
# precision = tp / (tp + fp)
# recall = tp / (tp + fn)
# f1 = 2 * precision * recall / (precision + recall)
from pyspark.sql.functions import col
# each tuple is (score, label)
scoreAndLabels = sc.parallelize([(0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0), (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)], 2)
df = scoreAndLabels.toDF()
threshold = 0.5
# true positives: predicted positive, actually positive
tp = df.where((col('_1') >= threshold) & (col('_2') == 1.0)).count()
# false positives: predicted positive, actually negative
fp = df.where((col('_1') >= threshold) & (col('_2') == 0.0)).count()
# false negatives: predicted negative, actually positive
fn = df.where((col('_1') < threshold) & (col('_2') == 1.0)).count()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)
This returns f1 = 0.75.
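As a Spark-free cross-check of those counts, here is a minimal plain-Python sketch over the same (score, label) pairs. The helper name `binary_metrics` is made up for illustration, and it assumes a score at or above the threshold counts as a positive prediction.

```python
def binary_metrics(score_and_labels, threshold=0.5):
    """Return (precision, recall, f1) for (score, label) pairs."""
    tp = fp = fn = 0
    for score, label in score_and_labels:
        predicted_positive = score >= threshold
        if predicted_positive and label == 1.0:
            tp += 1          # predicted positive, actually positive
        elif predicted_positive and label == 0.0:
            fp += 1          # predicted positive, actually negative
        elif not predicted_positive and label == 1.0:
            fn += 1          # predicted negative, actually positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

pairs = [(0.1, 0.0), (0.1, 1.0), (0.4, 0.0), (0.6, 0.0),
         (0.6, 1.0), (0.6, 1.0), (0.8, 1.0)]
print(binary_metrics(pairs))  # (0.75, 0.75, 0.75)
```

Running it with other thresholds shows how the three numbers move as the cutoff changes.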

How to create Bezier curves from B-Splines in Sympy?

I need to draw a smooth curve through some points, which I then want to show as an SVG path. So I create a B-Spline with scipy.interpolate, and can access some arrays that I suppose fully define it. Does someone know a reasonably simple way to create Bezier curves from these arrays?
import numpy as np
from scipy import interpolate
x = np.array([-1, 0, 2])
y = np.array([ 0, 2, 0])
x = np.r_[x, x[0]]
y = np.r_[y, y[0]]
tck, u = interpolate.splprep([x, y], s=0, per=True)
cx = tck[1][0]
cy = tck[1][1]
print( 'knots: ', list(tck[0]) )
print( 'coefficients x: ', list(cx) )
print( 'coefficients y: ', list(cy) )
print( 'degree: ', tck[2] )
print( 'parameter: ', list(u) )
The red points are the 3 initial points in x and y. The green points are the 6 coefficients in cx and cy. (Their values repeat after the 3rd, so each green point has two green index numbers.)
The return values tck and u are described in the scipy.interpolate.splprep documentation.
knots: [-1.0, -0.722, -0.372, 0.0, 0.277, 0.627, 1.0, 1.277, 1.627, 2.0]
#                      0       1       2       3       4       5
coefficients x: [ 3.719, -2.137, -0.053,  3.719, -2.137, -0.053]
coefficients y: [-0.752, -0.930,  3.336, -0.752, -0.930,  3.336]
degree: 3
parameter: [0.0, 0.277, 0.627, 1.0]

I'm not sure starting with a B-spline makes sense: why not form a Catmull-Rom curve through the points (with the virtual "before first" and "after last" points overlaid on real points) and then convert that to a Bezier curve using a relatively trivial transform? A Catmull-Rom segment defined on four points is drawn between the middle two, so given your points p0, p1, and p2, the curve {p2,p0,p1,p2} yields the segment p0--p1, {p0,p1,p2,p0} yields p1--p2, and {p1,p2,p0,p1} yields p2--p0. Then you trivially convert those and you have your SVG path.
As a demonstration, go to https://editor.p5js.org/ and paste in the following code:
var points = [{x:150, y:100 },{x:50, y:300 },{x:300, y:300 }];
// add virtual points:
points = points.concat(points);
function setup() {
  createCanvas(400, 400);
  tension = createSlider(1, 200, 100);
}
function draw() {
  background(220);
  points.forEach(p => ellipse(p.x, p.y, 4));
  for (let n=0; n<3; n++) {
    let [c1, c2, c3, c4] = points.slice(n,n+4);
    let t = 0.06 * tension.value();
    bezier(
      // on-curve start point
      c2.x, c2.y,
      // control point 1
      c2.x + (c3.x - c1.x)/t,
      c2.y + (c3.y - c1.y)/t,
      // control point 2
      c3.x - (c4.x - c2.x)/t,
      c3.y - (c4.y - c2.y)/t,
      // on-curve end point
      c3.x, c3.y
    );
  }
}
Converting that to Python code should be an almost effortless exercise: there is barely any code for us to write =)
And, of course, now you're left with creating the SVG path, but that's hardly an issue: you know all the Bezier points now, so just start building your <path d=...> string while you iterate.
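A possible Python version of that conversion, including the path string, might look like the sketch below. It assumes a closed loop through the points and uses a fixed divisor of 6, which corresponds to the slider's default position in the p5.js code (0.06 * 100). The function names are made up for illustration.

```python
def catmull_rom_to_bezier(points, divisor=6.0):
    """Return one cubic Bezier (start, ctrl1, ctrl2, end) per segment of a
    closed Catmull-Rom loop through `points` (a list of (x, y) tuples)."""
    n = len(points)
    segments = []
    for i in range(n):
        p0 = points[(i - 1) % n]   # neighbour before the segment
        p1 = points[i]             # segment start (on curve)
        p2 = points[(i + 1) % n]   # segment end (on curve)
        p3 = points[(i + 2) % n]   # neighbour after the segment
        c1 = (p1[0] + (p2[0] - p0[0]) / divisor,
              p1[1] + (p2[1] - p0[1]) / divisor)
        c2 = (p2[0] - (p3[0] - p1[0]) / divisor,
              p2[1] - (p3[1] - p1[1]) / divisor)
        segments.append((p1, c1, c2, p2))
    return segments

def svg_path(segments):
    """Build the `d` attribute of an SVG <path> from cubic Bezier segments."""
    start = segments[0][0]
    parts = [f"M {start[0]:g} {start[1]:g}"]
    for _, c1, c2, end in segments:
        parts.append(f"C {c1[0]:g} {c1[1]:g}, {c2[0]:g} {c2[1]:g}, "
                     f"{end[0]:g} {end[1]:g}")
    parts.append("Z")
    return " ".join(parts)

points = [(150, 100), (50, 300), (300, 300)]   # same points as the p5.js demo
segments = catmull_rom_to_bezier(points)
path_d = svg_path(segments)
print(f'<path d="{path_d}" fill="none" stroke="black"/>')
```

A larger divisor pulls the control points toward the on-curve points and flattens the segments, which is what moving the slider does in the demo.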

A B-spline curve is just a collection of Bezier curves joined together. Therefore, it is certainly possible to convert it back to multiple Bezier curves without any loss of shape fidelity. The technique involved is called "knot insertion", and there are different ways to do it, the two best-known being Boehm's algorithm and the Oslo algorithm. You can refer to this link for more details.
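For the special case of a closed cubic B-spline with uniform (equally spaced) knots, knot insertion reduces to fixed weights on four neighbouring control points, so the Bezier pieces can be written down directly. The sketch below assumes exactly that case. Note that splprep places knots from chord-length parameters by default, so this illustrates the idea and is not a drop-in conversion of its output. The function name and the control polygon are hypothetical.

```python
def closed_uniform_bspline_to_bezier(control_points):
    """Split a closed uniform cubic B-spline into cubic Bezier segments.

    control_points: list of (x, y) tuples, treated as a periodic polygon.
    Returns one (start, ctrl1, ctrl2, end) tuple per control point.
    """
    n = len(control_points)
    segments = []
    for i in range(n):
        a = control_points[(i - 1) % n]
        b = control_points[i]
        c = control_points[(i + 1) % n]
        d = control_points[(i + 2) % n]
        # Standard uniform cubic B-spline to Bezier weights.
        start = tuple((a[k] + 4 * b[k] + c[k]) / 6 for k in range(2))
        ctrl1 = tuple((2 * b[k] + c[k]) / 3 for k in range(2))
        ctrl2 = tuple((b[k] + 2 * c[k]) / 3 for k in range(2))
        end = tuple((b[k] + 4 * c[k] + d[k]) / 6 for k in range(2))
        segments.append((start, ctrl1, ctrl2, end))
    return segments

# Hypothetical control polygon (a square), used only to exercise the function.
square = [(0.0, 0.0), (6.0, 0.0), (6.0, 6.0), (0.0, 6.0)]
bezier_segments = closed_uniform_bspline_to_bezier(square)
for seg in bezier_segments:
    print(seg)
```

Each segment ends where the next one starts, so the pieces can be chained into a single SVG path with one `M` command followed by one `C` command per segment.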

Here is an almost direct answer to your question (but for the non-periodic case):
import aggdraw
import numpy as np
import scipy.interpolate as si
from PIL import Image
# from https://stackoverflow.com/a/35007804/2849934
def scipy_bspline(cv, degree=3):
    """ Build a clamped (non-periodic) B-spline.
        cv: Array of control vertices
        degree: Curve degree
    """
    count = cv.shape[0]
    degree = np.clip(degree, 1, count-1)
    # clamped knot vector: the end knots are repeated so the curve touches the end vertices
    kv = np.clip(np.arange(count+degree+1)-degree, 0, count-degree)
    # non-periodic case: the curve parameter runs from 0 to count - degree
    max_param = count - degree
    spline = si.BSpline(kv, cv, degree)
    return spline, max_param

# based on https://math.stackexchange.com/a/421572/396192
def bspline_to_bezier(cv):
    cv_len = cv.shape[0]
    assert cv_len >= 4, "Provide at least 4 control vertices"
    spline, max_param = scipy_bspline(cv, degree=3)
    # insert every interior knot twice so that each has multiplicity 3
    for i in range(1, max_param):
        spline = si.insert(i, spline, 2)
    return spline.c[:3 * max_param + 1]

def draw_bezier(d, bezier):
    path = aggdraw.Path()
    path.moveto(*bezier[0])
    # consume three points (two controls and an end point) per cubic segment
    for i in range(1, len(bezier) - 1, 3):
        v1, v2, v = bezier[i:i+3]
        path.curveto(*v1, *v2, *v)
    d.path(path, aggdraw.Pen("black", 2))

cv = np.array([[ 40., 148.], [ 40., 48.],
[244., 24.], [160., 120.],
[240., 144.], [210., 260.],
[110., 250.]])
im = Image.fromarray(np.ones((400, 400, 3), dtype=np.uint8) * 255)
bezier = bspline_to_bezier(cv)
d = aggdraw.Draw(im)
draw_bezier(d, bezier)
d.flush()
# show/save im
I didn't look much into the periodic case, but hopefully it's not too difficult.

Basemap plus 3d graph

Hello Stack Overflow folks,
I'm an enthusiastic Python learner.
I have been studying Python to visualize my personal project about population density.
I have gone through tutorials on matplotlib and basemap in Python.
I came up with the idea of mapping my 3-dimensional graph on top of a basemap, which would let me use geographical coordinate information.
Can anyone tell me how I could use basemap as the base plane for a 3-dimensional graph?
Please let me know which tutorials or references I could use to develop this.
Best,
Thank you as always, Stack Overflow folks.

The basemap documentation has a small section on 3D plotting. Here's a simple script to get you started:
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.close('all')
fig = plt.figure()
# recent Matplotlib releases no longer accept fig.gca(projection='3d'); add_subplot creates the 3D axes
ax = fig.add_subplot(projection='3d')
extent = [-127, -65, 25, 51]
# make the map and axis.
m = Basemap(llcrnrlon=extent[0], llcrnrlat=extent[2],
            urcrnrlon=extent[1], urcrnrlat=extent[3],
            projection='cyl', resolution='l', fix_aspect=False, ax=ax)
ax.add_collection3d(m.drawcoastlines(linewidth=0.25))
ax.add_collection3d(m.drawcountries(linewidth=0.25))
ax.add_collection3d(m.drawstates(linewidth=0.25))
ax.view_init(azim = 230, elev = 15)
ax.set_xlabel(u'Longitude (°E)', labelpad=10)
ax.set_ylabel(u'Latitude (°N)', labelpad=10)
ax.set_zlabel(u'Altitude (ft)', labelpad=20)
# values to plot - change as needed. Plots 2 dots, one at elevation 0 and another 100.
# also draws a line between the two.
x, y = m(-85.4808, 32.6099)
ax.plot3D([x, x], [y, y], [0, 100], color = 'green', lw = 0.5)
ax.scatter3D(x, y, 100, s = 5, c = 'k', zorder = 4)
ax.scatter3D(x, y, 0, s = 2, c = 'k', zorder = 4)
ax.set_zlim(0., 400.)
plt.show()

Train HMM using MALLET

I am very new to MALLET. I need an HMM library for a sequence labelling task. I have already looked at the Sequence Tagging Developer's Guide, but I am unable to understand how to train an HMM. I have a list of hidden states, a list of observation symbols, an initial probability matrix, a transition probability matrix and an emission probability matrix. I need to train the HMM with the Baum-Welch (B-W) algorithm to re-estimate the parameters, and then perform the sequence labelling task using those parameters.
As an example, I have the following values:
hidden_states = ('Rainy', 'Sunny')
observation_symbols = ('walk', 'shop', 'clean')
initial_probability = {'Rainy': 0.6, 'Sunny': 0.4}
transition_probability = {
'Rainy' : {'Rainy': 0.7, 'Sunny': 0.3},
'Sunny' : {'Rainy': 0.4, 'Sunny': 0.6},
}
emission_probability = {
'Rainy' : {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
'Sunny' : {'walk': 0.6, 'shop': 0.3, 'clean': 0.1},
}
observation_sequence = [
    ('walk', 'clean', 'shop'),
    ('clean', 'walk', 'shop'),
]
How can I train HMM using the above parameters? Please help.
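To make concrete what Baum-Welch re-estimation computes on values like these, here is a minimal plain-Python sketch of the algorithm itself. It is not MALLET code and does not use MALLET's API. The function names are made up for illustration, and there is no scaling of the forward and backward values, so it is only suitable for short sequences.

```python
import math

def forward(obs, states, start_p, trans_p, emit_p):
    """alpha[t][s] = P(obs[0..t], state at time t is s)."""
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: emit_p[s][o] * sum(prev[r] * trans_p[r][s] for r in states)
                      for s in states})
    return alpha

def backward(obs, states, trans_p, emit_p):
    """beta[t][s] = P(obs[t+1..] | state at time t is s)."""
    beta = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {s: sum(trans_p[s][r] * emit_p[r][o] * nxt[r] for r in states)
                        for s in states})
    return beta

def baum_welch_step(sequences, states, symbols, start_p, trans_p, emit_p):
    """One EM iteration; returns new parameters and the log-likelihood
    of the data under the parameters that were passed in."""
    start_acc = {s: 0.0 for s in states}
    trans_acc = {s: {r: 0.0 for r in states} for s in states}
    emit_acc = {s: {o: 0.0 for o in symbols} for s in states}
    log_likelihood = 0.0
    for obs in sequences:
        alpha = forward(obs, states, start_p, trans_p, emit_p)
        beta = backward(obs, states, trans_p, emit_p)
        likelihood = sum(alpha[-1].values())
        log_likelihood += math.log(likelihood)
        for t, o in enumerate(obs):
            for s in states:
                # gamma: posterior probability of being in state s at time t
                gamma = alpha[t][s] * beta[t][s] / likelihood
                if t == 0:
                    start_acc[s] += gamma
                emit_acc[s][o] += gamma
                if t + 1 < len(obs):
                    for r in states:
                        # xi: posterior probability of the transition s -> r at time t
                        trans_acc[s][r] += (alpha[t][s] * trans_p[s][r]
                                            * emit_p[r][obs[t + 1]]
                                            * beta[t + 1][r] / likelihood)
    new_start = {s: start_acc[s] / len(sequences) for s in states}
    new_trans = {s: {r: trans_acc[s][r] / sum(trans_acc[s].values()) for r in states}
                 for s in states}
    new_emit = {s: {o: emit_acc[s][o] / sum(emit_acc[s].values()) for o in symbols}
                for s in states}
    return new_start, new_trans, new_emit, log_likelihood

states = ('Rainy', 'Sunny')
symbols = ('walk', 'shop', 'clean')
start_p = {'Rainy': 0.6, 'Sunny': 0.4}
trans_p = {'Rainy': {'Rainy': 0.7, 'Sunny': 0.3},
           'Sunny': {'Rainy': 0.4, 'Sunny': 0.6}}
emit_p = {'Rainy': {'walk': 0.1, 'shop': 0.4, 'clean': 0.5},
          'Sunny': {'walk': 0.6, 'shop': 0.3, 'clean': 0.1}}
sequences = [('walk', 'clean', 'shop'), ('clean', 'walk', 'shop')]

log_likelihoods = []
for _ in range(3):
    start_p, trans_p, emit_p, ll = baum_welch_step(
        sequences, states, symbols, start_p, trans_p, emit_p)
    log_likelihoods.append(ll)
print(log_likelihoods)  # should not decrease from one iteration to the next
print(trans_p)
```

The given probabilities serve as the starting point, each iteration re-estimates them from the observation sequences, and the resulting parameters can then be used for labelling (for example with the Viterbi algorithm).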

How to specify edge length in Networkx based off of edge weight

import networkx as nx
import numpy as np
import pylab as plt
#Generate graph with 4 nodes
erdo_graph = nx.erdos_renyi_graph(4, 0.7, directed=False)
#Add edge weights
for u,v,d in erdo_graph.edges(data=True):
    # p needs one probability per candidate value (six here, summing to 1)
    d['weight'] = np.random.choice(np.arange(1, 7), p=[0.1, 0.05, 0.05, 0.2,])
#If you want to print out the edge weights:
labels = nx.get_edge_attributes(erdo_graph,'weight')
print("Here are the edge weights: ", labels)
#Following "Networkx Spring Layout with different edge values" link that you supplied:
# erdos_renyi_graph(4) labels its nodes 0..3, so the initial positions use those keys
initialpos = {0:(0,0), 1:(0,3), 2:(0,-1), 3:(5,5)}
pos = nx.spring_layout(erdo_graph, weight='weight', pos=initialpos)
nx.draw_networkx(erdo_graph, pos)
plt.show()
Whenever I try to position the nodes based off of a layout, I expect nodes that are connected by a lower edge weight to be closer to each other, but that doesn't seem to be the case.
