Which algorithm is the best based on P-Values - classification

I used the following code:
plt.figure(figsize = (7, 7))
plt.boxplot([totalP['poly'], totalP['rbf'], totalP['linear'], totalP['gf']])
plt.xticks(np.arange(1, 5), kernels)
plt.title('P values for each svm kernel')
plt.xlabel('SVM kernel')
plt.ylabel('P values Rate')
plt.ioff()
plt.savefig('images/pValues.png')
plt.show()
Which of the algorithms do we consider the best (in terms of t-values and p-values), and why?
Is it the one nearest to 1 or the one nearest to 0?

How is RMSD calculated in the Scipy implementation of the Kabsch algorithm?

Scipy calculates the rmsd like this, and I'll paraphrase it here for convenience (for readability I drop the weights and the max(*, 0))
rmsd = np.sqrt(np.sum(b ** 2 + a ** 2) - 2 * np.sum(s))
To me this does not look like RMSD.
Now from the docs one would infer that the rmsd return value is defined as the square root of double this expression:
The latter is indeed what I would consider to be the RMSD. In fact I went ahead and coded it up (note that this function expects me to apply the estimated transformation to one of the sets of points first whereas the snippet above does not)
def _calc_rmsd(a: np.ndarray, b_transformed: np.ndarray) -> float:
    distances = np.linalg.norm(a - b_transformed, axis=-1)
    rmsd = np.sqrt((distances ** 2).sum() / len(distances))
    return rmsd
I also plotted out what these would look like for randomly generated point pairs with normally distributed noise (blue is scipy, orange is mine)
Or extending the plot out to 200 point pairs:
So to sum it up:
The definition of rmsd in the docs agrees with what I believe to be the widely accepted notion of RMSD.
The SciPy implementation of rmsd disagrees with the latter; I don't even understand what it's supposed to represent mathematically.
From Monte Carlo simulations, the two implementations clearly have different outcomes.
So what's going on?
Apparently the SciPy code is not returning the root-mean-squared distance. It sums the squared differences, but it does not divide by the number of vectors before taking the square root. The difference between the SciPy calculation and yours is a factor of sqrt(len(a)). You can verify this with an example such as the following.
In [157]: from scipy.spatial.transform import Rotation
In [158]: def _calc_rmsd(a: np.ndarray, b_transformed: np.ndarray) -> float:
     ...:     distances = np.linalg.norm(a - b_transformed, axis=-1)
     ...:     rmsd = np.sqrt((distances ** 2).sum() / len(distances))
     ...:     return rmsd
     ...:
Some test data:
In [159]: a = np.array([[0, 1, 1], [1, 1, 1.5], [2.0, -1.0, 4.0], [-1, 0, 5]])
In [160]: b = np.array([[0, 1, 1.5], [2, 2, 2], [1, -1, 5], [-3, 0.1, 1]])
Compute the rotation:
In [161]: R, rmsd = Rotation.align_vectors(a, b)
In [162]: rmsd
Out[162]: 3.8753534834716685
Here's your calculation of the RMSD:
In [163]: _calc_rmsd(a, R.apply(b))
Out[163]: 1.9376767417358356
And here is your calculation, multiplied by sqrt(len(a)), so it matches the result returned by Rotation.align_vectors:
In [164]: _calc_rmsd(a, R.apply(b)) * np.sqrt(len(a))
Out[164]: 3.875353483471671
This looks like a documentation issue. If you have a moment, you could create a new issue for this over in https://github.com/scipy/scipy/issues
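In the meantime, if what you need is the conventional RMSD, a minimal workaround under the current behavior (a sketch, assuming a and b are the (N, 3) arrays passed to align_vectors) is to divide the returned value by sqrt(N):

import numpy as np
from scipy.spatial.transform import Rotation

a = np.array([[0, 1, 1], [1, 1, 1.5], [2.0, -1.0, 4.0], [-1, 0, 5]])
b = np.array([[0, 1, 1.5], [2, 2, 2], [1, -1, 5], [-3, 0.1, 1]])

R, scipy_value = Rotation.align_vectors(a, b)

# The value returned above is the root *sum* of squared distances; dividing
# by sqrt(N) converts it to the root *mean* squared distance.
rmsd = scipy_value / np.sqrt(len(a))
print(rmsd)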

Neural Network Always Predicting Average Value

I'm trying to train a neural network to approximate a known scalar function of two variables; however, no matter the training parameters, the network always ends up simply predicting the average value of the true outputs.
I am using an MLP and have tried:
using several network depths and widths
different optimizers (SGD and ADAM)
different activations (ReLU and Sigmoid)
changing the learning rate (several points within the range 0.1 to 0.001)
increasing the data (to 10,000 points)
increasing the number of epochs (to 2,000)
and different random seeds
to no avail.
My loss function is MSE and always plateaus to a value of about 5.14.
Regardless of changes I make, I get the following results:
Here the blue surface is the function to be approximated, and the green surface is the MLP approximation of the function, which takes a value roughly equal to the average of the true function over that domain (the true average is 2.15, whose square, 4.64, is not far from the loss plateau value).
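As a sanity check (just a sketch; y stands for the array of true outputs produced by generate_data in the code below), the MSE of the best constant predictor is simply the variance of y, which can be compared against the plateau value:

import jax.numpy as jnp

def constant_baseline_mse(y):
    # MSE of the best constant predictor (the mean of y), i.e. the variance of y.
    # A training loss that plateaus near this value suggests the network is
    # effectively predicting the average.
    return jnp.mean((y - jnp.mean(y)) ** 2)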
I feel like I could be missing something very obvious and have just been looking at it for too long. Any help is greatly appreciated! Thanks
I've attached my code here (I'm using JAX):
import jax.numpy as jnp
from jax import grad, jit, vmap, random, value_and_grad
from typing import Sequence
import flax
import flax.linen as nn
import optax

seed = 2
key, data_key = random.split(random.PRNGKey(seed))
x1, x2, y = generate_data(data_key)  # Data generation function

# Using Flax - define an MLP
class MLP(nn.Module):
    features: Sequence[int]

    @nn.compact
    def __call__(self, x):
        for feat in self.features[:-1]:
            x = nn.relu(nn.Dense(feat)(x))
        x = nn.Dense(self.features[-1])(x)
        return x

# Define function that returns JITted loss function
def make_mlp_loss(input_data, true_y):
    def mlp_loss(params):
        pred_y = model.apply(params, input_data)
        loss_vector = jnp.square(true_y.reshape(-1) - pred_y)
        return jnp.average(loss_vector)
    # Outer-scope encapsulation saves the data and true output
    return jit(mlp_loss)

# Concatenate independent variable vectors to be proper input shape
input_data = jnp.hstack((x1.reshape(-1, 1), x2.reshape(-1, 1)))

# Create loss function with data and true output
mlp_loss = make_mlp_loss(input_data, y)

# Create function that returns loss and gradient
loss_and_grad = value_and_grad(mlp_loss)

# Example architectures I've tried
architectures = [[16, 16, 1], [8, 16, 1], [16, 8, 1], [8, 16, 8, 1], [32, 32, 1]]

# Only using one seed but iterated over several
for seed in [645]:
    for architecture in architectures:
        # Create model
        model = MLP(architecture)

        # Initialize model with random parameters
        key, params_key = random.split(key)
        dummy = jnp.ones((1000, 2))
        params = model.init(params_key, dummy)

        # Create optimizer
        opt = optax.adam(learning_rate=0.01)  # sgd
        opt_state = opt.init(params)

        epochs = 50
        for i in range(epochs):
            # Get loss and gradient
            curr_loss, curr_grad = loss_and_grad(params)
            if i % 5 == 0:
                print(curr_loss)
            # Update
            updates, opt_state = opt.update(curr_grad, opt_state)
            params = optax.apply_updates(params, updates)
        print(f"Architecture: {architecture}\nLoss: {curr_loss}\nSeed: {seed}\n\n")

Normcdf() input arguments

I am trying to solve:
Use the technique of importance sampling via tilted densities to estimate the quantity theta = Pr{X > a}, where X is a standard normal random variable. Generate estimates for a = 1, 2, 3, 4, 5. Use sample sizes 1e3, 1e5, and 1e7 in each estimation.
and my teacher gave the example code:
n2 = 1e7;
for a2 = 1:5
    x2 = rand(1, n2(1)) + a2;
    theta2(a2) = mean((x2 > a2) .* exp(a2^2/2 - a2*x2));
end
tp2 = 1 - normcdf(1:5);
abs(tp2 - theta2) ./ tp2
but when I attempt to run it I get:
Not enough input arguments.
Error in normcdf (line 16) cdf = 0.5 * erfc(-(x-mu)/(sigma*sqrt(2)));
Error in HW10 (line 18) tp2=1-normcdf(1:5);
But he didn't have any errors come up in class. I can't see what I am missing to run this.
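For reference, here is how I understand the tilted-density estimator, written as a rough Python sketch (illustrative only, not the code above; note it samples from the tilted density N(a, 1), i.e. the analogue of randn(1,n)+a):

import numpy as np
from scipy.stats import norm

# Exponential tilting for a standard normal: the tilted density is N(a, 1),
# and the likelihood ratio f(x)/f_a(x) is exp(a^2/2 - a*x), so
# theta = P(X > a) = E[(Y > a) * exp(a^2/2 - a*Y)] with Y ~ N(a, 1).
n = 10**5
for a in range(1, 6):
    y = np.random.randn(n) + a                        # draws from N(a, 1)
    theta = np.mean((y > a) * np.exp(a**2 / 2 - a * y))
    exact = 1 - norm.cdf(a)
    print(a, theta, abs(exact - theta) / exact)       # relative error vs normcdf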

Scipy.curve_fit() vs. Matlab fit() weighted nonlinear least squares

I have a Matlab reference routine that I am trying to convert to numpy/scipy. I have encountered a curve fitting problem that I cannot solve in Python, so here is a simple example which demonstrates the problem. The data is completely synthetic and not part of the problem.
Let's say I'm trying to fit a straight-line model of noisy data -
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0.1075, 1.3668, 1.5482, 3.1724, 4.0638, 4.7385, 5.9133, 7.0685, 8.7157, 9.5539]
For the unweighted solution in Matlab, I would code
g = @(m, b, x)(m*x + b)
f = fittype(g)
bestfit = fit(x, y, g)
which produces a solution of bestfit.m = 1.048, bestfit.b = -0.09219
Running this data through scipy.optimize.curve_fit() produces identical results.
If instead the fit uses a decay function to reduce the impact of data points
dw = [0.7290, 0.5120, 0.3430, 0.2160, 0.1250, 0.0640, 0.0270, 0.0080, 0.0010, 0]
weightedfit = fit(x, y, g, 'Weights', dw)
This produces a slope of 0.944 and an offset of 0.1484.
I have not figured out how to conjure this result from scipy.optimize.curve_fit using the sigma parameter. If I pass the weights as provided to Matlab, the '0' causes a divide-by-zero exception. Clearly Matlab and scipy think very differently about the meaning of the weights in the underlying optimization routine. Is there a simple way of converting between the two that allows me to provide a weighting function which produces identical results?
Ok, so after further investigation I can offer the answer, at least for this simple example.
import numpy as np
import scipy as sp
import scipy.optimize

def modelFun(x, m, b):
    return m * x + b

def testFit():
    w = np.diag([1.0, 1/0.7290, 1/0.5120, 1/0.3430, 1/0.2160,
                 1/0.1250, 1/0.0640, 1/0.0270, 1/0.0080, 1/0.0010])
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([0.1075, 1.3668, 1.5482, 3.1724, 4.0638,
                  4.7385, 5.9133, 7.0685, 8.7157, 9.5539])
    popt = sp.optimize.curve_fit(modelFun, x, y, sigma=w)
    print(popt[0])
    print(popt[1])
Which produces the desired result.
In order to force sp.optimize.curve_fit to minimize the same chisq metric as Matlab using the curve fitting toolbox, you must do two things:
Use the reciprocal of the weight factors
Create a diagonal matrix from the new weight factors. According to the scipy reference:
sigma : None or M-length sequence or MxM array, optional
Determines the uncertainty in ydata. If we define residuals as r = ydata - f(xdata, *popt), then the interpretation of sigma depends on its number of dimensions:
A 1-D sigma should contain values of standard deviations of errors in ydata. In this case, the optimized function is chisq = sum((r / sigma) ** 2).
A 2-D sigma should contain the covariance matrix of errors in ydata. In this case, the optimized function is chisq = r.T @ inv(sigma) @ r.
New in version 0.19.
None (default) is equivalent of a 1-D sigma filled with ones.
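Equivalently, the same weighted objective can be expressed with a 1-D sigma: MATLAB's 'Weights' minimize sum(w * r**2), while a 1-D sigma minimizes sum((r / sigma)**2), so sigma = 1/sqrt(w) should give the same fit once the zero-weight point is dropped (it contributes nothing to the weighted sum anyway). A sketch under that assumption:

import numpy as np
from scipy.optimize import curve_fit

def modelFun(x, m, b):
    return m * x + b

x = np.arange(10, dtype=float)
y = np.array([0.1075, 1.3668, 1.5482, 3.1724, 4.0638,
              4.7385, 5.9133, 7.0685, 8.7157, 9.5539])
dw = np.array([0.7290, 0.5120, 0.3430, 0.2160, 0.1250,
               0.0640, 0.0270, 0.0080, 0.0010, 0.0])

# Drop the zero-weight point and convert weights to standard deviations.
mask = dw > 0
sigma = 1.0 / np.sqrt(dw[mask])   # chisq = sum((r/sigma)**2) = sum(dw * r**2)

popt, pcov = curve_fit(modelFun, x[mask], y[mask], sigma=sigma)
print(popt)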

In Tensorflow, What kind of neural network should I use?

I am working through the TensorFlow tutorial to get a sense of what TF is, but I am confused about which neural network I should use for my work.
I am looking at single-layer neural networks, CNNs, RNNs, and LSTM RNNs.
There is a sensor which measures something and represents the result as one of two boolean values. Here they are Blue and Red, like this:
The sensor gives a result every 5 minutes. If we pile up the values for each color, we can see some patterns:
The number inside each circle represents the order in which the results were given by the sensor (107 was given right after 106). Looking at results 122 to 138, you can see a mirror-image (decalcomania-like) pattern.
I want to predict the next boolean value before the sensor produces it. I could do supervised learning using past results, but I'm not sure which neural network or method is suitable. Since this task needs to use patterns from past results (it has to see the context) and to memorize them, maybe an LSTM RNN (long short-term memory recurrent neural network) would be the suitable one. Could you tell me which is the right choice?
So it sounds like you need to process a sequence of images. You could actually use both a CNN and an RNN together. I did this a month ago when I was training a network to swipe left or right on Tinder using the sequence of profile pictures. What you would do is pass all of the images through a CNN and then into the RNN. Below is part of the code for my Tinder bot. See how I distribute the convolutions over the sequence and then push the result through the RNN. Finally, I put a softmax classifier on the last time step to make the prediction; however, in your case I think you will distribute the prediction in time, since you want the next item in the sequence.
self.input_tensor = tf.placeholder(tf.float32, (None, self.max_seq_len, self.img_height, self.img_width, 3), 'input_tensor')
self.expected_classes = tf.placeholder(tf.int64, (None,))
self.is_training = tf.placeholder_with_default(False, None, 'is_training')
self.learning_rate = tf.placeholder(tf.float32, None, 'learning_rate')
self.tensors = {}
activation = tf.nn.elu

rnn = tf.nn.rnn_cell.LSTMCell(256)
with tf.variable_scope('series') as scope:
    state = rnn.zero_state(tf.shape(self.input_tensor)[0], tf.float32)
    for t, img in enumerate(reversed(tf.unpack(self.input_tensor, axis=1))):
        y = tf.map_fn(tf.image.per_image_whitening, img)
        features = 48
        for c_layer in range(3):
            with tf.variable_scope('pool_layer_%d' % c_layer):
                with tf.variable_scope('conv_1'):
                    filter = tf.get_variable('filter', (3, 3, y.get_shape()[-1].value, features))
                    b = tf.get_variable('b', (features,))
                    y = tf.nn.conv2d(y, filter, (1, 1, 1, 1), 'SAME') + b
                    y = activation(y)
                    self.tensors['img_%d_conv_%d' % (t, 2 * c_layer)] = y
                with tf.variable_scope('conv_2'):
                    filter = tf.get_variable('filter', (3, 3, y.get_shape()[-1].value, features))
                    b = tf.get_variable('b', (features,))
                    y = tf.nn.conv2d(y, filter, (1, 1, 1, 1), 'SAME') + b
                    y = activation(y)
                    self.tensors['img_%d_conv_%d' % (t, 2 * c_layer + 1)] = y
                y = tf.nn.max_pool(y, (1, 3, 3, 1), (1, 3, 3, 1), 'SAME')
                self.tensors['pool_%d' % c_layer] = y
                features *= 2
        print(y.get_shape())
        with tf.variable_scope('rnn'):
            y = tf.reshape(y, (-1, np.prod(y.get_shape().as_list()[1:])))
            y, state = rnn(y, state)
            self.tensors['rnn_%d' % t] = y
        scope.reuse_variables()
with tf.variable_scope('output_classifier'):
    W = tf.get_variable('W', (y.get_shape()[-1].value, 2))
    b = tf.get_variable('b', (2,))
    y = tf.nn.dropout(y, tf.select(self.is_training, 0.5, 1.0))
    y = tf.matmul(y, W) + b
    self.tensors['classifier'] = y
Yes, an RNN (recurrent neural network) fits the task of accumulating state along a sequence in order to predict its next element. LSTM (long short-term memory) is a particular design for the recurrent pieces of the network that has turned out to be very successful in avoiding numerical challenges from long-lasting recurrences; see colah's much-cited blog post for more. (Alternatives to the LSTM cell design exist, but I would only fine-tune that much later, possibly never.)
The TensorFlow RNN codelab explains LSTM RNNs for the case of language models, which predict the (n+1)-st word of a sentence from the preceding n words, for each n (like for each timestep in your series of measurements). Your case is simpler than language models in that you only have two words (red and blue), so if you read anything about embeddings of words, ignore it.
You also mentioned other types of neural networks. These are not aimed at accumulating state along a sequence, such as your boolean sequence of red/blue inputs. However, your second image suggests that there might be a pattern in the sequence of counts of successive red/blue values. You could try using the past k counts as input to a plain feed-forward (i.e., non-recurrent) neural network that predicts the probability of the next measurement having the same color as the current one. Maybe that works with a single layer, or maybe two or even three work better; experimentation will tell. This is a less fancy approach than an RNN, but if it works well enough, it gives you a simpler solution with fewer technicalities to worry about.
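To make that framing concrete, here is a minimal sketch of turning the history into fixed-length supervised examples. Everything in it is illustrative, not from your setup: it uses the raw 0/1 history rather than run-length counts, a made-up window length k, random placeholder data, and scikit-learn's MLPClassifier purely for brevity; any framework would do.

import numpy as np
from sklearn.neural_network import MLPClassifier

def make_windows(sequence, k):
    # Turn a 0/1 sequence into (past k values, next value) training pairs.
    seq = np.asarray(sequence, dtype=int)
    X = np.array([seq[i:i + k] for i in range(len(seq) - k)])
    y = seq[k:]
    return X, y

# Hypothetical data: 1 = red, 0 = blue, one reading every 5 minutes.
history = np.random.randint(0, 2, size=500)
X, y = make_windows(history, k=8)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000)
clf.fit(X, y)

# Feed the most recent k readings to get probabilities for the next one.
print(clf.predict_proba(history[-8:].reshape(1, -1))[0])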
CNNs (convolutional neural networks) would not be my first choice here. These aim to discover a set of fixed-scale features at various places in the input, for example, some texture or curved edge anywhere in an image. But you only want to predict one next item that extends your input sequence. A plain neural network (see above) may discover useful patterns on the k previous values, and training it with all earlier partial sequences will help it find those patterns. The CNN approach would help to discover them during prediction at long-gone parts of the input; I have no intuition why that would help.