GPflow - GP classification with 1-dim Linear kernel fits poorly for 2-dimensional data

Following issue #1435, I have an additional question about how to use GPflow.
I replicate the issue in an additional notebook: https://github.com/avalonhse/BayesNotebook/blob/master/Issue_2_GPFlow_Linear_Classification.ipynb
My goal is to fit an additive kernel to 2-dimensional data (a squared exponential kernel in dimension 1 and a linear kernel in dimension 2). Following the instructions in #1435, I have successfully fitted the model with the kernel gpflow.kernels.Linear(variance=0.1).
Linear kernel
However, when I use the kernel gpflow.kernels.Linear(active_dims=1, variance=0.01) as I originally planned, the model does not fit. I used GPy with the same kernel as a reference, and its result looks reasonable.
1-dim GPFlow kernel
import numpy as np
X = np.array([[ 9.96578428, 60.],[ 9.96578428, 40.],[ 9.96578428, 20.],
[10.96578428, 30.],[11.96578428, 40.],[12.96578428, 50.],
[12.96578428, 70.],[8.96578428, 30. ],[ 7.96578428, 40.],
[ 6.96578428, 50.],[ 6.96578428, 30.],[ 6.96578428, 10.],
[11.4655664 , 71.],[ 8.56605404, 63.],[12.41574177, 69.],
[10.61562964, 48.],[ 7.61470984, 51.],[ 9.31514956, 45.]])
Y = np.array([[1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 1., 0., 0., 1., 0.]]).T
# plotting
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
def plot(X, Y):
    mask = Y[:, 0] == 1
    plt.figure(figsize=(6, 6))
    plt.plot(X[mask, 0], X[mask, 1], "oC0", mew=0, alpha=0.5)
    plt.ylim(-10, 100)
    plt.xlim(5, 15)
    _ = plt.plot(X[np.logical_not(mask), 0], X[np.logical_not(mask), 1], "oC1", mew=0, alpha=0.5)

plot(X, Y)
# Evaluate real function and the predicted probability
res = 500
xx, yy = np.meshgrid(np.linspace(5, 15, res),
                     np.linspace(-10, 120, res))
Xplot = np.vstack((xx.flatten(), yy.flatten())).T
# Code followed the Notebook : https://gpflow.readthedocs.io/en/develop/notebooks/basics/classification.html
import tensorflow as tf
import tensorflow_probability as tfp
import gpflow
from gpflow.utilities import print_summary, set_trainable, to_default_float
gpflow.config.set_default_summary_fmt("notebook")
def testGPFlow(k):
    m = gpflow.models.VGP(
        (X, Y),
        kernel=k,
        likelihood=gpflow.likelihoods.Bernoulli()
    )
    print("\n ########### Model before optimization ########### \n")
    print_summary(m)
    print("\n ########### Model after optimization ########### \n")
    opt = gpflow.optimizers.Scipy()
    res = opt.minimize(
        m.training_loss, variables=m.trainable_variables, options=dict(maxiter=2500), method="L-BFGS-B"
    )
    print(' Message: ' + str(res.message) + '\n Status = ' + str(res.status) + '\n Number of iterations = ' + str(res.nit))
    print_summary(m)

    means, _ = m.predict_y(Xplot)  # here we only care about the mean
    y_prob = means.numpy().reshape(*xx.shape)

    print("Fitting model using GPFlow")
    plot(X, Y)
    _ = plt.contour(
        xx,
        yy,
        y_prob,
        [0.5],  # plot the p=0.5 contour line only
        colors="k",
        linewidths=1.8,
        zorder=100,
    )
k = gpflow.kernels.Linear(active_dims=[1], variance=0.01)
testGPFlow(k)

k = gpflow.kernels.Linear(variance=1)
testGPFlow(k)
The GPy code is for reference only, to suggest how a fitted model should look. I am aware that GPy and GPflow use different methods. My question is why the GPflow model does not fit when I restrict the Linear kernel to one dimension.

Thanks for posting this question, Hoang, and for using GPflow.
When you specify input_dim in GPy, you are telling the algorithm to act on two dimensions. active_dims in GPflow behaves differently: it specifies which dimensions you want the kernel to act on. 'active_dims = 1' tells GPflow to apply your linear kernel to only the y dimension.
Since you want your kernel to act on both the x and y dimensions, you should specify active_dims = [0, 1] rather than just 'active_dims = 1'. When I run your code with this fix, I get a result identical to GPy's result:
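For completeness, a minimal sketch of that change, reusing the testGPFlow helper from the question (the variance value is just the one used above):
# apply the Linear kernel to both input dimensions (columns 0 and 1),
# mirroring GPy's input_dim=2 behaviour
k = gpflow.kernels.Linear(active_dims=[0, 1], variance=0.01)
testGPFlow(k)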

Related

Is the inplace operation with scipy gaussian_filter1d safe?

Here is the sample code I wrote to examine this issue.
It shows that in this case we get the same result, but I want to know whether it is safe to compute in place under other conditions (scipy version, arguments, ...).
import numpy as np
from scipy.ndimage import gaussian_filter1d

X = np.random.normal(0, 1, size=[64, 1024, 2048])

# in-place: the filter writes its output back into the input array
OPX = X.copy()
for axis, sigma in zip([-2, -1], [3, 7]):
    gaussian_filter1d(OPX, sigma, axis, output=OPX)

# out-of-place: the filter writes into a separate array, which is then swapped in
OPY, OPZ = X.copy(), X.copy()
for axis, sigma in zip([-2, -1], [3, 7]):
    gaussian_filter1d(OPY, sigma, axis, output=OPZ)
    OPY, OPZ = OPZ, OPY

(OPX == OPY).all()  # True
python 3.7.15
scipy 1.7.3
numpy 1.21.6

Tensorflow 2.X : Understanding hinge loss

I am learning TensorFlow 2.x. I am following this page to understand hinge loss.
I went through the standalone usage code.
The code is below:
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
h = tf.keras.losses.Hinge()
h(y_true, y_pred).numpy()
The output is 1.3.
I tried to calculate it manually, writing code based on the given formula
loss = maximum(1 - y_true * y_pred, 0)
My code:
y_true = tf.Variable([[0., 1.], [0., 0.]])
y_pred = tf.Variable([[0.6, 0.4], [0.4, 0.6]])

def hinge_loss(y_true, y_pred):
    return tf.reduce_mean(tf.math.maximum(1. - y_true * y_pred, 0.))

print("Hinge Loss :: ", hinge_loss(y_true, y_pred).numpy())
But I am getting 0.9.
Where am I going wrong? Am I missing a concept here?
Kindly guide.
You have to change the 0 values of y_true to -1. The link you shared mentions that if your y_true is originally {0, 1}, you have to change it to {-1, 1} for the hinge loss calculation. Then you will get the same value for the example, which is 1.3.
From the link shared: https://www.tensorflow.org/api_docs/python/tf/keras/losses/Hinge
y_true values are expected to be -1 or 1. If binary (0 or 1) labels are provided we will convert them to -1 or 1.
import tensorflow as tf

y_true = tf.Variable([[0., 1.], [0., 0.]])
y_pred = tf.Variable([[0.6, 0.4], [0.4, 0.6]])

def hinge_loss(y_true, y_pred):
    return tf.reduce_mean(tf.math.maximum(1. - y_true * y_pred, 0.))

# convert y_true from {0,1} to {-1,1} before passing them to hinge_loss
y_true = y_true * 2 - 1

print(hinge_loss(y_true, y_pred))
Output:
tf.Tensor(1.3, shape=(), dtype=float32)
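Working it out by hand confirms this: after the conversion, y_true becomes [[-1., 1.], [-1., -1.]], so the element-wise terms maximum(1 - y_true * y_pred, 0) are [[1.6, 0.6], [1.4, 1.6]], and their mean is (1.6 + 0.6 + 1.4 + 1.6) / 4 = 1.3.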

gaussian process regression in multiple dimensions with GPflow

I would like to perform some multivariate regression using Gaussian process regression as implemented in GPflow version 2.
Installed with pip install gpflow==2.0.0rc1
Below is some example code that generates some 2D data, attempts to fit it using GPR, and finally computes the difference between the true input data and the GPR prediction.
Eventually I would like to extend this to higher dimensions, run tests against a validation set to check for over-fitting, and experiment with other kernels and "Automatic Relevance Determination", but understanding how to get this to work is the first step.
Thanks!
The following code snippet will work in a jupyter notebook.
import gpflow
import numpy as np
import matplotlib
from gpflow.utilities import print_summary
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (12, 6)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def gen_data(X, Y):
    """
    make some fake data.
    X, Y are np.ndarrays with shape (N,) where
    N is the number of samples.
    """
    ys = []
    for x0, x1 in zip(X, Y):
        y = x0 * np.sin(x0*10)
        y = x1 * np.sin(x0*10)
        y += 1
        ys.append(y)
    return np.array(ys)
# generate some fake data
x = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, x)
X = X.ravel()
Y = Y.ravel()
z = gen_data(X, Y)
#note X.shape, Y.shape and z.shape
#are all (400,) for this case.
# if you would like to plot the data you can do the following
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(X, Y, z, s=100, c='k')
# had to set this
# to avoid the following error
# tensorflow.python.framework.errors_impl.InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid. [Op:Cholesky]
gpflow.config.set_default_positive_minimum(1e-7)
# setup the kernel
k = gpflow.kernels.Matern52()
# set up GPR model
# I think the shape of the independent data
# should be (400, 2) for this case
XY = np.column_stack([[X, Y]]).T
print(XY.shape) # this will be (400, 2)
m = gpflow.models.GPR(data=(XY, z), kernel=k, mean_function=None)
# optimise hyper-parameters
opt = gpflow.optimizers.Scipy()
def objective_closure():
    return -m.log_marginal_likelihood()

opt_logs = opt.minimize(objective_closure,
                        m.trainable_variables,
                        options=dict(maxiter=100)
                        )
# predict training set
mean, var = m.predict_f(XY)
print(mean.numpy().shape)
# (400, 400)
# I would expect this to be (400,)
# If it was then I could compute the difference
# between the true data and the GPR prediction
# `diff = mean - z`
# but because the shape is not as expected this of course
# won't work.
The shape of z must be (N, 1), whereas in your case it is (N,). However, this is a missing check in GPflow and not your fault.
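A minimal sketch of the fix, reusing the XY, z, and kernel k from your snippet (just reshape the targets to a column vector before building the model):
z = z.reshape(-1, 1)  # targets must have shape (N, 1), not (N,)
m = gpflow.models.GPR(data=(XY, z), kernel=k, mean_function=None)
mean, var = m.predict_f(XY)
print(mean.numpy().shape)  # now (400, 1)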

Kalman Filter (pykalman): Value for obs_covariance and model without intercept

I am looking at the KalmanFilter from pykalman shown in examples:
pykalman documentation
Example 1
Example 2
and I am wondering about the difference between
observation_covariance=100,
vs
observation_covariance=1,
the documentation states
observation_covariance R: e(t)^2 ~ Gaussian (0, R)
How should the value be set here correctly?
Additionally, is it possible to apply the Kalman filter without intercept in the above module?
The observation covariance reflects how much error you assume there is in your input data. The Kalman filter works well on normally distributed data. Under this assumption you can use the 3-sigma rule to calculate the covariance (in this case the variance) of your observation based on the maximum error in the observation.
The values in your question can be interpreted as follows:
Example 1
observation_covariance = 100
sigma = sqrt(observation_covariance) = 10
max_error = 3*sigma = 30
Example 2
observation_covariance = 1
sigma = sqrt(observation_covariance) = 1
max_error = 3*sigma = 3
So you need to choose the value based on your observation data. The more accurate the observation, the smaller the observation covariance.
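As a small sketch of that rule of thumb (max_error here is just an assumed worst-case observation error, not a value from your data):
# derive the observation variance R from an assumed maximum observation error
max_error = 30.0                      # assumed worst-case error in the observations
sigma = max_error / 3.0               # 3-sigma rule: almost all errors lie within max_error
observation_covariance = sigma ** 2   # R = sigma**2 = 100 for this example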
Another point: you can tune your filter by manipulating the covariance, but I think it's not a good idea. The higher the observation covariance value, the weaker the impact a new observation has on the filter state.
Sorry, I did not understand the second part of your question (about the Kalman Filter without intercept). Could you please explain what you mean?
You are trying to use a regression model and both intercept and slope belong to it.
---------------------------
UPDATE
I prepared some code and plots to answer your questions in details. I used EWC and EWA historical data to stay close to the original article.
First of all, here is the code (pretty much the same as in the examples above, but with different notation):
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
# reading data (quick and dirty)
Datum = []
EWA = []
EWC = []
for line in open('data/dataset.csv'):
    f1, f2, f3 = line.split(';')
    Datum.append(f1)
    EWA.append(float(f2))
    EWC.append(float(f3))
n = len(Datum)
# Filter Configuration
# both slope and intercept have to be estimated
# transition_matrix
F = np.eye(2) # identity matrix because x_(k+1) = x_(k) + noise
# observation_matrix
# H_k = [EWA_k 1]
H = np.vstack([np.matrix(EWA), np.ones((1, n))]).T[:, np.newaxis]
# transition_covariance
Q = [[1e-4, 0],
[ 0, 1e-4]]
# observation_covariance
R = 1 # max error = 3
# initial_state_mean
X0 = [0,
0]
# initial_state_covariance
P0 = [[ 1, 0],
[ 0, 1]]
# Kalman-Filter initialization
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
transition_matrices = F,
observation_matrices = H,
transition_covariance = Q,
observation_covariance = R,
initial_state_mean = X0,
initial_state_covariance = P0)
# Filtering
state_means, state_covs = kf.filter(EWC)
# Restore EWC based on EWA and estimated parameters
EWC_restored = np.multiply(EWA, state_means[:, 0]) + state_means[:, 1]
# Plots
plt.figure(1)
ax1 = plt.subplot(211)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.legend(loc="upper left")
ax2 = plt.subplot(212)
plt.plot(state_means[:, 1], label="Intercept")
plt.grid()
plt.legend(loc="upper left")
# check the result
plt.figure(2)
plt.plot(EWC, label="EWC original")
plt.plot(EWC_restored, label="EWC restored")
plt.grid()
plt.legend(loc="upper left")
plt.show()
I could not retrieve the data using pandas, so I downloaded it and read it from a file.
Here you can see the estimated slope and intercept:
To test the estimated data I restored the EWC value from the EWA using the estimated parameters:
About the observation covariance value
By varying the observation covariance value you tell the filter how accurate the input data is (normally you describe your confidence in the observation using datasheets or your knowledge of the system).
Here are the estimated parameters and the restored EWC values for different observation covariance values:
You can see that the filter follows the original function more closely when the confidence in the observation is higher (smaller R). If the confidence is low (bigger R), the filter leaves the initial estimate (slope = 0, intercept = 0) very slowly and the restored function ends up far from the original one.
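If you want to reproduce that comparison, a sketch along these lines (reusing F, H, Q, X0, P0, EWA and EWC from the code above; the R values are arbitrary) re-runs the filter with several observation covariances:
# compare how strongly the estimate follows the observations for different R
for R in [0.1, 1, 10, 100]:
    kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
                      transition_matrices = F,
                      observation_matrices = H,
                      transition_covariance = Q,
                      observation_covariance = R,
                      initial_state_mean = X0,
                      initial_state_covariance = P0)
    state_means, _ = kf.filter(EWC)
    plt.plot(np.multiply(EWA, state_means[:, 0]) + state_means[:, 1],
             label="R = %s" % R)
plt.plot(EWC, 'k', label="EWC original")
plt.legend(loc="upper left")
plt.show()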
About the frozen intercept
If you want to freeze the intercept for some reason, you need to change the whole model and all filter parameters.
In the normal case we had:
x = [slope; intercept] #estimation state
H = [EWA 1] #observation matrix
z = [EWC] #observation
Now we have:
x = [slope] #estimation state
H = [EWA] #observation matrix
z = [EWC-const_intercept] #observation
Results:
Here is the code:
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
# only slope has to be estimated (it will be manipulated by the constant intercept) - mathematically incorrect!
const_intercept = 10
# reading data (quick and dirty)
Datum = []
EWA = []
EWC = []
for line in open('data/dataset.csv'):
    f1, f2, f3 = line.split(';')
    Datum.append(f1)
    EWA.append(float(f2))
    EWC.append(float(f3))
n = len(Datum)
# Filter Configuration
# transition_matrix
F = 1 # identity matrix because x_(k+1) = x_(k) + noise
# observation_matrix
# H_k = [EWA_k]
H = np.matrix(EWA).T[:, np.newaxis]
# transition_covariance
Q = 1e-4
# observation_covariance
R = 1 # max error = 3
# initial_state_mean
X0 = 0
# initial_state_covariance
P0 = 1
# Kalman-Filter initialization
kf = KalmanFilter(n_dim_obs=1, n_dim_state=1,
transition_matrices = F,
observation_matrices = H,
transition_covariance = Q,
observation_covariance = R,
initial_state_mean = X0,
initial_state_covariance = P0)
# Creating the observation based on EWC and the constant intercept
z = EWC[:] # copy the list (not just assign the reference!)
z[:] = [x - const_intercept for x in z]
# Filtering
state_means, state_covs = kf.filter(z) # the estimation for the EWC data minus constant intercept
# Restore EWC based on EWA and estimated parameters
EWC_restored = np.multiply(EWA, state_means[:, 0]) + const_intercept
# Plots
plt.figure(1)
ax1 = plt.subplot(211)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.legend(loc="upper left")
ax2 = plt.subplot(212)
plt.plot(const_intercept*np.ones((n, 1)), label="Intercept")
plt.grid()
plt.legend(loc="upper left")
# check the result
plt.figure(2)
plt.plot(EWC, label="EWC original")
plt.plot(EWC_restored, label="EWC restored")
plt.grid()
plt.legend(loc="upper left")
plt.show()

scipy.optimize.linprog seems to solve the task but doesn't return the x?

I'm trying to solve a very simple linear program using scipy.optimize.linprog, and it seems the function does what I want it to do, but somehow it doesn't return the x (it does return the correct minimal function value).
As a simple example (in MATLAB notation), I have a 2-D vector a = [a1; a2] and a simple linear constraint [1, 2] * a = 1, and I want to minimize the L1 norm of a. The optimum should be a = [0, 0.5].
As far as I understand, I can formulate this in standard form by introducing an extra variable b such that b >= abs(a) (i.e. a - b <= 0 and -a - b <= 0), and minimizing sum(b) subject to these constraints and the original equality constraint [1, 2] * a = 1.
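Written out, the standard-form LP is: minimize b1 + b2 subject to a1 - b1 <= 0, a2 - b2 <= 0, -a1 - b1 <= 0, -a2 - b2 <= 0, the equality a1 + 2*a2 = 1, and no bounds on any of the variables (this is the row ordering the A_ub matrix below uses).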
So I define x = [a; b] and plug it into scipy's linprog; it returns a success, and I get the correct answer: the optimal value of sum(b) is 0.5. However, the x that it returns is full of NaNs instead of [0; 0.5; 0; 0.5].
Here's the code:
A = np.array([1,2]).reshape([1,2])
b_eq = np.array([1])
ones = np.ones([2,])
zeros = np.zeros([2,])
zerosm = np.zeros([1, 2])
eye = np.eye(2)
c = np.hstack([zeros, ones])
A_ub = np.vstack([np.hstack([eye, -eye]), np.hstack([-eye, -eye])])
b_ub = np.hstack([zeros, zeros])
A_eq = np.hstack([A, zerosm])
res = scipy.optimize.linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None),
A_eq=A_eq, b_eq=b_eq)
The result:
success: True
status: 0
fun: 0.5
x: array([ nan, nan, nan, nan])
nit: 3
slack: array([ 0., 0., 0., 1.])
message: 'Optimization terminated successfully.'
That is, x is NaNs instead of the solution. The function value is correct (0.5), and the slacks seem fine: according to the scipy docs, a slack of 0 means the constraint is active, so the 1st and 3rd zeros mean that a1 = b1 = 0, and the 2nd zero means a2 = b2 while both are non-zero (otherwise the 4th slack would also be 0). This again matches [0, 0.5] being the solution.
What am I doing wrong? Is this a bug? (using scipy 0.15.1)
Thanks!
Apparently you are running into a bug that has been fixed since version 0.15.1.
When I run your code with scipy 0.18.0, I get:
In [3]: import scipy.optimize
In [4]: %paste
A = np.array([1,2]).reshape([1,2])
b_eq = np.array([1])
ones = np.ones([2,])
zeros = np.zeros([2,])
zerosm = np.zeros([1, 2])
eye = np.eye(2)
c = np.hstack([zeros, ones])
A_ub = np.vstack([np.hstack([eye, -eye]), np.hstack([-eye, -eye])])
b_ub = np.hstack([zeros, zeros])
A_eq = np.hstack([A, zerosm])
res = scipy.optimize.linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None),
A_eq=A_eq, b_eq=b_eq)
## -- End pasted text --
In [5]: res
Out[5]:
fun: 0.5
message: 'Optimization terminated successfully.'
nit: 4
slack: array([ 0., 0., 0., 1.])
status: 0
success: True
x: array([ 0. , 0.5, 0. , 0.5])
I tried to solve your problem with scipy 1.3.1 and it works well:
con: array([ 7.07545134e-12])
fun: 0.49999999999882094
message: 'Optimization terminated successfully.'
nit: 4
slack: array([ 1.07107656e-11, -4.48552306e-12, -4.75552930e-12,
1.00000000e+00])
status: 0
success: True
x: array([ -7.73314746e-12, 5.00000000e-01, 2.97761815e-12,
5.00000000e-01])