Kalman Filter (pykalman): Value for obs_covariance and model without intercept - transition

I am looking at the KalmanFilter from pykalman shown in examples:
pykalman documentation
Example 1
Example 2
and I am wondering
observation_covariance=100,
vs
observation_covariance=1,
the documentation states
observation_covariance R: e(t)^2 ~ Gaussian (0, R)
How should the value be set here correctly?
Additionally, is it possible to apply the Kalman filter without intercept in the above module?

The observation covariance shows how much error you assume to be in your input data. Kalman filter works fine on normally distributed data. Under this assumption you can use the 3-Sigma rule to calculate the covariance (in this case the variance) of your observation based on the maximum error in the observation.
The values in your question can be interpreted as follows:
Example 1
observation_covariance = 100
sigma = sqrt(observation_covariance) = 10
max_error = 3*sigma = 30
Example 2
observation_covariance = 1
sigma = sqrt(observation_covariance) = 1
max_error = 3*sigma = 3
So you need to choose the value based on your observation data. The more accurate the observation, the smaller the observation covariance.
Another point: you can tune your filter by manipulating the covariance, but I think it's not a good idea. The higher the observation covariance value the weaker impact a new observation has on the filter state.
Sorry, I did not understand the second part of your question (about the Kalman Filter without intercept). Could you please explain what you mean?
You are trying to use a regression model and both intercept and slope belong to it.
---------------------------
UPDATE
I prepared some code and plots to answer your questions in details. I used EWC and EWA historical data to stay close to the original article.
First of all here is the code (pretty the same one as in the examples above but with a different notation)
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
# reading data (quick and dirty)
Datum=[]
EWA=[]
EWC=[]
for line in open('data/dataset.csv'):
f1, f2, f3 = line.split(';')
Datum.append(f1)
EWA.append(float(f2))
EWC.append(float(f3))
n = len(Datum)
# Filter Configuration
# both slope and intercept have to be estimated
# transition_matrix
F = np.eye(2) # identity matrix because x_(k+1) = x_(k) + noise
# observation_matrix
# H_k = [EWA_k 1]
H = np.vstack([np.matrix(EWA), np.ones((1, n))]).T[:, np.newaxis]
# transition_covariance
Q = [[1e-4, 0],
[ 0, 1e-4]]
# observation_covariance
R = 1 # max error = 3
# initial_state_mean
X0 = [0,
0]
# initial_state_covariance
P0 = [[ 1, 0],
[ 0, 1]]
# Kalman-Filter initialization
kf = KalmanFilter(n_dim_obs=1, n_dim_state=2,
transition_matrices = F,
observation_matrices = H,
transition_covariance = Q,
observation_covariance = R,
initial_state_mean = X0,
initial_state_covariance = P0)
# Filtering
state_means, state_covs = kf.filter(EWC)
# Restore EWC based on EWA and estimated parameters
EWC_restored = np.multiply(EWA, state_means[:, 0]) + state_means[:, 1]
# Plots
plt.figure(1)
ax1 = plt.subplot(211)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.legend(loc="upper left")
ax2 = plt.subplot(212)
plt.plot(state_means[:, 1], label="Intercept")
plt.grid()
plt.legend(loc="upper left")
# check the result
plt.figure(2)
plt.plot(EWC, label="EWC original")
plt.plot(EWC_restored, label="EWC restored")
plt.grid()
plt.legend(loc="upper left")
plt.show()
I could not retrieve data using pandas, so I downloaded them and read from the file.
Here you can see the estimated slope and intercept:
To test the estimated data I restored the EWC value from the EWA using the estimated parameters:
About the observation covariance value
By varying the observation covariance value you tell the Filter how accurate the input data is (normally you just describe your confidence in the observation using some datasheets or your knowledge about the system).
Here are estimated parameters and the restored EWC values using different observation covariance values:
You can see the filter follows the original function better with a bigger confidence in observation (smaller R). If the confidence is low (bigger R) the filter leaves the initial estimate (slope = 0, intercept = 0) very slowly and the restored function is far away from the original one.
About the frozen intercept
If you want to freeze the intercept for some reason, you need to change the whole model and all filter parameters.
In the normal case we had:
x = [slope; intercept] #estimation state
H = [EWA 1] #observation matrix
z = [EWC] #observation
Now we have:
x = [slope] #estimation state
H = [EWA] #observation matrix
z = [EWC-const_intercept] #observation
Results:
Here is the code:
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
# only slope has to be estimated (it will be manipulated by the constant intercept) - mathematically incorrect!
const_intercept = 10
# reading data (quick and dirty)
Datum=[]
EWA=[]
EWC=[]
for line in open('data/dataset.csv'):
f1, f2, f3 = line.split(';')
Datum.append(f1)
EWA.append(float(f2))
EWC.append(float(f3))
n = len(Datum)
# Filter Configuration
# transition_matrix
F = 1 # identity matrix because x_(k+1) = x_(k) + noise
# observation_matrix
# H_k = [EWA_k]
H = np.matrix(EWA).T[:, np.newaxis]
# transition_covariance
Q = 1e-4
# observation_covariance
R = 1 # max error = 3
# initial_state_mean
X0 = 0
# initial_state_covariance
P0 = 1
# Kalman-Filter initialization
kf = KalmanFilter(n_dim_obs=1, n_dim_state=1,
transition_matrices = F,
observation_matrices = H,
transition_covariance = Q,
observation_covariance = R,
initial_state_mean = X0,
initial_state_covariance = P0)
# Creating the observation based on EWC and the constant intercept
z = EWC[:] # copy the list (not just assign the reference!)
z[:] = [x - const_intercept for x in z]
# Filtering
state_means, state_covs = kf.filter(z) # the estimation for the EWC data minus constant intercept
# Restore EWC based on EWA and estimated parameters
EWC_restored = np.multiply(EWA, state_means[:, 0]) + const_intercept
# Plots
plt.figure(1)
ax1 = plt.subplot(211)
plt.plot(state_means[:, 0], label="Slope")
plt.grid()
plt.legend(loc="upper left")
ax2 = plt.subplot(212)
plt.plot(const_intercept*np.ones((n, 1)), label="Intercept")
plt.grid()
plt.legend(loc="upper left")
# check the result
plt.figure(2)
plt.plot(EWC, label="EWC original")
plt.plot(EWC_restored, label="EWC restored")
plt.grid()
plt.legend(loc="upper left")
plt.show()

Related

How to run an exponential decay mixed model?

I am not familiar with nonlinear regression and would appreciate some help with running an exponential decay model in R. Please see the graph for how the data looks like. My hunch is that an exponential model might be a good choice. I have one fixed effect and one random effect. y ~ x + (1|random factor). How to get the starting values for the exponential model (please assume that I know nothing about nonlinear regression) in R? How do I subsequently run a nonlinear model with these starting values? Could anyone please help me with the logic as well as the R code?
As I am not familiar with nonlinear regression, I haven't been able to attempt it in R.
raw plot
The correct syntax will depend on your experimental design and model but I hope to give you a general idea on how to get started.
We begin by generating some data that should match the type of data you are working with. You had mentioned a fixed factor and a random one. Here, the fixed factor is represented by the variable treatment and the random factor is represented by the variable grouping_factor.
library(nlraa)
library(nlme)
library(ggplot2)
## Setting this seed should allow you to reach the same result as me
set.seed(3232333)
example_data <- expand.grid(treatment = c("A", "B"),
grouping_factor = c('1', '2', '3'),
replication = c(1, 2, 3),
xvar = 1:15)
The next step is to create some "observations". Here, we use an exponential function y=a∗exp(c∗x) and some random noise to create some data. Also, we add a constant to treatment A just to create some treatment differences.
example_data$y <- ave(example_data$xvar, example_data[, c('treatment', 'replication', 'grouping_factor')],
FUN = function(x) {expf(x = x,
a = 10,
c = -0.3) + rnorm(1, 0, 0.6)})
example_data$y[example_data$treatment == 'A'] <- example_data$y[example_data$treatment == 'A'] + 0.8
All right, now we start fitting the model.
## Create a grouped data frame
exampleG <- groupedData(y ~ xvar|grouping_factor, data = example_data)
## Fit a separate model to each groupped level
fitL <- nlsList(y ~ SSexpf(xvar, a, c), data = exampleG)
## Grab the coefficients of the general model
fxf <- fixed.effects(fit1)
## Add treatment as a fixed effect. Also, use the coeffients from the previous
## regression model as starting values.
fit2 <- update(fit1, fixed = a + c ~ treatment,
start = c(fxf[1], 0,
fxf[2], 0))
Looking at the model output, it will give you information like the following:
Nonlinear mixed-effects model fit by maximum likelihood
Model: y ~ SSexpf(xvar, a, c)
Data: exampleG
AIC BIC logLik
475.8632 504.6506 -229.9316
Random effects:
Formula: list(a ~ 1, c ~ 1)
Level: grouping_factor
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
a.(Intercept) 3.254827e-04 a.(In)
c.(Intercept) 1.248580e-06 0
Residual 5.670317e-01
Fixed effects: a + c ~ treatment
Value Std.Error DF t-value p-value
a.(Intercept) 9.634383 0.2189967 264 43.99329 0.0000
a.treatmentB 0.353342 0.3621573 264 0.97566 0.3301
c.(Intercept) -0.204848 0.0060642 264 -33.77976 0.0000
c.treatmentB -0.092138 0.0120463 264 -7.64867 0.0000
Correlation:
a.(In) a.trtB c.(In)
a.treatmentB -0.605
c.(Intercept) -0.785 0.475
c.treatmentB 0.395 -0.792 -0.503
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.93208903 -0.34340037 0.04767133 0.78924247 1.95516431
Number of Observations: 270
Number of Groups: 3
Then, if you wanted to visualize the model fit, you could do the following.
## Here we store the model predictions for visualization purposes
predictionsDf <- cbind(example_data,
predict_nlme(fit2, interval = 'conf'))
## Here we make a graph to check it out
ggplot()+
geom_ribbon(data = predictionsDf,
aes( x = xvar , ymin = Q2.5, ymax = Q97.5, fill = treatment),
color = NA, alpha = 0.3)+
geom_point(data = example_data, aes( x = xvar, y = y, col = treatment))+
geom_line(data = predictionsDf, aes(x = xvar, y = Estimate, col = treatment), size = 1.1)
This shows the model fit.

I want to use Numpy to simulate the inference process of a quantized MobileNet V2 network, but the outcome is different with pytorch realized one

Python version: 3.8
Pytorch version: 1.9.0+cpu
Platform: Anaconda Spyder5.0
To reproduce this problem, just copy every code below to a single file.
The ILSVRC2012_val_00000293.jpg file used in this code is shown below, you also need to download it and then change its destination in the code.
Some background of this problem:
I am now working on a project that aims to develop a hardware accelerator to complete the inference process of the MobileNet V2 network. I used pretrained quantized Pytorch model to simulate the outcome, and the result comes out very well.
In order to use hardware to complete this task, I wish to know every inputs and outputs as well as intermidiate variables during runing this piece of pytorch code. I used a package named torchextractor to fetch the outcomes of first layer, which in this case, is a 3*3 convolution layer.
import numpy as np
import torchvision
import torch
from torchvision import transforms, datasets
from PIL import Image
from torchvision import transforms
import torchextractor as tx
import math
#########################################################################################
##### Processing of input image
#########################################################################################
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
test_transform = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,])
preprocess = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
#image file destination
filename = "D:\Project_UM\MobileNet_VC709\MobileNet_pytorch\ILSVRC2012_val_00000293.jpg"
input_image = Image.open(filename)
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)
#########################################################################################
#########################################################################################
#########################################################################################
#----First verify that the torchextractor class should not influent the inference outcome
# ofmp of layer1 before putting into torchextractor
a,b,c = quantize_tensor(input_batch)# to quantize the input tensor and return an int8 tensor, scale and zero point
input_qa = torch.quantize_per_tensor(torch.tensor(input_batch.clone().detach()), b, c, torch.quint8)# Using quantize_per_tensor method of torch
# Load a quantized mobilenet_v2 model
model_quantized = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
model_quantized.eval()
with torch.no_grad():
output = model_quantized.features[0][0](input_qa)# Ofmp of layer1, datatype : quantized_tensor
# print("FM of layer1 before tx_extractor:\n",output.int_repr())# Ofmp of layer1, datatype : int8 tensor
output1_clone = output.int_repr().detach().numpy()# Clone ofmp of layer1, datatype : ndarray
#########################################################################################
#########################################################################################
#########################################################################################
# ofmp of layer1 after adding torchextractor
model_quantized_ex = tx.Extractor(model_quantized, ["features.0.0"])#Capture of the module inside first layer
model_output, features = model_quantized_ex(input_batch)# Forward propagation
# feature_shapes = {name: f.shape for name, f in features.items()}
# print(features['features.0.0']) # Ofmp of layer1, datatype : quantized_tensor
out1_clone = features['features.0.0'].int_repr().numpy() # Clone ofmp of layer1, datatype : ndarray
if(out1_clone.all() == output1_clone.all()):
print('Model with torchextractor attached output the same value as the original model')
else:
print('Torchextractor method influence the outcome')
Here I define a numpy quantization scheme based on the quantization scheme proposed by
Quantization and Training of Neural Networks for Efficient
Integer-Arithmetic-Only Inference
# Convert a normal regular tensor to a quantized tensor with scale and zero_point
def quantize_tensor(x, num_bits=8):# to quantize the input tensor and return an int8 tensor, scale and zero point
qmin = 0.
qmax = 2.**num_bits - 1.
min_val, max_val = x.min(), x.max()
scale = (max_val - min_val) / (qmax - qmin)
initial_zero_point = qmin - min_val / scale
zero_point = 0
if initial_zero_point < qmin:
zero_point = qmin
elif initial_zero_point > qmax:
zero_point = qmax
else:
zero_point = initial_zero_point
# print(zero_point)
zero_point = int(zero_point)
q_x = zero_point + x / scale
q_x.clamp_(qmin, qmax).round_()
q_x = q_x.round().byte()
return q_x, scale, zero_point
#%%
# #############################################################################################
# --------- Simulate the inference process of layer0: conv33 using numpy
# #############################################################################################
# get the input_batch quantized buffer data
input_scale = b.item()
input_zero = c
input_quantized = a[0].detach().numpy()
# get the layer0 output scale and zero_point
output_scale = model_quantized.features[0][0].state_dict()['scale'].item()
output_zero = model_quantized.features[0][0].state_dict()['zero_point'].item()
# get the quantized weight with scale and zero_point
weight_scale = model_quantized.features[0][0].state_dict()["weight"].q_scale()
weight_zero = model_quantized.features[0][0].state_dict()["weight"].q_zero_point()
weight_quantized = model_quantized.features[0][0].state_dict()["weight"].int_repr().numpy()
# print(weight_quantized)
# print(weight_quantized.shape)
# bias_quantized,bias_scale,bias_zero= quantize_tensor(model_quantized.features[0][0].state_dict()["bias"])# to quantize the input tensor and return an int8 tensor, scale and zero point
# print(bias_quantized.shape)
bias = model_quantized.features[0][0].state_dict()["bias"].detach().numpy()
# print(input_quantized)
print(type(input_scale))
print(type(output_scale))
print(type(weight_scale))
Then I write a quantized 2D convolution using numpy, hope to figure out every details in pytorch data flow during the inference.
#%% numpy simulated layer0 convolution function define
def conv_cal(input_quantized, weight_quantized, kernel_size, stride, out_i, out_j, out_k):
weight = weight_quantized[out_i]
input = np.zeros((input_quantized.shape[0], kernel_size, kernel_size))
for i in range(weight.shape[0]):
for j in range(weight.shape[1]):
for k in range(weight.shape[2]):
input[i][j][k] = input_quantized[i][stride*out_j+j][stride*out_k+k]
# print(np.dot(weight,input))
# print(input,"\n")
# print(weight)
return np.multiply(weight,input).sum()
def QuantizedConv2D(input_scale, input_zero, input_quantized, output_scale, output_zero, weight_scale, weight_zero, weight_quantized, bias, kernel_size, stride, padding, ofm_size):
output = np.zeros((weight_quantized.shape[0],ofm_size,ofm_size))
input_quantized_padding = np.full((input_quantized.shape[0],input_quantized.shape[1]+2*padding,input_quantized.shape[2]+2*padding),0)
zero_temp = np.full(input_quantized.shape,input_zero)
input_quantized = input_quantized - zero_temp
for i in range(input_quantized.shape[0]):
for j in range(padding,padding + input_quantized.shape[1]):
for k in range(padding,padding + input_quantized.shape[2]):
input_quantized_padding[i][j][k] = input_quantized[i][j-padding][k-padding]
zero_temp = np.full(weight_quantized.shape, weight_zero)
weight_quantized = weight_quantized - zero_temp
for i in range(output.shape[0]):
for j in range(output.shape[1]):
for k in range(output.shape[2]):
# output[i][j][k] = (weight_scale*input_scale)*conv_cal(input_quantized_padding, weight_quantized, kernel_size, stride, i, j, k) + bias[i] #floating_output
output[i][j][k] = weight_scale*input_scale/output_scale*conv_cal(input_quantized_padding, weight_quantized, kernel_size, stride, i, j, k) + bias[i]/output_scale + output_zero
output[i][j][k] = round(output[i][j][k])
# int_output
return output
Here I input the same image, weight, and bias together with their zero_point and scale, then compare this "numpy simulated" result to the PyTorch calculated one.
quantized_model_out1_int8 = np.squeeze(features['features.0.0'].int_repr().numpy())
print(quantized_model_out1_int8.shape)
print(quantized_model_out1_int8)
out1_np = QuantizedConv2D(input_scale, input_zero, input_quantized, output_scale, output_zero, weight_scale, weight_zero, weight_quantized, bias, 3, 2, 1, 112)
np.save("out1_np.npy",out1_np)
for i in range(quantized_model_out1_int8.shape[0]):
for j in range(quantized_model_out1_int8.shape[1]):
for k in range(quantized_model_out1_int8.shape[2]):
if(out1_np[i][j][k] < 0):
out1_np[i][j][k] = 0
print(out1_np)
flag = np.zeros(quantized_model_out1_int8.shape)
for i in range(quantized_model_out1_int8.shape[0]):
for j in range(quantized_model_out1_int8.shape[1]):
for k in range(quantized_model_out1_int8.shape[2]):
if(quantized_model_out1_int8[i][j][k] == out1_np[i][j][k]):
flag[i][j][k] = 1
out1_np[i][j][k] = 0
quantized_model_out1_int8[i][j][k] = 0
# Compare the simulated result to extractor fetched result, gain the total hit rate
print(flag.sum()/(112*112*32)*100,'%')
If the "numpy simulated" results are the same as the extracted one, call it a hit. Print the total hit rate, it shows that numpy gets 92% of the values right. Now the problem is, I have no idea why the rest 8% of values come out wrong.
Comparison of two outcomes:
The picture below shows the different values between Numpy one and PyTorch one, the sample channel is index[1]. The left upper corner is Numpy one, and the upright corner is PyTorch one, I have set all values that are the same between them to 0, as you can see, most of the values just have a difference of 1(This can be view as the error brought by the precision loss of fixed point arithmetics), but some have large differences, e.g. the value[1][4], 121 vs. 76 (I don't know why)
Focus on one strange value:
This code is used to replay the calculation process of the value[1][4], originally I was expecting a trial and error process could lead me to solve this problem, to get my wanted number of 76, but no matter how I tried, it didn't output 76. If you want to try this, I paste this code for your convenience.
#%% A test code to check the calculation process
weight_quantized_sample = weight_quantized[2]
M_t = input_scale * weight_scale / output_scale
ifmap_t = np.int32(input_quantized[:,1:4,7:10])
weight_t = np.int32(weight_quantized_sample)
bias_t = bias[2]
bias_q = bias_t/output_scale
res_t = 0
for ch in range(3):
ifmap_offset = ifmap_t[ch]-np.int32(input_zero)
weight_offset = weight_t[ch]-np.int32(weight_zero)
res_ch = np.multiply(ifmap_offset, weight_offset)
res_ch = res_ch.sum()
res_t = res_t + res_ch
res_mul = M_t*res_t
# for n in range(1, 30):
# res_mul = multiply(n, M_t, res_t)
res_t = round(res_mul + output_zero + bias_q)
print(res_t)
Could you help me out of this, have been stuck here for a long time.
I implemented my own version of quantized convolution and got from 99.999% to 100% hitrate (and mismatch of a single value is by 1 that I can consider to be a rounding issue). The link on the paper in the question helped a lot.
But I found that your formulas are the same as mine. So I don't know what was your issue. As I understand quantization in pytorch is hardware dependent.
Here is my code:
def my_Conv2dRelu_b2(input_q, conv_layer, output_shape):
'''
Args:
input_q: quantized tensor
conv_layer: quantized tensor
output_shape: the pre-computed shape of the result
Returns:
'''
output = np.zeros(output_shape)
# extract needed float numbers from quantized operations
weights_scale = conv_layer.weight().q_per_channel_scales()
input_scale = input_q.q_scale()
weights_zp = conv_layer.weight().q_per_channel_zero_points()
input_zp = input_q.q_zero_point()
# extract needed convolution parameters
padding = conv_layer.padding
stride = conv_layer.stride
# extract float numbers for results
output_zp = conv_layer.zero_point
output_scale = conv_layer.scale
conv_weights_int = conv_layer.weight().int_repr()
input_int = input_q.int_repr()
biases = conv_layer.bias().numpy()
for k in range(input_q.shape[0]):
for i in range(conv_weights_int.shape[0]):
output[k][i] = manual_convolution_quant(
input_int[k].numpy(),
conv_weights_int[i].numpy(),
biases[i],
padding=padding,
stride=stride,
image_zp=input_zp, image_scale=input_scale,
kernel_zp=weights_zp[i].item(), kernel_scale=weights_scale[i].item(),
result_zp=output_zp, result_scale=output_scale
)
return output
def manual_convolution_quant(image, kernel, b, padding, stride, image_zp, image_scale, kernel_zp, kernel_scale,
result_zp, result_scale):
H = image.shape[1]
W = image.shape[2]
new_H = H // stride[0]
new_W = W // stride[1]
results = np.zeros([new_H, new_W])
M = image_scale * kernel_scale / result_scale
bias = b / result_scale
paddedIm = np.pad(
image,
[(0, 0), (padding[0], padding[0]), (padding[1], padding[1])],
mode="constant",
constant_values=image_zp,
)
s = kernel.shape[1]
for i in range(new_H):
for j in range(new_W):
patch = paddedIm[
:, i * stride[0]: i * stride[0] + s, j * stride[1]: j * stride[1] + s
]
res = M * ((kernel - kernel_zp) * (patch - image_zp)).sum() + result_zp + bias
if res < 0:
res = 0
results[i, j] = round(res)
return results
Code to compare pytorch and my own version.
def calc_hit_rate(array1, array2):
good = (array1 == array2).astype(np.int).sum()
all = array1.size
return good / all
# during inference
y2 = model.conv1(y1)
y2_int = torch.int_repr(y2)
y2_int_manual = my_Conv2dRelu_b2(y1, model.conv1, y2.shape)
print(f'y2 hit rate= {calc_hit_rate(y2.int_repr().numpy(), y2_int_manual)}') #hit_rate=1.0

GPFlow multiple independent realizations of same GP, irregular sampling times/lengths

In GPflow I have multiple time series and the sampling times are not aligned across time series, and the time series may have different length (longitudinal data). I assume that they are independent realizations from the same GP. What is the right way to handle this with svgp, and more generally with GPflow? Do i need to use coregionalization? The coregionalization notebook assumed correlated trajectories, while I want shared mean/kernel but independent.
Yes, the Coregion kernel implemented in GPflow is what you can use for your problem.
Let's set up some data from the generative model you describe, with different lengths for the timeseries:
import numpy as np
import gpflow
import matplotlib.pyplot as plt
Ns = [80, 90, 100] # number of observations for three different realizations
Xs = [np.random.uniform(0, 10, size=N) for N in Ns] # observation locations
# three different draws from the same GP:
k = gpflow.kernels.Matern52(variance=2.0, lengthscales=0.5) # kernel
Ks = [k(X[:, None]) for X in Xs]
Ls = [np.linalg.cholesky(K) for K in Ks]
vs = [np.random.randn(N, 1) for N in Ns]
fs = [(L # v).squeeze(axis=-1) for L, v in zip(Ls, vs)]
To actually set up the training data for the gpflow GP model:
# output indicator for the observations: which timeseries is this?
os = [o * np.ones(N) for o, N in enumerate(Ns)] # [0 ... 0, 1 ... 1, 2 ... 2]
# now assemble the three timeseries in single data set:
allX = np.concatenate(Xs)
allo = np.concatenate(os)
allf = np.concatenate(fs)
X = np.c_[allX, allo]
Y = allf[:, None]
assert X.shape == (sum(Ns), 2)
assert Y.shape == (sum(Ns), 1)
# now let's set up a copy of the original kernel:
k2 = gpflow.kernels.Matern52(active_dims=[0]) # the same as k above, but with different hyperparameters
# and a Coregionalization kernel that effectively says they are all independent:
kc = gpflow.kernels.Coregion(output_dim=len(Ns), rank=1, active_dims=[1])
kc.W.assign(np.zeros(kc.W.shape))
kc.kappa.assign(np.ones(kc.kappa.shape))
gpflow.set_trainable(kc, False) # we want W and kappa fixed
The Coregion kernel defines a covariance matrix B = W Wᵀ + diag(kappa), so by setting W=0 we prescribe zero correlations (independent realizations) and kappa=1 (actually the default) ensures that the variance hyperparameter of the copy of the original kernel remains interpretable.
Now construct the actual model and optimize hyperparameters:
k2c = k2 * kc
m = gpflow.models.GPR((X, Y), k2c, noise_variance=1e-5)
opt = gpflow.optimizers.Scipy()
opt.minimize(m.training_loss, m.trainable_variables, compile=False)
which recovers the initial variance and lengthscale hyperparameters pretty well.
If you want to predict, you have to provide the extra "output" column in the Xnew argument to m.predict_f(), e.g. as follows:
Xtest = np.linspace(0, 10, 100)
Xtest_augmented = np.c_[Xtest, np.zeros_like(Xtest)]
f_mean, f_var = m.predict_f(Xtest_augmented)
(whether you set the output column to 0, 1, or 2 does not matter, as we set them all to be the same with our choice of W and kappa).
If your input was more than one-dimensional, you could set
active_dims=list(range(X.shape[1] - 1)) for the first kernel(s) and active_dims=[X.shape[1]-1] for the Coregion kernel.

Split dataset to test and train MATLAB [duplicate]

This question already has an answer here:
Matlab: How can I split my data matrix into two random subsets of column vectors while keeping the label information?
(1 answer)
Closed 5 years ago.
I want to split a very large dataset that I have (over one million observations) into a test and train set. As, you can see I have already managed to perform something similar in the code bellow with the use of dividerand.
What the code does is we have a very large set X, on every iteration we select N=1700 variables and then I split them in a ratio 7/3 - train/test. But, what I would further like to do though is instead of using %'s with the dividerand to use specific values. For instance, split the data into mini-batches with size 2000, and then use 500 for test and 1500 for training. Again, in the next loop we will select the data (2001:4000) and split them in 500 test and 1500 train etc.
Again, dividerand allows to do that with ratios but I would like to use actual values.
X = randn(10000,9);
mu_6 = zeros(510,613); % 390/802 - 450/695 - 510/613 - Test/Iterations
s2_6 = zeros(510,613);
nl6 = zeros(613,1);
RSME6 = zeros(613,1);
prev_batch = 0;
inf = #infGaussLik;
meanfunc = []; % empty: don't use a mean function
covfunc = #covSEiso; % Squared Exponential covariance
likfunc = #likGauss; % Gaussian likelihood
for k=1:613
new_batch = k*1700;
X_batch = X(1+prev_batch:new_batch,:);
[train,~,test] = dividerand(transpose(X_batch),0.7,0,0.3);
train = transpose(train);
test = transpose(test);
x_t = train(:,1:8); % Train batch we get 910 values
y_t = train(:,9);
x_z = test(:,1:8); % Test batch we get 390 values
y_z = test(:,9);
% Calculations for Gaussian process regression
if k==1
hyp = struct('mean', [], 'cov', [0 0], 'lik', -1);
else
hyp = hyp2;
end
hyp2 = minimize(hyp, #gp, -100, inf, meanfunc, covfunc, likfunc, x_t, y_t);
[m4 s4] = gp(hyp2, inf, meanfunc, covfunc, likfunc, x_t, y_t, x_z);
[nlZ4,dnlZ4] = gp(hyp2, inf, meanfunc, covfunc, likfunc, x_t, y_t);
RSME6(k,1) = sqrt(sum(((m4-y_z).^2))/450);
nl6(k,1) = nlZ4;
mu_6(:,k) = m4;
s2_6(:,k) = s4;
% End of calculations
prev_batch = new_batch;
disp(k);
end
How about:
[~, idx] = sort([randn(2000,1)]);
group1_idx = idx(1:1500);
group2_idx = idx(1501:end);

BVP4c solve for unknown boundary

I am trying to use bvp4c to solve a system of 4 odes. The issue is that one of the boundaries is unknown.
Can bvp4c handle this? In my code L is the unknown I am solving for.
I get an error message printed below.
function mat4bvp
L = 8;
solinit = bvpinit(linspace(0,L,100),#mat4init);
sol = bvp4c(#mat4ode,#mat4bc,solinit);
sint = linspace(0,L);
Sxint = deval(sol,sint);
end
% ------------------------------------------------------------
function dtdpdxdy = mat4ode(s,y,L)
Lambda = 0.3536;
dtdpdxdy = [y(2)
-sin(y(1)) + Lambda*(L-s)*cos(y(1))
cos(y(1))
sin(y(1))];
end
% ------------------------------------------------------------
function res = mat4bc(ya,yb,L)
res = [ ya(1)
ya(2)
ya(3)
ya(4)
yb(1)];
end
% ------------------------------------------------------------
function yinit = mat4init(s)
yinit = [ cos(s)
0
0
0
];
end
Unfortunately I get the following error message ;
>> mat4bvp
Not enough input arguments.
Error in mat4bvp>mat4ode (line 13)
-sin(y(1)) + Lambda*(L-s)*cos(y(1))
Error in bvparguments (line 105)
testODE = ode(x1,y1,odeExtras{:});
Error in bvp4c (line 130)
bvparguments(solver_name,ode,bc,solinit,options,varargin);
Error in mat4bvp (line 4)
sol = bvp4c(#mat4ode,#mat4bc,solinit);
One trick to transform a variable end point into a fixed one is to change the time scale. If x'(t)=f(t,x(t)) is the differential equation, set t=L*s, s from 0 to 1, and compute the associated differential equation for y(s)=x(L*s)
y'(s)=L*x'(L*s)=L*f(L*s,y(s))
The next trick to employ is to transform the global variable into a part of the differential equation by computing it as constant function. So the new system is
[ y'(s), L'(s) ] = [ L(s)*f(L(s)*s,y(s)), 0 ]
and the value of L occurs as additional free left or right boundary value, increasing the number of variables = dimension of the state vector to the number of boundary conditions.
I do not have Matlab readily available, in Python with the tools in scipy this can be implemented as
from math import sin, cos
import numpy as np
from scipy.integrate import solve_bvp, odeint
import matplotlib.pyplot as plt
# The original function with the interval length as parameter
def fun0(t, y, L):
Lambda = 0.3536;
#print t,y,L
return np.array([ y[1], -np.sin(y[0]) + Lambda*(L-t)*np.cos(y[0]), np.cos(y[0]), np.sin(y[0]) ]);
# Wrapper function to apply both tricks to transform variable interval length to a fixed interval.
def fun1(s,y):
L = y[-1];
dydt = np.zeros_like(y);
dydt[:-1] = L*fun0(L*s, y[:-1], L);
return dydt;
# Implement evaluation of the boundary condition residuals:
def bc(ya, yb):
return [ ya[0],ya[1], ya[2], ya[3], yb[0] ];
# Define the initial mesh with 5 nodes:
x = np.linspace(0, 1, 3)
# This problem has multiple solutions. Try two initial guesses.
L_a=8
L_b=9
y_a = odeint(lambda y,t: fun1(t,y), [0,0,0,0,L_a], x)
y_b = odeint(lambda y,t: fun1(t,y), [0,0,0,0,L_b], x)
# Now we are ready to run the solver.
res_a = solve_bvp(fun1, bc, x, y_a.T)
res_b = solve_bvp(fun1, bc, x, y_b.T)
L_a = res_a.sol(0)[-1]
L_b = res_b.sol(0)[-1]
print "L_a=%.8f, L_b=%.8f" % ( L_a,L_b )
# Plot the two found solutions. The solution are in a spline form, use this to produce a smooth plot.
x_plot = np.linspace(0, 1, 100)
y_plot_a = res_a.sol(x_plot)[0]
y_plot_b = res_b.sol(x_plot)[0]
plt.plot(L_a*x_plot, y_plot_a, label='L=%.8f'%L_a)
plt.plot(L_b*x_plot, y_plot_b, label='L=%.8f'%L_b)
plt.legend()
plt.xlabel("t")
plt.ylabel("y")
plt.grid(); plt.show()
which produces
Trying different initial values for L finds other solutions on quite different scales, among them
L=0.03195111
L=0.05256775
L=0.05846539
L=0.06888907
L=0.08231966
L=4.50411522
L=6.84868060
L=20.01725616
L=22.53189063