Multi-output GP with multi-inputs? - gpflow

I am trying to implement a multi-output GP in GPFlow with multi-dimensional input data.
I have seen from this issue in GPflow that a multi-dimensional input is possible by 'define a multidimensional base kernel and then apply the coregion on top of that'.
I have written the following code, I know for isotopic data (all outputs are obtained) one can use something alternatively like described in this notebook but here as I need to try ICM so let's continue with the code below.
However, when I try running the following code:
from gpflow.gpr import GPR
import gpflow
import numpy as np
from gpflow.kernels import Coregion
def f(x):
def _y(_x):
function_sum = 0
for i in np.arange(0, len(_x) - 1):
function_sum += (1 - _x[i]) ** 2 + 100 * ((_x[i + 1] - _x[i] ** 2) ** 2)
return function_sum
return np.atleast_2d([_y(_x) for _x in (np.atleast_2d(x))]).T
isotropic_X = np.random.rand(100, 2) * 4 - 2
Y1 = f(isotropic_X)
Y2 = f(isotropic_X) + np.random.normal(loc=2000, size=(100,1))
Y3 = f(isotropic_X) + np.random.normal(loc=-2000, size=(100,1))
# a Coregionalization kernel. The base kernel is Matern, and acts on the first ([0]) data dimension.
# the 'Coregion' kernel indexes the outputs, and actos on the second ([1]) data dimension
k1 = gpflow.kernels.Matern32(2)
coreg = Coregion(1, output_dim=3, rank=1, active_dims=[3]) # gpflow.kernels.Coregion(2, output_dim=2, rank=1)
coreg.W = np.random.rand(3, 1)
kern = k1 * coreg
# Augment the time data with ones or zeros to indicate the required output dimension
X_augmented = np.vstack((np.hstack((isotropic_X, np.zeros(shape=(isotropic_X.shape[0], 1)))),
np.hstack((isotropic_X, np.ones(shape=(isotropic_X.shape[0], 1)))),
np.hstack((isotropic_X, 2 * np.ones(shape=(isotropic_X.shape[0], 1))))))
# Augment the Y data to indicate which likeloihood we should use
Y_augmented = np.vstack((np.hstack((Y1, np.zeros(shape=(Y1.shape[0], 1)))),
np.hstack((Y2, np.ones(shape=(Y2.shape[0], 1)))),
np.hstack((Y3, 2 * np.ones(shape=(Y3.shape[0], 1))))))
# now buld the GP model as normal
m = GPR(X_augmented, Y_augmented, kern=kern)
m.optimize()
print(m.predict_f(np.array([[0.2, 0.2, 0], [0.4, 0.4, 0]])))
It returns me something like:
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Traceback (most recent call last):
File "C:\Users\Administrator\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1356, in _do_call
return fn(*args)
File "C:\Users\Administrator\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\Administrator\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0] = 3 is not in [0, 3)
[[{{node name.build_likelihood/name.kern.K/name.kern.coregion.K/GatherV2}}]]
So my questions are:
- What is this problem and how to enable multi-output GP with multi-dimension input
- I didn't quite get the workflow of gpflow with coregion, from this multi-output gp slide, The ICM returns output GP from a additive form of a latent process $u$ sampled from a GP parameterized by its weight $W$. But in the gpflow notebook demo I can't see any latent process of that and the notebooks says 'The 'Coregion' kernel indexes the outputs, and acts on the last ([1]) data dimension (indices) of the augmented X values', which is quite different than the slides, I am really confused about these different descriptions, any hint on these?

The issue is simply with your offset indexing: the coregionalisation kernel should be
coreg = Coregion(input_dim=1, output_dim=3, rank=1, active_dims=[2])
Because active_dims=[2] indexes the third column.
Thanks for providing a fully reproducible example! I managed to run your code and succesfully optimize the model using a few steps of AdamOptimizer and then ScipyOptimizer, to a log-likelihood value of -2023.4.

Related

Precison issue with sigmoid activation function for Tensorflow/Keras 2.3.1

Assuming the following forward pass in a classic ANN
(Based on https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/):
Now let's use a sigmoid activation on that, I get:
So far so good, now let's check the result of this calculation in python:
1 / (1+ math.exp(-0.3775)) # ... = 0.5932699921071872, OK
However this is double precision and since Keras uses float32 let's calculate the same thing but with float32, I get:
w1 = np.array(0.15, dtype=np.float32)
i1 = np.array(0.05, dtype=np.float32)
w2 = np.array(0.2, dtype=np.float32)
i2 = np.array(0.1, dtype=np.float32)
b1 = np.array(0.35, dtype=np.float32)
k1 = np.array(1, dtype=np.float32)
k2= np.array(1, dtype=np.float32)
n1 = w1 * i1 + w2 * i2 + b1 * k1
np.array(k2 / (1 + np.exp(-n1)), dtype=np.float32) # --->array(0.59327, dtype=float32)
Ok so normally I should be able to get 0.59327 for our out_{h1} using Keras float32 framework.
Let's now try to get this result using Keras:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
import numpy as np
model = Sequential()
model.add(Dense(2,activation='sigmoid')) # Build only one layer (enough for example)
model.compile(optimizer = SGD(lr = 0.5),loss='mse') # Don't really care for this example, simply for compilation
input = np.array([[0.05, 0.10]]) # Set input, same as the ones provided in the example
output = np.array([[0.01, 0.99]]) # Don't really care for this example since we want to check activation only
weights = [np.array([[0.15, 0.20 ], # Required for set_weights
[0.25, 0.30]], dtype=np.float32),
np.array([0.35, 0.35], dtype=np.float32)]
model.build(input.shape) # Required for set_weights
model.set_weights(weights) # Set the weights to be the same as the one provided in the example
model.predict(np.array([[0.05, 0.10]], dtype=np.float32)) # This can be seen as out_{h1}
# array([[0.5944759 , 0.59628266]], dtype=float32)
# NOK: 0.5944759 != 0.59327
Can someone explain to me why I get 0.5944759 instead of 0.59327? The result seem far from the expected ouput and if possible provide an example of calculation and/or a way to get the expected output of 0.59327.
Please note this example was done using:
tensorflow 2.3.1
numpy 1.18.5
python 3.8.12
Thx for your help.
Tensorflow perform the Dense layer row to column:
It multiplies each row of your input with a column in the weights matrix, You need to transpose your weights matrix, if you do it you will get the correct result.
I validated using your code just modified:
weights = [np.array([[0.15, 0.20], # Required for set_weights
[0.2, 0.30]], dtype=np.float32),
np.array([0.35, 0.35], dtype=np.float32)]
To (pay attention to the transpose operator T:
weights = [np.array([[0.15, 0.20], # Required for set_weights
[0.2, 0.30]], dtype=np.float32).T,
np.array([0.35, 0.35], dtype=np.float32)]

GPFlow multiple independent realizations of same GP, irregular sampling times/lengths

In GPflow I have multiple time series and the sampling times are not aligned across time series, and the time series may have different length (longitudinal data). I assume that they are independent realizations from the same GP. What is the right way to handle this with svgp, and more generally with GPflow? Do i need to use coregionalization? The coregionalization notebook assumed correlated trajectories, while I want shared mean/kernel but independent.
Yes, the Coregion kernel implemented in GPflow is what you can use for your problem.
Let's set up some data from the generative model you describe, with different lengths for the timeseries:
import numpy as np
import gpflow
import matplotlib.pyplot as plt
Ns = [80, 90, 100] # number of observations for three different realizations
Xs = [np.random.uniform(0, 10, size=N) for N in Ns] # observation locations
# three different draws from the same GP:
k = gpflow.kernels.Matern52(variance=2.0, lengthscales=0.5) # kernel
Ks = [k(X[:, None]) for X in Xs]
Ls = [np.linalg.cholesky(K) for K in Ks]
vs = [np.random.randn(N, 1) for N in Ns]
fs = [(L # v).squeeze(axis=-1) for L, v in zip(Ls, vs)]
To actually set up the training data for the gpflow GP model:
# output indicator for the observations: which timeseries is this?
os = [o * np.ones(N) for o, N in enumerate(Ns)] # [0 ... 0, 1 ... 1, 2 ... 2]
# now assemble the three timeseries in single data set:
allX = np.concatenate(Xs)
allo = np.concatenate(os)
allf = np.concatenate(fs)
X = np.c_[allX, allo]
Y = allf[:, None]
assert X.shape == (sum(Ns), 2)
assert Y.shape == (sum(Ns), 1)
# now let's set up a copy of the original kernel:
k2 = gpflow.kernels.Matern52(active_dims=[0]) # the same as k above, but with different hyperparameters
# and a Coregionalization kernel that effectively says they are all independent:
kc = gpflow.kernels.Coregion(output_dim=len(Ns), rank=1, active_dims=[1])
kc.W.assign(np.zeros(kc.W.shape))
kc.kappa.assign(np.ones(kc.kappa.shape))
gpflow.set_trainable(kc, False) # we want W and kappa fixed
The Coregion kernel defines a covariance matrix B = W Wᵀ + diag(kappa), so by setting W=0 we prescribe zero correlations (independent realizations) and kappa=1 (actually the default) ensures that the variance hyperparameter of the copy of the original kernel remains interpretable.
Now construct the actual model and optimize hyperparameters:
k2c = k2 * kc
m = gpflow.models.GPR((X, Y), k2c, noise_variance=1e-5)
opt = gpflow.optimizers.Scipy()
opt.minimize(m.training_loss, m.trainable_variables, compile=False)
which recovers the initial variance and lengthscale hyperparameters pretty well.
If you want to predict, you have to provide the extra "output" column in the Xnew argument to m.predict_f(), e.g. as follows:
Xtest = np.linspace(0, 10, 100)
Xtest_augmented = np.c_[Xtest, np.zeros_like(Xtest)]
f_mean, f_var = m.predict_f(Xtest_augmented)
(whether you set the output column to 0, 1, or 2 does not matter, as we set them all to be the same with our choice of W and kappa).
If your input was more than one-dimensional, you could set
active_dims=list(range(X.shape[1] - 1)) for the first kernel(s) and active_dims=[X.shape[1]-1] for the Coregion kernel.

Combining Matern and Periodic kernels in coregionalized regression

I am trying to build a multi dimensional GP regression, initially with two outputs f1(x), f2(x).
On one output is somewhat arbitrary, hence I would like to use a Matern kernel here: f1'(x)=K_Matern f1(x). The other output f2(x) shows seasonality where the amplitude is related to the value of f1(x): f2'(x)=K_season(x) f1(x). I have been trying to compose a suitable kernel by combining the two with a Coreg kernel:
K_Matern * Coreg * K_season. As this does not seem to work I was wondering where is the mistake in my thinking.
k1 = gpflow.kernels.Matern32(1, active_dims=[0], lengthscales = 1)
k2 = gpflow.kernels.Periodic(1, active_dims=[1], lengthscales = 1)
coreg = gpflow.kernels.Coregion(1, output_dim=2, rank=1, active_dims=[1])
kern = k1 * coreg * k2
lik = gpflow.likelihoods.SwitchedLikelihood([gpflow.likelihoods.StudentT(), gpflow.likelihoods.StudentT()])
X_augmented = np.vstack((np.hstack((X1, np.zeros_like(X1))), np.hstack((X2, np.ones_like(X2)))))
Y_augmented = np.vstack((np.hstack((Y1, np.zeros_like(X1))), np.hstack((Y2, np.ones_like(X2)))))
m = gpflow.models.VGP(X_augmented, Y_augmented, kern=kern, likelihood=lik, num_latent=1)
Your k2 is operating on the same dimension of X_augmented as coreg (active_dims=[1]) - which is the column that's either 0 or 1 depending on which output it relates to, clearly not what you want! If you want to use different kernels on different outputs, you need to use the multi-output framework, there is a multi-output notebook in the GPflow documentation. Specifically, you probably want a SeparateIndependentMok([Matern32(1), Periodic(1)]). Note that in this framework you need to provide all outputs for each input, and instead of augmenting X with the index-into-output, Y has one column per output (so N x 2 in your example).

Merging two tensors by convolution in Keras

I'm trying to convolve two 1D tensors in Keras.
I get two inputs from other models:
x - of length 100
ker - of length 5
I would like to get the 1D convolution of x using the kernel ker.
I wrote a Lambda layer to do it:
import tensorflow as tf
def convolve1d(x):
y = tf.nn.conv1d(value=x[0], filters=x[1], padding='VALID', stride=1)
return y
x = Input(shape=(100,))
ker = Input(shape=(5,))
y = Lambda(convolve1d)([x,ker])
model = Model([x,ker], [y])
I get the following error:
ValueError: Shape must be rank 4 but is rank 3 for 'lambda_67/conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,100], [1,?,5].
Can anyone help me understand how to fix it?
It was much harder than I expected because Keras and Tensorflow don't expect any batch dimension in the convolution kernel so I had to write the loop over the batch dimension myself, which requires to specify batch_shape instead of just shape in the Input layer. Here it is :
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras import Input, Model
from keras.layers import Lambda
def convolve1d(x):
input, kernel = x
output_list = []
if K.image_data_format() == 'channels_last':
kernel = K.expand_dims(kernel, axis=-2)
else:
kernel = K.expand_dims(kernel, axis=0)
for i in range(batch_size): # Loop over batch dimension
output_temp = tf.nn.conv1d(value=input[i:i+1, :, :],
filters=kernel[i, :, :],
padding='VALID',
stride=1)
output_list.append(output_temp)
print(K.int_shape(output_temp))
return K.concatenate(output_list, axis=0)
batch_input_shape = (1, 100, 1)
batch_kernel_shape = (1, 5, 1)
x = Input(batch_shape=batch_input_shape)
ker = Input(batch_shape=batch_kernel_shape)
y = Lambda(convolve1d)([x,ker])
model = Model([x, ker], [y])
a = np.ones(batch_input_shape)
b = np.ones(batch_kernel_shape)
c = model.predict([a, b])
In the current state :
It doesn't work for inputs (x) with multiple channels.
If you provide several filters, you get as many outputs, each being the convolution of the input with the corresponding kernel.
From given code it is difficult to point out what you mean when you say
is it possible
But if what you mean is to merge two layers and feed merged layer to convulation, yes it is possible.
x = Input(shape=(100,))
ker = Input(shape=(5,))
merged = keras.layers.concatenate([x,ker], axis=-1)
y = K.conv1d(merged, 'same')
model = Model([x,ker], y)
EDIT:
#user2179331 thanks for clarifying your intention. Now you are using Lambda Class incorrectly, that is why the error message is showing.
But what you are trying to do can be achieved using keras.backend layers.
Though be noted that when using lower level layers you will lose some higher level abstraction. E.g when using keras.backend.conv1d you need to have input shape of (BATCH_SIZE,width, channels) and kernel with shape of (kernel_size,input_channels,output_channels). So in your case let as assume the x has channels of 1(input channels ==1) and y also have the same number of channels(output channels == 1).
So your code now can be refactored as follows
from keras import backend as K
def convolve1d(x,kernel):
y = K.conv1d(x,kernel, padding='valid', strides=1,data_format="channels_last")
return y
input_channels = 1
output_channels = 1
kernel_width = 5
input_width = 100
ker = K.variable(K.random_uniform([kernel_width,input_channels,output_channels]),K.floatx())
x = Input(shape=(input_width,input_channels)
y = convolve1d(x,ker)
I guess I have understood what you mean. Given the wrong example code below:
input_signal = Input(shape=(L), name='input_signal')
input_h = Input(shape=(N), name='input_h')
faded= Lambda(lambda x: tf.nn.conv1d(input, x))(input_h)
You want to convolute each signal vector with different fading coefficients vector.
The 'conv' operation in TensorFlow, etc. tf.nn.conv1d, only support a fixed value kernel. Therefore, the code above can not run as you want.
I have no idea, too. The code you given can run normally, however, it is too complex and not efficient. In my idea, another feasible but also inefficient way is to multiply with the Toeplitz matrix whose row vector is the shifted fading coefficients vector. When the signal vector is too long, the matrix will be extremely large.

How to use scipy signal for MIMO systems

I am looking for a way to simulate the output of a signal for various input signals. To be more precise, I have a system defined by its transfer function H that takes one input and has one output. I generated several signals (stored in a numpy array). What I would like to do, is get the response of the system, to each input signal whithout using a for loop. Is there a way to proceed? Below is the code I wrote so far.
from __future__ import division
import numpy as np
from scipy import signal
nbr_inputs = 5
t_in = np.arange(0,10,0.2)
dim = (nbr_inputs, len(t_in))
x = np.cumsum(np.random.normal(0,2e-3, dim), axis=1)
H = signal.TransferFunction([1, 3, 3], [1, 2, 1])
t_out, y, _ = signal.lsim(H, x[0], t_in) # here, I would just like to simply write x
thanks for your help
This is not a MIMO system, it is a SISO system but you have multiple inputs.
You can create a MIMO system and apply your inputs all at once which will be computed channel by channel but simultaneously. Moreover, you can't use scipy.signal.lsim for MIMO systems yet. You can use other options such as python-control (if you have slycot extension otherwise again no MIMO) or harold if you have Python 3.6 or greater (disclaimer: I'm the author).
import numpy as np
from harold import *
import matplotlib.pyplot
nbr_inputs = 5
t_in = np.arange(0,10,0.2)
dim = (nbr_inputs, len(t_in))
x = np.cumsum(np.random.normal(0,2e-3, dim), axis=1)
# Forming a 1x5 system, common denominator will be completed automatically
H = Transfer([[[1, 3, 3]]*nbr_inputs], [1, 2, 1])
The keyword per_channel=True applies first input to first channel, second input to second and so on. Otherwise combined response is returned. You can check the shapes by playing around with it to see what I mean.
# Notice it is x.T below -> input shape = <num samples>, <num inputs>
y, t = simulate_linear_system(H, x.T, t_in, per_channel=True)
plt.plot(t, y)
This gives