N-dimensional GP Regression - gpflow

I'm trying to use GPflow for a multidimensional regression. But I'm confused by the shapes of the mean and variance.
For example: A 2-dimensional input space X of shape (20,20) is supposed to be predicted. My training samples are of shape (8,2) which means 8 training samples overall for the two dimensions. The y-values are of shape (8,1) which of course means one value of the ground truth per combination of the 2 input dimensions.
If I now use model.predict_y(X) I would expect to receive a mean of shape (20,20) but obtain a shape of (20,1). Same goes for the variance. I think that this problem comes from the shape of the y-values but I have have no idea how to fix it.
bound = 3
num = 20
X = np.random.uniform(-bound, bound, (num,num))
print(X_sample.shape) # (8,2)
print(Y_sample.shape) # (8,1)
k = gpflow.kernels.RBF(input_dim=2)
m = gpflow.models.GPR(X_sample, Y_sample, kern=k)
m.likelihood.variance = sigma_n
m.compile()
gpflow.train.ScipyOptimizer().minimize(m)
mean, var = m.predict_y(X)
print(mean.shape) # (20, 1)
print(var.shape) # (20, 1)

It sounds like you may be confused between the shape of a grid of input positions and the shape of the numpy arrays: if you want to predict on a 20 x 20 grid in two dimensions, you have 400 points in total, each with 2 values. So X (the one that you pass to m.predict_y()) should have shape (400, 2). (Note that the second dimension needs to have the same shape as X_sample!)
To construct this array of shape (400,2) you can use np.meshgrid (e.g., see What is the purpose of meshgrid in Python / NumPy?).
m.predict_y(X) only predicts the marginal variance at each test point, so the returned mean and var both have shape (400,1) (same length as X). You can of course reshape them to the 20 x 20 values on your grid.
(It is also possible to compute the full covariance, for the latent f this is implemented as m.predict_f_full_cov, which for X of shape (400,2) would return a 400x400 matrix. This is relevant if you want consistent samples from the GP, but I suspect that goes well beyond this question.)

I was indeed making the mistake to not flatten the arrays which in return produced the mistake. Thank you for the fast response STJ!
Here is an example of the working code:
# Generate data
bound = 3.
x1 = np.linspace(-bound, bound, num)
x2 = np.linspace(-bound, bound, num)
x1_mesh,x2_mesh = np.meshgrid(x1, x2)
X = np.dstack([x1_mesh, x2_mesh]).reshape(-1, 2)
z = f(x1_mesh, x2_mesh) # evaluation of the function on the grid
# Draw samples from feature vectors and function by a given index
size = 2
np.random.seed(1991)
index = np.random.choice(range(len(x1)), size=(size,X.ndim), replace=False)
samples = utils.sampleFeature([x1,x2], index)
X1_sample = samples[0]
X2_sample = samples[1]
X_sample = np.column_stack((X1_sample, X2_sample))
Y_sample = utils.samplefromFunc(f=z, ind=index)
# Change noise parameter
sigma_n = 0.0
# Construct models with initial guess
k = gpflow.kernels.RBF(2,active_dims=[0,1], lengthscales=1.0,ARD=True)
m = gpflow.models.GPR(X_sample, Y_sample, kern=k)
m.likelihood.variance = sigma_n
m.compile()
#print(X.shape)
mean, var = m.predict_y(X)
mean_square = mean.reshape(x1_mesh.shape) # Shape: (num,num)
var_square = var.reshape(x1_mesh.shape) # Shape: (num,num)
# Plot mean
fig = plt.figure(figsize=(16, 12))
ax = plt.axes(projection='3d')
ax.plot_surface(x1_mesh, x2_mesh, mean_square, cmap=cm.viridis, linewidth=0.5, antialiased=True, alpha=0.8)
cbar = ax.contourf(x1_mesh, x2_mesh, mean_square, zdir='z', offset=offset, cmap=cm.viridis, antialiased=True)
ax.scatter3D(X1_sample, X2_sample, offset, marker='o',edgecolors='k', color='r', s=150)
fig.colorbar(cbar)
for t in ax.zaxis.get_major_ticks(): t.label.set_fontsize(fontsize_ticks)
ax.set_title("$\mu(x_1,x_2)$", fontsize=fontsize_title)
ax.set_xlabel("\n$x_1$", fontsize=fontsize_label)
ax.set_ylabel("\n$x_2$", fontsize=fontsize_label)
ax.set_zlabel('\n\n$\mu(x_1,x_2)$', fontsize=fontsize_label)
plt.xticks(fontsize=fontsize_ticks)
plt.yticks(fontsize=fontsize_ticks)
plt.xlim(left=-bound, right=bound)
plt.ylim(bottom=-bound, top=bound)
ax.set_zlim3d(offset,np.max(z))
which leads to (red dots are the sample points drawn from the function). Note: Code not refactored what so ever :)

Related

Merging two tensors by convolution in Keras

I'm trying to convolve two 1D tensors in Keras.
I get two inputs from other models:
x - of length 100
ker - of length 5
I would like to get the 1D convolution of x using the kernel ker.
I wrote a Lambda layer to do it:
import tensorflow as tf
def convolve1d(x):
y = tf.nn.conv1d(value=x[0], filters=x[1], padding='VALID', stride=1)
return y
x = Input(shape=(100,))
ker = Input(shape=(5,))
y = Lambda(convolve1d)([x,ker])
model = Model([x,ker], [y])
I get the following error:
ValueError: Shape must be rank 4 but is rank 3 for 'lambda_67/conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,100], [1,?,5].
Can anyone help me understand how to fix it?
It was much harder than I expected because Keras and Tensorflow don't expect any batch dimension in the convolution kernel so I had to write the loop over the batch dimension myself, which requires to specify batch_shape instead of just shape in the Input layer. Here it is :
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras import Input, Model
from keras.layers import Lambda
def convolve1d(x):
input, kernel = x
output_list = []
if K.image_data_format() == 'channels_last':
kernel = K.expand_dims(kernel, axis=-2)
else:
kernel = K.expand_dims(kernel, axis=0)
for i in range(batch_size): # Loop over batch dimension
output_temp = tf.nn.conv1d(value=input[i:i+1, :, :],
filters=kernel[i, :, :],
padding='VALID',
stride=1)
output_list.append(output_temp)
print(K.int_shape(output_temp))
return K.concatenate(output_list, axis=0)
batch_input_shape = (1, 100, 1)
batch_kernel_shape = (1, 5, 1)
x = Input(batch_shape=batch_input_shape)
ker = Input(batch_shape=batch_kernel_shape)
y = Lambda(convolve1d)([x,ker])
model = Model([x, ker], [y])
a = np.ones(batch_input_shape)
b = np.ones(batch_kernel_shape)
c = model.predict([a, b])
In the current state :
It doesn't work for inputs (x) with multiple channels.
If you provide several filters, you get as many outputs, each being the convolution of the input with the corresponding kernel.
From given code it is difficult to point out what you mean when you say
is it possible
But if what you mean is to merge two layers and feed merged layer to convulation, yes it is possible.
x = Input(shape=(100,))
ker = Input(shape=(5,))
merged = keras.layers.concatenate([x,ker], axis=-1)
y = K.conv1d(merged, 'same')
model = Model([x,ker], y)
EDIT:
#user2179331 thanks for clarifying your intention. Now you are using Lambda Class incorrectly, that is why the error message is showing.
But what you are trying to do can be achieved using keras.backend layers.
Though be noted that when using lower level layers you will lose some higher level abstraction. E.g when using keras.backend.conv1d you need to have input shape of (BATCH_SIZE,width, channels) and kernel with shape of (kernel_size,input_channels,output_channels). So in your case let as assume the x has channels of 1(input channels ==1) and y also have the same number of channels(output channels == 1).
So your code now can be refactored as follows
from keras import backend as K
def convolve1d(x,kernel):
y = K.conv1d(x,kernel, padding='valid', strides=1,data_format="channels_last")
return y
input_channels = 1
output_channels = 1
kernel_width = 5
input_width = 100
ker = K.variable(K.random_uniform([kernel_width,input_channels,output_channels]),K.floatx())
x = Input(shape=(input_width,input_channels)
y = convolve1d(x,ker)
I guess I have understood what you mean. Given the wrong example code below:
input_signal = Input(shape=(L), name='input_signal')
input_h = Input(shape=(N), name='input_h')
faded= Lambda(lambda x: tf.nn.conv1d(input, x))(input_h)
You want to convolute each signal vector with different fading coefficients vector.
The 'conv' operation in TensorFlow, etc. tf.nn.conv1d, only support a fixed value kernel. Therefore, the code above can not run as you want.
I have no idea, too. The code you given can run normally, however, it is too complex and not efficient. In my idea, another feasible but also inefficient way is to multiply with the Toeplitz matrix whose row vector is the shifted fading coefficients vector. When the signal vector is too long, the matrix will be extremely large.

Interpn - changing output

I have 4 grids:
kgrid which is [77x1]
x which is [15x1]
z which is [9x1]
s which is [2x1]
Then I have a function:
kprime which is [77x15x9x2]
I want to interpolate kprime at some points ksim (750 x 1) and zsim (750 x 1) (xsim is a scalar). I am doing:
[ks, xs, zs, ss] = ndgrid(kgrid, x, z, [1;2]);
Output = interpn(ks, xs, zs, ss, kprime, ksim, xsim, zsim, 1,'linear');
The problem with this interpolation is that the output given is for all combinations of ksim and zsim, meaning that the output is 750x750. I actually need an output of 750x1, meaning that instead of interpolation at all combinations of ksim and zsim I only need to interpolate at ksim(1,1) and zsim(1,1), then ksim(2,1) and zsim(2,1), then ksim(3,1) and zsim(3,1), etc.
In other words, after getting Output I am doing:
Output = diag(squeeze(Output));
I know I can use this output and then just pick the numbers I want, but this is extremely inefficient as it actually interpolates on all other points which I actually do not need. Any help appreciated.
tl;dr: Change xsim and (ssim) from scalars to vectors of the same size as ksim and zsim
Output = interpn (ks, xs, zs, ss, ...
kprime, ...
ksim, ...
repmat(xsim, size(ksim)), ... % <-- here
zsim, ...
repmat(1, size(ksim)), ... % <-- and here
'linear');
Explanation:
The ksim, xsim, zsim, and ssim inputs all need to have the same shape, so that at each common position in that shape, each input acts as an "interpolated subscript" component to the interpolated object. Note that while they all need to have the same shape, this shape can be arbitrary in terms of size and dimensions.
Conversely, if you pass vectors of different sizes (after all, a scalar is a vector of length 1), these get interpreted as the components of an ndgrid construction instead. So you were actually telling interpn to evaluate all interpolations on a grid defined by the vectors ksim, and zsim (and your singletons xsim and ssim). Which is why you got a 2D-grid-looking output.
Note that the same scheme applies with the constructing vectors as well (i.e. ks, xs, zs and ss) i.e. you could have used "vector syntax" instead of "common shape" syntax to define the grid instead, i.e.
Output = interpn(kgrid, x, z, s, kprime, % ...etc etc
and you would have gotten the same result.
From the documents:
Query points, specified as a real scalars, vectors, or arrays.
If Xq1,Xq2,...,Xqn are scalars, then they are the coordinates of a single query point in Rn.
If Xq1,Xq2,...,Xqn are vectors of different orientations, then Xq1,Xq2,...,Xqn are treated as grid vectors in Rn.
If Xq1,Xq2,...,Xqn are vectors of the same size and orientation, then Xq1,Xq2,...,Xqn are treated as scattered points in Rn.
If Xq1,Xq2,...,Xqn are arrays of the same size, then they represent either a full grid of query points (in ndgrid format) or scattered points in Rn.
Answer
You want the usage highlighted in bold. As such, you have to make sure that xsim and ssim ('1' in your code sample) are of size 750x1 also. Then all the query vectors are same length and orientation, such that it can be recognized as a vector of scattered points in Rn. The output will then be a 750x1 vector as needed.
This is to elaborate on #tvo/#Tasos answers, to test the fastest way to create a vector from a scalar:
function create_vector(n)
x = 5;
repm_time = timeit(#()repm(x,n))
repe_time = timeit(#()repe(x,n))
vrep_time = timeit(#()vrep(x,n))
onesv_time = timeit(#()onesv(x,n))
end
function A = repm(x,n)
for k = 1:10000
A = repmat(x,[n 1]);
end
end
function A = repe(x,n)
for k = 1:10000
A = repelem(x,n).';
end
end
function A = vrep(x,n)
v = ones(n,1);
for k = 1:10000
A = x*v;
end
end
function A = onesv(x,n)
for k = 1:10000
A = x*ones(n,1);
end
end
And the results are (for n = 750):
repm_time =
0.049847
repe_time =
0.044188
vrep_time =
0.0041342
onesv_time =
0.0024869
which means that the fastest way to create a vector from a scalar is simply writing x*ones(n,1).

Plot portfolio composition map in Julia (or Matlab)

I am optimizing portfolio of N stocks over M levels of expected return. So after doing this I get the time series of weights (i.e. a N x M matrix where where each row is a combination of stock weights for a particular level of expected return). Weights add up to 1.
Now I want to plot something called portfolio composition map (right plot on the picture), which is a plot of these stock weights over all levels of expected return, each with a distinct color and length (at every level of return) is proportional to it's weight.
My questions is how to do this in Julia (or MATLAB)?
I came across this and the accepted solution seemed so complex. Here's how I would do it:
using Plots
#userplot PortfolioComposition
#recipe function f(pc::PortfolioComposition)
weights, returns = pc.args
weights = cumsum(weights,dims=2)
seriestype := :shape
for c=1:size(weights,2)
sx = vcat(weights[:,c], c==1 ? zeros(length(returns)) : reverse(weights[:,c-1]))
sy = vcat(returns, reverse(returns))
#series Shape(sx, sy)
end
end
# fake data
tickers = ["IBM", "Google", "Apple", "Intel"]
N = 10
D = length(tickers)
weights = rand(N,D)
weights ./= sum(weights, dims=2)
returns = sort!((1:N) + D*randn(N))
# plot it
portfoliocomposition(weights, returns, labels = tickers)
matplotlib has a pretty powerful polygon plotting capability, e.g. this link on plotting filled polygons:
ploting filled polygons in python
You can use this from Julia via the excellent PyPlot.jl package.
Note that the syntax for certain things changes; see the PyPlot.jl README and e.g. this set of examples.
You "just" need to calculate the coordinates from your matrix and build up a set of polygons to plot the portfolio composition graph. It would be nice to see the code if you get this working!
So I was able to draw it, and here's my code:
using PyPlot
using PyCall
#pyimport matplotlib.patches as patch
N = 10
D = 4
weights = Array(Float64, N,D)
for i in 1:N
w = rand(D)
w = w/sum(w)
weights[i,:] = w
end
weights = [zeros(Float64, N) weights]
weights = cumsum(weights,2)
returns = sort!([linspace(1,N, N);] + D*randn(N))
##########
# Plot #
##########
polygons = Array(PyObject, 4)
colors = ["red","blue","green","cyan"]
labels = ["IBM", "Google", "Apple", "Intel"]
fig, ax = subplots()
fig[:set_size_inches](5, 7)
title("Problem 2.5 part 2")
xlabel("Weights")
ylabel("Return (%)")
ax[:set_autoscale_on](false)
ax[:axis]([0,1,minimum(returns),maximum(returns)])
for i in 1:(size(weights,2)-1)
xy=[weights[:,i] returns;
reverse(weights[:,(i+1)]) reverse(returns)]
polygons[i] = matplotlib[:patches][:Polygon](xy, true, color=colors[i], label = labels[i])
ax[:add_artist](polygons[i])
end
legend(polygons, labels, bbox_to_anchor=(1.02, 1), loc=2, borderaxespad=0)
show()
# savefig("CompositionMap.png",bbox_inches="tight")
Can't say that this is the best way, to do this, but at least it is working.

correlation coefficient data driven approach possible?

I have a matrix 64x64x32x90 which stands for pixels at x,y,z, at time t.
I have a reference signal 1x90 which stands for the behavior I expect for a pixel at some point (x,y,z).
I am constructing a new image of the correlation between each pixel versus my reference.
load('DATA.mat');
ON = ones(1,10);
OFF = zeros(1,10);
taskRef = [OFF ON OFF ON OFF ON OFF ON OFF];
corrImage = zeros(64,64,36);
for i=1:64,
for j=1:63,
for k=1:36
signal = squeeze(DATA(i,j,k,:));
coef = corrcoef(signal',taskRef);
corrImage(i,j,k) = coef(2);
end
end
end
My process is too slow. Is there a way to get rid of my loops or adjust the code to have a better runtime?
Reshape your data so that its first three dimensions are collapsed into one (so now there are 64*64*32 rows and 90 columns).
Then use pdist2 (with 'correlation' option) to compute the correlation of each row with the expected pattern.
Finally, reshape result into the desired shape.
DATA2 = reshape(DATA, [],90);
corrImage = 1 - pdist2(DATA2, taskRef, 'correlation');
corrImage = reshape(corrImage, 64,64,32);

Solving a system of equations using Python/Scipy for a set of measurements

I have an physical instrument of measurement (force platform with load cells) which gives me three values, A, B and C. It happens, though, that these values - that should be orthogonal - actually are somewhat coupled, due to physical characteristics of the measuring device, which causes cross-talk between applied and returned values of force and torque.
Then, it is recommended that a calibration matrix be used to transform the measured values into a better estimate of the actual values, like this:
The problem is that it is necessary to perform a SET of measurements, so that different measured(Fz, Mx, My) and actual(Fz, Mx, My) are least-squared to get some C matrix that works best for the system as a whole.
I can solve Ax = B problems with scipy.linalg.lststq, or even scipy.linalg.solve (giving an exact solution) for ONE measurement, but how should I proceed to consider a set of different measurements, each one with its own equation giving a potentially different 3x3 matrix?
Any help is much appreciated, thanks for reading.
I posted a similar question containing just the mathematical part of this at math.stackexchange.com, and this answer solved the problem:
math.stackexchange.com/a/232124/27435
In case anyone have a similar problem in the future, here is the almost literal Scipy implementation of that answer (first lines are initialization boilerplate code):
import numpy
import scipy.linalg
### Origin of the coordinate system: upper left corner!
"""
1----------2
| |
| |
4----------3
"""
platform_width = 600
platform_height = 400
# positions of each load cell (one per corner)
loadcell_positions = numpy.array([[0, 0],
[platform_width, 0],
[platform_width, platform_height],
[0, platform_height]])
platform_origin = numpy.array([platform_width, platform_height]) * 0.5
# applying a known force at known positions and taking the measurements
measurements_per_axis = 5
total_load = 50
results = []
for x in numpy.linspace(0, platform_width, measurements_per_axis):
for y in numpy.linspace(0, platform_height, measurements_per_axis):
position = numpy.array([x,y])
for loadpos in loadcell_positions:
moments = platform_origin-loadpos * total_load
load = numpy.array([total_load])
result = numpy.hstack([load, moments])
results.append(result)
results = numpy.array(results)
noise = numpy.random.rand(*results.shape) - 0.5
measurements = results + noise
# now expand ("stuff") the 3x3 matrix to get a linearly independent 3x3 matrix
expands = []
for n in xrange(measurements.shape[0]):
k = results[n,:]
m = measurements[n,:]
expand = numpy.zeros((3,9))
expand[0,0:3] = m
expand[1,3:6] = m
expand[2,6:9] = m
expands.append(expand)
expands = numpy.vstack(expands)
# perform the actual regression
C = scipy.linalg.lstsq(expands, measurements.reshape((-1,1)))
C = numpy.array(C[0]).reshape((3,3))
# the result with pure noise (not actual coupling) should be
# very close to a 3x3 identity matrix (and is!)
print C
Hope this helps someone!