Likelihoods combining multiple latent GPs in GPflow

I am trying to implement Variational Heteroscedastic Gaussian Process Regression in GPflow.
My idea is to use the Variational Sparse Gaussian Process model (gpflow.models.SVGP) with a custom-built Likelihood which expresses the density of y given two independent GPs f, g:
p(y|f, g) = N(y | f, t(g) )
where t(·) is some transformation that makes g positive (currently tf.nn.softplus).
To make this work, I set model.num_latent to 2, but implement the Likelihood in such a way that the logp, conditional_mean and conditional_variance methods only output tensors of shape (N, 1). Below is my current implementation:
from gpflow.likelihoods import Likelihood
from gpflow.decors import params_as_tensors
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions


class HeteroscedasticGaussian(Likelihood):
    r"""
    When using this class, num_latent must be 2.
    It does not support multi-output (num_output will be 1).
    """
    def __init__(self, transform=tf.nn.softplus, **kwargs):
        super().__init__(**kwargs)
        self.transform = transform

    @params_as_tensors
    def Y_given_F(self, F):
        # First latent GP is the mean, the transformed second one the noise scale.
        mu = tf.squeeze(F[:, 0])
        sigma = self.transform(tf.squeeze(F[:, 1]))
        return tfd.Normal(mu, sigma)

    @params_as_tensors
    def logp(self, F, Y):
        return self.Y_given_F(F).log_prob(Y)

    @params_as_tensors
    def conditional_mean(self, F):
        return self.Y_given_F(F).mean()

    @params_as_tensors
    def conditional_variance(self, F):
        return self.Y_given_F(F).variance()
My question is how to make the variational_expectations method work with what is now a double integral over df dg. I intend to use Gauss-Hermite quadrature, but I could not work out how to perform this double integral with ndiagquad.
Is it as simple as calling

ndiagquad(self.logp, self.num_gauss_hermite_points, Fmu, Fvar, Y=Y)

?
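For reference, the per-data-point quantity I need is

E_{q(f) q(g)}[ log p(y|f,g) ] = ∫∫ log N(y | f, t(g)) N(f | m_f, v_f) N(g | m_g, v_g) df dg

where m, v are the marginal means and variances of the two latent GPs, so an H-point Gauss-Hermite rule per dimension should need H^2 = 121 weighted evaluations per data point.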
EDIT:
A minimal working example (MWE), using the implementation of variational_expectations inherited from the base class Likelihood.
import gpflow as gpf
import tensorflow as tf
import numpy as np

N = 1001
M = 100

X = np.linspace(0, 4*np.pi, N)[:, None]
F = np.sin(X)
G = np.cos(X)
E = np.logaddexp(0, G) * np.random.normal(size=(N, 1))  # noise scale is softplus(G)
Y = F + E

Z_idx = np.random.choice(N, M, replace=False)
kernel = gpf.kernels.SquaredExponential(input_dim=1)
likelihood = HeteroscedasticGaussian()
likelihood.num_gauss_hermite_points = 11

model = gpf.models.SVGP(
    X=X, Z=X[Z_idx], Y=Y,
    kern=kernel,
    likelihood=likelihood,
    num_latent=2,
)

# This method will call
# model.likelihood.variational_expectations(...)
# internally
model.compute_log_likelihood()
I am getting the following error message:
InvalidArgumentError: Incompatible shapes: [1001,11] vs. [2002]
[[{{node SVGP-bdd79b25-24/Normal/log_prob/standardize/sub}}]]
I believe this has something to do with f and g being stacked on top of each other (shape [2002] = 2*N, with N = 1001), while the Gauss-Hermite points (11) are generated for only one dimension per observation (N = 1001); otherwise we would get a shape of [1001, 11, 11] or [1001, 121 = 11^2].
All help appreciated.

You were very close: we did implement multi-dimensional quadrature in ndiagquad exactly for this use case, though it needs to be called slightly differently. It would be good to have it work such that what you wrote runs out of the box, but unfortunately it's not trivial to find a design that works for actual multi-output regression, for single-output likelihoods with several latent GPs, and for the combination of the two! ndiagquad expects tuples (or lists) for Fmu and Fvar to indicate that you want multi-dimensional integration; this preserves backwards compatibility for f having shape (N, L) when you want to predict multiple outputs, with Y also having shape (N, L).
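As a minimal sketch of that calling convention (assuming the GPflow 1.x quadrature module, gpflow.quadrature.ndiagquad):

from gpflow.quadrature import ndiagquad

# Fmu, Fvar have shape (N, 2): one column per latent GP. Passing them
# as tuples/lists requests a 2-D Gauss-Hermite rule (an H x H grid per
# data point) rather than a 1-D rule over a stacked (2N,) vector.
Fmu_split = (Fmu[:, 0], Fmu[:, 1])
Fvar_split = (Fvar[:, 0], Fvar[:, 1])
# The integrand must then accept one argument per latent: logp(F, G, Y).
ve = ndiagquad(self.logp, self.num_gauss_hermite_points,
               Fmu_split, Fvar_split, Y=Y)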
So you've got to write the code slightly differently. This version works with your MWE:
from gpflow.likelihoods import Likelihood
from gpflow.decors import params_as_tensors
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions


class MultiLatentLikelihood(Likelihood):
    def __init__(self, num_latent=1, **kwargs):
        super().__init__(**kwargs)
        self.num_latent = num_latent

    def _transform(self, F):
        # Split the (N, num_latent) tensor into a list of per-latent columns,
        # which tells ndiagquad to perform multi-dimensional quadrature.
        return [F[:, i] for i in range(self.num_latent)]

    def predict_mean_and_var(self, Fmu, Fvar):
        return super().predict_mean_and_var(self._transform(Fmu), self._transform(Fvar))

    def predict_density(self, Fmu, Fvar, Y):
        return super().predict_density(self._transform(Fmu), self._transform(Fvar), Y)

    def variational_expectations(self, Fmu, Fvar, Y):
        return super().variational_expectations(self._transform(Fmu), self._transform(Fvar), Y)


class HeteroscedasticGaussian(MultiLatentLikelihood):
    r"""
    When using this class, num_latent must be 2.
    It does not support multi-output (num_output will be 1).
    """
    def __init__(self, transform=tf.nn.softplus, **kwargs):
        super().__init__(num_latent=2, **kwargs)
        self.transform = transform

    @params_as_tensors
    def Y_given_F(self, F, G):
        mu = tf.squeeze(F)
        sigma = self.transform(tf.squeeze(G))
        return tfd.Normal(mu, sigma)

    @params_as_tensors
    def logp(self, F, G, Y):
        return self.Y_given_F(F, G).log_prob(Y)

    @params_as_tensors
    def conditional_mean(self, F, G):
        return self.Y_given_F(F, G).mean()

    @params_as_tensors
    def conditional_variance(self, F, G):
        return self.Y_given_F(F, G).variance()
I separated the boilerplate code into its own class, MultiLatentLikelihood, to make it clearer what is generic and what is specific to the heteroscedastic Gaussian.
Maybe we should put both MultiLatentLikelihood and examples into GPflow. If you would be up for it, why not add this to GPflow and make a pull request on github.com/GPflow/GPflow? I'd be happy to review it.
Also, the GPflow tutorials contain a notebook demonstrating how to deal with heteroscedastic noise, in case you hadn't come across it, but it doesn't allow you to learn a latent GP to model the noise variance. So again, I think it'd be great if you wanted to extend the notebook with this example ("Demo 3") and make a pull request for your amendments :)

Related

How to correctly set the 'rtol' and 'atol' in scipy integration module 'solve_ivp' for solving a system of ODE with unknown analytic solution?

I was trying to reproduce some results of MATLAB's ode45 solver in Python using solve_ivp. Though all parameters, initial conditions, step size, and 'atol' and 'rtol' (which are 1e-6 and 1e-3) are the same, I am getting different solutions. Both solutions converge to a periodic solution, but of different kinds. As solve_ivp uses the same RK4(5) scheme as ode45, this discrepancy in the final result is not quite understandable. How can we know which one is the correct solution?
The code is included below:
import sys
import numpy as np
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

# Pendulum rod lengths (m), bob masses (kg).
L1, L2, mu, a1 = 1, 1, 1/5, 1
m1, m2, B = 1, 1, 0.1
# The gravitational acceleration (m.s-2).
g = 9.81
# The forcing frequency, forcing amplitude
w, a_m = 10, 4.5
A = (a_m*w**2)/g
A1 = a_m/g

def deriv(t, y, mu, a1, B, w, A):  # beware of the order of the arguments
    """Return the first derivatives of y = theta1, z1, theta2, z2, z3."""
    a, c, b, d, e = y
    adot = c
    cdot = (-(1-A*np.sin(e))*(((1+mu)*np.sin(a))-(mu*np.cos(a-b)*np.sin(b)))-((mu/a1)*((d**2)+(a1*np.cos(a-b)*c**2))*np.sin(a-b))-(2*B*(1+(np.sin(a-b))**2)*c)-((2*B*A/w)*(2*np.sin(a)-(np.cos(a-b)*np.sin(b)))*np.cos(e)))/(1+mu*(np.sin(a-b))**2)
    bdot = d
    ddot = ((-a1*(1+mu)*(1-A*np.sin(e))*(np.sin(b)-(np.cos(a-b)*np.sin(a))))+(((a1*(1+mu)*c**2)+(mu*np.cos(a-b)*d**2))*np.sin(a-b))-((2*B/mu)*(((1+mu*(np.sin(a-b))**2)*d)+(a1*(1-mu)*np.cos(a-b)*c)))-((2*B*a1*A/(w*mu))*(((1+mu)*np.sin(b))-(2*mu*np.cos(a-b)*np.sin(a)))*np.cos(e)))/(1+mu*(np.sin(a-b))**2)
    edot = w
    return adot, cdot, bdot, ddot, edot

# Initial conditions: theta1, dtheta1/dt, theta2, dtheta2/dt, forcing phase.
y0 = np.array([3.15, -0.1, 3.13, 0.1, 0])

# Do the numerical integration of the equations of motion.
sol = solve_ivp(deriv, [0, 40000], y0, args=(mu, a1, B, w, A), method='RK45',
                t_eval=np.arange(0, 40000, 0.005), dense_output=True,
                rtol=1e-3, atol=1e-6)
T = sol.t
Y = sol.y
I am expecting similar results from ode45 in MATLAB and solve_ivp in Python. How can I exactly reproduce the ode45 result in Python? What is the reason for the discrepancy?
Even if ode45 and RK45 use the same underlying scheme, they do not necessarily use the same strategy for evolving the time step and adapting it to match the error tolerance. Thus, it is difficult to know which one is better.
The only thing you can do is simply try lower tolerances, e.g. 1e-10; then both solutions should end up being virtually identical. Your current error tolerances might be insufficiently tight, so that small discrepancies in the fine details of the two algorithms create a visible difference in the solution.
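For instance, a minimal sketch reusing the variables from the question:

# Re-run with much tighter tolerances; if the MATLAB and Python results
# agree at rtol=atol=1e-10, the earlier difference was accumulated local
# error rather than a bug in either solver.
sol_tight = solve_ivp(deriv, [0, 40000], y0, args=(mu, a1, B, w, A),
                      method='RK45', t_eval=np.arange(0, 40000, 0.005),
                      rtol=1e-10, atol=1e-10)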

Using scipy solve_bvp for a nonhomogeneous ODE

I am trying to solve the following 4th-order BVP:
y'''' = K - C*y
My x variable is a linspace with 100 nodes. As you can see, K is a vector of the same length (100) and makes the equation nonhomogeneous. When I run the solver, however, I get the following error:
Cell In [11], line 18, in fun(x, y)
17 def fun(x, y):
---> 18 ans = vector-np.multiply(C,y[0])
19 return np.vstack((y[1],y[2],y[3],ans))
ValueError: operands could not be broadcast together with shapes (100,) (99,)
Why does the solver suddenly change the length of y by 1, and how can I fix this error?
EDIT: I must add that the solver works fine when K is absent, i.e. when the equation is homogeneous.
from scipy.integrate import solve_bvp
import numpy as np

L = 10
nodes = 100
A = 1000
B = 1500
C = 0.05

x = np.linspace(0, L, nodes)
vector = np.ones(nodes)  # K, tabulated on the initial mesh

def fun(x, y):
    ans = vector - np.multiply(C, y[0])
    return np.vstack((y[1], y[2], y[3], ans))

def bc(ya, yb):
    return np.array([ya[2], yb[2], ya[3]+A/B, yb[3]])

y_a = np.zeros((4, x.size))
res_a = solve_bvp(fun, bc, x, y_a)

res1 = res_a.sol(x)[0]
res2 = res_a.sol(x)[1]
res3 = B*res_a.sol(x)[2]
res4 = B*res_a.sol(x)[3]
The solver establishes in the first round a system of polynomial approximations over the nodes-1 = 99 segments of the first subdivision.
There is no guarantee that the subdivision remains unchanged in later solver rounds, so your ODE right-hand-side function has to work with arbitrary x arrays. This means that parameters given as a function table need to be interpolated onto the general x array. There are procedures for this: numpy.interp for pointwise interpolation and scipy.interpolate.interp1d for generating interpolation functions.
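A minimal sketch of the fix, using the variable names from the question (here 'vector' plays the role of K, tabulated on the initial mesh x):

def fun(x_eval, y):
    # Interpolate the tabulated K onto the solver's current evaluation points.
    K_eval = np.interp(x_eval, x, vector)
    ans = K_eval - C*y[0]
    return np.vstack((y[1], y[2], y[3], ans))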

gaussian process regression in multiple dimensions with GPflow

I would like to perform some multivariate regression using Gaussian process regression as implemented in GPflow, using version 2,
installed with pip install gpflow==2.0.0rc1.
Below is some example code that generates some 2D data, then attempts to fit it using GPR, and finally computes the difference between the true input data and the GPR prediction.
Eventually I would like to extend this to higher dimensions, do tests against a validation set to check for over-fitting, and experiment with other kernels and "Automatic Relevance Determination", but understanding how to get this to work is the first step.
Thanks!
The following code snippet will work in a Jupyter notebook.
import gpflow
import numpy as np
import matplotlib
from gpflow.utilities import print_summary
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (12, 6)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def gen_data(X, Y):
    """
    Make some fake data.
    X, Y are np.ndarrays with shape (N,) where
    N is the number of samples.
    """
    ys = []
    for x0, x1 in zip(X, Y):
        y = x0 * np.sin(x0*10)
        y = x1 * np.sin(x0*10)
        y += 1
        ys.append(y)
    return np.array(ys)

# generate some fake data
x = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, x)
X = X.ravel()
Y = Y.ravel()
z = gen_data(X, Y)
# note X.shape, Y.shape and z.shape
# are all (400,) for this case.

# if you would like to plot the data you can do the following
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(X, Y, z, s=100, c='k')

# had to set this to avoid the following error:
# tensorflow.python.framework.errors_impl.InvalidArgumentError:
#   Cholesky decomposition was not successful. The input might not be valid. [Op:Cholesky]
gpflow.config.set_default_positive_minimum(1e-7)

# set up the kernel
k = gpflow.kernels.Matern52()

# set up the GPR model
# I think the shape of the independent data
# should be (400, 2) for this case
XY = np.column_stack([[X, Y]]).T
print(XY.shape)  # this will be (400, 2)
m = gpflow.models.GPR(data=(XY, z), kernel=k, mean_function=None)

# optimise hyper-parameters
opt = gpflow.optimizers.Scipy()

def objective_closure():
    return -m.log_marginal_likelihood()

opt_logs = opt.minimize(objective_closure,
                        m.trainable_variables,
                        options=dict(maxiter=100))

# predict the training set
mean, var = m.predict_f(XY)
print(mean.numpy().shape)
# (400, 400)
# I would expect this to be (400,).
# If it were, I could compute the difference between the true data
# and the GPR prediction with `diff = mean - z`, but because the
# shape is not as expected this of course won't work.
The shape of z must be (N, 1), whereas in your case it is (N,). However, this is a missing check in GPflow and not your fault.
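A minimal sketch of the fix, reusing the names from the question:

# Reshape the targets to a column vector before building the model.
z = z.reshape(-1, 1)                      # (400, 1) instead of (400,)
m = gpflow.models.GPR(data=(XY, z), kernel=k, mean_function=None)

mean, var = m.predict_f(XY)               # mean now has shape (400, 1)
diff = mean.numpy().ravel() - z.ravel()   # elementwise difference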

How to define a loss function in PyTorch that depends on partial derivatives of the model w.r.t. the input?

After reading about how to solve an ODE with neural networks, following the paper Neural Ordinary Differential Equations and the blog that uses the library JAX, I tried to do the same thing with "plain" PyTorch, but found a point rather obscure: how to properly use the partial derivative of a function (in this case the model) w.r.t. one of the input parameters.
To summarize, the problem at hand, as shown in the blog, is to solve the ODE y' = -2*x*y with the condition y(x=0) = 1 on the domain -2 <= x <= 2 (whose exact solution is y = exp(-x^2)). Instead of using finite differences, the solution is replaced by a NN, y(x) = NN(x), with a single layer of 10 nodes.
I managed to (more or less) replicate the blog with the following code:
import torch
import torch.nn as nn
from torch import optim
import matplotlib.pyplot as plt
import numpy as np

# Define the NN model to solve the problem
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.lin1 = nn.Linear(1, 10)
        self.lin2 = nn.Linear(10, 1)

    def forward(self, x):
        x = torch.sigmoid(self.lin1(x))
        x = torch.sigmoid(self.lin2(x))
        return x

model = Model()

# Define loss_function from the ordinary differential equation to solve
def ODE(x, y):
    dydx, = torch.autograd.grad(y, x,
                                grad_outputs=y.data.new(y.shape).fill_(1),
                                create_graph=True, retain_graph=True)
    eq = dydx + 2.*x*y                    # y' = -2x*y
    ic = model(torch.tensor([0.])) - 1.   # y(x=0) = 1
    return torch.mean(eq**2) + ic**2

loss_func = ODE

# Define the optimization
# opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.99, nesterov=True)  # equivalent to blog
opt = optim.Adam(model.parameters(), lr=0.1, amsgrad=True)  # got faster convergence with Adam using amsgrad

# Define reference grid
x_data = torch.linspace(-2.0, 2.0, 401, requires_grad=True)
x_data = x_data.view(401, 1)  # reshaping the tensor

# Iterative learning
epochs = 1000
for epoch in range(epochs):
    opt.zero_grad()
    y_trial = model(x_data)
    loss = loss_func(x_data, y_trial)
    loss.backward()
    opt.step()
    if epoch % 100 == 0:
        print('epoch {}, loss {}'.format(epoch, loss.item()))

# Plot results (note: plot y_trial, the network output, against the exact solution)
plt.plot(x_data.data.numpy(), np.exp(-x_data.data.numpy()**2), label='exact')
plt.plot(x_data.data.numpy(), y_trial.data.numpy(), label='approx')
plt.legend()
plt.show()
From here I managed to get the results shown in the figure:
[figure: the NN approximation plotted against the exact solution exp(-x^2)]
The problem is that in the definition of the ODE functional, instead of passing (x, y) I would rather prefer to pass something like (x, fun) (where fun is my model), so that the partial derivative and specific evaluations of the model can be done with a call. So, something like:
def ODE(x, fun):
    dydx, = "grad of fun w.r.t x as a function"
    eq = dydx(x) + 2.*x*fun(x)             # y' = -2x*y
    ic = fun(torch.tensor([0.])) - 1.      # y(x=0) = 1
    return torch.mean(eq**2) + ic**2
Any ideas? Thanks in advance.
EDIT:
After some trials I found a way to pass the model as an input, but found another strange behavior... The new problem is to solve the ODE y'' = -2 with the BCs y(x=-2) = -1 and y(x=2) = 1, for which the analytical solution is y(x) = -x^2 + x/2 + 4.
Let's modify the previous code a bit:
import torch
import torch.nn as nn
from torch import optim
import matplotlib.pyplot as plt
import numpy as np

# Define the NN model to solve the equation
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.lin1 = nn.Linear(1, 10)
        self.lin2 = nn.Linear(10, 1)

    def forward(self, x):
        y = torch.sigmoid(self.lin1(x))
        z = torch.sigmoid(self.lin2(y))
        return z

model = Model()

# Define loss_function from the ordinary differential equation to solve
def ODE(x, fun):
    y = fun(x)
    dydx = torch.autograd.grad(y, x,
                               grad_outputs=y.data.new(y.shape).fill_(1),
                               create_graph=True, retain_graph=True)[0]
    d2ydx2 = torch.autograd.grad(dydx, x,
                                 grad_outputs=dydx.data.new(dydx.shape).fill_(1),
                                 create_graph=True, retain_graph=True)[0]
    eq = d2ydx2 + torch.tensor([2.])                      # y'' = -2
    bc1 = fun(torch.tensor([-2.])) - torch.tensor([-1.])  # y(x=-2) = -1
    bc2 = fun(torch.tensor([2.])) - torch.tensor([1.])    # y(x= 2) = 1
    return torch.mean(eq**2) + bc1**2 + bc2**2

loss_func = ODE
loss_func = ODE
So, here I passed the model as an argument and managed to differentiate twice... so far so good. BUT, using the sigmoid function for this case is not only unnecessary but also gives a result far from the analytical one.
If I change the NN to:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.lin1 = nn.Linear(1, 1)
        self.lin2 = nn.Linear(1, 1)

    def forward(self, x):
        y = self.lin1(x)
        z = self.lin2(y)
        return z
in which case I would expect to optimize a double pass through two linear functions that would retrieve a 2nd-order function... I get the error:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
Adding the option to the definition of dydx doesn't solve the problem, and adding it to d2ydx2 gives a None result.
Is there something wrong with the layers as they are?
Quick Solution:
Add allow_unused=True to the .grad calls. So, change

dydx = torch.autograd.grad(
    y, x,
    grad_outputs=y.data.new(y.shape).fill_(1),
    create_graph=True, retain_graph=True)[0]
d2ydx2 = torch.autograd.grad(
    dydx, x,
    grad_outputs=dydx.data.new(dydx.shape).fill_(1),
    create_graph=True, retain_graph=True)[0]

to

dydx = torch.autograd.grad(
    y, x,
    grad_outputs=y.data.new(y.shape).fill_(1),
    create_graph=True, retain_graph=True, allow_unused=True)[0]
d2ydx2 = torch.autograd.grad(
    dydx, x,
    grad_outputs=dydx.data.new(dydx.shape).fill_(1),
    create_graph=True, retain_graph=True, allow_unused=True)[0]
More explanation:
See what allow_unused does:
allow_unused (bool, optional): If ``False``, specifying inputs that were not
used when computing outputs (and therefore their grad is always zero)
is an error. Defaults to ``False``.
So, if you try to differentiate w.r.t. a variable that is not being used to compute the value, it will give an error. Also, note that the error only occurs when you use linear layers.
This is because with only linear layers you have y = W2*(W1*x + b1) + b2 = W*x + b, and dy/dx is not a function of x; it is simply the constant W. So when you try to differentiate dy/dx w.r.t. x it throws an error. The error goes away as soon as you use a sigmoid, because then dy/dx is a function of x. To avoid the error, either make sure dy/dx is a function of x, or use allow_unused=True.
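If you want the second-order residual to stay well-defined even for a purely linear model, one option (a sketch of my own workaround, not part of the code above) is to substitute an explicit zero when autograd returns None:

d2ydx2 = torch.autograd.grad(dydx, x,
                             grad_outputs=torch.ones_like(dydx),
                             create_graph=True, retain_graph=True,
                             allow_unused=True)[0]
if d2ydx2 is None:
    # For a purely linear model dydx is constant in x, so the second
    # derivative is identically zero; use an explicit zero tensor.
    d2ydx2 = torch.zeros_like(x)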

Merging two tensors by convolution in Keras

I'm trying to convolve two 1D tensors in Keras.
I get two inputs from other models:
x - of length 100
ker - of length 5
I would like to get the 1D convolution of x using the kernel ker.
I wrote a Lambda layer to do it:
import tensorflow as tf

def convolve1d(x):
    y = tf.nn.conv1d(value=x[0], filters=x[1], padding='VALID', stride=1)
    return y

x = Input(shape=(100,))
ker = Input(shape=(5,))
y = Lambda(convolve1d)([x, ker])
model = Model([x, ker], [y])
I get the following error:
ValueError: Shape must be rank 4 but is rank 3 for 'lambda_67/conv1d/Conv2D' (op: 'Conv2D') with input shapes: [?,1,100], [1,?,5].
Can anyone help me understand how to fix it?
It was much harder than I expected, because Keras and TensorFlow don't expect any batch dimension in the convolution kernel, so I had to write the loop over the batch dimension myself, which requires specifying batch_shape instead of just shape in the Input layer. Here it is:
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras import Input, Model
from keras.layers import Lambda

def convolve1d(x):
    input, kernel = x
    output_list = []
    if K.image_data_format() == 'channels_last':
        kernel = K.expand_dims(kernel, axis=-2)
    else:
        kernel = K.expand_dims(kernel, axis=0)
    for i in range(batch_size):  # loop over batch dimension
        output_temp = tf.nn.conv1d(value=input[i:i+1, :, :],
                                   filters=kernel[i, :, :],
                                   padding='VALID',
                                   stride=1)
        output_list.append(output_temp)
        print(K.int_shape(output_temp))
    return K.concatenate(output_list, axis=0)

batch_size = 1  # fixed batch size, required because we loop over it explicitly
batch_input_shape = (batch_size, 100, 1)
batch_kernel_shape = (batch_size, 5, 1)

x = Input(batch_shape=batch_input_shape)
ker = Input(batch_shape=batch_kernel_shape)
y = Lambda(convolve1d)([x, ker])
model = Model([x, ker], [y])

a = np.ones(batch_input_shape)
b = np.ones(batch_kernel_shape)
c = model.predict([a, b])
In the current state:
It doesn't work for inputs (x) with multiple channels.
If you provide several filters, you get as many outputs, each being the convolution of the input with the corresponding kernel.
From the given code it is difficult to point out what you mean when you say
"is it possible"
But if what you mean is to merge two layers and feed the merged layer to a convolution, yes, it is possible:

x = Input(shape=(100,))
ker = Input(shape=(5,))
merged = keras.layers.concatenate([x, ker], axis=-1)
y = K.conv1d(merged, 'same')
model = Model([x, ker], y)
EDIT:
@user2179331, thanks for clarifying your intention. You are using the Lambda class incorrectly, which is why the error message is showing.
But what you are trying to do can be achieved using keras.backend functions.
Note, though, that when using lower-level functions you lose some higher-level abstraction. E.g. when using keras.backend.conv1d you need an input of shape (BATCH_SIZE, width, channels) and a kernel of shape (kernel_size, input_channels, output_channels). So in your case, let us assume x has one channel (input_channels == 1) and y has the same number of channels (output_channels == 1).
So your code can now be refactored as follows:
from keras import backend as K
from keras import Input

def convolve1d(x, kernel):
    y = K.conv1d(x, kernel, padding='valid', strides=1, data_format="channels_last")
    return y

input_channels = 1
output_channels = 1
kernel_width = 5
input_width = 100

ker = K.variable(K.random_uniform([kernel_width, input_channels, output_channels]), K.floatx())
x = Input(shape=(input_width, input_channels))
y = convolve1d(x, ker)
I guess I have understood what you mean. Given the wrong example code below:

input_signal = Input(shape=(L,), name='input_signal')
input_h = Input(shape=(N,), name='input_h')
faded = Lambda(lambda x: tf.nn.conv1d(input, x))(input_h)

you want to convolve each signal vector with a different vector of fading coefficients.
The 'conv' operations in TensorFlow (e.g. tf.nn.conv1d) only support a fixed-value kernel; therefore, the code above cannot run as you want.
I have no complete solution either. The code given above can run, but it is complex and not efficient. In my view, another feasible but also inefficient way is to multiply by the Toeplitz matrix whose rows are shifted copies of the fading-coefficient vector. When the signal vector is very long, the matrix becomes extremely large.
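For concreteness, a minimal NumPy sketch of that Toeplitz idea (an illustration only; in a real model the matrix would have to be built per batch element inside the graph):

import numpy as np

L, N = 100, 5
signal = np.random.randn(L)   # the signal vector x
ker = np.random.randn(N)      # the fading coefficients

# Build an (L-N+1, L) banded matrix whose rows are shifted copies of the
# flipped kernel; multiplying by it performs a 'valid' 1-D convolution.
rows = L - N + 1
conv_matrix = np.zeros((rows, L))
for i in range(rows):
    conv_matrix[i, i:i+N] = ker[::-1]

faded = conv_matrix @ signal
assert np.allclose(faded, np.convolve(signal, ker, mode='valid'))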