Context juggling in PyCUDA to store parameters in different GPU memory - pycuda

I'm trying to juggle contexts across different GPUs so that different parameters are stored on different GPUs. What I have done is: initialize CUDA, set the device, create a context (ctx1) for that device, then use curandom to generate a GPUArray of random values. After that, I pop the context and follow the same steps: set another device, create a new context, and generate another GPUArray of random values. From what I understand, x1 is then stored in the memory of GPU 1 and x2 in the memory of GPU 2. However, I realised that when I .pop() and .push() ctx1 and ctx2 respectively (and vice versa), I am still able to access both x1 and x2. Is this due to Unified Virtual Addressing or peer access being enabled, which would let me access both x1 and x2 regardless of which context I'm in?
import numpy as np
import pycuda
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import pycuda.curandom as curandom
d = 2 ** 15
cuda.init()
dev1 = cuda.Device(1)
ctx1 = dev1.make_context()
curng1 = curandom.XORWOWRandomNumberGenerator()
x1 = curng1.gen_normal((d,d), dtype = np.float32) # so x1 is stored in GPU 1
ctx1.pop() # pop ctx1 off this thread's context stack
dev2 = cuda.Device(2)
ctx2 = dev2.make_context()
curng2 = curandom.XORWOWRandomNumberGenerator()
x2 = curng2.gen_normal((d,d), dtype = np.float32) # so x2 is stored in GPU 2
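One way to check whether that is what is enabling the cross-context access is to query the devices directly. A small diagnostic sketch, using the dev1/dev2 objects from above and assuming your PyCUDA build exposes Device.can_access_peer and the UNIFIED_ADDRESSING device attribute:
# Diagnostic sketch: query unified addressing and peer-access capability.
print(dev1.get_attribute(cuda.device_attribute.UNIFIED_ADDRESSING))
print(dev2.get_attribute(cuda.device_attribute.UNIFIED_ADDRESSING))
print(dev1.can_access_peer(dev2)) # non-zero if peer access between the two GPUs is possible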

GPFlow multiple independent realizations of same GP, irregular sampling times/lengths

In GPflow I have multiple time series; the sampling times are not aligned across the series, and the series may have different lengths (longitudinal data). I assume they are independent realizations of the same GP. What is the right way to handle this with SVGP, and more generally with GPflow? Do I need to use coregionalization? The coregionalization notebook assumes correlated trajectories, whereas I want a shared mean/kernel but independent realizations.
Yes, the Coregion kernel implemented in GPflow is what you can use for your problem.
Let's set up some data from the generative model you describe, with different lengths for the timeseries:
import numpy as np
import gpflow
import matplotlib.pyplot as plt
Ns = [80, 90, 100] # number of observations for three different realizations
Xs = [np.random.uniform(0, 10, size=N) for N in Ns] # observation locations
# three different draws from the same GP:
k = gpflow.kernels.Matern52(variance=2.0, lengthscales=0.5) # kernel
Ks = [k(X[:, None]) for X in Xs]
Ls = [np.linalg.cholesky(K) for K in Ks]
vs = [np.random.randn(N, 1) for N in Ns]
fs = [(L @ v).squeeze(axis=-1) for L, v in zip(Ls, vs)]
To actually set up the training data for the gpflow GP model:
# output indicator for the observations: which timeseries is this?
os = [o * np.ones(N) for o, N in enumerate(Ns)] # [0 ... 0, 1 ... 1, 2 ... 2]
# now assemble the three timeseries in single data set:
allX = np.concatenate(Xs)
allo = np.concatenate(os)
allf = np.concatenate(fs)
X = np.c_[allX, allo]
Y = allf[:, None]
assert X.shape == (sum(Ns), 2)
assert Y.shape == (sum(Ns), 1)
# now let's set up a copy of the original kernel:
k2 = gpflow.kernels.Matern52(active_dims=[0]) # the same as k above, but with different hyperparameters
# and a Coregionalization kernel that effectively says they are all independent:
kc = gpflow.kernels.Coregion(output_dim=len(Ns), rank=1, active_dims=[1])
kc.W.assign(np.zeros(kc.W.shape))
kc.kappa.assign(np.ones(kc.kappa.shape))
gpflow.set_trainable(kc, False) # we want W and kappa fixed
The Coregion kernel defines a covariance matrix B = W Wᵀ + diag(kappa), so by setting W=0 we prescribe zero correlations (independent realizations) and kappa=1 (actually the default) ensures that the variance hyperparameter of the copy of the original kernel remains interpretable.
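To make that concrete, here is a quick NumPy check (purely illustrative) of the B matrix that results from the values assigned above:
W = np.zeros((len(Ns), 1)) # as assigned to kc.W above
kappa = np.ones(len(Ns)) # as assigned to kc.kappa above
B = W @ W.T + np.diag(kappa)
print(B) # identity matrix: zero cross-correlations, unit variance per output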
Now construct the actual model and optimize hyperparameters:
k2c = k2 * kc
m = gpflow.models.GPR((X, Y), k2c, noise_variance=1e-5)
opt = gpflow.optimizers.Scipy()
opt.minimize(m.training_loss, m.trainable_variables, compile=False)
which recovers the initial variance and lengthscale hyperparameters pretty well.
If you want to predict, you have to provide the extra "output" column in the Xnew argument to m.predict_f(), e.g. as follows:
Xtest = np.linspace(0, 10, 100)
Xtest_augmented = np.c_[Xtest, np.zeros_like(Xtest)]
f_mean, f_var = m.predict_f(Xtest_augmented)
(whether you set the output column to 0, 1, or 2 does not matter, as we set them all to be the same with our choice of W and kappa).
If your input was more than one-dimensional, you could set
active_dims=list(range(X.shape[1] - 1)) for the first kernel(s) and active_dims=[X.shape[1]-1] for the Coregion kernel.
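For instance, a sketch of that multi-dimensional setup (assuming, as above, that the output indicator is appended as the last column of X):
D = X.shape[1] - 1 # number of actual input dimensions; the last column is the output index
k_inputs = gpflow.kernels.Matern52(active_dims=list(range(D)))
k_outputs = gpflow.kernels.Coregion(output_dim=len(Ns), rank=1, active_dims=[D])
kernel = k_inputs * k_outputs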

Random number seed overlapping issue

I am using Matlab GPU computing to run a simulation. I suspect I may encounter a "random number seed" overlapping issue. My code is the following
N = 10000;
v = rand(N,1);
p = [0:0.1:1];
pA = [0:0.1:2];
[v,p,pA] = ndgrid(v,p,pA);
v = gpuArray(v);
p = gpuArray(p);
pA = gpuArray(pA);
t = 1;
bH = 0.9;
bL = 0.6;
a = 0.5;
Y = MyFunction(v,p,pA,t,bH,bL,a);
function [RA] = MyFunction(v,p,pA,t,bH,bL,a)
    function [RA] = SSP1(v,p,pA)
        RA = 0;
        S1 = rand;
        S2 = rand;
        S3 = rand;
        vA1 = (S1<a)*bH+(S1>=a)*bL;
        vA2 = (S2<a)*bH+(S2>=a)*bL;
        vA3 = (S3<a)*bH+(S3>=a)*bL;
        if p<=t && pA>3*bL && pA<=3*bH
            if pA>vA1+vA2+vA3
                if v>=p
                    RA = p;
                end
            else
                if v+vA1+vA2+vA3>=p+pA
                    RA = p+pA;
                end
            end
        end
    end
    [RA] = gather(arrayfun(@SSP1,v,p,pA));
end
The idea of the code is the following:
I generate N random agents, each characterized by a value v. Then, for each agent, I have to compute a quantity given (p,pA). As I have N agents and many combinations of (p,pA), I want to use the GPU to speed up the process. But here comes the tricky part:
for each agent, in order to finish the computation, I have to generate 3 extra random variables, vA1, vA2 and vA3. Based on my understanding of the GPU (I could be wrong), it does these computations simultaneously, i.e., for each agent v it generates the 3 random variables vA1, vA2, vA3, and it does this for all N agents at the same time. However, I am not sure whether the vA1, vA2, vA3 drawn for agent 1 and agent 2 may overlap, since N could be 1 million. I want to make sure that the random number streams used to generate each agent's vA1, vA2, vA3 don't overlap; otherwise, I am in big trouble.
There is a way to prevent this from happening: first generate 3N of these random variables vA1, vA2, vA3 and then put them onto the GPU. However, that may require a lot of GPU memory, which I don't have. The current method, I guess, does not need as much GPU memory, since I am generating vA1, vA2, vA3 on the fly?
What you describe does not happen. The proof is that the following code snippet generates (different) random values in hB:
A=ones(100,1);
dA=gpuArray(A);
[hB] = gather(arrayfun(@applyrand,dA));
function dB = applyrand(dA)
    r = rand;
    dB = dA*r;
end
That said, each of your random variables can only take two values, because with your use of S1, S2 and S3 you are basically flipping a coin:
vA1 = (S1<0.5)*bH+(S1>=0.5)*bL;
so vA1 is either bH or bL (exactly one of the two conditions holds).
Maybe this lack of variability is what is making you think that you don't have much randomness; it is not very clear from the question.

How to use `feedback` function in Matlab?

Matlab's feedback function is used to obtain the closed-loop transfer function of a system. Example:
sys = feedback(sys1,sys2) returns a model object sys for the negative feedback interconnection of model objects sys1,sys2. To compute the closed-loop system with positive feedback, use sign = +1; for negative feedback, we use -1.
My question arises when we have a feedback system built from a plant G and a controller C.
According to these docs, we can use feedback to create the negative feedback loop with G and C.
sys = feedback(G*C,-1)
This is a source of confusion: shouldn't the above be sys = feedback(G*C,1,-1)? These are not the same.
However, looking at these docs, for a unit loop gain k, you can compute the closed-loop transfer function T using:
G = tf([.5 1.3],[1 1.2 1.6 0]);
T = feedback(G,1);
Why are we using 1 and not -1? This is still negative feedback and not positive feedback.
The 1 in feedback(G,1) represents sys2, and since the function is called with only two inputs, the default will be negative unity feedback, according to the following line:
sys = feedback(sys1,sys2) returns a model object sys for the negative
feedback interconnection of model objects sys1,sys2.
Consider the following script
s = tf('s');
G = 1/s;
T1 = feedback(G,1)
T2 = feedback(G,1,-1)
T1 and T2 are the same.

seed that controls the order of a random function in Matlab

I used Matlab kmeans function to do clustering for two datasets: data1 and data2.
I have three main files, which contain the following code, respectively:
% file 1:
result1 = kmeans(data1, 4);
result2 = kmeans(data2, 4);
% file 2:
r1 = kmeans(data1,4);
% file 3:
r2 = kmeans(data2,4);
I noticed that result1 and r1 are the same, but result2 and r2 are slightly different. I believe this is caused by the randomness in the kmeans algorithm. In the 1st and 2nd files, data1 is clustered first, so kmeans starts from the same random number generator state. In the 1st and 3rd files, data2 is clustered at different stages: the kmeans call used for result1 advances the generator and thus affects the following kmeans call.
My question is: can we set up seed in certain way so that r2 and result2 are the same?
You can control random number generation in MATLAB using the rng function. With it, you can capture the state of the random number generator before running your code, then set the random number generator back to that state before you run it again, ensuring you get the same results. For example:
rngState1 = rng; % Capture state before processing data1
result1 = kmeans(data1, 4);
rngState2 = rng; % Capture state before processing data2
result2 = kmeans(data2, 4);
...
rng(rngState1); % Restore state previously used for processing data1
r1 = kmeans(data1,4);
...
rng(rngState2); % Restore state previously used for processing data2
r2 = kmeans(data2,4);
Since you're processing data in separate files, this might mean saving and loading the state variables to and from a MAT-file to do what I've outlined above. Another option is simply to set the seed to a given value before processing each data set:
rng(1); % Set seed to 1 for data1
result1 = kmeans(data1, 4);
rng(2); % Set seed to 2 for data2
result2 = kmeans(data2, 4);
...
rng(1);
r1 = kmeans(data1,4);
...
rng(2);
r2 = kmeans(data2,4);
Another alternative is to use non-random initialization:
start = data1(1:4,:); % This is not necessarily a good initialization!
result1 = kmeans(data1, 4, 'Start',start);
Don't copy-paste the code above; it is just for illustrative purposes. But you might have a good strategy to initialize your means non-randomly; how to do this depends on your data. For example, for 2D data within a rectangular domain you could select the four corners of the domain.

How do I actually execute a saved TensorFlow model?

Tensorflow newbie here. I'm trying to build an RNN. My input data is a set of vector instances of size instance_size representing the (x,y) positions of a set of particles at each time step. (Since the instances already have semantic content, they do not require an embedding.) The goal is to learn to predict the positions of the particles at the next step.
Following the RNN tutorial and slightly adapting the included RNN code, I create a model more or less like this (omitting some details):
inputs = self._input_data = tf.placeholder(tf.float32, [batch_size, num_steps, instance_size])
self._targets = tf.placeholder(tf.float32, [batch_size, num_steps, instance_size])
with tf.variable_scope("lstm_cell", reuse=True):
    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size, forget_bias=0.0)
if is_training and config.keep_prob < 1:
    lstm_cell = tf.nn.rnn_cell.DropoutWrapper(
        lstm_cell, output_keep_prob=config.keep_prob)
cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * config.num_layers)
self._initial_state = cell.zero_state(batch_size, tf.float32)
from tensorflow.models.rnn import rnn
inputs = [tf.squeeze(input_, [1])
          for input_ in tf.split(1, num_steps, inputs)]
outputs, state = rnn.rnn(cell, inputs, initial_state=self._initial_state)
output = tf.reshape(tf.concat(1, outputs), [-1, hidden_size])
softmax_w = tf.get_variable("softmax_w", [hidden_size, instance_size])
softmax_b = tf.get_variable("softmax_b", [instance_size])
logits = tf.matmul(output, softmax_w) + softmax_b
loss = position_squared_error_loss(
    tf.reshape(logits, [-1]),
    tf.reshape(self._targets, [-1]),
)
self._cost = cost = tf.reduce_sum(loss) / batch_size
self._final_state = state
Then I create a saver = tf.train.Saver(), iterate over the data to train it using the given run_epoch() method, and write out the parameters with saver.save(). So far, so good.
But how do I actually use the trained model? The tutorial stops at this point. From the docs on tf.train.Saver.restore(), in order to read back in the variables, I need to either set up exactly the same graph I was running when I saved the variables out, or selectively restore particular variables. Either way, that means my new model will require inputs of size batch_size x num_steps x instance_size. However, all I want now is to do a single forward pass through the model on an input of size num_steps x instance_size and read out a single instance_size-sized result (the prediction for the next time step); in other words, I want to create a model that accepts a different-size tensor than the one I trained on. I can kludge it by passing the existing model my intended data batch_size times, but that doesn't seem like a best practice. What's the best way to do this?
You have to create a new graph that has the same structure but with batch_size = 1, and import the saved variables with tf.train.Saver.restore(). You can take a look at how they define multiple models with a variable batch size in ptb_word_lm.py: https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/rnn/ptb/ptb_word_lm.py
So you can have a separate file, for instance, where you instantiate the graph with the batch_size you want and then restore the saved variables. After that you can execute your graph.
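For illustration, a rough sketch of such a separate inference script (the build_model helper, the model attribute names, the checkpoint path, and my_sequence are placeholders standing in for the question's own graph-construction code, not part of any API):
import numpy as np
import tensorflow as tf # 1.x-style API, matching the question

tf.reset_default_graph()
# Hypothetical helper that repeats the training graph construction, but with
# batch_size=1, and exposes the input placeholder and the output tensor.
model = build_model(batch_size=1, num_steps=num_steps, instance_size=instance_size)

saver = tf.train.Saver() # maps variables by name onto the checkpoint
with tf.Session() as sess:
    saver.restore(sess, "/path/to/model.ckpt") # placeholder checkpoint path
    sequence = my_sequence[np.newaxis, ...] # shape [1, num_steps, instance_size]
    next_positions = sess.run(model.logits, feed_dict={model.input_data: sequence})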