Reading reconstructed vector from autoencoder in DL4J

My goal is to have an autoencoding network where I can train the identity function and then do forward passes yielding a reconstruction of the input.
For this, I'm trying to use VariationalAutoencoder, e.g. something like:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(77147718)
.trainingWorkspaceMode(WorkspaceMode.NONE)
.gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
.gradientNormalizationThreshold(1.0)
.optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
.list()
.layer(0, new VariationalAutoencoder.Builder()
.activation(Activation.LEAKYRELU)
.nIn(100).nOut(15)
.encoderLayerSizes(120, 60, 30)
.decoderLayerSizes(30, 60, 120)
.pzxActivationFunction(Activation.IDENTITY)
.reconstructionDistribution(new BernoulliReconstructionDistribution(Activation.SIGMOID.getActivationFunction()))
.build())
.pretrain(true).backprop(false)
.build();
However, VariationalAutoencoder seems to be designed for training (and providing) mappings from an input to an encoded version, i.e. from a vector of size 100 to a vector of size 15 in the example configuration above.
But I'm not particularly interested in the encoded version; I would like to train a mapping of a 100-vector to itself. Then, I'd like to run other 100-vectors through it and get back their reconstructed versions.
Even when looking at the API of VariationalAutoencoder (or AutoEncoder), I can't figure out how to do this. Or are those layers not designed for this kind of "end-to-end usage", so that I would have to construct an autoencoding network manually?

You can see how to use the VAE layer to extract averaged reconstructions from the variational example.
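There are two methods for getting a reconstruction out of a variational layer, and both take latent-space values z as input rather than the original data. generateAtMeanGivenZ returns the mean of the reconstruction distribution p(x|z), which is usually what you want; generateRandomGivenZ returns a random sample from p(x|z) instead. To reconstruct a given input, first encode it (the layer's output is the mean of q(z|x), i.e. the 15-dimensional vector in the question) and then pass that z to one of these methods. See the javadoc page for all the other methods.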
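To make the encode-then-decode distinction concrete, here is a toy sketch in plain Python (not DL4J code) for a Bernoulli reconstruction distribution with a sigmoid, as in the question's configuration. The tiny linear encoder/decoder and the names encode_mean, generate_at_mean, generate_random and reconstruct are hypothetical stand-ins for the trained layer; only the relationships matter: z is the mean of q(z|x), the Bernoulli mean is the sigmoid of the decoder output, and a random sample draws 0/1 values from that mean.

```python
import math
import random

# Hypothetical stand-ins for a trained VAE (NOT the DL4J API): a linear
# "encoder" returning the mean of q(z|x) and a linear "decoder" returning
# the pre-sigmoid values of a Bernoulli reconstruction distribution p(x|z).


def encode_mean(x, w_enc):
    """Mean of q(z|x): one latent value per row of w_enc (identity activation)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in w_enc]


def decode_logits(z, w_dec):
    """Pre-activation decoder output: one value per row of w_dec."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in w_dec]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generate_at_mean(z, w_dec):
    """Analogue of generateAtMeanGivenZ: E[x|z] = sigmoid(decoder output)."""
    return [sigmoid(t) for t in decode_logits(z, w_dec)]


def generate_random(z, w_dec, rng):
    """Analogue of generateRandomGivenZ: one 0/1 Bernoulli draw per output unit."""
    return [1.0 if rng.random() < p else 0.0 for p in generate_at_mean(z, w_dec)]


def reconstruct(x, w_enc, w_dec):
    """End-to-end reconstruction: encode to the mean z, then decode at the mean."""
    return generate_at_mean(encode_mean(x, w_enc), w_dec)


if __name__ == "__main__":
    w_enc = [[0.5, -0.5, 1.0]]      # 3 inputs -> 1 latent unit
    w_dec = [[2.0], [-2.0], [4.0]]  # 1 latent unit -> 3 outputs
    x = [1.0, 0.0, 1.0]
    print(reconstruct(x, w_enc, w_dec))
    print(generate_random(encode_mean(x, w_enc), w_dec, random.Random(0)))
```

The mean reconstruction is deterministic for a given input, which is what you want for "run a 100-vector through and get its reconstruction"; the random variant gives a different 0/1 vector on every call.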

Related

Panel data regression comparison in Matlab

I have a very large panel dataset and would like to apply a number of simple machine learning techniques in MATLAB (logistic regression, decision trees, bagged trees).
During my preparation I came across fitglm and fitLifetimePDModel, the latter of which is meant for panel data. I was trying to understand how, or whether, it differs from fitglm, because when I try the code below, the results are exactly the same.
Why is that? For example, with fitglm I'm not telling the program that each customer can have more than one data point.
load RetailCreditPanelData.mat
pdModel_1 = fitLifetimePDModel(data,"Logistic", 'AgeVar','YOB', 'IDVar','ID', 'LoanVars','ScoreGroup','ResponseVar','Default');
disp(pdModel_1.Model)
pdModel_2 = fitglm(data,'Default ~ 1 + ScoreGroup + YOB', 'Distribution','binomial', 'link', 'logit');
disp(pdModel_2)

Removing (Pruning) neurons from TensorFlow.JS layer

I'm new to TensorFlow and neural nets, and I have never used anything other than the JavaScript version of TensorFlow. Basically, I'm experimenting and studying all this.
Reading the (Python) TensorFlow docs, I saw that pruning can be done with TF.CONTRIB.MODEL_PRUNING, but as far as I have found, there is nothing similar for TensorFlow.js. So I'd like to experiment a bit and implement at least a very simple/basic pruning method.
This "very simple/basic pruning method" could be something like removing from the hidden layers those neurons whose weights are very close to 0. I would then train the model a bit more and see if I can recover the loss in accuracy.
I know I can access the weights with something like this:
const weights = model.layers.map(layer => {
return layer.getWeights()[0].dataSync();
});
What I would like to know is whether it is actually possible to find and remove the units associated with those weights (and whether I can do this during training).
Thanks!
Edu
It is possible to set the weights of the model. In the same way that you retrieve a layer's weights with getWeights(), you can use setWeights() to change them:
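model.fit(x, y, {
  epochs: 1000,
  callbacks: {
    onEpochEnd: (epoch, logs) => {
      // check your weights: for a Dense layer this is [kernel, bias]
      const [kernel, bias] = model.layers[0].getWeights();
      // set your weights, e.g. zero out entries whose magnitude is below
      // `threshold` (a hypothetical value you choose)
      const pruned = tf.tidy(() =>
        tf.where(kernel.abs().less(threshold), tf.zerosLike(kernel), kernel));
      model.layers[0].setWeights([pruned, bias]);
    }
  }
});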
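The answer above only overwrites values inside a layer of fixed size. Removing a unit outright means shrinking three arrays together: its column in the layer's kernel, its entry in the bias, and its row in the next layer's kernel. A minimal sketch in plain Python (not the TF.js API), assuming Dense-style kernels laid out as [inputs][units] and a hypothetical threshold on the L2 norm of each unit's incoming weights:

```python
def unit_norms(kernel):
    """L2 norm of each unit's incoming weights (one value per kernel column)."""
    return [sum(row[j] ** 2 for row in kernel) ** 0.5 for j in range(len(kernel[0]))]


def prune_units(kernel, bias, next_kernel, threshold):
    """Drop units whose incoming-weight norm is below threshold.

    kernel: [n_in][n_units], bias: [n_units], next_kernel: [n_units][n_next].
    Returns the shrunken arrays plus the indices of the units that were kept.
    """
    keep = [j for j, norm in enumerate(unit_norms(kernel)) if norm >= threshold]
    new_kernel = [[row[j] for j in keep] for row in kernel]  # drop columns
    new_bias = [bias[j] for j in keep]
    new_next = [next_kernel[j] for j in keep]  # drop the matching rows downstream
    return new_kernel, new_bias, new_next, keep


if __name__ == "__main__":
    kernel = [[0.9, 0.001, -0.4],
              [0.3, -0.002, 0.8]]        # 2 inputs -> 3 hidden units
    bias = [0.1, 0.0, -0.1]
    next_kernel = [[1.0], [5.0], [2.0]]  # 3 hidden units -> 1 output
    print(prune_units(kernel, bias, next_kernel, threshold=0.01))
```

To use the result in TF.js you would, as far as I know, have to build a new model with the smaller layer sizes and load these arrays into it with setWeights(), since setWeights() expects tensors matching the layer's existing shapes. A removed unit still contributed its activated bias to the next layer, so some accuracy loss is expected, which the extra training planned in the question is meant to recover.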

Keras: make specific weights in a dense layer untrainable [duplicate]

I am using keras and tensorflow 1.4.
I want to explicitly specify which neurons are connected between two layers. Therefore I have a matrix A with a one wherever neuron i in the first layer is connected to neuron j in the second layer, and zeros elsewhere.
My first attempt was to create a custom layer with a kernel that has the same size as A, with non-trainable zeros where A has zeros and trainable weights where A has ones. Then the desired output would be a simple dot product. Unfortunately, I did not manage to figure out how to implement a kernel that is partly trainable and partly non-trainable.
Any suggestions?
(Building a functional model with a lot of neurons that are connected by hand could be a workaround, but a somewhat 'ugly' solution.)
The simplest way I can think of, if you have this matrix correctly shaped, is to derive the Dense layer and simply add the matrix in the code multiplying the original weights:
from keras.layers import Dense

class CustomConnected(Dense):

    def __init__(self, units, connections, **kwargs):
        # initialize the original Dense with all the usual arguments
        super(CustomConnected, self).__init__(units, **kwargs)
        # this is matrix A (shape: input_dim x units)
        self.connections = connections

    def call(self, inputs):
        # change the kernel before calling the original call:
        self.kernel = self.kernel * self.connections
        # call the original calculations and return their result:
        return super(CustomConnected, self).call(inputs)
Using:
model.add(CustomConnected(units,matrixA))
model.add(CustomConnected(hidden_dim2, matrixB,activation='tanh')) #can use all the other named parameters...
Notice that all the neurons/units still have a bias added at the end. The argument use_bias=False will still work if you don't want biases. You can also do exactly the same thing with a vector B, for instance, and mask the original biases with self.bias = self.bias * vectorB.
Hint for testing: use different input and output dimensions, so you can be sure that your matrix A has the correct shape.
I just realized that my code is potentially buggy, because I'm changing a property that is used by the original Dense layer. If weird behaviors or messages appear, you can try another call method:
def call(self, inputs):
    output = K.dot(inputs, self.kernel * self.connections)
    if self.use_bias:
        output = K.bias_add(output, self.bias)
    if self.activation is not None:
        output = self.activation(output)
    return output
Where K comes from import keras.backend as K.
You may also go further and set a custom get_weights() method if you want to see the weights masked with your matrix. (This would not be necessary in the first approach above)
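Why masking gives a "partly trainable" kernel: the gradient of each masked kernel entry passes through the same zero from A, so those entries never receive an update and never affect the output. A plain-Python sketch of one forward pass, the kernel gradient, and one SGD step, simplified to no bias and no activation (an illustration of the idea, not the Keras implementation):

```python
def masked_dense(x, kernel, mask):
    """Forward pass y_j = sum_i x_i * kernel[i][j] * mask[i][j] (no bias/activation)."""
    n_out = len(kernel[0])
    return [sum(x[i] * kernel[i][j] * mask[i][j] for i in range(len(x)))
            for j in range(n_out)]


def kernel_grad(x, grad_y, mask):
    """Chain rule through the mask: dL/dkernel[i][j] = x_i * dL/dy_j * mask[i][j]."""
    return [[x[i] * grad_y[j] * mask[i][j] for j in range(len(grad_y))]
            for i in range(len(x))]


def sgd_step(kernel, grad, lr):
    """Plain gradient-descent update, entry by entry."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(kernel, grad)]


if __name__ == "__main__":
    x = [1.0, 2.0]
    kernel = [[0.5, 0.5], [0.5, 0.5]]
    mask = [[1, 0], [0, 1]]  # matrix A: input i feeds only unit i
    y = masked_dense(x, kernel, mask)
    grad = kernel_grad(x, [1.0, 1.0], mask)  # pretend dL/dy = 1 for both units
    print(y, sgd_step(kernel, grad, lr=0.1))
```

After the step, the off-diagonal entries are still 0.5 because the mask is zero there, while the diagonal entries have moved; whatever value a masked entry holds is irrelevant, since it is always multiplied by zero.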

Using hidden activations in loss function

I want to create a custom loss function for a double-input double-output model in Keras that:
minimizes the reconstruction error of two autoencoders;
maximizes the correlation of the bottleneck features of the autoencoders.
For this I need to pass to the loss function:
both inputs;
both outputs / reconstructions;
output of intermediate layers for both (hidden activations).
I know I can pass both inputs and outputs to Model, but am struggling to find a way to pass the hidden activations.
I could create two new Models that have the output of the intermediate layers and pass that to loss, like:
intermediate_layer_model1 = Model(input=input1, output=autoencoder.get_layer('encoded1').output)
intermediate_layer_model2 = Model(input=input2, output=autoencoder.get_layer('encoded2').output)
autoencoder.compile(optimizer='adadelta', loss=loss(intermediate_layer_model1, intermediate_layer_model2))
But still, I would need to find a way to match the y_true in loss to the correct intermediate model.
What is the right way to approach this?
Edit
Here's an approach that I think should work. Simplified:
# autoencoder 1
input1 = Input(shape=(input_dim,))
encoded1 = Dense(encoding_dim, activation='relu', name='encoded1')(input1)
decoded1 = Dense(input_dim, activation='sigmoid', name='decoded1')(encoded1)
# autoencoder 2
input2 = Input(shape=(input_dim,))
encoded2 = Dense(encoding_dim, activation='relu', name='encoded2')(input2)
decoded2 = Dense(input_dim, activation='sigmoid', name='decoded2')(encoded2)
# merge encodings
merge_layer = merge([encoded1, encoded2], mode='concat', name='merge', concat_axis=1)
model = Model(input=[input1, input2], output=[decoded1, decoded2, merge_layer])
model.compile(optimizer='rmsprop', loss={
'decoded1': 'binary_crossentropy',
'decoded2': 'binary_crossentropy',
'merge': correlation,
})
Then in correlation I can split y_pred and do the calculations.
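A sketch of the math such a correlation loss could perform after splitting y_pred, written in plain Python on lists rather than with Keras backend ops. It assumes the 'merge' output concatenates encoded1 and encoded2 along axis 1, as in the code above, and that correlation is measured per bottleneck unit across the batch (one of several reasonable definitions). The y_true supplied for the 'merge' output would presumably be a dummy array and is ignored here.

```python
def pearson(a, b, eps=1e-8):
    """Pearson correlation of two equal-length lists (eps avoids division by zero)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / ((var_a * var_b) ** 0.5 + eps)


def correlation_loss(y_pred, eps=1e-8):
    """Negative mean per-unit correlation between the two halves of each row.

    y_pred is a batch of rows [encoded1 | encoded2]; minimizing this value
    maximizes the correlation between the two bottlenecks.
    """
    d = len(y_pred[0]) // 2
    corrs = []
    for j in range(d):
        col1 = [row[j] for row in y_pred]      # unit j of encoded1 over the batch
        col2 = [row[d + j] for row in y_pred]  # unit j of encoded2 over the batch
        corrs.append(pearson(col1, col2, eps))
    return -sum(corrs) / d
```

For perfectly correlated bottlenecks the loss approaches -1; for perfectly anti-correlated ones it approaches +1.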
How about:
Defining a single model with multiple outputs (be sure that you named the coding and reconstruction layers properly):
duo_model = Model(input=input, output=[coding_layer, reconstruction_layer])
Compiling your model with two different losses (or even performing a loss reweighting):
duo_model.compile(optimizer='rmsprop',
loss={'coding_layer': correlation_loss,
'reconstruction_layer': 'mse'})
Taking your final models as:
encoder = Model(input=input, output=[coding_layer])
autoencoder = Model(input=input, output=[reconstruction_layer])
After proper compilation this should do the job.
When it comes to defining a proper correlation loss function, there are two ways:
when the coding layer and your output layer have the same dimension, you could easily use the predefined cosine_proximity function from the Keras library;
when the coding layer has a different dimensionality, you should first embed the coding vector and the reconstruction vector into the same space and then compute the correlation there. Remember that this embedding should be either a Keras layer/function or a Theano/TensorFlow operation (depending on which backend you are using). Of course, you can compute both the embedding and the correlation as part of one loss function.
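A plain-Python sketch of the two cases. cosine_proximity mirrors the quantity Keras's cosine_proximity computes (the negative cosine similarity, ignoring the small epsilon Keras uses when normalizing), and it only makes sense for vectors of equal length. project is a hypothetical fixed linear embedding standing in for the learned Keras layer or backend operation the second case calls for.

```python
import math


def cosine_proximity(a, b):
    """Negative cosine similarity of two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(u * v for u, v in zip(a, b))
    norm = math.sqrt(sum(u * u for u in a)) * math.sqrt(sum(v * v for v in b))
    return -dot / norm


def project(vec, proj):
    """Hypothetical linear embedding: proj has one row per target dimension."""
    return [sum(p * v for p, v in zip(row, vec)) for row in proj]


if __name__ == "__main__":
    code = [1.0, 5.0]               # 2-dimensional coding vector
    reconstruction = [1.0, 2.0, 3.0]  # 3-dimensional reconstruction
    proj = [[1, 0, 0], [0, 1, 1]]     # made-up 3 -> 2 embedding
    print(cosine_proximity(code, project(reconstruction, proj)))
```

Passing code and reconstruction directly would raise the ValueError, which is exactly the situation where the embedding step is needed.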

function parameters in matlab wander off after curve fitting

First, a little background: I'm a psychology student, so my coding background isn't on par with yours :-)
My problem is as follows, and the most important observation is that curve fitting with 2 different programs gives completely different results for my parameters, although my graphs stay the same. The main program we have used to fit my longitudinal data is KaleidaGraph, and it should be seen as kind of the 'gold standard'; the program I'm trying to modify is MATLAB.
I was trying to be smart and wrote some code (a lot, at least for me), and the goal of that code was the following:
1. Take an individual longitudinal data file
2. Curve-fit this data to a non-parametric model using lsqcurvefit
3. Obtain figures and the points where f' and f'' are zero
This all worked well (woohoo :-)), but when I started comparing the function parameters that both programs generate, there is a huge difference. The KaleidaGraph parameters stay close to their original starting values. MATLAB's wander off and sometimes grow by a factor of 1000. The graphs, however, stay more or less the same in both situations, and both fit the data well. It would be lovely to know how to make the MATLAB curve fitting more 'conservative' and keep it closer to its original starting values.
validFitPersons = true(nbValidPersons,1);
for i=1:nbValidPersons
    personalData = data{validPersons(i),3};
    personalData = personalData(personalData(:,1)>=minAge,:);
    % Fit a specific model for all valid persons
    try
        opts = optimoptions(@lsqcurvefit, 'Algorithm', 'levenberg-marquardt');
        [personalParams,personalRes,personalResidual] = lsqcurvefit(heightModel,initialValues,personalData(:,1),personalData(:,2),[],[],opts);
    catch
        x=1;
    end
Above is the part of the code I've written to fit the data files to a specific model.
Below is an example of a non-parametric model I use, with its function parameters.
elseif strcmpi(model,'jpa2')
    % y = a.*(1-1/(1+(b_1(t+e))^c_1+(b_2(t+e))^c_2+(b_3(t+e))^c_3))
    heightModel = @(params,ages) abs(params(1).*(1-1./(1+(params(2).* (ages+params(8) )).^params(5) +(params(3).* (ages+params(8) )).^params(6) +(params(4) .*(ages+params(8) )).^params(7) )));
    modelStrings = {'a','b1','b2','b3','c1','c2','c3','e'};
    % Define initial values
    if strcmpi('male',gender)
        initialValues = [176.76 0.339 0.1199 0.0764 0.42287 2.818 18.52 0.4363];
    else
        initialValues = [161.92 0.4173 0.1354 0.090 0.540 2.87 14.281 0.3701];
    end
I've tried to mimic the curve-fitting process in KaleidaGraph as closely as possible. I found that it uses the Levenberg-Marquardt algorithm, which I've selected. However, the results still vary and I don't have any more clues about how to change this.
Some extra details:
The idea for this code was the following:
I'm trying to compare different fitting models (they are designed for this purpose). So I have 5 models with different parameters and different starting values (the second part of my code), and next I have the general curve-fitting file. Since there are different models, it would be interesting if I could restrict how far the parameters can wander off from their starting values.
Does anyone have an idea how this could be done?
Anybody willing to help a psychology student?
Cheers
This is a common issue when dealing with non-linear models.
If I were you, I would check whether you can remove some parameters from the model in order to simplify it.
If you really want to keep your solution close to the initial point, you can use lower and upper bounds for each variable:
x = lsqcurvefit(fun,x0,xdata,ydata,lb,ub)
defines a set of lower and upper bounds on the design variables in x so that the solution is always in the range lb ≤ x ≤ ub.
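One simple way to build such bounds is to let each parameter move by at most a fixed fraction of its starting value. A plain-Python sketch (the function names, rel=0.5 and abs_margin are arbitrary illustrative choices, not recommendations); in MATLAB the same vectors would be lb = x0 - 0.5*abs(x0) and ub = x0 + 0.5*abs(x0). As far as I recall, older MATLAB releases reject bounds when the 'levenberg-marquardt' algorithm is selected, in which case lsqcurvefit's default 'trust-region-reflective' algorithm accepts them.

```python
def bounds_around(x0, rel=0.5, abs_margin=1e-3):
    """Lower/upper bounds letting each parameter move by rel*|x0_i|.

    abs_margin keeps the interval from collapsing when a start value is 0.
    """
    lb, ub = [], []
    for v in x0:
        width = max(rel * abs(v), abs_margin)
        lb.append(v - width)
        ub.append(v + width)
    return lb, ub


def clip_to_bounds(x, lb, ub):
    """Project a parameter vector back into [lb, ub] elementwise."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(x, lb, ub)]


if __name__ == "__main__":
    # starting values of the male 'jpa2' model from the question
    x0 = [176.76, 0.339, 0.1199, 0.0764, 0.42287, 2.818, 18.52, 0.4363]
    lb, ub = bounds_around(x0, rel=0.5)
    print(lb)
    print(ub)
```

clip_to_bounds only illustrates what the constraint lb ≤ x ≤ ub means; the solver enforces it internally, so you would pass just lb and ub to lsqcurvefit.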
Cheers
You state:
I'm trying to compare different fitting models (they are designed for this purpose). So what I do is I have 5 models with different parameters and different starting values ( the second part of my code) and next I have the general curve fitting file.
You will presumably compare the statistics from fits with different models, to see whether reductions in the fitting error are unlikely to be due to chance. You may want to rely on that comparison to pick the model that not only fits your data suitably but is also simplest (which is often referred to as the principle of parsimony).
The real problem is that the model you have shown results in correlated parameters and therefore overfitting, as mentioned by @David. Again, this should be resolved when you compare different models and find that some do just as well (statistically speaking) even though they involve fewer parameters.
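One common way to make such a comparison reward parsimony (a suggestion, not necessarily what the answerer had in mind) is an information criterion such as AIC, which trades the residual sum of squares against the number of parameters; the model with the lowest value is preferred. A plain-Python sketch, assuming Gaussian errors; the RSS is what lsqcurvefit returns as its second output (resnorm, personalRes in the question's code):

```python
import math


def aic_least_squares(rss, n, k):
    """Akaike information criterion for a least-squares fit with Gaussian errors.

    rss: residual sum of squares, n: number of data points,
    k: number of fitted parameters. Additive constants are dropped,
    so only differences between models fitted to the same data matter.
    """
    return n * math.log(rss / n) + 2 * k


def aicc_least_squares(rss, n, k):
    """Small-sample corrected AIC (requires n > k + 1)."""
    return aic_least_squares(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)


if __name__ == "__main__":
    # made-up numbers for illustration, not results from the question:
    # an 8-parameter model fits slightly better than a 4-parameter one
    n = 30
    print(aicc_least_squares(12.0, n, 8), aicc_least_squares(12.5, n, 4))
```

In this made-up case the small RSS improvement of the 8-parameter model does not pay for its 4 extra parameters, so the 4-parameter model gets the lower (better) score.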
edit
To drive the point home regarding the problem with the choice of model, here are (1) the results of a trial fit using simulated data and (2) the correlation matrix of the parameters in graphical form:
Note that absolute values of the correlation close to 1 indicate strongly correlated parameters, which is highly undesirable. Note also that the trend in the data is practically linear over a long portion of the dataset, which implies that 2 parameters might suffice over that stretch, so using 8 parameters to describe it seems like overkill.
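A parameter correlation matrix like this one can be estimated from the Jacobian at the solution, which lsqcurvefit returns as its seventh output ([x,resnorm,residual,exitflag,output,lambda,jacobian] = lsqcurvefit(...)). A plain-Python sketch, assuming the usual linearized approximation in which the parameter covariance is proportional to inverse(J^T J); the residual-variance factor cancels when converting covariance to correlation:

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(m):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("J^T J is singular: parameters are not identifiable")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def param_correlation(jac):
    """Correlation matrix of fitted parameters from the Jacobian (rows = data points)."""
    cov = inverse(matmul(transpose(jac), jac))  # up to the residual-variance factor
    k = len(cov)
    return [[cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5 for j in range(k)]
            for i in range(k)]


if __name__ == "__main__":
    # straight line y = a + b*t sampled at t = 0..3: columns are dy/da, dy/db
    print(param_correlation([[1.0, t] for t in range(4)]))
    # two nearly identical Jacobian columns: |correlation| very close to 1
    print(param_correlation([[1.0, 1.001], [2.0, 2.0], [3.0, 3.002]]))
```

For the straight line, intercept and slope already correlate at about -0.80; the second case shows the near-±1 values that signal redundant parameters.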