Compute softmax using breeze - scala

I am constructing a deep neural network from scratch and I want to implement the softmax http://neuralnetworksanddeeplearning.com/chap3.html#softmax distributed function.
I am using breeze for that but it is not working as expected.
The documentation is also poor with very few examples, so it is difficult for me to understand how I should use it.
here is an example :
I have an ouput array that contains 10 dimensions.
I have my label array also.
Z contains 10 rows with the weighted values.
My label array contains also 10 rows and one is set to 1 to specify which row is the expected result.
lab(0) = 1
lab(1 to 9) = 0
my code :
def ComputeZ(ActivationFunction : String, z:Array[Double], label:Array[Double]) : Array[Double] = {
ActivationFunction match {
case "SoftMax" => **val t = softmax(z,label)**
t
}
}
I was expecting having a distributed probability with a total of 1 for the 10 rows but it returns actually the same values as Z.
I don't know what I am doing wrong
thanks for your help

Your question seems a little bit confusing to me. I mean, creating a SoftMax from scratch has nothing to do with the label or the real output value. A Softmax function is used to create a valid output probability distribution of a neural network, used in multiclass classification problems. As I see you have a one hot vector as label, it seems that you want to implement a CrossEntropy criterion or some error function that evaluates the divergence of the prediction distribution and the label distribution. That needs the output prediction probability distribution(applying your Softmax to the output layer) and the one hot vector of the output.
I watched the code of the softmax function in breeze but I don´t see a Layer implementation and it doesn´t do what I was expecting. Have in mind that you need a forward an a backward function.

Related

Pytorch - how to undersample using weightedrandomsampler

I have an unbalanced dataset and would like to undersample the class that is overrepresented.How do I go about it. I would like to use to weightedrandomsampler but I am also open to other suggestions.
So far I am assuming that my code will have to be structured kind of like the following. But I dont know how to exaclty do it.
trainset = datasets.ImageFolder(path_train,transform=transform)
...
sampler = data.WeightedRandomSampler(weights=..., num_samples=..., replacement=...)
...
trainloader = data.DataLoader(trainset, batchsize = batchsize, sampler=sampler)
I hope someone can help. Thanks a lot
From my understanding, pytorch WeightedRandomSampler 'weights' argument is somewhat similar to numpy.random.choice 'p' argument which is the probability that a sample will get randomly selected. Pytorch uses weights instead to random sample training examples and they state in the doc that the weights don't have to sum to 1 so that's what I mean that it's not exactly like numpy's random choice. The stronger the weight, the more likely that sample will get sampled.
When you have replacement=True, it means that training examples can be drawn more than once which means you can have copies of training examples in your train set that get used to train your model; oversampling. Alongside, if the weights are low COMPARED TO THE OTHER TRAINING SAMPLE WEIGHTS the opposite occurs which means that those samples have a lower chance of being selected for random sampling; undersampling.
I have no clue how the num_samples argument works when using it with the train loader but I can warn you to NOT put your batch size there. Today, I tried putting the batch size and it gave horrible results. My co-worker put the number of classes*100 and his results were much better. All I know is that you should not put the batch size there. I also tried putting the size of all my training data for num_samples and it had better results but took forever to train. Either way, play around with it and see what works best for you. I would guess that the safe bet is to use the number of training examples for the num_samples argument.
Here's the example I saw somebody else use and I use it as well for binary classification. It seems to work just fine. You take the inverse of the number of training examples for each class and you set all training examples with that class its respective weight.
A quick example using your trainset object
labels = np.array(trainset.samples)[:,1] # turn to array and take all of column index 1 which are the labels
labels = labels.astype(int) # change to int
majority_weight = 1/num_of_majority_class_training_examples
minority_weight = 1/num_of_minority_class_training_examples
sample_weights = np.array([majority_weight, minority_weight]) # This is assuming that your minority class is the integer 1 in the labels object. If not, switch places so it's minority_weight, majority_weight.
weights = samples_weights[labels] # this goes through each training example and uses the labels 0 and 1 as the index in sample_weights object which is the weight you want for that class.
sampler = WeightedRandomSampler(weights=weights, num_samples=, replacement=True)
trainloader = data.DataLoader(trainset, batchsize = batchsize, sampler=sampler)
Since the pytorch doc says that the weights don't have to sum to 1, I think you can also just use the ratio which between the imbalanced classes. For example, if you had 100 training examples of the majority class and 50 training examples of the minority class, it would be a 2:1 ratio. To counterbalance this, I think you can just use a weight of 1.0 for each majority class training example and a weight 2.0 for all minority class training examples because technically you want the minority class to be 2 times more likely to be selected which would balance your classes during random selection.
I hope this helped a little bit. Sorry for the sloppy writing, I was in a huge rush and saw that nobody answered. I struggled through this myself without being able to find any help for it either. If it doesn't make sense just say so and I'll re-edit it and make it more clear when I get free time.
Based on torchdata (disclaimer: I'm the author) one can create a custom undersampler.
First, _Equalizer base class which:
creates multiple RandomSubsetSamplers (one for each class)
based on function (torch.max or torch.min) will behave as oversampler or undersampler
Code:
class _Equalizer(Sampler):
def __init__(self, labels: torch.tensor, function):
if len(labels.shape) > 1:
raise ValueError(
"labels can only have a single dimension (N, ), got shape: {}".format(
labels.shape
)
)
tensors = [
torch.nonzero(labels == i, as_tuple=False).flatten()
for i in torch.unique(labels)
]
self.samples_per_label = getattr(builtins, function)(map(len, tensors))
self.samplers = [
iter(
RandomSubsetSampler(
tensor,
replacement=len(tensor) < self.samples_per_label,
num_samples=self.samples_per_label
if len(tensor) < self.samples_per_label
else None,
)
)
for tensor in tensors
]
#property
def num_samples(self):
return self.samples_per_label * len(self.samplers)
def __iter__(self):
for _ in range(self.samples_per_label):
for index in torch.randperm(len(self.samplers)).tolist():
yield next(self.samplers[index])
def __len__(self):
return self.num_samples
Now, we can create undersampler (added oversampler as it is really short right now):
class RandomUnderSampler(_Equalizer):
def __init__(self, labels: torch.tensor):
super().__init__(labels, "min")
class RandomOverSampler(_Equalizer):
def __init__(self, labels):
super().__init__(labels, "max")
Just pass in your labels to the __init__ (has to be 1D but can have multiple or binary classes) and you can up/under sample your data.

No Model Summary For GLMs in Pyspark / SparkML

I'm familiarizing myself with Pyspark and SparkML at the moment. To do so I use the titanic dataset to train a GLM for predicting the 'Fare' in that dataset.
I'm following closely the Spark documentation. I do get a working model (which I call glm_fare) but when I try to assess the trained model using summary I get the following error message:
RuntimeError: No training summary available for this GeneralizedLinearRegressionModel
Why is this?
The code for training was as such:
glm_fare = GeneralizedLinearRegression(
labelCol="Fare",
featuresCol="features",
predictionCol='prediction',
family='gamma',
link='log',
weightCol='wght',
maxIter=20
)
glm_fit = glm_fare.fit(training_df)
glm_fit.summary
Just in case someone comes across this question, I ran into this problem as well and it seems that this error occurs when the Hessian matrix is not invertible. This matrix is used in the maximization of the likelihood for estimating the coefficients.
The matrix is not invertible if one of the eigenvalues is 0, which occurs when there is multicollinearity in your variables. This means that one of the variables can be predicted with a linear combination of the other variables. Consequently, the effect of each of the variables cannot be identified with any significance.
A possible solution would be to find the variables that are (multi)collinear and remove one of them from the regression. Note however that multicollinearity is only a problem if you want to interpret the coefficients and not when the model is used for prediction.
It is documented possibly there could be no summary available for a model in GeneralizedLinearRegressionModel docs.
However you can do an initial check to avoid the error:
glm_fit.hasSummary() which is a public boolean method.
Using it as
if glm_fit.hasSummary():
print(glm_fit.summary)
Here is a direct like to the Pyspark source code
and the GeneralizedLinearRegressionTrainingSummary class source code and where the error is thrown
Make sure your input variables for one hot encoder starts from 0.
One error I made that caused summary not created is, I put quarter(1,2,3,4) directly to one hot encoder, and get a vector of length 4, and one column is 0. I converted quarter to 0,1,2,3 and problem solved.

Keras: make specific weights in a dense layer untrainable [duplicate]

I am using keras and tensorflow 1.4.
I want to explicitly specify which neurons are connected between two layers. Therefor I have a matrix A with ones in it, whenever neuron i in the first Layer is connected to neuron j in the second Layer and zeros elsewhere.
My first attempt was to create a custom layer with a kernel, that has the same size as A with non-trainable zeros in it, where A has zeros in it and trainable weights, where A has ones in it. Then, the desired output would be a simple dot-product. Unfortunately I did not manage to figure out, how to implement a kernel that is partly trainable and partly non-trainable.
Any suggestions?
(Building a functional model with a lot of neurons that are connected by hand could be a work around, but somehow 'ugly' solution)
The simplest way I can think of, if you have this matrix correctly shaped, is to derive the Dense layer and simply add the matrix in the code multiplying the original weights:
class CustomConnected(Dense):
def __init__(self,units,connections,**kwargs):
#this is matrix A
self.connections = connections
#initalize the original Dense with all the usual arguments
super(CustomConnected,self).__init__(units,**kwargs)
def call(self,inputs):
#change the kernel before calling the original call:
self.kernel = self.kernel * self.connections
#call the original calculations:
super(CustomConnected,self).call(inputs)
Using:
model.add(CustomConnected(units,matrixA))
model.add(CustomConnected(hidden_dim2, matrixB,activation='tanh')) #can use all the other named parameters...
Notice that all the neurons/units have yet a bias added at the end. The argument use_bias=False will still work if you don't want biases. You can also do exactly the same thing using a vector B, for instance, and mask the original biases with self.biases = self.biases * vectorB
Hint for testing: use different input and output dimensions, so you can be sure that your matrix A has the correct shape.
I just realized that my code is potentially buggy, because I'm changing a property that is used by the original Dense layer. If weird behaviors or messages appear, you can try another call method:
def call(self, inputs):
output = K.dot(inputs, self.kernel * self.connections)
if self.use_bias:
output = K.bias_add(output, self.bias)
if self.activation is not None:
output = self.activation(output)
return output
Where K comes from import keras.backend as K.
You may also go further and set a custom get_weights() method if you want to see the weights masked with your matrix. (This would not be necessary in the first approach above)

How to use `crossval` in matlab for a Leave one Out Validation method

I have been reading the documentation: here and here but it's really unclear for me and I don't see how to use pratically crossval to do a leave one out cross-validation.
vals = crossval(fun,X)
vals = crossval(fun,X,Y,...)
mse = crossval('mse',X,y,'Predfun',predfun)
mcr = crossval('mcr',X,y,'Predfun',predfun)
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun)
vals = crossval(...,'name',value)
I really don't understand the funpart.
I have estimatimate chlorophyll rate with different index. Then I have done a linear regression between those index and the field taken chlorophyll rate. Now I want to validate them, one of my estimation is a column with 22 entries, so I want to use 21 of them as trainee and 1 as a test, and do 22 loops so that all the data have been used as test.
But I don't where should I put the regression model? If my regression is Y=aX+b,
do I re-use the a and b calculated before for the train part, or do I do a new linear regression with the train part then see what's the test will be with that?
I am not sure I totally understood how to make a leave one out model.
Then I want to know the result of the test by calculating the RMSE (and maybe the R²).
How do I code that using crossval?
I saw the answer to the question here but I don't have access to the crossvalind fonction with my license.
Well I finaly figure it out: so this is my script:
First I charged my data and the linear regression fonction
X=indicesCha_without_Cloud(:,3);
y=Cha_g_m2t_without_Cloud(:,3);
testval=#(XTRAIN,ytrain,XTEST)Linear_regression_indices( XTRAIN,ytrain,XTEST);
where in my case fun(in the Mathwork help) is testvaland Linear_regression_indices is a very simple fonction:
function [ Linear_regression_indices ] = Linear_regression_indices(XTRAIN,ytrain,XTEST )
Linear_regression_indices=(polyval(polyfit(XTRAIN,ytrain,1),XTEST));
end
There is 2 ways to do it and they both give the same result:
one by using simply the crossval fonction
cvMse = crossval('mse',X,y,'predfun',testval,'leaveout',1);
this will do as many fold as the data size, using each time one of the data as Xtest
the second one is using cvpartition
c = cvpartition(n,'LeaveOut') creates a random partition for leave-one-out cross validation on n observations. Leave-one-out is a special case of 'KFold', in which the number of folds equals the number of observations. link
c = cvpartition(y,'LeaveOut');
cvMse2=crossval('mse',X,y,'predfun',testval,'partition',c);
then the RMSE can be easily calculated
RMSE=sqrt(cvMse);
RMSE2=sqrt(cvMse2);
then I simply get my answer, in my case RMSE=0,3548

Tensorflow max-margin loss training?

I want to train a neural network in tensorflow with a max-margin loss function using one negative sample per positive sample:
max(0,1 -pos_score +neg_score)
What I'm currently doing is this:
The network takes three inputs: input1, and then one positive example input2_pos and one negative example input2_neg. (These are indices to a word embeddings layer.) The network is supposed to calculate a score that expresses how related two examples are.
Here's a simplified version of my code:
input1 = tf.placeholder(dtype=tf.int32, shape=[batch_size])
input2_pos = tf.placeholder(dtype=tf.int32, shape=[batch_size])
input2_neg = tf.placeholder(dtype=tf.int32, shape=[batch_size])
# f is a neural network outputting a score
pos_score = f(input1,input2_pos)
neg_score = f(input1,input2_neg)
cost = tf.maximum(0., 1. -pos_score +neg_score)
optimizer= tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
What I see when I run this, is that like this the network just learns which input holds the positive example - it always predicts a similar score along the lines of:
pos_score = 0.9965983
neg_score = 0.00341663
How can I structure the variables/training so that the network learns the task instead?
I want just one network that takes two inputs and calculates a score expressing the correlation between them, and train it with max-margin loss.
Calculating scores for positive and negative separately does not seem like an option to me, since then it won't backpropagate properly. Another option seems to be randomizing inputs - but then for the loss function I need to know which example is the positive one - inputting that as another parameter would give away the solution again?
Any ideas?
Given your results (1 for every positive, 0 for every negative) it seems you have two different networks learning:
to predict 1 for the first one
to predict 0 for the second one
When using max-margin loss, you need to use the same network for computing both pos_score and neg_score. The way to do that is to share the variables. I will give you a small example using tf.get_variable():
with tf.variable_scope("network"):
w = tf.get_variable("weights", shape=..., initializer=...)
def f(x, y):
with tf.variable_scope("network", reuse=True):
w = tf.get_variable("weights")
res = w * (x - y) # some computation
return res
With this function f as model, the training will optimize the shared variable with name "network/weights".