I want to classify two types of sentences: statements and questions. For this I need an already trained word2vec network to pass sentences through, receiving a 2D array for each sentence, e.g.:
[[~300 items], [~300 items], [~300 items], ...]
where 300 is the approximate length of a word vector.
How do I do that in Keras? Which library is best to use?
What I advise is to use an Embedding layer and set its weights:
from keras.layers import Input, Embedding

inp = Input(shape=(seq_len,))
embedding = Embedding(input_dim=vocabulary_size,
                      output_dim=300,
                      weights=[your_w2v_matrix])(inp)
...
Here you can find a really similar question.
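To get vocabulary_size and your_w2v_matrix from a trained gensim word2vec model, here is a minimal sketch; the file name w2v.bin and the reserved padding row are assumptions, and note that in gensim 3.x the vocabulary list is kv.index2word while in 4.x it is kv.index_to_key:

import numpy as np
from gensim.models import KeyedVectors

# hypothetical path to your trained word2vec vectors
kv = KeyedVectors.load_word2vec_format("w2v.bin", binary=True)

vocabulary_size = len(kv.index2word) + 1              # +1 reserves row 0 for padding
your_w2v_matrix = np.zeros((vocabulary_size, kv.vector_size))
for i, word in enumerate(kv.index2word):
    your_w2v_matrix[i + 1] = kv[word]                 # row = word index + 1

Sentences are then fed in as sequences of these integer indices (0 as padding), and the Embedding layer maps each index to its ~300-dimensional vector, giving exactly the per-sentence 2D array you describe.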
I am constructing a deep neural network from scratch and I want to implement the softmax function (http://neuralnetworksanddeeplearning.com/chap3.html#softmax).
I am using Breeze for that, but it is not working as expected.
The documentation is also poor, with very few examples, so it is difficult for me to understand how I should use it.
Here is an example: I have an output array Z that contains 10 rows with the weighted values, and a label array that also contains 10 rows, in which exactly one entry is set to 1 to specify which row is the expected result:
lab(0) = 1
lab(1 to 9) = 0
My code:
def ComputeZ(ActivationFunction: String, z: Array[Double], label: Array[Double]): Array[Double] = {
  ActivationFunction match {
    case "SoftMax" =>
      val t = softmax(z, label)
      t
  }
}
I was expecting a probability distribution that sums to 1 over the 10 rows, but it actually returns the same values as Z. I don't know what I am doing wrong. Thanks for your help.
Your question seems a little confusing to me: creating a softmax from scratch has nothing to do with the label or the real output value. A softmax function is used to produce a valid output probability distribution for a neural network in multiclass classification problems. Since you have a one-hot vector as the label, it seems that you actually want to implement a cross-entropy criterion, or some error function that evaluates the divergence between the prediction distribution and the label distribution. That needs the output prediction probability distribution (applying your softmax to the output layer) and the one-hot vector of the output.
I looked at the code of the softmax function in Breeze, but I don't see a layer implementation, and it doesn't do what you expect. Keep in mind that you need both a forward and a backward function.
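For reference, here is a minimal numerically stable softmax, together with the backward pass you get when it is paired with a cross-entropy loss. It is sketched in Python; porting it to Scala arrays is mechanical:

import numpy as np

def softmax(z):
    # shift by the max for numerical stability; the output sums to 1
    shifted = z - np.max(z)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

def softmax_cross_entropy_grad(z, label):
    # with a one-hot label and a cross-entropy loss, the gradient of the
    # loss with respect to z collapses to softmax(z) - label
    return softmax(z) - label

Note that this is the standard formulation, not Breeze's softmax, which computes a log-sum-exp rather than a probability distribution; that is why the call in your code does not return a distribution.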
I am working on an inter-class and intra-class classification problem with one CNN, such that there are first two classes, Cat and Dog, and then within Cat there is a further classification into three different cat breeds, while within Dog there are five different dog breeds.
I haven't tried the coding yet; I am just working out whether this is feasible.
My question is: what would be a feasible design for this kind of problem?
For training, I am thinking of first designing a CNN-1 network that differentiates cat from dog across all the training images. After the separation into cat and dog, CNN-2 and CNN-3 would train these images further for each breed of dog and cat respectively. I am just not sure how the testing would work in this situation.
I have approached a similar problem previously in Python. Hopefully this is helpful and you can come up with an alternative implementation in Matlab if that is what you are using.
After all was said and done, I landed on a single model for all predictions. For your purpose you could have one binary output for dog vs. cat, another multi-class output for the dog breeds, and another multi-class output for the cat breeds.
Using Tensorflow, I created a mask for the irrelevant classes. For example, if the image was of a cat, then all of the dog breeds are irrelevant and they should not impact model training for that example. This required a customized TF Dataset (that converted 0's to -1 for the mask) and a customized loss function that returned 0 error when the mask was present for that example.
Finally, the training process. Specific to your question, you will have to create custom accuracy functions that handle the mask values the way you want, but otherwise this part of the process should be standard. It is best practice to spread the classes evenly across the training data, but they can all be trained together.
If you google "Multi-Task Training" you can find additional resources for this problem.
Here are some code snippets if you are interested.
For the customized TF dataset that masks irrelevant labels:
import tensorflow as tf
from multiprocessing import cpu_count

# Replace 0's with -1 for the mask when there aren't any labels
def produce_mask(features):
    for filt, tensor in features.items():
        if "target" in filt:
            condition = tf.equal(tf.math.reduce_sum(tensor), 0)
            features[filt] = tf.where(condition, tf.ones_like(tensor) * -1, tensor)
    return features
def create_dataset(filepath, batch_size=10):
    ...
    # **** This is where the mask was applied to the dataset
    dataset = dataset.map(produce_mask, num_parallel_calls=cpu_count())
    ...
    return parsed_features
Custom loss function: I was using binary cross-entropy because my problem was multi-label. You will likely want to adapt this to categorical cross-entropy.
from tensorflow.keras import backend

# Custom loss function
def masked_binary_crossentropy(y_true, y_pred):
    mask = backend.cast(backend.not_equal(y_true, -1), backend.floatx())
    return backend.binary_crossentropy(y_true * mask, y_pred * mask)
Then the custom accuracy metrics. I was using top-k accuracy; you may need to modify this for your purposes, but it gives the general idea. Compared with the loss function, instead of converting masked values to 0, which would over-inflate the accuracy, this function filters those examples out entirely. That works because the outputs are measured individually, so each output (binary, cat breed, dog breed) has its own accuracy measure filtered to only the relevant examples.
Here, backend is the Keras backend, as above.
from tensorflow.keras.metrics import top_k_categorical_accuracy

def top_5_acc(y_true, y_pred, k=5):
    mask = backend.cast(backend.not_equal(y_true, -1), tf.bool)
    mask = tf.math.reduce_any(mask, axis=1)
    masked_true = tf.boolean_mask(y_true, mask)
    masked_pred = tf.boolean_mask(y_pred, mask)
    return top_k_categorical_accuracy(masked_true, masked_pred, k)
Edit
No, in the scenario I described above there is only one model, and it is trained on all of the data together. There are 3 outputs from the single model. The mask is a major part of this, as it allows the network to only adjust weights that are relevant to the example: if the image was a cat, then the dog breed prediction does not contribute to the loss. A sketch of how the three heads might fit together follows.
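To make the single-model idea concrete, here is a minimal sketch of one backbone with three heads. The tiny conv stack, the input size, the breed counts, and the categorical adaptation of the masked loss are all assumptions, not the original code:

import tensorflow as tf
from tensorflow.keras import layers, Model, backend

# Masked categorical cross-entropy, adapted from the binary version above:
# examples whose targets are all -1 contribute zero loss for this head.
def masked_categorical_crossentropy(y_true, y_pred):
    mask = backend.cast(backend.not_equal(y_true, -1), backend.floatx())
    return backend.categorical_crossentropy(y_true * mask, y_pred)

inputs = layers.Input(shape=(224, 224, 3))           # hypothetical input size
x = layers.Conv2D(32, 3, activation="relu")(inputs)  # placeholder backbone
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Three heads: dog vs. cat, 3 cat breeds, 5 dog breeds
animal = layers.Dense(1, activation="sigmoid", name="animal")(x)
cat_breed = layers.Dense(3, activation="softmax", name="cat_breed")(x)
dog_breed = layers.Dense(5, activation="softmax", name="dog_breed")(x)

model = Model(inputs, [animal, cat_breed, dog_breed])
model.compile(optimizer="adam",
              loss={"animal": "binary_crossentropy",
                    "cat_breed": masked_categorical_crossentropy,
                    "dog_breed": masked_categorical_crossentropy})

At test time you run a single forward pass and read the breed prediction from whichever head the binary output selects.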
I want to add new words into a trained gensim word2vec model using a new text dataset. However, I want to preserve the old word embeddings and just add the new words from the dataset into the existing model. This means simple retraining of the old model with the new text dataset isn't an option as it will readjust the vectors of the previous word embeddings that are also in the new text dataset. Can you give any suggestions regarding this task? I would like something like Gensim's doc2vec infer feature where you feed the model some text input and it gives a vector as an output. Thanks.
I would do the following (pseudo-Python):

import numpy as np

for word in new_words:
    # find words that should be nearby
    synonyms = thesaurus.lookup(word)
    # collect the embeddings of the synonyms that already exist in the model
    syn_vectors = [w2v.get_embedding(s) for s in synonyms
                   if w2v.get_embedding(s) is not None]
    if syn_vectors:
        # initialize the new word's vector as the mean of its synonyms' vectors
        new_word_embedding = np.mean(np.array(syn_vectors), axis=0)
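If w2v wraps a gensim KeyedVectors object, you could then insert the averaged vector back into the model so that ordinary lookups find it. A one-line sketch (add_vectors exists in gensim 4.x; the 3.x name was add):

# assuming kv is the underlying gensim KeyedVectors instance
kv.add_vectors([word], [new_word_embedding])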
While trying to learn how to use Matlab's fmincon function, I am wondering: is it possible to use structures as inputs (for the design variables and bounds) instead of vectors?
Here are some background details for clarification: I have a number of optimization variables (WingWeight, FuelWeight, ...). Instead of storing them in vectors:
X(1) = FuelWeight
X(2) = WingWeight
...
Xub(1) = FuelWeightub
Xub(2) = WingWeightub
...
Xlb(1) = FuelWeightlb
Xlb(2) = WingWeightlb
...
I would like to store them in a data structure:
X.WingWeight
X.FuelWeight
Xub.FuelWeightub
Xub.WingWeightub
Xlb.FuelWeightlb
Xlb.WingWeightlb
My overall question is: will fmincon allow structures as inputs?
I would really like to use structures because the calculation and optimization assignment is really complex and it will take me quite a while to fully understand all the computations needed; I would have to re-edit the design vector many times, and editing all elements everywhere in the code seems really time consuming.
It is not possible in the current version of Matlab.
I am working on a handwritten-image recognition problem, using a support vector machine as the classifier. The matrix score below shows an example of the scores returned by the SVM for 5 samples; the number of classes is also 5. I want to transform this matrix into probabilities.
score = [  0.2590  -0.6033  -1.1350  -1.2347  -0.9776
          -1.4727  -0.2136  -0.9649   0.1480  -1.4761
          -0.9637  -0.8662   0.0674  -1.0051  -1.1293
          -2.1230  -0.8805  -0.9808  -0.0520  -0.0836
          -1.6976  -1.1578  -0.9205  -1.1101   1.0796 ]
According to my research on existing methods, Platt's scaling method seems the most appropriate in my case. I found an implementation of this method at this link: Platt scaling. The problem is that I don't understand the third parameter it expects. Please help me understand this implementation and make it executable.
I await your answers, and thank you in advance.
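For context: Platt scaling fits a sigmoid P(y=1 | s) = 1 / (1 + exp(A*s + B)) to held-out decision scores by maximum likelihood. In Platt's original pseudo-code, the inputs beyond the scores and labels are typically prior1 and prior0, the counts of positive and negative training examples, used to build regularized targets; if the linked code follows that pseudo-code, its third parameter is likely one of these counts. Below is a minimal, unregularized sketch in Python (the scipy-based fit and the one-vs-rest usage note are assumptions, not the linked implementation):

import numpy as np
from scipy.optimize import minimize

def fit_platt(scores, labels):
    # Fit A, B so that P(y=1 | s) = 1 / (1 + exp(A*s + B)) by
    # minimizing the negative log-likelihood on held-out data.
    def nll(params):
        a, b = params
        p = 1.0 / (1.0 + np.exp(a * scores + b))
        eps = 1e-12  # avoid log(0)
        return -np.sum(labels * np.log(p + eps)
                       + (1 - labels) * np.log(1 - p + eps))
    return minimize(nll, x0=np.array([-1.0, 0.0])).x

# For a multiclass score matrix: fit one (A, B) per column (one-vs-rest),
# map each score through its sigmoid, then normalize each row to sum to 1.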