Vowpal Wabbit: Input of neural network?

In the machine learning tool Vowpal Wabbit (https://github.com/JohnLangford/vowpal_wabbit/), a linear estimator y* = wx is normally trained. However, it is possible to add a feed-forward neural network.
My question is: when I use the neural network via the command-line option "--nn x", is the linear estimator wx completely replaced by a neural network?
Edit: Thanks Martin and arielf. So apparently the different constellations look like this:
Are the weights of the models with "--nn" estimated by backpropagation?

[Edit: corrected answer: original wasn't accurate, thanks Martin]
With --nn, the input features are fed into a single hidden layer (where all of them can interact), and the hidden-layer outputs are then fed to the output layer.
To also pass the features through to the output layer as-is, without interactions, add the --inpass option.
You can look at models created by using --invert_hash to get a readable model on a small example:
$ cat dat.vw
1 | a b
2 | a c
# default linear model, no NN:
$ vw --invert_hash dat.ih dat.vw
...
$ cat dat.ih
...
:0
Constant:116060:0.387717
a:92594:0.387717
b:163331:0.193097
c:185951:0.228943
# Now add --nn 2 (note double-dash in long option)
# to use a 1-layer NN with 2 nodes
$ vw --nn 2 --invert_hash dat-nn.ih dat.vw
...
$ cat dat-nn.ih
...
:0
Constant:202096:-0.270493
Constant[1]:202097:0.214776
a:108232:-0.270493
a[1]:108233:0.214776
b:129036:-0.084952
b[1]:129037:0.047303
c:219516:-0.196927
c[1]:219517:0.172029
It looks like a[N] is the weight from feature a to hidden-layer node N. Indexing apparently starts at zero, so the bare a stands for a[0], which makes the notation a bit confusing.
When you add --inpass you get an additional weight per feature (index [2]):
$ vw --nn 2 --inpass --invert_hash dat-nn-ip.ih dat.vw
...
$ cat dat-nn-ip.ih
...
:0
Constant:202096:-0.237726
Constant[1]:202097:0.180595
Constant[2]:202098:0.451169
a:108232:-0.237726
a[1]:108233:0.180595
a[2]:108234:0.451169
b:129036:-0.084570
b[1]:129037:0.047293
b[2]:129038:0.239481
c:219516:-0.167271
c[1]:219517:0.139488
c[2]:219518:0.256326
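To make the difference between --nn alone and --nn with --inpass concrete, here is a minimal pure-Python sketch of the forward pass they imply. This is not VW's code: the function name, the weight layout, the tanh activation, and all the numbers are assumptions made up for illustration.

```python
import math

def predict(x, hidden_w, hidden_b, out_w, out_b, inpass_w=None):
    """One-hidden-layer forward pass (illustrative, not VW's implementation).

    inpass_w, if given, holds direct input-to-output weights,
    which is the role --inpass plays in the answer above.
    """
    # Each hidden node sees every input feature (assumed tanh activation).
    hidden = [
        math.tanh(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
        for w, b in zip(hidden_w, hidden_b)
    ]
    # The output layer combines the hidden nodes.
    y = out_b + sum(v * h for v, h in zip(out_w, hidden))
    # With pass-through weights, the raw features also reach the output.
    if inpass_w is not None:
        y += sum(w_i * x_i for w_i, x_i in zip(inpass_w, x))
    return y

x = [1.0, 1.0, 0.0]                             # features a, b, c for "1 | a b"
hidden_w = [[0.1, 0.2, 0.3], [-0.1, 0.0, 0.2]]  # made-up weights, 2 hidden nodes
hidden_b = [0.0, 0.0]
out_w = [0.5, -0.5]
out_b = 0.1
inpass_w = [0.4, 0.2, 0.3]                      # made-up pass-through weights

without_inpass = predict(x, hidden_w, hidden_b, out_w, out_b)
with_inpass = predict(x, hidden_w, hidden_b, out_w, out_b, inpass_w)
print(without_inpass, with_inpass)
```

Without the pass-through weights there is no purely linear term left in the prediction; with them, the two results differ exactly by the linear contribution of the raw features.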

Interclass and Intraclass classification structure of CNN

I am working on an inter-class and intra-class classification problem with one CNN. First there are two classes, Cat and Dog; then within Cat there are three different breeds of cats to classify, and within Dog there are five different breeds of dogs.
I haven't tried coding it yet; I am just working out whether it is feasible.
My question is: what would be a feasible design for this kind of problem?
For training, I am thinking of first designing a network CNN-1 that differentiates cat from dog across all the training images. After the separation into cat and dog, CNN-2 and CNN-3 would be trained further on these images for each breed of dog and cat. I am just not sure how the testing would work in this situation.
I have approached a similar problem previously in Python. Hopefully this is helpful and you can come up with an alternative implementation in Matlab if that is what you are using.
After all was said and done, I landed on a single model for all predictions. For your purpose you could have one binary output for dog vs. cat, another multi-class output for the dog breeds, and another multi-class output for the cat breeds.
Using Tensorflow, I created a mask for the irrelevant classes. For example, if the image was of a cat, then all of the dog breeds are irrelevant and they should not impact model training for that example. This required a customized TF Dataset (that converted 0's to -1 for the mask) and a customized loss function that returned 0 error when the mask was present for that example.
Finally, the training process. Specific to your question, you will have to create custom accuracy functions that handle the mask values the way you want, but otherwise this part of the process should be standard. It is best practice to spread the classes evenly across the training data, but they can all be trained together.
If you google "Multi-Task Training" you can find additional resources for this problem.
Here are some code snips if you are interested:
For the customized TF dataset that masked irrelevant labels...
# Assumed imports (not shown in the original snippet)
from multiprocessing import cpu_count
import tensorflow as tf

# Replace 0's with -1 for mask when there aren't any labels
def produce_mask(features):
    for filt, tensor in features.items():
        if "target" in filt:
            condition = tf.equal(tf.math.reduce_sum(tensor), 0)
            features[filt] = tf.where(condition, tf.ones_like(tensor) * -1, tensor)
    return features

def create_dataset(filepath, batch_size=10):
    ...
    # **** This is where the mask was applied to the dataset
    dataset = dataset.map(produce_mask, num_parallel_calls=cpu_count())
    ...
    return parsed_features
Custom loss function. I was using binary-crossentropy because my problem was multi-label. You will likely want to adapt this to categorical-crossentropy.
# Custom loss function
def masked_binary_crossentropy(y_true, y_pred):
    mask = backend.cast(backend.not_equal(y_true, -1), backend.floatx())
    return backend.binary_crossentropy(y_true * mask, y_pred * mask)
Then for the custom accuracy metrics. I was using top-k accuracy, you may need to modify for your purposes, but this will give you the general idea. When comparing this to the loss function, instead of converting all to 0, which would over-inflate the accuracy, this function filters those values out entirely. That works because the outputs are measured individually, so each output (binary, cat breed, dog breed) would have a different accuracy measure filtered only to the relevant examples.
backend is keras backend.
def top_5_acc(y_true, y_pred, k=5):
    mask = backend.cast(backend.not_equal(y_true, -1), tf.bool)
    mask = tf.math.reduce_any(mask, axis=1)
    masked_true = tf.boolean_mask(y_true, mask)
    masked_pred = tf.boolean_mask(y_pred, mask)
    return top_k_categorical_accuracy(masked_true, masked_pred, k)
Edit
No, in the scenario I described above there is only one model and it is trained with all of the data together. There are 3 outputs to the single model. The mask is a major part of this as it allows the network to only adjust weights that are relevant to the example. If the image was a cat, then the dog breed prediction does not result in loss.
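If it helps to see the masking idea without any TensorFlow around it, here is a small pure-Python sketch for one of the multi-class outputs (say, the cat-breed head). It is a sketch under my own assumptions, not the code I ran: the function name is made up, it uses categorical cross-entropy, and it filters masked rows out of the average instead of multiplying by a mask as the Keras loss above does.

```python
import math

MASK = -1  # sentinel written into label rows that are irrelevant for an example

def masked_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy over rows whose labels are not fully masked."""
    losses = []
    for true_row, pred_row in zip(y_true, y_pred):
        if all(t == MASK for t in true_row):
            continue  # e.g. a dog image contributes nothing to the cat-breed head
        losses.append(
            -sum(t * math.log(max(p, eps)) for t, p in zip(true_row, pred_row))
        )
    return sum(losses) / len(losses) if losses else 0.0

# Cat-breed head, batch of two images: a cat of breed 1 and a dog (masked row).
y_true = [[0, 1, 0], [MASK, MASK, MASK]]
y_pred = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]

loss = masked_categorical_crossentropy(y_true, y_pred)
all_masked = masked_categorical_crossentropy([[MASK, MASK, MASK]], [[0.3, 0.3, 0.4]])
print(loss, all_masked)
```

The dog image leaves the cat-breed loss untouched, so only the cat example drives that head's weight updates, which is the whole point of the mask.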

an error in caffe train

Hi everybody. I'd like to use Caffe to train a 5-class detection task with "SSD: Single Shot MultiBox Detector", so I changed num_classes from 21 to 6. However, I get the following error:
"Check failed: num_priors_ * num_classes_ == bottom[1]->channels() (52392 vs. 183372) Number of priors must match number of confidence predictions."
I understand what the error says, and I noticed that 52392/6 = 183372/21 (both equal 8732). So although I changed num_classes to 6, the number of confidence predictions is still 183372. How can I solve this problem? Thank you very much!
Since SSD depends on the number of labels not only for the classification output but also for the bounding-box prediction, you would need to change num_output in several other places in the model.
I would strongly suggest that you don't do this manually, but rather use the Python scripts provided in the 'examples/ssd' folder. For instance, you can change line 277 in 'examples/ssd/ssd_pascal_speed.py' to:
num_classes = 6 # 5 object classes + background, instead of 21
And then use the model files this script provides.
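The arithmetic behind the failed check can be replayed in a few lines. This is only a sanity check on the numbers from the error message, not anything taken from Caffe itself; the function name is made up.

```python
def expected_conf_channels(num_priors, num_classes):
    """Channels the confidence blob must have for the check to pass."""
    return num_priors * num_classes

observed_channels = 183372             # right-hand side of the failed check
num_priors = observed_channels // 21   # the net was still built for 21 classes

# What the loss layer expects with num_classes = 6 versus what the net produces.
expected_with_6 = expected_conf_channels(num_priors, 6)
expected_with_21 = expected_conf_channels(num_priors, 21)
print(num_priors, expected_with_6, expected_with_21)
```

So the confidence layers are still emitting 21 scores per prior, which is why every num_output derived from the class count has to change along with num_classes.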

MATLAB: How can I train a NARX neural network with multiple datasets?

I created a NARX network with 16 inputs and 1 output like this:
in=[u1(1) u1(2) ... u1(t)
u2(1) u2(2) ... u2(t)
. . .
u16(1) u16(2) ... u16(t)];
target=[1 2 ... t];
and I want to train it with 5 datasets of inputs and outputs, but I don't know how to build a single input and target matrix from the 5 datasets to train the NARX network.
You can combine datasets with
catsamples()
For example:
X = catsamples(x1, x2,..., xn)
T = catsamples(t1, t2,..., tn)
The optional parameter 'pad' allows concatenating datasets with varying sizes.
For further information, take a look at catsamples in the MathWorks documentation.
There is also a small example available at MathWorks:
Multiple Sequences with Dynamic Neural Networks

Performing additional validation in LIBSVM matlab

I have been working with LIBSVM in MATLAB for a while to do prediction. I have a dataset of which I use 75% for training, 15% for finding the best parameters, and the remainder for testing. The code is given below.
trainX and trainY are the input and output training instances
testValX and testValY are the validation dataset I use
for j = 1:100
    for jj = 1:10
        model(j,jj) = svmtrain(trainY,trainX,...
            ['-s 3 -t 2 -c ' num2str(j) ' -p 0.001 -g ' num2str(jj) '-v 5']);
        [predicted_label, ~, ~]=svmpredict(testValY,...
            testValX,model(j,jj));
        MSE(j,jj) = sum(((predicted_label-testValY).^2)/2);
    end
end
[min_val,min_indi] = min(MSE(:));
best_predicted_model_rbf(i) = model(min_indi);
My question here is whether this is correct. I am creating a model matrix with different values of c and g. I use the -v option, which is key here. With the resulting models I predict on the validation dataset and thereby compute the mean squared error. Using this MSE I pick the best c and g. Since I am using -v, which returns the cross-validated output, is the procedure I follow correct?
First, I think there is a slight problem with the code shown, which is that num2str(jj) '-v 5']); doesn't have a space before the -v. That may cause that flag to not be read. In the other question, you stated that this 'sometimes returns a model', which is what would happen if that flag was not read. If the flag is read, you should only get a number, not a model, when the '-v' flag is used.
Second, it looks like you are doing two different things here, either one of which would be reasonable on its own. Calling svmtrain with '-v' runs cross validation on the training set. That shouldn't return a model, it should just return an mse estimate. You could use these estimates to determine which parameter setting was best, and then train one model with that setting on all of the training data.
Anyway, next you call svmpredict(y,x,model) on a hold-out validation set, testValX, but having called svmtrain with '-v', model should just be a scalar at this point. In order for this call to run correctly, you have to get the model from svmtrain without '-v', so that it is a struct. The rest of what you are doing makes sense for this case, in which you are doing hold-out validation using testValX.
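The first of the two procedures (pick parameters by cross-validation, then train one final model without '-v') can be sketched like this. It is a sketch only: build_options, select_parameters and fake_cv_error are made-up names, and fake_cv_error is a toy stand-in for a call to svmtrain with '-v', which returns a scalar error instead of a model.

```python
def build_options(c, g, folds=None):
    """Build a LIBSVM option string; joining with spaces avoids the '-v' bug."""
    parts = ["-s 3", "-t 2", f"-c {c}", "-p 0.001", f"-g {g}"]
    if folds is not None:
        parts.append(f"-v {folds}")
    return " ".join(parts)

def select_parameters(cv_error, c_values, g_values, folds=5):
    """Return (error, c, g) for the grid point with the lowest CV error."""
    best = None
    for c in c_values:
        for g in g_values:
            err = cv_error(build_options(c, g, folds))
            if best is None or err < best[0]:
                best = (err, c, g)
    return best

def fake_cv_error(options):
    # Toy stand-in for cross-validated training: a bowl with its minimum
    # at c=3, g=2, so the example runs without LIBSVM installed.
    tokens = options.split()
    c = float(tokens[tokens.index("-c") + 1])
    g = float(tokens[tokens.index("-g") + 1])
    return (c - 3) ** 2 + (g - 2) ** 2

best_err, best_c, best_g = select_parameters(fake_cv_error, range(1, 6), range(1, 4))

# Final training call uses the chosen parameters WITHOUT -v, so it yields a model.
final_options = build_options(best_c, best_g)
print(best_c, best_g, final_options)
```

The hold-out set (testValX, testValY) is then only touched once, with the model trained from final_options.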

How to improve the performance of SVM?

I'm using LIBSVM and MATLAB to classify 34x5 data into 3 classes. I applied 10-fold cross-validation and an RBF kernel. The output is this confusion matrix with a correct rate of 0.88 (88% accuracy):
9 0 0
0 3 0
0 4 18
I would like to know which methods within SVM could improve the accuracy, or which other classification methods in machine learning I should consider. Any help?
Here is my SVM classification code
load Turn180SVM1; % load data file
libsvm_options = '-s 1 -t 2 -d 3 -r 0 -c 1 -n 0.1 -p 0.1 -m 100 -e 0.000001 -h 1 -b 0 -wi 1 -q'; % svm options
C=size(Turn180SVM1,2);
% cross validation
for i = 1:10
    indices = crossvalind('Kfold',Turn180SVM1(:,C),10);
    cp = classperf(Turn180SVM1(:,C));
    for j = 1:10
        [X, Z] = find(indices(:,end)==j);%testing
        [Y, Z] = find(indices(:,end)~=j);%training
        feature_training = Turn180SVM1([Y'],[1:C-1]); feature_testing = Turn180SVM1([X'],[1:C-1]);
        class_training = Turn180SVM1([Y'],end); class_testing = Turn180SVM1([X'], end);
        % SVM Training
        disp('training');
        [feature_training,ps] = mapminmax(feature_training',0,1);
        feature_training = feature_training';
        feature_testing = mapminmax('apply',feature_testing',ps)';
        model = svmtrain(class_training,feature_training,libsvm_options);
        %
        % SVM Prediction
        disp('testing');
        TestPredict = svmpredict(class_testing,sparse(feature_testing),model);
        TestErrap = sum(TestPredict~=class_testing)./length(class_testing)*100;
        cp = classperf(cp, TestPredict, X);
        disp(((i-1)*10 )+j);
    end;
end;
[ConMat,order] = confusionmat(TestPredict,class_testing);
cp.CorrectRate;
cp.CountingMatrix;
Many methods exist. If your tuning procedure is optimal (e.g. well executed cross-validation) your choices include:
Improve preprocessing, perhaps tailor new aggregated features based on domain knowledge. Most importantly (and most effectively): make sure your inputs are standardized properly, for example by scaling every dimension onto [-1,1].
Use another kernel: RBF kernels are known to perform very well in a wide variety of settings, but specialised kernels exist for many tasks. Don't consider this unless you know what you are doing. Since you are dealing with a low-dimensional problem, RBF is probably a good choice if your data is not structured.
Reweigh training instances: particularly important when your data set is unbalanced (e.g. some classes have far fewer instances than others). You can do this with the -wX options in libsvm. All sorts of reweighting schemes exist, including variants of boosting. I'm not a major fan of this, since such approaches are prone to overfitting.
Change the cross-validation cost function to suit your exact needs. Is accuracy really what you are looking for or do you want, say, high F1 or high ROC-AUC? It is surprising how many people optimize a performance measure they are not really interested in.
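To illustrate the last point with the confusion matrix from the question, here is a small pure-Python sketch that computes accuracy next to per-class F1. The function names are made up. Per-class F1 comes out the same whether rows are true or predicted classes, because it is symmetric in false positives and false negatives.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def per_class_f1(cm):
    """F1 per class, computed as 2*TP / (2*TP + FP + FN)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        off_row = sum(cm[k]) - tp                        # off-diagonal in row k
        off_col = sum(cm[r][k] for r in range(n)) - tp   # off-diagonal in column k
        denom = 2 * tp + off_row + off_col
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

cm = [[9, 0, 0],
      [0, 3, 0],
      [0, 4, 18]]  # confusion matrix from the question

acc = accuracy(cm)
f1 = per_class_f1(cm)
macro_f1 = sum(f1) / len(f1)
print(acc, f1, macro_f1)
```

Accuracy is about 0.88, but the smallest class only reaches an F1 of 0.6, so a tuning criterion such as macro F1 would penalize the four confused samples far more than accuracy does.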