Vowpal Wabbit: question on weight of interaction features - feature-engineering

In VW, the format for feature namespaces is shown below:
Label [Tag]|Namespace Features |Namespace Features ... |Namespace Features
where:
Namespace = String[:Value]
and an example is:
1 1.0 |MetricFeatures:3.28 height:1.5 length:2.0 |Says black with white stripes |OtherFeatures NumberOfLegs:4.0 HasStripes
Notice that the MetricFeatures namespace has a weight higher than 1 (3.28). Based on the above example, if I create some feature interactions, say between the M and the S namespaces with -q MS, does the new feature namespace that is the cross product of the two original ones have an importance weighting of 1 by default? Or would it inherit the product of the two importance values (in this case 1 * 3.28 = 3.28)?
And is there a way to modify the weight of the feature interactions manually? E.g. say MetricFeatures has an importance weight of 1, can I have the features generated by the quadratic interaction of MetricFeaturesXSays have an importance weighting of x?

Currently there is no way to individually weight interactions.
The namespace weight is processed at parse time, so when reading in the features of that namespace they are multiplied by the weight.
This can be verified by using --audit:
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = data.txt
num sources = 1
average since example example current current current
loss last counter weight label predict features
0
MetricFeatures^height:146807:4.92:0#0 MetricFeatures^length:38580:6.56:0#0 Says^black:100768:1:0#0 Says^with:163314:1:0#0 Says^white:106708:1:0#0 Says^stripes:112832:1:0#0 OtherFeatures^NumberOfLegs:146847:4:0#0 OtherFeatures^HasStripes:229154:1:0#0 Constant:116060:1:0#0
1.000000 1.000000 1 1.0 1.0000 0.0000 9
finished run
number of examples = 1
weighted example sum = 1.000000
weighted label sum = 1.000000
average loss = 1.000000
best constant = 1.000000
best constant's loss = 0.000000
total feature number = 9
MetricFeatures^height:146807:4.92:0#0 -> 3.28 * 1.5 = 4.92
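Since the namespace weight is just a multiplier applied to each feature value while parsing, folding the factor into the feature values gives the same parsed features; for example (an illustration based on the audit output above, not a separate VW mechanism), these two input lines parse identically:
1 |MetricFeatures:3.28 height:1.5 length:2.0
1 |MetricFeatures height:4.92 length:6.56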

Related

Individual class accuracy calculation confusion

The total number of data points for which the following binary classification result is obtained is 1500, of which I have
1473 labelled as 0 and
the remaining 27 as 1.
As can be seen from the confusion matrix, out of 27 data points belonging to class 1, I got only 1 data point misclassified as 0. So, I calculated the accuracy for the individual classes and got an accuracy of 98.2% for the class labelled 0 and 1.7333% for the other. Is this calculation correct? I am not sure... I did get a pretty good classification for the class labelled 1, so why is its accuracy so low?
The individual class accuracies should have been 100% for class0 and around 96% (26/27) for class1.
Does one misclassification reduce the accuracy of class 1 by so much? This is how I calculated the individual class accuracies in MATLAB:
cmMatrix =
1473 0
1 26
acc_class0 = 100*(cmMatrix(1,1))/1500;
acc_class1= 100*(cmMatrix(2,2))/1500;
If everything had been classified correctly, your computation would indicate an accuracy for class 1 of 27/1500 = 0.018. This is obviously wrong. Overall accuracy is 1499/1500, but per-class accuracy cannot use 1500 as the denominator. For class 1, 27 is the maximum number of correctly classified elements, and should therefore be the denominator.
acc_class0 = 100*cmMatrix(1,1)/sum(cmMatrix(1,:));
acc_class1 = 100*cmMatrix(2,2)/sum(cmMatrix(2,:));
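With the confusion matrix from the question, those formulas evaluate to:
acc_class0 = 100*1473/(1473+0)  % = 100%
acc_class1 = 100*26/(1+26)      % ~ 96.3%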

How do you apply a custom median filter with threshold?

I am trying to create a custom median filter for my image processing where for a 3x3 neighbourhood, the central pixel (being changed) is excluded. My kernel is therefore
1 1 1
1 0 1
1 1 1
But I want to change the central pixel to the median of the surrounding pixels only if its value deviates from that median by more than some threshold value. E.g. if the pixel is more than 10 times the median of the surrounding pixels, then the central pixel value is changed to the median.
I've looked at using ordfilt2 and I can create a median filter with it. But I am not sure how I can implement the threshold condition. I am essentially trying to remove any outliers within my image which meet the threshold condition within my kernel.
Thanks for any help.
There isn't a single function for doing that, but ordfilt2 is a good start.
N = uint8([1 1 1 ; 1 0 1 ; 1 1 1]); % neighborhood, faster with integer class
J = (ordfilt2(I,4,N) + ordfilt2(I,5,N))/2; % median of the 8 neighbours (mean of the 4th and 5th order statistics)
M = I>J+10; % put here your threshold method
Out = I;
Out(M) = J(M);
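If the threshold is multiplicative, as in the question (the central pixel is more than 10 times the neighbourhood median), the mask could instead be something like the line below, assuming I is cast to double so the multiplication does not saturate:
M = double(I) > 10*double(J); % flag pixels more than 10 times the neighbourhood median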
Rem: question already asked here, but without any good answer IMO.
I suggest the following approach:
%defines input
A = repmat(1:5,5,1);
%step 1: median filtering, ignoring the central pixel
fun = @(x) median([x(1:ceil(length(x(:))/2-1)),x(ceil(length(x(:))/2+1):end)]);
filteredA = nlfilter(A,[3 3],fun);
%step 2: change each pixel only if it is more than 10 times bigger or smaller than the median
result = A;
changeMask = (A./filteredA)>10 | (A./filteredA)<0.1;
result(changeMask) = filteredA(changeMask);

Using bin counts as weights for random number selection

I have a set of data that I wish to approximate via random sampling in a non-parametric manner, e.g.:
eventl=
4
5
6
8
10
11
12
24
32
In order to accomplish this, I initially bin the data up to a certain value:
binsize = 5;
nbins = 20;
[bincounts,ind] = histc(eventl,1:binsize:binsize*nbins);
Then I populate a vector with all the possible numbers covered by the bins, from which the approximation can choose:
sizes = transpose(1:binsize*nbins);
To use the bin counts as weights for selection (i.e. bincount(1-5) = 2, so the weight for choosing 1, 2, 3, 4 or 5 is 2, whilst bincount(16-20) = 0, so 16, 17, 18, 19 or 20 can never be chosen), I simply take the bin counts and replicate them across the bin size:
w = repelem(bincounts,binsize);
To then perform weighted number selection, I use:
[~,R] = histc(rand(1,1),cumsum([0;w(:)./sum(w)]));
R = sizes(R);
For some reason this approach is unable to approximate the data. It was my understanding that, with sufficient sampling depth, the binned version of R would be identical to the binned version of eventl; however, there is significant variation, and data is often found in bins whose weights were 0.
Could anybody suggest a better method to do this or point out the error?
For a better method, I suggest randsample:
values = [1 2 3 4 5 6 7 8]; %# values from which you want to pick
numberOfElements = 1000; %# how many values you want to pick
weights = [2 2 2 2 2 1 1 1]; %# weights given to the values (1-5 are twice as likely as 6-8)
sample = randsample(values, numberOfElements, true, weights);
Note that even with 1000 samples, the distribution does not exactly correspond to the weights, so if you only pick 20 samples, the histogram may look rather different.
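Applied to the variables in the question (a sketch, assuming sizes and w are built as in the question), this becomes:
R = randsample(sizes, 1000, true, w); %# 1000 weighted draws from the numbers covered by the bins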

Neural network activation function

This is a beginner-level question.
I have several training inputs in binary and for the neural network I am using a sigmoid thresholding function SigmoidFn(Input1*Weights) where
SigmoidFn(x) = 1./(1+exp(-1.*x));
The use of the above function will give continuous real numbers. But I want the output to be in binary, since the network is a Hopfield neural net (single layer, 5 input nodes and 5 output nodes). The problem I am facing is that I am unable to correctly understand the usage and implementation of the various thresholding functions. The weights given below are the true weights, as provided in the paper. I am using these fixed weights to generate several training examples and output samples, i.e. just running the neural network several times.
Weights = [0.0 0.5 0.0 0.2 0.0
0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0
0.0 1.0 0.0 0.0 0.0
0.0 0.0 0.0 -0.6 0.0];
Input1 = [0,1,0,0,0]
x = Input1*Weights; % x = 0 0 1 0 0
As can be seen, the result of the multiplication is the second row of Weights. Is this a mere coincidence?
Next,
SigmoidFn = 1./(1+exp(-1.*x))
SigmoidFn =
0.5000 0.5000 0.7311 0.5000 0.5000
round(SigmoidFn)
ans =
1 1 1 1 1
Input2 = [1,0,0,0,0]
x = Input2*Weights
x = 0 0.5000 0 0.2000 0
SigmoidFn = 1./(1+exp(-1.*x))
SigmoidFn = 0.5000 0.6225 0.5000 0.5498 0.5000
>> round(SigmoidFn)
ans =
1 1 1 1 1
Is it good practice to use the round function, round(SigmoidFn(x))? The result obtained is not correct.
Or how should I obtain a binary result when I use any of these threshold functions:
(a) Hard limit
(b) Logistic sigmoid
(c) Tanh
Can somebody please show the proper code for thresholding and give a brief explanation of when to use which activation function? I mean, there should be some logic behind it; otherwise, why are there different kinds of functions?
EDIT : Implementation of Hopfield to recall the input pattern by successive iterations by keeping the weight fixed.
Training1 = [1,0,0,0,0];
offset = 0;
t = 1;
X(t,:) = Training1;
err = 1;
while (err ~= 0)
    Out = X(t,:)*Weights > offset;
    err = ((Out - temp)*(Out - temp).')/numel(temp);
    t = t+1
    X(t,:) = temp;
end
Hopfield networks do not use a sigmoid nonlinearity; the state of a node is simply updated to whether its weighted input is greater than or equal to its offset.
You want something like
output2 = Weights * Input1' >= offsets;
where offsets is the same size as Input1. I used Weights * Input1' instead of Input1 * Weights because most examples I have seen use left-multiplication for updating (that is, the rows of the weight matrix label the input nodes and the columns label the output nodes), but you will have to look at wherever you got your weight matrix to be sure.
You should be aware that you will have to perform this update operation many times before you converge to a fixed point which represents a stored pattern.
In response to your further questions, the weight matrix you have chosen does not store any memories that can be recalled with a Hopfield network. It contains a cycle 2 -> 3 -> 4 -> 2 ... that will not allow the network to converge.
In general you would recover a memory in a way similar to what you wrote in your edit:
X = [1,0,0,0,0];
offset = 0;
t = 1;
err = 1;
nIter = 100;
while err ~= 0 && t <= nIter
    prev = X;
    X = X * Weights >= offset;
    err = ~isequal(X, prev);
    t = t + 1;
end
if ~err
    disp(X);
end
If you refer to the wikipedia page, this is what's referred to as the synchronous update method.
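As an aside, here is a minimal sketch of the usual recipe with bipolar states and a Hebbian outer-product weight matrix (an assumption for illustration, unrelated to the paper's weights), where the same synchronous update does converge to the stored pattern:
p = [1 -1 -1 1 -1];            % stored pattern, entries in {-1,+1}
W = p.' * p;                   % Hebbian outer-product weights
W(logical(eye(5))) = 0;        % zero the self-connections
X = [1 -1 1 1 -1];             % noisy probe: p with one bit flipped
for t = 1:10
    X = sign(X * W);           % synchronous update with a hard limit at 0
    X(X == 0) = 1;             % break ties towards +1
end
disp(isequal(X, p))            % displays 1: the stored pattern is recalled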

Extremely large weighted average

I am using 64-bit MATLAB with 32 GB of RAM (just so you know).
I have a file (vector) of 1.3 million numbers (integers). I want to make another vector of the same length, where each point is a weighted average of the entire first vector, weighted by the inverse distance from that position (actually it's position ^-0.1, not ^-1, but for example purposes). I can't use matlab's 'filter' function, because it can only average things before the current point, right? To explain more clearly, here's an example of 3 elements
data = [ 2 6 9 ]
weights = [ 1 1/2 1/3; 1/2 1 1/2; 1/3 1/2 1 ]
results=data*weights= [ 8 11.5 12.666 ]
i.e.
8 = 2*1 + 6*1/2 + 9*1/3
11.5 = 2*1/2 + 6*1 + 9*1/2
12.666 = 2*1/3 + 6*1/2 + 9*1
So each point in the new vector is the weighted average of the entire first vector, weighting by 1/(distance from that position+1).
I could just remake the weight vector for each point, then calculate the results vector element by element, but this requires 1.3 million iterations of a for loop, each of which contains 1.3 million multiplications. I would rather use straight matrix multiplication, multiplying a 1 x 1.3M vector by a 1.3M x 1.3M matrix, which works in theory, but I can't load a matrix that large.
I am then trying to make the matrix using a shell script and index it in matlab so only the relevant column of the matrix is called at a time, but that is also taking a very long time.
I don't have to do this in MATLAB, so any advice people have about utilizing such large numbers and getting averages would be appreciated. Since I am using a weight of ^-0.1, and not ^-1, it does not drop off that fast: the millionth point is still weighted at 0.25 compared to the original point's weighting of 1, so I can't just cut it off as it gets big either.
Hope this was clear enough?
Here is the code for the answer below (so it can be formatted?):
data = load('/Users/mmanary/Documents/test/insertion.txt');
data=data.';
total=length(data);
x=1:total;
datapad=[zeros(1,total) data];
weights = ([(total+1):-1:2 1:total]).^(-.4);
weights = weights/sum(weights);
Fdata = fft(datapad);
Fweights = fft(weights);
Fresults = Fdata .* Fweights;
results = ifft(Fresults);
results = results(1:total);
plot(x,results)
The only sensible way to do this is with FFT convolution, the same idea that underpins functions like fftfilt. It is very easy to do manually:
% Simulate some data
n = 10^6;
x = randi(10,1,n);
xpad = [zeros(1,n) x];
% Setup smoothing kernel
k = 1 ./ [(n+1):-1:2 1:n];
% FFT convolution
Fx = fft(xpad);
Fk = fft(k);
Fxk = Fx .* Fk;
xk = ifft(Fxk);
xk = xk(1:n);
Takes less than half a second for n=10^6!
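Note that the code above produces a weighted sum; if a per-position weighted average is wanted, one possible way (a sketch reusing Fk and xk from above) is to run the same convolution on a vector of ones to get the per-position normaliser:
onespad = [zeros(1,n) ones(1,n)];     % same padding scheme as xpad
wsum = ifft(fft(onespad) .* Fk);      % per-position sum of kernel weights
xk = xk ./ wsum(1:n);                 % weighted average instead of weighted sum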
This is probably not the best way to do it, but with lots of memory you could definitely parallelize the process.
You can construct sparse matrices consisting of entries of your original matrix which have value i^(-1) (where i = 1 .. 1.3 million), multiply them with your original vector, and sum all the results together.
So for your example the product would be essentially:
a = rand(3,1);
b1 = [1 0 0;
0 1 0;
0 0 1];
b2 = [0 1 0;
1 0 1;
0 1 0] / 2;
b3 = [0 0 1;
0 0 0;
1 0 0] / 3;
c = sparse(b1) * a + sparse(b2) * a + sparse(b3) * a;
Of course, you wouldn't construct the sparse matrices this way. If you wanted fewer iterations of the inside loop, you could put more than one of the i's in each matrix.
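For illustration, here is a sketch (assuming the weight at distance d is 1/(d+1), as in the small example above) that builds each distance-d matrix with spdiags and accumulates the products in a loop:
n = length(a);
c = a;                                           % distance 0 contributes with weight 1
for d = 1:n-1
    Bd = spdiags(ones(n,2)/(d+1), [-d d], n, n); % both diagonals at distance d
    c = c + Bd * a;
end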
Look into the parfor loop in MATLAB: http://www.mathworks.com/help/toolbox/distcomp/parfor.html
I can't use matlab's 'filter' function, because it can only average
things before the current point, right?
That is not correct. filter is causal, but you can compensate for that by adding samples (i.e. zero-padding) to your data and removing samples from the filtered output. Since filtering with filter (you can also use conv, by the way) is a linear operation, this does not change the result: adding and removing zeros does nothing by itself, and linearity lets you reorder the steps as add samples -> filter -> remove samples.
Anyway, in your example, you can take the averaging kernel to be:
weights = 1 ./ [3 2 1 2 3]; % this kernel introduces a delay of 2 samples
and then simply:
result = filter(weights, 1, [data, zeros(1,3)]); % or conv(data, weights)
% remove the delay introduced by the kernel
result = result(3:end-1);
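The same trick written for a general symmetric kernel of odd length (a sketch, under the same assumption that data is a row vector):
L = length(weights);                             % odd-length, symmetric kernel
d = (L-1)/2;                                     % delay introduced by filter
padded = filter(weights, 1, [data, zeros(1,d)]);
result = padded(d+1:end);                        % result(k) is now centred on data(k)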
You considered only two options:
multiplying the 1.3M x 1.3M matrix with the vector once, or multiplying two 1.3M vectors 1.3M times.
But you can divide your weight matrix into as many sub-matrices as you wish, and multiply an n x 1.3M matrix with the vector 1.3M/n times.
I assume that the fastest will be when there will be the smallest number of iterations and n is such that creates the largest sub-matrix that fits in your memory, without making your computer start swapping pages to your hard drive.
With your memory size you should start with n = 5000.
You can also make it faster by using parfor (with n divided by the number of processors).
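A sketch of that blocked product (assuming data is a 1 x 1.3M row vector and the weight between positions i and j is 1/(|i-j|+1), as in the question's example):
blockRows = 5000;                        % rows of the weight matrix per block
total = length(data);
results = zeros(1, total);
for s = 1:blockRows:total
    rows = s:min(s+blockRows-1, total);
    % block of the full weight matrix: W(i,j) = 1/(|i-j|+1)
    W = 1 ./ (abs(bsxfun(@minus, rows.', 1:total)) + 1);
    results(rows) = (W * data.').';
end
The outer loop here is also the natural place to apply parfor.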
The brute force way will probably work for you, with one minor optimisation in the mix.
The ^-0.1 operations to create the weights will take a lot longer than the + and * operations to compute the weighted-means, but you re-use the weights across all the million weighted-mean operations. The algorithm becomes:
Create a weightings vector with all the weights any computation would need:
weights = (abs(-(n-1):(n-1)) + 1).^-0.1; % one weight per signed distance, i.e. 1/(distance+1)^0.1
For each element in the vector:
Index the relevant portion of the weights vector to treat the current element as the 'centre'.
Perform the weighted mean with that weights portion and the entire vector; this can be done with a fast vector dot product followed by a scalar division (see the sketch below).
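A minimal sketch of that loop (assuming data is a 1 x n row vector and weights is the (2n-1)-element vector built in the previous step):
n = length(data);
results = zeros(1, n);
for i = 1:n
    wi = weights(n-i+1 : 2*n-i);           % weights recentred on element i
    results(i) = (data * wi.') / sum(wi);  % weighted mean over the whole vector
end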
The main loop does n^2 additions and multiplications. With n equal to 1.3 million that's about 3.4 trillion operations. A single core of a modern 3 GHz CPU can do, say, 6 billion additions/multiplications a second, so that comes out to around 10 minutes. Add time for indexing the weights vector and overheads, and I still estimate you could come in under half an hour.