How to read the classifier confusion matrix in WEKA - classification

Sorry, I am new to WEKA and just learning.
In my decision tree (J48) classifier output, there is a confusion Matrix:
  a   b   <----- classified as
130   8   a = functional
 15 150   b = non-functional
How do I read this matrix? What's the difference between a & b?
Also, can anyone explain to me what domain values are?

Have you read the Wikipedia page on confusion matrices? The text around the matrix is arranged slightly differently in their example (row labels on the left instead of on the right), but you read it just the same.
The row indicates the true class, the column indicates the classifier output. Each entry, then, gives the number of instances of <row> that were classified as <column>. In your example, 15 Bs were (incorrectly) classified as As, 150 Bs were correctly classified as Bs, etc.
As a result, all correct classifications are on the top-left to bottom-right diagonal. Everything off that diagonal is an incorrect classification of some sort.
Edit: The Wikipedia page has since switched the rows and columns around. This happens. When studying a confusion matrix, always make sure to check the labels to see whether it's true classes in rows, predicted class in columns or the other way around.

I'd put it this way:
The confusion matrix is Weka reporting on how good this J48 model is in terms of what it gets right, and what it gets wrong.
In your data, the target variable was either "functional" or "non-functional;" the right side of the matrix tells you that column "a" is functional, and "b" is non-functional.
The columns tell you how your model classified your samples - it's what the model predicted:
The first column contains all the samples which your model thinks are "a" - 145 of them, total
The second column contains all the samples which your model thinks are "b" - 158 of them
The rows, on the other hand, represent reality:
The first row contains all the samples which really are "a" - 138 of them, total
The second row contains all the samples which really are "b" - 165 of them
Knowing the columns and rows, you can dig into the details:
Top left, 130, are things your model thinks are "a" which really are "a" <- these were correct
Bottom left, 15, are things your model thinks are "a" but which are really "b" <- one kind of error
Top right, 8, are things your model thinks are "b" but which really are "a" <- another kind of error
Bottom right, 150, are things your model thinks are "b" which really are "b"
So top-left and bottom-right of the matrix are showing things your model gets right.
Bottom-left and top-right of the matrix are showing where your model is confused.
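To make the row/column reading concrete, here is a minimal Python sketch that recomputes the totals quoted above from the four cells. The variable names are illustrative and not part of Weka's output; the accuracy figure is simply the diagonal sum divided by the number of samples.

```python
# Confusion matrix from the question: rows = actual class, columns = predicted class.
labels = ["functional", "non-functional"]   # a, b
matrix = [
    [130, 8],    # actual a: 130 predicted a, 8 predicted b
    [15, 150],   # actual b: 15 predicted a, 150 predicted b
]

row_totals = [sum(row) for row in matrix]                 # how many really are a / b
col_totals = [sum(col) for col in zip(*matrix)]           # how many were predicted a / b
correct = sum(matrix[i][i] for i in range(len(matrix)))   # main diagonal
total = sum(row_totals)
accuracy = correct / total

print("actual totals:   ", dict(zip(labels, row_totals)))
print("predicted totals:", dict(zip(labels, col_totals)))
print(f"correct: {correct} of {total} ({accuracy:.1%})")
```

The off-diagonal cells (8 and 15) are the two kinds of error described above.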

Related

Matlab - Error bars for (large) noisy data sets

I have ten large linear arrays (n elements) such as
A = [ A1 A2....An ]
B = [ B1 B2....Bn ]
....
J = [ J1 J2....Jn ]
I can take the arithmetic mean of these arrays by adding them and dividing by ten; this reduces the noise substantially and shows the trend I am looking for. (Note that I often have more or fewer than ten data sets, but this is representative. Also, n varies, but is generally in the tens of thousands of data points.)
What I would like to do is plot this average with error bars that represent the noise in the original ten arrays. The arrays are large, so maybe error bars at sensible increments (say ten error bars across the entire range where the deviation from the average is greatest).
The image shows 10 noisy data sets plotted as grey lines with the mean as a black line.
thanks
I have arrived at a rather laborious solution to this problem by writing what seems to be a lengthy piece of code.
The code takes all the input arrays and creates two new arrays: one holding the maximum y value at each x and one holding the minimum. This is done with MATLAB's max and min functions.
The minimum is subtracted from the maximum to give an array of the magnitudes of the "error" at each value of x.
Then every nth value of the error array is plotted as an error bar on top of the arithmetic mean value of all the original input arrays.
It's a fix to the problem - and the screenshot shows the result - but I was wondering if there is a more elegant "built in" solution that does this in one shot.
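The min/max approach described above can be sketched in a few lines. This is a plain-Python illustration of the arithmetic, not the original MATLAB code; the function name `envelope_error_bars`, the `step` parameter and the tiny data set are made up for the example.

```python
def envelope_error_bars(series, step):
    """Return (index, mean, spread) for every `step`-th sample.

    series: list of equally long lists (the noisy data sets)
    spread: max minus min across the data sets at that index
    """
    n = len(series[0])
    if any(len(s) != n for s in series):
        raise ValueError("all data sets must have the same length")
    bars = []
    for i in range(0, n, step):
        column = [s[i] for s in series]      # values of all data sets at x = i
        mean = sum(column) / len(column)
        spread = max(column) - min(column)
        bars.append((i, mean, spread))
    return bars

# Three short made-up data sets standing in for the ten large arrays.
data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 5.0, 4.0],
    [3.0, 2.0, 4.0, 7.0],
]
bars = envelope_error_bars(data, step=2)
print(bars)
```

Each tuple holds the position, the mean to plot, and the max-minus-min magnitude used as the error bar at that position.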

Multi-Output Multi-Class Keras Model

Each input I have is associated with a 49x2 matrix. Here's what one input-output pair looks like:
input :
[Car1, Car2, Car3 ..., Car118]
output :
[[Label1 Label2]
[Label1 Label2]
...
[Label1 Label2]]
Both Label1 and Label2 are label-encoded, and they have 1200 and 1300 different classes respectively.
Just to make sure: is this what we call a multi-output multi-class problem?
I tried to flatten the output, but I feared the model wouldn't understand that all labels of the same kind share the same classes.
Is there a Keras layer that handles an output with this peculiar array shape?
Generally, multi-class problems correspond to models that output a probability distribution over the set of classes (typically scored against the one-hot encoding of the actual class through cross-entropy). Now, regardless of whether you structure it as one single output, two outputs, 49 outputs or 49 x 2 = 98 outputs, that would mean having 1,200 x 49 + 1,300 x 49 = 122,500 output units, which a computer can handle, but is maybe not the most convenient thing to have. You could try making each label output a single (e.g. linear) unit and rounding its value to choose the label, but unless the labels have some numerical meaning (e.g. order, sizes, etc.), that is not likely to work.
If the order of the elements in the input has some meaning (that is, shuffling it would affect the output), I think I'd approach the problem through an RNN, like an LSTM or a bidirectional LSTM model, with two outputs. Use return_sequences=True and TimeDistributed Dense softmax layers for the outputs, and for each 118-long input you'd have 118 pairs of outputs; then you can just use temporal sample weighting to drop, for example, the first 69 (or maybe do something like dropping the first 35 and the last 34 if you're using a bidirectional model) and compute the loss with the remaining 49 pairs of labellings. Or, if that makes sense for your data (maybe it doesn't), you could go with something more advanced like CTC, which is also implemented in Keras (thanks #indraforyou)!
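The per-timestep weighting mentioned above boils down to a vector of 118 zeros and ones per sample. Here is a minimal sketch in plain Python, assuming the 49 label rows line up with either the last 49 or the middle 49 timesteps. The helper name `timestep_weights` is hypothetical, and how such a vector is handed to Keras depends on the version in use.

```python
def timestep_weights(n_steps, drop_first, drop_last=0):
    """Weight 0.0 for dropped timesteps, 1.0 for those that count in the loss."""
    keep = n_steps - drop_first - drop_last
    if keep < 0:
        raise ValueError("cannot drop more timesteps than exist")
    return [0.0] * drop_first + [1.0] * keep + [0.0] * drop_last

# Unidirectional case: ignore the first 69 outputs, score the last 49.
unidirectional = timestep_weights(118, drop_first=69)
# Bidirectional case: ignore 35 at the start and 34 at the end.
bidirectional = timestep_weights(118, drop_first=35, drop_last=34)

print(len(unidirectional), sum(unidirectional))
print(len(bidirectional), sum(bidirectional))
```

Either way, 49 timesteps keep a non-zero weight, matching the 49 rows of the target matrix.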
If the order in the input has no meaning but the order of the outputs does, then you could have an RNN where your input is the original 118-long vector plus a pair of labels (each one-hot encoded), and the output is again a pair of labels (again two softmax layers). The idea would be that you get one "row" of the 49x2 output on each frame, and then you feed it back to the network along with the initial input to get the next one; at training time, you would have the input repeated 49 times along with the "previous" label (an empty label for the first one).
If there are no sequential relationships to exploit (i.e. the order of the input and the output do not have a special meaning), then the problem would only be truly represented by the initial 122,500 output units (plus all the hidden units you may need to get those right). You could also try some kind of middle ground between a regular network and an RNN where you have the two softmax outputs and, along with the 118-long vector, you include the "id" of the output that you want (e.g. as a 49-long one-hot encoded vector); if the "meaning" of each label at each of the 49 outputs is similar, or comparable, it may work.

Matlab non-linear binary Minimisation

I have to set up a phoneme table with a specific probability distribution for encoding things.
Now there are 22 base elements (each with an assigned probability, sum 100%), which shall be mapped on a 12 element table, which has desired element probabilities (sum 100%).
So part of the minimisation is to merge several base elements to get 12 table elements. Each base element must occur exactly once.
In addition, the table has 3 rows. So the same 12 element composition of the 22 base elements must minimise the error for 3 target vectors. Let's say the given target vectors are b1,b2,b3 (dimension 12x1), the given base vector is x (dimension 22x1) and they are connected by the unknown matrix A (12x22) by:
b1+err1=Ax
b2+err2=Ax
b3+err3=Ax
To sum it up: A is to be found so that dot_prod(err1+err2+err3, err1+err2+err3)=min (least squares). And - according to the above explanation - A must contain only 1's and 0's, while having exactly one 1 per column.
Unfortunately I have no idea how to approach this problem. Can it be expressed in a way different from the matrix-vector form?
Which tools in MATLAB could do it?
I think I found the answer while parsing some sections of the Matlab documentation.
First of all, the problem can be rewritten as:
errSum=err1+err2+err3=3Ax-b1-b2-b3
=> dot_prod(errSum, errSum) = min(A)
Applying the dot product (least squares) yields a quadratic scalar expression.
Syntax-wise, the fmincon solver in the Optimization Toolbox could do the job: its constraint parameters allow bounding each Aij between 0 and 1 and forcing each column to sum to 1.
But fmincon works on continuous variables, so it is apparently not suited to binary problems algorithm-wise, and the ga tool should be used instead, which can be called in a similar way.
Since the expression would be very long in my case and needs to be written out, I haven't tried it yet. Please correct me if I'm wrong, or add further solution methods if available.
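On the question of a form other than the matrix-vector one: because A has exactly one 1 per column, it is equivalent to a list that gives, for each base element, the index of the table slot it is merged into. The sketch below is plain Python rather than MATLAB, uses a made-up instance with 4 base elements and 2 slots, and brute-forces every assignment, which is only feasible at toy sizes (the real problem has 12^22 candidates). The names `merge`, `objective` and `best_assignment` are illustrative.

```python
from itertools import product

def merge(assignment, x, n_slots):
    """Compute A*x where column j of A has its single 1 in row assignment[j]."""
    merged = [0.0] * n_slots
    for j, slot in enumerate(assignment):
        merged[slot] += x[j]
    return merged

def objective(assignment, x, targets):
    """dot(errSum, errSum) with errSum = len(targets)*A*x - sum(targets)."""
    n_slots = len(targets[0])
    ax = merge(assignment, x, n_slots)
    err_sum = [len(targets) * ax[i] - sum(b[i] for b in targets)
               for i in range(n_slots)]
    return sum(e * e for e in err_sum)

def best_assignment(x, targets):
    """Exhaustive search over all assignments; only usable for tiny instances."""
    n_slots = len(targets[0])
    candidates = product(range(n_slots), repeat=len(x))
    return min(candidates, key=lambda a: objective(a, x, targets))

# Toy instance: 4 base probabilities merged into 2 slots, 3 target vectors.
x = [0.1, 0.2, 0.3, 0.4]
targets = [[0.5, 0.5], [0.4, 0.6], [0.6, 0.4]]
best = best_assignment(x, targets)
print(best, merge(best, x, 2), objective(best, x, targets))
```

With this encoding, the "exactly one 1 per column" constraint holds by construction, so a search method only has to handle 22 integers in the range 1 to 12 rather than 264 constrained binary entries.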

Beginner Matrix Access in MATLAB

Now first off, I am not even sure this is called a matrix, and I am new to MATLAB. But let's say I have a "matrix" that looks like this:
for n=1:10
...
someImage = mat(:,:,n) %The "matrix"
...
end
where n could be the frames in a video, for example, and the first 2 ':' are the row and column data for the 2D image (the frame).
If I only wanted the first ':' of data (the row? column? element?), how would I access only that?
Intuitively, I think something like:
row1 = mat(:,0,0)
row2 = mat(0,:,0)
row3 = mat(0,0,:)
but that doesn't seem to be working.
P.S. I know that these aren't really rows, the terminology for all this would also be greatly appreciated
Also, it may not have anything to do with this, but I am using a MATLAB GUI as well, and the "matrix" is stored like this:
handles.mat(:,:,n)
I don't think it has anything to do with my actual question, but it might so I will put it here
-Thanks!
One point I would like to make before starting: MATLAB starts indexing at 1, and not 0. This is a common mistake that most people who have a C/Java/Python programming background make going into MATLAB.
With that in mind, doing:
row1 = mat(:,1,1);
accesses all of the rows for the first column and the first frame of your video. Be aware that this will produce an M x 1 vector, where M denotes the number of rows for a frame in your video.
Also:
row2 = mat(1,:,1);
This accesses all of the columns in the first row of the first frame. Be aware that this will produce a 1 x N vector, where N denotes the number of columns for a frame in your video.
Also:
row3 = mat(1,1,:);
This accesses all of the pixels in the entire video sequence at row 1 and column 1. You can think of this as a temporal slice at the top left corner of your video sequence. Be aware that this will produce a 1 x 1 x T vector, where T is the number of frames in your video. If you access just a single pixel location in your video, the first two dimensions are superfluous, and so you can use the squeeze command to shrink all of the singleton dimensions so that it simplifies to a T x 1 vector. In other words, do this:
row3 = squeeze(mat(1,1,:));
FWIW, you do have the right terminology. Rows and columns are used in image / video processing all the time. As for the "matrix", you can call this a temporal sequence or a frame sequence in terms of video processing. It certainly is a 3D matrix, but people in this domain denote it as either one of the two as it is really a sequence of images / frames stacked on top of each other.

Preserving matrix columns using Matlab brush/select data tool

I'm working with matrices in Matlab which have five columns and several million rows. I'm interested in picking particular groups of this data. Currently I'm doing this using plot3() and the brush/select data tool.
I plot the first three columns of the matrix as X,Y, Z and highlight the matrix region I'm interested in. I then use the brush/select tool's "Create variable" tool to export that region as a new matrix.
The problem is that when I do that, the remaining two columns of the original, bigger matrix are dropped. I understand why- they weren't plotted and hence the figure tool doesn't know about them. I need all five columns of that subregion though in order to continue the processing pipeline.
I'm adding the appropriate 4th and 5th column values to the exported matrix using a horrible nested if loop approach- if columns 1, 2 and 3 match in both the original and exported matrix, attach columns 4/5 of the original matrix to the exported one. It's bad design and agonizingly slow. I know there has to be a Matlab function/trick for this- can anyone help?
Thanks!
This might help:
1. I start with matrix 1 with columns X,Y,Z,A,B
2. Using the brush/select tool, I create a new (subregion) matrix 2 with columns X,Y,Z
3. I then loop through all members of matrix 2 against all members of matrix 1. If X,Y,Z match for a pair of rows, I append A and B from that row in matrix 1 to the appropriate row in matrix 2.
4. I become very sad as this takes forever and shows my ignorance of Matlab.
If I understand your situation correctly here is a simple way to do it:
Assuming you have a matrix like so: M = [A B C D E] where each letter is an Nx1 vector.
You select a range (this part is not really clear to me), but suppose you can create the following:
logical vectors idxA, idxB and idxC, each Nx1, that are 1 where the corresponding row's value lies in the region and 0 otherwise.
Then you can simply use:
M(idxA&idxB&idxC,:)
and you will get the additional two columns as well.
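If the brush tool only hands back the X, Y, Z values rather than an index vector, the slow part of the nested loop is the repeated row comparison, and a hash lookup removes it. The sketch below shows the idea in plain Python with made-up data (the name `rows_in_selection` is hypothetical). In MATLAB, `ismember` with the `'rows'` option should give a comparable logical index, which can then be used in the same way as the combined index above. Exact equality on floating-point coordinates is assumed to be safe here only because the exported values are unmodified copies of the originals.

```python
def rows_in_selection(full_rows, selected_xyz):
    """Keep every full row whose first three columns appear in the selection."""
    wanted = set(map(tuple, selected_xyz))   # hash lookup replaces the inner loop
    return [row for row in full_rows if tuple(row[:3]) in wanted]

# Made-up data: columns are X, Y, Z, A, B.
M = [
    [0.0, 0.0, 0.0, 10, 11],
    [1.0, 2.0, 3.0, 20, 21],
    [4.0, 5.0, 6.0, 30, 31],
]
brushed = [[4.0, 5.0, 6.0], [1.0, 2.0, 3.0]]   # exported X, Y, Z only
subset = rows_in_selection(M, brushed)
print(subset)
```

The result keeps all five columns and follows the row order of the original matrix, not the order of the brushed selection.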