Algorithm for best-effort classification of vector - matlab

Given four binary vectors which represent "classes":
[1,0,0,0,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,1]
[0,1,1,1,1,1,1,1,1,0]
[0,1,0,0,0,0,0,0,0,0]
What methods are available for classifying a vector of floating point values into one of these "classes"?
Basic rounding works in most cases:
round([0.8,0,0,0,0.3,0,0.1,0,0,0]) = [1 0 0 0 0 0 0 0 0 0]
But how can I handle some interference?
round([0.8,0,0,0,0.6,0,0.1,0,0,0]) = [1 0 0 0 1 0 0 0 0 0]
This second case should still be matched to [1,0,0,0,0,0,0,0,0,0], but instead I have lost the solution entirely, because the rounded vector matches none of the classes.
I want to use MATLAB for this task.

Find the SSD (sum of squared differences) of your test vector with each "class" and use the one with the least SSD.
Here's some code: I added a 0 to the end of the test vector you provided since it was only 9 digits whereas the classes had 10.
CLASSES = [1,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,1
0,1,1,1,1,1,1,1,1,0
0,1,0,0,0,0,0,0,0,0];
TEST = [0.8,0,0,0,0.6,0,0.1,0,0,0];
% Find the difference between the TEST vector and each row in CLASSES
difference = bsxfun(@minus,CLASSES,TEST);
% Class differences
class_diff = sum(difference.^2,2);
% Store the row index of the vector with the minimum difference from TEST
[val CLASS_ID] = min(class_diff);
% Display
disp(CLASSES(CLASS_ID,:))
For illustrative purposes, difference looks like this:
0.2 0 0 0 -0.6 0 -0.1 0 0 0
-0.8 0 0 0 -0.6 0 -0.1 0 0 1
-0.8 1 1 1 0.4 1 0.9 1 1 0
-0.8 1 0 0 -0.6 0 -0.1 0 0 0
And the distance of each class from TEST looks like this, class_diff:
0.41
2.01
7.61
2.01
And obviously, the first one is the best match since it has the least difference.

This is the same thing as Jacob did, only with four different distance measures:
Euclidean distance
City-block distance
Cosine distance
Chebychev distance
%%
CLASSES = [1,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,1
0,1,1,1,1,1,1,1,1,0
0,1,0,0,0,0,0,0,0,0];
TEST = [0.8,0,0,0,0.6,0,0.1,0,0,0];
%%
% sqrt( sum((x-y).^2) )
euclidean = sqrt( sum(bsxfun(@minus,CLASSES,TEST).^2, 2) );
% sum( |x-y| )
cityblock = sum(abs(bsxfun(@minus,CLASSES,TEST)), 2);
% 1 - dot(x,y)/(sqrt(dot(x,x))*sqrt(dot(y,y)))
cosine = 1 - ( CLASSES*TEST' ./ (norm(TEST)*sqrt(sum(CLASSES.^2,2))) );
% max( |x-y| )
chebychev = max( abs(bsxfun(@minus,CLASSES,TEST)), [], 2 );
dist = [euclidean cityblock cosine chebychev];
%%
[minDist classIdx] = min(dist);
Pick the one you like :)

A simple Euclidean distance algorithm should suffice. The class with the minimum distance to the point would be your candidate.
http://en.wikipedia.org/wiki/Euclidean_distance

Related

Count length and frequency of island of consecutive numbers

I have a sequence of ones and zeros and I would like to count how often islands of consecutive ones appear.
Given:
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1]
By counting the islands of consecutive ones I mean this:
R = [4 3 1]
…because there are four single ones, three double ones and a single triplet of ones.
Multiplying these counts by the island lengths [1 2 3] gives:
[4 3 1] * [1 2 3]' = 13
which equals sum(S), because there are thirteen ones.
I hope to vectorize the solution rather than loop something.
I came up with something like:
R = histcounts(diff( [0 (find( ~ (S > 0) ) ) numel(S)+1] ))
But the result does not make much sense. It counts too many triplets.
All the pieces of code I find on the internet revolve around diff([0 something numel(S)]), but the questions are always slightly different and don’t really help me.
Thankful for any advice!
The following should do it. Hopefully the comments are clear.
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1];
% use diff to find the rising and falling edges, padding the start and end with 0
edges = diff([0,S,0]);
% get a list of the rising edges
rising = find(edges==1);
% and falling edges
falling = find(edges==-1);
% and thereby get the lengths of all the runs
SRuns = falling - rising;
% The longest run
maxRun = max(SRuns);
% Finally make a histogram, putting the bin centres at 1:maxRun
R = hist(SRuns,1:maxRun);
You could also obtain the same result with:
x = find(S==1)-(1:sum(S)) % give each group of consecutive ones its own value
h = histc(x,x) % compute the length of each group, you can also use histc(x,unique(x))
r = histc(h,1:max(h)) % count the occurrences of each length
Result:
r =
4,3,1
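Both answers above rely on hist and histc, which current MATLAB documentation marks as not recommended. As a hedged alternative sketch (not taken from either answer), the run lengths can be tallied with accumarray instead. The variable name runLengths is introduced here for illustration, and the sketch assumes S contains at least one 1:

```matlab
S = [1 1 0 0 1 1 1 0 1 0 0 1 0 0 0 1 1 0 0 1 0 0 1 1 0 1];
% Pad with zeros so that runs touching either end still produce both edges
edges = diff([0, S, 0]);
% Each run starts at a rising edge (+1) and ends at a falling edge (-1)
runLengths = find(edges == -1) - find(edges == 1);
% Count how many runs there are of each length 1..max(runLengths)
R = accumarray(runLengths(:), 1)';
```

For the S above this gives R = [4 3 1], and R * (1:numel(R))' equals sum(S).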

How to reduce coefficients to their lowest possible integers using Matlab - Balancing Chemical Equations

I am attempting to develop a Matlab program to balance chemical equations. I am able to balance them via solving a system of linear equations. Currently my output is a column vector with the coefficients.
My problem is that I need to return the smallest integer values of these coefficients. For example, if [10, 20, 30] was returned, I want [1, 2, 3] instead.
What is the best way to accomplish this?
I want this program to be fully autonomous once it is fed a matrix with the linear system. Thus I can not play around with the values, I need to automate this from the code. Thanks!
% Chemical Equation in Matrix Form
Chem = [1 0 0 -1 0 0 0; 1 0 1 0 0 -3 0; 0 2 0 0 -1 0 0; 0 10 0 0 0 -1 0; 0 35 4 -4 0 12 1; 0 0 2 -1 -3 0 2]
%set x4 = 1 then Chem(:, 4) = b and
b = Chem(:, 4); % Arbitrarily set x4 = 1 and set its column equal to b
Chem(:,4) = [] % Delete the x4 column from Chem and shift over
g = 1; % Initialize variable for LCM
x = Chem\b % This is equivalent to the reduced row echelon form of
% Chem | b
% Below is my sad attempt at factoring the values, I divide by the smallest decimal to raise all the values to numbers greater than or equal to 1
for n = 1:numel(x)
g = x(n)*g
M = -min(abs(x))
y = x./M
end
I want code that will take some vector with coefficients, and return an equivalent coefficient vector with the lowest possible integer coefficients. Thanks!
I was able to find a solution without using integer programming. I converted the non-integer values to rational expressions and used the built-in MATLAB function rat to extract the denominator of each of them. I then used the built-in function lcm to find the least common multiple of these denominators. Finally, I multiplied the solution vector by that least common multiple to obtain the coefficients.
% Chemical Equation in Matrix Form
clear, clc
% Enter chemical equation as a linear system in matrix form as Chem
Chem = [1 0 0 -1 0 0 0; 1 0 1 0 0 -3 0; 0 2 0 0 -1 0 0; 0 10 0 0 0 -1 0; 0 35 4 -4 0 -12 -1; 0 0 2 -1 -3 0 -2];
% row reduce the system
C = rref(Chem);
% parametrize the system by setting the last variable xend (e.g. x7) = 1
x = [C(:,end);1];
% extract numerator and denominator from the rational expressions of these
% values
[N,D] = rat(x);
% take the least common multiple of the first pair, set this to the
% variable least
least = lcm(D(1),D(2));
% loop through taking the lcm of the previous values with the next value
% through x
for n = 3:numel(x)
least = lcm(least,D(n));
end
% give answer as column vector with the coefficients (now factored to their
% lowest possible integers)
coeff = abs(least.*x)
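For the simpler case from the question, where the coefficients are already integers that share a common factor, such as [10, 20, 30], dividing by the greatest common divisor is enough. A minimal sketch, assuming integer input with at least one nonzero entry (the names g and reduced are illustrative, not from the answer):

```matlab
x = [10; 20; 30];          % example integer coefficients from the question
g = 0;                     % gcd(0, a) equals abs(a), so 0 is a neutral start value
for n = 1:numel(x)
    g = gcd(g, x(n));      % fold gcd over all entries
end
reduced = x ./ g;          % smallest integer vector with the same ratios
```

Here g ends up as 10 and reduced as [1; 2; 3].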

How does Y = eye(K)(y, :); replace a "for" loop? Coursera

Working on an assignment from Coursera Machine Learning. I'm curious how this works... From an example, this much simpler code:
% K is the number of classes.
K = num_labels;
Y = eye(K)(y, :);
seems to be a substitute for the following:
I = eye(num_labels);
Y = zeros(m, num_labels);
for i=1:m
Y(i, :)= I(y(i), :);
end
and I have no idea how. I'm having some difficulty Googling this info as well.
Thanks!
Your variable y in this case must be an m-element vector containing integers in the range of 1 to num_labels. The goal of the code is to create a matrix Y that is m-by-num_labels where each row k will contain all zeros except for a 1 in column y(k).
A way to generate Y is to first create an identity matrix using the function eye. This is a square matrix of all zeroes except for ones along the main diagonal. Row k of the identity matrix will therefore have one non-zero element in column k. We can therefore build matrix Y out of rows indexed from the identity matrix, using y as the row index. We could do this with a for loop (as in your second code sample), but that's not as simple and efficient as using a single indexing operation (as in your first code sample).
Let's look at an example (in MATLAB):
>> num_labels = 5;
>> y = [2 3 3 1 5 4 4 4]; % The columns where the ones will be for each row
>> I = eye(num_labels)
I =
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
>> Y = I(y, :)
Y =
% 1 in column ...
0 1 0 0 0 % 2
0 0 1 0 0 % 3
0 0 1 0 0 % 3
1 0 0 0 0 % 1
0 0 0 0 1 % 5
0 0 0 1 0 % 4
0 0 0 1 0 % 4
0 0 0 1 0 % 4
NOTE: Octave allows you to index function return arguments without first placing them in a variable, but MATLAB does not (at least, not very easily). Therefore, the syntax:
Y = eye(num_labels)(y, :);
only works in Octave. In MATLAB, you have to do it as in my example above, or use one of the other options here.
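A loop-free construction that should run in both MATLAB and Octave, without indexing into eye at all, compares each label against the list of class numbers. This is a sketch of one alternative, not necessarily one of the options referred to above:

```matlab
num_labels = 5;
y = [2 3 3 1 5 4 4 4];
% Compare every label (as a column) with the row vector 1:num_labels;
% entry (k, j) is true exactly when y(k) == j
Y = double(bsxfun(@eq, y(:), 1:num_labels));
```

The result has one row per element of y, with a single 1 in column y(k), which is the same matrix as I(y, :) in the example above.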
The first set of code is Octave, which has some additional indexing functionality that MATLAB does not have. The second set of code is how the operation would be performed in MATLAB.
In both cases Y is a matrix generated by re-arranging the rows of an identity matrix. In both cases it may also be possible to calculate Y = T*y for a suitable linear transformation matrix T.
(The above assumes that y is a vector of integers that are being used as index values for the rows. If that's not the case then the code most likely throws an error.)

How to find value from matrix

Let's say I have a matrix
A=[0.8 0.9 0.7 0.5 0.3 0.8 0.2 0.1]; % 8 points
where the values of A correspond to the logical 1 entries of B
B=[1 0 1 0 0 1 0 1 0 1 1 0 1 1];
I want to find the location C that satisfies
C=find(A<0.6 & A>0.35)
which gives C=4. My question is: how do I get the corresponding true location in B, which is 8?
Unless you have the indices stored away somewhere, I cannot see that you have much of a choice here.
tmp = find(B);
idx = tmp(C);
In case you actually want to use this mapping more than once, I would suggest that you store the indices instead of a binary vector. This will also be more memory efficient if the binary vector is sparse (or is not stored as a boolean vector), since you will need fewer entries.
In case you also need the binary vector, you should store both if memory allows. When I have done this kind of mapping in MATLAB, I have used both a binary vector (a mask) and an index vector. This has saved me from first mapping the mask to indices and then the indices to the filtered position (so to say, skipping the tmp = find(B); idx = tmp(C); part every time and going directly to idx = allIdx(C)).
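The stored-index approach described above could look like this, using the data from the question. The name allIdx comes from the text; computing it once up front and keeping B as a logical mask are assumptions of this sketch:

```matlab
A = [0.8 0.9 0.7 0.5 0.3 0.8 0.2 0.1];
B = logical([1 0 1 0 0 1 0 1 0 1 1 0 1 1]);   % mask: where the values of A live
allIdx = find(B);                 % compute the index vector once and keep it
C = find(A < 0.6 & A > 0.35);     % position(s) within A
idx = allIdx(C);                  % corresponding position(s) within B
```

Because allIdx(C) is plain indexing, this also works when the condition selects several elements of A at once.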
This will get you the index in B
A=[0.8 0.9 0.7 0.5 0.3 0.8 0.2 0.1];
B=[1 0 1 0 0 1 0 1 0 1 1 0 1 1];
C=find(A<0.6 & A>0.35);
temp=0;
for i=1:size(B,2)
temp=temp+B(i);
if(temp==C)
break;
end
end
locationB=i;
locationB

MATLAB calculate area of shape on plot

I created a plot using imagesc. The X and Y axes are longitude and latitude, respectively. The Z values are the intensities of the image shown below. What I'd like to do is calculate the area of each of the polygons shown. Can anybody recommend a straightforward (or any) method for accomplishing this?
EDIT
Forgot to include image.
Below is a toy example. It hinges on the assumption that the Z values inside the objects differ from those outside (here: they are nonzero). I also assume a straight divider at column 4, but the same principle (applying a mask) can be applied with other boundaries. This further assumes that the values are equidistant along the x and y axes, which the question does not contradict. If that is not the case, a little more work using bsxfun is needed.
A = [0 2 0 0 0 2 0
3 5 3 0 1 4 0
1 4 0 0 3 2 3
2 3 0 0 0 4 2
0 2 6 0 1 6 1
0 3 0 0 2 3 0
0 0 0 0 0 0 0];
area_per_pix = 0.5; % or whatever
% plot it
cm = parula(10);
cm(1, :) = [1 1 1];
figure(1);
clf
imagesc(A);
colormap(cm);
% divider
dv_idx = 4;
left_object = A(:, 1:(dv_idx-1));
left_mask = left_object > 0; % threshold object
num_pix_left = sum(left_mask(:));
% right object, different method
right_mask = repmat((1:size(A, 2)) > dv_idx, size(A, 1), 1);
right_mask = (A > 0) & right_mask;
num_pix_right = sum(right_mask(:));
fprintf('The left object is %.2f units large, the right one %.2f units.\n', ...
num_pix_left * area_per_pix, num_pix_right * area_per_pix);
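If the pixels are not equally sized, one possible reading of the bsxfun remark above is to build a per-pixel area matrix from the column widths and row heights and sum it under the mask. A sketch with made-up sizes (dx, dy and pix_area are hypothetical names and values, not from the answer):

```matlab
A = [0 2 0 0 0 2 0
     3 5 3 0 1 4 0
     1 4 0 0 3 2 3
     2 3 0 0 0 4 2
     0 2 6 0 1 6 1
     0 3 0 0 2 3 0
     0 0 0 0 0 0 0];
dx = [1 1 2 1 1 1 1];            % hypothetical width of each column
dy = 0.5 * ones(size(A, 1), 1);  % hypothetical height of each row
% Area of pixel (i, j) is dy(i) * dx(j)
pix_area = bsxfun(@times, dy, dx);
% Mask for the left object: nonzero pixels left of the divider column
dv_idx = 4;
left_mask = A > 0;
left_mask(:, dv_idx:end) = false;
area_left = sum(pix_area(left_mask));
```

With uniform dx and dy this reduces to the pixel count times area_per_pix used in the toy example.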
This might be helpful: http://se.mathworks.com/matlabcentral/answers/35501-surface-area-from-a-z-matrix
He has not used imagesc, but it's a similar problem.