Matlab Histogram Function - matlab

I'm new to MATLAB, and for an assignment my professor is having the class write (complete, really) a custom MATLAB function for generating a histogram from a set of data. Essentially, a new matrix L is created and updated with the information from a 2D matrix M. The first column of L contains the value of M(i,j), and the second column contains the count (total) of that value in the data set. I need some direction on how to proceed.
Below is where I'm at thus far:
function L = hist_count(M)
L = [ [0:255' zeros(256,1) ];
for i = 1:size(M,1)
for j = 1:size(M,2)
L(double(M(i,j))+1,2) = <<finish code here>>;
end
end
figure;
plot(L(:,1),L(:,2));
The <<finish code here>> section is where I'm stuck. I understand everything up to the point where I need to update L with the information.
Assistance is appreciated.

Note: the initialization of your histogram L has mismatched brackets. Remove the second [ bracket in the code. In addition, the creation of the 0:255 vector is incorrect. Because the transpose operator binds more tightly than the colon, 0:255' transposes only the single constant 255, so it still creates a horizontal 1 x 256 vector, which cannot be concatenated with the 256 x 1 column of zeros, and the code fails. You should surround the creation of this vector with parentheses, then transpose that result. Therefore:
L = [ (0:255)' zeros(256,1) ];
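As a quick check of the precedence point, here is a minimal sketch comparing the two spellings. The variable names a and b are only illustrative and are not part of the assignment code.

```matlab
% Transpose (') binds more tightly than the colon operator.
a = 0:255';      % parsed as 0:(255'), so still a 1-by-256 row vector
b = (0:255)';    % range first, then transpose: a 256-by-1 column vector

disp(size(a))    % 1 256
disp(size(b))    % 256 1

% Only the column version can sit next to zeros(256,1)
L = [b zeros(256,1)];   % 256-by-2: values in column 1, counts in column 2
```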
Now onto your actual problem. Judging by your initialization of the histogram, there are 256 possible values so your input is most likely of type uint8, which means that the values in your data will only be from [0-255] in steps of 1. Recall that a histogram records the total number of times you see a value. In this case, you have a two column matrix where the first column tells you the value you want to examine and the second column tells you how many times you see that value in your data. Therefore, each row tells you which value you are examining in your data as well as how many times you have seen that value in your data. Note that the counts are all initialized to zero, so the logic is that every time you see a value, you need to access the right row corresponding to the data point, then increment that value by 1.
Therefore, the line is simply just accessing the current count and adding 1 to it... you then store it back:
L(double(M(i,j))+1,2) = L(double(M(i,j))+1,2) + 1;
M(i,j) is the value found at location (i,j) in your 2D data. The last question you have is why cast the intensity to double and add 1?
The offset of 1 is needed because MATLAB indexes rows and columns starting at 1, while your values can start at 0. Value 0 corresponds to row 1 of the histogram, value 1 to row 2, and so on up to value 255 in row 256.
The cast to double is needed because the input may be an integer type, and integer arithmetic in MATLAB saturates: any result beyond the dynamic range of the type is clamped to the limit of that type. For uint8, anything above 255 becomes 255. In the most extreme case of value 255, adding 1 in native uint8 gives 255 rather than 256, which means that the values 254 and 255 get lumped into the same bin. Therefore, you must convert to a type that extends beyond the limits of uint8 before adding 1. double is the usual default, as it has a much wider range than uint8, but any type that can represent 256 is suitable.
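Putting the pieces together, the counting loop could look like the sketch below. It is written as a script rather than as the hist_count function so it can be pasted and run directly. The 2-by-2 matrix M is made-up test data, and the accumarray call at the end is not part of the assignment, only a loop-free cross-check.

```matlab
M = uint8([0 255; 255 254]);          % hypothetical 2-by-2 test image
L = [(0:255)' zeros(256,1)];          % values in column 1, counts in column 2

for i = 1:size(M,1)
    for j = 1:size(M,2)
        row = double(M(i,j)) + 1;     % convert first, then shift to 1-based row
        L(row,2) = L(row,2) + 1;      % increment the count for this value
    end
end

% Without the conversion the offset saturates:
bad_row = M(1,2) + 1;                 % uint8(255) + 1 stays at 255

% Loop-free cross-check (an alternative technique, not the assignment's method)
counts = accumarray(double(M(:)) + 1, 1, [256 1]);
```

With this input, row 1 (value 0) and row 255 (value 254) each hold a count of 1, and row 256 (value 255) holds a count of 2.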

Related

How to vectorize conditional triple nested for loop - MATLAB

I have two 3D arrays:
shape is a 240 x 121 x 10958 array
area is a 240 x 1 x 10958 array
The values of the arrays are of the type double. Both have NaN as fill values where there is no relevant data.
For each [240 x 121] page of the shape array, there are several elements filled by the same number. For example, there will be a block of 1s, a block of 2s, etc. For each corresponding page of the area array there is a single column of numeric values 240 rows long. What I need to do is progressively go through each page of the shape array (moving along the 3rd, 10958-long axis) and replace each numbered element in that page with the number that fills the row of the matching number in the area array.
For example, if I'm looking at shape(:,:,500), I want to replace all the 8s on that page with area(8,1,500). I need to do this for numbers 1 through 20, and I need to do it for all 10958 pages of the array.
If I extract a single page and only replace one number I can get it to work:
shapetest = shape(:,:,500);
shapetest(shapetest==8)=area(8,1,500);
This does exactly what I need for one page and for one number. Going through numbers 1-20 with a for loop doesn't seem like an issue, but I can't find a vectorized way to do this for all the pages of the original 3D array. In fact, I couldn't even get it work for a single page without extracting that page as its own matrix like I did above. I tried things like this to no avail:
shape(shape(:,:,500)==8)=area(8,1,500);
If I can't do it for one page, I'm even more lost as to how to do it for all at once. But I'm inexperienced in MATLAB, and I think I am just ignorant of the proper syntax.
Instead, I ended up using a cell array and the following very inefficient nested for loops:
MyCell=num2cell(shape,[2 1]);
shapetest3=reshape(MyCell,1,10958);
for w=1:numel(shapetest3)
test_result{1,w}=zeros(121,240)*NaN;
end
for k=1:10958
for i=1:29040 % 121 x 240
for n=1:20
if shapetest3{1,k}(i)==n
test_result{1,k}(i)=area(n,1,k);
end
end
end
end
This gets the job done, and I can easily turn it back to an array, but it is very slow, and I am confident there is a much better vectorized way. I'd appreciate any help or tips. Thanks in advance.
To vectorize the mapping operation, we can use shape as an index into area. But because the mapping is different for each plane, we need to loop over the planes to accomplish this. In short, it'll look like this:
test_result = zeros(size(shape)); % pre-allocate output
for k=1:size(area,3) % loop over planes
lut = area(:,1,k);
test_result(:,:,k) = lut(shape(:,:,k));
end
The above only works if shape only contains integer values in the range [1,N], where N = size(area,1). That is, for other values in shape we'll be doing wrong indexing. We will need to fix shape to avoid this. The question here is, what do we want to do with those out-of-range values?
As an example, preparing shape to deal with NaN values:
code = size(area,1) + 1; % this is an unused code word
shape(isnan(shape)) = code;
area(code,1,:) = NaN;
This replaces all NaN values in shape with the value code, which is one larger than any code value we were mapping. Then, we extend area to have one more value, a value for the input code. The value we fill in here is the value that the output test_result will have where shape is NaN. In this case, we write NaN, such that NaN in the input maps to NaN in the output.
Something similar can be done with values below 1 and above 240 (shape(shape<1 | shape>240) = code), or with non-integer values (shape(mod(shape,1)~=0) = code).
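To see the lookup and the NaN handling working together, here is a small sketch on made-up data. The 2 x 2 x 2 shape and 2 x 1 x 2 area arrays are hypothetical stand-ins for the real 240 x 121 x 10958 and 240 x 1 x 10958 arrays.

```matlab
% Hypothetical miniature data: two pages, labels 1 and 2, NaN as fill value
shape = cat(3, [1 2; NaN 2], [2 2; 1 NaN]);   % 2-by-2-by-2
area  = cat(3, [10; 20], [30; 40]);           % 2-by-1-by-2

% Map NaN to an extra code word whose lookup value is NaN
code = size(area,1) + 1;
shape(isnan(shape)) = code;
area(code,1,:) = NaN;

test_result = zeros(size(shape));             % pre-allocate output
for k = 1:size(area,3)                        % loop over pages
    lut = area(:,1,k);                        % lookup table for this page
    test_result(:,:,k) = lut(shape(:,:,k));   % index with the label matrix
end
```

Page 1 becomes [10 20; NaN 20] and page 2 becomes [40 40; 30 NaN]: each label is replaced by the matching row of area on the same page, and NaN stays NaN.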

'Find' function working incorrectly, have tried floating point accuracy resolution

I have vertically concatenated files from my directory into a matrix that is about 60000 x 15 in size (verified).
d=dir('*.log');
n=length(d);
data=[];
for k=1:n
data{k}=importdata(d(k).name);
end
total=[];
for k=1:n
total=[total;data{n}];
end
I am using the following 32-iteration loop and the 'Find' function to locate the row numbers where the final column is an integer equal to the current loop iteration:
for i=1:32
v=[];
vn=[];
[v,vn]=find(abs(fix(i)-fix(total))<eps);
g=length(v)
end
I have tried to account for the floating point accuracy by using 'fix' on values of 'i' and values from matrix 'total', in addition to taking their absolute difference and checking it to be less than a tolerance of 'eps' (floating-point relative accuracy function), up to a tolerance of .99.
The 'Find' function is not working correctly. It is only working for certain integers (although it should be locating all of them (1-32)), and for the integers it does find the values are incomplete.
What is the problem here? If 'Find' is inadequate for this purpose, what is a suitable alternative?
You are getting a lot of zeros because you are looking not just at the 15th column of data but the entire data matrix so you are going to have a lot of non-integers.
Also, you're using fix on both numbers and since floating point errors can cause the number to be slightly above and below the desired integer, this will cause the ones that are below to round down an integer lower than what you'd expect. You should use round to round to the nearest integer instead.
Rather than using find to do this, I would use simple boolean logic to determine the value of the last column:
for k = 1:32
% Compare column 15 to the current index
matches = abs(total(:,end) - k) < eps;
% Do stuff with these matches
g = sum(matches); % Count the matches
end
Depending on what you want to actually do with the data, you may be able to use the last column as an input to accumarray to perform an operation on each group.
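As a rough illustration of the round and accumarray suggestions, the sketch below counts how often each integer label occurs. The vector lastcol is invented data standing in for total(:,end), with small floating-point noise added on purpose.

```matlab
% Hypothetical last-column values, some slightly off the exact integer
lastcol = [1; 2 + 1e-12; 2; 3 - 1e-12; 3; 3];

truncated = fix(lastcol);     % 3 - 1e-12 is truncated to 2
labels    = round(lastcol);   % snaps every value to the nearest integer

% One count per label 1..32 in a single call
counts = accumarray(labels, 1, [32 1]);

% Equivalent count for a single label using a logical mask
g = sum(labels == 3);
```

Here fix puts the fourth value into group 2, while round keeps it in group 3, so counts(1:3) is [1; 2; 3] and g is 3.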
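As a side note, you can replace the first chunk of code with the following. It also avoids the data{n} index in your second loop, which appends the last file n times instead of appending each file once (it should be data{k}):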
d = dir('*.log');
data = cellfun(@importdata, {d.name}, 'UniformOutput', false);
total = cat(1, data{:});

MATLAB: Automatic assigning of matrix element indices

I am currently in the process of writing a custom function to compute the RREF of a given m x n matrix. Since I am a complete newbie to MATLAB, I thought it would be a good idea to sample the built-in rref() function.
While examining the part of code that found "the value and index of largest element in the remainder" of the leading column, I had that:
[p,k] = max(abs(A(i:m,j)))
where m is the number of rows of the matrix, and i=j=1.
I understand that max(abs(A(i:m,j))) gives you the value of the largest element in the leading column - a single scalar answer. However, I cannot understand why it manages to assign two values to [p,k], with k being the index number for p. Could someone please be kind enough to help?
k is the position in your vector where the maximum value is.
For instance, assume we use the vector [1,2,5,2,1]. There the max value is 5. This value is at the third position in the vector. So [p,k] = max([1,2,5,2,1]); will return p=5 and k=3.
The function will assign values depending on how you call it.
p = max(...
will assign only p
[p,k] = max(...
will assign p and k.
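One detail worth trying out: when max is given a slice such as A(i:m,j), the index k counts from the start of that slice, not from the first row of A. The sketch below uses a made-up 3 x 2 matrix, and the variable row is only an illustrative name for the converted index.

```matlab
A = [1 2; -7 3; 4 5];          % hypothetical example matrix
m = size(A,1);
i = 2;  j = 1;                 % look at rows 2..m of column 1

[p,k] = max(abs(A(i:m,j)));    % the slice is [7; 4], so p = 7 and k = 1
row = k + i - 1;               % position of that element within A itself

only_p = max(abs(A(i:m,j)));   % single output: the value only
```

Here row is 2, because the largest entry of the slice sits in the second row of A.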

MATLAB image summation confusion

I am trying to sum over my image (it is a 128x128 uint8) in MATLAB (just a simple for loop); however, my sum will only go up to a value of 255. Afterwards it just keeps displaying 255 over and over again.
Does this mean that my variable has been assigned a uint8 type or something? If so, how do I change this?
Cheers!
Yes, presumably your data is of type uint8. But you don't have to loop to sum, just use the sum function. Assuming your data is in x:
total = sum(double(x(:)))
sum will operate over a single dimension, so if you just passed it double(x) directly, it would return a 1x128 matrix; here we have passed it all the data reshaped to a single-dimension vector (using (:)), which has been converted to double using the double function.
Note that the type of your variable will be displayed along with its name and size in the Workspace window.
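A small sketch of what is probably happening in the loop, assuming the accumulator started as a plain 0. The image x is invented data. When a double scalar is combined with a uint8 value, MATLAB returns a uint8, so the running total becomes uint8 after the first addition and then saturates at 255.

```matlab
x = uint8(200 * ones(128));       % hypothetical 128-by-128 uint8 image

acc = 0;                          % starts out as a double
for n = 1:numel(x)
    acc = acc + x(n);             % double + uint8 gives uint8, capped at 255
end
acc_class = class(acc);           % 'uint8'

total = sum(double(x(:)));        % convert first, then sum: no saturation
```

In this example acc ends at 255, while total is 200 * 128 * 128 = 3276800.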

MATLAB: What's [Y,I]=max(AS,[],2);?

I just started MATLAB and need to finish this program really fast, so I don't have time to go through all the tutorials.
Can someone familiar with it please explain what the following statement is doing?
[Y,I]=max(AS,[],2);
The [] between AS and 2 is what's mostly confusing me. And is the max value getting assigned to both Y and I ?
According to the reference manual,
C = max(A,[],dim) returns the largest elements along the dimension of A specified by scalar dim. For example, max(A,[],1) produces the maximum values along the first dimension (the rows) of A.
[C,I] = max(...) finds the indices of the maximum values of A, and returns them in output vector I. If there are several identical maximum values, the index of the first one found is returned.
I think [] is there just to distinguish itself from max(A,B).
C = max(A,[],dim) returns the largest elements along the dimension of A specified by scalar dim. For example, max(A,[],1) produces the maximum values along the first dimension (the rows) of A.
Also, the [C, I] = max(...) form gives you the maximum values in C, and their indices (i.e. locations) in I.
Why don't you try an example, like this? Type it into MATLAB and see what you get. It should make things much easier to see.
m = [[1;6;2] [5;8;0] [9;3;5]]
max(m,[],2)
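For reference, this is roughly what the example gives when both outputs are requested. The names Y, I, Yc and Ic are only illustrative.

```matlab
m = [[1;6;2] [5;8;0] [9;3;5]];   % rows are [1 5 9], [6 8 3], [2 0 5]

[Y, I]   = max(m, [], 2);        % one maximum per row, with its column index
[Yc, Ic] = max(m, [], 1);        % one maximum per column, with its row index
```

Y is [9; 8; 5] with I equal to [3; 2; 3], and Yc is [6 8 9] with Ic equal to [2 2 1]. So the maximum values go to the first output and their positions go to the second.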
AS is a matrix.
This will return the largest elements of AS along its 2nd dimension (i.e. across its columns, giving one maximum per row).
This function is taking AS and producing the maximum value along the second dimension of AS. It returns the max value 'Y' and the index of it 'I'.
Note the apparent wrinkle in the MATLAB convention: there are a number of built-in functions which have a signature like:
xs = sum(x,dim)
which works 'along' the dimension dim. max and min are the oddball exceptions:
xm = max(x,dim); %this is probably a silent semantic error!
xm = max(x,[],dim); %this is probably what you want
I sometimes wish matlab had a binary max and a collapsing max, instead of shoving them into the same function...
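A minimal sketch of the difference, using a made-up 3 x 3 matrix. The names s, xb and xm are illustrative only.

```matlab
x = [1 5 9; 6 8 3; 2 0 5];

s  = sum(x, 2);        % collapses dimension 2: one sum per row
xb = max(x, 2);        % binary max: compares every element with the scalar 2
xm = max(x, [], 2);    % collapsing max: one maximum per row
```

s is [15; 17; 7] and xm is [9; 8; 5], both 3-by-1. xb is still 3-by-3, namely [2 5 9; 6 8 3; 2 2 5], which is why forgetting the [] gives no error but the wrong result.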