Swift MPSCNNConvolution -- weights all set to 1, shouldn't the output look just like the input?

Trying to figure out how to use MPSCNNConvolution. I have a 4 x 3 image, and a 4 x 3 kernel. I'm setting all the weights to 1, and all the inputs to 1, and I sort of expected to get all 1's back. What I get instead is
12 9 6 3
8 6 4 2
4 3 2 1
The problem is that I don't know whether it's supposed to behave like this or not. I've been all over every shred of Apple doc I can find, every online article, every github repo, and I can't find anything that says what kind of output to expect when the layer is set up correctly.
The pattern holds for differently sized images. A 3 x 2 gives me
6 4 2
3 2 1
And a 2 x 2 gives me
4 2
2 1
I've pushed my "minimal" example to github. It's not small. Xcode 12.4 no longer supports Float16, so there's utility code for converting between Float16 and Float32, plus all the convoluted setup for convolution, and yet more code for trying to take the headache out of unsafe pointers.
My specific questions: is this output "just the normal behavior" for MPSCNNConvolution? Is there a name for this function/algorithm, something I can look up?

The documentation for MPSCNNConvolution is slightly confusing. To the uninitiated, it might seem that MPSCNNConvolution is a kind of container that holds convolution kernels. This is not the case. MPSCNNConvolution is itself a kernel. Specifically, it weights and sums all the input values under the kernel window. Just a straight sum, no averaging or maxing. What you're seeing is the result of the kernel starting at (0, 0) and sliding way off the right edge, and eventually way off the bottom edge.
Set the kernel's offset (into the source image) and its clipRect (on the destination), and MPSCNNConvolution will behave the same way as the MPSCNNPooling* kernels and all the others.
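To see that this really is just a zero-padded sliding sum anchored at the top-left of the window, the numbers in the question can be reproduced outside of Metal. A minimal sketch in MATLAB (just the arithmetic, not the MPS API):
im = ones(3, 4);                 % the 4 x 3 image (3 rows, 4 columns), all ones
k  = ones(3, 4);                 % the 4 x 3 kernel, all weights 1
% zero-pad so the window can slide off the right and bottom edges
padded = zeros(size(im) + size(k) - 1);
padded(1:size(im,1), 1:size(im,2)) = im;
out = zeros(size(im));
for r = 1:size(im,1)
    for c = 1:size(im,2)
        win = padded(r:r+size(k,1)-1, c:c+size(k,2)-1);
        out(r,c) = sum(win(:) .* k(:));   % straight weighted sum, no averaging
    end
end
disp(out)   % 12 9 6 3 / 8 6 4 2 / 4 3 2 1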

Related

Translating a kd-tree in MATLAB

I'm using a kd-tree to perform quick nearest neighbor search queries. I'm using the following piece of code to generate the kd-tree and perform queries on it:
% 3 dimensional vertex data
x = [1 2 2 1 2 5 6 3 4;
3 2 3 2 2 7 6 5 2;
1 2 9 9 7 5 8 9 3]';
% create the kd-tree
kdtree = createns(x, 'NSMethod', 'kdtree');
% perform a nearest neighbor search
nearestNeighborIndex = knnsearch(kdtree, [1 1 1]);
This works well enough when the data is static. However, every once in a while, I need to translate every vertex in the kd-tree. I know that changing the underlying data means I need to regenerate the whole tree before performing a nearest neighbor search again. With a couple of thousand vertices per kd-tree, regenerating the whole tree from scratch seems like overkill, as it takes a significant amount of time. Is there a way to translate the kd-tree without regenerating it from scratch? I tried accessing and changing the X property of the kd-tree (which holds the actual vertex data), but it seems to be read-only, and it probably wouldn't have worked even if it weren't, since there is a lot more going on behind the scenes.
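One workaround, assuming the translation is uniform across all vertices: nearest-neighbor relationships are preserved under a common translation, so you can keep the original tree and translate the query point by the inverse offset instead. A minimal sketch (offset and q are hypothetical names, not from the question):
% build the tree once on the original, untranslated vertices
kdtree = createns(x, 'NSMethod', 'kdtree');
offset = [10 -5 2];   % the translation applied to every vertex
% searching for q among (x + offset) is the same as
% searching for (q - offset) among x, so move the query instead
q = [1 1 1];
nearestNeighborIndex = knnsearch(kdtree, q - offset);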

Calculating the Local Ternary Pattern of a depth image

I found the details and an implementation of the Local Ternary Pattern (LTP) in Calculating the Local Ternary Pattern of an image?. I would like to ask for more detail: what is the best way to choose the threshold t? I am also confused about the role of reorder_vector = [8 7 4 1 2 3 6 9];
Unfortunately there isn't a good way to figure out what the threshold should be for LTPs. It's mostly trial and error or experimentation. However, I can suggest making the threshold adaptive. You can use Otsu's algorithm to dynamically determine the best threshold for your image. This assumes that the distribution of intensities in the image is bimodal; in other words, there is a clear separation between objects and background. MATLAB has an implementation of this in the graythresh function. However, it generates a threshold between 0 and 1, so you will need to multiply the result by 255, assuming that the type of your image is uint8.
Therefore, do:
t = 255*graythresh(im);
im is the image for which you want to compute the LTPs. Now, I can certainly provide insight into what reorder_vector is doing. Look at the following figure on how to calculate LTPs:
[Figure: example of computing an LTP ternary code from a 3 x 3 neighbourhood (source: hindawi.com)]
When we generate the ternary code matrix (the matrix in the middle), we need to generate an 8-element sequence that doesn't include the centre of the neighbourhood. We start from the east-most element (row 2, column 3), then traverse the elements in counter-clockwise order. The reorder_vector variable selects those specific elements in that order. If you recall, MATLAB accesses matrices using column-major linear indices. Specifically, given a 3 x 3 matrix, we can access an element using a number from 1 to 9, and the memory is laid out like so:
1 4 7
2 5 8
3 6 9
Therefore, the first element of reorder_vector is index 8, which is the east-most element. Next is index 7, which is the top-right element, then index 4, which is the north-facing element, then 1, 2, 3, 6, and finally 9.
If you follow these numbers, you will determine how I got the reorder_vector:
reorder_vector = [8 7 4 1 2 3 6 9];
By using this variable for accessing each 3 x 3 local neighbourhood, we would thus generate the correct 8 element sequence that respects the ordering of the ternary code so that we can proceed with the next stage of the algorithm.
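As a quick illustration, indexing a 3 x 3 matrix with reorder_vector pulls out the eight neighbours in exactly that counter-clockwise order (the neighbourhood values here are arbitrary):
nbhd = [10 20 30;
        40 50 60;
        70 80 90];                    % arbitrary 3 x 3 neighbourhood
reorder_vector = [8 7 4 1 2 3 6 9];
seq = nbhd(reorder_vector)
% seq = 60 30 20 10 40 70 80 90
% i.e. east, north-east, north, north-west, west, south-west, south, south-east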

Similarity measure between two images

I am currently implementing image segmentation in MATLAB. I have two implementations:
1. The image is segmented into two regions - foreground and background.
2. The image is segmented into more than two regions - say, 3 or 4 segmented regions.
I am trying to compute the similarity measure between the segmented image and the ground truth (manual segmented images) by using the dice coefficient or the Jaccard Index. This works well for the segmented images that have been divided into two regions. This is implemented by the following code.
dice = 2*nnz(segIm&grndTruth)/(nnz(segIm) + nnz(grndTruth))
This expects segIm and grndTruth to be of the same size. They must also be numeric or logical.
However, I have not been able to find a way to apply this measure to compare similarity for multiple-region segmented images. Can anyone tell me how I can use the dice coefficient in my application?
EDIT: Following nkjt's suggestions, I have done a basic implementation and am giving the results below. Please feel free to improve the code for better accuracy.
I am considering two images in the form of two matrices. A is the segmented image and B is the manual ground truth. The MATLAB code for the suggested implementation is given below. Please check it and give me your thoughts.
A=[1 2 3 4;1 2 3 4;1 2 3 4;1 2 3 4]
B=[1 3 4 4;1 1 3 4;1 2 3 4;1 2 3 1]
% First suggestion
dice = 2*nnz(A==B)/(nnz(A) + nnz(B))
% Second suggestion
A1=(A==1);B1=(B==1);
A2=(A==2);B2=(B==2);
A3=(A==3);B3=(B==3);
A4=(A==4);B4=(B==4);
dice = (2*nnz(A1&B1)/(nnz(A1) + nnz(B1))...
+2*nnz(A2&B2)/(nnz(A2) + nnz(B2))...
+2*nnz(A3&B3)/(nnz(A3) + nnz(B3))...
+2*nnz(A4&B4)/(nnz(A4) + nnz(B4)))/4
Please note: I am also interested to know whether the Hausdorff distance measure can be applied in this case for both the 3-phase and 4-phase segmented images.
EDIT: I have a new query. Suppose the image has 4 regions and has been correctly segmented, as shown in the example below. If different intensity values are used to denote the different regions, then the Dice coefficient gives the two segmented results different scores: for Segmented Region 1 I get dice = 1, and for Segmented Region 2 I get dice = 0.75. But both results are accurate. How can I modify my code so that the dice coefficients reflect this?
The work of Arbeláez et al. describes several methods to compare results of image segmentation algorithms. See section 3.1 and its sub-sections.
I believe some MATLAB code can be found on their project's webpage.
The Berkeley segmentation dataset (BSDS500) is a well-established benchmark in the image segmentation community.
You might want to look into measures designed for segmentation, such as Normalized Probabilistic Rand.
However, I can see two possible ways of doing something quick with your existing code.
1) Instead of using logical images and &, use:
dice = 2*nnz(segIm==grndTruth)/(nnz(segIm) + nnz(grndTruth));
Both segIm and grndTruth here should be numerical (ideally integers, with foreground regions having values 1, 2, 3, etc.).
2) Produce a set of binary images out of both segIm & grndTruth for each foreground area, and define a dice coefficient for each.
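For option 2), a compact sketch of that per-region loop, rather than writing one line per region as in the question's EDIT (assuming the region labels are positive integers; the variable names are mine):
labels = unique([segIm(:); grndTruth(:)]);   % all region labels present
diceVals = zeros(numel(labels), 1);
for i = 1:numel(labels)
    S = (segIm == labels(i));
    G = (grndTruth == labels(i));
    diceVals(i) = 2*nnz(S & G) / (nnz(S) + nnz(G));
end
meanDice = mean(diceVals);                   % average dice over all regions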

Why does crossvalind fail?

I am using the crossvalind function on very small data... However, I observe that it gives me incorrect results. Is this supposed to happen?
I have Matlab R2012a and here is my output
crossvalind('KFold',1:1:11,5)
ans =
2
5
1
3
2
1
5
3
5
1
5
Notice the absence of set 4. Is this a bug? I expected at least 2 elements per set, but it gives me 0 in one... and it happens a lot that the values are not uniformly distributed across the sets.
The help for crossvalind says that the form you are using is: crossvalind(METHOD, GROUP, ...). In this case, GROUP is, e.g., the class labels of your data. So 1:11 as the second argument is confusing here, because it suggests no two examples have the same label. I think this is sufficiently unusual that you shouldn't be surprised if the function does something strange.
I tried doing:
numel(unique(crossvalind('KFold', rand(11, 1) > 0.5, 5)))
and it reliably gave 5 as a result, which is what I would expect; my example corresponds to a two-class problem. As a general rule, I would guess you'd want something like numel(unique(group)) <= numel(group) / folds. My hypothesis is that it tries to have one example of each class in the Kth fold, and at least 2 examples in every other, with a difference between fold sizes of no more than 1 - but I haven't looked at the code to verify this.
It is possible that you mean to do:
crossvalind('KFold', 11, 5);
which would compute 5 folds for 11 data points - this doesn't attempt to do anything clever with labels, so you would be sure that there will be K folds.
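A quick way to convince yourself that this form populates all K folds (accumarray here simply counts how often each fold index occurs; the exact sizes vary from run to run):
idx = crossvalind('KFold', 11, 5);   % 5 folds over 11 data points
foldSizes = accumarray(idx, 1)       % e.g. 3 2 2 2 2 -- every fold non-empty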
However, in your problem, if you really have very few data points, then it is probably better to do leave-one-out cross-validation, which you could do with:
crossvalind('LeaveMOut', 11, 1);
although a better method would be:
for leave_out = 1:11
    fold_number = (1:11) ~= leave_out;
    % <code here>; where fold_number is 0 (false), that index is the
    % left-out example; fold_number = 1 (true) means the example is in the main fold.
end

Adapting the mode function to favor central values (Matlab)

The mode-function in Matlab returns the value that occurs most frequently in a dataset. But "when there are multiple values occurring equally frequently, mode returns the smallest of those values."
This is not very useful for my purposes; I would rather have it return the median, or the arithmetic mean, in the absence of a modal value (as those are at least somewhat in the middle of the distribution). Otherwise the results of using mode are far too much on the low side of the scale (I have a lot of unique values in my distribution).
Is there an elegant way to make mode favor more central values in a dataset (in the absence of a true modal value)?
By the way: I know I could use [M,F] = mode(X, ...) to manually check for the most frequent value (and calculate a median or mean when necessary). But that seems like a bit of an awkward solution, since I would be almost entirely rewriting everything that mode is supposed to do. I'm hoping there's a more elegant solution.
Looks like you want the third output argument from mode, e.g.:
x = [1 1 1 2 2 2 3 3 3 4 4 4 5 6 7 8];
[m,f,c] = mode(x);
valueYouWant = median(c{1});
Or, since median averages the two middle values when there is an even number of entries, and an even number of values may share the maximum number of occurrences, maybe do something like this:
valueYouWant = c{1}(ceil(length(c{1})/2))
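To make the behaviour concrete, here is what the three outputs contain for the x above, and why the two extraction strategies differ, per mode's documented behaviour:
x = [1 1 1 2 2 2 3 3 3 4 4 4 5 6 7 8];
[m, f, c] = mode(x);
% m = 1, f = 3, c = {[1;2;3;4]}  -- four values tied at 3 occurrences each
median(c{1})                 % 2.5 -- central, but not itself one of the modes
c{1}(ceil(length(c{1})/2))   % 2   -- central AND an actual tied mode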