Why does the Dice coefficient come out bigger than 1? - MATLAB

I want to evaluate my automatic image segmentation results. I compute Dice coefficients with a function written in MATLAB; the link to the code is below.
mathworklink
I am comparing the automatically segmented patch and the manually cropped patch; interestingly, the Dice coefficient comes out bigger than one. I went through the code many times and tried things such as taking the absolute value of the patches (to get rid of negative pixels), but could not find the reason. How can the union come out as, say, 45 when the sums over the individual sets are, say, 3 and 5? The union can be at most 8.
Can somebody point me to more reliable sources for implementing the Dice coefficient?
function [Dice] = evaldem(man, auto)
for i = 1:size(auto,1)
    for j = 1:size(auto,2)
        % resize both patches to a common size before comparing
        auto(i,j).autosegmentedpatch = imresize(auto(i,j).autosegmentedpatch,[224 224]);
        man(i,j).mansegmentedpatch   = imresize(man(i,j).mansegmentedpatch,[224 224]);
        Dice(i,j) = sevaluate(man(i,j).mansegmentedpatch, auto(i,j).autosegmentedpatch);
    end
end
Since I have many automatically segmented patches and manually segmented patches, I stored them in structures [man and auto]; the structures' size is [i,j]. I definitely have to use imresize so that they have equal sizes! Then I call the FEX submission file. These patches do contain some negative pixels. Note that I take the absolute value of the patches when computing 'common' and 'union' for Dice. All in all, I still get Dice values bigger than one.
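For reference, the Dice coefficient is defined on sets, so both patches should be binary masks before any sums are taken; with raw (and possibly negative) intensity values the ratio can exceed 1. Below is a minimal sketch of the standard formula 2*|A intersect B| / (|A| + |B|) on binarized patches. It is not the FEX submission, and the use of logical() as the binarization step is an assumption (substitute whatever thresholding suits your patches).
function d = dice_binary(man, auto)
% Minimal sketch: Dice coefficient of two binary masks of equal size.
% Binarize first -- feeding raw intensities into the sums can push the
% result above 1, which is impossible for true set overlap.
man  = logical(man);                        % assumed binarization step
auto = logical(auto);
common = nnz(man & auto);                   % |A intersect B|
d = 2 * common / (nnz(man) + nnz(auto));    % 2|A intersect B| / (|A| + |B|)
end
Recent releases of the Image Processing Toolbox also provide a built-in dice function for binary images, which may be a more dependable reference than a FEX submission.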


Boxplot is broken, only showing one line

So my data centres around different treatments and how they impact the day of germination. [image of dodgy boxplot data]
A while ago whilst making violin plots in R to show the distribution of when germination occurs according to treatment, I attempted to add a boxplot as a descriptive statistic and was met with only one line.
I contacted many people who simply had no idea what the issue was. I had used this same data in another violin plot as part of a bigger data collection with more treatments, including this one.
I moved on from this and found it odd. Now that I have come to perform statistical tests in SPSS, I have the same problem, as shown in the image below. When I try a Mann-Whitney U test I am told "cannot compute" due to not having solely two variables, and when I try a Kruskal-Wallis test I am met with the dodgy boxplot below and told that pairwise comparisons cannot be done because there are fewer than 3 test fields (i.e. 2).
I am at an absolute loss. I have tried rewriting the data out and copying data labels with 'stratified', 'strat', 's', etc., and I have no idea where the problem could lie. If anyone could give me any guidance it would be really appreciated!
Thank you
The dependent variable in question appears to have only values 1, 2, and 3 in the Stratified group. If there is at least one case with a value of 1, at least one case with a value of 3, but most values at 2, then a box plot like you're seeing would be expected. In SPSS, run the EXAMINE procedure (Analyze>Descriptive Statistics>Explore in the menus), specifying the same dependent variable and grouping variable, and asking for percentiles. The box plots should match what you're getting, and in the percentiles table you should see that Tukey's hinges show the same value of 2 for the 25th, 50th, and 75th percentiles.
Tukey's hinges are the basis for the box and the line in box plots. The line is at the median or 50th percentile, and the lower and upper box edges are at the 25th and 75th percentiles, respectively. When all three coincide, you get just a line instead of a box.
There are two types of outlying values identified in box plots in SPSS. Points greater than 1.5 box lengths below or above the box edges are outliers, marked with circles, and points greater than 3 box lengths below or above the box edges are extremes, marked with asterisks. Since the box length here is 0, anything at other values is automatically an extreme.
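The collapsed box is easy to reproduce outside SPSS as well. A quick MATLAB sketch with hypothetical data (mostly 2s, one 1, one 3) shows the same picture; note that MATLAB's boxplot flags every point beyond 1.5 box lengths with a single marker rather than distinguishing outliers from extremes the way SPSS does.
% Hypothetical data: the 25th, 50th and 75th percentiles all equal 2,
% so the box collapses to a single line and the 1 and 3 are flagged.
y = [1; repmat(2, 18, 1); 3];
boxplot(y)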
Pairwise comparisons following a Kruskal-Wallis test are available only when there are at least three groups, since with only two groups the overall or omnibus test has already compared the two groups. I'm not sure what the issue was when trying to run a Mann-Whitney test.

Efficient Matlab way of reducing 2D segments?

I need to write a function to simplify a set of segments. In particular:
Given the set of 2D segments (x and y coordinates)
Keep only one replica in case of overlapping (or almost overlapping) segments
For each kept part keep count of how many overlaps were present on it.
So the input would be a big set of segments (with many overlaps)
And the output would be non overlapping segments with counts for each one of them.
An example input is shown in this MATLAB plot.
As you can see, some lines look thicker because there are multiple overlaps.
As a result, I would need just the skeleton with no overlaps, but I would also need the number of overlaps for each output segment.
I am not an expert in segments and geometrical problems.
What would be the most efficient way of solving such a problem? I am using MATLAB, but a code example in any high-level language would help. Thanks!
EDIT
As requested here is also a sample dataset, you can get it from this google drive link
https://drive.google.com/open?id=1r2hkG7gI0qhzfP-Mmn8HzIil1o47Z2nA
The input is a csv with 276 cable segments
The output is a csv with 58 cable segments (the reduced segments) + an extra column containing the number of overlapping cables for each segment kept.
The input could contain many more segments. The idea is that the reduction should eliminate cables that are parallel and overlapping with each other, with a certain tolerance.
For example if 2 cables are parallel but far away they should be kept both.
I don't care about the tolerances; it's fine if the output is different. I just need an idea of how to solve this problem efficiently, because the code should run fast even with very many segments as input.
MATLAB is probably not the best-suited language for geometrical manipulation. PostgreSQL/PostGIS would be a better tool, but if you don't have a choice, here is one solution to get the skeleton of a line:
% Random points
P = [0 0;
1 1;
2 1;
1.01 1.02;
0.01 0.01];
% positive buffer followed by a negative buffer
pol = polybuffer(polybuffer(P,'lines',0.25,'JointType','miter'),-0.249,'JointType','miter');
plot(P(:,1),P(:,2),'r.','MarkerSize',10)
hold on
plot(pol)
axis equal
% Drop the duplicate with uniquetol (with a tolerance of 0.005) to get the centerline
utol = uniquetol(pol.Vertices,0.005,'ByRows',true)
hold off;
plot(utol(:,1),utol(:,2),'b')
Result: [plot of the buffered polygon]
And the center line: [plot of the extracted centerline]
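The buffering trick only gives the skeleton; it does not by itself produce the per-segment overlap counts the question also asks for. One rough way to recover them is to snap each input segment's midpoint to its nearest reduced vertex and tally the hits. This is only a sketch under assumptions, not part of the answer above: segs is assumed to be an N-by-4 array of input segments [x1 y1 x2 y2], and utol is the reduced vertex list from the code above.
% Rough overlap tally: assign every input segment to the nearest vertex
% of the reduced geometry and count how many land on each one.
mid   = [(segs(:,1)+segs(:,3))/2, (segs(:,2)+segs(:,4))/2];   % segment midpoints
idx   = dsearchn(utol, mid);                  % nearest reduced vertex per segment
count = accumarray(idx, 1, [size(utol,1) 1]); % overlaps per reduced vertex
This gives counts per vertex rather than per reduced segment, so treat it as a starting point rather than a finished solution.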

Normalized histogram in MATLAB incorrect?

I have the following set of data:
X=[4.692
6.328
4.677
6.836
5.032
5.269
5.732
5.083
4.772
4.659
4.564
5.627
4.959
4.631
6.407
4.747
4.920
4.771
5.308
5.200
5.242
4.738
4.758
4.725
4.808
4.618
4.638
7.829
7.702
4.659]; % Sample set
I fitted a Pareto distribution to this using the maximum likelihood method and obtained the following graph:
The following bit of code is what draws the histogram:
% Bin the data, find the bin centres, and rescale the counts so that the
% histogram integrates to 1 (an empirical PDF to compare against the fit)
[N,edges,bin] = histcounts(X,'BinMethod','auto');
bin_middles = mean([edges(1:end-1); edges(2:end)]);
f_X_sample = N/trapz(bin_middles,N);
bar(bin_middles,f_X_sample,1);
Am I doing this right? I checked 100 times and the Pareto distribution is indeed optimal, but it seems awfully different from the histogram. Is there an error that may be causing this? Thank you!
I would agree with @tashuhka's comment that you need to think about how you're binning your data.
Imagine the extreme case where you lump everything together into one bin, and then try to fit that single point to a distribution. Your PDF would look nothing like your single square bar. Split into two bins, and now the fit still sucks, but at least one bar is (probably) a little bigger than the other, etc., etc. At the other extreme, every data point has its own bar and the bar graph is nothing but a random forest of bars with only one count.
There are a number of different strategies for choosing an "optimal" bin size that minimizes the number of bins but maximizes the representation of the underlying PDF.
Finally, note that you only have 30 points here, so your other problem may be that you just haven't collected enough data to really nail down the underlying PDF.
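As a side note, histogram (and histcounts) can do the PDF normalization directly via the 'Normalization','pdf' option, which removes the manual trapz step as a possible source of error, and switching 'BinMethod' is a quick way to see how sensitive the picture is to the binning. A sketch, assuming X is the data above; the 'GeneralizedPareto' fit with its default threshold of 0 is only a stand-in for whatever Pareto fit was actually used.
% Compare two bin rules with a proper PDF normalization, and overlay a fit.
figure; hold on
histogram(X, 'BinMethod', 'auto', 'Normalization', 'pdf', 'FaceAlpha', 0.4);
histogram(X, 'BinMethod', 'fd',   'Normalization', 'pdf', 'FaceAlpha', 0.4);
pd = fitdist(X, 'GeneralizedPareto');         % stand-in for the actual ML fit
xg = linspace(min(X), max(X), 200);
plot(xg, pdf(pd, xg), 'r', 'LineWidth', 1.5)  % fitted density over the data range
hold off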

MATLAB: Using CONVN for moving average on Matrix

I'm looking for a bit of guidance on using CONVN to calculate moving averages in one dimension on a 3d matrix. I'm getting a little caught up on the flipping of the kernel under the hood and am hoping someone might be able to clarify the behaviour for me.
A similar post that still has me a bit confused is here:
CONVN example about flipping
The Problem:
I have daily river and weather flow data for a watershed at different source locations.
So the matrix is as so,
dim 1 (the rows) represent each site
dim 2 (the columns) represent the date
dim 3 (the pages) represent the different type of measurement (river height, flow, rainfall, etc.)
The goal is to use CONVN to take a 21-day moving average at each site, for each observation point, for each variable.
As I understand it, I should just be able to use a kernel such as:
ker = ones(1,21) ./ 21.;
mat = randn(150,365*10,4);
avgmat = convn(mat,ker,'valid');
I tried playing around and created another kernel which should also work (I think) and set ker2 as:
ker2 = [zeros(1,21); ker; zeros(1,21)];
avgmat2 = convn(mat,ker2,'valid');
The question:
The results don't quite match and I'm wondering if I have the dimensions incorrect here for the kernel. Any guidance is greatly appreciated.
Judging from the context of your question, you have a 3D matrix and you want to find the moving average of each row independently over all 3D slices. The code above should work (the first case). However, the valid flag returns a matrix whose size is valid in terms of the boundaries of the convolution. Take a look at the first point of the post that you linked to for more details.
Specifically, with a length-21 kernel each row of the output will have 20 fewer entries than the input (the 'valid' output length is N - 21 + 1), because the convolution only produces a result once the kernel is completely contained inside a row of the matrix; it's only from that point that you get valid results (no pun intended). If you'd like to see the entries at the boundaries, then you'll need to use the 'same' flag to keep the output the same size as the input, or the 'full' flag (which is the default), which gives you the output size measured from the most extreme outer edges; but bear in mind that those boundary averages are computed against zero padding and so wouldn't be what you expect anyway.
However, if I'm interpreting what you are asking correctly, then the 'valid' flag is what you want, but bear in mind that you will have 20 entries missing per row to account for the edge cases. All in all, your code should work, but be careful about how you interpret the results. (As for the second kernel: ker2 spans three rows, so with 'valid' the output also loses two rows, and each row of that result corresponds to the row below it in the first result, which is why the two outputs don't line up element for element.)
BTW, you have a symmetric kernel, and so flipping should have no effect on the convolution output. What you have specified is a standard moving averaging kernel, and so convolution should work in finding the moving average as you expect.
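If you want a quick sanity check on the 'valid' result, movmean with incomplete end windows discarded should reproduce it to within floating-point round-off (the sizes below just mirror your example):
% Sanity check: 'valid' convolution vs. movmean along the date dimension.
ker  = ones(1,21) / 21;
mat  = randn(150, 365*10, 4);
avgA = convn(mat, ker, 'valid');                       % 150 x 3630 x 4
avgB = movmean(mat, 21, 2, 'Endpoints', 'discard');    % same size
max(abs(avgA(:) - avgB(:)))                            % ~1e-15, i.e. round-off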
Good luck!

Mixing sound files of different size

I want to mix audio files of different sizes into one single .wav file without clipping any file, i.e. the resulting file's length should be equal to that of the largest file of all.
There is a sample through which we can mix files of the same size: http://www.modejong.com/iOS/#ex4 (Example 4).
I modified the code to get the mixed file as a .wav file.
But I am not able to understand how to modify this code for files of unequal size.
If someone can help me out with a code snippet, I'll be really thankful.
It should be as easy as sending all the files to the mixer simultaneously. When any single file gets to the end, just treat it as if the remainder is filled with zeroes. When all files get to the end, you are done.
Note that the example code says it returns an error if there would be clipping (i.e. the sum of the waves is greater than the maximum representable value). This condition is more likely if you are mixing multiple inputs. The best way around it is to create some "headroom" in the input waves. You can either do this in preprocessing, by ensuring that each wave's volume is no more than some percentage of maximum (roughly 80-90%, depending on the number of inputs), or do it dynamically in the mixer code by multiplying each sample by some value < 1.0 as you add it to the mix.
If you are selecting the waves to mix at runtime and failure due to clipping is unacceptable, you will need to modify the sample code to pin the values at max/min instead of returning an error. Don't just let them overflow or you will get noisy artifacts.
Clipping creates artifacts as well, but when you haven't created enough headroom before mixing, it is definitely preferable to overflow. It is a more familiar-sounding type of distortion, similar to what you get when you overdrive your speakers. See the Wikipedia article on clipping:
Clipping is preferable to the alternative in digital systems—wrapping—which occurs if the digital hardware is allowed to "overflow", ignoring the most significant bits of the magnitude, and sometimes even the sign of the sample value, resulting in gross distortion of the signal.
How I'd do it:
Much like the mix_buffers function that you linked to, but pass in 2 parameters for mixbufferNumSamples. Iterate over the whole of the longer of the two buffers. When the index has gone beyond the end of the shorter buffer, simply set the sample from that buffer to 0 for the rest of the function.
If you must avoid clipping and do it in real-time and you know nothing else about the two sounds, you must provide enough headroom. The simplest method is by halving each of the samples before mixing:
mixed = s1/2 + s2/2;
This ensures that the resultant mixed sample won't overflow an int16_t. It will have the side effect of making everything quieter though.
If you can run it offline, you can calculate a scale factor to apply to both waveforms which will keep the peaks when summed below the maximum allowed value.
Or you could mix them all at full volume to an int32_t buffer, keeping track of the largest (magnitude) mixed sample and then go back through the buffer multiplying each sample by a scale factor which will make that extreme sample just reach the +32767/-32768 limits.
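For what it's worth, the offline version of that idea is only a few lines in a high-level environment. A MATLAB sketch (not the iOS sample code; the file names are placeholders, and both files are assumed to share the same sample rate and channel count): zero-pad the shorter signal to the longer length, mix at full scale, then rescale only if the peak would clip.
% Zero-pad to the longer length, mix, then normalise the peak if needed.
[a, Fs] = audioread('long.wav');     % placeholder file names
[b, ~ ] = audioread('short.wav');    % assumed same Fs and channel count
n = max(size(a,1), size(b,1));
a(end+1:n, :) = 0;                   % pad the shorter signal with silence
b(end+1:n, :) = 0;
mixed = a + b;                       % may momentarily exceed [-1, 1]
peak = max(abs(mixed(:)));
if peak > 1
    mixed = mixed / peak;            % offline scale factor instead of clipping
end
audiowrite('mixed.wav', mixed, Fs);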